title: Experimentation - Custom Decision Service
titlesuffix: Azure Cognitive Services
description: This article is a guide for experimentation with Custom Decision Service.
services: cognitive-services
author: marco-rossi29
manager: cgronlun
ms.service: cognitive-services
ms.component: custom-decision-service
ms.topic: conceptual
ms.date: 05/10/2018
ms.author: marossi

Experimentation

Following the theory of contextual bandits (CB), Custom Decision Service repeatedly observes a context, takes an action, and observes a reward for the chosen action. An example is content personalization: the context describes a user, actions are candidate stories, and the reward measures how much the user liked the recommended story.
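The observe-context, take-action, observe-reward loop can be sketched as a toy Python simulation. This is not Custom Decision Service code. The single context feature (likes_sports), the two candidate stories, the epsilon-greedy exploration rule, and the reward function are all hypothetical. Each round also records the probability with which the chosen action was picked, because off-policy evaluation needs that probability later.

```python
import random

def epsilon_greedy(context, actions, rng, epsilon=0.2):
    """Hypothetical logging policy: exploit a fixed rule, explore uniformly."""
    best = "sports_story" if context["likes_sports"] else "politics_story"
    k = len(actions)
    if rng.random() < epsilon:
        action = rng.choice(actions)  # explore: any action, uniformly
    else:
        action = best                 # exploit: the rule's preferred action
    # Probability that this policy picks `action` in this context.
    prob = (1 - epsilon) + epsilon / k if action == best else epsilon / k
    return action, prob

def liked(context, action):
    """Hypothetical reward: 1.0 if the story matches the user's interest."""
    wanted = "sports_story" if context["likes_sports"] else "politics_story"
    return 1.0 if action == wanted else 0.0

def run_bandit_loop(n_rounds, actions, policy, reward_fn, seed=0):
    """Observe a context, take an action, observe a reward; log every round."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_rounds):
        context = {"likes_sports": rng.random() < 0.5}  # observe context
        action, prob = policy(context, actions, rng)    # take action
        reward = reward_fn(context, action)             # observe reward
        log.append((context, action, prob, reward))
    return log

ACTIONS = ["sports_story", "politics_story"]
log = run_bandit_loop(1000, ACTIONS, epsilon_greedy, liked)
```

Each log entry is a (context, action, probability, reward) tuple; that layout is chosen for illustration and is not the Decision Service JSON format.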

Custom Decision Service produces a policy, which maps contexts to actions. For a given target policy, you want to know its expected reward. One way to estimate the reward is to deploy the policy online and let it choose actions (for example, recommend stories to users). However, such online evaluation can be costly for two reasons:

  • It exposes users to an untested, experimental policy.
  • It doesn't scale to evaluating multiple target policies.

Off-policy evaluation is an alternative paradigm. Given logs from an existing online system that follows a logging policy, off-policy evaluation can estimate the expected rewards of new target policies without deploying them.
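As an illustration, here is a minimal sketch of two standard off-policy estimators: inverse propensity score (IPS) and doubly robust (DR), the textbook estimators behind the ips and dr names. It assumes each log entry is a (context, logged_action, logging_probability, reward) tuple and that the target policy is deterministic; the function names and the reward_model callback are hypothetical, and this is not how Experimentation.py or Vowpal Wabbit compute their estimates internally.

```python
def ips_estimate(log, target_policy):
    """IPS estimate of a deterministic target policy's expected reward.

    log: list of (context, logged_action, logging_probability, reward) tuples.
    target_policy: function context -> action.
    """
    if not log:
        raise ValueError("empty log")
    total = 0.0
    for context, action, prob, reward in log:
        if target_policy(context) == action:
            # Only rounds where the target agrees with the logged action count,
            # reweighted by 1 / (probability the logging policy chose it).
            total += reward / prob
    return total / len(log)

def dr_estimate(log, target_policy, reward_model):
    """DR estimate: reward-model prediction plus an IPS-weighted correction.

    reward_model: function (context, action) -> predicted reward.
    """
    if not log:
        raise ValueError("empty log")
    total = 0.0
    for context, action, prob, reward in log:
        target_action = target_policy(context)
        estimate = reward_model(context, target_action)
        if target_action == action:
            # Correct the model's error on the logged action.
            estimate += (reward - reward_model(context, action)) / prob
        total += estimate
    return total / len(log)

# Toy log, made up for illustration.
toy_log = [
    ({"x": 0}, "a", 0.5, 1.0),
    ({"x": 1}, "b", 0.25, 0.0),
    ({"x": 0}, "b", 0.5, 1.0),
    ({"x": 1}, "a", 0.75, 1.0),
]

def target(ctx):
    return "a" if ctx["x"] == 0 else "b"

print(ips_estimate(toy_log, target))                      # 0.5
print(dr_estimate(toy_log, target, lambda c, a: 0.5))     # 0.25
```

With a reward model that always predicts 0, the DR estimate reduces to the IPS estimate; a good reward model typically lowers the variance of the estimate.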

Using the log file, Experimentation searches for the policy with the highest estimated expected reward. Target policies are parameterized by Vowpal Wabbit arguments. In the default mode, the script tests a variety of Vowpal Wabbit arguments by appending them to the --base_command. The script performs the following actions:

  • Auto-detects feature namespaces from the first --auto_lines lines of the input file.
  • Performs a first sweep over hyper-parameters (learning rate, L1 regularization, and power_t).
  • Tests the policy evaluation method, --cb_type: inverse propensity score (ips) or doubly robust (dr). For more information, see Contextual Bandit example.
  • Tests marginals.
  • Tests quadratic interaction features:
    • brute-force phase: Tests all combinations with --q_bruteforce_terms pairs or fewer.
    • greedy phase: Adds the best pair until there is no improvement for --q_greedy_stop rounds.
  • Performs a second sweep over hyper-parameters (learning rate, L1 regularization, and power_t).
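The two quadratic-interaction phases can be sketched as below. This is a hypothetical illustration, not the code in Experimentation.py. It assumes a score callback that takes a tuple of namespace pairs (for example ("ab",), meaning -q ab) and returns a loss where lower is better; in practice that callback would run Vowpal Wabbit with those pairs. The pair representation and tie-breaking are also assumptions.

```python
from itertools import combinations

def quadratic_search(candidate_pairs, score, bruteforce_terms=2, greedy_stop=3):
    """Brute-force then greedy search over quadratic namespace pairs.

    candidate_pairs: list of two-letter strings, e.g. ["ab", "ac"].
    score: function tuple_of_pairs -> loss (lower is better).
    """
    # Brute-force phase: every combination of at most `bruteforce_terms` pairs.
    best_set, best_loss = (), score(())
    for r in range(1, bruteforce_terms + 1):
        for combo in combinations(candidate_pairs, r):
            loss = score(combo)
            if loss < best_loss:
                best_set, best_loss = combo, loss

    # Greedy phase: add the single most helpful remaining pair each round,
    # and stop after `greedy_stop` consecutive rounds without improvement.
    current = list(best_set)
    rounds_without_improvement = 0
    while rounds_without_improvement < greedy_stop:
        remaining = [p for p in candidate_pairs if p not in current]
        if not remaining:
            break
        loss, pair = min((score(tuple(current + [p])), p) for p in remaining)
        current.append(pair)
        if loss < best_loss:
            best_set, best_loss = tuple(current), loss
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
    return best_set, best_loss

def toy_loss(pairs):
    """Made-up loss: three pairs help a little, "aa" hurts a little."""
    useful = {"ab", "ac", "bc"}
    return 1.0 - 0.1 * len(useful.intersection(pairs)) + (0.05 if "aa" in pairs else 0.0)

print(quadratic_search(["ab", "ac", "bc", "aa"], toy_loss))
```

The brute-force phase is exhaustive but grows combinatorially, which is why the sketch, like the description above, caps it at a small number of pairs and hands off to the cheaper greedy phase.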

The parameters that control these steps include some Vowpal Wabbit arguments:

  • Example Manipulation options:
    • shared namespaces
    • action namespaces
    • marginal namespaces
    • quadratic features
  • Update Rule options:
    • learning rate
    • L1 regularization
    • t power value (power_t)

For an in-depth explanation of the above arguments, see Vowpal Wabbit command-line arguments.
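To make the mapping concrete, here is a hypothetical helper that appends these options to a base command, in the spirit of how the script extends --base_command. The flags it emits (--cb_type, --marginal, -q, -l, --l1, --power_t) are standard Vowpal Wabbit options; the helper's name, argument order, and number formatting are assumptions, not the script's actual implementation.

```python
def build_vw_command(base_command, learning_rate=None, l1=None, power_t=None,
                     cb_type=None, marginal=None, quadratics=()):
    """Append Vowpal Wabbit options to a base command string (hypothetical helper)."""
    parts = [base_command]
    if cb_type is not None:
        parts.append(f"--cb_type {cb_type}")   # policy evaluation: ips or dr
    if marginal:
        parts.append(f"--marginal {marginal}")  # marginal namespaces, e.g. "f"
    for pair in quadratics:
        parts.append(f"-q {pair}")              # one quadratic interaction per pair
    if learning_rate is not None:
        parts.append(f"-l {learning_rate:g}")   # learning rate
    if l1 is not None:
        parts.append(f"--l1 {l1:g}")            # L1 regularization
    if power_t is not None:
        parts.append(f"--power_t {power_t:g}")  # t power value
    return " ".join(parts)

print(build_vw_command("vw --cb_adf --dsjson -c", learning_rate=0.5, l1=1e-9,
                       power_t=0.5, cb_type="dr", quadratics=["ab"]))
# vw --cb_adf --dsjson -c --cb_type dr -q ab -l 0.5 --l1 1e-09 --power_t 0.5
```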

Prerequisites

  • Vowpal Wabbit: Installed and on your path.
  • Python 3: Installed and on your path.
  • NumPy: Use the package manager of your choice.
  • The Microsoft/mwt-ds repository: Clone the repo.
  • Decision Service JSON log file: By default, the base command includes --dsjson, which enables Decision Service JSON parsing of the input data file. Get an example of this format.
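Because the input file holds one Decision Service JSON object per line and may be gzip-compressed, a small reader like the following sketch can help you inspect a log before running the script. It assumes only the one-object-per-line layout described for --file_path and does not interpret any dsjson fields; the function name and max_lines parameter are hypothetical.

```python
import gzip
import json
from itertools import islice

def iter_dsjson(path, max_lines=None):
    """Yield parsed JSON objects from a .json or .json.gz file, one per line.

    max_lines limits how many physical lines are read (None reads them all).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in islice(f, max_lines):
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, iterating over iter_dsjson(path, max_lines=100) reads the same number of lines that the default --auto_lines setting scans.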

Usage

Go to mwt-ds/DataScience and run Experimentation.py with the relevant arguments. The usage synopsis is:

python Experimentation.py [-h] -f FILE_PATH [-b BASE_COMMAND] [-p N_PROC]
                          [-s SHARED_NAMESPACES] [-a ACTION_NAMESPACES]
                          [-m MARGINAL_NAMESPACES] [--auto_lines AUTO_LINES]
                          [--only_hp] [-l LR_MIN_MAX_STEPS]
                          [-r REG_MIN_MAX_STEPS] [-t PT_MIN_MAX_STEPS]
                          [--q_bruteforce_terms Q_BRUTEFORCE_TERMS]
                          [--q_greedy_stop Q_GREEDY_STOP]

A log of the results is appended to the mwt-ds/DataScience/experiments.csv file.

Parameters

| Input | Description | Default |
| --- | --- | --- |
| -h, --help | Show help message and exit. | |
| -f FILE_PATH, --file_path FILE_PATH | Data file path (.json or .json.gz format; each line is a Decision Service JSON object). | Required |
| -b BASE_COMMAND, --base_command BASE_COMMAND | Base Vowpal Wabbit command. | vw --cb_adf --dsjson -c |
| -p N_PROC, --n_proc N_PROC | Number of parallel processes to use. | Number of logical processors |
| -s SHARED_NAMESPACES, --shared_namespaces SHARED_NAMESPACES | Shared feature namespaces (for example, abc means namespaces a, b, and c). | Auto-detected from data file |
| -a ACTION_NAMESPACES, --action_namespaces ACTION_NAMESPACES | Action feature namespaces. | Auto-detected from data file |
| -m MARGINAL_NAMESPACES, --marginal_namespaces MARGINAL_NAMESPACES | Marginal feature namespaces. | Auto-detected from data file |
| --auto_lines AUTO_LINES | Number of data file lines to scan to auto-detect feature namespaces. | 100 |
| --only_hp | Sweep only over hyper-parameters (learning rate, L1 regularization, and power_t). | False |
| -l LR_MIN_MAX_STEPS, --lr_min_max_steps LR_MIN_MAX_STEPS | Learning rate range as positive values min,max,steps. | 1e-5,0.5,4 |
| -r REG_MIN_MAX_STEPS, --reg_min_max_steps REG_MIN_MAX_STEPS | L1 regularization range as positive values min,max,steps. | 1e-9,0.1,5 |
| -t PT_MIN_MAX_STEPS, --pt_min_max_steps PT_MIN_MAX_STEPS | Power_t range as positive values min,max,steps. | 1e-9,0.5,5 |
| --q_bruteforce_terms Q_BRUTEFORCE_TERMS | Maximum number of quadratic pairs per combination tested in the brute-force phase. | 2 |
| --q_greedy_stop Q_GREEDY_STOP | Number of rounds without improvement after which the quadratic greedy search phase stops. | 3 |
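The min,max,steps ranges describe a grid of candidate values. The sketch below parses such a string into a grid, assuming logarithmic spacing between min and max; the spacing, the function name, and the validation rules are assumptions for illustration, and the script may build its grid differently. NumPy's logspace would produce the same kind of grid; plain math is used here to keep the sketch dependency-free.

```python
import math

def parse_range(spec):
    """Parse 'min,max,steps' into a log-spaced list of values (assumed spacing)."""
    lo_s, hi_s, steps_s = spec.split(",")
    lo, hi, steps = float(lo_s), float(hi_s), int(steps_s)
    if not (0 < lo <= hi) or steps < 1:
        raise ValueError(f"expected 0 < min <= max and steps >= 1, got {spec!r}")
    if steps == 1:
        return [lo]
    ratio = hi / lo
    # Geometric progression from lo to hi with `steps` points.
    return [lo * ratio ** (i / (steps - 1)) for i in range(steps)]

learning_rates = parse_range("1e-5,0.5,4")  # default for --lr_min_max_steps
```

Log spacing is a common choice for learning rates and regularization strengths because plausible values span several orders of magnitude.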

Examples

To use the preset default values:

python Experimentation.py -f D:\multiworld\data.json

Vowpal Wabbit can also read gzip-compressed .json.gz files:

python Experimentation.py -f D:\multiworld\data.json.gz

To sweep only over hyper-parameters (learning rate, L1 regularization, and power_t), stopping after the first hyper-parameter sweep:

python Experimentation.py -f D:\multiworld\data.json --only_hp