A set of tools to record the conditions (code, parameters, ...) and aggregate the results of many evaluation runs.
Used to add a Player (model weights + flags) to the models_to_eval CBT table.
# TODO(amj): test this code snippet
```shell
# From minigo/
source cluster/common.sh
export PROJECT=<your-project>
export SGF_BUCKET_NAME=minigo-pub
mkdir -p temp && cd temp
# Models are assigned alphabetically so the gsutil ls below works.
export MODEL_A=369a0424c4
export MODEL_B=52eb46008a
export CBT_TABLE=$CBT_MODEL_EVAL_TABLE
../cluster/evaluator/evaluator_ringmaster_wrapper.sh
ls
gsutil ls gs://minigo-pub/eval_server/models/games/${MODEL_A}_vs_${MODEL_B}
cbt -project "$PROJECT" -instance "$CBT_INSTANCE" read "$CBT_TABLE"
```
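The games directory above is named `${MODEL_A}_vs_${MODEL_B}` with the two model hashes in alphabetical order, which is why the snippet assigns `MODEL_A` and `MODEL_B` alphabetically. A minimal sketch of that naming convention in Python; the helper name `pair_dir` is hypothetical:

```python
def pair_dir(model_a: str, model_b: str) -> str:
    """Build the <model>_vs_<model> games directory name.

    The two model hashes are sorted so the same pair always maps to the
    same directory, regardless of argument order.
    """
    first, second = sorted((model_a, model_b))
    return f"{first}_vs_{second}"


# Matches the path used by the gsutil ls above, in either argument order.
print(pair_dir("52eb46008a", "369a0424c4"))  # -> 369a0424c4_vs_52eb46008a
```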
- Reproducibility (see #591)
- Aid sharing
- Communicate exactly what was done (diffs, commands, ...)
- Store all data in a common repository with a common naming scheme
- Be easy to extend piecewise as needed
- Be backwards compatible as often as possible
- Guide us towards gating (see #570)
- Results table / Evaluation #591
  - Uses bigtable tag #590 to name a comparison
  - Use launch_eval.py and record the command (somewhere)
- Directory structure
  - gs://minigo-pub/experiments/eval/<experiment-tag>/
    - sgf/eval/
      - YYYY-MM-DD/
        - @TS-...-<model_1>-...-<model_2>...sgf (e.g. 1540317107-000011-malabar-000010-defence-200.sgf)
    - results.html (autogenerated)
    - command_@TS (e.g. command_1544086567)
      - Command line invocation of launch_eval.py
    - metadata (json?, contents TBD)
    - [optional] command_flags
    - [future] <shorttag>_ringmaster.ctl (see #544)
    - [future] branch (github branch where the code can be found)
    - [future] diff.patch
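The sgf naming scheme above encodes a timestamp, the two players (model number plus name), and a game id. A hedged sketch of parsing it, assuming the dash-separated layout shown in the example and that model names contain no dashes; the function name `parse_eval_sgf_name` is made up for illustration:

```python
import re

# Assumed layout, taken from the example filename above:
#   <timestamp>-<model_1_num>-<model_1_name>-<model_2_num>-<model_2_name>-<game>.sgf
_EVAL_SGF_RE = re.compile(
    r"^(?P<ts>\d+)-(?P<num1>\d+)-(?P<name1>\w+)"
    r"-(?P<num2>\d+)-(?P<name2>\w+)-(?P<game>\d+)\.sgf$")


def parse_eval_sgf_name(filename: str) -> dict:
    """Split an eval sgf filename into its components."""
    m = _EVAL_SGF_RE.match(filename)
    if not m:
        raise ValueError(f"unrecognized eval sgf name: {filename}")
    return m.groupdict()


info = parse_eval_sgf_name("1540317107-000011-malabar-000010-defence-200.sgf")
print(info["ts"], info["name1"], info["name2"])  # -> 1540317107 malabar defence
```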