This repository was archived by the owner on Mar 11, 2021. It is now read-only.

Minigo Evaluation Logger

A set of tools to record the conditions (code, parameters, ...) and aggregate the results of many evaluation runs.

add_model.py and evaluator

Used to add a Player (model weights + flags) to the models_to_eval Cloud Bigtable (CBT) table.

# TODO(amj): test this code snippet

# From minigo/
source cluster/common.sh
# Placeholder: replace with your project name. Quoted so the shell treats it as one value.
export PROJECT="that one thing"
export SGF_BUCKET_NAME=minigo-pub

mkdir -p temp && cd temp

# Alphabetical for gsutil ls to work below
export MODEL_A=369a0424c4
export MODEL_B=52eb46008a
export CBT_TABLE=$CBT_MODEL_EVAL_TABLE

../cluster/evaluator/evaluator_ringmaster_wrapper.sh

ls
gsutil ls gs://minigo-pub/eval_server/models/games/${MODEL_A}_vs_${MODEL_B}
cbt -project "$PROJECT" -instance "$CBT_INSTANCE" read "$CBT_TABLE"
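The comment about alphabetical order suggests the games directory is named with the two model names in sorted order. As a minimal sketch under that assumption, the path passed to `gsutil ls` could be built as follows. `games_path` is a hypothetical helper, not part of Minigo, and the sorting rule is inferred from the comment rather than confirmed by the evaluator code.

```python
def games_path(bucket: str, model_a: str, model_b: str) -> str:
    """Build the GCS prefix for games between two models.

    Assumption: the directory name always lists the two model names
    in alphabetical order, so the argument order does not matter.
    """
    first, second = sorted([model_a, model_b])
    return f"gs://{bucket}/eval_server/models/games/{first}_vs_{second}"


if __name__ == "__main__":
    # Same models as in the snippet above, deliberately passed out of order.
    print(games_path("minigo-pub", "52eb46008a", "369a0424c4"))
```

Sorting inside the helper removes the need to remember to export MODEL_A and MODEL_B in alphabetical order by hand.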

Goals

  1. Reproducibility (see #591)
  2. Aid sharing
    1. Communicate exactly (diffs, commands, ...) what was done
  3. Store all data in a common repository with common naming scheme
  4. Be easy to extend piecewise as needed
    1. Be backwards compatible as often as possible
  5. Guide us towards gating (see #570)

Methodology

  • Results table / Evaluation #591
    • Use the Bigtable tag (#590) to name a comparison
    • Use launch_eval.py and record command (somewhere)
    • Directory structure
      • gs://minigo-pub/experiments/eval/<experiment-tag>/
        • sgf/eval/
          • YYYY-MM-DD/
            • @TS-...-<model_1>-...-<model_2>...sgf (e.g. 1540317107-000011-malabar-000010-defence-200.sgf)
        • results.html (autogenerated)
        • command_@TS (e.g. command_1544086567)
          • Command line invocation of launch_eval.py
        • metadata (json?, contents TBD)
        • [optional] command_flags
        • [future] <shorttag>_ringmaster.ctl (see #544)
        • [future] branch (github branch where code can be found)
        • [future] diff.patch
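The directory structure above can be sketched as a small path builder. This is an illustration only: `experiment_layout` and `command_text` are hypothetical helpers, the `--tag` flag in the usage note is made up, and deriving the YYYY-MM-DD directory from the timestamp in UTC is an assumption. The SGF file name itself is left out because its exact pattern is elided in the outline.

```python
import shlex
from datetime import datetime, timezone

# Base location taken from the outline above.
BASE = "gs://minigo-pub/experiments/eval"


def experiment_layout(tag: str, ts: int) -> dict:
    """Return the main paths for one experiment tag and Unix timestamp.

    Assumption: the date directory is the UTC date of the timestamp.
    """
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    root = f"{BASE}/{tag}"
    return {
        "sgf_dir": f"{root}/sgf/eval/{day}/",
        "results": f"{root}/results.html",
        "command": f"{root}/command_{ts}",
    }


def command_text(argv: list) -> str:
    """Render an argument list as a shell-safe command line to store in command_@TS."""
    return shlex.join(argv)


if __name__ == "__main__":
    layout = experiment_layout("demo-tag", 1544086567)
    for name, path in layout.items():
        print(name, path)
    # Hypothetical invocation; real launch_eval.py flags may differ.
    print(command_text(["launch_eval.py", "--tag", "demo tag"]))
```

Storing the command through `shlex.join` keeps arguments containing spaces quoted, so the recorded line can be pasted back into a shell unchanged.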