
Commit

Merge branch 'feature/iql-qv' of github.com:Cryolite/kanachan into feature/iql-qv
Cryolite committed May 5, 2023
2 parents e21c3fa + 2daaa1c commit 2c02d1d
Showing 4 changed files with 76 additions and 44 deletions.
4 changes: 1 addition & 3 deletions kanachan/training/README.md
@@ -12,8 +12,6 @@ Training programs for [implicit Q-learning (IQL)](https://arxiv.org/abs/2110.061

### [`kanachan.training.ilql`](ilql) Python submodule

Training programs for [implicit language Q-learning (ILQL)](https://arxiv.org/abs/2206.11871)[^ILQL].
Training programs for [implicit language Q-learning (ILQL)](https://arxiv.org/abs/2206.11871).

[^BERT]: The use of the term **BERT** here is a deliberate abuse. The term BERT actually refers to a combination of a model with transformer encoder layers and a learning method for that model. However, this project uses the term BERT to refer only to the model. The model should actually be called something like transformer encoder layers, but that would be too long, so this project calls it BERT.

[^ILQL]: The use of the term **ILQL** here is a deliberate abuse. The term ILQL actually refers to a combination of a variant of IQL in which the Q and V models share parameters and a learning method for that model. However, this project uses the term ILQL to refer only to the model. The model should actually be called something like "a variant of IQL with parameter sharing between the Q and V models", but that would be too long, so this project calls it ILQL.
112 changes: 73 additions & 39 deletions kanachan/training/bert/phase1/README.md
@@ -13,85 +13,119 @@ The following items are required to run the program in this directory:

For detailed installation instructions for the above prerequisite items, refer to those for each OS and distribution.

After the installation of the prerequisite items, [build the `cryolite/kanachan` Docker image](https://github.com/Cryolite/kanachan/blob/main/kanachan/README.md#cryolitekanachan-docker-image).
After the installation of the prerequisite items, first [build the `cryolite/kanachan` Docker image](https://github.com/Cryolite/kanachan/blob/main/kanachan/README.md#cryolitekanachan-docker-image). Then, execute the following command with the top directory of the working tree of this repository as the current directory:

```sh
kanachan$ docker build -f kanachan/training/bert/phase1/Dockerfile -t cryolite/kanachan.training.bert.phase1 .
```
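
A successful build can be confirmed by listing the resulting image; this is only an optional sanity check, and the repository name simply matches the `-t` option of the build command above:

```sh
kanachan$ docker image ls cryolite/kanachan.training.bert.phase1
```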

## `train.py`

#### Usage

```
$ docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v /path/to/host-data:/workspace/data --rm cryolite/kanachan python3 -m kanachan.training.bert.phase1.train OPTIONS...
$ docker run --gpus all -v /path/to/host-data:/workspace/data --rm cryolite/kanachan.training.bert.phase1 OPTIONS...
```

If you want to run this program on multiple GPUs, see [Running programs on multiple GPUs](https://github.com/Cryolite/kanachan/wiki/Running-programs-on-multiple-GPUs).
If you want to run this program on specific GPUs, modify the `--gpus` option for the `docker run` command (see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#gpu-enumeration) or specify the `device.type` option (see below).
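
For example, the following sketches (paths and GPU indices are placeholders) both restrict training to a single GPU: the first exposes only the host's GPU 1 to the container, while the second exposes all GPUs and selects the second one via the `device.type` option described below:

```sh
$ docker run --gpus device=1 -v /path/to/host-data:/workspace/data --rm cryolite/kanachan.training.bert.phase1 OPTIONS...
$ docker run --gpus all -v /path/to/host-data:/workspace/data --rm cryolite/kanachan.training.bert.phase1 device.type=cuda1 OPTIONS...
```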

#### Options

`--training-data PATH`: Specify the path to the training data. The path must be one that can be interpreted within the Docker guest.
Options are specified in the [Hydra](https://hydra.cc/) manner.

`training_data=PATH`: Specify the path to the training data. The path must be one that can be interpreted within the Docker guest.

`validation_data=PATH`: Specify the path to the validation data. The path must be one that can be interpreted within the Docker guest.

`num_workers=NWORKERS`: Specify the number of workers used in data loading. The argument must be a non-negative integer. `0` means that the main process is used to load data. Default to `0` for CPU, and `2` for CUDA.

`device={cpu|cuda}`: Specify the device on which the training is performed. Default to the value guessed from PyTorch build information. See the table below for the detailed meaning of the options:

| `device` | `device.type` | `device.dtype` | `device.amp_dtype` |
|----------|---------------|----------------|--------------------|
| `cpu` | `cpu` | `float64` | (N/A) |
| `cuda` | `cuda` | `float32` | `float16` |

`device.type={cpu|cuda|cudaN}`: Specify the device on which the training is performed. Override the value specified by the `device` option.

`device.dtype={float64|double|float32|float|float16|half}`: Specify the PyTorch [`dtype`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-dtype). Override the value specified by the `device` option.

`device.amp_dtype={float64|double|float32|float|float16|half}`: Specify the PyTorch `dtype` for [automatic mixed precision (AMP)](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html). Override the value specified by the `device` option.

`encoder={bert_base|bert_large}`: Specify the encoder structure. Default to `bert_base`. See the table below for the detailed meaning of the options:

`--validation-data PATH`: Specify the path to the validation data. The path must be one that can be interpreted within the Docker guest.
| `encoder` | `encoder.position_encoder` | `encoder.dimension` | `encoder.num_heads` | `encoder.dim_feedforward` | `encoder.activation_function` | `encoder.dropout` | `encoder.num_layers` | `encoder.load_from` |
|--------------|----------------------------|---------------------|---------------------|---------------------------|-------------------------------|-------------------|----------------------|---------------------|
| `bert_base` | `position_embedding` | `768` | `12` | `3072` | `gelu` | `0.1` | `12` | (N/A) |
| `bert_large` | `position_embedding` | `1024` | `16` | `4096` | `gelu` | `0.1` | `24` | (N/A) |

`--num-workers NWORKERS`: Specify the number of workers used in data loading. The argument must be a positive integer. Default to `2`.
`encoder.position_encoder={positional_encoding|position_embedding}`: Specify the method of encoding positions. `positional_encoding` is the method used in the paper ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762) to encode positions with a sinusoidal function. `position_embedding` is the method used in the paper ["BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"](https://arxiv.org/abs/1810.04805) to encode positions with embeddings. Override the value specified by the `encoder` option. Default to `position_embedding`.

`--device {cpu|cuda|cudaN}`: Specify the device on which the training is performed.
`encoder.dimension=DIM`: Specify the embedding dimension for the encoder. The argument must be a positive integer. Override the value specified by the `encoder` option.

`--amp-optimization-level {O0|O1|O2|O3}`: **THIS OPTION IS DEPRECATED.** Specify the optimization level of [Automatic Mixed Precision](https://developer.nvidia.com/automatic-mixed-precision). Each optimization level corresponds to the one defined in [apex.amp](https://nvidia.github.io/apex/amp.html#opt-levels). Default to `O2`.
`encoder.num_heads=NHEADS`: Specify the number of heads in each encoder layer. The argument must be a positive integer. Override the value specified by the `encoder` option.

`--model-preset {base|large}`: Specify the model preset. `base` is the transformer encoder layers used in BERT BASE, and `large` is the one used in BERT LARGE. See the table below for the meaning of the presets:
`encoder.dim_feedforward=DIM_FEEDFORWARD`: Specify the dimension of the feedforward networks in each encoder layer. The argument must be a positive integer. Override the value specified by the `encoder` option. Default to `4 * DIM`.

| | `DIM` | `NHEADS` | `DIM_FEEDFORWARD` | `NLAYERS` |
|---------|-------|----------|-------------------|-----------|
| `base` | 768 | 12 | 3072 | 12 |
| `large` | 1024 | 16 | 4096 | 24 |
`encoder.activation_function={relu|gelu}`: Specify the activation function for the feedforward networks in each encoder layer. Override the value specified by the `encoder` option. Default to `gelu`.

`--dimension DIM`: Specify the embedding dimension for the model. The argument must be a positive integer. Override the value by the preset.
`encoder.dropout=DROPOUT`: Specify the dropout ratio for the feedforward networks in each encoder layer. The argument must be a real number in the range \[0.0, 1.0\). Override the value specified by the `encoder` option. Default to `0.1`.

`--num-heads NHEADS`: Specify the number of heads in each layer. The argument must be a positive integer. Override the value by the preset.
`encoder.num_layers=NLAYERS`: Specify the number of encoder layers. The argument must be a positive integer. Override the value specified by the `encoder` option.

`--dim-feedforward DIM_FEEDFORWARD`: Specify the dimension of the feedforward network in each layer. The argument must be a positive integer. Override the value by the preset. Default to `4 * DIM`.
`encoder.load_from=INITIAL_ENCODER`: Specify the path to the initial encoder snapshot. The path must be one that can be interpreted within the Docker guest. Mutually exclusive with the `initial_model` and `initial_model_prefix` options.

`--num-layers NLAYERS`: Specify the number of layers. The argument must be a positive integer. Override the value by the preset.
`decoder={single|double|triple}`: Specify the decoder structure. Default to `double`. See the table below for the detailed meaning of the options:

`--dim-final-feedforward DIM_FINAL_FEEDFORWARD`: Specify the dimension of the final feedforward network. The argument must be a positive integer. Default to `DIM_FEEDFORWARD`.
| `decoder` | `decoder.dim_feedforward` | `decoder.activation_function` | `decoder.dropout` | `decoder.num_layers` | `decoder.load_from` |
|-----------|-----------------------------|-------------------------------|-------------------|----------------------|---------------------|
| `single` | (N/A) | `gelu` | `0.1` | `1` | (N/A) |
| `double` | (`encoder.dim_feedforward`) | `gelu` | `0.1` | `2` | (N/A) |
| `triple` | (`encoder.dim_feedforward`) | `gelu` | `0.1` | `3` | (N/A) |

`--activation-function {relu|gelu}`: Specify the activation function for the feedforward networks. Defaults to `gelu`.
`decoder.dim_feedforward=DIM_FEEDFORWARD`: Specify the dimension of the feedforward networks in each decoder layer. The argument must be a positive integer. Override the value specified by the `decoder` option. Default to the value specified by the `encoder.dim_feedforward` option.

`--dropout DROPOUT`: Specify the dropout ratio. The argument must be a real value in the range \[0.0, 1.0\). Default to `0.1`.
`decoder.activation_function={relu|gelu}`: Specify the activation function for the feedforward networks in each decoder layer. Override the value specified by the `decoder` option. Default to `gelu`.

`--initial-encoder PATH`: Specify the path to the initial encoder. The path must be one that can be interpreted within the Docker guest. Mutually exclusive to `--resume`.
`decoder.dropout=DROPOUT`: Specify the dropout ratio for the feedforward networks in each decoder layer. The argument must be a real number in the range \[0.0, 1.0\). Override the value specified by the `decoder` option. Default to `0.1`.

`--initial-decoder PATH`: Specify the path to the initial decoder. The path must be one that can be interpreted within the Docker guest. Mutually exclusive to `--resume`.
`decoder.num_layers=NLAYERS`: Specify the number of decoder layers. The argument must be a positive integer. Override the value specified by the `decoder` option.

`--training-batch-size N`: Specify the batch size for training. The argument must be a positive integer.
`decoder.load_from=INITIAL_DECODER`: Specify the path to the initial decoder snapshot. The path must be one that can be interpreted within the Docker guest. Mutually exclusive with the `initial_model` and `initial_model_prefix` options.

`--validation-batch-size N`: Specify the batch size for validation. The argument must be a positive integer.
`initial_model=INITIAL_MODEL`: Specify the path to the initial model snapshot. The path must be one that can be interpreted within the Docker guest. Mutually exclusive with the `encoder.load_from`, `decoder.load_from`, and `initial_model_prefix` options.

`--freeze-encoder`: Freeze encoder parameters during training.
`initial_model_prefix=PATH`: Specify the prefix to the initial model snapshot. The path must be one that can be interpreted within the Docker guest. Mutually exclusive with the `encoder.load_from`, `decoder.load_from`, and `initial_model` options.

`--optimizer {sgd|adam|radam|lamb}`: Specify the optimizer. Default to `lamb`.
`initial_model_index=N`: Specify the index of the initial model snapshot. The argument must be a non-negative integer. Must be used together with the `initial_model_prefix` option.

`--momentum MOMENTUM`: Specify the momentum factor. Only meaningful for `sgd`. Default to `0.9`.
`checkpointing={false|true}`: Specify whether to enable [checkpointing](https://pytorch.org/docs/stable/checkpoint.html). Default to `false`.

`--learning-rate LR`: Specify the learning rate. Default to `0.1` for `sgd`, `0.001` for `adam`, `radam`, and `lamb`.
`training_batch_size=N`: Specify the training batch size. The argument must be a positive integer.

`--epsilon EPS`: Specify the epsilon parameter. Only meaningful for `adam`, `radam`, and `lamb`. Default to `1.0e-8` for `adam` and `radam`, `1.0e-6` for `lamb`.
`validation_batch_size=N`: Specify the validation batch size. The argument must be a positive integer.

`--checkpointing`: Enable checkpointing.
`gradient_accumulation_steps=NSTEPS`: Specify the number of steps for gradient accumulation. The argument must be a positive integer. Default to `1`.

`--gradient-accumulation-steps NSTEPS`: Specify the number of steps for gradient accumulation. Defaults to `1`.
`max_gradient_norm=NORM`: Specify the norm threshold for gradient clipping. The argument must be a positive real number. Default to `1.0`.

`--max-gradient-norm NORM`: Specify the norm threshold for gradient clipping. Default to `1.0`.
`optimizer={sgd|adam|radam|lamb}`: Specify the optimizer preset. Default to `lamb`. See the table below for the detailed meaning of the options:

`--num-epochs`: Specify the number of epochs to iterate. Default to `1`.
| `optimizer` | `optimizer.type` | `optimizer.momentum` | `optimizer.epsilon` | `optimizer.learning_rate` | `optimizer.initialize` |
|-------------|----------------------------------------------|----------------------|---------------------|---------------------------|------------------------|
| `sgd` | `sgd` | `0.0` | (N/A) | (EXPLICITLY REQUIRED) | `false` |
| `adam` | [`adam`](https://arxiv.org/abs/1412.6980) | (N/A) | `1.0e-8` | `0.001` | `false` |
| `radam` | [`radam`](https://arxiv.org/abs/1908.03265) | (N/A) | `1.0e-8` | `0.001` | `false` |
| `lamb` | [`lamb`](https://arxiv.org/abs/1904.00962) | (N/A) | `1.0e-6` | `0.001` | `false` |

`--initial-optimizer PATH`: Specify the path to the initial optimizer state. The path must be one that can be interpreted within the Docker guest. Mutually exclusive to `--resume`.
`optimizer.type={sgd|adam|radam|lamb}`: Specify the optimizer type. Override the value specified by the `optimizer` option.

`--output-prefix PATH`: Specify the output prefix. The path must be one that can be interpreted within the Docker guest.
`optimizer.momentum=MOMENTUM`: Specify the momentum factor. Only meaningful for `optimizer.type=sgd`. The argument must be a real number in the range \[0.0, 1.0\). Override the value specified by the `optimizer` option. Default to `0.0`.

`--experiment-name NAME`: Specify the experiment name. Default to the start time of the experiment in the format `YYYY-MM-DD-hh-mm-ss`. The final path to the output will become `PATH/NAME`.
`optimizer.epsilon=EPS`: Specify the epsilon parameter. Only meaningful for `adam`, `radam`, and `lamb`. The argument must be a positive real number. Override the value specified by the `optimizer` option. Default to `1.0e-8` for `adam` and `radam`, and `1.0e-6` for `lamb`.

`--num-epoch-digits`: Specify the number of digits to index epochs. Default to `2`.
`optimizer.learning_rate=LR`: Specify the learning rate. The argument must be a positive real number. Override the value specified by the `optimizer` option. Default to `0.001` for `adam`, `radam`, and `lamb`.

`--snapshot-interval NSAMPLES`: Specify the interval between snapshots. The argument must be a positive integer. `0` means that no snapshot is taken at all. Default to `0`.
`optimizer.initialize={false|true}`: Specify whether to start from a freshly initialized optimizer state without loading an optimizer snapshot, even if one is found. Override the value specified by the `optimizer` option.

`--resume`: Resume the experiment from the latest snapshot in the path `PATH/NAME`.
`snapshot_interval=NSAMPLES`: Specify the interval between snapshots. The argument must be a non-negative integer. `0` means that no snapshot is taken at all. Default to `0`.
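
Putting the options above together, a hypothetical end-to-end invocation might look like the following sketch; every path and value is an illustrative placeholder and should be adapted to the actual data location and hardware:

```sh
$ docker run --gpus all -v /path/to/host-data:/workspace/data --rm cryolite/kanachan.training.bert.phase1 \
    training_data=/workspace/data/training-data.txt \
    validation_data=/workspace/data/validation-data.txt \
    device=cuda \
    encoder=bert_base \
    decoder=double \
    training_batch_size=1024 \
    validation_batch_size=1024 \
    gradient_accumulation_steps=4 \
    optimizer=lamb \
    optimizer.learning_rate=0.001 \
    snapshot_interval=10000000
```
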
2 changes: 1 addition & 1 deletion kanachan/training/ilql/README.md
@@ -97,7 +97,7 @@ Options are specified in the [Hydra](https://hydra.cc/) manner.

`initial_model_index=N`: Specify the index of the initial model snapshot. The argument must be a non-negative integer. Must be used together with the `initial_model_prefix` option.

`reward_plugin=REWARD_PLUGIN`: Specify the path to the reward plugin. The path must be one that can be interpreted within the Docker guest.
`reward_plugin=REWARD_PLUGIN`: Specify the path to the [reward plug-in](https://github.com/Cryolite/kanachan/wiki/Reward-Plugin). The path must be one that can be interpreted within the Docker guest.

`discount_factor=GAMMA`: Specify the discount factor (in the sense of reinforcement learning). The argument must be a real number in the range \[0.0, 1.0\]. Default to `1.0`.

2 changes: 1 addition & 1 deletion kanachan/training/iql/README.md
@@ -97,7 +97,7 @@ Options are specified in the [Hydra](https://hydra.cc/) manner.

`initial_model_index=N`: Specify the index of the initial model snapshot. The argument must be a non-negative integer. Must be used together with the `initial_model_prefix` option.

`reward_plugin=REWARD_PLUGIN`: Specify the path to the reward plugin. The path must be one that can be interpreted within the Docker guest.
`reward_plugin=REWARD_PLUGIN`: Specify the path to the [reward plug-in](https://github.com/Cryolite/kanachan/wiki/Reward-Plugin). The path must be one that can be interpreted within the Docker guest.

`discount_factor=GAMMA`: Specify the discount factor (in the sense of reinforcement learning). The argument must be a real number in the range \[0.0, 1.0\]. Default to `1.0`.

