Commit

Update README.md
PhilippThoelke committed May 11, 2021
1 parent af5bd31 commit b5697f9
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@ CUDA_VISIBLE_DEVICES=0 python torchmd-net/scripts/torchmd_train.py --conf torchm
```

## Multi-Node Training
- __Currently does not work with the most recent PyTorch Lightning version. Tested for pytorch-lightning==1.1.0__
+ __Currently does not work with the most recent PyTorch Lightning version. Tested up to pytorch-lightning==1.2.10__

In order to train models on multiple nodes some environment variables have to be set, which provide all necessary information to PyTorch Lightning. In the following we provide an example bash script to start training on two machines with two GPUs each. The script has to be started once on each node. Once [`train.py`](https://github.com/compsciencelab/torchmd-net/blob/main/scripts/train.py) is started on all nodes, a network connection between the nodes will be established using NCCL.
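
As a rough illustration of the kind of per-node setup described above, the following is a minimal sketch and not the script from the README. It assumes the variable names that PyTorch Lightning's general-purpose cluster setup reads (`MASTER_ADDR`, `MASTER_PORT`, `NODE_RANK`); the hostname, port, and GPU indices are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: all values below are placeholders, not taken from the README.
# Run once on every node, changing only NODE_RANK.

export MASTER_ADDR="node0.example.org"  # hostname or IP of the rank-0 node (hypothetical)
export MASTER_PORT=12910                # any free port, identical on all nodes (hypothetical)
export NODE_RANK=0                      # 0 on the first node, 1 on the second

# Expose two GPUs on this node to the training process.
export CUDA_VISIBLE_DEVICES=0,1

# Print the resulting configuration so it can be checked before launching.
echo "node ${NODE_RANK} -> ${MASTER_ADDR}:${MASTER_PORT}, GPUs ${CUDA_VISIBLE_DEVICES}"

# Start the training script here, with the same arguments on every node.
```

On the second machine only `NODE_RANK` would change (to `1`); `MASTER_ADDR` and `MASTER_PORT` must stay identical so that both nodes connect to the same rendezvous point.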
