Add doc explanation about synchronous algorithm shared GPU utilization between workers and driver. (ray-project#8400)
internetcoffeephone authored Jun 11, 2020
1 parent ea965d7 commit 9166e22
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions doc/source/rllib-training.rst
@@ -83,6 +83,14 @@ Specifying Resources

You can control the degree of parallelism used by setting the ``num_workers`` hyperparameter for most algorithms. The number of GPUs the driver should use can be set via the ``num_gpus`` option. Similarly, the resource allocation to workers can be controlled via ``num_cpus_per_worker``, ``num_gpus_per_worker``, and ``custom_resources_per_worker``. The number of GPUs can be a fractional quantity to allocate only a fraction of a GPU. For example, with DQN you can pack five trainers onto one GPU by setting ``num_gpus: 0.2``.
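
For example, a minimal sketch of such a configuration using the ``tune.run`` entry point shown elsewhere in these docs (the environment and the specific numbers are illustrative assumptions, not recommendations):

.. code-block:: python

    from ray import tune

    tune.run(
        "DQN",
        config={
            "env": "CartPole-v0",      # example environment
            "num_workers": 4,          # degree of rollout parallelism
            "num_gpus": 0.2,           # fractional GPU for the driver
            "num_cpus_per_worker": 1,  # CPUs reserved per worker
            "num_gpus_per_worker": 0,  # workers stay on CPU in this sketch
        },
    )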

For synchronous algorithms like PPO and A2C, the driver and workers can make use of the same GPU. To do this with ``n`` GPUs:

.. code-block:: python

    gpu_count = n                  # total number of GPUs to share
    num_gpus = 0.0001              # tiny fraction reserved for the driver GPU
    num_gpus_per_worker = (gpu_count - num_gpus) / num_workers
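
As a concrete illustration of the arithmetic (the numbers here are hypothetical), with 2 GPUs and 4 workers the driver takes a negligible share and each worker gets just under half a GPU:

.. code-block:: python

    gpu_count = 2
    num_workers = 4
    num_gpus = 0.0001  # driver's share
    num_gpus_per_worker = (gpu_count - num_gpus) / num_workers
    print(num_gpus_per_worker)  # ≈ 0.499975; total allocated stays within 2 GPUs
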
.. Original image: https://docs.google.com/drawings/d/14QINFvx3grVyJyjAnjggOCEVN-Iq6pYVJ3jA2S6j8z0/edit?usp=sharing
.. image:: rllib-config.svg

