[doc] Brief note about RMM SAM allocator. [skip ci] (dmlc#10712)
trivialfis authored Aug 16, 2024
1 parent ec3f327 commit fd365c1
Showing 2 changed files with 22 additions and 1 deletion.
18 changes: 17 additions & 1 deletion demo/rmm_plugin/README.rst
@@ -58,4 +58,20 @@ Since with RMM the memory pool is pre-allocated on a specific device, changing the
device ordinal in XGBoost can result in memory error ``cudaErrorIllegalAddress``. Use the
``CUDA_VISIBLE_DEVICES`` environment variable instead of the ``device="cuda:1"`` parameter
for selecting device. For distributed training, the distributed computing frameworks like
``dask-cuda`` are responsible for device management.
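
A minimal sketch of this device-selection pattern (not part of the demo; the ordinal ``1`` and the parameters are only illustrative values):

.. code-block:: python

    import os

    # Choose the GPU through the environment *before* importing any
    # CUDA-using library; "1" is only an example ordinal.
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"

    import xgboost as xgb  # noqa: E402

    # Inside the process the selected GPU now appears as ordinal 0, so the
    # plain ``device="cuda"`` setting is sufficient.
    params = {"tree_method": "hist", "device": "cuda"}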

************************
Memory Over-Subscription
************************

.. warning::

This feature is still experimental and is under active development.

Newer NVIDIA platforms like `Grace-Hopper
<https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/>`__ use `NVLink-C2C
<https://www.nvidia.com/en-us/data-center/nvlink-c2c/>`__, which gives the CPU and GPU a
coherent memory model. Users can use the ``SamHeadroomMemoryResource`` available in recent
RMM releases to draw on system memory for storing data. This lets XGBoost use host memory
for GPU computation, but it may reduce performance due to the slower CPU memory bandwidth
and page-migration overhead.
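
A rough illustration of this setup (a minimal sketch, not part of the demo; it assumes an
XGBoost build with the RMM plugin and an RMM release that ships
``SamHeadroomMemoryResource``; the 1 GiB headroom is only a placeholder value):

.. code-block:: python

    import rmm
    import xgboost as xgb

    # Keep roughly 1 GiB of device memory free as headroom (placeholder value);
    # allocations beyond that are served from system-allocated memory.
    mr = rmm.mr.SamHeadroomMemoryResource(1 << 30)
    rmm.mr.set_current_device_resource(mr)

    # Route XGBoost's device allocations through RMM.
    xgb.set_config(use_rmm=True)
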
5 changes: 5 additions & 0 deletions doc/gpu/index.rst
@@ -50,6 +50,11 @@ Multi-node Multi-GPU Training

XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_, ``Spark`` and ``PySpark``. To get started with Dask, see our tutorial :doc:`/tutorials/dask` and the worked examples :doc:`/python/dask-examples/index`; the Python documentation :ref:`dask_api` gives a complete API reference. For usage with ``Spark`` using Scala, see :doc:`/jvm/xgboost4j_spark_gpu_tutorial`. Lastly, for distributed GPU training with ``PySpark``, see :doc:`/tutorials/spark_estimator`.
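
A minimal sketch of the Dask route (illustrative only; it assumes ``dask-cuda`` is
installed, and the random data and parameters are placeholders):

.. code-block:: python

    import dask.array as da
    import xgboost as xgb
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # One worker per local GPU; the framework handles device assignment.
    with LocalCUDACluster() as cluster, Client(cluster) as client:
        # Placeholder random data; in practice this would be a real dataset.
        X = da.random.random((100_000, 20), chunks=(10_000, 20))
        y = da.random.random(100_000, chunks=(10_000,))

        dtrain = xgb.dask.DaskDMatrix(client, X, y)
        output = xgb.dask.train(
            client,
            {"tree_method": "hist", "device": "cuda"},
            dtrain,
            num_boost_round=50,
        )
        booster = output["booster"]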

RMM integration
===============

XGBoost provides optional support for RMM integration. See :doc:`/python/rmm-examples/index` for more info.
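
A minimal sketch of enabling the integration (assuming an XGBoost build with RMM support;
see the linked examples for full usage):

.. code-block:: python

    import rmm
    import xgboost as xgb

    # Create an RMM pool allocator and tell XGBoost to allocate through RMM.
    rmm.reinitialize(pool_allocator=True)
    xgb.set_config(use_rmm=True)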


Memory usage
============
