
Commit

Tidy up CUDA notes
Kaixhin authored and apaszke committed Nov 5, 2017
1 parent 5c881f0 commit 5de7f9e
Showing 2 changed files with 29 additions and 25 deletions.
47 changes: 25 additions & 22 deletions docs/source/notes/cuda.rst
@@ -3,18 +3,19 @@
CUDA semantics
==============

:mod:`torch.cuda` is used to set up and run CUDA operations. It keeps track of
the currently selected GPU, and all CUDA tensors you allocate will by default be
created on that device. The selected device can be changed with a
:any:`torch.cuda.device` context manager.

However, once a tensor is allocated, you can do operations on it irrespective
of the selected device, and the results will always be placed on the same
device as the tensor.

Cross-GPU operations are not allowed by default, with the only exception of
:meth:`~torch.Tensor.copy_`. Unless you enable peer-to-peer memory access, any
attempts to launch ops on tensors spread across different devices will raise an
error.

Below you can find a small example showcasing this::

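# Illustrative sketch, assuming a machine with at least two GPUs
x = torch.cuda.FloatTensor(1)        # allocated on GPU 0, the default device
y = torch.FloatTensor(1).cuda()      # also placed on GPU 0

with torch.cuda.device(1):
    a = torch.cuda.FloatTensor(1)    # allocated on GPU 1
    b = torch.FloatTensor(1).cuda()  # transferred to GPU 1
    c = a + b                        # fine: both operands are on GPU 1
    z = x + y                        # fine: both operands are on GPU 0
    # x + a would raise an error here, because the operands live on different GPUs
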
@@ -52,10 +53,10 @@
device-agnostic (CPU or GPU) code; an example may be creating a new tensor as
the initial hidden state of a recurrent neural network.

The first step is to determine whether the GPU should be used or not. A common
pattern is to use Python's ``argparse`` module to read in user arguments, and
have a flag that can be used to disable CUDA, in combination with
:meth:`~torch.cuda.is_available`. In the following, ``args.cuda`` results in a
flag that can be used to cast tensors and modules to CUDA if desired::

import argparse
import torch
@@ -66,7 +67,7 @@
args = parser.parse_args()
args.cuda = not args.disable_cuda and torch.cuda.is_available()
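
A fuller, self-contained version of that pattern might look like the sketch
below; the exact flag name and help text are assumptions, only
``args.disable_cuda`` is implied by the snippet above::

import argparse
import torch

parser = argparse.ArgumentParser(description='PyTorch example')
# Hypothetical flag; argparse maps --disable-cuda to args.disable_cuda
parser.add_argument('--disable-cuda', action='store_true',
                    help='disable CUDA even when a GPU is available')
args = parser.parse_args()
args.cuda = not args.disable_cuda and torch.cuda.is_available()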

If modules or tensors need to be sent to the GPU, ``args.cuda`` can be used as
follows::

x = torch.Tensor(8, 42)
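# Illustrative continuation (nn.Linear is used here purely as an example module)
import torch.nn as nn

model = nn.Linear(42, 8)
if args.cuda:
    x = x.cuda()          # move the tensor onto the currently selected GPU
    model.cuda()          # move all module parameters and buffers to the GPU
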
@@ -84,19 +85,20 @@
dataloader would be as follows::
x = Variable(x.type(dtype))
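
Pieced together, that pattern might look like the following sketch
(``train_loader`` is an assumed iterable of CPU tensors)::

from torch.autograd import Variable

dtype = torch.cuda.FloatTensor if args.cuda else torch.FloatTensor

for i, x in enumerate(train_loader):
    x = Variable(x.type(dtype))      # cast to the chosen tensor type (CPU or CUDA)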

When working with multiple GPUs on a system, you can use the
``CUDA_VISIBLE_DEVICES`` environment flag to manage which GPUs are available to
PyTorch. As mentioned above, to manually control which GPU a tensor is created
on, the best practice is to use a :any:`torch.cuda.device` context manager::

print("Outside device is 0") # On device 0 (default in most scenarios)
with torch.cuda.device(1):
print("Inside device is 1") # On device 1
print("Outside device is still 0") # On device 0

If you have a tensor and would like to create a new tensor of the same type on
the same device, then you can use the :meth:`~torch.Tensor.new` method, which
acts the same as a normal tensor constructor. Whilst the previously mentioned
methods depend on the current GPU context, :meth:`~torch.Tensor.new` preserves
the device of the original tensor.

This is the recommended practice when creating modules in which new
tensors/variables need to be created internally during the forward pass::
@@ -110,8 +112,9 @@
y_cpu_long = x_cpu_long.new([[1, 2, 3]])
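
As a sketch of that practice, a module that builds its initial hidden state with
:meth:`~torch.Tensor.new` inside ``forward`` might look as follows (the class
name and sizes are hypothetical)::

import torch.nn as nn
from torch.autograd import Variable

class RNNWrapper(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(RNNWrapper, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # x.data.new(...) allocates on the same device as x, CPU or GPU alike
        h0 = Variable(x.data.new(1, x.size(0), self.hidden_size).zero_())
        return self.rnn(x, h0)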

If you want to create a tensor of the same type and size of another tensor, and
fill it with either ones or zeros, :meth:`~torch.ones_like` or
:meth:`~torch.zeros_like` are provided as convenient helper functions (which
also preserve device)::

x_cpu = torch.FloatTensor(1)
x_gpu = torch.cuda.FloatTensor(1)
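# Illustrative continuation of the example above
y_cpu = torch.ones_like(x_cpu)       # CPU FloatTensor of the same size, filled with ones
y_gpu = torch.zeros_like(x_gpu)      # CUDA FloatTensor of the same size, filled with zeros
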
@@ -145,9 +148,9 @@
pinned memory by passing ``pin_memory=True`` to its constructor.
Use nn.DataParallel instead of multiprocessing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most use cases involving batched inputs and multiple GPUs should default to
using :class:`~torch.nn.DataParallel` to utilize more than one GPU. Even with
the GIL, a single Python process can saturate multiple GPUs.
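
A minimal sketch of the wrapping involved (``MyModel`` and ``input_batch`` are
assumptions)::

import torch.nn as nn

model = MyModel()                        # assumed nn.Module subclass
model = nn.DataParallel(model).cuda()    # replicate the model across all visible GPUs
output = model(input_batch)              # the input batch is scattered along dimension 0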

As of version 0.1.9, large numbers of GPUs (8+) might not be fully utilized.
However, this is a known issue that is under active development. As always,
7 changes: 4 additions & 3 deletions torch/distributions.py
@@ -2,9 +2,10 @@
The ``distributions`` package contains parameterizable probability distributions
and sampling functions.
Policy gradient methods can be implemented using the
:meth:`~torch.distributions.Distribution.log_prob` method, when the probability
density function is differentiable with respect to its parameters. A basic
method is the REINFORCE rule:
.. math::
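
    \Delta\theta = \alpha r \frac{\partial\log p(a|\pi^\theta(s))}{\partial\theta}

where :math:`\theta` are the parameters, :math:`\alpha` is the learning rate,
:math:`r` is the reward and :math:`p(a|\pi^\theta(s))` is the probability of
taking action :math:`a` in state :math:`s` given policy :math:`\pi^\theta`.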

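In practice this usually looks something like the sketch below;
``policy_network`` and ``env`` are assumptions, and ``Categorical`` stands in
for whichever distribution the policy parameterizes::

from torch.distributions import Categorical

probs = policy_network(state)            # action probabilities from the policy
m = Categorical(probs)
action = m.sample()                      # sample an action
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward      # REINFORCE: reward-weighted negative log-probability
loss.backward()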