Update default framework to PyTorch (d2l-ai#2098)
* Make PyTorch primary tab

* rebase master & resolve conflicts

* Update Jenkinsfile with PyTorch default

* [Test] PyTorch Primary build from scratch

* resolve conflicts

* Test default pytorch tabs

* PyTorch Default evaluate all
AnirudhDagar authored Apr 28, 2022
1 parent 23c26c8 commit beb907e
Showing 72 changed files with 516 additions and 8 deletions.
14 changes: 7 additions & 7 deletions Jenkinsfile
@@ -30,15 +30,15 @@ stage("Build and Publish") {
conda activate ${ENV_NAME}
./static/cache.sh restore _build/eval/data
d2lbook build eval
+ d2lbook build slides --tab pytorch
./static/cache.sh store _build/eval/data
"""

sh label: "Execute Notebooks [PyTorch]", script: """set -ex
sh label: "Execute Notebooks [MXNet]", script: """set -ex
conda activate ${ENV_NAME}
- ./static/cache.sh restore _build/eval_pytorch/data
- d2lbook build eval --tab pytorch
- d2lbook build slides --tab pytorch
- ./static/cache.sh store _build/eval_pytorch/data
+ ./static/cache.sh restore _build/eval_mxnet/data
+ d2lbook build eval --tab mxnet
+ ./static/cache.sh store _build/eval_mxnet/data
"""

sh label: "Execute Notebooks [TensorFlow]", script: """set -ex
@@ -60,9 +60,9 @@ stage("Build and Publish") {
d2lbook build pdf
"""

sh label:"Build Pytorch PDF", script:"""set -ex
sh label:"Build MXNet PDF", script:"""set -ex
conda activate ${ENV_NAME}
- d2lbook build pdf --tab pytorch
+ d2lbook build pdf --tab mxnet
"""

if (env.BRANCH_NAME == 'release') {
18 changes: 18 additions & 0 deletions chapter_appendix-mathematics-for-deep-learning/distributions.md
@@ -4,6 +4,7 @@
Now that we have learned how to work with probability in both the discrete and the continuous setting, let's get to know some of the common distributions encountered. Depending on the area of machine learning, we may need to be familiar with vastly more of these, or for some areas of deep learning potentially none at all. This is, however, a good basic list to be familiar with. Let's first import some common libraries.

```{.python .input}
#@tab mxnet
%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
@@ -63,6 +64,7 @@ d2l.plt.show()
Now, let's plot the cumulative distribution function :eqref:`eq_bernoulli_cdf`.

```{.python .input}
#@tab mxnet
x = np.arange(-1, 2, 0.01)
def F(x):
@@ -99,6 +101,7 @@ If $X \sim \mathrm{Bernoulli}(p)$, then:
We can sample an array of arbitrary shape from a Bernoulli random variable as follows.

```{.python .input}
#@tab mxnet
1*(np.random.rand(10, 10) < p)
```
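
Since this commit makes PyTorch the primary tab, a PyTorch counterpart to the sampling cell above might look like the following sketch; `torch` and `p` are (re)declared here only to keep the snippet self-contained, and the value of `p` is an assumption standing in for the one defined earlier in the section.

```{.python .input}
#@tab pytorch
import torch

p = 0.3  # assumed value; stands in for the `p` defined earlier in the section
# Draw U(0, 1) samples and threshold at `p` to obtain Bernoulli(p) samples
1 * (torch.rand(10, 10) < p)
```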

@@ -140,6 +143,7 @@ d2l.plt.show()
Now, let's plot the cumulative distribution function :eqref:`eq_discrete_uniform_cdf`.

```{.python .input}
#@tab mxnet
x = np.arange(-1, 6, 0.01)
def F(x):
@@ -176,6 +180,7 @@ If $X \sim U(n)$, then:
We can sample an array of arbitrary shape from a discrete uniform random variable as follows.

```{.python .input}
#@tab mxnet
np.random.randint(1, n, size=(10, 10))
```
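
With PyTorch as the default tab, the analogous sampling step could be sketched as below; `n` is re-declared only for self-containment and its value is assumed to match the one used earlier.

```{.python .input}
#@tab pytorch
import torch

n = 5  # assumed value, matching the discrete uniform example above
# `torch.randint` draws integers from [1, n), mirroring `np.random.randint`
torch.randint(1, n, size=(10, 10))
```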

@@ -210,6 +215,7 @@ $$F(x) = \begin{cases} 0 & x < a, \\ \frac{x-a}{b-a} & x \in [a, b], \\ 1 & x \geq b. \end{cases}$$
Let's first plot the probability density function :eqref:`eq_cont_uniform_pdf`.

```{.python .input}
#@tab mxnet
a, b = 1, 3
x = np.arange(0, 4, 0.01)
@@ -239,6 +245,7 @@ d2l.plot(x, p, 'x', 'p.d.f.')
Now, let's plot the cumulative distribution function :eqref:`eq_cont_uniform_cdf`.

```{.python .input}
#@tab mxnet
def F(x):
return 0 if x < a else 1 if x > b else (x - a) / (b - a)
@@ -269,6 +276,7 @@ If $X \sim U(a, b)$, then:
We can sample an array of arbitrary shape from a uniform random variable as follows. Note that by default it samples from $U(0,1)$, so if we want a different range we need to scale and shift the samples.

```{.python .input}
#@tab mxnet
(b - a) * np.random.rand(10, 10) + a
```
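
A PyTorch sketch of the same scale-and-shift trick is shown below, with `a` and `b` assumed to match the interval plotted above.

```{.python .input}
#@tab pytorch
import torch

a, b = 1, 3  # assumed to match the interval used above
# Scale and shift U(0, 1) samples to obtain U(a, b) samples
(b - a) * torch.rand(10, 10) + a
```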

@@ -306,6 +314,7 @@ $$F(x) = \begin{cases} 0 & x < 0, \\ \sum_{m \le k} \binom{n}{m} p^m(1-p)^{n-m}
Let's first plot the probability mass function.

```{.python .input}
#@tab mxnet
n, p = 10, 0.2
# Compute binomial coefficient
@@ -364,6 +373,7 @@ d2l.plt.show()
Now, let's plot the cumulative distribution function :eqref:`eq_binomial_cdf`.

```{.python .input}
#@tab mxnet
x = np.arange(-1, 11, 0.01)
cmf = np.cumsum(pmf)
@@ -403,6 +413,7 @@ If $X \sim \mathrm{Binomial}(n, p)$, then:
This follows from the linearity of expected value over the sum of $n$ Bernoulli random variables, and the fact that the variance of the sum of independent random variables is the sum of the variances. This can be sampled as follows.

```{.python .input}
#@tab mxnet
np.random.binomial(n, p, size=(10, 10))
```
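
In a PyTorch tab this sampling step might go through the distributions API, as in the sketch below; `n` and `p` are assumed to match the parameters of the plots above.

```{.python .input}
#@tab pytorch
import torch

n, p = 10, 0.2  # assumed to match the parameters used above
# Sample Binomial(n, p) via the distributions API
m = torch.distributions.binomial.Binomial(n, p)
m.sample(sample_shape=(10, 10))
```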

@@ -453,6 +464,7 @@ $$F(x) = \begin{cases} 0 & x < 0, \\ e^{-\lambda}\sum_{m = 0}^k \frac{\lambda^m}
Let's first plot the probability mass function :eqref:`eq_poisson_mass`.

```{.python .input}
#@tab mxnet
lam = 5.0
xs = [i for i in range(20)]
@@ -495,6 +507,7 @@ d2l.plt.show()
Now, let's plot the cumulative distribution function :eqref:`eq_poisson_cdf`.

```{.python .input}
#@tab mxnet
x = np.arange(-1, 21, 0.01)
cmf = np.cumsum(pmf)
def F(x):
@@ -531,6 +544,7 @@ As we saw above, the means and variances are particularly concise. If $X \sim \
This can be sampled as follows.

```{.python .input}
#@tab mxnet
np.random.poisson(lam, size=(10, 10))
```
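
A corresponding PyTorch sketch, again via the distributions API and with `lam` assumed to match the rate used above:

```{.python .input}
#@tab pytorch
import torch

lam = 5.0  # assumed to match the rate used above
# Sample Poisson(lam) via the distributions API
m = torch.distributions.poisson.Poisson(lam)
m.sample((10, 10))
```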

@@ -558,6 +572,7 @@ $$
This can be seen to have mean zero and variance one, and so it is plausible to believe that it will converge to some limiting distribution. If we plot what these distributions look like, we will become even more convinced that it will work.

```{.python .input}
#@tab mxnet
p = 0.2
ns = [1, 10, 100, 1000]
d2l.plt.figure(figsize=(10, 3))
@@ -630,6 +645,7 @@ $$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
Let's first plot the probability density function :eqref:`eq_gaussian_pdf`.

```{.python .input}
#@tab mxnet
mu, sigma = 0, 1
x = np.arange(-3, 3, 0.01)
@@ -663,6 +679,7 @@ d2l.plot(x, p, 'x', 'p.d.f.')
Now, let's plot the cumulative distribution function. It is beyond the scope of this appendix, but the Gaussian c.d.f. does not have a closed-form formula in terms of more elementary functions. We will use `erf` which provides a way to compute this integral numerically.

```{.python .input}
#@tab mxnet
def phi(x):
return (1.0 + erf((x - mu) / (sigma * np.sqrt(2)))) / 2.0
@@ -713,6 +730,7 @@ To close the section, let's recall that if $X \sim \mathcal{N}(\mu, \sigma^2)$,
We can sample from the Gaussian (or standard normal) distribution as shown below.

```{.python .input}
#@tab mxnet
np.random.normal(mu, sigma, size=(10, 10))
```
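
The PyTorch version of this sampling step might look like the sketch below, with `mu` and `sigma` assumed to match the density plotted above.

```{.python .input}
#@tab pytorch
import torch

mu, sigma = 0.0, 1.0  # assumed to match the parameters used above
torch.normal(mu, sigma, size=(10, 10))
```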

@@ -89,6 +89,7 @@ We can solve this with the vectors $[1, -1]^\top$ and $[1, 2]^\top$ respectively
We can check this in code using the built-in `numpy.linalg.eig` routine.

```{.python .input}
#@tab mxnet
%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
@@ -277,6 +278,7 @@ that the eigenvalues are approximately $0.99$, $2.97$, $4.95$, $9.08$,
all comfortably inside the ranges provided.

```{.python .input}
#@tab mxnet
A = np.array([[1.0, 0.1, 0.1, 0.1],
[0.1, 3.0, 0.2, 0.3],
[0.1, 0.2, 5.0, 0.5],
@@ -344,6 +346,7 @@ a random matrix with Gaussian entries, so let's make one of those.
To be concrete, we start with a mean zero, variance one Gaussian distributed $5 \times 5$ matrix.

```{.python .input}
#@tab mxnet
np.random.seed(8675309)
k = 5
@@ -393,6 +396,7 @@ Let's see what happens when we repeatedly multiply our matrix $\mathbf{A}$
against a random input vector, and keep track of the norm.

```{.python .input}
#@tab mxnet
# Calculate the sequence of norms after repeatedly applying `A`
v_in = np.random.randn(k, 1)
@@ -434,6 +438,7 @@ The norm is growing uncontrollably!
Indeed if we take the list of quotients, we will see a pattern.

```{.python .input}
#@tab mxnet
# Compute the scaling factor of the norms
norm_ratio_list = []
for i in range(1, 100):
@@ -481,6 +486,7 @@ By taking the norm of the complex number
we can measure that stretching factor. Let's also sort them.

```{.python .input}
#@tab mxnet
# Compute the eigenvalues
eigs = np.linalg.eigvals(A).tolist()
norm_eigs = [np.absolute(x) for x in eigs]
@@ -550,6 +556,7 @@ so that the largest eigenvalue is instead now just one.
Let's see what happens in this case.

```{.python .input}
#@tab mxnet
# Rescale the matrix `A`
A /= norm_eigs[-1]
@@ -599,6 +606,7 @@ d2l.plot(tf.range(0, 100), norm_list, 'Iteration', 'Value')
We can also plot the ratio between consecutive norms as before and see that indeed it stabilizes.

```{.python .input}
#@tab mxnet
# Also plot the ratio
norm_ratio_list = []
for i in range(1, 100):
@@ -147,6 +147,7 @@ Indeed, we can use this in three or three million dimensions without issue.
As a simple example, let's see how to compute the angle between a pair of vectors:

```{.python .input}
#@tab mxnet
%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
@@ -306,6 +307,7 @@ by just taking the vector between their means to define the decision plane
and eyeball a crude threshold. First we will load the data and compute the averages.

```{.python .input}
#@tab mxnet
# Load in the dataset
train = gluon.data.vision.FashionMNIST(train=True)
test = gluon.data.vision.FashionMNIST(train=False)
@@ -405,6 +407,7 @@ d2l.plt.show()
In a fully machine learned solution, we would learn the threshold from the dataset. In this case, I simply eyeballed a threshold that looked good on the training data by hand.

```{.python .input}
#@tab mxnet
# Print test set accuracy with eyeballed threshold
w = (ave_1 - ave_0).T
predictions = X_test.reshape(2000, -1).dot(w.flatten()) > -1500000
@@ -687,6 +690,7 @@ We can test this by checking that multiplying
by the inverse given by the formula above works in practice.

```{.python .input}
#@tab mxnet
M = np.array([[1, 2], [1, 4]])
M_inv = np.array([[2, -1], [-0.5, 0.5]])
M_inv.dot(M)
@@ -782,6 +786,7 @@ This area is referred to as the *determinant*.
Let's check this quickly with some example code.

```{.python .input}
#@tab mxnet
import numpy as np
np.linalg.det(np.array([[1, -1], [2, 3]]))
```
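
For the PyTorch tab, one possible sketch of the same check is shown below; note that `torch.linalg.det` expects a floating-point tensor.

```{.python .input}
#@tab pytorch
import torch

# The same 2x2 determinant; floating-point entries are required
torch.linalg.det(torch.tensor([[1., -1.], [2., 3.]]))
```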
@@ -903,6 +908,7 @@ As seen in :numref:`sec_linear-algebra`,
we can create tensors as is shown below.

```{.python .input}
#@tab mxnet
# Define tensors
B = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
A = np.array([[1, 2], [3, 4]])
@@ -943,6 +949,7 @@ we can consider the Einstein summation seen above
and strip out the indices themselves to get the implementation:

```{.python .input}
#@tab mxnet
# Reimplement matrix multiplication
np.einsum("ij, j -> i", A, v), A.dot(v)
```
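
A PyTorch sketch of the same Einstein summation follows; the tensors below are hypothetical stand-ins for the `A` and `v` defined earlier in the section.

```{.python .input}
#@tab pytorch
import torch

# Hypothetical stand-ins for the `A` and `v` defined earlier in the section
A = torch.tensor([[1., 2.], [3., 4.]])
v = torch.tensor([1., 2.])
# Einstein summation versus the built-in matrix-vector product
torch.einsum("ij,j->i", A, v), torch.mv(A, v)
```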
@@ -970,6 +977,7 @@ $$
it can be implemented via Einstein summation as:

```{.python .input}
#@tab mxnet
np.einsum("ijk, il, j -> kl", B, A, v)
```
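
The same contraction expressed in PyTorch might read as below, with hypothetical stand-ins matching the shapes of `B`, `A`, and `v` above.

```{.python .input}
#@tab pytorch
import torch

# Hypothetical stand-ins matching the shapes of `B`, `A`, and `v` above
B = torch.arange(1., 13.).reshape(2, 2, 3)
A = torch.tensor([[1., 2.], [3., 4.]])
v = torch.tensor([1., 2.])
torch.einsum("ijk,il,j->kl", B, A, v)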

@@ -991,6 +999,7 @@ by providing integer indices for each tensor.
For example, the same tensor contraction can also be written as:

```{.python .input}
#@tab mxnet
np.einsum(B, [0, 1, 2], A, [0, 3], v, [1], [2, 3])
```
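
A PyTorch rendering of the index-list call is sketched below; it assumes a PyTorch release whose `torch.einsum` accepts the integer-sublist calling convention, which only newer versions provide.

```{.python .input}
#@tab pytorch
import torch

# Assumes a PyTorch version whose `torch.einsum` accepts the sublist format
B = torch.arange(1., 13.).reshape(2, 2, 3)
A = torch.tensor([[1., 2.], [3., 4.]])
v = torch.tensor([1., 2.])
torch.einsum(B, [0, 1, 2], A, [0, 3], v, [1], [2, 3])
```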

@@ -42,6 +42,7 @@ $$I(\text{"0010"}) = - \log (p(\text{"0010"})) = - \log \left( \frac{1}{2^4} \ri
We can calculate self-information as shown below. Before that, let's import all the necessary packages for this section.

```{.python .input}
#@tab mxnet
from mxnet import np
from mxnet.metric import NegativeLogLikelihood
from mxnet.ndarray import nansum
@@ -116,6 +117,7 @@ $$H(X) = - \int_x p(x) \log p(x) \; dx.$$
We can define entropy as follows.

```{.python .input}
#@tab mxnet
def entropy(p):
entropy = - p * np.log2(p)
# Operator `nansum` will sum up the non-nan number
@@ -200,6 +202,7 @@ $$
Let's implement joint entropy from scratch.

```{.python .input}
#@tab mxnet
def joint_entropy(p_xy):
joint_ent = -p_xy * np.log2(p_xy)
# Operator `nansum` will sum up the non-nan number
@@ -261,6 +264,7 @@ This has an intuitive interpretation: the information in $Y$ given $X$ ($H(Y \mi
Now, let's implement conditional entropy :eqref:`eq_cond_ent_def` from scratch.

```{.python .input}
#@tab mxnet
def conditional_entropy(p_xy, p_x):
p_y_given_x = p_xy/p_x
cond_ent = -p_xy * np.log2(p_y_given_x)
@@ -328,6 +332,7 @@ In many ways we can think of the mutual information :eqref:`eq_mut_ent_def` as p
Now, let's implement mutual information from scratch.

```{.python .input}
#@tab mxnet
def mutual_information(p_xy, p_x, p_y):
p = p_xy / (p_x * p_y)
mutual = p_xy * np.log2(p)
@@ -409,6 +414,7 @@ As with the pointwise mutual information :eqref:`eq_pmi_def`, we can again provi
Let's implement the KL divergence from scratch.

```{.python .input}
#@tab mxnet
def kl_divergence(p, q):
kl = p * np.log2(p / q)
out = nansum(kl.as_nd_ndarray())
@@ -453,6 +459,7 @@ Let's go through a toy example to see the non-symmetry explicitly.
First, let's generate and sort three tensors of length $10,000$: an objective tensor $p$ which follows a normal distribution $N(0, 1)$, and two candidate tensors $q_1$ and $q_2$ which follow normal distributions $N(-1, 1)$ and $N(1, 1)$ respectively.

```{.python .input}
#@tab mxnet
random.seed(1)
nd_len = 10000
@@ -546,6 +553,7 @@ $$\mathrm{CE} (P, Q) = H(P) + D_{\mathrm{KL}}(P\|Q).$$
We can implement the cross-entropy loss as below.

```{.python .input}
#@tab mxnet
def cross_entropy(y_hat, y):
ce = -np.log(y_hat[range(len(y_hat)), y])
return ce.mean()
@@ -570,6 +578,7 @@ def cross_entropy(y_hat, y):
Now define two tensors for the labels and predictions, and calculate their cross-entropy loss.

```{.python .input}
#@tab mxnet
labels = np.array([0, 2])
preds = np.array([[0.3, 0.6, 0.1], [0.2, 0.3, 0.5]])
@@ -647,6 +656,7 @@ Since in maximum likelihood estimation, we are maximizing the objective function $l(
To test the above proof, let's apply the built-in measure `NegativeLogLikelihood`. Using the same `labels` and `preds` as in the earlier example, we get the same numerical loss, up to 5 decimal places.

```{.python .input}
#@tab mxnet
nll_loss = NegativeLogLikelihood()
nll_loss.update(labels.as_nd_ndarray(), preds.as_nd_ndarray())
nll_loss.get()