Update default framework to PyTorch (d2l-ai#2098)
* Make PyTorch primary tab

* rebase master & resolve conflicts

* Update Jenkinsfile with PyTorch default

* [Test] PyTorch Primary build from scratch

* resolve conflicts

* Test default pytorch tabs

* PyTorch Default evaluate all
AnirudhDagar authored Apr 28, 2022
1 parent 23c26c8 commit beb907e
Showing 72 changed files with 516 additions and 8 deletions.
14 changes: 7 additions & 7 deletions Jenkinsfile
@@ -30,15 +30,15 @@ stage("Build and Publish") {
conda activate ${ENV_NAME}
./static/cache.sh restore _build/eval/data
d2lbook build eval
+ d2lbook build slides --tab pytorch
./static/cache.sh store _build/eval/data
"""

sh label: "Execute Notebooks [PyTorch]", script: """set -ex
sh label: "Execute Notebooks [MXNet]", script: """set -ex
conda activate ${ENV_NAME}
- ./static/cache.sh restore _build/eval_pytorch/data
- d2lbook build eval --tab pytorch
- d2lbook build slides --tab pytorch
- ./static/cache.sh store _build/eval_pytorch/data
+ ./static/cache.sh restore _build/eval_mxnet/data
+ d2lbook build eval --tab mxnet
+ ./static/cache.sh store _build/eval_mxnet/data
"""

sh label: "Execute Notebooks [TensorFlow]", script: """set -ex
@@ -60,9 +60,9 @@ stage("Build and Publish") {
d2lbook build pdf
"""

sh label:"Build Pytorch PDF", script:"""set -ex
sh label:"Build MXNet PDF", script:"""set -ex
conda activate ${ENV_NAME}
- d2lbook build pdf --tab pytorch
+ d2lbook build pdf --tab mxnet
"""

if (env.BRANCH_NAME == 'release') {
18 changes: 18 additions & 0 deletions chapter_appendix-mathematics-for-deep-learning/distributions.md
@@ -4,6 +4,7 @@
Now that we have learned how to work with probability in both the discrete and the continuous setting, let's get to know some of the common distributions encountered. Depending on the area of machine learning, we may need to be familiar with vastly more of these, or for some areas of deep learning potentially none at all. This is, however, a good basic list to be familiar with. Let's first import some common libraries.

```{.python .input}
#@tab mxnet
%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
@@ -63,6 +64,7 @@ d2l.plt.show()
Now, let's plot the cumulative distribution function :eqref:`eq_bernoulli_cdf`.

```{.python .input}
#@tab mxnet
x = np.arange(-1, 2, 0.01)
def F(x):
@@ -99,6 +101,7 @@ If $X \sim \mathrm{Bernoulli}(p)$, then:
We can sample an array of arbitrary shape from a Bernoulli random variable as follows.

```{.python .input}
#@tab mxnet
1*(np.random.rand(10, 10) < p)
```
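
Since this commit makes PyTorch the primary tab, a PyTorch counterpart to the sampling cell above might look like the following sketch; `torch` and `p` are (re)declared here only to keep the snippet self-contained, and the value of `p` is an assumption standing in for the one defined earlier in the section.

```{.python .input}
#@tab pytorch
import torch

p = 0.3  # assumed value; stands in for the `p` defined earlier in the section
# Draw U(0, 1) samples and threshold at `p` to obtain Bernoulli(p) samples
1 * (torch.rand(10, 10) < p)
```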

@@ -140,6 +143,7 @@ d2l.plt.show()
Now, let's plot the cumulative distribution function :eqref:`eq_discrete_uniform_cdf`.

```{.python .input}
#@tab mxnet
x = np.arange(-1, 6, 0.01)
def F(x):
@@ -176,6 +180,7 @@ If $X \sim U(n)$, then:
We can sample an array of arbitrary shape from a discrete uniform random variable as follows.

```{.python .input}
#@tab mxnet
np.random.randint(1, n, size=(10, 10))
```
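
With PyTorch as the default tab, the analogous sampling step could be sketched as below; `n` is re-declared only for self-containment and its value is assumed to match the one used earlier.

```{.python .input}
#@tab pytorch
import torch

n = 5  # assumed value, matching the discrete uniform example above
# `torch.randint` draws integers from [1, n), mirroring `np.random.randint`
torch.randint(1, n, size=(10, 10))
```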

@@ -210,6 +215,7 @@ $$F(x) = \begin{cases} 0 & x < a, \\ \frac{x-a}{b-a} & x \in [a, b], \\ 1 & x \geq b. \end{cases}$$
Let's first plot the probability density function :eqref:`eq_cont_uniform_pdf`.

```{.python .input}
#@tab mxnet
a, b = 1, 3
x = np.arange(0, 4, 0.01)
@@ -239,6 +245,7 @@ d2l.plot(x, p, 'x', 'p.d.f.')
Now, let's plot the cumulative distribution function :eqref:`eq_cont_uniform_cdf`.

```{.python .input}
#@tab mxnet
def F(x):
return 0 if x < a else 1 if x > b else (x - a) / (b - a)
@@ -269,6 +276,7 @@ If $X \sim U(a, b)$, then:
We can sample an array of arbitrary shape from a uniform random variable as follows. Note that by default it samples from $U(0,1)$, so if we want a different range we need to scale and shift the samples.

```{.python .input}
#@tab mxnet
(b - a) * np.random.rand(10, 10) + a
```
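
A PyTorch sketch of the same scale-and-shift trick is shown below, with `a` and `b` assumed to match the interval plotted above.

```{.python .input}
#@tab pytorch
import torch

a, b = 1, 3  # assumed to match the interval used above
# Scale and shift U(0, 1) samples to obtain U(a, b) samples
(b - a) * torch.rand(10, 10) + a
```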

@@ -306,6 +314,7 @@ $$F(x) = \begin{cases} 0 & x < 0, \\ \sum_{m \le k} \binom{n}{m} p^m(1-p)^{n-m}
Let's first plot the probability mass function.

```{.python .input}
#@tab mxnet
n, p = 10, 0.2
# Compute binomial coefficient
@@ -364,6 +373,7 @@ d2l.plt.show()
Now, let's plot the cumulative distribution function :eqref:`eq_binomial_cdf`.

```{.python .input}
#@tab mxnet
x = np.arange(-1, 11, 0.01)
cmf = np.cumsum(pmf)
@@ -403,6 +413,7 @@ If $X \sim \mathrm{Binomial}(n, p)$, then:
This follows from the linearity of expected value over the sum of $n$ Bernoulli random variables, and the fact that the variance of the sum of independent random variables is the sum of the variances. This can be sampled as follows.

```{.python .input}
#@tab mxnet
np.random.binomial(n, p, size=(10, 10))
```
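
In a PyTorch tab this sampling step might go through the distributions API, as in the sketch below; `n` and `p` are assumed to match the parameters of the plots above.

```{.python .input}
#@tab pytorch
import torch

n, p = 10, 0.2  # assumed to match the parameters used above
# Sample Binomial(n, p) via the distributions API
m = torch.distributions.binomial.Binomial(n, p)
m.sample(sample_shape=(10, 10))
```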

@@ -453,6 +464,7 @@ $$F(x) = \begin{cases} 0 & x < 0, \\ e^{-\lambda}\sum_{m = 0}^k \frac{\lambda^m}
Let's first plot the probability mass function :eqref:`eq_poisson_mass`.

```{.python .input}
#@tab mxnet
lam = 5.0
xs = [i for i in range(20)]
@@ -495,6 +507,7 @@ d2l.plt.show()
Now, let's plot the cumulative distribution function :eqref:`eq_poisson_cdf`.

```{.python .input}
#@tab mxnet
x = np.arange(-1, 21, 0.01)
cmf = np.cumsum(pmf)
def F(x):
@@ -531,6 +544,7 @@ As we saw above, the means and variances are particularly concise. If $X \sim \
This can be sampled as follows.

```{.python .input}
#@tab mxnet
np.random.poisson(lam, size=(10, 10))
```
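
A corresponding PyTorch sketch, again via the distributions API and with `lam` assumed to match the rate used above:

```{.python .input}
#@tab pytorch
import torch

lam = 5.0  # assumed to match the rate used above
# Sample Poisson(lam) via the distributions API
m = torch.distributions.poisson.Poisson(lam)
m.sample((10, 10))
```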

@@ -558,6 +572,7 @@ $$
This can be seen to have mean zero and variance one, and so it is plausible to believe that it will converge to some limiting distribution. If we plot what these distributions look like, we will become even more convinced that it will work.

```{.python .input}
#@tab mxnet
p = 0.2
ns = [1, 10, 100, 1000]
d2l.plt.figure(figsize=(10, 3))
@@ -630,6 +645,7 @@ $$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
Let's first plot the probability density function :eqref:`eq_gaussian_pdf`.

```{.python .input}
#@tab mxnet
mu, sigma = 0, 1
x = np.arange(-3, 3, 0.01)
@@ -663,6 +679,7 @@ d2l.plot(x, p, 'x', 'p.d.f.')
Now, let's plot the cumulative distribution function. It is beyond the scope of this appendix, but the Gaussian c.d.f. does not have a closed-form formula in terms of more elementary functions. We will use `erf` which provides a way to compute this integral numerically.

```{.python .input}
#@tab mxnet
def phi(x):
return (1.0 + erf((x - mu) / (sigma * np.sqrt(2)))) / 2.0
@@ -713,6 +730,7 @@ To close the section, let's recall that if $X \sim \mathcal{N}(\mu, \sigma^2)$,
We can sample from the Gaussian (or standard normal) distribution as shown below.

```{.python .input}
#@tab mxnet
np.random.normal(mu, sigma, size=(10, 10))
```
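
The PyTorch version of this sampling step might look like the sketch below, with `mu` and `sigma` assumed to match the density plotted above.

```{.python .input}
#@tab pytorch
import torch

mu, sigma = 0.0, 1.0  # assumed to match the parameters used above
torch.normal(mu, sigma, size=(10, 10))
```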

@@ -89,6 +89,7 @@ We can solve this with the vectors $[1, -1]^\top$ and $[1, 2]^\top$ respectively
We can check this in code using the built-in `numpy.linalg.eig` routine.

```{.python .input}
#@tab mxnet
%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
@@ -277,6 +278,7 @@ that the eigenvalues are approximately $0.99$, $2.97$, $4.95$, $9.08$,
all comfortably inside the ranges provided.

```{.python .input}
#@tab mxnet
A = np.array([[1.0, 0.1, 0.1, 0.1],
[0.1, 3.0, 0.2, 0.3],
[0.1, 0.2, 5.0, 0.5],
@@ -344,6 +346,7 @@ a random matrix with Gaussian entries, so let's make one of those.
To be concrete, we start with a mean zero, variance one Gaussian distributed $5 \times 5$ matrix.

```{.python .input}
#@tab mxnet
np.random.seed(8675309)
k = 5
@@ -393,6 +396,7 @@ Let's see what happens when we repeatedly multiply our matrix $\mathbf{A}$
against a random input vector, and keep track of the norm.

```{.python .input}
#@tab mxnet
# Calculate the sequence of norms after repeatedly applying `A`
v_in = np.random.randn(k, 1)
@@ -434,6 +438,7 @@ The norm is growing uncontrollably!
Indeed if we take the list of quotients, we will see a pattern.

```{.python .input}
#@tab mxnet
# Compute the scaling factor of the norms
norm_ratio_list = []
for i in range(1, 100):
@@ -481,6 +486,7 @@ By taking the norm of the complex number
we can measure that stretching factor. Let's also sort them.

```{.python .input}
#@tab mxnet
# Compute the eigenvalues
eigs = np.linalg.eigvals(A).tolist()
norm_eigs = [np.absolute(x) for x in eigs]
@@ -550,6 +556,7 @@ so that the largest eigenvalue is instead now just one.
Let's see what happens in this case.

```{.python .input}
#@tab mxnet
# Rescale the matrix `A`
A /= norm_eigs[-1]
@@ -599,6 +606,7 @@ d2l.plot(tf.range(0, 100), norm_list, 'Iteration', 'Value')
We can also plot the ratio between consecutive norms as before and see that indeed it stabilizes.

```{.python .input}
#@tab mxnet
# Also plot the ratio
norm_ratio_list = []
for i in range(1, 100):
@@ -147,6 +147,7 @@ Indeed, we can use this in three or three million dimensions without issue.
As a simple example, let's see how to compute the angle between a pair of vectors:

```{.python .input}
#@tab mxnet
%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
@@ -306,6 +307,7 @@ by just taking the vector between their means to define the decision plane
and eyeball a crude threshold. First we will load the data and compute the averages.

```{.python .input}
#@tab mxnet
# Load in the dataset
train = gluon.data.vision.FashionMNIST(train=True)
test = gluon.data.vision.FashionMNIST(train=False)
@@ -405,6 +407,7 @@ d2l.plt.show()
In a fully machine learned solution, we would learn the threshold from the dataset. In this case, I simply eyeballed a threshold that looked good on the training data by hand.

```{.python .input}
#@tab mxnet
# Print test set accuracy with eyeballed threshold
w = (ave_1 - ave_0).T
predictions = X_test.reshape(2000, -1).dot(w.flatten()) > -1500000
@@ -687,6 +690,7 @@ We can test this by checking that multiplying
by the inverse given by the formula above works in practice.

```{.python .input}
#@tab mxnet
M = np.array([[1, 2], [1, 4]])
M_inv = np.array([[2, -1], [-0.5, 0.5]])
M_inv.dot(M)
@@ -782,6 +786,7 @@ This area is referred to as the *determinant*.
Let's check this quickly with some example code.

```{.python .input}
#@tab mxnet
import numpy as np
np.linalg.det(np.array([[1, -1], [2, 3]]))
```
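
For the PyTorch tab, one possible sketch of the same check is shown below; note that `torch.linalg.det` expects a floating-point tensor.

```{.python .input}
#@tab pytorch
import torch

# The same 2x2 determinant; floating-point entries are required
torch.linalg.det(torch.tensor([[1., -1.], [2., 3.]]))
```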
@@ -903,6 +908,7 @@ As seen in :numref:`sec_linear-algebra`,
we can create tensors as is shown below.

```{.python .input}
#@tab mxnet
# Define tensors
B = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
A = np.array([[1, 2], [3, 4]])
@@ -943,6 +949,7 @@ we can consider the Einstein summation seen above
and strip out the indices themselves to get the implementation:

```{.python .input}
#@tab mxnet
# Reimplement matrix multiplication
np.einsum("ij, j -> i", A, v), A.dot(v)
```
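
A PyTorch sketch of the same Einstein summation follows; the tensors below are hypothetical stand-ins for the `A` and `v` defined earlier in the section.

```{.python .input}
#@tab pytorch
import torch

# Hypothetical stand-ins for the `A` and `v` defined earlier in the section
A = torch.tensor([[1., 2.], [3., 4.]])
v = torch.tensor([1., 2.])
# Einstein summation versus the built-in matrix-vector product
torch.einsum("ij,j->i", A, v), torch.mv(A, v)
```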
@@ -970,6 +977,7 @@ $$
it can be implemented via Einstein summation as:

```{.python .input}
#@tab mxnet
np.einsum("ijk, il, j -> kl", B, A, v)
```
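
The same contraction expressed in PyTorch might read as below, with hypothetical stand-ins matching the shapes of `B`, `A`, and `v` above.

```{.python .input}
#@tab pytorch
import torch

# Hypothetical stand-ins matching the shapes of `B`, `A`, and `v` above
B = torch.arange(1., 13.).reshape(2, 2, 3)
A = torch.tensor([[1., 2.], [3., 4.]])
v = torch.tensor([1., 2.])
torch.einsum("ijk,il,j->kl", B, A, v)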

@@ -991,6 +999,7 @@ by providing integer indices for each tensor.
For example, the same tensor contraction can also be written as:

```{.python .input}
#@tab mxnet
np.einsum(B, [0, 1, 2], A, [0, 3], v, [1], [2, 3])
```
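
A PyTorch rendering of the index-list call is sketched below; it assumes a PyTorch release whose `torch.einsum` accepts the integer-sublist calling convention, which only newer versions provide.

```{.python .input}
#@tab pytorch
import torch

# Assumes a PyTorch version whose `torch.einsum` accepts the sublist format
B = torch.arange(1., 13.).reshape(2, 2, 3)
A = torch.tensor([[1., 2.], [3., 4.]])
v = torch.tensor([1., 2.])
torch.einsum(B, [0, 1, 2], A, [0, 3], v, [1], [2, 3])
```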

@@ -42,6 +42,7 @@ $$I(\text{"0010"}) = - \log (p(\text{"0010"})) = - \log \left( \frac{1}{2^4} \ri
We can calculate self-information as shown below. Before that, let's import all the necessary packages for this section.

```{.python .input}
#@tab mxnet
from mxnet import np
from mxnet.metric import NegativeLogLikelihood
from mxnet.ndarray import nansum
@@ -116,6 +117,7 @@ $$H(X) = - \int_x p(x) \log p(x) \; dx.$$
We can define entropy as follows.

```{.python .input}
#@tab mxnet
def entropy(p):
entropy = - p * np.log2(p)
# Operator `nansum` will sum up the non-nan number
@@ -200,6 +202,7 @@ $$
Let's implement joint entropy from scratch.

```{.python .input}
#@tab mxnet
def joint_entropy(p_xy):
joint_ent = -p_xy * np.log2(p_xy)
# Operator `nansum` will sum up the non-nan number
@@ -261,6 +264,7 @@ This has an intuitive interpretation: the information in $Y$ given $X$ ($H(Y \mi
Now, let's implement conditional entropy :eqref:`eq_cond_ent_def` from scratch.

```{.python .input}
#@tab mxnet
def conditional_entropy(p_xy, p_x):
p_y_given_x = p_xy/p_x
cond_ent = -p_xy * np.log2(p_y_given_x)
@@ -328,6 +332,7 @@ In many ways we can think of the mutual information :eqref:`eq_mut_ent_def` as p
Now, let's implement mutual information from scratch.

```{.python .input}
#@tab mxnet
def mutual_information(p_xy, p_x, p_y):
p = p_xy / (p_x * p_y)
mutual = p_xy * np.log2(p)
@@ -409,6 +414,7 @@ As with the pointwise mutual information :eqref:`eq_pmi_def`, we can again provi
Let's implement the KL divergence from scratch.

```{.python .input}
#@tab mxnet
def kl_divergence(p, q):
kl = p * np.log2(p / q)
out = nansum(kl.as_nd_ndarray())
@@ -453,6 +459,7 @@ Let's go through a toy example to see the non-symmetry explicitly.
First, let's generate and sort three tensors of length $10,000$: an objective tensor $p$ which follows a normal distribution $N(0, 1)$, and two candidate tensors $q_1$ and $q_2$ which follow normal distributions $N(-1, 1)$ and $N(1, 1)$ respectively.

```{.python .input}
#@tab mxnet
random.seed(1)
nd_len = 10000
@@ -546,6 +553,7 @@ $$\mathrm{CE} (P, Q) = H(P) + D_{\mathrm{KL}}(P\|Q).$$
We can implement the cross-entropy loss as below.

```{.python .input}
#@tab mxnet
def cross_entropy(y_hat, y):
ce = -np.log(y_hat[range(len(y_hat)), y])
return ce.mean()
@@ -570,6 +578,7 @@ def cross_entropy(y_hat, y):
Now define two tensors for the labels and predictions, and calculate their cross-entropy loss.

```{.python .input}
#@tab mxnet
labels = np.array([0, 2])
preds = np.array([[0.3, 0.6, 0.1], [0.2, 0.3, 0.5]])
@@ -647,6 +656,7 @@ Since in maximum likelihood estimation, we are maximizing the objective function $l(
To test the above proof, let's apply the built-in measure `NegativeLogLikelihood`. Using the same `labels` and `preds` as in the earlier example, we get the same numerical loss, up to 5 decimal places.

```{.python .input}
#@tab mxnet
nll_loss = NegativeLogLikelihood()
nll_loss.update(labels.as_nd_ndarray(), preds.as_nd_ndarray())
nll_loss.get()