Commit

hyperparameter
astonzhang committed Jul 2, 2020
1 parent 7fb8c78 commit 5125a00
Showing 22 changed files with 36 additions and 36 deletions.
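
The commit message does not say how the edit was made, but most of the changes shown below replace "hyper-parameters" with "hyperparameters" one line at a time, which is the kind of change that could be scripted. The following is a minimal sketch under that assumption; `normalize_hyperparameter` is a hypothetical helper, not something from the repository, and it only rewrites the hyphenated spelling while keeping the case of the first letter.

```python
import re

# Hypothetical helper (not part of the repository): rewrite the hyphenated
# spelling "hyper-parameter(s)" as "hyperparameter(s)".
_PATTERN = re.compile(r"\b([Hh])yper-parameter")


def normalize_hyperparameter(text: str) -> str:
    """Return `text` with "hyper-parameter" replaced by "hyperparameter"."""
    # Group 1 keeps the original case of the leading "H"/"h".
    return _PATTERN.sub(r"\1yperparameter", text)


line = "Tune the hyper-parameters to improve the classification accuracy."
print(normalize_hyperparameter(line))
```

Text that already uses the unhyphenated spelling passes through unchanged, so applying the helper twice gives the same result as applying it once.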
2 changes: 1 addition & 1 deletion chapter_attention-mechanisms/seq2seq-attention.md
Original file line number Diff line number Diff line change
@@ -77,7 +77,7 @@ class Seq2SeqAttentionDecoder(d2l.Decoder):
enc_valid_len]
```

-Now we can test the seq2seq with attention model. To be consistent with the model without attention in :numref:`sec_seq2seq`, we use the same hyper-parameters for `vocab_size`, `embed_size`, `num_hiddens`, and `num_layers`. As a result, we get the same decoder output shape, but the state structure is changed.
+Now we can test the seq2seq with attention model. To be consistent with the model without attention in :numref:`sec_seq2seq`, we use the same hyperparameters for `vocab_size`, `embed_size`, `num_hiddens`, and `num_layers`. As a result, we get the same decoder output shape, but the state structure is changed.

```{.python .input n=3}
encoder = d2l.Seq2SeqEncoder(vocab_size=10, embed_size=8,
2 changes: 1 addition & 1 deletion chapter_computer-vision/fcn.md
@@ -196,7 +196,7 @@ d2l.show_images(imgs[::3] + imgs[1::3] + imgs[2::3], 3, n, scale=2);
## Exercises

1. If we use Xavier to randomly initialize the transposed convolution layer, what will happen to the result?
-1. Can you further improve the accuracy of the model by tuning the hyper-parameters?
+1. Can you further improve the accuracy of the model by tuning the hyperparameters?
1. Predict the categories of all pixels in the test image.
1. The outputs of some intermediate layers of the convolutional neural network are also used in the paper on fully convolutional networks[1]. Try to implement this idea.

2 changes: 1 addition & 1 deletion chapter_computer-vision/fine-tuning.md
@@ -160,7 +160,7 @@ As you can see, the fine-tuned model tends to achieve higher precision in the sa
## Exercises

1. Keep increasing the learning rate of `finetune_net`. How does the precision of the model change?
-2. Further tune the hyper-parameters of `finetune_net` and `scratch_net` in the comparative experiment. Do they still have different precisions?
+2. Further tune the hyperparameters of `finetune_net` and `scratch_net` in the comparative experiment. Do they still have different precisions?
3. Set the parameters in `finetune_net.features` to the parameters of the source model and do not update them during training. What will happen? You can use the following code.

```{.python .input}
10 changes: 5 additions & 5 deletions chapter_computer-vision/kaggle-cifar10.md
@@ -86,7 +86,7 @@ print('# training examples:', len(labels))
print('# classes:', len(set(labels.values())))
```

-Next, we define the `reorg_train_valid` function to segment the validation set from the original training set. The argument `valid_ratio` in this function is the ratio of the number of examples in the validation set to the number of examples in the original training set. In particular, let $n$ be the number of images of the class with the least examples, and $r$ be the ratio, then we will use $\max(\lfloor nr\rfloor,1)$ images for each class as the validation set. Let us use `valid_ratio=0.1` as an example. Since the original training set has $50,000$ images, there will be $45,000$ images used for training and stored in the path "`train_valid_test/train`" when tuning hyper-parameters, while the other $5,000$ images will be stored as validation set in the path "`train_valid_test/valid`". After organizing the data, images of the same class will be placed under the same folder so that we can read them later.
+Next, we define the `reorg_train_valid` function to segment the validation set from the original training set. The argument `valid_ratio` in this function is the ratio of the number of examples in the validation set to the number of examples in the original training set. In particular, let $n$ be the number of images of the class with the least examples, and $r$ be the ratio, then we will use $\max(\lfloor nr\rfloor,1)$ images for each class as the validation set. Let us use `valid_ratio=0.1` as an example. Since the original training set has $50,000$ images, there will be $45,000$ images used for training and stored in the path "`train_valid_test/train`" when tuning hyperparameters, while the other $5,000$ images will be stored as validation set in the path "`train_valid_test/valid`". After organizing the data, images of the same class will be placed under the same folder so that we can read them later.

```{.python .input n=2}
#@save
@@ -141,7 +141,7 @@ def reorg_cifar10_data(data_dir, valid_ratio):
reorg_test(data_dir)
```

-We only set the batch size to $1$ for the demo dataset. During actual training and testing, the complete dataset of the Kaggle competition should be used and `batch_size` should be set to a larger integer, such as $128$. We use $10\%$ of the training examples as the validation set for tuning hyper-parameters.
+We only set the batch size to $1$ for the demo dataset. During actual training and testing, the complete dataset of the Kaggle competition should be used and `batch_size` should be set to a larger integer, such as $128$. We use $10\%$ of the training examples as the validation set for tuning hyperparameters.

```{.python .input n=4}
batch_size = 1 if demo else 128
@@ -270,7 +270,7 @@ loss = gluon.loss.SoftmaxCrossEntropyLoss()

## Defining the Training Functions

-We will select the model and tune hyper-parameters according to the model's performance on the validation set. Next, we define the model training function `train`. We record the training time of each epoch, which helps us compare the time costs of different models.
+We will select the model and tune hyperparameters according to the model's performance on the validation set. Next, we define the model training function `train`. We record the training time of each epoch, which helps us compare the time costs of different models.

```{.python .input n=12}
def train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,
@@ -305,7 +305,7 @@ def train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,

## Training and Validating the Model

-Now, we can train and validate the model. The following hyper-parameters can be tuned. For example, we can increase the number of epochs. Because `lr_period` and `lr_decay` are set to 80 and 0.1 respectively, the learning rate of the optimization algorithm will be multiplied by 0.1 after every 80 epochs. For simplicity, we only train one epoch here.
+Now, we can train and validate the model. The following hyperparameters can be tuned. For example, we can increase the number of epochs. Because `lr_period` and `lr_decay` are set to 80 and 0.1 respectively, the learning rate of the optimization algorithm will be multiplied by 0.1 after every 80 epochs. For simplicity, we only train one epoch here.

```{.python .input n=13}
ctx, num_epochs, lr, wd = d2l.try_gpu(), 1, 0.1, 5e-4
@@ -317,7 +317,7 @@ train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,

## Classifying the Testing Set and Submitting Results on Kaggle

-After obtaining a satisfactory model design and hyper-parameters, we use all training datasets (including validation sets) to retrain the model and classify the testing set.
+After obtaining a satisfactory model design and hyperparameters, we use all training datasets (including validation sets) to retrain the model and classify the testing set.

```{.python .input n=14}
net, preds = get_net(ctx), []
6 changes: 3 additions & 3 deletions chapter_computer-vision/kaggle-dog.md
@@ -188,7 +188,7 @@ def evaluate_loss(data_iter, net, ctx):

## Defining the Training Functions

-We will select the model and tune hyper-parameters according to the model's performance on the validation set. The model training function `train` only trains the small custom output network.
+We will select the model and tune hyperparameters according to the model's performance on the validation set. The model training function `train` only trains the small custom output network.

```{.python .input n=7}
def train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,
@@ -222,7 +222,7 @@ def train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,

## Training and Validating the Model

-Now, we can train and validate the model. The following hyper-parameters can be tuned. For example, we can increase the number of epochs. Because `lr_period` and `lr_decay` are set to 10 and 0.1 respectively, the learning rate of the optimization algorithm will be multiplied by 0.1 after every 10 epochs.
+Now, we can train and validate the model. The following hyperparameters can be tuned. For example, we can increase the number of epochs. Because `lr_period` and `lr_decay` are set to 10 and 0.1 respectively, the learning rate of the optimization algorithm will be multiplied by 0.1 after every 10 epochs.

```{.python .input n=9}
ctx, num_epochs, lr, wd = d2l.try_gpu(), 1, 0.01, 1e-4
@@ -234,7 +234,7 @@ train(net, train_iter, valid_iter, num_epochs, lr, wd, ctx, lr_period,

## Classifying the Testing Set and Submitting Results on Kaggle

-After obtaining a satisfactory model design and hyper-parameters, we use all training datasets (including validation sets) to retrain the model and then classify the testing set. Note that predictions are made by the output network we just trained.
+After obtaining a satisfactory model design and hyperparameters, we use all training datasets (including validation sets) to retrain the model and then classify the testing set. Note that predictions are made by the output network we just trained.

```{.python .input n=8}
net = get_net(ctx)
6 changes: 3 additions & 3 deletions chapter_computer-vision/neural-style.md
@@ -1,6 +1,6 @@
# Neural Style Transfer

-If you use social sharing apps or happen to be an amateur photographer, you are familiar with filters. Filters can alter the color styles of photos to make the background sharper or people's faces whiter. However, a filter generally can only change one aspect of a photo. To create the ideal photo, you often need to try many different filter combinations. This process is as complex as tuning the hyper-parameters of a model.
+If you use social sharing apps or happen to be an amateur photographer, you are familiar with filters. Filters can alter the color styles of photos to make the background sharper or people's faces whiter. However, a filter generally can only change one aspect of a photo. To create the ideal photo, you often need to try many different filter combinations. This process is as complex as tuning the hyperparameters of a model.

In this section, we will discuss how we can use convolution neural networks
(CNNs) to automatically apply the style of one image to another image, an
@@ -158,7 +158,7 @@ def tv_loss(Y_hat):

### The Loss Function

-The loss function for style transfer is the weighted sum of the content loss, style loss, and total variance loss. By adjusting these weight hyper-parameters, we can balance the retained content, transferred style, and noise reduction in the composite image according to their relative importance.
+The loss function for style transfer is the weighted sum of the content loss, style loss, and total variance loss. By adjusting these weight hyperparameters, we can balance the retained content, transferred style, and noise reduction in the composite image according to their relative importance.

```{.python .input n=14}
content_weight, style_weight, tv_weight = 1, 1e3, 10
@@ -275,7 +275,7 @@ As you can see, each epoch takes more time due to the larger image size. As show
## Exercises

1. How does the output change when you select different content and style layers?
-1. Adjust the weight hyper-parameters in the loss function. Does the output retain more content or have less noise?
+1. Adjust the weight hyperparameters in the loss function. Does the output retain more content or have less noise?
1. Use different content and style images. Can you create more interesting composite images?

:begin_tab:`mxnet`
4 changes: 2 additions & 2 deletions chapter_computer-vision/ssd.md
@@ -405,7 +405,7 @@ d2l.plt.legend();
In the experiment, we used cross-entropy loss for category prediction. Now,
assume that the prediction probability of the actual category $j$ is $p_j$ and
the cross-entropy loss is $-\log p_j$. We can also use the focal loss
-:cite:`Lin.Goyal.Girshick.ea.2017`. Given the positive hyper-parameters $\gamma$
+:cite:`Lin.Goyal.Girshick.ea.2017`. Given the positive hyperparameters $\gamma$
and $\alpha$, this loss is defined as:

$$ - \alpha (1-p_j)^{\gamma} \log p_j.$$
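
As a rough illustration of the formula above, the sketch below evaluates the focal loss for a single prediction in plain Python. The function name `focal_loss` and the default values of `gamma` and `alpha` are assumptions chosen for this example, not values taken from the chapter. With `gamma = 0` and `alpha = 1` the expression reduces to the cross-entropy loss $-\log p_j$.

```python
import math


def focal_loss(p_j: float, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss -alpha * (1 - p_j)**gamma * log(p_j) for one example.

    `p_j` is the predicted probability of the actual category; the default
    `gamma` and `alpha` are illustrative assumptions only.
    """
    if not 0.0 < p_j <= 1.0:
        raise ValueError("p_j must be in (0, 1]")
    return -alpha * (1.0 - p_j) ** gamma * math.log(p_j)


# Confident predictions (p_j near 1) are down-weighted much more strongly
# than uncertain ones, relative to plain cross-entropy.
for p in (0.5, 0.9, 0.99):
    cross_entropy = -math.log(p)
    print(f"p_j={p}: cross-entropy={cross_entropy:.4f}, focal={focal_loss(p):.6f}")
```

Increasing `gamma` shrinks the factor $(1-p_j)^{\gamma}$ for well-classified examples, so their contribution to the total loss falls off faster.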
@@ -427,7 +427,7 @@ d2l.plt.legend();

2. When an object is relatively large compared to the image, the model normally adopts a larger input image size.
3. This generally produces a large number of negative anchor boxes when labeling anchor box categories. We can sample the negative anchor boxes to better balance the data categories. To do this, we can set the `MultiBoxTarget` function's `negative_mining_ratio` parameter.
-4. Assign hyper-parameters with different weights to the anchor box category loss and positive anchor box offset loss in the loss function.
+4. Assign hyperparameters with different weights to the anchor box category loss and positive anchor box offset loss in the loss function.
5. Refer to the SSD paper. What methods can be used to evaluate the precision of object detection models :cite:`Liu.Anguelov.Erhan.ea.2016`?

:begin_tab:`mxnet`
2 changes: 1 addition & 1 deletion chapter_computer-vision/transposed-conv.md
@@ -71,7 +71,7 @@ tconv(X)
The multi-channel extension of the transposed convolution is the same as the convolution. When the input has multiple channels, denoted by $c_i$, the transposed convolution assigns a $k_h\times k_w$ kernel matrix to each input channel. If the output has a channel size $c_o$, then we have a $c_i\times k_h\times k_w$ kernel for each output channel.


-As a result, if we feed $X$ into a convolutional layer $f$ to compute $Y=f(X)$ and create a transposed convolution layer $g$ with the same hyper-parameters as $f$ except for the output channel set to be the channel size of $X$, then $g(Y)$ should has the same shape as $X$. Let us verify this statement.
+As a result, if we feed $X$ into a convolutional layer $f$ to compute $Y=f(X)$ and create a transposed convolution layer $g$ with the same hyperparameters as $f$ except for the output channel set to be the channel size of $X$, then $g(Y)$ should has the same shape as $X$. Let us verify this statement.

```{.python .input}
X = np.random.uniform(size=(1, 10, 16, 16))
2 changes: 1 addition & 1 deletion chapter_convolutional-modern/batch-norm.md
@@ -603,7 +603,7 @@ def net():
])
```

-Below, we use the same hyper-parameters to train out model.
+Below, we use the same hyperparameters to train out model.
Note that as usual, the high-level API variant runs much faster
because its code has been compiled to C++/CUDA
while our custom implementation must be interpreted by Python.
2 changes: 1 addition & 1 deletion chapter_convolutional-modern/nin.md
@@ -208,7 +208,7 @@ d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr)

## Exercises

-1. Tune the hyper-parameters to improve the classification accuracy.
+1. Tune the hyperparameters to improve the classification accuracy.
1. Why are there two $1\times 1$ convolutional layers in the NiN block? Remove one of them, and then observe and analyze the experimental phenomena.
1. Calculate the resource usage for NiN
* What is the number of parameters?
2 changes: 1 addition & 1 deletion chapter_generative-adversarial-networks/dcgan.md
@@ -132,7 +132,7 @@ Y = [nn.LeakyReLU(alpha)(x).asnumpy() for alpha in alphas]
d2l.plot(x.asnumpy(), Y, 'x', 'y', alphas)
```

-The basic block of the discriminator is a convolution layer followed by a batch normalization layer and a leaky ReLU activation. The hyper-parameters of the convolution layer are similar to the transpose convolution layer in the generator block.
+The basic block of the discriminator is a convolution layer followed by a batch normalization layer and a leaky ReLU activation. The hyperparameters of the convolution layer are similar to the transpose convolution layer in the generator block.

```{.python .input n=11}
class D_block(nn.Block):
2 changes: 1 addition & 1 deletion chapter_generative-adversarial-networks/gan.md
@@ -174,7 +174,7 @@ def train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G, latent_dim, data):
f'{metric[2] / timer.stop():.1f} examples/sec')
```

-Now we specify the hyper-parameters to fit the Gaussian distribution.
+Now we specify the hyperparameters to fit the Gaussian distribution.

```{.python .input n=10}
lr_D, lr_G, latent_dim, num_epochs = 0.05, 0.005, 2, 20
2 changes: 1 addition & 1 deletion chapter_multilayer-perceptrons/kaggle-house-price.md
@@ -727,7 +727,7 @@ The steps are quite simple:
* Real data often contains a mix of different data types and needs to be preprocessed.
* Rescaling real-valued data to zero mean and unit variance is a good default. So is replacing missing values with their mean.
* Transforming categorical variables into indicator variables allows us to treat them like vectors.
-* We can use k-fold cross validation to select the model and adjust the hyper-parameters.
+* We can use k-fold cross validation to select the model and adjust the hyperparameters.
* Logarithms are useful for relative loss.


6 changes: 3 additions & 3 deletions chapter_multilayer-perceptrons/underfit-overfit.md
@@ -1,4 +1,4 @@
-# Model Selection, Underfitting and Overfitting
+# Model Selection, Underfitting, and Overfitting
:label:`sec_model_selection`

As machine learning scientists,
@@ -50,7 +50,7 @@ The phenomena of fitting our training data
more closely than we fit the underlying distribution is called overfitting, and the techniques used to combat overfitting are called regularization.
In the previous sections, you might have observed
this effect while experimenting with the Fashion-MNIST dataset.
-If you altered the model structure or the hyper-parameters during the experiment, you might have noticed that with enough nodes, layers, and training epochs, the model can eventually reach perfect accuracy on the training set, even as the accuracy on test data deteriorates.
+If you altered the model structure or the hyperparameters during the experiment, you might have noticed that with enough nodes, layers, and training epochs, the model can eventually reach perfect accuracy on the training set, even as the accuracy on test data deteriorates.


## Training Error and Generalization Error
@@ -245,7 +245,7 @@ we will typically employ a validation set.
### Validation Dataset

In principle we should not touch our test set
-until after we have chosen all our hyper-parameters.
+until after we have chosen all our hyperparameters.
Were we to use the test data in the model selection process,
there is a risk that we might overfit the test data.
Then we would be in serious trouble.
Original file line number Diff line number Diff line change
@@ -201,8 +201,8 @@ d2l.predict_sentiment(net, vocab, 'this movie is so bad')

## Exercises

-1. Tune the hyper-parameters and compare the two sentiment analysis methods, using recurrent neural networks and using convolutional neural networks, as regards accuracy and operational efficiency.
-1. Can you further improve the accuracy of the model on the test set by using the three methods introduced in the previous section: tuning hyper-parameters, using larger pre-trained word vectors, and using the spaCy word tokenization tool?
+1. Tune the hyperparameters and compare the two sentiment analysis methods, using recurrent neural networks and using convolutional neural networks, as regards accuracy and operational efficiency.
+1. Can you further improve the accuracy of the model on the test set by using the three methods introduced in the previous section: tuning hyperparameters, using larger pre-trained word vectors, and using the spaCy word tokenization tool?
1. What other natural language processing tasks can you use textCNN for?

:begin_tab:`mxnet`
Original file line number Diff line number Diff line change
@@ -135,7 +135,7 @@ predict_sentiment(net, vocab, 'this movie is so bad')

## Exercises

-1. Increase the number of epochs. What accuracy rate can you achieve on the training and testing datasets? What about trying to re-tune other hyper-parameters?
+1. Increase the number of epochs. What accuracy rate can you achieve on the training and testing datasets? What about trying to re-tune other hyperparameters?
1. Will using larger pre-trained word vectors, such as 300-dimensional GloVe word vectors, improve classification accuracy?
1. Can we improve the classification accuracy by using the spaCy word tokenization tool? You need to install spaCy: `pip install spacy` and install the English package: `python -m spacy download en`. In the code, first import spacy: `import spacy`. Then, load the spacy English package: `spacy_en = spacy.load('en')`. Finally, define the function `def tokenizer(text): return [tok.text for tok in spacy_en.tokenizer(text)]` and replace the original `tokenizer` function. It should be noted that GloVe's word vector uses "-" to connect each word when storing noun phrases. For example, the phrase "new york" is represented as "new-york" in GloVe. After using spaCy tokenization, "new york" may be stored as "new york".


0 comments on commit 5125a00
