This repository contains the PyTorch code for the paper
> Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer and Tom Goldstein. [*Visualizing the Loss Landscape of Neural Nets*](https://arxiv.org/abs/1712.09913). NIPS, 2018.
Given a network architecture and its pre-trained parameters, this tool calculates and visualizes the loss surface along random direction(s) near the optimal parameters.
The calculation can be done in parallel, with multiple GPUs per node and across multiple nodes.
The random direction(s) and loss surface values are stored in HDF5 (`.h5`) files after they are produced.

## Setup

**Environment**: One or more multi-GPU node(s) with the following software/libraries installed:
- [PyTorch 0.4](https://pytorch.org/)
- [openmpi 3.1.2](https://www.open-mpi.org/)
- [mpi4py 2.0.0](https://mpi4py.scipy.org/docs/usrman/install.html)
- [matplotlib 2.0.2](https://matplotlib.org/users/installing.html)

**Pre-trained models**:
The code accepts pre-trained PyTorch models for the CIFAR-10 dataset.
To load the pre-trained model correctly, the model file should contain `state_dict`, which is saved from the `state_dict()` method.
The default path for pre-trained networks is `cifar10/trained_nets`.
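For example, a checkpoint in the expected format can be created and loaded as follows (a minimal sketch; `nn.Linear` stands in for any trained network, and the file name is illustrative):

```
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for any trained CIFAR-10 network

# Save a checkpoint that contains a 'state_dict' key, as this tool expects.
torch.save({'state_dict': model.state_dict()}, 'model_300.t7')

# Loading mirrors what the plotting code does with the checkpoint.
checkpoint = torch.load('model_300.t7')
model.load_state_dict(checkpoint['state_dict'])
```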
Some of the pre-trained models and plotted figures can be downloaded here:
- [DenseNet-121](https://drive.google.com/a/cs.umd.edu/file/d/1oU0nDFv9CceYM4uW6RcOULYS-rnWxdVl/view?usp=sharing) (75 MB)

**Data preprocessing**:
The data pre-processing method used for visualization should be consistent with the one used for model training.
No data augmentation (random cropping or horizontal flipping) is used in calculating the loss values.
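For instance, an evaluation-time transform might look like the sketch below; the mean/std values shown are the commonly used CIFAR-10 statistics and should be replaced with whatever your training script actually used.

```
import torchvision.transforms as transforms

# Normalization only -- no RandomCrop or RandomHorizontalFlip at evaluation time.
eval_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
```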

## Visualizing 1D loss curve

### Creating 1D linear interpolations
The 1D linear interpolation method [1] evaluates the loss values along the direction between two minimizers of the same network loss function. This method has been used to compare the flatness of minimizers trained with different batch sizes [2].
A 1D linear interpolation plot is produced using the `plot_surface.py` script.

```
mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-0.5:1.5:401 --dir_type states \
--model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
--model_file2 cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1/model_300.t7
```
- `--x=-0.5:1.5:401` sets the range and resolution for the plot. The x-coordinates in the plot will run from -0.5 to 1.5 (the minimizers are located at 0 and 1), and the loss value will be evaluated at 401 locations along this line.
- `--dir_type states` indicates the direction contains dimensions for all parameters as well as the statistics of the BN layers (`running_mean` and `running_var`). Note that ignoring `running_mean` and `running_var` cannot produce correct loss values when plotting two solutions together in the same figure.
- The two model files contain network parameters describing the two distinct minimizers of the loss function. The plot will interpolate between these two minima.
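Conceptually, the x-axis parameter blends the two sets of weights. Below is a minimal sketch of the underlying computation (an illustration, not the repository's actual implementation); note that interpolating full `state_dict()` entries also blends the BN running statistics, which is what `--dir_type states` ensures.

```
# theta(alpha) = (1 - alpha) * theta_1 + alpha * theta_2
def interpolate_state_dicts(sd1, sd2, alpha):
    # sd1, sd2: state_dict()s of the two minimizers, including BN running stats
    out = {}
    for k, v in sd1.items():
        if v.is_floating_point():
            out[k] = (1 - alpha) * v + alpha * sd2[k]
        else:
            out[k] = v  # integer buffers (e.g. counters) are copied as-is
    return out
```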

![VGG-9 SGD, WD=0](doc/images/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1_model_300.t7_vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1_model_300.t7_states.h5_[-1.0,1.0,401].h5_1d_loss_acc.jpg)



### Producing plots along random normalized directions
A random direction with the same dimension as the model parameters is created and "filter normalized."
Then we can sample loss values along this direction.

```
mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-1:1:51 \
--model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn
```
- `--dir_type weights` indicates the direction has the same dimensions as the learned parameters, including bias and parameters in the BN layers.
- `--xnorm filter` normalizes the random direction at the filter level. Here, a "filter" refers to the parameters that produce a single feature map. For fully connected layers, a "filter" contains the weights that contribute to a single neuron.
- `--xignore biasbn` ignores the directions corresponding to bias and BN parameters (the corresponding entries in the random vector are set to zero).
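As a rough sketch of the idea behind filter normalization (an illustration of the method from the paper, not the repository's exact code), each filter of the random direction is rescaled to match the norm of the corresponding filter in the trained weights:

```
def filter_normalize_(direction, weight):
    # direction, weight: same-shape tensors for one layer, e.g.
    # (out_channels, in_channels, k, k) for a conv layer; rows of a
    # fully connected weight matrix play the role of filters.
    for d, w in zip(direction, weight):  # iterate over the first dimension
        d.mul_(w.norm() / (d.norm() + 1e-10))  # in-place rescale of each filter
```

This rescaling removes the scale ambiguity between different trained networks, so that the apparent sharpness of different minimizers can be compared.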


![VGG-9 SGD, WD=0](doc/images/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7_weights_xignore=biasbn_xnorm=filter.h5_[-1.0,1.0,51].h5_1d_loss_acc.jpg)



We can also customize the appearance of the 1D plots by calling `plot_1D.py` once the surface file is available.


## Visualizing 2D loss contours
To plot 2D loss contours, we choose two random directions and normalize them in the same way as in the 1D case.

```
mpirun -n 4 python plot_surface.py --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn
```

![ResNet-56](doc/images/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5_[-1.0,1.0,51]x[-1.0,1.0,51].h5_train_loss_2dcontour.jpg)

Once a surface is generated and stored in a `.h5` file, we can produce and customize a contour plot using the script `plot_2D.py`.

```
python plot_2D.py --surf_file path_to_surf_file --surf_name train_loss
```
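Since the surface values are stored as plain HDF5, you can also inspect or post-process a surface file directly with `h5py` (a hypothetical snippet; the dataset names are assumed to match the `--surf_name` values used above):

```
import h5py

with h5py.File('path_to_surf_file', 'r') as f:
    print(list(f.keys()))            # coordinate and loss/accuracy datasets
    train_loss = f['train_loss'][:]  # assumes the surface was named 'train_loss'
    print(train_loss.shape)
```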

## Visualizing 3D loss surface
`plot_2D.py` can make a basic 3D loss surface plot with `matplotlib`.
If you want a higher-quality rendering that uses lighting to bring out detail, you can render the loss surface with [ParaView](http://paraview.org).
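For the quick `matplotlib` route, a hypothetical sketch of reading a surface file and plotting it in 3D is shown below (the coordinate dataset names are assumptions; adjust them to match your file):

```
import h5py
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

with h5py.File('path_to_surf_file', 'r') as f:
    X, Y = np.meshgrid(f['xcoordinates'][:], f['ycoordinates'][:])
    Z = f['train_loss'][:]  # transpose if the orientation doesn't match the grid

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
plt.show()
```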

![ResNet-56-noshort](doc/images/resnet56_noshort_small.jpg) ![ResNet-56](doc/images/resnet56_small.jpg)


To do this, you must:
1. Convert the surface `.h5` file to a `.vtp` file.
```
python h52vtp.py --file path_to_surf_file --surf_name train_loss --zmax 10 --log
```
This will generate a [VTK](https://www.kitware.com/products/books/VTKUsersGuide.pdf) file containing the loss surface, with values log-scaled and capped at a maximum of 10.

2. Open the `.vtp` file in ParaView using the VTK reader. Click the eye icon in the `Pipeline Browser` to make the figure show up. You can drag the surface around, and change the colors in the `Properties` window.

3. If the surface appears extremely skinny and needle-like, you may need to adjust the "transforming" parameters in the left control panel. Enter numbers larger than 1 in the "scale" fields to widen the plot.

4. Select `Save screenshot` in the File menu to save the image.

## Reference

[1] Ian J. Goodfellow, Oriol Vinyals, and Andrew M. Saxe. Qualitatively characterizing neural network optimization problems. ICLR, 2015.

[2] Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. ICLR, 2017.
