
Commit

added experiments
davda54 committed Oct 29, 2020
1 parent 0fde977 commit 59b3182
Showing 1 changed file with 13 additions and 1 deletion.
README.md: 13 additions, 1 deletion

SAM simultaneously minimizes loss value and loss sharpness. In particular, it seeks parameters that lie in **neighborhoods having uniformly low loss**. SAM improves model generalization and yields [SoTA performance for several datasets](https://paperswithcode.com/paper/sharpness-aware-minimization-for-efficiently-1). Additionally, it provides robustness to label noise on par with that provided by SoTA procedures that specifically target learning with noisy labels.
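
For reference, the objective from the paper can be summarized as follows (notation simplified): SAM minimizes the worst-case loss in a small neighborhood of radius $\rho$ around the weights $w$, and in practice the inner maximization is approximated by a single scaled gradient-ascent step $\hat{\epsilon}(w)$:

$$
\min_w \; \max_{\lVert \epsilon \rVert_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon}(w) = \rho \, \frac{\nabla_w L(w)}{\lVert \nabla_w L(w) \rVert_2}.
$$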

This is an **unofficial** repository for [Sharpness-Aware Minimization for Efficiently Improving Generalization](https://arxiv.org/abs/2010.01412). Implementation-wise, the SAM class is a light wrapper that computes the regularized "sharpness-aware" gradient, which is used by the underlying optimizer (such as SGD with momentum). This repository also includes a simple [WRN for Cifar10](example); as a proof of concept, it beats the performance of SGD with momentum on this dataset.
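
As a rough illustration of the "light wrapper" idea, construction might look like the sketch below; the module name and constructor arguments are assumptions for illustration, not documented API:

```python
import torch
from sam import SAM  # module name assumed; the SAM class from this repository

model = torch.nn.Linear(10, 2)      # stand-in model for illustration
base_optimizer = torch.optim.SGD    # SAM wraps an ordinary optimizer class
# Hyperparameters such as lr/momentum are assumed to be forwarded to the base optimizer.
optimizer = SAM(model.parameters(), base_optimizer, lr=0.1, momentum=0.9)
```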

<p align="center">
<img src="img/loss_landscape.png" alt="Loss landscape with and without SAM" width="512"/>
Performs the second optimization step that updates the original weights with the…
| **Argument** | **Description** |
| :-------------- | :-------------- |
| `zero_grad` (bool, optional) | set to True to automatically zero out all gradients after this step *(default: False)* |
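
A minimal per-batch sketch of how the two steps might fit together; `second_step(zero_grad=...)` is documented above, while the constructor arguments and a matching `first_step` flag are assumptions for this sketch:

```python
import torch
import torch.nn as nn
from sam import SAM  # module name assumed

model = nn.Linear(10, 2)            # stand-in model for illustration
criterion = nn.CrossEntropyLoss()
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))

# First forward-backward pass: gradients at the current weights are used to
# move to the nearby "sharpness-maximizing" point.
criterion(model(inputs), targets).backward()
optimizer.first_step(zero_grad=True)

# Second forward-backward pass: gradients at the perturbed point drive the
# underlying SGD update of the original weights.
criterion(model(inputs), targets).backward()
optimizer.second_step(zero_grad=True)
```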


<br>

## Experiments

I've verified that SAM works on a simple WRN 16-8 model trained on CIFAR10; you can replicate the experiment by running [train.py](example/train.py). The training uses only label smoothing and the most basic image augmentations with cutout, so the error rates are higher than those reported in the [SAM paper](https://arxiv.org/abs/2010.01412). In theory, you can reach even lower errors by training longer (1800 epochs instead of 200), because SAM shouldn't be as prone to overfitting.

| Optimizer | Test error rate |
| :------------------- | -----: |
| SGD + momentum | 3.35 % |
| SAM + SGD + momentum | 2.98 % |
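
For context on the label smoothing mentioned above, here is a purely illustrative sketch of a smoothed cross-entropy loss; it is not the repository's own implementation, which may differ in details such as the smoothing factor:

```python
import torch
import torch.nn.functional as F

def smooth_crossentropy(logits, targets, smoothing=0.1):
    # Build a soft target distribution: most mass on the true class,
    # the remaining `smoothing` mass spread uniformly over all classes.
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    soft_targets = torch.full_like(log_probs, smoothing / n_classes)
    soft_targets.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing + smoothing / n_classes)
    return torch.mean(torch.sum(-soft_targets * log_probs, dim=-1))
```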

