
Commit

Internal change.
PiperOrigin-RevId: 444463756
Language Team authored and kentonl committed Apr 26, 2022
1 parent a70c67e commit 9fa1e34
Showing 6 changed files with 2,072 additions and 32 deletions.
8 changes: 4 additions & 4 deletions language/multiberts/2m_vs_1m.ipynb
@@ -8,7 +8,7 @@
"source": [
"# Multi-bootstrap for evaluating pretrained LMs\n",
"\n",
-"This notebook shows an example of the paired analysis described in Section 4.1 of the paper. This type of analysis is applicable for any kind of intervention that is applied independently to a particular pretraining (e.g. BERT) checkpoint, including:\n",
+"This notebook shows an example of a paired multi-bootstrap analysis. This type of analysis is applicable for any kind of intervention that is applied independently to a particular pretraining (e.g. BERT) checkpoint, including:\n",
"\n",
"- Interventions such as intermediate task training or pruning which directly manipulate a pretraining checkpoint.\n",
"- Changes to any fine-tuning or probing procedure which is applied after pretraining.\n",
@@ -21,7 +21,7 @@
"\n",
"## 2M vs. 1M pretraining steps\n",
"\n",
-"Here, we'll compare the MultiBERTs models run for 2M steps with those run for 1M steps. We'll use the five pretraining seeds (0,1,2,3,4) for which we have a dense set of checkpoints throughout training, such that we can treat the 2M runs as an \"intervention\" (training for additional time) over the 1M-step models and perform a paired analysis. From each pretraining checkpoint, we'll run fine-tuning 5 times for each of 4 learning rates, select the best learning rate (treating this as part of the optimization), and then run our multibootstrap procedure.\n",
+"Here, we'll compare the MultiBERTs models run for 2M steps with those run for 1M steps, as described in **Appendix E.1** of [the paper](https://openreview.net/pdf?id=K0E_F0gFDgA). We'll use the five pretraining seeds (0,1,2,3,4) for which we have a dense set of checkpoints throughout training, such that we can treat the 2M runs as an \"intervention\" (training for additional time) over the 1M-step models and perform a paired analysis. From each pretraining checkpoint, we'll run fine-tuning 5 times for each of 4 learning rates, select the best learning rate (treating this as part of the optimization), and then run our multibootstrap procedure.\n",
"\n",
"We'll use MultiNLI for this example, but the code below can easily be modified to run on other tasks."
]
@@ -1723,7 +1723,7 @@
"toc_visible": true
},
"kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -1737,7 +1737,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.7.9"
+"version": "3.9.12"
}
},
"nbformat": 4,
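The fine-tuning protocol described in the notebook (5 fine-tuning runs for each of 4 learning rates per pretraining checkpoint, with learning-rate selection treated as part of the optimization) can be sketched as follows. The score array below is synthetic, and the shapes and variable names are illustrative assumptions, not values taken from the notebook:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic dev-set accuracies: (5 pretraining seeds, 4 learning rates, 5 fine-tuning runs).
scores = rng.uniform(0.80, 0.85, size=(5, 4, 5))

# Treat learning-rate selection as part of the optimization: pick the single
# learning rate with the best mean score across seeds and runs, then keep
# that slice for every seed before bootstrapping.
best_lr = scores.mean(axis=(0, 2)).argmax()
selected = scores[:, best_lr, :]  # shape: (5 seeds, 5 runs)
```

The `selected` array (one row per pretraining seed) is the kind of input a seed-aware bootstrap would then resample.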
25 changes: 16 additions & 9 deletions language/multiberts/README.md
@@ -7,9 +7,9 @@ Concretely, the release includes three components:

* A set of 25 BERT-Base models (English, uncased), trained with the same hyper-parameters but different random seeds.
* For the first five models, 28 checkpoints captured during the course of pre-training (140 checkpoints total).
-* A statistical library and a Colab notebook to demonstrate its use.
+* A statistical library (`multibootstrap.py`) and notebook examples to demonstrate its use.

-We describe the release in detail and present example analyses in the [MultiBERTs paper](https://arxiv.org/abs/2106.16163). All the data and checkpoints mentioned on this page and in the paper are available on [our Cloud Bucket](https://console.cloud.google.com/storage/browser/multiberts/public).
+We describe the release in detail and present example analyses in the [MultiBERTs paper](https://arxiv.org/abs/2106.16163), published at ICLR 2022. All the data and checkpoints mentioned on this page and in the paper are available on [our Cloud Bucket](https://console.cloud.google.com/storage/browser/multiberts/public).



@@ -94,26 +94,33 @@ Note: the archives are large (>10 GB). You may download the checkpoints selectively.

## Statistical Library

-The script [`bootstrap.py`](https://github.com/google-research/language/blob/master/language/multiberts/multibootstrap.py) is our implementation of the Multi-Bootstrap, a non-parametric procedure to help researchers estimate significance and report confidence intervals in MultiBERTs experiments.
-Additional details are provided in our [demo Colab](https://github.com/google-research/language/blob/master/language/multiberts/multi_vs_original.ipynb) and the [MultiBERTs paper](https://arxiv.org/pdf/2106.16163).
+[`multibootstrap.py`](https://github.com/google-research/language/blob/master/language/multiberts/multibootstrap.py) is our implementation of the Multi-Bootstrap, a non-parametric procedure to help researchers estimate significance and report confidence intervals when working with multiple pretraining seeds.
+
+Additional details are provided in the [MultiBERTs paper](https://arxiv.org/pdf/2106.16163). The following notebooks also demonstrate example usage, and will reproduce the results from the paper:
+
+- [`coref.ipynb`](https://github.com/google-research/language/blob/master/language/multiberts/coref.ipynb) - Winogender coreference example from Section 4 and Appendix D of the paper; includes both paired and unpaired examples.
+- [`2m_vs_1m.ipynb`](https://github.com/google-research/language/blob/master/language/multiberts/2m_vs_1m.ipynb) - Paired analysis from Appendix E.1 of the paper, comparing 2M vs 1M steps of pretraining.
+- [`multi_vs_original.ipynb`](https://github.com/google-research/language/blob/master/language/multiberts/multi_vs_original.ipynb) - Unpaired analysis from Appendix E.2 of the paper, comparing MultiBERTs to the original BERT release.
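For intuition, a paired Multi-Bootstrap resamples pretraining seeds and evaluation examples jointly, using the same resampled indices on both sides of the comparison. The following is a minimal NumPy sketch under assumed conventions — the function name and the (seeds × examples) score layout are illustrative, not the API of `multibootstrap.py`:

```python
import numpy as np

def paired_multibootstrap(base, expt, num_bootstrap=1000, rng=None):
    """Illustrative paired multi-bootstrap.

    base, expt: arrays of shape (num_seeds, num_examples) holding per-example
    scores (e.g. 0/1 accuracy indicators), aligned so that row s of both
    arrays comes from the same pretraining seed s.
    Returns bootstrap samples of the mean difference expt - base.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    num_seeds, num_examples = base.shape
    deltas = np.empty(num_bootstrap)
    for b in range(num_bootstrap):
        # Resample seeds and examples with replacement; the same indices are
        # applied to both conditions, which is what makes the design "paired".
        s = rng.integers(0, num_seeds, size=num_seeds)
        x = rng.integers(0, num_examples, size=num_examples)
        deltas[b] = expt[s][:, x].mean() - base[s][:, x].mean()
    return deltas
```

A percentile confidence interval for the intervention effect is then `np.percentile(deltas, [2.5, 97.5])`.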


## How to cite

```
-@article{sellam2021multiberts,
-title={The MultiBERTs: BERT Reproductions for Robustness Analysis},
-author={Thibault Sellam and Steve Yadlowsky and Jason Wei and Naomi Saphra and Alexander D'Amour and Tal Linzen and Jasmijn Bastings and Iulia Turc and Jacob Eisenstein and Dipanjan Das and Ian Tenney and Ellie Pavlick},
-journal={arXiv preprint arXiv:2106.16163},
-year={2021}
+@inproceedings{sellam2022multiberts,
+title={The Multi{BERT}s: {BERT} Reproductions for Robustness Analysis},
+author={Thibault Sellam and Steve Yadlowsky and Ian Tenney and Jason Wei and Naomi Saphra and Alexander D'Amour and Tal Linzen and Jasmijn Bastings and Iulia Raluca Turc and Jacob Eisenstein and Dipanjan Das and Ellie Pavlick},
+booktitle={International Conference on Learning Representations},
+year={2022},
+url={https://openreview.net/forum?id=K0E_F0gFDgA}
}
```

## Contact information

If you have a technical question regarding the dataset, code, or publication, please send us an email (see paper).


## Disclaimer

This is not an official Google product.

