Update CHANGELOG
minimaxir committed Jan 3, 2021
1 parent 202e031 commit fdb6778
Showing 1 changed file with 18 additions and 0 deletions.

CHANGELOG.md
@@ -4,6 +4,24 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [0.4.0] - TBD

- Made Fast tokenizers the default (as they are the default in `transformers` 4.0.0)
- Made serialized tokenizers the default for custom tokenizers, and added support for loading them in both `aitextgen` and `TokenDataset`s (see the sketch after this list)
- Added gradient checkpointing for GPT-2, and made it the default when training the 355M and 774M models.
- Added layer freezing to freeze the first `n` layers of GPT-2 while training. This allows the 1.5B GPT-2 to be trained with a high `n`.
- Added schema-based generation for specified `schema_tokens` (which can be encoded in the Transformers config); this can be used with an appropriately structured dataset.
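
A minimal sketch of the custom-tokenizer and layer-freezing workflow described above. The general API calls (`train_tokenizer`, `TokenDataset`, `aitextgen`, `train`) come from aitextgen's documented usage, but the exact keyword names here (`tokenizer_file`, `num_layers_freeze`) and file names are assumptions, not confirmed by this changelog:

```python
from aitextgen import aitextgen
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import build_gpt2_config

# Train a custom tokenizer; in 0.4.0 this writes a single serialized
# tokenizer file (assumed name: aitextgen.tokenizer.json).
train_tokenizer("input.txt")

config = build_gpt2_config(vocab_size=1000, max_length=64)

# Load the serialized tokenizer for both aitextgen and TokenDataset.
ai = aitextgen(tokenizer_file="aitextgen.tokenizer.json", config=config)
data = TokenDataset("input.txt", tokenizer_file="aitextgen.tokenizer.json", block_size=64)

# num_layers_freeze (assumed parameter name) freezes the first n GPT-2 layers
# during training, reducing memory use so larger models can be fine-tuned.
ai.train(data, num_steps=500, num_layers_freeze=2)
```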

## [0.3.0] - 2020-11-30

- Increased minimum versions of dependencies (`transformers` to 4.0.0, `pytorch-lightning` to 1.0.8, PyTorch to 1.6)
- Fixed imports to account for the new Transformers file architecture
- Fixed training to account for the new transformers/pytorch-lightning minimum versions
- Fully removed TorchScript code (an ONNX implementation will supersede it)
- Made prompt specification for generation more consistent with Transformers (see the sketch after this list)
- Set the default vocab size for new tokenizers to `1000`
- Began work on serializing tokenizers in accordance with the new `tokenizers` approach
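
A short sketch of the 0.3.0 behavior noted above: prompted generation via the `prompt` keyword and a new tokenizer trained with the 1000-token default vocabulary. File names are placeholders, and passing `vocab_size` explicitly is shown only to make the default visible:

```python
from aitextgen import aitextgen
from aitextgen.tokenizers import train_tokenizer

# vocab_size now defaults to 1000 for newly trained tokenizers.
train_tokenizer("input.txt", vocab_size=1000)

# Load the default 124M GPT-2 and generate from a prompt.
ai = aitextgen()
ai.generate(n=3, prompt="The meaning of life is", max_length=100)
```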

## [0.2.1] - 2020-06-28

### Added
