Update CHANGELOG
minimaxir committed Jan 3, 2021
1 parent 202e031 commit fdb6778
Showing 1 changed file with 18 additions and 0 deletions.

CHANGELOG.md
@@ -4,6 +4,24 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [0.4.0] - TBD

- Made Fast tokenizers the default (as they are the default in `transformers` 4.0.0)
- Made serialized tokenizers the default for custom tokenizers, and added support for loading them in both `aitextgen` and `TokenDataset`s (see the sketch after this list)
- Added gradient checkpointing for GPT-2, and made it the default when training the 355M and 774M models.
- Added layer freezing to freeze the first `n` layers of GPT-2 while training. This allows the 1.5B GPT-2 to be trained with a high `n`.
- Added schema-based generation for specified `schema_tokens` (which can be encoded in the Transformers config); this can be used with an appropriately structured dataset.
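
A minimal sketch of the custom-tokenizer and layer-freezing workflow described above. The general API calls (`train_tokenizer`, `TokenDataset`, `aitextgen`, `train`) come from aitextgen's documented usage, but the exact keyword names here (`tokenizer_file`, `num_layers_freeze`) and file names are assumptions, not confirmed by this changelog:

```python
from aitextgen import aitextgen
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import build_gpt2_config

# Train a custom tokenizer; in 0.4.0 this writes a single serialized
# tokenizer file (assumed name: aitextgen.tokenizer.json).
train_tokenizer("input.txt")

config = build_gpt2_config(vocab_size=1000, max_length=64)

# Load the serialized tokenizer for both aitextgen and TokenDataset.
ai = aitextgen(tokenizer_file="aitextgen.tokenizer.json", config=config)
data = TokenDataset("input.txt", tokenizer_file="aitextgen.tokenizer.json", block_size=64)

# num_layers_freeze (assumed parameter name) freezes the first n GPT-2 layers
# during training, reducing memory use so larger models can be fine-tuned.
ai.train(data, num_steps=500, num_layers_freeze=2)
```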

## [0.3.0] - 2020-11-30

- Increased minimum versions of dependencies (`transformers` to 4.0.0, `pytorch-lightning` to 1.0.8, PyTorch to 1.6)
- Fixed imports to account for the new Transformers file architecture
- Fixed training to account for the new transformers/pytorch-lightning minimum versions
- Fully removed TorchScript code (an ONNX implementation will supersede it)
- Made prompt specification for generation more consistent with Transformers (see the sketch after this list)
- Set the default vocab size for new tokenizers to `1000`
- Began work on serializing tokenizers in accordance with the new `tokenizers` approach
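
A short sketch of the 0.3.0 behavior noted above: prompted generation via the `prompt` keyword and a new tokenizer trained with the 1000-token default vocabulary. File names are placeholders, and passing `vocab_size` explicitly is shown only to make the default visible:

```python
from aitextgen import aitextgen
from aitextgen.tokenizers import train_tokenizer

# vocab_size now defaults to 1000 for newly trained tokenizers.
train_tokenizer("input.txt", vocab_size=1000)

# Load the default 124M GPT-2 and generate from a prompt.
ai = aitextgen()
ai.generate(n=3, prompt="The meaning of life is", max_length=100)
```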

## [0.2.1] - 2020-06-28

### Added
