diff --git a/README.md b/README.md
index b28877c..26fc347 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,18 @@
 
 # makemore
 
-makemore is the most accessible way of tinkering with a GPT.
+makemore takes one text file as input, where each line is assumed to be one training thing, and generates more things like it. Under the hood, it is an autoregressive character-level language model, with a wide choice of models from bigrams all the way to a Transformer (exactly as seen in GPT). For example, we can feed it a database of names, and makemore will generate cool baby name ideas that all sound name-like, but are not already existing names. Or if we feed it a database of company names then we can generate new ideas for a name of a company. Or we can just feed it valid scrabble words and generate english-like babble.
 
-The one-file script `makemore.py` takes one text file as input, where each line is assumed to be one training thing, and generates more things like it. For example, we can feed it a database of names, and then use it to generate new cool baby name ideas that sound name-like, but are not already existing names. Or if we feed it a database of company names then we can generate new ideas for a name of a company. Or we can just feed it valid scrabble words and generate english-like babble.
+This is not meant to be too heavyweight library with a billion switches and knobs. It is one hackable file, and is mostly intended for educational purposes. [PyTorch](https://pytorch.org) is the only requirement.
 
-Under the hood, the script trains a (character-level) Transformer, identical to the one that powers [GPT and friends]().
+Current language model neural nets implemented:
 
-This is not meant to be a heavyweight library with switches and knobs. It's one hackable file of ~500 lines of code. [PyTorch](https://pytorch.org) is the only requirement. Go nuts.
+- Bigram (one character simply predicts a next one with a lookup table of counts)
+- Bag of Words
+- MLP, along the lines of [Bengio et al. 2003](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
+- RNN, along the lines of [Sutskever et al. 2011](https://icml.cc/2011/papers/524_icmlpaper.pdf)
+- GRU, following [Kyunghyun Cho et al. 2014](https://arxiv.org/abs/1409.1259)
+- Transformer, following [Vaswani et al. 2017](https://arxiv.org/abs/1706.03762)
 
 ### Usage
 
@@ -29,13 +34,13 @@ Let's point the script at it:
 $ python makemore.py -i names.txt -o names
 ```
 
-Training progress and logs and model will all be saved to the working directory `names`. The default model is a super tiny 200K param transformer; Many more training configurations are available - see the argparse and read the code. Training does not require any special hardware, it runs on my Macbook Air and will run on anything else, but if you have a GPU then training will fly. As training progresses the script will print some samples throughout. However, if you'd like to sample manually, you can use the `--sample-only` flag, e.g. in a separate terminal do:
+Training progress and logs and model will all be saved to the working directory `names`. The default model is a super tiny 200K param transformer; Many more training configurations are available - see the argparse and read the code. Training does not require any special hardware, it runs on my Macbook Air and will run on anything else, but if you have a GPU then training will fly faster. As training progresses the script will print some samples throughout. However, if you'd like to sample manually, you can use the `--sample-only` flag, e.g. in a separate terminal do:
 
 ```bash
 $ python makemore.py -i names.txt -o names --sample-only
 ```
 
-This will load the best model so far and print more samples on demand. Here are some unique baby names that get eventually generated from current default settings (test logprob of ~1.92):
+This will load the best model so far and print more samples on demand. Here are some unique baby names that get eventually generated from current default settings (test logprob of ~1.92, though much lower logprobs are achievable with some hyperparameter tuning):
 
 ```
 dontell