Commit 70dba50

Fix typos (huggingface#191)
1 parent 94ecae3 commit 70dba50

File tree

1 file changed: +1 −1 lines changed


codeparrot.md

+1 −1
@@ -36,7 +36,7 @@ The first thing we need is a large training dataset. With the goal to train a Py
 
 You can learn more about our findings in [this Twitter thread](https://twitter.com/lvwerra/status/1458470994146996225). We removed the duplicates and applied the same cleaning heuristics found in the [Codex paper](https://arxiv.org/abs/2107.03374). Codex is the model behind CoPilot and is a GPT-3 model fine-tuned on GitHub code.
 
-The cleaned dataset is still 50GB big and available on the Hugging Face Hub: [codeparrot-clean](http://hf.co/datasets/lvwerra/codeparrot-clean). With that we can setup a new tokenizer and train a model model.
+The cleaned dataset is still 50GB big and available on the Hugging Face Hub: [codeparrot-clean](http://hf.co/datasets/lvwerra/codeparrot-clean). With that we can setup a new tokenizer and train a model.
 
 
 ## Initializing the Tokenizer and Model
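The line touched by this commit mentions setting up a new tokenizer on the cleaned dataset before training a model. A minimal sketch of that tokenizer step, using the Hugging Face `tokenizers` library on a tiny stand-in corpus (an assumption for illustration; the real training would iterate over the ~50GB codeparrot-clean dataset, not this toy list):

```python
# Minimal sketch: train a byte-level BPE tokenizer from an iterator of
# strings. The toy corpus below is a hypothetical stand-in for the
# codeparrot-clean dataset mentioned in the diff.
from tokenizers import ByteLevelBPETokenizer

corpus = [
    "def add(a, b):\n    return a + b\n",
    "def sub(a, b):\n    return a - b\n",
]

tokenizer = ByteLevelBPETokenizer()
# train_from_iterator accepts any iterator of strings, so the same call
# works when streaming batches from a large dataset instead.
tokenizer.train_from_iterator(corpus, vocab_size=500, min_frequency=1)

encoding = tokenizer.encode("def add(a, b):")
print(encoding.tokens)
```

For a real code model, the vocabulary size would be far larger (GPT-2-style tokenizers use ~50k), and the iterator would stream batches from the Hub dataset rather than hold the corpus in memory.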

0 commit comments