Dataset #1
Comments
There should be a file named something like:
I found it, thanks a lot!
I have a small question. The additions in the +_n_20_m_20_examples_20000000.txt file seem to be written with their digits in reverse order. For example, the first entry is 10063+787995583888172117=887536583888172117, which in the usual digit order would be 36,001 + 711,271,888,385,599,787 = 711,271,888,385,635,788. Is there a reason for this design?
All the data and code are formatted as needed for the experiments shown in the paper. Section 3 of the paper explains that we write all numbers for addition in a least-significant-digit-first (reversed) format, so both operands and the result are reversed. Please see generate_and_tokenize_data.sh and create_data_split.py for examples of how you might generate your own datasets if you require more specialised data.
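For anyone who wants to read the files in the usual digit order, here is a minimal sketch of converting between the two formats. It is not taken from the repository: the function names are hypothetical, and it assumes each line has the form a+b=c with no padding or extra tokens, as in the entry quoted above.

```python
from typing import Tuple


def decode_reversed_example(line: str) -> Tuple[int, int, int]:
    """Parse one 'a+b=c' line whose numbers are written least significant digit first.

    Assumption: the line contains exactly one '+' and one '=' and only digits otherwise.
    Returns the two operands and the result as ordinary integers.
    """
    lhs, result = line.strip().split("=")
    a, b = lhs.split("+")
    # Reversing each digit string restores the usual most-significant-first order.
    return int(a[::-1]), int(b[::-1]), int(result[::-1])


def encode_reversed_example(a: int, b: int) -> str:
    """Write a + b in the reversed (least significant digit first) format.

    Assumption: non-negative operands and no zero padding.
    """
    return f"{str(a)[::-1]}+{str(b)[::-1]}={str(a + b)[::-1]}"


if __name__ == "__main__":
    line = "10063+787995583888172117=887536583888172117"
    a, b, c = decode_reversed_example(line)
    print(a, b, c)
    print(a + b == c)  # checks that the entry is a valid addition
    print(encode_reversed_example(a, b) == line)  # round trip back to the file format
```

Decoding the first entry this way gives the same operands and sum as in the question above, which confirms that the file stores each number reversed rather than storing a different sum.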
Hello, do the provided dataset links all point to tokenized data? Is the original (untokenized) data available anywhere?