RoPE implementation with a shakespeare-char-rope test #590
Without RoPE:
![20250124_23h05m01s_grim](https://private-user-images.githubusercontent.com/956205/406724343-5aaddead-4d9d-47b4-a15b-ef8de7c23807.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDg3MjQsIm5iZiI6MTczOTA0ODQyNCwicGF0aCI6Ii85NTYyMDUvNDA2NzI0MzQzLTVhYWRkZWFkLTRkOWQtNDdiNC1hMTViLWVmOGRlN2MyMzgwNy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQyMTAwMjRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iNzg0M2RmOTM0ODgyODY1ZTkwNDlkMDU0NTJhNjUyNmZkMzc5NDIzYWJhMmJlNDk1MjIzZDA5NzI1NzNiZjY4JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.oIXJ8YsPgsyt0MMS8yL1O5JAgRSGLv60pQ7DBohUqso)
With RoPE:
![20250126_12h15m25s_grim](https://private-user-images.githubusercontent.com/956205/406726443-8ce55dde-7ed7-436d-95b8-a8aff456296b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDg3MjQsIm5iZiI6MTczOTA0ODQyNCwicGF0aCI6Ii85NTYyMDUvNDA2NzI2NDQzLThjZTU1ZGRlLTdlZDctNDM2ZC05NWI4LWE4YWZmNDU2Mjk2Yi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA4JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOFQyMTAwMjRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jNTM5N2M2ODMzMmExMDIwMmMzNzZhM2IwMThlN2Y0NjQ1ZWI4ZmE0YmQ3YjgxODk4NmNkOTk4NWE5MjNjMjAzJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.jNJlXzk6rud4chxv64-bOGdjkkwz2sX-lIsh1gkSh_Q)
RoPE is opt-in: without it, everything should work as before. Only if you add the `use_rope` flag to the config file does the model use RoPE instead of the `wpe` matrix. `rope_base` is also configurable.
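For readers unfamiliar with the technique, here is a minimal pure-Python sketch of what rotary embeddings do to a single query or key vector. It is not the code from this PR: the function names, the adjacent-pair layout, and the `rope_base` default of 10000 are assumptions made for illustration. Only the idea of a configurable `rope_base` comes from the description above.

```python
import math

def rope_angles(head_dim, seq_len, rope_base=10000.0):
    """angles[t][i] = t * rope_base ** (-2i / head_dim) for position t, pair i."""
    assert head_dim % 2 == 0, "RoPE rotates dimensions in pairs"
    inv_freq = [rope_base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    return [[t * f for f in inv_freq] for t in range(seq_len)]

def apply_rope(x, angles_at_pos):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by angles_at_pos[i]."""
    out = []
    for i, theta in enumerate(angles_at_pos):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical query/key vectors with head_dim = 4.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
angles = rope_angles(head_dim=4, seq_len=8)

# The attention score depends only on the relative offset (2 in both cases).
score_a = dot(apply_rope(q, angles[3]), apply_rope(k, angles[1]))
score_b = dot(apply_rope(q, angles[7]), apply_rope(k, angles[5]))
```

Because position enters through these rotations of queries and keys, a learned absolute position table such as `wpe` is not needed when RoPE is enabled.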
Tested on a 4090, so the reported MFU differs (the calculation is based on the A100).
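To compare MFU numbers across GPUs, the reported value can be rescaled to a different peak. This is a rough sketch, assuming the reference is the A100's 312 TFLOPS dense bfloat16 peak. The helper name is hypothetical, and the peak for your own card and precision has to be supplied by you.

```python
# A100 dense bfloat16 peak; assumed here to be the reference behind the reported MFU.
A100_BF16_PEAK_FLOPS = 312e12

def rescale_mfu(reported_mfu, actual_peak_flops,
                reference_peak_flops=A100_BF16_PEAK_FLOPS):
    """Convert an MFU measured against reference_peak_flops to actual_peak_flops."""
    achieved_flops = reported_mfu * reference_peak_flops  # FLOPS actually sustained
    return achieved_flops / actual_peak_flops

# Example with a made-up card whose peak is half the reference:
# an MFU reported as 25% corresponds to 50% of that card's peak.
example = rescale_mfu(0.25, actual_peak_flops=156e12)
```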