Add support for Gemma 3 (text) #1229
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Anything we can do to help this PR get merged? 🙂
I am waiting for this too.
Me as well.
The model works well in Node.js, but in some browsers we were running into issues due to the large embedding layer. So, we're working on some optimizations so that it runs well on WebGPU. If anyone would like to build and test this PR locally, that would help a ton!
I'm trying to run it in Chrome on macOS, but I'm getting an error: ERROR 3304823240
cc @guschmue. Maybe the model builder will help fix that? Any updates on that?
Since the model works correctly in Node.js (and the only remaining issue is browser support due to the large embedding size), I'll merge this PR and update the weights once microsoft/onnxruntime-genai#1329 is ready. Usage should not change once a newer export is created.
This might be helpful to someone: I ran the q8 gemma3 in the browser. It was slow, but it worked ;)
Great to hear! Is that on WebGPU or WASM? Does q4/q4f16 work for you?
I ran q8 a few times with the example code, but after the third run it started generating random text. I tried other dtypes, and only this one worked. But it was really slow: it took ~4 minutes to initialize and generate something.
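For anyone else experimenting with dtypes and backends: in Transformers.js, quantization and execution backend are chosen at load time via the `dtype` and `device` options. A minimal sketch follows; the model ID is an assumption, so substitute whatever checkpoint this PR's conversion is published under:

```js
import { pipeline } from "@huggingface/transformers";

// dtype selects the quantization level ("fp32", "fp16", "q8", "q4", "q4f16");
// device selects the backend ("webgpu" or "wasm" in the browser).
// The model ID below is assumed -- replace it with the actual converted checkpoint.
const generator = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-1b-it-ONNX",
  { dtype: "q8", device: "webgpu" },
);
```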
Currently only the 1B model has been converted, but I'll make conversions for the rest soon!
Example usage:
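The original example code is not preserved in this copy of the thread; below is a hedged reconstruction of typical Transformers.js text-generation usage for this model. The model ID, system prompt, and generation parameters are assumptions, not taken from the PR itself:

```js
import { pipeline } from "@huggingface/transformers";

// Load the model (ID assumed; see the PR for the actual converted checkpoint).
const generator = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-1b-it-ONNX",
  { dtype: "q4" },
);

// Gemma 3 is an instruction-tuned chat model, so pass chat-style messages.
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Write me a poem about Machine Learning." },
];

// Generate a completion and print the assistant's reply.
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
});
console.log(output[0].generated_text.at(-1).content);
```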