
Support for FLAN-T5 #106

Open
jihan-yin opened this issue Nov 21, 2022 · 3 comments

@jihan-yin

I saw that T5 isn't in the list of supported Hugging Face Transformers models. Are there plans, or an ETA, for when the T5 family will be added? FLAN-T5 is a very strong LLM for zero-/few-shot instruction prompting. I am currently building out a hacky implementation for hosting with DeepSpeed-Inference, but having it natively supported in DeepSpeed-MII would be ideal.

@mrwyattii (Contributor) commented Nov 21, 2022

We do support the T5 family with DeepSpeed-Inference with a custom injection policy (see this DeepSpeed unit test). However, we have not yet brought this support into MII. It's on our radar to add this in the future. We are also open to outside contributions if you would like to submit a PR!
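For readers who land here before MII support exists, the workaround described above can be sketched roughly as follows. This is a hedged sketch, not the official recipe: the injected layer names mirror the pattern used in the DeepSpeed T5 unit test mentioned above, but check that test for the exact policy, and note that `mp_size`/`dtype` values here are illustrative.

```python
"""Sketch: running a T5-family model through DeepSpeed-Inference
with a custom injection policy (the approach described above),
rather than through MII. Assumes `deepspeed` and `transformers`
are installed; layer names below are assumptions based on the
Hugging Face T5 implementation."""

# Output-projection linear layers inside each T5Block that the
# injection policy hands to DeepSpeed for kernel injection /
# tensor slicing (assumed names, per the HF T5 module layout).
T5_INJECTION_LAYERS = (
    "SelfAttention.o",      # self-attention output projection
    "EncDecAttention.o",    # cross-attention output projection
    "DenseReluDense.wo",    # feed-forward output projection
)


def build_t5_injection_policy():
    """Map the T5 transformer block class to its output-projection
    layers, in the {module_class: layer_names} form that
    deepspeed.init_inference expects."""
    from transformers.models.t5.modeling_t5 import T5Block
    return {T5Block: T5_INJECTION_LAYERS}


def load_deepspeed_t5(model_name="t5-small"):
    """Wrap a T5 model with the DeepSpeed-Inference engine using
    the custom injection policy. Imports are deferred so the
    policy above can be inspected without GPU dependencies."""
    import torch
    import deepspeed
    from transformers import T5ForConditionalGeneration

    model = T5ForConditionalGeneration.from_pretrained(model_name)
    return deepspeed.init_inference(
        model,
        mp_size=1,              # single-GPU; raise for tensor parallelism
        dtype=torch.float,      # illustrative; fp16 is common in practice
        injection_policy=build_t5_injection_policy(),
    )
```

The returned engine exposes the wrapped module, so the usual `generate()` calls work on `engine.module` as they would on the plain Hugging Face model.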

@jeffra (Contributor) commented Nov 21, 2022

Also keep an eye on this PR, it’s currently a work in progress for better T5 support: microsoft/DeepSpeed#2451

@mhillebrand

Assuming that PR gets merged, would it also support LongT5?
