forked from EricLBuehler/mistral.rs
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implement the Llama 3.2 vision models (EricLBuehler#796)
* Add the MLlama vision bits
* Restructure
* Typos
* Add skeleton for text model, add text mlp
* Add the self and cross attn text model parts
* Add mllama model
* Add most of the preprocessor
* Add the rest of the processor and wire things up
* Clean up a bit
* Add an example
* Rename
* Loads now
* Another batch of fixes
* Vision model forward runs
* Add back in the cache for cross attn
* Inputs processor gives correct values
* Fix the nans
* Problem seems to be in vision encoder
* Upcasting seems to do something
* Problems confirmed to ONLY be in text model
* Maybe remove some nans
* Seems to work now!!
* Confirmed working, remove the debuggers
* Preapply the tanh
* Rework the interactive mode
* A bugfix
* Another bugfix!
* Add device mapping support
* Add ISQ support for mllama
* Add ISQ support
* Add support for no images and multi images
* Fix dim
* Fix slice assign dim
* Add examples and docs
* Add a demo video
* Update VLLAMA.md
1 parent 776c116, commit f33ac29
Showing 40 changed files with 3,745 additions and 360 deletions.