
Extremely hacky implementation of Mixtral 8x7B

New: API access

Try a much faster implementation of this model at https://app.fireworks.ai/

What is it?

Mistral dropped the new MoE model this morning: https://twitter.com/MistralAI/status/1733150512395038967

This is an attempt to hack the original Llama codebase to load it. The implementation is very naive and slow.

You need 2 x 80 GB or 4 x 40 GB cards to load it.

Implementation:

WARNING: There's no official reference model code, so this implementation might be wrong. At least the generation looks coherent, which is a good sign :)
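For context on what the hack has to implement: Mixtral swaps Llama's dense feed-forward block for a sparse mixture-of-experts layer, where a gate picks the top 2 of 8 experts per token and mixes their outputs with the renormalized gate weights. Below is a minimal PyTorch sketch of that idea; the class and variable names are illustrative, not taken from this repo, and the per-token loop is deliberately naive.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveMoEFeedForward(nn.Module):
    """Top-2 mixture-of-experts feed-forward block (illustrative sketch)."""

    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)
        # Each expert is a SwiGLU-style FFN, like Llama's FeedForward.
        self.w1 = nn.ModuleList([nn.Linear(dim, hidden_dim, bias=False) for _ in range(num_experts)])
        self.w2 = nn.ModuleList([nn.Linear(hidden_dim, dim, bias=False) for _ in range(num_experts)])
        self.w3 = nn.ModuleList([nn.Linear(dim, hidden_dim, bias=False) for _ in range(num_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                  # (T, dim)
        scores, chosen = torch.topk(self.gate(tokens), self.top_k, dim=-1)
        weights = F.softmax(scores, dim=-1)                  # renormalize over the top-k
        out = torch.zeros_like(tokens)
        # Naive per-token loop: slow, but mirrors the "very naive" approach here.
        for t in range(tokens.shape[0]):
            for k in range(self.top_k):
                e = int(chosen[t, k])
                h = F.silu(self.w1[e](tokens[t])) * self.w3[e](tokens[t])
                out[t] += weights[t, k] * self.w2[e](h)
        return out.reshape_as(x)


# Tiny smoke test (real Mixtral uses dim=4096, hidden_dim=14336).
moe = NaiveMoEFeedForward(dim=32, hidden_dim=64)
print(moe(torch.randn(2, 5, 32)).shape)  # torch.Size([2, 5, 32])
```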

Usage

Download the weights for Mixtral from HF or via torrent. HF is the easiest: https://huggingface.co/someone13574/mixtral-8x7b-32kseqlen/tree/main . Make sure you consolidate the weights into a single checkpoint file first (see the sketch below).
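At the time, the mirrors shipped the checkpoint as split chunks rather than one file. The sketch below assumes raw byte-split chunks named consolidated.00.pth-split* whose names sort in order (as split's default aa, ab, ... suffixes do); both the chunk names and the split scheme are assumptions, so check what your download actually contains before running it.

```python
# A minimal sketch, assuming raw byte-split chunks whose names sort in order.
# Equivalent to: cat consolidated.00.pth-split* > consolidated.00.pth
import glob
import shutil

with open("path/to/mixtral/consolidated.00.pth", "wb") as out:
    for chunk in sorted(glob.glob("path/to/mixtral/consolidated.00.pth-split*")):
        with open(chunk, "rb") as part:
            shutil.copyfileobj(part, out)
```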

Run with 2 GPUs (~45 GB required on each):

python example_text_completion.py path/to/mixtral/ path/to/mixtral/tokenizer.model

To run with 4 GPUs, pass --num-gpus 4:
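python example_text_completion.py path/to/mixtral/ path/to/mixtral/tokenizer.model --num-gpus 4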

Edit the prompts in the example script if needed; a sketch of the prompt list follows.
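The prompts are a plain Python list inside example_text_completion.py. The strings below are reconstructed from the sample outputs further down, so treat them as approximate rather than the file's exact contents.

```python
# Approximate reconstruction of the prompts list in example_text_completion.py.
prompts = [
    "Mistral.ai is a company that",
    "Simply put, the theory of relativity states that",
    """Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush girafe => girafe peluche
        cheese =>""",
]
```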

Sample output

Mistral hallucinates about Mistral:

Mistral.ai is a company that
> provides a platform for building, training, and deploying AI models.

The platform offers a variety of tools and services that can help developers and data scientists build and train AI models.

Some of the key features of Mistral.ai's platform include:

- A drag-and-drop

==================================

Simply put, the theory of relativity states that
> 1) the laws of physics are the same for all observers in uniform motion relative to one another, and 2) the speed of light in a vacuum is the same for all observers, regardless of their relative motion or of the motion of the light source.

The first postulate, the principle of

==================================

A brief message congratulating the team on the launch:

        Hi everyone,

        I just
> wanted to say a big congratulations on the launch of your new website.

        I think it looks fantastic and I am sure it will be a great success.

        Well done everyone and keep up the good work.

        Best wishes,

        XXXX

==================================

Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush girafe => girafe peluche
        cheese =>
> fromage
        teddy bear => ourson en peluche
        polar bear => ours polaire
        cuddly panda => panda câlin
        fluffy sheep => mouton fluffy
        furry kitten => chaton poilu
        fuzzy

==================================
