Skip to content

LLM applications running on Apple Silicon thanks to mlx from Apple

License

Notifications You must be signed in to change notification settings

jimlloyd/mlx-llm

 
 

Repository files navigation

mlx-llm

LLM applications running on Apple Silicon in real-time thanks to Apple MLX framework.

Alt Text

Here's also a Youtube Video.

How to install 🔨

git clone https://github.com/riccardomusmeci/mlx-llm
cd mlx-llm
pip install .

Models 🧠

Go check models for a summary of available models.

To create a model with weights:

from mlx_llm.model import create_model

# loading weights from HuggingFace
model = create_model("TinyLlama-1.1B-Chat-v0.6")

# loading weights from local file
model = create_model("TinyLlama-1.1B-Chat-v0.6", weights="path/to/weights.npz")

To list all available models:

from mlx_llm.model import list_models

print(list_models())

Benchmarks 📊

You can run benchmarks with mlx-llm to compare mlx versions, models, and devices:

from mlx_llm.bench import Benchmark

benchmark = Benchmark(
    apple_silicon="m1_pro_32GB",
    model_name="TinyLlama-1.1B-Chat-v0.6",
    prompt="What is the meaning of life?",
    max_tokens=100,
    temperature=0.1,
    verbose=False
)

benchmark.start()
# just the output dir, the file name will be benchmark.csv
benchmark.save("results") # if benchmark.csv is already there, it will append the new results

Warning

Download first the model weights before running the benchmark (just use create_model and then run the test).

Go to benchmark.csv to check my experiments.

If you want to run benchmarks for all available LLMs:

cd scripts
./run_benchmarks.sh

Warning

The test will take a while since it will download all the models if not already present. Also, once test for a model is done, all the 🤗 hub cache will be deleted.

Note

Run the benchmarks on your Apple Silicon device and then PR-me the results. I will be happy to add them to the benchmark.csv file.

Model Embeddings ✴️

Models in mlx-llm are able to extract embeddings from a given text.

import mlx.core as mx
from mlx_llm.model import create_model
from transformers import AutoTokenizer

model = create_model("e5-mistral-7b-instruct")
tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-mistral-7b-instruct')
text = ["I like to play basketball", "I like to play tennis"]
tokens = tokenizer(text)
x = mx.array(tokens["input_ids"])
embeds = model.embed(x)

For a better example go check 🤗 e5-mistral-7b-instruct page.

Applications 📁

With mlx-llm you can run a variety of applications, such as:

  • Chat with an LLM
  • Retrieval Augmented Generation (RAG) running locally

Below an example of how to chat with an LLM, but for more details go check the examples documentation.

Chat with LLM 📱

mlx-llm comes with tools to easily run your LLM chat on Apple Silicon.

You can chat with an LLM by specifying a personality and some examples of user-model interaction (this is mandatory to have a good chat experience):

from mlx_llm.playground.chat import ChatLLM

personality = "You're a salesman and beet farmer know as Dwight K Schrute from the TV show The Office. Dwight replies just as he would in the show. You always reply as Dwight would reply. If you don't know the answer to a question, please don't share false information."

# examples must be structured as below
examples = [
    {
        "user": "What is your name?",
        "model": "Dwight K Schrute",
    },
    {
        "user": "What is your job?",
        "model": "Assistant Regional Manager. Sorry, Assistant to the Regional Manager."
    }
]

chat_llm = ChatLLM.build(
    model_name="LLaMA-2-7B-chat",
    tokenizer="mlx-community/Llama-2-7b-chat-mlx", # HF tokenizer or a local path to a tokenizer
    personality=personality,
    examples=examples,
)

chat_llm.run(max_tokens=500, temp=0.1)

ToDos

[ ] Chat and RAG with streamlit???

[ ] Test with quantized models

[ ] LoRA and QLoRA

📧 Contact

If you have any questions, please email [email protected]

About

LLM applications running on Apple Silicon thanks to mlx from Apple

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.5%
  • Other 1.5%