Nexa SDK

On-device Model Hub / Nexa SDK Documentation

Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), and text-to-speech (TTS) capabilities. Additionally, it offers an OpenAI-compatible API server with JSON schema mode for function calling and streaming support, and a user-friendly Streamlit UI. Users can run Nexa SDK in any device with Python environment, and GPU acceleration is supported.

Latest News 🔥

[2024/09] Nexa now has executables for easy installation: Install Nexa SDK
[2024/09] Added support for Llama 3.2 models: nexa run llama3.2
[2024/09] Added support for Qwen2.5, Qwen2.5-coder and Qwen2.5-Math models: nexa run qwen2.5
[2024/09] Now supporting pulling and running GGUF models from Hugging Face: nexa run -hf <hf model id>
[2024/09] Added support for ROCm
[2024/09] Added support for Phi-3.5 models: nexa run phi3.5
[2024/09] Added support for OpenELM models: nexa run openelm
[2024/09] Introduced logits API support for more advanced model interactions
[2024/09] Added support for Flux models: nexa run flux
[2024/09] Added support for Stable Diffusion 3 model: nexa run sd3
[2024/09] Added support for Stable Diffusion 2.1 model: nexa run sd2-1

Welcome to submit your requests through issues, we ship weekly.

Features

Model Support:
- ONNX & GGML models
- Conversion Engine
- Inference Engine:
  - Text Generation
  - Image Generation
  - Vision-Language Models (VLM)
  - Text-to-Speech (TTS)

Detailed API documentation is available here.

Server:
- OpenAI-compatible API
- JSON schema mode for function calling
- Streaming support
Streamlit UI for interactive model deployment and testing

Below is our differentiation from other similar tools:

Feature	Nexa SDK	ollama	Optimum	LM Studio
GGML Support	✅	✅	❌	✅
ONNX Support	✅	❌	✅	❌
Text Generation	✅	✅	✅	✅
Image Generation	✅	❌	❌	❌
Vision-Language Models	✅	✅	✅	✅
Text-to-Speech	✅	❌	✅	❌
Server Capability	✅	✅	✅	✅
User Interface	✅	❌	❌	✅

Installation

macOS

Download

Linux

curl -fsSL https://public-storage.nexa4ai.com/install.sh | sh

Windows

Coming soon. Install with Python package below 👇

Python Package

We have released pre-built wheels for various Python versions, platforms, and backends for convenient installation on our index page.

Note

If you want to use ONNX model, just replace pip install nexaai with pip install "nexaai[onnx]" in provided commands.
For Chinese developers, we recommend you to use Tsinghua Open Source Mirror as extra index url, just replace --extra-index-url https://pypi.org/simple with --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple in provided commands.

CPU

pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cpu --extra-index-url https://pypi.org/simple --no-cache-dir

GPU (Metal)

For the GPU version supporting Metal (macOS):

CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/metal --extra-index-url https://pypi.org/simple --no-cache-dir

FAQ: cannot use Metal/GPU on M1

Try the following command:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
conda create -n nexasdk python=3.10
conda activate nexasdk
CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/metal --extra-index-url https://pypi.org/simple --no-cache-dir

GPU (CUDA)

For Linux:

CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir

For Windows PowerShell:

$env:CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON"; pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir

For Windows Command Prompt:

set CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" & pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir

For Windows Git Bash:

CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir

FAQ: Building Issues for llava

If you encounter the following issue while building:

try the following command:

CMAKE_ARGS="-DCMAKE_CXX_FLAGS=-fopenmp" pip install nexaai

GPU (ROCm)

For Linux:

CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/rocm621 --extra-index-url https://pypi.org/simple --no-cache-dir

Local Build

How to clone this repo

git clone --recursive https://github.com/NexaAI/nexa-sdk

If you forget to use --recursive, you can use below command to add submodule

git submodule update --init --recursive

Then you can build and install the package

pip install -e .

Supported Models

Model	Type	Format	Command
octopus-v2	NLP	GGUF	`nexa run octopus-v2`
octopus-v4	NLP	GGUF	`nexa run octopus-v4`
gpt2	NLP	GGUF	`nexa run gpt2`
tinyllama	NLP	GGUF	`nexa run tinyllama`
llama2	NLP	GGUF/ONNX	`nexa run llama2`
llama2-uncensored	NLP	GGUF	`nexa run llama2-uncensored`
llama2-function-calling	NLP	GGUF	`nexa run llama2-function-calling`
llama3	NLP	GGUF/ONNX	`nexa run llama3`
llama3.1	NLP	GGUF/ONNX	`nexa run llama3.1`
llama3.2	NLP	GGUF	`nexa run llama3.2`
llama3-uncensored	NLP	GGUF	`nexa run llama3-uncensored`
gemma	NLP	GGUF/ONNX	`nexa run gemma`
gemma2	NLP	GGUF	`nexa run gemma2`
qwen1.5	NLP	GGUF	`nexa run qwen1.5`
qwen2	NLP	GGUF/ONNX	`nexa run qwen2`
qwen2.5	NLP	GGUF	`nexa run qwen2.5`
mathqwen	NLP	GGUF	`nexa run mathqwen`
codeqwen	NLP	GGUF	`nexa run codeqwen`
mistral	NLP	GGUF/ONNX	`nexa run mistral`
dolphin-mistral	NLP	GGUF	`nexa run dolphin-mistral`
codegemma	NLP	GGUF	`nexa run codegemma`
codellama	NLP	GGUF	`nexa run codellama`
deepseek-coder	NLP	GGUF	`nexa run deepseek-coder`
phi2	NLP	GGUF	`nexa run phi2`
phi3	NLP	GGUF/ONNX	`nexa run phi3`
phi3.5	NLP	GGUF	`nexa run phi3.5`
openelm	NLP	GGUF	`nexa run openelm`
nanollava	Multimodal	GGUF	`nexa run nanollava`
llava-phi3	Multimodal	GGUF	`nexa run llava-phi3`
llava-llama3	Multimodal	GGUF	`nexa run llava-llama3`
llava1.6-mistral	Multimodal	GGUF	`nexa run llava1.6-mistral`
llava1.6-vicuna	Multimodal	GGUF	`nexa run llava1.6-vicuna`
stable-diffusion-v1-4	Computer Vision	GGUF	`nexa run sd1-4`
stable-diffusion-v1-5	Computer Vision	GGUF/ONNX	`nexa run sd1-5`
stable-diffusion-v2-1	Computer Vision	GGUF	`nexa run sd2-1`
stable-diffusion-3-medium	Computer Vision	GGUF	`nexa run sd3`
FLUX.1-schnell	Computer Vision	GGUF	`nexa run flux`
lcm-dreamshaper	Computer Vision	GGUF/ONNX	`nexa run lcm-dreamshaper`
hassaku-lcm	Computer Vision	GGUF	`nexa run hassaku-lcm`
anything-lcm	Computer Vision	GGUF	`nexa run anything-lcm`
faster-whisper-tiny	Audio	BIN	`nexa run faster-whisper-tiny`
faster-whisper-small	Audio	BIN	`nexa run faster-whisper-small`
faster-whisper-medium	Audio	BIN	`nexa run faster-whisper-medium`
faster-whisper-base	Audio	BIN	`nexa run faster-whisper-base`
faster-whisper-large	Audio	BIN	`nexa run faster-whisper-large`
whisper-tiny.en	Audio	ONNX	`nexa run whisper-tiny.en`
whisper-tiny	Audio	ONNX	`nexa run whisper-tiny`
whisper-small.en	Audio	ONNX	`nexa run whisper-small.en`
whisper-small	Audio	ONNX	`nexa run whisper-small`
whisper-base.en	Audio	ONNX	`nexa run whisper-base.en`
whisper-base	Audio	ONNX	`nexa run whisper-base`

CLI Reference

Here's a brief overview of the main CLI commands:

nexa run: Run inference for various tasks using GGUF models.
nexa onnx: Run inference for various tasks using ONNX models.
nexa server: Run the Nexa AI Text Generation Service.
nexa pull: Pull a model from official or hub.
nexa remove: Remove a model from local machine.
nexa clean: Clean up all model files.
nexa list: List all models in the local machine.
nexa login: Login to Nexa API.
nexa whoami: Show current user information.
nexa logout: Logout from Nexa API.

For detailed information on CLI commands and usage, please refer to the CLI Reference document.

Start Local Server

To start a local server using models on your local computer, you can use the nexa server command. For detailed information on server setup, API endpoints, and usage examples, please refer to the Server Reference document.

Acknowledgements

We would like to thank the following projects:

Name		Name	Last commit message	Last commit date
Latest commit History 457 Commits
.github		.github
assets		assets
dependency		dependency
docs		docs
examples		examples
nexa		nexa
scripts		scripts
tests		tests
.gitignore		.gitignore
.gitmodules		.gitmodules
CLI.md		CLI.md
CMakeLists.txt		CMakeLists.txt
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
SERVER.md		SERVER.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nexa SDK

Latest News 🔥

Features

Installation

macOS

Linux

Windows

Python Package

CPU

GPU (Metal)

GPU (CUDA)

GPU (ROCm)

Local Build

Supported Models

CLI Reference

Start Local Server

Acknowledgements

About

Releases

Packages

Languages

License

coriskr/nexa-sdk

Folders and files

Latest commit

History

Repository files navigation

Nexa SDK

Latest News 🔥

Features

Installation

macOS

Linux

Windows

Python Package

CPU

GPU (Metal)

GPU (CUDA)

GPU (ROCm)

Local Build

Supported Models

CLI Reference

Start Local Server

Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages