Name		Name	Last commit message	Last commit date
parent directory ..
src		src
.gitignore		.gitignore
Cargo.toml		Cargo.toml
README.md		README.md

README.md

Neural chat example with WasmEdge WASI-NN Neural Speed plugin

This example demonstrates how to use WasmEdge WASI-NN Neural Speed plugin to perform an inference task with Neural chat model.

Install WasmeEdge with WASI-NN Neural Speed plugin

The Neural Speed backend relies on Neural Speed, we recommend the following commands to install Neural Speed.

sudo apt update
sudo apt upgrade
sudo apt install python3-dev
wget https://raw.githubusercontent.com/intel/neural-speed/main/requirements.txt
pip install -r requirements.txt
pip install neural-speed

Then build and install WasmEdge from source:

cd <path/to/your/wasmedge/source/folder>

cmake -GNinja -Bbuild -DCMAKE_BUILD_TYPE=Release -DWASMEDGE_PLUGIN_WASI_NN_BACKEND="neuralspeed"
cmake --build build

# For the WASI-NN plugin, you should install this project.
cmake --install build

Then you will have an executable wasmedge runtime under /usr/local/bin and the WASI-NN with Neural Speed backend plug-in under /usr/local/lib/wasmedge/libwasmedgePluginWasiNN.so after installation.

Model Download Link

In this example, we will use neural-chat-7b-v3-1.Q4_0 model in GGUF format.

# Download model weight
wget https://huggingface.co/TheBloke/neural-chat-7B-v3-1-GGUF/resolve/main/neural-chat-7b-v3-1.Q4_0.gguf
# Download tokenizer
wget https://huggingface.co/Intel/neural-chat-7b-v3-1/raw/main/tokenizer.json -O neural-chat-tokenizer.json

Build wasm

Run the following command to build wasm, the output WASM file will be at target/wasm32-wasi/release/

cargo build --target wasm32-wasi --release

Execute

Execute the WASM with the wasmedge using nn-preload to load model.

wasmedge --dir .:. \
  --nn-preload default:NeuralSpeed:AUTO:neural-chat-7b-v3-1.Q4_0.gguf \
  ./target/wasm32-wasi/release/wasmedge-neural-speed.wasm default

Other

You can change tokenizer_path to your tokenizer path.

let tokenizer_name = "neural-chat-tokenizer.json";

Prompt is the default model input.

let prompt = "Once upon a time, there existed a little girl,";

If your model type not llama, you can set model_type parameter to load different model.

let graph = GraphBuilder::new(GraphEncoding::NeuralSpeed, ExecutionTarget::AUTO)
        .config(serde_json::to_string(&json!({"model_type": "mistral"}))

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wasmedge-neuralspeed

wasmedge-neuralspeed

README.md

Neural chat example with WasmEdge WASI-NN Neural Speed plugin

Install WasmeEdge with WASI-NN Neural Speed plugin

Model Download Link

Build wasm

Execute

Other

Files

wasmedge-neuralspeed

Directory actions

More options

Directory actions

More options

Latest commit

History

wasmedge-neuralspeed

Folders and files

parent directory

README.md

Neural chat example with WasmEdge WASI-NN Neural Speed plugin

Install WasmeEdge with WASI-NN Neural Speed plugin

Model Download Link

Build wasm

Execute

Other