This is a fork of [Auto-GPT](https://github.com/Significant-Gravitas/Auto-GPT) with support for local LLaMA models. At present, this is an experimental proof of concept rather than a polished integration.
## Install gpt-llama.cpp

[gpt-llama.cpp](https://github.com/keldenl/gpt-llama.cpp) is an API wrapper around [llama.cpp](https://github.com/ggerganov/llama.cpp). It runs a local API server that simulates OpenAI's GPT endpoints but uses local llama-based models to process requests.
- Build llama.cpp:

```sh
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
cmake --build . --config Release

# install Python dependencies (run from the llama.cpp repo root)
cd ..
python3 -m pip install -r requirements.txt
```
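If the build succeeded, the llama.cpp binaries are ready to use. Note that with the CMake build above they may land under `build/bin` rather than the repo root, depending on your llama.cpp version; if so, adjust the `./quantize` and `./main` invocations later in this guide accordingly. A quick sanity check:

```sh
# verify the build; the binary location varies by llama.cpp version
./build/bin/main -h
```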
- Install and run [gpt-llama.cpp](https://github.com/keldenl/gpt-llama.cpp) locally:

```sh
git clone https://github.com/keldenl/gpt-llama.cpp.git
cd gpt-llama.cpp

# install the required dependencies
npm install

# start the server
npm start
```
For more details, please refer to the [gpt-llama.cpp README](https://github.com/keldenl/gpt-llama.cpp#readme).
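Once the server is running, you can smoke-test the OpenAI-compatible endpoint directly. The sketch below assumes the server listens on port 443 (matching the `.env` example later in this guide) and that, as in this setup, the API-key field carries the path to your local model:

```sh
# hedged smoke test; adjust the port and model path for your setup
curl -X POST http://localhost:443/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ../llama.cpp/models/Vicuna-7B/ggml-model-q4_0.bin" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Say hello."}]}'
```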
The following models have been tested so far:

- LLaMA-7B-q4
- Vicuna-7B-q4

You can download the original LLaMA weights by following the instructions here, or through other channels: meta-llama/llama#149. For the Vicuna weights, you apply a delta to the original LLaMA weights (instructions here).
- Convert your downloaded LLaMA-7B weights to the Hugging Face Transformers format using the following script (source):
```sh
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
```
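The conversion script ships inside the Hugging Face transformers repository, so you need a checkout of it plus a few Python packages; the exact package set below is an assumption, but these are the typical prerequisites:

```sh
# assumed prerequisites for the LLaMA -> HF conversion script:
# torch for the weights, sentencepiece for the LLaMA tokenizer
pip install torch transformers sentencepiece
```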
- Get the Vicuna-7B weights by applying the delta (detailed instructions):

```sh
python3 -m fastchat.model.apply_delta \
    --base /path/to/llama-7b \
    --target /output/path/to/vicuna-7b \
    --delta /path/to/vicuna-7b-delta-v1.1
```
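The `apply_delta` step requires FastChat to be installed; at the time of writing the PyPI package name is `fschat`:

```sh
# FastChat provides the fastchat.model.apply_delta module used above
pip install fschat
```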
- Quantize the Vicuna-7B model to 4 bits using llama.cpp:

```sh
# obtain the original LLaMA model weights and place them in ./models
ls ./models
65B 30B 13B 7B Vicuna-7B tokenizer_checklist.chk tokenizer.model

# install Python dependencies
python3 -m pip install -r requirements.txt

# convert the 7B model to ggml FP16 format
python3 convert.py models/Vicuna-7B/

# quantize the model to 4 bits (using method 2 = q4_0)
./quantize ./models/Vicuna-7B/ggml-model-f16.bin ./models/Vicuna-7B/ggml-model-q4_0.bin 2

# run the inference
./main -m ./models/Vicuna-7B/ggml-model-q4_0.bin -n 128
```
Now, the local model is ready to go!
## Install Auto-GPT (Guide)

- Install this Auto-GPT fork, which is based on DGdev91's PR #2594:

```sh
git clone https://github.com/Neronjust2017/Auto-GPT-LOCAL
cd Auto-GPT-LOCAL
```
- Install the requirements and create a .env file:

```sh
pip install -r requirements.txt
cp .env.template .env
```
- Edit the .env file:

```
OPENAI_API_BASE_URL=http://localhost:443/v1
# you can find the proper EMBED_DIM value for different LLaMA models at https://huggingface.co/shalomma/llama-7b-embeddings#quantitative-analysis
EMBED_DIM=4096
# path to your local model file, passed in place of a real OpenAI key
OPENAI_API_KEY=../llama.cpp/models/vicuna/13B/ggml-vicuna-unfiltered-13b-4bit.bin
```
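Note that this setup repurposes `OPENAI_API_KEY` to carry the path to your local model file rather than an actual OpenAI key. The example above points at a 13B Vicuna model; if you followed the quantization steps earlier in this guide (and cloned llama.cpp and Auto-GPT-LOCAL side by side, which is an assumption), it would instead be:

```sh
# path to the 4-bit Vicuna-7B model quantized above
OPENAI_API_KEY=../llama.cpp/models/Vicuna-7B/ggml-model-q4_0.bin
```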
- Run Auto-GPT:

```sh
# On Linux or Mac:
./run.sh start

# On Windows:
.\run.bat

# or directly with Python:
python -m autogpt
```
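Keep in mind that gpt-llama.cpp must already be running when Auto-GPT starts, since all "OpenAI" requests are routed to the local server. A typical session therefore uses two terminals:

```sh
# terminal 1: start the API server (from the gpt-llama.cpp directory)
npm start

# terminal 2: start Auto-GPT (from the Auto-GPT-LOCAL directory)
python -m autogpt
```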