This repo provides a tool to convert a Hugging Face model into a GGUF file. It follows the instructions in ggerganov/llama.cpp#2948.
- Run
  ```bash
  pip install -r requirements.txt
  ```
- Clone the `llama.cpp` repo:
  ```bash
  git clone https://github.com/ggerganov/llama.cpp.git
  ```
- Install dependencies for `llama.cpp`:
  ```bash
  pip install -r llama.cpp/requirements.txt
  ```
- Run the conversion script:
  ```bash
  python llama.cpp/convert-hf-to-gguf.py <model_path> \
      --outfile <output_file>.gguf \
      --outtype <quant_type {f16,f32}>
  ```
- Example:
  ```bash
  python llama.cpp/convert-hf-to-gguf.py huggingface_models/vince62s-phi2-psy \
      --outfile vince62s-phi2-psy_f16.gguf \
      --outtype f16
  ```
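  To sanity-check the converted file, `llama.cpp` ships a small GGUF inspector script; the path below matches recent `llama.cpp` checkouts and may differ in older ones:
  ```bash
  # Print the GGUF metadata and tensor list without loading the model
  python llama.cpp/gguf-py/scripts/gguf-dump.py vince62s-phi2-psy_f16.gguf
  ```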
- Build `llama.cpp` and quantize the model:
  ```bash
  make -C llama.cpp
  llama.cpp/quantize vince62s-phi2-psy_f16.gguf vince62s-phi2-psy-Q4_K_M.gguf Q4_K_M
  ```
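  `Q4_K_M` is only one of many quantization types; as of recent `llama.cpp` versions, invoking `quantize` with no arguments prints its usage text along with the full list of supported types:
  ```bash
  # Prints usage, including the list of available quantization types
  llama.cpp/quantize
  ```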
- Example:
  ```bash
  llama.cpp/main -m vince62s-phi2-psy-Q4_K_M.gguf -cml
  ```
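  `main` can also produce a one-shot completion instead of an interactive chat; `-p` (prompt) and `-n` (tokens to generate) are standard `llama.cpp` flags, and the prompt text is only an illustration:
  ```bash
  llama.cpp/main -m vince62s-phi2-psy-Q4_K_M.gguf -p "Explain GGUF in one sentence." -n 128
  ```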
- Create a `Modelfile` containing, e.g.
  ```
  FROM ./vince62s-phi2-psy_f16.gguf
  ```
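  A `Modelfile` can also set sampling options via Ollama's `PARAMETER` directive; a minimal sketch, with illustrative values that are not tuned for this model:
  ```
  FROM ./vince62s-phi2-psy_f16.gguf
  # Illustrative sampling settings; adjust or drop as needed
  PARAMETER temperature 0.7
  PARAMETER stop "<|endoftext|>"
  ```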
- Create the model in Ollama:
  ```bash
  ollama create phi2_f16 -f Modelfile
  ```
- Run the model with Ollama:
  ```bash
  ollama run phi2_f16
  ```
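  `ollama run` also accepts a prompt argument for a single response; the prompt here is just an example:
  ```bash
  ollama run phi2_f16 "Summarize what a GGUF file is."
  ```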
- You need a Hugging Face write token to push your files to Hugging Face:
  ```bash
  export HUGGING_FACE_HUB_TOKEN=<Your_HF_token>
  ```
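  Alternatively, `huggingface-cli` (installed alongside `huggingface_hub`) can store the token for you:
  ```bash
  huggingface-cli login
  ```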
- Update `upload_hf.py` with your `model_id`, etc. Then run:
  ```bash
  python upload_hf.py
  ```
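  For reference, a minimal sketch of what an upload script like `upload_hf.py` might contain, using the `huggingface_hub` client; the repo id and file name are placeholders, not this repo's actual script:
  ```python
  from huggingface_hub import HfApi, create_repo

  # Placeholders: replace with your own repo id and GGUF file
  model_id = "<your_username>/<your_model_name>"
  gguf_file = "vince62s-phi2-psy-Q4_K_M.gguf"

  # The token is read from the HUGGING_FACE_HUB_TOKEN environment variable
  create_repo(model_id, repo_type="model", exist_ok=True)

  api = HfApi()
  api.upload_file(
      path_or_fileobj=gguf_file,
      path_in_repo=gguf_file,
      repo_id=model_id,
  )
  ```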