ggerganov

Stars

🦙 ggml

18 repositories

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

C++ 2,966 335 Updated Jul 31, 2024

Falcon LLM ggml framework with CPU and GPU support

C 246 21 Updated Jan 22, 2024

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …

Python 6,517 534 Updated Feb 22, 2025

Suno AI's Bark model in C/C++ for fast text-to-speech generation

C++ 782 65 Updated Nov 16, 2024

Web browser version of StarCoder.cpp

C 43 1 Updated Jul 30, 2023

Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)

C++ 563 28 Updated Aug 8, 2023

Stable Diffusion and Flux in pure C/C++

C++ 3,849 345 Updated Feb 22, 2025

Python bindings for ggml

Python 137 11 Updated Sep 2, 2024
Python 3,369 145 Updated Feb 25, 2024

C++ implementation of Qwen-LM

C++ 579 53 Updated Dec 6, 2024

LLaVA server (llama.cpp).

C++ 177 11 Updated Oct 20, 2023

Vision Transformer (ViT) inference in plain C/C++ with ggml

C++ 256 20 Updated Apr 11, 2024

Run GGML models with Kubernetes.

HCL 174 7 Updated Dec 17, 2023

Run Llama and other large language models offline on iOS and macOS using the GGML library.

Swift 1,603 110 Updated Jan 27, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,112 423 Updated Feb 19, 2025

Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)

C++ 519 40 Updated Feb 21, 2025

GGML implementation of BERT model with Python bindings and quantization.

C++ 53 5 Updated Feb 19, 2024