Disaggregated serving system for Large Language Models (LLMs).
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
A high-throughput and memory-efficient inference and serving engine for LLMs
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A latent text-to-image diffusion model
CUDA integration for Python, plus shiny features
CUDA Python: Performance meets Productivity
CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups
A testing framework for COBOL applications
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
LLM training code for Databricks foundation models
This repo curates ChatGPT prompts to help you use ChatGPT and other LLM tools more effectively.
Handy toolbelt to deal nicely with offline/online connectivity in a React Native app, with smooth Redux integration.
Source code for Twitter's Recommendation Algorithm
antimatter15 / alpaca.cpp
Forked from ggml-org/llama.cpp
Locally run an Instruction-Tuned Chat-Style LLM
oneAPI Threading Building Blocks (oneTBB)
marekpiotrow / UWrMaxSat
Forked from karpiu/kp-minisatp
UWrMaxSat is a relatively new MiniSat+-based solver that participated in MaxSAT Evaluation 2019, where it ranked second in both main tracks (weighted and unweighted). In MaxSAT Evaluation 2020 …
A minimalistic and high-performance SAT solver