-
Cloned-LMCache Public
Forked from LMCache/LMCache
Ultra-Fast and Cheaper Long-Context LLM Inference
Python Apache License 2.0 Updated Dec 20, 2024 -
RAGLAB Public
Forked from fate-ubw/RAGLAB
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation
Python MIT License Updated Nov 22, 2024 -
sarathi-serve Public
Forked from microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
Python Apache License 2.0 Updated Jul 25, 2024 -
PipeRAG Public
Forked from amazon-science/piperag
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025)
Python Apache License 2.0 Updated Jun 14, 2024 -
vidur-llm-inf-sim Public
Forked from microsoft/vidur
A large-scale simulation framework for LLM inference
Python MIT License Updated Jun 3, 2024 -
DRAMsim3 Public
Forked from umd-memsys/DRAMsim3
DRAMsim3: a Cycle-accurate, Thermal-Capable DRAM Simulator
C++ MIT License Updated Jun 3, 2024 -
SwiftTransformer Public
Forked from LLMServe/SwiftTransformer
High performance Transformer implementation in C++.
C++ Updated Apr 22, 2024 -
autofaiss Public
Forked from criteo/autofaiss
Automatically create Faiss knn indices with the most optimal similarity search parameters.
Python Apache License 2.0 Updated Mar 18, 2024 -
pytorch-gpt-fast Public
Forked from pytorch-labs/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Python BSD 3-Clause "New" or "Revised" License Updated Jan 30, 2024 -
powertcp-linux Public
Forked from inet-tub/powertcp-linux
A proof of concept implementation of PowerTCP within Linux Kernel
C MIT License Updated Jan 22, 2024 -
astra-sim Public
Forked from astra-sim/astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
C++ MIT License Updated Jan 3, 2024 -
DCVC Public
Forked from microsoft/DCVC
Deep Contextual Video Compression
Python MIT License Updated Dec 21, 2023 -
vllm-llama-pipeline-parallel Public
Forked from irasin/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
-
fastText Public
Forked from facebookresearch/fastText
Library for fast text representation and classification.
HTML MIT License Updated Nov 17, 2023 -
intel-extension-for-transformers Public
Forked from intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
C++ Apache License 2.0 Updated Nov 10, 2023 -
RETRO-pytorch Public
Forked from lucidrains/RETRO-pytorch
Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
Python Apache License 2.0 Updated Oct 30, 2023 -
masstree-beta Public
Forked from kohler/masstree-beta
Beta release of Masstree.
C++ Other Updated Oct 15, 2023 -
Retrieval-QA-Benchmark Public
Forked from myscale/Retrieval-QA-Benchmark
Benchmark baseline for retrieval qa applications
Python GNU General Public License v3.0 Updated Sep 7, 2023 -
CXLMemSim Public
Forked from SlugLab/CXLMemSim
A place to store the CXL simulator
C Updated Sep 2, 2023 -
DiskANN Public
Forked from microsoft/DiskANN
Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search
C++ Other Updated Aug 29, 2023 -
llama.cpp Public
Forked from ggml-org/llama.cpp
Port of Facebook's LLaMA model in C/C++
C MIT License Updated Aug 24, 2023 -
faiss Public
Forked from facebookresearch/faiss
A library for efficient similarity search and clustering of dense vectors.
C++ MIT License Updated Aug 22, 2023 -
big-ann-benchmarks Public
Forked from harsha-simhadri/big-ann-benchmarks
Framework for evaluating ANNS algorithms on billion scale datasets.
Jupyter Notebook MIT License Updated Aug 17, 2023