efficient-inference

Here are 70 public repositories matching this topic...

huawei-noah / Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

tensorflow pytorch transformer imagenet convolutional-neural-networks pretrained-models model-compression efficient-inference ghostnet vision-transformer

Updated Mar 15, 2025
Python

SqueezeAILab / LLMCompiler

[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling

nlp natural-language-processing transformer llama efficient-inference large-language-models llm llms llm-agent function-calling llama2 llm-framework llm-agents parallel-function-call

Updated Jul 10, 2024
Python

snap-research / EfficientFormer

EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]

deep-learning detection transformers pytorch transformer imagenet semantic-segmentation mobile-devices efficient-inference efficient-neural-networks

Updated Aug 13, 2023
Python

huawei-noah / AdderNet

Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"

pytorch imagenet convolutional-neural-networks efficient-inference cvpr2020

Updated Mar 19, 2022
Python

horseee / DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

model-compression efficient-inference diffusion-models stable-diffusion training-free

Updated Jun 27, 2024
Python

SqueezeAILab / SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

natural-language-processing text-generation transformer llama quantization model-compression efficient-inference post-training-quantization large-language-models llm small-models localllm

Updated Aug 13, 2024
Python

VITA-Group / LightGaussian

[NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang

3d-reconstruction efficient-inference gaussian-splatting nurips neurips-2024

Updated Dec 30, 2024
Python

liuzhuang13 / slimming

Learning Efficient Convolutional Networks through Network Slimming, in ICCV 2017.

deep-learning convolutional-neural-networks efficient-inference

Updated Jul 14, 2019
Lua

Zhen-Dong / Awesome-Quantization-Papers

List of papers related to neural network quantization in recent AI conferences and journals.

neural-networks awesome-list papers quantization model-compression edge-computing efficient-inference diffusion-models large-language-models

Updated Dec 16, 2024

SqueezeAILab / KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

natural-language-processing compression text-generation transformer llama quantization mistral model-compression efficient-inference efficient-model large-language-models llm small-models localllm localllama

Updated Aug 13, 2024
Python

lucidrains / speculative-decoding

Explorations into some recent techniques surrounding speculative decoding

deep-learning transformers artificial-intelligence efficient-inference

Updated Dec 22, 2024
Python

SYSU-SAIL / SMSR

[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference

sparsity super-resolution efficient-inference

Updated Oct 18, 2021
Python

changlin31 / DS-Net

(CVPR 2021, Oral) Dynamic Slimmable Network

pruning model-compression efficient-inference dynamic-networks network-pruning dynamic-pruning

Updated Dec 31, 2021
Python

Picovoice / picollm

On-device LLM Inference Powered by X-Bit Quantization

natural-language-processing compression self-hosted llama language-models quantization language-model gemma mistral model-compression efficient-inference llm llms generative-ai large-language-model llm-inference llama2 mixtral llama3

Updated Mar 17, 2025
Python

xindongzhang / ELAN

[ECCV 2022] Efficient Long-Range Attention Network for Image Super-resolution

transformer super-resolution efficient-inference

Updated Jul 20, 2022
Python

liuziwei7 / mobile-id

Deep Face Model Compression

computer-vision deep-learning face-recognition model-compression efficient-inference

Updated Aug 21, 2018
MATLAB

czg1225 / AsyncDiff

[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

distributed-computing text-to-image efficient-inference diffusion-models text-to-video inference-acceleration stable-diffusion training-free

Updated Feb 22, 2025
Python

xuyang-liu16 / Awesome-Generation-Acceleration

📚 Collection of awesome generation acceleration resources.

image-generation text-to-image efficient-inference video-generation model-acceleration diffusion-models text-to-video efficient-deep-learning

Updated Mar 24, 2025

cure-lab / DeciWatch

[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

deep-learning efficiency pytorch human-pose-estimation pose-estimation eccv efficient-inference 2d-human-pose 3d-pose-estimation efficient-neural-networks body-reconstruction eccv2022 3d-body-recovery

Updated Jul 19, 2022
Python

horseee / learning-to-cache

[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

efficient-inference diffusion-models

Updated Jul 15, 2024
Python
