View mengniwang95's full-sized avatar

Advanced Quantization Algorithm for LLMs/VLMs.

Python · 363 stars · 29 forks · Updated Jan 27, 2025

Model compression for ONNX

Python · 82 stars · 9 forks · Updated Nov 18, 2024

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)

Python · 166 stars · 226 forks · Updated Feb 1, 2025

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

LLVM · 1,285 stars · 752 forks · Updated Feb 3, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python · 6,584 stars · 652 forks · Updated Jan 28, 2025

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools

Python · 2,715 stars · 494 forks · Updated Jan 31, 2025

Examples for using ONNX Runtime for machine learning inferencing.

C++ · 1,277 stars · 348 forks · Updated Jan 23, 2025

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python · 2,158 stars · 211 forks · Updated Oct 8, 2024

Common utilities for ONNX converters

Python · 257 stars · 67 forks · Updated Dec 3, 2024

ONNXMLTools enables conversion of models to ONNX

Python · 1,044 stars · 190 forks · Updated Jan 8, 2025

Sandbox for training deep learning networks

Python · 2,990 stars · 562 forks · Updated Sep 6, 2024

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.

Python · 1,732 stars · 186 forks · Updated Feb 2, 2025

Intel® Performance Counter Monitor (Intel® PCM)

C++ · 2,893 stars · 480 forks · Updated Jan 18, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 86,432 stars · 23,268 forks · Updated Feb 3, 2025

oneAPI Deep Neural Network Library (oneDNN)

C++ · 3,703 stars · 1,022 forks · Updated Feb 1, 2025

A JIT assembler for x86/x64 architectures supporting MMX, SSE (1-4), AVX (1-2, 512), FPU, APX, and AVX10.2

C++ · 2,079 stars · 276 forks · Updated Feb 3, 2025

Open standard for machine learning interoperability

Python · 18,338 stars · 3,704 forks · Updated Feb 2, 2025

Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs

Python · 693 stars · 220 forks · Updated Jan 30, 2025

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ · 15,455 stars · 3,022 forks · Updated Feb 3, 2025

Inference of quantization aware trained networks using TensorRT

Python · 80 stars · 20 forks · Updated Jan 27, 2023

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ · 5,263 stars · 629 forks · Updated Jan 31, 2025

Reference implementations of MLPerf™ inference benchmarks

Python · 1,292 stars · 541 forks · Updated Feb 3, 2025

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python · 2,311 stars · 262 forks · Updated Jan 24, 2025

Pre-trained Deep Learning models and demos (high quality and extremely fast)

Python · 4,145 stars · 1,376 forks · Updated Jan 16, 2025