Stars
Advanced Quantization Algorithm for LLMs/VLMs.
Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU)
Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Accessible large language models via k-bit quantization for PyTorch.
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
Examples for using ONNX Runtime for machine learning inferencing.
⚡ Build your chatbot within minutes on your favorite device; apply SOTA compression techniques to LLMs; run LLMs efficiently on Intel platforms ⚡
Common utilities for ONNX converters
ONNXMLTools enables conversion of models to ONNX
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
A JIT assembler for x86/x64 architectures supporting MMX, SSE (1-4), AVX (1-2, 512), FPU, APX, and AVX10.2
Open standard for machine learning interoperability
Intel® AI Reference Models: Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs
ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
Inference of quantization-aware trained networks using TensorRT
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Reference implementations of MLPerf™ inference benchmarks
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Pre-trained Deep Learning models and demos (high quality and extremely fast)