yufenglee

Organizations

@microsoft

Microsoft Linear Algebra Subroutines

C++ · 4 stars · 2 forks · Updated Dec 6, 2024

Generative AI extensions for onnxruntime

C++ · 568 stars · 140 forks · Updated Dec 25, 2024

Source code examples from the Parallel Forall Blog

HTML · 1,250 stars · 634 forks · Updated Jul 23, 2024

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ · 15,161 stars · 2,979 forks · Updated Jan 2, 2025

A JIT assembler for x86/x64 architectures supporting MMX, SSE (1-4), AVX (1-2, 512), FPU, APX, and AVX10.2

C++ · 2,071 stars · 276 forks · Updated Nov 11, 2024

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python · 2,285 stars · 258 forks · Updated Jan 2, 2025

The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ · 1,393 stars · 516 forks · Updated Dec 31, 2024

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ · 2,918 stars · 636 forks · Updated Jan 1, 2025

Cross-platform, customizable ML solutions for live and streaming media.

C++ · 28,116 stars · 5,210 forks · Updated Dec 21, 2024

ncnn is a high-performance neural network inference framework optimized for the mobile platform

C++ · 20,733 stars · 4,193 forks · Updated Dec 26, 2024

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases at Alibaba

C++ · 8,878 stars · 1,685 forks · Updated Jan 1, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 30,882 stars · 2,845 forks · Updated Jan 1, 2025

Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators

C · 1,533 stars · 220 forks · Updated Aug 28, 2019

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ · 1,230 stars · 520 forks · Updated Jan 1, 2025

Low-precision matrix multiplication

C++ · 1,785 stars · 454 forks · Updated Jan 29, 2024

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook · 13,723 stars · 3,254 forks · Updated Aug 12, 2024

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda · 1,689 stars · 449 forks · Updated Oct 9, 2023

Cuda · 21 stars · 13 forks · Updated Jul 31, 2017

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 85,439 stars · 23,005 forks · Updated Jan 2, 2025

Optimized primitives for collective multi-GPU communication

C++ · 3,339 stars · 842 forks · Updated Sep 17, 2024

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ · 10,997 stars · 2,146 forks · Updated Dec 13, 2024

nGraph has moved to OpenVINO

C++ · 1,350 stars · 221 forks · Updated Oct 15, 2020

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ · 4,940 stars · 757 forks · Updated Feb 8, 2024

A natural language modeling framework based on PyTorch

Python · 6,336 stars · 802 forks · Updated Oct 17, 2022

oneAPI Deep Neural Network Library (oneDNN)

C++ · 3,665 stars · 1,012 forks · Updated Jan 2, 2025

TensorFlow code and pre-trained models for BERT

Python · 38,474 stars · 9,631 forks · Updated Jul 23, 2024

Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX

Jupyter Notebook · 2,353 stars · 431 forks · Updated Dec 25, 2024

An Open Source Machine Learning Framework for Everyone

C++ · 187,048 stars · 74,384 forks · Updated Jan 2, 2025

Code samples for my book "Neural Networks and Deep Learning"

Python · 16,239 stars · 6,632 forks · Updated Jun 2, 2024

Open standard for machine learning interoperability

Python · 18,175 stars · 3,691 forks · Updated Jan 1, 2025