Stars
GitHub mirror of the triton-lang/triton repo.
Virtual whiteboard for sketching hand-drawn-like diagrams.
Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves effective training time by minimizing downtime due to failures.
NVIDIA Linux open GPU kernel modules with P2P support.
veRL: Volcano Engine Reinforcement Learning for LLM
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Collection of AWESOME vision-language models for vision tasks
This is originally a collection of papers on neural network accelerators. Now it's more like my selection of research on deep learning and computer architecture.
A native PyTorch Library for large model training
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
Applied AI experiments and examples for PyTorch
SGLang is a fast serving framework for large language models and vision language models.
A throughput-oriented high-performance serving framework for LLMs
A fast communication-overlapping library for tensor parallelism on GPUs.
A low-latency & high-throughput serving engine for LLMs
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures.
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
A scalable and robust tree-based speculative decoding algorithm.
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
A "large" language model running on a microcontroller
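Several entries above (Sequoia, TriForce) center on speculative decoding. A minimal toy sketch of the draft-then-verify loop they build on — the two "models" here are hypothetical lookup tables standing in for a cheap draft model and an expensive target model, not any repo's actual API:

```python
def draft_next(tok):
    # Cheap draft model: fast but sometimes wrong about the next token.
    return {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}.get(tok, "<eos>")

def target_next(tok):
    # Expensive target model: treated as ground truth.
    return {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}.get(tok, "<eos>")

def speculate(prompt, k=4):
    """Draft k tokens autoregressively, then keep the prefix the target
    agrees with, plus one corrected token from the target (the standard
    greedy accept rule)."""
    out = list(prompt)
    # 1. Draft model proposes k tokens.
    proposal, cur = [], out[-1]
    for _ in range(k):
        cur = draft_next(cur)
        proposal.append(cur)
    # 2. Target verifies the whole proposal in one pass.
    accepted, cur = [], out[-1]
    for tok in proposal:
        want = target_next(cur)
        if tok == want:
            accepted.append(tok)
            cur = tok
        else:
            accepted.append(want)  # take the target's correction and stop
            break
    return out + accepted

print(speculate(["the"]))  # -> ['the', 'cat', 'sat', 'on', 'the']
```

The win in practice is that step 2 runs as a single batched forward pass of the target model, so several tokens cost roughly one target-model invocation when the draft model guesses well.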