DefTruth


🎯 #pragma unroll


Owner @xlite-dev, Member @vipshop, Previous @PaddlePaddle, Contributor @vllm-project 📚

1.7k followers · 141 following



🛠 Owner @xlite-dev | Member @vipshop | Previous @PaddlePaddle | Contributor @vLLM 📚

Pinned

xlite-dev/lite.ai.toolkit (Public)

🛠 A lite C++ toolkit containing 100+ awesome AI models (Stable-Diffusion, FaceFusion, YOLO series, face/object detection, segmentation, matting, etc.), with support for MNN, ORT, and TensorRT. 🎉🎉

C++ · ★ 4k · 737 forks
xlite-dev/Awesome-LLM-Inference (Public)

📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, Prefix Cache, Chunked Prefill, PD Disaggregation, etc. 🎉🎉

Python · ★ 3.7k · 264 forks
xlite-dev/CUDA-Learn-Notes (Public)

📚 Modern CUDA learn notes with PyTorch: 200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with the WMMA, MMA, and CuTe APIs (achieving 98% to 100% of the TFLOPS of cuBLAS/FA2 🎉🎉).

CUDA · ★ 3.1k · 328 forks
PaddlePaddle/FastDeploy (Public)

⚡️ An easy-to-use and fast deep learning model deployment toolkit for ☁️ cloud, 📱 mobile, and 📹 edge, covering 20+ mainstream image, video, text, and audio scenarios and 150+ SOTA models with end-to-end…

C++ · ★ 3.1k · 477 forks
xlite-dev/ffpa-attn-mma (Public)

📚 FFPA (Split-D): Yet another faster flash prefill attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑ 🎉 vs. SDPA EA.

CUDA · ★ 157 · 7 forks
vllm-project/vllm (Public)

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · ★ 43.1k · 6.5k forks