Shanghai University (SHU)
Baoshan, Shanghai (UTC +08:00)
Zhihu: https://www.zhihu.com/people/drew-44-8
Google Scholar: https://scholar.google.com.hk/citations?user=L220uBgAAAAJ&hl=zh-CN
Stars
Hackable and optimized Transformers building blocks, supporting a composable construction.
A curated collection of noteworthy MLSys bloggers (algorithms/systems).
Puzzles for learning Triton; play them with minimal environment configuration!
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Xiao's CUDA Optimization Guide [Actively Adding New Content]
Simplified Chinese translation of the well-known CMake tutorial Modern CMake. Chinese GitBook: https://modern-cmake-cn.github.io/Modern-CMake-zh_CN/
Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL, even under multithreading.
A complete CMake tutorial: a series of step-by-step tasks that introduce CMake and show how to accomplish concrete goals.
Development repository for the Triton language and compiler
LLM notes, covering model inference, Transformer model structure, and LLM framework code analysis.
Chinese NLP solutions (large models, data, models, training, inference).
Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs
Learning materials for Stanford CS149: Parallel Computing
Official implementation of the paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models'.
RWKV (pronounced RwaKuv) is an RNN with great LLM performance that can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". It combines the best of RNN and Transformer (a toy sketch of the sequential-vs-parallel idea follows after this list).
[NeurIPS'24 Spotlight] To speed up long-context LLM inference, attention is computed with approximate, dynamic sparse patterns, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy (a hedged block-sparse sketch follows after this list).
Tensors and Dynamic neural networks in Python with strong GPU acceleration
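
As noted in the RWKV item above, the attraction is a recurrence that runs step by step at inference time yet can still be evaluated over a whole sequence in parallel for training. The sketch below is only a toy illustration of that general idea using a plain linear recurrence; it is not RWKV's actual time-mixing formulation, and the function names are made up for this example.

```python
# Toy illustration of a recurrence with both a sequential (RNN-style)
# and a parallel (training-friendly) evaluation. This is a generic
# linear recurrence, NOT RWKV's actual equations.
import numpy as np

def sequential_scan(x, a):
    """h_t = a * h_{t-1} + x_t, evaluated one step at a time (inference-style)."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + x[t]
        out[t] = h
    return out

def parallel_scan(x, a):
    """Same recurrence unrolled to h_t = sum_{s<=t} a^(t-s) * x_s,
    so the whole sequence is computed in one matrix product (training-style)."""
    T = x.shape[0]
    powers = a ** (np.arange(T)[:, None] - np.arange(T)[None, :])  # a^(t-s)
    weights = np.tril(powers)                                      # causal: s <= t
    return weights @ x

x = np.random.default_rng(1).standard_normal((8, 4))
a = 0.9
print(np.allclose(sequential_scan(x, a), parallel_scan(x, a)))  # True
```

Both paths produce identical outputs; the sequential form needs O(1) state per step, while the unrolled form exposes the whole sequence to parallel hardware at once, which is the trade-off the RWKV description is pointing at.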
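The MInference item above mentions approximate, dynamic sparse attention for pre-filling. The sketch below is a rough NumPy illustration of block-sparse causal attention with a made-up "sink block + recent blocks" pattern; it is not MInference's actual method, just a sketch of why skipping most key blocks cuts prefill cost.

```python
# Minimal sketch of block-sparse causal attention for pre-filling.
# The sparsity pattern here (first block + a few recent blocks) is a
# made-up illustration, NOT the pattern MInference actually builds.
import numpy as np

def block_sparse_prefill_attention(q, k, v, block=64, local=4):
    n, d = q.shape
    out = np.zeros_like(v)
    n_blocks = (n + block - 1) // block
    for qb in range(n_blocks):
        q_lo, q_hi = qb * block, min((qb + 1) * block, n)
        # keep the "sink" block 0 and the `local` most recent key blocks
        kept_blocks = sorted({0} | set(range(max(0, qb - local + 1), qb + 1)))
        k_idx = np.concatenate([
            np.arange(kb * block, min((kb + 1) * block, n)) for kb in kept_blocks
        ])
        scores = q[q_lo:q_hi] @ k[k_idx].T / np.sqrt(d)
        # causal mask within the gathered keys
        causal = k_idx[None, :] > np.arange(q_lo, q_hi)[:, None]
        scores = np.where(causal, -np.inf, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[q_lo:q_hi] = weights @ v[k_idx]
    return out

# Toy usage: 512 tokens, 32-dim head.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((512, 32)) for _ in range(3))
print(block_sparse_prefill_attention(q, k, v).shape)  # (512, 32)
```

With block=64 and local=4, each query block touches at most five key blocks instead of all of them, which is where the latency savings in this family of schemes come from.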