Stars
LaTeX template for master's and doctoral dissertations at the University of Chinese Academy of Sciences (UCAS), written to the requirements of the Guiding Opinions on the Standards for Writing Graduate Dissertations at UCAS (校发学位字[2022]40号, Attachment 1)
LaTeX Thesis Template for the University of Chinese Academy of Sciences
Itheima (黑马程序员)'s latest hands-on Java project, Cangqiong Waimai (苍穹外卖): an enterprise-grade Spring Boot + SSM project well suited to beginners. Compared with Ruiji Waimai (瑞吉外卖), its business logic is more realistic and complete: the client side is a WeChat mini program, login uses WeChat login, and it adds statistical reports, new-order alerts, customer order reminders, and order management, closing the business loop. The technology stack is also richer and more practical, so it can be seen as an enhanced Ruiji Waimai.
An introductory tutorial on recommender systems; read online at https://datawhalechina.github.io/fun-rec/
Easy-to-Use RAG framework; third-place (Top 3) solution in the CCF AIOps International Challenge 2024
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Training-free, post-training attention with efficient sub-quadratic complexity, implemented with OpenAI Triton.
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
Development repository for the Triton language and compiler
Inference optimization of the ViT model using TensorRT, NVIDIA's high-performance deep learning inference platform, which is designed to maximize the efficiency of deep learning models during inference.
Runner-up solution in the TensorRT 2022 competition: accelerating the MobileViT model with TensorRT
Flash Attention in ~100 lines of CUDA (forward pass only); a Python sketch of the underlying online-softmax tiling follows after this list
How to optimize common algorithms in CUDA.
A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment.
A great project for campus recruiting (fall and spring hiring) and internships! Build a high-performance deep learning inference library from scratch, supporting inference for models such as LLaMA 2, UNet, YOLOv5, and ResNet. Implement a high-performance deep learning inference library step by step.
Official code for the paper: Token Summarisation for Efficient Vision Transformers via Graph-based Token Propagation
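
The Flash Attention entry above points at a CUDA implementation of the forward pass. As a language-agnostic illustration (not code from that repo), here is a minimal NumPy sketch of the online-softmax tiling that FlashAttention-style kernels implement; the function name `attention_tiled` and the block size are illustrative assumptions.

```python
import numpy as np

def attention_tiled(Q, K, V, block=64):
    """Attention forward pass with online softmax over K/V blocks.

    This is the tiling trick at the heart of FlashAttention-style kernels:
    scores are computed one K/V block at a time, and a running row-wise max
    (m) and softmax denominator (l) are rescaled on the fly so the full
    N x N score matrix is never materialized.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)  # running row-wise max of the scores
    l = np.zeros(N)          # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = (Q @ Kj.T) * scale                 # scores against this K block
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)              # rescales previously accumulated stats
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

# Sanity check against naive softmax attention.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
assert np.allclose(attention_tiled(Q, K, V), (P / P.sum(axis=1, keepdims=True)) @ V)
```

A real CUDA kernel keeps each Q block and the running statistics in on-chip SRAM and fuses the whole loop into one kernel launch; the rescaling arithmetic, however, is exactly the one shown here.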