- Shanghai Jiao Tong University
- Shanghai
- https://scholar.google.com/citations?user=6aARLhMAAAAJ&hl=zh-CN
Starred repositories
Fully open reproduction of DeepSeek-R1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Mixture-of-Experts for Large Vision-Language Models
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Ongoing research on training transformer models at scale
Example models using DeepSpeed
Official implementation for "TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables" (NeurIPS 2024)
An open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module, lmms-eval.
A collection of papers on autoregressive models in vision.
This repository contains a reading list of papers on Time Series Forecasting/Prediction (TSF) and Spatio-Temporal Forecasting/Prediction (STF). These papers are mainly categorized according to the …
[ICLR 2025 Spotlight] Official implementation of "Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts"
[TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling"
This repository is for the paper "From News to Forecast: Integrating Event Analysis in LLM-based Time Series Forecasting with Reflection" (NeurIPS 2024)
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Official PyTorch implementation of our CVPR 2023 paper "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization"
PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377
CVPR 2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Unified Training of Universal Time Series Forecasting Transformers
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".