Stars
(CVPR 2024) RMT: Retentive Networks Meet Vision Transformer
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
Triton implementation of bi-directional (non-causal) linear attention
🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
Fast and memory-efficient exact attention
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Official data preparation and metric evaluation scripts for the Interspeech 2025 URGENT challenge.
Utilizes ONNX Runtime for audio denoising.
Official implementation of the ECAI 2024 conference paper "SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM"
A flexible, high-performance 3D simulator for Embodied AI research.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Implementation of Rotary Embeddings, from the RoFormer paper, in PyTorch
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
This is the repository for the speech enhancement model SyncFormer
A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" (NeurIPS 2024)
ModelScope: bring the notion of Model-as-a-Service to life.
Code for the creation of CommonVoice-DEMAND speech enhancement datasets
Model configurations for scaling speech enhancement models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement"
A first-of-its-kind acoustic simulation platform for audio-visual embodied AI research. It supports training and evaluating multiple tasks and applications.