Stars
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, a no-code agent builder, and more.
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
[ACM SIGCOMM 2024] "m3: Accurate Flow-Level Performance Estimation using Machine Learning" by Chenning Li*, Arash Nasr-Esfahany*, Kevin Zhao, Kimia Noorbakhsh, Prateesh Goyal, Mohammad Alizadeh, Th…
A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
Versatile audio super resolution (any -> 48kHz) with AudioSR.
[TMLR 2024] Efficient Large Language Models: A Survey
DeepAFx-ST - Style transfer of audio effects with differentiable signal processing. Please see https://csteinmetz1.github.io/DeepAFx-ST/
👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
[ICCV 2023] Official implementation of the paper: "DIRE for Diffusion-Generated Image Detection"
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Defending against Adversarial Audio via Diffusion Model (ICLR 2023)
The official implementation of the USENIX Security'23 paper "Meta-Sift": in ten minutes or less, find a clean subset of 1,000 or more samples in a poisoned dataset.
Code and documentation to train Stanford's Alpaca models, and generate the data.
This is the code of ICLR 2022 Oral paper 'Non-Transferable Learning: A New Approach for Model Ownership Verification and Applicability Authorization'.
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
A latent text-to-image diffusion model