Stars
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
🎓 Daily Updates of Talking-Face Research Papers, Now Integrated with LLM Analysis.
The code used to train and run inference with the ColPali architecture.
An extension that lets the AI take the wheel, allowing it to use the mouse and keyboard, recognize UI elements, and prompt itself :3. It can now also act as a research assistant.
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
Implementation of AlphaFold 3 in PyTorch Lightning + Hydra
Implementation of AlphaFold 3 from Google DeepMind in PyTorch
Entropy Based Sampling and Parallel CoT Decoding
Kolmogorov-Arnold Transformer: A PyTorch Implementation with CUDA kernel
Mixture-of-Experts for Large Vision-Language Models
A generative speech model for daily dialogue.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Speech-to-text forced alignment; Speech processing Universal PERformance Benchmark
Text to speech alignment using CTC forced alignment
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
An implementation of piecewise linear time warping for multi-dimensional time series alignment
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Official implementation of EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
We aim to build the most powerful Next.js boilerplate, so people don't have to write the code for the same features again and again.
Using Claude 3.5 Sonnet to forward-engineer (reverse-engineer) code from the VASA white paper (WIP; this is for La Raza 🎷)
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It's 2.8x faster than DeiT and saves 86.8% GPU memory wh…
[NeurIPS2024] Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI2022)