hanif-rt

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python · 6,341 stars · 553 forks · Updated Dec 18, 2024

Puzzles for learning Triton

Jupyter Notebook · 1,206 stars · 93 forks · Updated Nov 18, 2024

🎓 Update Talking-Face Research Papers Daily, Now Integrated with LLM Analysis.

Python · 161 stars · 16 forks · Updated Dec 18, 2024

The code used to train and run inference with the ColPali architecture.

Python · 1,277 stars · 111 forks · Updated Dec 13, 2024

Implementation of F5-TTS in MLX

Python · 391 stars · 34 forks · Updated Dec 13, 2024

An extension that lets the AI take the wheel, allowing it to use the mouse and keyboard, recognize UI elements, and prompt itself :3. It can now also act as a research assistant.

Python · 99 stars · 1 fork · Updated Oct 22, 2024

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python · 8,582 stars · 354 forks · Updated Dec 16, 2024

Implementation of AlphaFold 3 in PyTorch Lightning + Hydra

Python · 32 stars · 7 forks · Updated Oct 4, 2024

Implementation of AlphaFold 3 from Google DeepMind in PyTorch

Python · 1,289 stars · 155 forks · Updated Dec 3, 2024

Entropy Based Sampling and Parallel CoT Decoding

Python · 3,168 stars · 319 forks · Updated Nov 13, 2024

Kolmogorov-Arnold Transformer: A PyTorch Implementation with CUDA kernel

Python · 628 stars · 36 forks · Updated Oct 8, 2024

Mixture-of-Experts for Large Vision-Language Models

Python · 2,024 stars · 126 forks · Updated Dec 3, 2024

A generative speech model for daily dialogue.

Python · 33,030 stars · 3,584 forks · Updated Dec 3, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python · 5,040 stars · 426 forks · Updated Aug 10, 2024

FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3

Python · 175 stars · 11 forks · Updated Apr 20, 2024

Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark

Python · 25 stars · 2 forks · Updated Jul 14, 2024

Text to speech alignment using CTC forced alignment

Python · 175 stars · 35 forks · Updated Oct 30, 2024

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Python · 3,651 stars · 227 forks · Updated Dec 4, 2024

An implementation of piecewise linear time warping for multi-dimensional time series alignment

Python · 172 stars · 38 forks · Updated Aug 15, 2024

Diffusion Feedback Helps CLIP See Better

Python · 228 stars · 12 forks · Updated Aug 24, 2024

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python · 4,732 stars · 586 forks · Updated Jul 2, 2024

Bring portraits to life!

Python · 13,364 stars · 1,423 forks · Updated Nov 12, 2024

Official implementation of EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars

Jupyter Notebook · 328 stars · 18 forks · Updated Oct 6, 2024

We aim to build the most powerful Next.js boilerplate, so that people don't have to write code for the same features again and again.

TypeScript · 70 stars · 3 forks · Updated Apr 5, 2024

Using Claude 3.5 Sonnet to forward (reverse) engineer code from the VASA white paper (WIP; this is for La Raza 🎷)

Python · 249 stars · 30 forks · Updated Nov 9, 2024

Code Implementation of EfficientVMamba

Python · 189 stars · 7 forks · Updated Apr 16, 2024

Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It's 2.8x faster than DeiT and saves 86.8% GPU memory wh…

Python · 408 stars · 19 forks · Updated Nov 25, 2024

[NeurIPS2024] Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model

Python · 54 stars · 3 forks · Updated Sep 29, 2024

PyTorch implementation of "Distinguishing Homophenes using Multi-Head Visual-Audio Memory" (AAAI2022)

Python · 25 stars · 5 forks · Updated Mar 9, 2024