Stars
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
A curated list of awesome voice conversion projects and communities.
Python client for Baidu Yun (Baidu Netdisk, personal cloud storage)
Remote heart rate detection through Eulerian magnification of face videos
Desktop implementation of Remote Photoplethysmography – Measuring heart rate using facial video.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Like cURL, but for gRPC: Command-line tool for interacting with gRPC servers
Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC…
papers about Face Detection; Face Alignment; Face Recognition && Face Identification && Face Verification && Face Representation; Face Reconstruction; Face Tracking; Face Super-Resolution && Face D…
Paper collection about face anti-spoofing
Clone of the mercurial repository http://zbar.hg.sourceforge.net:8000/hgroot/zbar/zbar
👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer (CVPR 2024)
A comprehensive collection of IQA papers
A Rime input schema for Ningbo dialect (Wu Chinese pinyin input scheme)
Multilingual Voice Understanding Model
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
UT-Sarulab MOS prediction system using SSL models
Application of MB-iSTFT-VITS components to vits2_pytorch
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform