[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 753 58 Updated Dec 23, 2024

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,355 177 Updated Feb 14, 2025

face3d0725 / FaceExtraction

Python 40 8 Updated Aug 4, 2024

fullstorydev / grpcurl

Like cURL, but for gRPC: Command-line tool for interacting with gRPC servers

Go 11,330 522 Updated Feb 19, 2025

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC…

C++ 5,147 581 Updated Mar 7, 2025

lovemefan / SenseVoice.cpp

Port of FunASR's SenseVoice model in C/C++

C 274 28 Updated Mar 4, 2025

tiantian91091317 / OCR-Corrector

Uses a language model to correct OCR recognition errors

Python 458 101 Updated May 22, 2023

ChanChiChoi / awesome-Face_Recognition

papers about Face Detection; Face Alignment; Face Recognition && Face Identification && Face Verification && Face Representation; Face Reconstruction; Face Tracking; Face Super-Resolution && Face D…

4,593 968 Updated Feb 9, 2023

RizhaoCai / Awesome-FAS

Paper collection about face anti-spoofing

365 57 Updated Aug 12, 2022

ZBar / ZBar

Clone of the mercurial repository http://zbar.hg.sourceforge.net:8000/hgroot/zbar/zbar

C 2,516 1,067 Updated Mar 18, 2024

DXOMARK-Research / PIQ2023

Jupyter Notebook 94 3 Updated Mar 14, 2024

chaofengc / IQA-PyTorch

👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...

Python 2,295 190 Updated Mar 3, 2025

openmedlab / USFM

Python 244 21 Updated Nov 28, 2024

Q-Future / Q-Bench

①[ICLR2024 Spotlight] (GPT-4V/Gemini-Pro/Qwen-VL-Plus+16 OS MLLMs) A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment.

Jupyter Notebook 253 14 Updated Aug 12, 2024