Stars
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
We want to create a repo to illustrate the usage of transformers in Chinese
Integrated System for Target Detection and Tracking
Multilingual Voice Understanding Model
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
Code for the paper: CNN-generated images are surprisingly easy to spot... for now https://peterwang512.github.io/CNNDetection/
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Framework-agnostic sliced/tiled inference + interactive UI + error analysis plots
Python package for automatic tree crown delineation based on the Detectron2 implementation of Mask R-CNN
Supervoice Speaker Separation Network
[ICML 2024] Official repository of the paper: "Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset"
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
vits2 backbone with multilingual-bert
The official implementation of "Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting"
[CVPR 2022] Rethinking Spatial Invariance of Convolutional Networks for Object Counting
Neural Style implementation in PyTorch! 🎨
A fast PyTorch implementation of "A Neural Algorithm of Artistic Style"
Implementation of paper - Rep-RTADet: Reparameterized Real-Time Algae Object Detectors Enhanced through Dynamic Cache-Based Poisson Fusion