Samsung
- Paris
- https://twitter.com/altamborrino
Stars
Segment documents into coherent parts using word embeddings.
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
A multilingual version of the MS MARCO passage ranking dataset
Benchmarks of approximate nearest neighbor libraries in Python
Python library containing BART query generation and BERT-based Siamese models for neural retrieval.
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Manipulate audio with a simple, easy high-level interface
⚡ TensorFlowASR: Almost state-of-the-art automatic speech recognition in TensorFlow 2. Supports languages that can be written with characters or subwords
Dataset for lightly supervised training using the LibriVox audiobook recordings (https://librivox.org/).
Automatic differentiation with weighted finite-state transducers.
Fast Block Sparse Matrices for PyTorch
Python bindings for FFmpeg - with complex filtering support
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework-agnostic interface.
Shared repository for open-sourced projects from the Google AI Language team.
Resources for the MRQA 2019 Shared Task
Simple text-to-phones converter for multiple languages
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
S3D Text-Video model trained on HowTo100M using MIL-NCE
A resource to create a multi-domain dialog act tagger for conversational agents using publicly available data
BLEURT is a metric for Natural Language Generation based on transfer learning.