Stars
GeoCov v2 (requires Twitter API v2 access to produce additional datasets). Dataset for geographically-specified tweets during Covid and the year prior, along with scripts for producing more for oth…
Pretrained BERT model for analysing COVID-19 Twitter data
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
Home repository for the Regularized Greedy Forest (RGF) library. It includes the original implementation from the paper and a multithreaded one written in C++, along with various language-specific wrappers.
A robust Python tool for text-based AI training and generation using GPT-2.
Code and Data for ACL 2020 paper "Few-Shot NLG with Pre-Trained Language Model"
Covid-19 Twitter dataset for non-commercial research use and pre-processing scripts - under active development
End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service
Text generation with a Variational Autoencoder
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Code for the Dligach and Miller 2018 paper "Learning Patient Representations from Text"
Deezer source separation library including pretrained models.
A Python library for easy data analysis, visualization, exploration and modeling
Flexible Quantum Circuit Simulator (qFlex) implements an efficient tensor network, CPU-based simulator of large quantum circuits.
Military Service Identification Tool - Example Codebase
The submission template for the MineRL Competition @ NeurIPS 2021. Clone this to make a new submission!
A collection of baselines for the MineRL environment/datasets & the NeurIPS 2021 MineRL competitions
Ongoing research training transformer models at scale
State-of-the-Art Text Embeddings
The first public PyTorch implementation of Skip-Thought Vectors
📄 A PyTorch implementation of Paragraph Vectors (doc2vec).