
IIT Madras, AI4Bharat
- Hyderabad
- https://sumanthd17.github.io
- @sumanthd17
Stars
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Code for training, evaluating and using a cross-lingual Auto Evaluator
Text-to-Speech for languages of India
FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists
Translation models for 22 scheduled languages of India
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Tool to fix bitexts and tag near-duplicates for removal
Code for EMNLP 2022 Paper: On the Calibration of Massively Multilingual Language Models
ML has an impact on the climate. But not all models are born equal. Compute your model's emissions with our calculator and add the results to your paper with our generated LaTeX template.
Official code for "Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models" to appear in WMT 2022.
Facebook Low Resource (FLoRes) MT Benchmark
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
terashuf shuffles multi-terabyte text files using limited memory
Seminar on Large Language Models (COMP790-101 at UNC Chapel Hill, Fall 2022)
📝 A not-so-fancy but still a pretty research CV 🎆 🎉
AI4Bharat / webcorpus
Forked from divkakwani/webcorpus
Generate large textual corpora for almost any language by crawling the web
indicTranslate v1 - Machine Translation for 11 Indic languages. For the latest v2, see: https://github.com/AI4Bharat/IndicTrans2
Archived old website for AI4Bhārat Indic-NLP
Agile reading group that works
Pretraining, fine-tuning and evaluation scripts for Indic-Wav2Vec2
AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems