Highlights

  • Pro

Organizations

@AI4Bharat

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,571 182 Updated Mar 4, 2025

Code for training, evaluating and using a cross-lingual Auto Evaluator

Python 4 Updated Oct 18, 2024

Text-to-Speech for languages of India

Jupyter Notebook 212 54 Updated Nov 8, 2024

LLM101n: Let's build a Storyteller

32,214 1,742 Updated Aug 1, 2024

FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists

Python 28 3 Updated Dec 6, 2024
Python 2,759 310 Updated Jan 30, 2025

Translation models for 22 scheduled languages of India

Python 283 74 Updated Jan 18, 2025
Python 8 Updated Oct 26, 2023

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,665 355 Updated Dec 7, 2024

Tool to fix bitexts and tag near-duplicates for removal

Python 29 3 Updated Feb 5, 2025

Code for EMNLP 2022 Paper: On the Calibration of Massively Multilingual Language Models

Python 14 2 Updated Jun 12, 2023

ML has an impact on the climate. But not all models are born equal. Compute your model's emissions with our calculator and add the results to your paper with our generated latex template

HTML 219 41 Updated Nov 12, 2024

Official code for "Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models" to appear in WMT 2022.

Python 17 5 Updated Oct 3, 2023

Facebook Low Resource (FLoRes) MT Benchmark

Python 721 125 Updated Nov 20, 2023

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Python 1,297 122 Updated Apr 24, 2024

Tools for checking ACL paper submissions

Python 670 48 Updated Oct 20, 2024

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python 31,083 6,486 Updated Jan 9, 2025

terashuf shuffles multi-terabyte text files using limited memory

C++ 212 15 Updated Feb 5, 2023

Yet Another Neural Machine Translation Toolkit

Python 177 32 Updated Jul 3, 2024

Seminar on Large Language Models (COMP790-101 at UNC Chapel Hill, Fall 2022)

310 17 Updated Nov 21, 2022

📝 A not-so-fancy but still a pretty research CV 🎆 🎉

TeX 72 20 Updated May 16, 2021

Generate large textual corpora for almost any language by crawling the web

Python 12 6 Updated Feb 17, 2024

indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2

Jupyter Notebook 123 32 Updated Jan 2, 2024

Archived old website for AI4Bhārat Indic-NLP

HTML 5 14 Updated Sep 10, 2022

Agile reading group that works

13 1 Updated Feb 2, 2022
Jupyter Notebook 6,204 1,051 Updated Sep 22, 2024

Pretraining, fine-tuning and evaluation scripts for Indic-Wav2Vec2

Jupyter Notebook 82 28 Updated Mar 14, 2024

AI Assistant for Building Reliable, High-performing and Fair Multilingual NLP Systems

Python 45 9 Updated Aug 19, 2022