Stars
Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
Natural Questions (NQ) contains real user questions issued to Google search, and answers found from Wikipedia by annotators. NQ is designed for the training and evaluation of automatic question ans…
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
A high-throughput and memory-efficient inference and serving engine for LLMs
METER: A Multimodal End-to-end TransformER Framework
Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021
Implementation of focal loss in PyTorch for unbalanced classification.
💫 A spaCy package for Yohei Tamura's Rust tokenizations library
Robust and fast tokenizations alignment library for Rust and Python: https://tamuhey.github.io/tokenizations/
Quality information extraction at web scale.
SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction
Chemical-disease relation extraction via maximum-entropy classifiers
Source code for the EMNLP 2019 paper: "Connecting the Dots: Document-level Relation Extraction with Edge-oriented Graphs"
Global-to-Local Neural Networks for Document-Level Relation Extraction, EMNLP 2020
A corpus of Biomedical papers annotated with mentions of UMLS entities.
Dataset and code for the ACL 2019 paper "DocRED: A Large-Scale Document-Level Relation Extraction Dataset"
Source code for EMNLP 2020 paper "Denoising Relation Extraction from Document-level Distant Supervision"
Data/Code Repository for https://api.semanticscholar.org/CorpusID:218470122
SciDTB: Discourse Dependency TreeBank for Scientific Abstracts
GitHub repo for "CODA-19: Using a Non-Expert Crowd to Annotate Research Aspects on 10,000+ Abstracts in the COVID-19 Open Research Dataset" (https://arxiv.org/abs/2005.02367)
Set up and customize a deep learning environment in seconds.
A Python implementation of global optimization with Gaussian processes.
A collection of annotated biomedical corpora, which can be used for training supervised machine learning methods for various tasks in biomedical text-mining and information extraction.