Stars
Evaluate your LLM's response with Prometheus and GPT4 💯
Unsupervised text tokenizer for Neural Network-based text generation.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Official inference library for Mistral models
Repo for the paper "Detecting Logical Fallacies: From Quiz to Climate Change News" (2021)
WinoGrande: An Adversarial Winograd Schema Challenge at Scale
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
DadmaTools is a Persian NLP toolkit developed by Dadmatech Co.
SemEval2024-task8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection
M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection
A tool for extracting plain text from Wikipedia dumps
Open source code for the paper "Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere" (ICML 2020)
ssnl / moco_align_uniform
Forked from facebookresearch/moco
MoCo with Alignment and Uniformity Loss.
PromptBERT: Improving BERT Sentence Embeddings with Prompts
TensorFlow implementation of "On the Sentence Embeddings from Pre-trained Language Models" (EMNLP 2020)
A prize for finding tasks that cause large language models to show inverse scaling
TensorFlow code and pre-trained models for BERT
Multi-Task Deep Neural Networks for Natural Language Understanding
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
State-of-the-Art Text Embeddings
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
Datasets, tools, and benchmarks for representation learning of code.
A free and unlimited Python API for Google Translate.
Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle