Stars
Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.
From-scratch implementations of every algorithm in Li Hang's book Statistical Learning Methods (《统计学习方法》)
Chinese NLP corpus of Chinese science fiction: about 4,675 Chinese science fiction novels, collected as a text corpus and database for natural language processing
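MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. MNBVC covers not only mainstream culture but also niche subcultures, down to "Martian script" (火星文) internet slang. It includes news, essays, novels, books, magazines, papers, film and TV dialogue, forum posts, wikis, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and every other form of plain-text Chinese data.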
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
A quick guide (especially) for trending instruction finetuning datasets
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
Large Scale Chinese Corpus for NLP
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
Evaluate computational models on their alignment to behavioral and neural measurements in the domain of language
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
[PNAS'21] The neural architecture of language: Integrative modeling converges on predictive processing
Examples and guides for using the OpenAI API
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Matplotlib styles for scientific plotting
A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models.
Provide Semantic Parsing solutions and Natural Language Inferences for multiple languages following the idea of the syntax-semantics interface.
Representation Engineering: A Top-Down Approach to AI Transparency
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Scripts for assessing multilingual BERT.
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)
Code and plots accompanying paper 'Periodizing Samuel Beckett’s Works: A Stylochronometric Approach' (under review)
Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥