yikang0131

Highlights

  • Pro

Organizations

@sjtu-compling


Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

Python · 280 stars · 24 forks · Updated Jul 12, 2024

A public Chinese chat corpus

Python · 4,053 stars · 788 forks · Updated Apr 23, 2024

Hand-written implementations of every algorithm in Li Hang's book "Statistical Learning Methods" (《统计学习方法》)

Python · 11,191 stars · 2,889 forks · Updated Nov 13, 2024

Some e-books from the Internet

JavaScript · 165 stars · 55 forks · Updated Mar 7, 2017

Chinese NLP corpus of Chinese science fiction: about 4,675 Chinese science fiction novels, usable as a Chinese science fiction text corpus and text database for natural language processing.

381 stars · 62 forks · Updated Oct 22, 2022

Collection of Chinese Books: Chinese books gathered and organized

Python · 174 stars · 33 forks · Updated Dec 27, 2023

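MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, aiming to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, down to "Martian script" (火星文, stylized internet slang). It includes every form of plain-text Chinese data: news, essays, novels, books, magazines, papers, scripts and dialogue lines, forum posts, wiki pages, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.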

3,661 stars · 255 forks · Updated Jan 24, 2025

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python · 21,209 stars · 2,742 forks · Updated Aug 15, 2024

A quick guide, especially to trending instruction-finetuning datasets

2,806 stars · 181 forks · Updated Nov 28, 2023

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

Python · 40 stars · 5 forks · Updated Nov 30, 2024

Large Scale Chinese Corpus for NLP

9,567 stars · 1,552 forks · Updated May 23, 2024

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions

Python · 688 stars · 72 forks · Updated Jan 30, 2025

Evaluate computational models on their alignment to behavioral and neural measurements in the domain of language

Python · 31 stars · 16 forks · Updated Aug 19, 2024

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.

Python · 28,886 stars · 3,426 forks · Updated Jan 22, 2025

[PNAS'21] The neural architecture of language: Integrative modeling converges on predictive processing

Jupyter Notebook · 55 stars · 17 forks · Updated Oct 25, 2023

Examples and guides for using the OpenAI API

MDX · 61,440 stars · 9,866 forks · Updated Jan 30, 2025

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python · 8,148 stars · 495 forks · Updated May 3, 2024

Matplotlib styles for scientific plotting

Python · 7,393 stars · 715 forks · Updated Jan 13, 2025

A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models.

Python · 101 stars · 25 forks · Updated Oct 4, 2023

Provide Semantic Parsing solutions and Natural Language Inferences for multiple languages following the idea of the syntax-semantics interface.

Python · 236 stars · 62 forks · Updated Dec 18, 2023

Representation Engineering: A Top-Down Approach to AI Transparency

Jupyter Notebook · 780 stars · 89 forks · Updated Aug 14, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python · 20,656 stars · 2,588 forks · Updated Jan 7, 2025

Scripts for assessing multilingual BERT.

Python · 5 stars · 2 forks · Updated Aug 17, 2021

Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)

Perl · 69 stars · 3 forks · Updated Apr 23, 2024

Code and plots accompanying paper 'Periodizing Samuel Beckett’s Works: A Stylochronometric Approach' (under review)

Python · 2 stars · 4 forks · Updated Oct 16, 2015

Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

Python · 1,265 stars · 119 forks · Updated Dec 1, 2023