☂️
I walk in the rain.
Stars

Datasets

Datasets, corpora, and other data that may come in handy
7 repositories

A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese

Python 1,269 293 Updated Apr 21, 2024

Code for "Jhamtani H.*, Gangal V.*, Hovy E. and Nyberg E. Shakespearizing Modern Language Using Copy-Enriched Sequence to Sequence Models" Workshop on Stylistic Variation, EMNLP 2017

OpenEdge ABL 72 29 Updated Apr 20, 2021

Chinese synonym list (Chinese Synonyms)

250 49 Updated Jan 20, 2018

WuDaoMM: this is a data project

70 5 Updated Apr 29, 2022

A collection of the extractive MRC (machine reading comprehension) datasets available so far for Chinese

119 15 Updated Jun 20, 2024

A database of 100,000 Chinese song lyrics

463 77 Updated Jun 13, 2021

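MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also various niche subcultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripted lines, forum posts, wiki pages, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.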

3,703 259 Updated Feb 18, 2025