🏳️‍⚧️
Back to be a programmer

Stars

🍭Data

Data and dataset
34 repositories

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

Ruby 2,713 209 Updated Aug 5, 2024

A large dataset of 4.2M Java source code samples with parallel descriptions, drawn from code search and code summarization studies.

Python 52 9 Updated Feb 24, 2022

newspaper3k is a library for news, full-text, and article metadata extraction in Python 3.

Python 14,222 2,116 Updated Jul 23, 2024

生如夏花 ("Life Like Summer Flowers") knowledge base

TypeScript 150 15 Updated Aug 22, 2024

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activelo…

Python 8,250 631 Updated Dec 17, 2024

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)

Python 1,190 101 Updated Oct 1, 2024

Web queries dataset for code search

31 1 Updated Jun 3, 2023

Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA): https://circse.github.io/LT4HALA/

Python 32 14 Updated May 28, 2024

SikuBERT: a pre-trained language model for the Siku Quanshu (四库全书)

117 14 Updated Jul 30, 2023
Python 23 3 Updated Jun 19, 2024

A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese

Python 1,208 274 Updated Apr 21, 2024

Auto generated Dash docset feed for .gitlab-ci.yml

JavaScript 4 Updated Jul 14, 2022
Python 44 14 Updated Nov 1, 2024

An AI tool that automatically recommends commit messages.

Python 385 16 Updated Aug 8, 2023

Toutiao (今日头条) Chinese news text classification dataset

Python 359 62 Updated May 19, 2021

A collection of the extractive machine reading comprehension (MRC) datasets available so far for Chinese

118 14 Updated Jun 20, 2024

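A dataset of 3,000,000+ examples for semantic understanding and matching. It can be used with unsupervised contrastive learning, semi-supervised learning, and similar methods to build the best-performing pre-trained models for Chinese.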

Python 289 38 Updated Oct 11, 2022

Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.

Python 202 48 Updated Jul 13, 2020
Python 43 8 Updated Nov 2, 2022

Code and data for "Impact of Evaluation Methodologies on Code Summarization" in ACL 2022.

Python 10 1 Updated Sep 6, 2022

A list and count of keywords in programming languages.

Go 198 21 Updated Oct 26, 2024

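Administrative divisions of the People's Republic of China: province level (provinces), prefecture level (cities), county level (districts and counties), township level (townships, towns, and subdistricts), and village level (village and residents' committees). Cascading address data for China's provinces, cities, districts, towns, and villages at two, three, four, and five levels.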

JavaScript 18,772 7,053 Updated Nov 28, 2024

CCL2020: data release for the "Xiaoniu Cup" (小牛杯) humor computation task

21 4 Updated Aug 27, 2024

A database of 100,000 Chinese song lyrics

456 75 Updated Jun 13, 2021

COVID-19 open data in Shanghai

Jupyter Notebook 121 12 Updated Jul 28, 2022

Semantic Code Search Tool based on Machine Translation

Jupyter Notebook 5 3 Updated Nov 21, 2022

Cross-Domain Deep Code Search with Few-Shot Learning

Python 11 10 Updated Jul 5, 2023

A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network

Python 281 19 Updated Sep 28, 2024

A catalog of more than 400 design patterns collected from multiple sources

2 Updated Mar 5, 2018

Supplementary materials for the paper "Mining Attributes of Design Patterns: A Case Study on Online Posts"

C 1 1 Updated Feb 17, 2020