Sewens

🏳️‍⚧️

Back to be a programmer

NLP, TouHou, learning hard on Lambda Calculus. May my friends live long and prosper.

48 followers · 95 following

Lawbda Co.
PRC::LiaoNing
pages.lawbda.org
@LaWbda

Achievements

Stars

🍭Data

Data and dataset

34 repositories

igrigorik / gharchive.org

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

Ruby 2,713 209 Updated Aug 5, 2024

csebuetnlp / CoDesc

A large dataset of 4.2M Java source code samples paired with their descriptions, drawn from code search and code summarization studies.

Python 52 9 Updated Feb 24, 2022

codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

Python 14,222 2,116 Updated Jul 23, 2024

viva-la-vita / wiki

生如夏花 ("Life Like Summer Flowers") knowledge base

TypeScript 150 15 Updated Aug 22, 2024

activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activelo…

Python 8,250 631 Updated Dec 17, 2024

bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)

Python 1,190 101 Updated Oct 1, 2024

microsoft / Search4Code

Web queries dataset for code search

31 1 Updated Jun 3, 2023

CIRCSE / LT4HALA

Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA): https://circse.github.io/LT4HALA/

Python 32 14 Updated May 28, 2024

hsc748NLP / SikuBERT-for-digital-humanities-and-classical-Chinese-information-processing

SikuBERT: a pre-trained language model for the Siku Quanshu (四库全书), also called Siku BERT

117 14 Updated Jul 30, 2023

mly-nlp / LJP-MSJudge

Python 23 3 Updated Jun 19, 2024

NiuTrans / Classical-Modern

A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese

Python 1,208 274 Updated Apr 21, 2024

xingrz / docset_gitlab-ci

Auto generated Dash docset feed for .gitlab-ci.yml

JavaScript 4 Updated Jul 14, 2022

geopanag / pandemic_tgnn

Python 44 14 Updated Nov 1, 2024

graykode / commit-autosuggestions

A tool that uses AI to automatically recommend commit messages.

Python 385 16 Updated Aug 8, 2023

aceimnorstuvwxz / toutiao-text-classfication-dataset

Toutiao Chinese news (text) classification dataset

Python 359 62 Updated May 19, 2021

sherlcok314159 / ChineseMRC-Data

A collection of the extractive MRC datasets available so far for Chinese

118 14 Updated Jun 20, 2024

CLUEbenchmark / SimCLUE

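A semantic understanding and matching dataset with 3,000,000+ examples. Can be used for unsupervised contrastive learning, semi-supervised learning, and similar methods to build the best-performing pre-trained models for Chinese.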

Python 289 38 Updated Oct 11, 2022

EdinburghNLP / code-docstring-corpus

Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.

Python 202 48 Updated Jul 13, 2020

Jun-jie-Huang / CoCLR

Python 43 8 Updated Nov 2, 2022

EngineeringSoftware / time-segmented-evaluation

Code and data for "Impact of Evaluation Methodologies on Code Summarization" in ACL 2022.

Python 10 1 Updated Sep 6, 2022

e3b0c442 / keywords

A list and count of keywords in programming languages.

Go 198 21 Updated Oct 26, 2024

modood / Administrative-divisions-of-China

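Administrative divisions of the People's Republic of China: province level (provinces), prefecture level (cities), county level (districts and counties), township level (townships and subdistricts), and village level (village committees and residents' committees). Two-, three-, four-, and five-level cascading address data for China's provinces, cities, districts, towns, and villages.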

JavaScript 18,772 7,053 Updated Nov 28, 2024

DUTIR-Emotion-Group / CCL2020-Humor-Computation

CCL2020: data release for the "Xiaoniu Cup" (小牛杯) humor computation task

21 4 Updated Aug 27, 2024

dengxiuqi / ChineseLyrics

A database of 100,000 Chinese song lyrics

456 75 Updated Jun 13, 2021

Bai-Yu-Lan / SH-COVID19

COVID-19 open data in Shanghai

Jupyter Notebook 121 12 Updated Jul 28, 2022

AuthEceSoftEng / CodeTransformer

Semantic Code Search Tool based on Machine Translation

Jupyter Notebook 5 3 Updated Nov 21, 2022

fewshotcdcs / CDCS

Cross-Domain Deep Code Search with Few-Shot Learning

Python 11 10 Updated Jul 5, 2023

IllDepence / unarXive

A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network

Python 281 19 Updated Sep 28, 2024

WoodenHeadoo / design-pattern-catalog

A catalog of more than 400 design patterns collected from multiple sources

2 Updated Mar 5, 2018

WoodenHeadoo / design-pattern-attributes

Supplementary materials for the paper "Mining Attributes of Design Patterns: A Case Study on Online Posts"

C 1 1 Updated Feb 17, 2020