Awesome_Korean_Data

Links to Korean datasets

Korean text data

2. A dataset of 50 Naver News articles selected from the IT/science section, with the sentences that make up each article's summary tagged
https://github.com/theeluwin/sci-news-sum-kr-50

3. Naver sentiment movie corpus v1.0 (Naver movie reviews separated into negative and positive reviews)
https://github.com/e9t/nsmc
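
A minimal sketch of reading this corpus, assuming the files are tab-separated with a header row of `id`, `document`, and `label` (0 for negative, 1 for positive), as the NSMC repository describes. The function name `read_nsmc` and the sample rows are made up for illustration; to use it on the real data, pass in the contents of a downloaded file such as `ratings_train.txt`.

```python
import csv
import io
from collections import Counter


def read_nsmc(tsv_text):
    """Parse NSMC-style TSV text into a list of (document, label) pairs.

    Assumes a header row with the columns: id, document, label.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # reviews may contain stray quote characters
    )
    pairs = []
    for row in reader:
        # Skip rows with an empty review text or an unexpected label.
        if not row.get("document") or row.get("label") not in ("0", "1"):
            continue
        pairs.append((row["document"], int(row["label"])))
    return pairs


# Tiny made-up sample in the assumed format (not real corpus rows).
sample = "id\tdocument\tlabel\n1\t정말 재미있어요\t1\n2\t시간 낭비였다\t0\n3\t\t1\n"
pairs = read_nsmc(sample)
counts = Counter(label for _, label in pairs)
print(len(pairs), counts[0], counts[1])
```

Quoting is disabled because the review text is free-form; with the default CSV quoting rules, an unmatched `"` inside a review could merge several lines into one field.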

4. Data built from a download of Naver sentiment movie corpus v1.0, with the sentiment labels broken down into finer categories
Labels: 'toxic', 'obscene', 'threat', 'insult', 'identity_hate'
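
The label set above can be turned into a fixed-order vector for training a classifier. The sketch below assumes that a single review may carry several of these labels at once, which the description here does not confirm; `LABELS` and `to_multi_hot` are hypothetical names, not part of the dataset.

```python
# Label names as listed above, in a fixed order.
LABELS = ["toxic", "obscene", "threat", "insult", "identity_hate"]


def to_multi_hot(tags):
    """Encode a collection of label names as a 0/1 vector ordered like LABELS.

    Assumption: one review can have zero, one, or several labels.
    """
    unknown = set(tags) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in tags else 0 for name in LABELS]


vec = to_multi_hot(["toxic", "insult"])
print(vec)
```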

5. Paired Question (data for deciding whether the two questions in a pair ask the same thing or different things)
https://github.com/songys/Question_pair

6. A technical report on defining Korean named entities and standardizing their tags, and a named-entity morpheme corpus built on that report
https://github.com/kmounlp/NER

9. Korean-English / Korean-French parallel corpus (for translation)

Public Data Portal, news big data analysis information (analysis materials based on the 'Kinds' news database, article metadata) https://www.data.go.kr/dataset/15012945/fileData.do
