ZEN/datasets at master · sinovation/ZEN

Tasks and datasets used in our experiments

Chinese word segmentation (CWS):

The MSR dataset from the SIGHAN 2005 Chinese Word Segmentation Bakeoff.

Part-of-speech (POS) tagging:

The CTB5 (Chinese Treebank 5) dataset with the standard splits.

Named entity recognition (NER):

The MSRA dataset from the International Chinese Language Processing Bakeoff 2006.

Document classification (DC):

The THUCNews dataset from Sina News, with 10 evenly distributed classes.

Sentiment analysis (SA):

The ChnSentiCorp dataset, with 12,000 documents from three domains: books, computers, and hotels.

Sentence pair matching (SPM):

The LCQMC (Large-scale Chinese Question Matching Corpus) dataset, where each instance is a pair of sentences with a label indicating whether their intents match.
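
To make the shape of a sentence-pair instance concrete, here is a minimal sketch in Python. It assumes a tab-separated line of the form `text_a<TAB>text_b<TAB>label`, with `1` meaning matched and `0` meaning not matched. The file layout, the label encoding, and the names `SentencePair` and `parse_pair_line` are illustrative assumptions, not taken from this repository.

```python
from typing import NamedTuple


class SentencePair(NamedTuple):
    """One sentence-pair instance (hypothetical container for illustration)."""
    text_a: str
    text_b: str
    label: int  # assumed encoding: 1 = intents match, 0 = they do not


def parse_pair_line(line: str) -> SentencePair:
    """Parse one line in the assumed format: text_a<TAB>text_b<TAB>label."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
    text_a, text_b, label = fields
    return SentencePair(text_a, text_b, int(label))
```

If the actual files use a different column order or delimiter, only `parse_pair_line` would need to change.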

Natural language inference (NLI):

The Chinese portion of the XNLI dataset.
