HiMatch

Code for the ACL 2021 long paper "Hierarchy-aware Label Semantics Matching Network for Hierarchical Text Classification".

Dependency

PyTorch==1.4.0, sklearn, tqdm, transformers

Dataset

RCV1-V2
WOS
EURLEX-57K
Glove.6B.300d.txt

Preprocess

Dataset Preprocess

Convert your dataset to a JSON file in which each sample has the form {'token': List[str], 'label': List[str]}.
For reference, see data_modules/preprocess.py; the preprocessed WOS dataset is available on Google Drive.
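
As a rough illustration of the target format, the sketch below converts raw (text, labels) pairs into token/label records. The whitespace tokenization, the one-JSON-object-per-line layout, the helper names, and the sample texts and labels are all assumptions made for illustration; data_modules/preprocess.py remains the reference for the actual cleaning steps.

```python
import json

def to_record(text, labels):
    """Build one sample in the {'token': [...], 'label': [...]} form."""
    # Assumption: lowercase + whitespace split; the repository's
    # preprocess.py may clean and tokenize differently.
    return {"token": text.lower().split(), "label": list(labels)}

def dumps_jsonl(samples):
    """Serialize samples as one JSON object per line (assumed layout)."""
    return "\n".join(json.dumps(to_record(text, labels)) for text, labels in samples)

# Illustrative samples; the label names are made up for this sketch.
samples = [
    ("Deep learning for medical image segmentation", ["CS", "Machine learning"]),
    ("Crystal structure of a kinase inhibitor complex", ["biochemistry", "Enzymology"]),
]
jsonl_text = dumps_jsonl(samples)
print(jsonl_text)
```

Each printed line can then be written to the training, validation, or test file expected by the configuration.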

Label Prior Probability (Label Structure)

Prepare the taxonomy in the expected format (data/wos.taxnomy and data/wos_prob_child_parent.json)
Extract the label prior probabilities:

python helper/hierarchy_tree_statistic.py config/wos.json
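
To give an intuition for what a child-given-parent prior looks like, here is a minimal sketch that estimates it from label counts in the training samples. It is not the code in helper/hierarchy_tree_statistic.py: the function name, the normalization over siblings, and the toy hierarchy are assumptions for illustration only.

```python
from collections import Counter

def child_given_parent(samples, parent_to_children):
    """Estimate P(child | parent) from label frequencies.

    Assumption: the prior of a child is its frequency divided by the
    summed frequency of all children of the same parent.
    """
    label_count = Counter(label for labels in samples for label in set(labels))
    prior = {}
    for parent, children in parent_to_children.items():
        total = sum(label_count[c] for c in children)
        prior[parent] = {
            c: (label_count[c] / total if total else 0.0) for c in children
        }
    return prior

# Toy hierarchy and label sets, made up for this sketch.
hierarchy = {"Root": ["CS", "Medical"], "CS": ["ML", "CV"], "Medical": ["Cancer"]}
train_labels = [["CS", "ML"], ["CS", "ML"], ["CS", "CV"], ["Medical", "Cancer"]]
prior = child_given_parent(train_labels, hierarchy)
print(prior["CS"])
```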

Label Description

We use classic TF-IDF to extract representative words for each label.

python construct_label_desc.py
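
The idea can be sketched in a few lines of plain Python. This is not the logic of construct_label_desc.py: treating all tokens of a label's samples as one pseudo-document, the exact TF-IDF variant, the function name, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def label_keywords(samples, top_k=3):
    """Return the top_k highest TF-IDF words for each label."""
    # One pseudo-document per label: all tokens of the samples carrying it.
    label_tokens = defaultdict(Counter)
    for sample in samples:
        for label in sample["label"]:
            label_tokens[label].update(sample["token"])
    n_labels = len(label_tokens)
    # Document frequency: in how many label documents a word occurs.
    doc_freq = Counter(w for counts in label_tokens.values() for w in counts)
    keywords = {}
    for label, counts in label_tokens.items():
        total = sum(counts.values())
        scores = {
            w: (c / total) * math.log(n_labels / doc_freq[w])
            for w, c in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[label] = ranked[:top_k]
    return keywords

# Toy samples in the token/label format, made up for this sketch.
samples = [
    {"token": ["neural", "network", "training", "data"], "label": ["ML"]},
    {"token": ["neural", "network", "gradient", "data"], "label": ["ML"]},
    {"token": ["tumor", "cell", "therapy", "data"], "label": ["Cancer"]},
]
keywords = label_keywords(samples, top_k=3)
print(keywords)
```

Words that occur under every label (here "data") receive a score of zero and therefore do not appear among the representative words.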

In follow-up experiments, we found that richer label representations lead to further improvement.

Train

Modify the training settings in config/wos.json.

python train.py config/wos-bert.json  
python train.py config/wos.json

Hyperparameter Description

sample_num: 2. The average number of labels per sample in WOS is 2. Every positive label of a sample is used as a positive label index when constructing matching pairs.  
negative_ratio: 3. For each positive label, three labels are sampled: the coarse-grained label, a wrong sibling label, and another wrong label.  
total_sample_num: sample_num * negative_ratio = 2 * 3 = 6.
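
The following sketch shows how these numbers fit together for one fine-grained positive label. It is only an illustration under assumptions: the function name, the parent_of mapping, the toy hierarchy, and the uniform random choice are hypothetical, and the sampling in the repository's training code may differ.

```python
import random

def negatives_for(positive, positive_labels, parent_of, rng):
    """Pick the three labels paired with one fine-grained positive label."""
    parent = parent_of[positive]
    # Wrong siblings: same parent, but not a gold label of this sample.
    siblings = [
        l for l in parent_of
        if parent_of[l] == parent and l not in positive_labels
    ]
    # Other wrong labels: neither gold, nor the parent, nor a sibling.
    others = [
        l for l in parent_of
        if l not in positive_labels and l != parent and l not in siblings
    ]
    return {
        "coarse": parent,
        "sibling": rng.choice(siblings),
        "other": rng.choice(others),
    }

sample_num = 2       # average number of labels per WOS sample
negative_ratio = 3   # coarse-grained, wrong sibling, other wrong label
total_sample_num = sample_num * negative_ratio

# Toy hierarchy (child -> parent), made up for this sketch.
parent_of = {"CS": "Root", "Medical": "Root", "ML": "CS", "CV": "CS", "Cancer": "Medical"}
negatives = negatives_for("ML", ["CS", "ML"], parent_of, random.Random(0))
print(total_sample_num, negatives)
```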

Other Experimental Settings

The experimental settings on EURLEX-57K follow KAMG.
The experimental settings for BERT follow Bert-Multi-Label-Text-Classification.

Cite

@inproceedings{chen-etal-2021-hierarchy,
    title = "Hierarchy-aware Label Semantics Matching Network for Hierarchical Text Classification",
    author = "Chen, Haibin  and Ma, Qianli  and Lin, Zhenxi  and Yan, Jiangyue",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    year = "2021",
    url = "https://aclanthology.org/2021.acl-long.337",
    pages = "4370--4379"
}
