Frustratingly Easy Label Projection for Cross-lingual Transfer (Findings of ACL 2023)
Update (May 30, 2023): Updated checkpoints due to an issue in Hugging Face NLLB tokenization.
We use the code base and script adapted from MasakhaNER: Script
Google Drive: link
NER (for evaluation): data_{masakahner,wikiann}
EasyProject (for training): output_nllb_3Bft_{wikiann,conll}
We use the following scripts to post-process the translated data. This step assigns labels to the entities inside the bracket markers (e.g., [ ]). The post-processed data are stored in output_nllb_3Bft_{wikiann,conll}. The original data are stored in the {conll,wikiann}_nllb_3B_ft.pkl files on Google Drive.
Wikiann:
python decode_marker_wikiann.py
Masakhaner:
python decode_marker_conll.py
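To give a rough picture of what assigning labels to bracketed entities involves, here is a minimal sketch. It is not the code in decode_marker_wikiann.py or decode_marker_conll.py: the function name `decode_markers` and its inputs are hypothetical, and it makes the simplifying assumption that the i-th bracketed span in the translation takes the i-th label from a given list. The actual scripts may match spans to labels differently.

```python
import re


def decode_markers(marked_text, span_labels):
    """Strip [ ] markers from a translated sentence and return (tokens, BIO tags).

    Assumption (for illustration only): the i-th bracketed span receives
    the i-th label in span_labels. Spans without a label are tagged "O".
    """
    # Put spaces around brackets so each one becomes a standalone token.
    pieces = re.sub(r"([\[\]])", r" \1 ", marked_text).split()

    tokens, tags = [], []
    span_idx = -1    # index of the current bracketed span
    inside = False   # True while between "[" and "]"
    first = False    # True until the first token of a span is tagged

    for piece in pieces:
        if piece == "[":
            span_idx += 1
            inside, first = True, True
        elif piece == "]":
            inside = False
        else:
            tokens.append(piece)
            if inside and span_idx < len(span_labels):
                prefix = "B-" if first else "I-"
                tags.append(prefix + span_labels[span_idx])
                first = False
            else:
                tags.append("O")
    return tokens, tags


if __name__ == "__main__":
    toks, tags = decode_markers("[ Barack Obama ] visited [ Paris ] .", ["PER", "LOC"])
    for tok, tag in zip(toks, tags):
        print(tok, tag)
```

On the example sentence this yields the tags B-PER, I-PER, O, B-LOC, O, with the brackets removed from the token sequence.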
We use the following scripts with Slurm to run experiments. Please adjust them to match your own cluster setup.
Wikiann:
bash xlmr_en_marker_transfer_3bft.sh
Masakhaner:
bash mdeberta_en_marker_transfer_3bft.sh
Please cite the following paper if you use the above resources in your research:
@inproceedings{chen2023easyproject,
title={Frustratingly Easy Label Projection for Cross-lingual Transfer},
author={Chen, Yang and Jiang, Chao and Ritter, Alan and Xu, Wei},
booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Findings)},
year={2023}
}
This material is based in part on research sponsored by IARPA via the BETTER program (2019-19051600004).