This is the folder for the machine translation task.

The data and most of the code are borrowed from the COMET repo. Thanks to the authors for their work.

Basics

We use the WMT-19 DARR dataset and consider the following language pairs: de-en, fi-en, gu-en, kk-en, lt-en, ru-en, zh-en. For each language pair, we converted the original dataset into a unified format, shown below (all texts are non-tokenized and normally cased). The unified data is stored in each dataset folder as data.pkl. Each dataset folder also contains a second file, final_p.pkl, which holds our calculated scores.

{
    "doc_id": {
        "src": "This is the source text.",
        "ref": "This is the reference translation.",
        "better": {
            "sys_name": "System name 1",
            "sys": "This is system translation 1.",
            "scores": {} 
        },
        "worse": {
            "sys_name": "System name 2",
            "sys": "This is system translation 2.",
            "scores": {}
        }
    }
}
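As a minimal sketch of how this format can be consumed, the snippet below writes a one-record stand-in file and reads it back. It assumes data.pkl is a standard Python pickle holding a dict keyed by document id, as the structure above suggests; `load_pairs` and the sample record are hypothetical names introduced here for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def load_pairs(path):
    """Load a data.pkl-style file and return its dict of doc_id -> record."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A tiny stand-in record that mirrors the unified format (not real WMT-19 data).
sample = {
    "doc_0": {
        "src": "This is the source text.",
        "ref": "This is the reference translation.",
        "better": {
            "sys_name": "System name 1",
            "sys": "This is system translation 1.",
            "scores": {},
        },
        "worse": {
            "sys_name": "System name 2",
            "sys": "This is system translation 2.",
            "scores": {},
        },
    }
}

# Write the stand-in to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl"
    with open(path, "wb") as f:
        pickle.dump(sample, f)
    data = load_pairs(path)

# Each record pairs a preferred ("better") system output with a "worse" one.
for doc_id, record in data.items():
    print(doc_id, record["better"]["sys_name"], record["worse"]["sys_name"])
```

With a real dataset, `load_pairs("kk-en/data.pkl")` would replace the temporary file.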

After the automatic metrics are computed, the scores field of each system is filled in, as in the example below.

"scores": {
    "auto_metric1": "0.3", # Scores are stored as strings to save space
    "auto_metric2": "0.1",
    "auto_metric3": "0.7"
}
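Once the scores are filled in, each record can be checked for whether a metric prefers the "better" translation. The sketch below computes a Kendall's Tau-like agreement, (concordant - discordant) / (concordant + discordant), which is the usual way DARR judgments are evaluated; whether this repository computes it exactly this way is an assumption. The function name `kendall_tau_like`, the toy records, and the choice to count ties as discordant are all assumptions made for illustration.

```python
def kendall_tau_like(data, metric):
    """Return (concordant - discordant) / total for one metric.

    A pair is concordant when the metric scores the "better" translation
    strictly higher than the "worse" one. Ties count as discordant here
    (an assumption of this sketch).
    """
    concordant = 0
    discordant = 0
    for record in data.values():
        # Scores are stored as strings, so convert before comparing.
        better = float(record["better"]["scores"][metric])
        worse = float(record["worse"]["scores"][metric])
        if better > worse:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Toy records with only the fields this function reads (not real data).
toy = {
    "doc_0": {"better": {"scores": {"auto_metric1": "0.3"}},
              "worse": {"scores": {"auto_metric1": "0.1"}}},
    "doc_1": {"better": {"scores": {"auto_metric1": "0.2"}},
              "worse": {"scores": {"auto_metric1": "0.5"}}},
    "doc_2": {"better": {"scores": {"auto_metric1": "0.7"}},
              "worse": {"scores": {"auto_metric1": "0.4"}}},
}

tau = kendall_tau_like(toy, "auto_metric1")
print(f"Kendall's Tau-like for auto_metric1: {tau:.3f}")
```

Two of the three toy pairs are concordant and one is discordant, so the result is (2 - 1) / 3.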

Setups

To use BLEURT, install it first by running the following.

git clone https://github.com/google-research/bleurt.git
cd bleurt
pip install .

Please run the following commands to download the PRISM model, BLEURT model and COMET model.

mkdir models
sh download.sh

Our BARTScore model trained on ParaBank2 can be downloaded here. If you plan to use it, move it to the models folder as well for subsequent experiments.

Run scores

Run the following to see all the arguments that are supported by the score.py script.

python score.py --help

To reproduce the results, run a command like the following (shown here for kk-en).

python score.py --file kk-en/data.pkl --device cuda:0 --output kk-en/scores.pkl --bleu --chrf --bleurt --prism --comet --bert_score --bart_score --bart_score_cnn --bart_score_para --prompt bart_para_ref
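To cover all seven language pairs, the same command can be generated per pair. The sketch below only builds and prints the command lines; it does not run score.py. The flags are copied from the example above, and the `<pair>/data.pkl` and `<pair>/scores.pkl` layout for the other pairs is an assumption extrapolated from the kk-en example. `build_command` is a hypothetical helper.

```python
import shlex

# Language pairs listed in the Basics section.
LANGUAGE_PAIRS = ["de-en", "fi-en", "gu-en", "kk-en", "lt-en", "ru-en", "zh-en"]

# Metric flags copied verbatim from the example command.
METRIC_FLAGS = [
    "--bleu", "--chrf", "--bleurt", "--prism", "--comet",
    "--bert_score", "--bart_score", "--bart_score_cnn", "--bart_score_para",
]


def build_command(pair, device="cuda:0"):
    """Return the score.py argument list for one language pair."""
    return [
        "python", "score.py",
        "--file", f"{pair}/data.pkl",
        "--device", device,
        "--output", f"{pair}/scores.pkl",
        *METRIC_FLAGS,
        "--prompt", "bart_para_ref",
    ]


commands = [build_command(pair) for pair in LANGUAGE_PAIRS]

# Print shell-ready command lines; pass a list to subprocess.run to execute one.
for cmd in commands:
    print(shlex.join(cmd))
```

Printing rather than executing makes it easy to inspect the commands first, since each run loads several large models.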