This repository contains the code and data for our EMNLP 2023 main conference paper: CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction.
CLEME is a reference-based metric that evaluates Grammatical Error Correction (GEC) systems at the chunk level, aiming to provide unbiased F0.5 scores for multi-reference GEC evaluation.
- CLEME is unbiased, allowing a more objective evaluation pipeline.
- CLEME can visualize the evaluation process as tables.
- CLEME currently supports English and Chinese. We will extend it to other languages in the future.
- Python version >= 3.7
- ERRANT or variants designed for other specific languages.
| Language | Link |
| --- | --- |
| English | ERRANT |
| Arabic | arabic_error_type_annotation |
| Chinese | ChERRANT |
| Czech | errant_czech |
| German | ERRANT-German |
| Greek | ELERRANT |
| Hindi | hindi_grammar_correction |
| Korean | Standard_Korean_GEC |
| Russian | ERRANT-Russian |
| Turkish | ERRANT-TR |

We recommend the newest version of ERRANT for speed gain, although we use ERRANT v2.3.3 in the paper.
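CLEME consumes ERRANT-style M2 files for both references and hypotheses. For English, these are typically produced with ERRANT (for example, its `errant_parallel` command builds an M2 file from parallel original and corrected text files). The snippet below is only a minimal sketch of ERRANT's Python API with made-up example sentences; it assumes ERRANT and a spaCy English model are installed, and it is not part of CLEME itself.

```python
# Minimal sketch: extracting ERRANT edits with the Python API (English).
# Assumes `pip install errant` and an installed spaCy English model.
import errant

annotator = errant.load("en")
orig = annotator.parse("This are a example sentence .")
cor = annotator.parse("This is an example sentence .")
for e in annotator.annotate(orig, cor):
    # Each edit carries the original span, the correction, and an error type,
    # which is the information stored on the A-lines of an M2 (.errant) file.
    print(e.o_start, e.o_end, e.o_str, "->", e.c_str, e.type)
```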
- Clone this repository:

```bash
git clone https://github.com/THUKElab/CLEME.git
cd ./CLEME
```
```bash
python scripts/evaluate.py --ref tests/examples/conll14.errant --hyp tests/examples/conll14-AMU.errant
```
```
{'num_sample': 1312, 'F': 0.2514, 'Acc': 0.7634, 'P': 0.2645, 'R': 0.2097, 'tp': 313.51, 'fp': 871.8, 'fn': 1181.71, 'tn': 6312.0}
```
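For orientation, the reported scores are the standard precision, recall, F0.5 and accuracy computed from the weighted chunk counts in the output above. The snippet below reproduces them (up to rounding); it is only a sanity check, not part of the CLEME API.

```python
# Recomputing P, R, F0.5 and Acc from the weighted chunk counts above.
tp, fp, fn, tn = 313.51, 871.8, 1181.71, 6312.0

p = tp / (tp + fp)                                   # precision ~= 0.2645
r = tp / (tp + fn)                                   # recall    ~= 0.2097
beta = 0.5
f_beta = (1 + beta**2) * p * r / (beta**2 * p + r)   # F0.5      ~= 0.2514
acc = (tp + tn) / (tp + fp + fn + tn)                # accuracy  ~= 0.7634

print(round(p, 4), round(r, 4), round(f_beta, 4), round(acc, 4))
```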
```bash
python scripts/evaluate.py --ref tests/examples/demo.errant --hyp tests/examples/demo-AMU.errant --vis
```
```python
# Excerpt adapted from tests/test_cleme.py, where `self.reader` is the M2 data
# reader created in the test's setUp.

# Read the M2 files of the references and of the system hypothesis.
dataset_ref = self.reader.read(f"{os.path.dirname(__file__)}/examples/demo.errant")
dataset_hyp = self.reader.read(f"{os.path.dirname(__file__)}/examples/demo-AMU.errant")
print(len(dataset_ref), len(dataset_hyp))
print("Example of reference:", dataset_ref[-1])
print("Example of hypothesis:", dataset_hyp[-1])

# Evaluate using CLEME_dependent. The weigher configuration holds the
# hyper-parameters of the chunk weights for TP / FP / FN (see the paper;
# tuned values are listed in ./cleme/constant.py).
config_dependent = {
    "tp": {"alpha": 2.0, "min_value": 0.75, "max_value": 1.25, "reverse": False},
    "fp": {"alpha": 2.0, "min_value": 0.75, "max_value": 1.25, "reverse": True},
    "fn": {"alpha": 2.0, "min_value": 0.75, "max_value": 1.25, "reverse": False},
}
metric_dependent = DependentChunkMetric(weigher_config=config_dependent)
score, results = metric_dependent.evaluate(dataset_hyp, dataset_ref)
print("==================== Evaluate Demo ====================")
print(score)

# Visualize the chunk-level evaluation as tables.
metric_dependent.visualize(dataset_ref, dataset_hyp)
```
Refer to ./tests/test_cleme.py for more details.
CLEME is language-agnostic, so you can easily apply it to any language as long as you have reference and hypothesis M2 files.
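For reference, an M2 (.errant) file stores each tokenized source sentence on an `S` line, followed by one `A` line per edit giving the start/end token offsets, the error type, and the correction; the last field is the annotator (reference) id, which is how multiple references are kept in a single file. The lines below are a small, made-up illustration using ERRANT's English error types:

```
S This are a example sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 3|||R:DET|||an|||REQUIRED|||-NONE-|||0
```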
We searched for the optimal hyper-parameters on the CoNLL-2014 reference set; they are listed in ./cleme/constant.py.
```bibtex
@article{ye-et-al-2023-cleme,
    title   = {CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction},
    author  = {Ye, Jingheng and Li, Yinghui and Zhou, Qingyu and Li, Yangning and Ma, Shirong and Zheng, Hai-Tao and Shen, Ying},
    journal = {arXiv preprint arXiv:2305.10819},
    year    = {2023}
}
```
CLEME v1.0 released.
If you have any questions or feedback, please e-mail us at: [email protected], [email protected]