Adding MLSUM dataset (sebastianruder#529)
* added french MLSUM description
* added german MLSUM description
* added spanish MLSUM description
* added russian MLSUM description
* added turkish MLSUM description
* update README for MLSUM summarization datasets
* update english/summarization CNN DM
Showing 7 changed files with 172 additions and 0 deletions.
**`french/summarization.md`** (new file)
# Summarization

Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.

### Warning: Evaluation Metrics

For summarization, automatic metrics such as ROUGE and METEOR have serious limitations:
1. They only assess content selection and do not account for other quality aspects, such as fluency, grammaticality, coherence, etc.
2. To assess content selection, they rely mostly on lexical overlap, although an abstractive summary could express the same content as a reference without any lexical overlap (see the sketch below).
3. Given the subjectiveness of summarization and the correspondingly low agreement between annotators, the metrics were designed to be used with multiple reference summaries per input. However, recent datasets such as MLSUM provide only a single reference.

Therefore, tracking progress and claiming state-of-the-art based only on these metrics is questionable. Most papers carry out additional manual comparisons of alternative summaries. Unfortunately, such experiments are difficult to compare across papers. If you have an idea on how to do that, feel free to contribute.
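
To make point 2 concrete, here is a minimal sketch using a simplified unigram-recall score in the spirit of ROUGE-1. The helper and example texts are illustrative only, not the official ROUGE toolkit:

```python
# Simplified unigram recall in the spirit of ROUGE-1 (illustrative only:
# the official ROUGE toolkit adds stemming, stopword handling, F-scores, etc.).
from collections import Counter

def unigram_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

reference = "the cabinet approved the budget on tuesday"
extractive = "the cabinet approved the budget"                # reuses reference words
abstractive = "ministers ratified next year's spending plan"  # same meaning, new words

print(unigram_recall(extractive, reference))   # ~0.71: rewarded for copying
print(unigram_recall(abstractive, reference))  # 0.0: penalized despite being faithful
```

A faithful abstractive summary can thus score zero, which is exactly why purely lexical metrics should be read with caution.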

### MLSUM

[MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: [French](../french/summarization.md#mlsum), [German](../german/summarization.md#mlsum), [Spanish](../spanish/summarization.md#mlsum), [Russian](../russian/summarization.md#mlsum), and [Turkish](../turkish/summarization.md#mlsum). Together with [English](../english/summarization.md#cnn--daily-mail) newspapers from the popular CNN / Daily Mail dataset, the collected data form a large-scale multilingual corpus that can enable new research directions for the text summarization community. The authors report cross-lingual comparative analyses based on state-of-the-art systems, which highlight existing biases and motivate the use of a multilingual dataset.
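
For quick experimentation, the data is also mirrored on the Hugging Face Hub. A hedged quick-start sketch, assuming the community `mlsum` dataset with per-language configs (`fr`, `de`, `es`, `ru`, `tu`) and `text`/`summary` fields; the [official repository](https://github.com/recitalAI/MLSUM) remains the canonical source:

```python
# Hedged quick-start: load the French portion of MLSUM via the Hugging Face
# `datasets` library. Assumes the community `mlsum` dataset with a "fr"
# config and `text`/`summary` fields; see https://github.com/recitalAI/MLSUM
# for the canonical download.
from datasets import load_dataset

mlsum_fr = load_dataset("mlsum", "fr")   # train / validation / test splits
example = mlsum_fr["train"][0]
print(example["text"][:200])             # the full news article
print(example["summary"])                # the single reference summary
```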

The results below are ranked in chronological order.
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR | Paper / Source | Code |
| --------------- | :-----: | :-----: | :-----: | :-----: | -------------- | ---- |
| Lead_3 | 28.74 | 9.84 | 19.7 | 12.6 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Pointer-Generator | 31.08 | 10.12 | 23.6 | 14.1 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| M-BERT (Scialom et al., 2020) | 31.59 | 10.61 | 25.1 | 15.1 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Oracle | 47.32 | 25.95 | 37.7 | 24.7 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| MARGE-NEWS (Train All) (Lewis et al., 2020) | - | - | 25.79 | - | [Pre-training via Paraphrasing](https://arxiv.org/abs/2006.15020) | [Unofficial](https://github.com/lucidrains/marge-pytorch) |
**`german/summarization.md`** (new file)
# Summarization

Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.

### Warning: Evaluation Metrics

For summarization, automatic metrics such as ROUGE and METEOR have serious limitations:
1. They only assess content selection and do not account for other quality aspects, such as fluency, grammaticality, coherence, etc.
2. To assess content selection, they rely mostly on lexical overlap, although an abstractive summary could express the same content as a reference without any lexical overlap.
3. Given the subjectiveness of summarization and the correspondingly low agreement between annotators, the metrics were designed to be used with multiple reference summaries per input. However, recent datasets such as MLSUM provide only a single reference.

Therefore, tracking progress and claiming state-of-the-art based only on these metrics is questionable. Most papers carry out additional manual comparisons of alternative summaries. Unfortunately, such experiments are difficult to compare across papers. If you have an idea on how to do that, feel free to contribute.

### MLSUM

[MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: [French](../french/summarization.md#mlsum), [German](../german/summarization.md#mlsum), [Spanish](../spanish/summarization.md#mlsum), [Russian](../russian/summarization.md#mlsum), and [Turkish](../turkish/summarization.md#mlsum). Together with [English](../english/summarization.md#cnn--daily-mail) newspapers from the popular CNN / Daily Mail dataset, the collected data form a large-scale multilingual corpus that can enable new research directions for the text summarization community. The authors report cross-lingual comparative analyses based on state-of-the-art systems, which highlight existing biases and motivate the use of a multilingual dataset.

The results below are ranked in chronological order.
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR | Paper / Source | Code |
| --------------- | :-----: | :-----: | :-----: | :-----: | -------------- | ---- |
| Lead_3 | 38.57 | 25.66 | 33.1 | 23.9 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Pointer-Generator | 39.8 | 25.96 | 35.1 | 24.4 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| M-BERT (Scialom et al., 2020) | 44.78 | 30.75 | 42 | 26.5 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Oracle | 57.23 | 39.72 | 52.3 | 31.7 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| MARGE-NEWS (Train All) (Lewis et al., 2020) | - | - | 42.77 | - | [Pre-training via Paraphrasing](https://arxiv.org/abs/2006.15020) | [Unofficial](https://github.com/lucidrains/marge-pytorch) |
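
The Lead_3 row above is the standard lead baseline: the summary is simply the first three sentences of the article. A minimal sketch, assuming a naive regex-based sentence splitter (real evaluations use a proper per-language splitter):

```python
# Minimal Lead-3 baseline: take the first three sentences of the article.
# The regex splitter is a crude stand-in for a proper per-language one.
import re

def lead_3(article: str) -> str:
    # Naive split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:3])

article = ("Erster Satz des Artikels. Zweiter Satz mit Details. "
           "Dritter Satz mit Kontext. Der Rest wird verworfen.")
print(lead_3(article))  # keeps only the first three sentences
```

Its strength on news data reflects the lead bias of journalistic writing, where key facts come first.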
**`russian/summarization.md`** (new file)
# Summarization

Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.

### Warning: Evaluation Metrics

For summarization, automatic metrics such as ROUGE and METEOR have serious limitations:
1. They only assess content selection and do not account for other quality aspects, such as fluency, grammaticality, coherence, etc.
2. To assess content selection, they rely mostly on lexical overlap, although an abstractive summary could express the same content as a reference without any lexical overlap.
3. Given the subjectiveness of summarization and the correspondingly low agreement between annotators, the metrics were designed to be used with multiple reference summaries per input. However, recent datasets such as MLSUM provide only a single reference.

Therefore, tracking progress and claiming state-of-the-art based only on these metrics is questionable. Most papers carry out additional manual comparisons of alternative summaries. Unfortunately, such experiments are difficult to compare across papers. If you have an idea on how to do that, feel free to contribute.

### MLSUM

[MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: [French](../french/summarization.md#mlsum), [German](../german/summarization.md#mlsum), [Spanish](../spanish/summarization.md#mlsum), [Russian](../russian/summarization.md#mlsum), and [Turkish](../turkish/summarization.md#mlsum). Together with [English](../english/summarization.md#cnn--daily-mail) newspapers from the popular CNN / Daily Mail dataset, the collected data form a large-scale multilingual corpus that can enable new research directions for the text summarization community. The authors report cross-lingual comparative analyses based on state-of-the-art systems, which highlight existing biases and motivate the use of a multilingual dataset.

The results below are ranked in chronological order.
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR | Paper / Source | Code |
| --------------- | :-----: | :-----: | :-----: | :-----: | -------------- | ---- |
| Lead_3 | 9.29 | 1.54 | 5.9 | 5.8 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Pointer-Generator | 9.19 | 1.18 | 5.7 | 5.7 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| M-BERT (Scialom et al., 2020) | 10.94 | 1.75 | 9.5 | 6.8 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Oracle | 36.14 | 19.88 | 29.8 | 20.3 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| MARGE-NEWS (Train All) (Lewis et al., 2020) | - | - | 11.03 | - | [Pre-training via Paraphrasing](https://arxiv.org/abs/2006.15020) | [Unofficial](https://github.com/lucidrains/marge-pytorch) |
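
The Oracle row above is an extractive upper bound rather than a trainable model. A hedged sketch of one common construction (the paper's exact procedure may differ): greedily pick the source sentences that most improve overlap with the reference.

```python
# Hedged sketch of an extractive oracle: greedily add the source sentence
# that most improves unigram recall against the reference summary.
import re
from collections import Counter

def unigram_recall(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    return sum(min(cand[w], ref[w]) for w in ref) / max(sum(ref.values()), 1)

def greedy_oracle(article: str, reference: str, max_sents: int = 3) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    picked = []
    for _ in range(max_sents):
        best, best_score = None, unigram_recall(" ".join(picked), reference)
        for s in sentences:
            if s not in picked:
                score = unigram_recall(" ".join(picked + [s]), reference)
                if score > best_score:
                    best, best_score = s, score
        if best is None:       # no remaining sentence improves the score
            break
        picked.append(best)
    return " ".join(picked)
```

Because it peeks at the reference, the oracle bounds what any extractive system could achieve under the metric.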
**`spanish/summarization.md`** (new file)
# Summarization

Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.

### Warning: Evaluation Metrics

For summarization, automatic metrics such as ROUGE and METEOR have serious limitations:
1. They only assess content selection and do not account for other quality aspects, such as fluency, grammaticality, coherence, etc.
2. To assess content selection, they rely mostly on lexical overlap, although an abstractive summary could express the same content as a reference without any lexical overlap.
3. Given the subjectiveness of summarization and the correspondingly low agreement between annotators, the metrics were designed to be used with multiple reference summaries per input. However, recent datasets such as MLSUM provide only a single reference.

Therefore, tracking progress and claiming state-of-the-art based only on these metrics is questionable. Most papers carry out additional manual comparisons of alternative summaries. Unfortunately, such experiments are difficult to compare across papers. If you have an idea on how to do that, feel free to contribute.

### MLSUM

[MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: [French](../french/summarization.md#mlsum), [German](../german/summarization.md#mlsum), [Spanish](../spanish/summarization.md#mlsum), [Russian](../russian/summarization.md#mlsum), and [Turkish](../turkish/summarization.md#mlsum). Together with [English](../english/summarization.md#cnn--daily-mail) newspapers from the popular CNN / Daily Mail dataset, the collected data form a large-scale multilingual corpus that can enable new research directions for the text summarization community. The authors report cross-lingual comparative analyses based on state-of-the-art systems, which highlight existing biases and motivate the use of a multilingual dataset.

The results below are ranked in chronological order.
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR | Paper / Source | Code |
| --------------- | :-----: | :-----: | :-----: | :-----: | -------------- | ---- |
| Lead_3 | 21.87 | 6.25 | 13.7 | 10.3 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Pointer-Generator | 24.63 | 6.54 | 17.7 | 13.2 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| M-BERT (Scialom et al., 2020) | 25.58 | 8.61 | 20.4 | 14.9 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Oracle | 45.23 | 26.21 | 35.8 | 26.5 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| MARGE-NEWS (Train All) (Lewis et al., 2020) | - | - | 22.72 | - | [Pre-training via Paraphrasing](https://arxiv.org/abs/2006.15020) | [Unofficial](https://github.com/lucidrains/marge-pytorch) |
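
Point 3 of the warning above notes that these metrics were designed for multiple references. When several are available, a common convention is to score against each reference and keep the best match; a sketch under that assumption:

```python
# Multi-reference scoring sketch: take the best score over all references
# (a common convention). With a single-reference dataset such as MLSUM this
# degenerates to one comparison, making the metric noisier.
from collections import Counter

def unigram_recall(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    return sum(min(cand[w], ref[w]) for w in ref) / max(sum(ref.values()), 1)

def multi_ref_recall(candidate: str, references: list) -> float:
    return max(unigram_recall(candidate, r) for r in references)

refs = ["the court rejected the appeal",
        "judges turned down the appeal on friday"]
print(multi_ref_recall("the appeal was rejected by the court", refs))  # 1.0
```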
**`turkish/summarization.md`** (new file)
# Summarization

Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.

### Warning: Evaluation Metrics

For summarization, automatic metrics such as ROUGE and METEOR have serious limitations:
1. They only assess content selection and do not account for other quality aspects, such as fluency, grammaticality, coherence, etc.
2. To assess content selection, they rely mostly on lexical overlap, although an abstractive summary could express the same content as a reference without any lexical overlap.
3. Given the subjectiveness of summarization and the correspondingly low agreement between annotators, the metrics were designed to be used with multiple reference summaries per input. However, recent datasets such as MLSUM provide only a single reference.

Therefore, tracking progress and claiming state-of-the-art based only on these metrics is questionable. Most papers carry out additional manual comparisons of alternative summaries. Unfortunately, such experiments are difficult to compare across papers. If you have an idea on how to do that, feel free to contribute.

### MLSUM

[MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five languages: [French](../french/summarization.md#mlsum), [German](../german/summarization.md#mlsum), [Spanish](../spanish/summarization.md#mlsum), [Russian](../russian/summarization.md#mlsum), and [Turkish](../turkish/summarization.md#mlsum). Together with [English](../english/summarization.md#cnn--daily-mail) newspapers from the popular CNN / Daily Mail dataset, the collected data form a large-scale multilingual corpus that can enable new research directions for the text summarization community. The authors report cross-lingual comparative analyses based on state-of-the-art systems, which highlight existing biases and motivate the use of a multilingual dataset.

The results below are ranked in chronological order.
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR | Paper / Source | Code |
| --------------- | :-----: | :-----: | :-----: | :-----: | -------------- | ---- |
| Lead_3 | 34.79 | 20.0 | 28.9 | 20.2 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Pointer-Generator | 36.9 | 21.77 | 32.6 | 19.8 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| M-BERT (Scialom et al., 2020) | 36.63 | 20.15 | 32.9 | 26.3 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| Oracle | 50.61 | 33.55 | 45.8 | 26.4 | [MLSUM](https://www.aclweb.org/anthology/2020.emnlp-main.647/) | [Official](https://github.com/recitalAI/MLSUM) |
| MARGE-NEWS (Train All) (Lewis et al., 2020) | - | - | 35.90 | - | [Pre-training via Paraphrasing](https://arxiv.org/abs/2006.15020) | [Unofficial](https://github.com/lucidrains/marge-pytorch) |