Morphological Evaluation of NLG

L'AMBRE is a tool for measuring the grammatical well-formedness of text generated by NLG systems. It checks the dependency parses of the text against morpho-syntactic rules and returns a well-formedness score. The tool relies on the Surface-Syntactic Universal Dependencies (SUD) project both for extracting rules and for parsing, and is therefore applicable across languages. See our EMNLP 2021 paper for more details.

Getting Started

Install from PyPI

python -m pip install lambre

Try L'AMBRE

For a given input text file, lambre computes a morpho-syntactic well-formedness score in the range [0, 1]. The following call first downloads the parsers and rule sets for the specified language, then computes the document-level score. See the output folder (out) for error visualizations.

>>> import lambre
>>> with open("data/txt/ru.txt", "r") as rf:
...     data = rf.readlines()
>>> lambre.score("ru", data)
0.9962
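
If the input contains blank lines or stray whitespace, a thin wrapper around the call above can tidy it first. The following is a minimal sketch, not part of lambre: clean_lines and score_lines are hypothetical helper names, and the only thing assumed about the library is that lambre.score(lang, lines) takes a language code and a list of strings, as in the example above. Whether lambre needs this cleanup at all is not confirmed here.

```python
from typing import Callable, Iterable, List, Optional


def clean_lines(lines: Iterable[str]) -> List[str]:
    """Strip surrounding whitespace and drop empty lines."""
    return [line.strip() for line in lines if line.strip()]


def score_lines(
    lang: str,
    lines: Iterable[str],
    scorer: Optional[Callable[[str, List[str]], float]] = None,
) -> float:
    """Score the non-empty lines with `scorer` (lambre.score by default)."""
    if scorer is None:
        # Imported lazily so the helper can be exercised without lambre installed.
        import lambre

        scorer = lambre.score
    sentences = clean_lines(lines)
    if not sentences:
        raise ValueError("no non-empty lines to score")
    return scorer(lang, sentences)
```

Passing the scorer as an argument keeps the helper easy to test with a stub; in normal use it would be called as score_lines("ru", data).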

L'AMBRE can also be used from the command line. See lambre --help for more options.

lambre ru data/txt/ru.txt

Morpho-syntactic Rules

lambre currently supports two rule sets: chaudhary-etal-2021 (see Chaudhary et al., 2020, 2021) and pratapa-etal-2021 (see Pratapa et al., 2021). The former is the default; a different rule set can be selected with the --rule-set option.
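
To drive the command-line tool from a script, the invocation can be assembled programmatically. This is a hedged sketch: build_lambre_command and run_lambre are hypothetical helpers, and it assumes that --rule-set takes the rule set name as its value and may follow the positional arguments. Check lambre --help for the exact syntax.

```python
import subprocess
from typing import List, Optional

# Rule set names as listed in this README.
RULE_SETS = ("chaudhary-etal-2021", "pratapa-etal-2021")


def build_lambre_command(
    lang: str, path: str, rule_set: Optional[str] = None
) -> List[str]:
    """Build the argument list for the lambre command-line tool."""
    if rule_set is not None and rule_set not in RULE_SETS:
        raise ValueError(f"unknown rule set: {rule_set!r}")
    cmd = ["lambre", lang, path]
    if rule_set is not None:
        # Assumption: the option accepts the rule set name as its value.
        cmd += ["--rule-set", rule_set]
    return cmd


def run_lambre(
    lang: str, path: str, rule_set: Optional[str] = None
) -> subprocess.CompletedProcess:
    """Run lambre and capture its output; raises if the command fails."""
    return subprocess.run(
        build_lambre_command(lang, path, rule_set),
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, build_lambre_command("ru", "data/txt/ru.txt", "pratapa-etal-2021") yields the argument list for scoring the Russian sample file with the non-default rule set.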

Visualization Examples

Along with the overall L'AMBRE score, we write the erroneous sentences to the output folder out/errors. We provide two visualizations: (i) plain text (errors.txt) and (ii) HTML (errors/*.html). The plain-text visualization uses the ipymarkup tool; the HTML visualizations use brat and Universal Dependencies.
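
After a run, the generated reports can be gathered for inspection. The sketch below is illustrative only: find_error_reports is a hypothetical helper, and because the exact layout of the output folder is an assumption, it searches recursively for errors.txt and for any .html file rather than relying on fixed paths.

```python
from pathlib import Path
from typing import Dict, List, Union


def find_error_reports(out_dir: Union[str, Path] = "out") -> Dict[str, List[Path]]:
    """Group visualization files found under `out_dir` by type."""
    root = Path(out_dir)
    if not root.is_dir():
        # Nothing has been written yet (or the path is wrong).
        return {"text": [], "html": []}
    return {
        # Plain-text report(s), searched recursively.
        "text": sorted(root.rglob("errors.txt")),
        # HTML visualizations, searched recursively.
        "html": sorted(root.rglob("*.html")),
    }
```

Calling find_error_reports("out") after scoring returns the paths grouped under the "text" and "html" keys, which can then be opened in an editor or a browser.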

Below is a sample run on 1000 example Hindi sentences from the Samanantar corpus.

>>> import lambre
>>> with open("examples/hi_sents_1k.txt", "r") as rf:
...     data = rf.readlines()
>>> lambre.score("hi", data)
0.8821

A few erroneous sentences from this corpus (as detected by L'AMBRE):

Input sentence: संख्या की स्टाफ प्रशिक्षित में हिंदी/वाले जानने हिंदी
                 (Stenography Hindi in trained persons of No.)
                 
                 तीनवर्षतक अग्रनीत किए| जाने के बाद व्यपगत हुए| आरक्षणों की| संख्या
                 (of after forward No. reservations lapsed carrying for 3 years)

For these two sentences, L'AMBRE generates visualizations of the word-order errors. We also generate separate files for agreement and case-marking errors (see examples/ for the full HTML outputs).

Parser

We provide SUD parsers trained using the Stanza toolkit. See Section 4 of our paper for more details.

Supported Languages

We currently support the following languages. lambre automatically downloads the necessary language-specific resources (when available).

Language	Code	Language	Code	Language	Code	Language	Code
Catalan	ca	Spanish	es	Italian	it	Russian	ru
Czech	cs	Estonian	et	Latvian	lv	Slovenian	sl
Danish	da	Persian	fa	Dutch	nl	Swedish	sv
German	de	French	fr	Polish	pl	Ukrainian	uk
Greek	el	Hindi	hi	Portuguese	pt	Urdu	ur
English	en	Indonesian	id	Romanian	ro

To manually download rules or parsers for a given language:

>>> import lambre
>>> lambre.download("ru") # Russian
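
A script that handles several languages may want to check a code against the table above before downloading. The following is a minimal sketch under stated assumptions: SUPPORTED_LANGUAGES is transcribed from the table in this README and may lag behind the library, download_if_supported is a hypothetical helper, and the only library call assumed is lambre.download(code) as shown above.

```python
from typing import Callable, Optional

# Language codes transcribed from the table above; not queried from lambre.
SUPPORTED_LANGUAGES = {
    "ca": "Catalan", "cs": "Czech", "da": "Danish", "de": "German",
    "el": "Greek", "en": "English", "es": "Spanish", "et": "Estonian",
    "fa": "Persian", "fr": "French", "hi": "Hindi", "id": "Indonesian",
    "it": "Italian", "lv": "Latvian", "nl": "Dutch", "pl": "Polish",
    "pt": "Portuguese", "ro": "Romanian", "ru": "Russian", "sl": "Slovenian",
    "sv": "Swedish", "uk": "Ukrainian", "ur": "Urdu",
}


def download_if_supported(
    code: str, downloader: Optional[Callable[[str], object]] = None
) -> str:
    """Download resources for `code` if it is listed; return the language name."""
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    if downloader is None:
        # Imported lazily so the check itself works without lambre installed.
        import lambre

        downloader = lambre.download
    downloader(code)
    return SUPPORTED_LANGUAGES[code]
```

Failing early on an unknown code avoids a download attempt for a language that has no parser or rule set.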

Reference

If you find this toolkit helpful in your research, consider citing our paper:

@inproceedings{pratapa-etal-2021-evaluating,
    title = "Evaluating the Morphosyntactic Well-formedness of Generated Texts",
    author = "Pratapa, Adithya  and
      Anastasopoulos, Antonios  and
      Rijhwani, Shruti  and
      Chaudhary, Aditi  and
      Mortensen, David R.  and
      Neubig, Graham  and
      Tsvetkov, Yulia",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.570",
    pages = "7131--7150",
}

We also encourage you to cite the original works for the chaudhary-etal-2021 rule set: Chaudhary et al., 2020 and Chaudhary et al., 2021.

License

L'AMBRE is available under the MIT License. The code for training parsers is adapted from stanza, which is available under the Apache License, Version 2.0.

Issues

For any issues, questions, or requests, please use the GitHub issue tracker.
