pmb2tsv

Tanja: produce /home/tania/Dropbox/pmb2tag-frames/pmb-3.0.0-en-gold-{p31,p32}.tsv -b

mv /home/tania/Dropbox/pmb2tag-frames/pmb-3.0.0-en* /home/tania/Dropbox/pmb2tag-frames/data
rm /home/tania/Dropbox/pmb2tag-frames/data/*.toknum
rm /home/tania/Dropbox/pmb2tag-frames/data/*.const
rm /home/tania/Dropbox/pmb2tag-frames/data/*.lemma
rm /home/tania/Dropbox/pmb2tag-frames/data/*.pmbdep
rm /home/tania/Dropbox/pmb2tag-frames/data/*.roles
rm /home/tania/Dropbox/pmb2tag-frames/data/*.sem
rm /home/tania/Dropbox/pmb2tag-frames/data/*.super
rm /home/tania/Dropbox/pmb2tag-frames/data/*.wordnet
rm /home/tania/Dropbox/pmb2tag-frames/data/*.token

pmb2tsv

pmb2tsv is a collection of scripts to convert data from the Parallel Meaning Bank (PMB) into column-based files including CCG supertags, dependency structure, constituent structure, semantic tags, and semantic roles.

The primary target audience is people wanting to do semantic role labeling (SRL) experiments on the PMB.

Note: pmb2tsv is experimental and some of its output may be erroneous.

Input Data

Download PMB 3.0.0 and extract the directory pmb-3.0.0 into the root directory of this repository.

Software Dependencies

Scripts to convert the files are mostly found in this repository; however, the following software needs to be present on the system:

Python 3 – the python3 executable should be on your $PATH.
Produce – the produce executable should be on your $PATH.
SWI-Prolog 7 or higher – the swipl executable should be on your $PATH.
GNU Parallel – the parallel executable should be on your $PATH.
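
Before running the conversion, it can help to confirm that the four executables listed above are reachable. The following is a minimal sketch using only the Python standard library; the function name missing_executables is a hypothetical helper, not part of pmb2tsv, and the check does not verify versions (for example, that SWI-Prolog is 7 or higher).

```python
import shutil

# Executable names taken from the dependency list above.
REQUIRED = ["python3", "produce", "swipl", "parallel"]


def missing_executables(names=REQUIRED):
    """Return the names that cannot be found on $PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Missing from $PATH:", ", ".join(missing))
    else:
        print("All required executables found.")
```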

Conversion

Now use the produce command to convert the desired portions of the PMB to TSV files. For example, to get all gold sentences from PMB parts 00 and 01, run:

produce pmb-3.0.0-{en,de,it,nl}-gold-{p00,p01}.tsv

This example will generate 8 TSV files, one per language and part. They contain the converted sentences, separated by empty lines, one token per line with the following tab-separated columns:

Token number within sentence
Token form
PMB semantic tag
Symbol (English lemma)
Dependency head token number or 0 if root
CCG supertag
CCG constituent structure

For every (verbal) frame in the sentence, there is one additional column. It marks each token as the head of the predicate (the string V), as the head of a role filler (a VerbNet role such as Agent or Patient), or as neither (O).
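
The column layout described above can be read with a few lines of Python. This is a sketch under stated assumptions, not code from this repository: the function names read_sentences and frame_roles and the keys in FIXED_COLUMNS are labels invented here, and the sample rows use made-up placeholder values rather than real PMB output.

```python
from typing import Dict, Iterable, Iterator, List

# Labels for the seven fixed columns, in the order listed above.
# The key names are invented for this sketch.
FIXED_COLUMNS = ["toknum", "form", "sem", "symbol", "head", "super", "const"]


def read_sentences(lines: Iterable[str]) -> Iterator[List[Dict[str, object]]]:
    """Yield sentences as lists of token dicts; sentences end at empty lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        # zip() tolerates rows with fewer fields than expected.
        token = dict(zip(FIXED_COLUMNS, fields))
        # Everything after the fixed columns is one column per frame.
        token["frames"] = fields[len(FIXED_COLUMNS):]
        sentence.append(token)
    if sentence:
        yield sentence


def frame_roles(sentence, frame_index):
    """Return (form, label) pairs for tokens not labeled O in one frame column."""
    pairs = []
    for token in sentence:
        frames = token["frames"]
        if frame_index < len(frames) and frames[frame_index] != "O":
            pairs.append((token["form"], frames[frame_index]))
    return pairs


# Invented two-token sample; tag and structure values are placeholders.
sample = (
    "1\tTom\tPER\ttom\t2\tN\t_\tAgent\n"
    "2\tsleeps\tENS\tsleep\t0\tS\\NP\t_\tV\n"
    "\n"
)
sentences = list(read_sentences(sample.splitlines()))
print(frame_roles(sentences[0], 0))
```

To read a generated file instead of the sample, pass an open file object to read_sentences, since it accepts any iterable of lines.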

Warning: for a small number of CCG derivations, especially some that are not fully corrected, dependency and role extraction will fail. The corresponding columns will then be empty or missing. In extremely rare cases, the extracted dependency structure may not be a tree (it may contain a cycle).
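
Given this warning, it may be worth filtering sentences before using them in experiments. The sketch below checks the head column of one sentence; parse_heads and is_dependency_tree are hypothetical helpers, not part of pmb2tsv, and the requirement of exactly one root per sentence is an assumption made here rather than something the converter guarantees.

```python
def parse_heads(values):
    """Convert head column strings to ints; return None if any is empty or non-numeric."""
    heads = []
    for value in values:
        if not value.strip().isdigit():
            return None
        heads.append(int(value))
    return heads


def is_dependency_tree(heads):
    """Check that heads (1-based token numbers, 0 for root) form a single-rooted tree.

    heads[i] is the head of token i + 1. The single-root requirement is an
    assumption of this sketch.
    """
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False
    if any(h < 0 or h > n for h in heads):
        return False
    # Follow head links from every token; revisiting a token means a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


print(is_dependency_tree([2, 0, 2]))  # True: tokens 1 and 3 attach to root token 2
print(is_dependency_tree([0, 3, 2]))  # False: tokens 2 and 3 form a cycle
```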

For details on the conversion from CCG derivations to dependency trees, see:

Kilian Evang (2020): Configurable Dependency Tree Extraction from CCG Derivations. Proceedings of the Universal Dependencies Workshop.

To reproduce the experiments from that paper, run:

produce pmb-3.0.0-{en,de,it,nl}-gold-{p00,p01}.eval
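
The target names in the two produce commands above rely on shell brace expansion. As a rough illustration, the following Python sketch enumerates the same names. The function expand_targets and its parameters are hypothetical, not part of pmb2tsv, and the naming pattern is inferred only from the examples in this README; values other than those shown there are not confirmed to be valid targets.

```python
from itertools import product


def expand_targets(languages, parts, ext="tsv", release="pmb-3.0.0", status="gold"):
    """Build target names in the same order as shell brace expansion (hypothetical helper)."""
    return [
        f"{release}-{lang}-{status}-{part}.{ext}"
        for lang, part in product(languages, parts)
    ]


# The languages and parts from the example commands above.
targets = expand_targets(["en", "de", "it", "nl"], ["p00", "p01"])
print(len(targets))  # 8
for name in targets:
    print(name)
```

Passing ext="eval" yields the names used in the evaluation command.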
