Convert the CoNLL output of a dependency parser into a LaTeX or Graphviz tree.
For example, here is a sample output from StanfordNLP:
1 Paul Paul PROPN NNP Number=Sing 3 nsubj _ _
2 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 aux _ _
3 drinking drink VERB VBG Tense=Pres|VerbForm=Part 0 root _ _
4 the the DET DT Definite=Def|PronType=Art 5 det _ _
5 beer beer NOUN NN Number=Sing 3 obj _ _
6 he he PRON PRP Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs 7 nsubj _ _
7 bought buy VERB VBD Mood=Ind|Tense=Past|VerbForm=Fin 5 acl:relcl _ _
8 this this DET DT Number=Sing|PronType=Dem 9 det _ _
9 morning morning NOUN NN Number=Sing 7 obl:tmod _ _
10 . . PUNCT . _ 3 punct _ _
And here are the trees using the Latex mode and the GraphViz mode:
Examples of big trees in French (using outputs from Talismane):
Dependency parsers that use the CoNLL format include:
- StanfordNLP (for multiple languages)
- CoreNLP (for multiple languages)
- Talismane (for French)
- MindTheGap (for French)
- ...
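Whichever parser is used, the input follows the same ten-column layout as the sample above. As a rough illustration only (this is not the script's actual code), the columns needed to draw a tree could be read as follows. The names `Token`, `read_sentence` and `children_by_head` are hypothetical, and the sketch assumes whitespace-separated fields with no spaces inside a field.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    index: str   # column 1: token index ("3", or "4-5" for a range)
    form: str    # column 2: word form
    head: str    # column 7: index of the governor ("0" for the root)
    deprel: str  # column 8: dependency relation


def read_sentence(lines):
    """Parse the lines of one sentence into a list of Token objects."""
    tokens = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split()
        tokens.append(Token(cols[0], cols[1], cols[6], cols[7]))
    return tokens


def children_by_head(tokens):
    """Map each head index to the indices of its dependents."""
    children = defaultdict(list)
    for tok in tokens:
        children[tok.head].append(tok.index)
    return dict(children)


sample = [
    "1 Paul Paul PROPN NNP Number=Sing 3 nsubj _ _",
    "2 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 aux _ _",
    "3 drinking drink VERB VBG Tense=Pres|VerbForm=Part 0 root _ _",
]
tokens = read_sentence(sample)
tree = children_by_head(tokens)
print(tree)
```

The head-to-children mapping is the only structure a tree drawer really needs: the entry for `"0"` gives the root, and each other entry lists the dependents to draw below a word.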
There are two modes: LaTeX or Graphviz. With the LaTeX mode, all the sentences are written to a single file, each on its own page. The script produces a .tex file, named according to the -o option, which is compiled if the -c switch is set (otherwise, just run pdflatex|lualatex <file>.tex). To activate this mode, you must use the -l switch or the -m latex option.
For example:
python3 dependency2tree.py -l -o <output.tex> -c <input.conll>
or
python3 dependency2tree.py -l -o <output.tex> <input.conll>
pdflatex output.tex # or lualatex
This will produce an output.pdf file containing your trees. Of course, you will need to install pdflatex or lualatex (with your package manager or with TeX Live).
In the GraphViz mode (the default mode), each sentence is in its own file. If you don't want to compile, you can get graphviz files with:
python3 dependency2tree.py -o <output.gv> <input.conll>
You will get output-001.gv, output-002.gv, etc., one file per sentence. You can run dot to get image files (replace svg with the format you want):
dot -Tsvg output-001.gv > output-001.svg
The dot command comes with the graphviz package, which can be installed on Ubuntu with the following command:
sudo apt install graphviz
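When there are many sentences, running dot by hand on each file gets tedious. The sketch below shows one way to convert every generated file in one go. The helper names `dot_command` and `convert_all` are hypothetical (they are not part of dependency2tree.py), and the sketch assumes that dot is on your PATH and that the files follow the output-NNN.gv naming shown above.

```python
import subprocess
from pathlib import Path


def dot_command(gv_path, fmt="svg"):
    """Build the dot command line that renders one .gv file to an image."""
    gv_path = Path(gv_path)
    out_path = gv_path.with_suffix("." + fmt)
    # Equivalent to: dot -Tsvg output-001.gv -o output-001.svg
    return ["dot", "-T" + fmt, str(gv_path), "-o", str(out_path)]


def convert_all(directory=".", pattern="output-*.gv", fmt="svg"):
    """Run dot on every matching file in the directory, in sorted order."""
    for gv_path in sorted(Path(directory).glob(pattern)):
        # check=True raises an error if dot fails on a file
        subprocess.run(dot_command(gv_path, fmt), check=True)
```

Calling `convert_all(fmt="png")` would then write output-001.png, output-002.png, and so on next to the .gv files.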
If you want to compile automatically with the -c switch, just adjust the output file extension to svg (or png, etc.) instead of gv:
python3 dependency2tree.py -o <output.svg> -c <input.conll>
This will get you output-001.svg, output-002.svg, etc. You can change the image format (png, etc.) with the -f option:
python3 dependency2tree.py -o <output.png> -c -f png <input.conll>
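The numbered file names in the examples above follow a simple pattern: the sentence number, padded to three digits, is placed between the base name and the extension given with -o. The sketch below reproduces that pattern so that the expected names can be computed in your own scripts. The function `numbered_name` is hypothetical, and the three-digit padding is inferred from the examples rather than taken from the script itself.

```python
from pathlib import Path


def numbered_name(output, sentence_number):
    """Return the per-sentence file name derived from the -o value."""
    path = Path(output)
    # "output.gv" with sentence 1 becomes "output-001.gv"
    return str(path.with_name(f"{path.stem}-{sentence_number:03d}{path.suffix}"))


names = [numbered_name("output.svg", n) for n in range(1, 4)]
print(names)
```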
Some corpora (such as GSD) decompose French amalgams (for example, "du" is decomposed into "de le"). The original word is kept in the CoNLL file on a separate line whose index is a range written with a hyphen:
1 Je il PRON ...
2 vais aller VERB ...
3 faire faire VERB ...
4-5 du _ ...
4 de de ADP ...
5 le le DET ...
6 vélo vélo NOUN ...
7 cet ce DET ...
8 après-midi après-midi NOUN ...
9 . . PUNCT ...
Use the --ignore-double-indices option to ignore these words:
python3 dependency2tree.py -o docs/french.svg -c testing/french.conll --ignore-double-indices
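To make the effect concrete, here is a minimal sketch of what ignoring such lines could look like. It illustrates the idea only: the function `drop_range_lines` is hypothetical, and the script's own handling of --ignore-double-indices may differ. The sketch treats a line as a range line when its first column has the form number-number, so hyphens in the word itself (as in "après-midi") are not affected.

```python
import re


def drop_range_lines(lines):
    """Return the lines whose first column is not an index range like "4-5"."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and re.fullmatch(r"\d+-\d+", fields[0]):
            continue  # range line such as "4-5 du _ ..."
        kept.append(line)
    return kept


sentence = [
    "3 faire faire VERB",
    "4-5 du _",
    "4 de de ADP",
    "5 le le DET",
    "8 après-midi après-midi NOUN",
]
filtered = drop_range_lines(sentence)
print(filtered)
```

After filtering, only the decomposed tokens ("de" and "le") remain, so every index in the sentence is a plain integer again.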
For more information, run:
python3 dependency2tree.py -h