Provides functionality to find Frame Evoking Elements in raw text and predict their corresponding frames. Furthermore possible spans of roles can be found and assigned. Models can be trained either on the given files or on any annotated file in a supported format (For more information look at the section formats).
- Based on (and using) pyfn
- Clone repository or download files
- Enter the directory
- Run
pip install -e .
framenet_tools download
acquires all required data and extracts it , optionally--path
can be used to specify a custom path; default is the current directory.
NOTE: After extraction the space occupied amounts up to around 9GB!framenet_tools convert
can now be used to generate the CoNLL datasets
This function is analogous to pyfn and simply propagates the call.framenet_tools train
trains a new model on the training files and saves it,
optionally--use_eval_files
can be specified to train on the evaluation files as well.
NOTE: Training can take a few minutes, depending on the hardware.
For further information run framenet_tools --help
Alternatively conversion.sh provides a also the ability to convert FN data to CoNLL using pyfn. In this case, manually download and extract the FrameNet dataset and adjust the path inside the script.
The following functions both require a pretrained model,
generate using framenet_tools train
as explained above.
- Stages: The System is split into 4 distinct pipeline stages, namely:
- 1 Frame evoking element identification
- 2 Frame identification
- 3 Span identification (WIP)
- 4 Role identification (WIP)
Each stage can individually be trained by calling it e.g. --frameid
.
Also combinations of mutliple stages are possible. This can be done for every option.
NOTE: A usage of evaluate
or predict
requires a previous training of the same stage level!
framenet_tools predict --path [path]
annotates the given raw text file located at--path
and prints the result. Optionally--out_path
can be used to write the results directly to a file. Also a prediction can be limited to a certain stage by specifying it (e.g.--feeid
). NOTE: As the stages build on the previous ones, this option represents a upper bound.framenet_tools evaluate
evaluates the F1-Score of the model on the evaluation files. Here, evaluation can be exclusively limited to a certain stage.
Training automatically logs the loss and accuracy of the train- and devset in TensorBoard format.
tensorboard --logdir=runs
can be used to run TensorBoard and visualize the data.
Currently support formats include:
- Raw text
- SEMEVAL XML: the format of the SEMEVAL 2007 shared task 19 on frame semantic structure extraction
- SEMAFOR CoNLL: the format used by the SEMAFOR parser
NOTE: If the format is not supported, pyfn might be providing a conversion.
If you find this repository helpful, feel free to cite
@software{andre_markard_2020_3993970,
author = {André Markard and
Jan-Christoph Klie},
title = {{FrameNet Tools: A Python library to work with
FrameNet}},
month = aug,
year = 2020,
publisher = {Zenodo},
version = {v0.1.0},
doi = {10.5281/zenodo.3993970},
url = {https://doi.org/10.5281/zenodo.3993970}
}