Dataset definitions extracted from www.data.gov.cz, used by the web application to load metadata.
Definition of evaluation tasks.
User evaluation data are stored here.
Used to archive results for particular evaluations.
Graph produced for directories in evaluation-reports-archive.
Contains input files almost as provided by the external source.
wikidata-cs.jsonl - Czech titles of wikidata entities.
wikidata-cs-en.jsonl - English titles of wikidata entities.
wikidata-hierarchy.jsonl - File with all instanceof and subclassof edges.
2020.04.20-www.data.gov.cz.trig - Dump with all relevant datasets metadata.
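The .jsonl files can be streamed one record at a time. The following is a minimal sketch, assuming the files follow the usual JSON Lines convention of one JSON value per line. The helper name iter_jsonl is hypothetical and not part of this repository, and the record fields are not documented here, so the sketch only parses each line without assuming any keys.

```python
import json
from typing import Any, Iterator


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of the file.

    Assumption: the file is UTF-8 encoded JSON Lines.
    """
    with open(path, encoding="utf-8") as stream:
        for line in stream:
            line = line.strip()
            if line:  # Skip blank lines instead of failing on them.
                yield json.loads(line)
```

For example, `for record in iter_jsonl("./input/wikidata-cs.jsonl"): ...` would visit the records without loading the whole file into memory.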
Mapping from datasets to wikidata entities, used by the web application.
The content is a copy of data from the working directory.
Similarities of datasets by given method, used for user-based evaluation.
Computed similarity matrices, provided by external source.
Files required to run udpipe.
Temporary working directory; it can be deleted to free up space.
Files with information about datasets as extracted from ./input/2020.04.20-www.data.gov.cz.trig.
Files from wikidata, based on data downloaded by download-wikidata-remote-content.sh.
See the run_prepare_texts script for details about the different versions.
wikidata-cs.v1.jsonl
wikidata-cs.v2.jsonl
wikidata-cs.v3.jsonl
The following files are used in the SISAP demo; they contain the extracted labels that are loaded to the server. We work with Czech (cs) labels, but the conference demo needs the English (en) version.
wikidata-labels-cs.jsonl
wikidata-labels-en.jsonl
File generated only once; it stores the mapping from IRI to dataset file name. The file exists because an IRI cannot be used as a file name. It is not regenerated, as the mapping must stay constant.
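An IRI contains characters such as ":" and "/" that are not safe in file names, which is why a separate mapping is needed. The sketch below illustrates one way such a mapping could be kept stable. It is not the code that produced the file in this repository: the function name extend_mapping, the in-memory dictionary form, and the zero-padded numeric file names are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def extend_mapping(mapping: Dict[str, str], iris: Iterable[str]) -> Dict[str, str]:
    """Return a copy of the mapping with a file name for every given IRI.

    Existing entries are never changed, so names already handed out
    stay constant. The naming scheme is hypothetical.
    """
    result = dict(mapping)
    used = set(result.values())
    index = len(result)
    for iri in iris:
        if iri in result:
            continue  # Keep the previously assigned file name.
        name = f"{index:06d}.json"
        while name in used:  # Avoid clashing with an existing name.
            index += 1
            name = f"{index:06d}.json"
        result[iri] = name
        used.add(name)
        index += 1
    return result
```

Because existing entries are only copied, calling the function again with additional IRIs leaves every earlier IRI pointing at the same file name.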