Improve quickstart and default Wikidata settings (ad-freiburg#675)
* For large knowledge bases, the IndexBuilder opens thousands of temporary files at once. Many Linux systems do not allow this by default. The workaround for this problem is now documented in quickstart.md.
* The default batch size in wikidata.settings.json is now 10 million triples. This eliminates a common source of out-of-memory problems when building indices for Wikidata.
hannahbast authored May 29, 2022
1 parent 679215d commit a43c1b9
Showing 2 changed files with 6 additions and 3 deletions.
7 changes: 5 additions & 2 deletions docs/quickstart.md
@@ -64,9 +64,12 @@ settings as explained in the previous paragraph). It takes about 20 hours on an
AMD Ryzen 9 5900X. Note that the only difference is the basename (`wikidata`
instead of `olympics`) and how the input files are piped into the
`IndexBuilderMain` executable (using `bzcat` instead of `xzcat` and two files
-instead of one).
+instead of one). Also note the `ulimit -Sn 1048576`, which ensures that the
+operating system allows a sufficient number of open files (on some systems, the
+default is as low as `1024`, and for large datasets, QLever operates with more
+temporary files than that).

-chmod o+w . && docker run -it --rm -v $QLEVER_HOME/qlever-indices/wikidata:/index --entrypoint bash qlever -c "cd /index && bzcat latest-all.ttl.bz2 latest-lexemes.ttl.bz2 | IndexBuilderMain -F ttl -f - -l -i wikidata -s wikidata.settings.json | tee wikidata.index-log.txt"
+chmod o+w . && docker run -it --rm -v $QLEVER_HOME/qlever-indices/wikidata:/index --entrypoint bash qlever -c "cd /index && ulimit -Sn 1048576 && bzcat latest-all.ttl.bz2 latest-lexemes.ttl.bz2 | IndexBuilderMain -F ttl -f - -l -i wikidata -s wikidata.settings.json | tee wikidata.index-log.txt"
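
The `ulimit -Sn 1048576` step only succeeds if the hard limit is at least that high. As a minimal sketch (not part of this commit), the following checks both limits first and raises the soft limit only as far as the hard limit allows. The helper name `pick_nofile_target` and the fallback behaviour are illustrative assumptions, not something QLever provides.

```shell
#!/usr/bin/env bash
# Sketch: inspect the open-file limits and raise the soft limit if possible.
# pick_nofile_target is a hypothetical helper, not part of QLever.

# Prints the soft limit to request: the smaller of the required value ($1)
# and the hard limit ($2), where the hard limit may be the word "unlimited".
pick_nofile_target() {
  if [ "$2" = "unlimited" ] || [ "$2" -ge "$1" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

soft=$(ulimit -Sn)   # current soft limit on open file descriptors
hard=$(ulimit -Hn)   # hard limit; an unprivileged process cannot exceed it
target=$(pick_nofile_target 1048576 "$hard")

# Only raise the limit, never lower it.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
  ulimit -Sn "$target" || echo "could not raise the soft limit" >&2
fi

echo "open-file soft limit: $(ulimit -Sn) (hard limit: $hard)"
```

If the reported hard limit inside the container is below what the index build needs, one option is to raise it when starting the container with Docker's `--ulimit nofile=<soft>:<hard>` option on `docker run`.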

## Start the engine

2 changes: 1 addition & 1 deletion examples/wikidata.settings.json
@@ -11,5 +11,5 @@
"ignore-punctuation": true
},
"ascii-prefixes-only": true,
-"num-triples-per-batch" : 50000000
+"num-triples-per-batch" : 10000000
}
