No REST for the wikid 🎃 - generate a SQLite database and a spaCy `KnowledgeBase` from Wikipedia & Wikidata dumps.

`wikid` was designed with the use case of named entity linking (NEL) with spaCy in mind.
Note this repository is still in an experimental stage, so the public API
might change at any time.
The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.
The following commands are defined by the project. They can be executed using `spacy project run [name]`. Commands are only re-run if their inputs have changed.
| Command | Description |
| --- | --- |
| `parse` | Parse Wiki dumps. This can take a long time if you're not using the filtered dumps! |
| `download_model` | Download spaCy language model. |
| `create_kb` | Create the KB using the SQLite database with Wiki content. |
| `delete_db` | Delete the SQLite database generated in the `parse` step, which holds the data parsed from the Wikidata and Wikipedia dumps. |
| `clean` | Delete all generated artifacts except for the SQLite database. |
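For scripted use, the commands in the table can also be launched from Python. The following is a minimal sketch, assuming spaCy is installed in the current interpreter. `build_run_command` and `run_command` are hypothetical helper names that are not part of wikid or spaCy; they only wrap the `spacy project run` CLI, whose `--dry` flag performs a dry run without executing the scripts.

```python
import subprocess
import sys

# Command names as listed in the table above.
COMMANDS = ("parse", "download_model", "create_kb", "delete_db", "clean")


def build_run_command(name, project_dir=".", dry=False):
    """Build the argument list for `spacy project run <name> <project_dir>`.

    Hypothetical helper for illustration; it is not provided by wikid.
    """
    if name not in COMMANDS:
        raise ValueError(f"Unknown project command: {name!r}")
    args = [sys.executable, "-m", "spacy", "project", "run", name, str(project_dir)]
    if dry:
        # --dry asks spaCy to report what would run without executing it.
        args.append("--dry")
    return args


def run_command(name, project_dir=".", dry=False):
    """Run a project command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_run_command(name, project_dir, dry), check=True)
```

Calling `run_command("parse", dry=True)` from the project directory would be one way to check what `parse` is going to do before committing to a long run.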
The following workflows are defined by the project. They can be executed using `spacy project run [name]` and will run the specified commands in order. Commands are only re-run if their inputs have changed.
| Workflow | Steps |
| --- | --- |
| `all` | `parse` → `download_model` → `create_kb` |
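The "only re-run if inputs have changed" behaviour can be pictured as a checksum comparison. The sketch below illustrates the idea only and is not spaCy's actual implementation (spaCy keeps this state in a `project.lock` file). The function names and the JSON state file are assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _load_state(state_file):
    """Load recorded checksums, or an empty dict if nothing was recorded yet."""
    path = Path(state_file)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def inputs_changed(name, inputs, state_file):
    """Return True if the inputs of command `name` differ from the recorded ones."""
    recorded = _load_state(state_file).get(name)
    current = {str(p): file_checksum(p) for p in inputs}
    return recorded != current


def record_inputs(name, inputs, state_file):
    """Store the current input checksums after command `name` has succeeded."""
    state = _load_state(state_file)
    state[name] = {str(p): file_checksum(p) for p in inputs}
    Path(state_file).write_text(json.dumps(state, indent=2), encoding="utf-8")
```

A runner built on this would call `inputs_changed` before each step of the workflow, skip the step when it returns `False`, and call `record_inputs` only after the step finishes successfully, so that a failed run is retried next time.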
The following assets are defined by the project. They can be fetched by running `spacy project assets` in the project directory.
| File | Source | Description |
| --- | --- | --- |
| `assets/wikidata_entity_dump.json.bz2` | URL | Wikidata entity dump. Download can take a long time! |
| `assets/wikipedia_dump.xml.bz2` | URL | Wikipedia dump. Download can take a long time! |
| `assets/wikidata_entity_dump_filtered.json.bz2` | URL | Filtered Wikidata entity dump for demo purposes (English only). |
| `assets/wikipedia_dump_filtered.xml.bz2` | URL | Filtered Wikipedia dump for demo purposes (English only). |
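Once the assets are fetched, it can help to peek into a Wikidata dump before starting a long parse. The sketch below assumes the standard Wikidata JSON dump layout (one JSON array with one entity per line, each line ending in a comma) and that the filtered demo dump follows the same layout. `iter_entities` and `preview` are hypothetical helpers, not part of wikid, and wikid's own parser may work differently.

```python
import bz2
import json
from itertools import islice


def iter_entities(path):
    """Yield entity dicts from a bz2-compressed Wikidata JSON dump.

    Assumes one entity per line inside a single enclosing JSON array.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip().rstrip(",")
            # Skip blank lines and the "[" / "]" of the enclosing array.
            if line in ("", "[", "]"):
                continue
            yield json.loads(line)


def preview(path, limit=3):
    """Return (id, English label) pairs for the first `limit` entities."""
    rows = []
    for entity in islice(iter_entities(path), limit):
        label = entity.get("labels", {}).get("en", {}).get("value")
        rows.append((entity.get("id"), label))
    return rows
```

For example, `preview("assets/wikidata_entity_dump_filtered.json.bz2")` reads only the first few lines of the archive, so it returns quickly even for the full dump.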