forked from explosion/wikid

Generate a SQLite database from Wikipedia & Wikidata dumps.


Luisgarchi/wikid

 
 


🪐 spaCy Project: wikid

No REST for the wikid 🎃 - generate a SQLite database and a spaCy KnowledgeBase from Wikipedia & Wikidata dumps. wikid was designed with the use case of named entity linking (NEL) with spaCy in mind.
Note: this repository is still experimental, so the public API might change at any time.

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| parse | Parse the Wiki dumps. This can take a long time if you're not using the filtered dumps! |
| download_model | Download the spaCy language model. |
| create_kb | Create the KnowledgeBase from the SQLite database with Wiki content. |
| delete_db | Delete the SQLite database generated by the parse step, which holds the data parsed from the Wikidata and Wikipedia dumps. |
| clean | Delete all generated artifacts except for the SQLite database. |
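
When the commands need to be launched from a script rather than typed by hand, a small helper can assemble the invocation. The sketch below is an illustration only: `build_run_command` and `COMMANDS` are hypothetical names, not part of wikid or spaCy, and the helper only builds the argument list for `spacy project run` without executing anything.

```python
import sys
from typing import List

# Command names as listed in the table above.
COMMANDS = ("parse", "download_model", "create_kb", "delete_db", "clean")


def build_run_command(name: str, project_dir: str = ".") -> List[str]:
    """Return the argv list for `spacy project run <name> <project_dir>`."""
    if name not in COMMANDS:
        raise ValueError(f"Unknown command {name!r}; expected one of {COMMANDS}")
    # `python -m spacy` uses the spaCy installed for the current interpreter.
    return [sys.executable, "-m", "spacy", "project", "run", name, project_dir]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the project directory, for example `subprocess.run(build_run_command("parse"), check=True)`.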

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| all | parse → download_model → create_kb |
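
The idea that a step is skipped unless its inputs have changed can be illustrated with a content-hash comparison. The following is a simplified sketch and not spaCy's actual implementation: `file_sha256`, `inputs_changed`, and the JSON state file are hypothetical, chosen only to show the principle.

```python
import hashlib
import json
from pathlib import Path
from typing import Iterable


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large dumps are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inputs_changed(inputs: Iterable[Path], state_file: Path) -> bool:
    """Return True if any input hash differs from the recorded state.

    The current hashes are written to `state_file` (a hypothetical JSON
    file, not spaCy's lock file) so the next call can compare against them.
    """
    current = {str(p): file_sha256(p) for p in inputs}
    previous = json.loads(state_file.read_text()) if state_file.exists() else None
    state_file.write_text(json.dumps(current, indent=2))
    return current != previous
```

A runner built on this would execute a step only when `inputs_changed(...)` returns True, which is why repeating an unchanged workflow finishes quickly.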

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

| File | Source | Description |
| --- | --- | --- |
| assets/wikidata_entity_dump.json.bz2 | URL | Wikidata entity dump. Download can take a long time! |
| assets/wikipedia_dump.xml.bz2 | URL | Wikipedia dump. Download can take a long time! |
| assets/wikidata_entity_dump_filtered.json.bz2 | URL | Filtered Wikidata entity dump for demo purposes (English only). |
| assets/wikipedia_dump_filtered.xml.bz2 | URL | Filtered Wikipedia dump for demo purposes (English only). |
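
Because the full dumps take a long time to download, it can help to check which files are already on disk before starting a long-running step. The sketch below assumes the asset paths from the table above; `missing_assets` and the two constants are hypothetical names, and grouping the files into a full pair and a filtered pair is an assumption made for illustration.

```python
from pathlib import Path
from typing import List

# Asset paths as listed in the table above, relative to the project directory.
FULL_ASSETS = (
    "assets/wikidata_entity_dump.json.bz2",
    "assets/wikipedia_dump.xml.bz2",
)
FILTERED_ASSETS = (
    "assets/wikidata_entity_dump_filtered.json.bz2",
    "assets/wikipedia_dump_filtered.xml.bz2",
)


def missing_assets(project_dir: str = ".", filtered: bool = True) -> List[str]:
    """Return the expected dump files that are not present on disk."""
    expected = FILTERED_ASSETS if filtered else FULL_ASSETS
    root = Path(project_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list means the chosen pair of dumps is present; otherwise running spacy project assets in the project directory fetches the files defined by the project.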
