wiki2plaintext

Here is 1 public repository matching this topic...

A tool that extracts plain (unformatted) multilingual text, redirects, links, and categories from Wikipedia backups (dumps). It is designed to prepare clean training data for AI and machine learning software.

  • Updated Nov 11, 2023
  • Python
