Stars
articles
8 repositories
Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML.
Module for automatic summarization of text documents and HTML pages.
Fast Python port of arc90's readability tool, updated to match the latest readability.js!
Entity Disambiguation as text extraction (ACL 2022)
reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages on the CLI.
Plain Russian Language / Понятный (простой) русский язык (clear, simple Russian).