Stars

articles

8 repositories

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Python | 3,724 stars | 268 forks | Updated Dec 11, 2024

Module for automatic summarization of text documents and HTML pages.

Python | 3,532 stars | 531 forks | Updated May 16, 2024

Heuristic-based boilerplate removal tool

Python | 733 stars | 81 forks | Updated May 9, 2024

Fast Python port of arc90's readability tool, updated to match the latest readability.js.

Python | 2,680 stars | 350 forks | Updated Oct 14, 2024

Entity Disambiguation as text extraction (ACL 2022)

Python | 177 stars | 13 forks | Updated Apr 17, 2022

reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages on the CLI.

Go | 285 stars | 12 forks | Updated Aug 1, 2024

Plain Russian Language / Понятный (простой) русский язык ("clear (simple) Russian language").

Python | 158 stars | 25 forks | Updated Mar 11, 2024

An article extractor in Rust

Rust | 131 stars | 5 forks | Updated Feb 1, 2022