
A Python library to correctly extract content from HTML pages.

yongzhi444/byeHTML


Bye HTML

Boilerpipe is expected to work only with Python 3.6+, so this parser also requires Python 3.6 or later.
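A script that depends on this parser can fail fast on an older interpreter instead of failing later with a confusing error. The sketch below uses only the standard library; `require_python` is a hypothetical helper for illustration and is not part of byeHTML.

```python
import sys

# Minimum interpreter version stated in this README (Python 3.6+).
MIN_VERSION = (3, 6)


def require_python(min_version=MIN_VERSION):
    """Raise RuntimeError if the running interpreter is older than min_version.

    Hypothetical helper for illustration; not part of byeHTML.
    """
    if sys.version_info[:2] < tuple(min_version):
        # str.format (not an f-string) keeps this check parseable on old interpreters.
        found = "{}.{}".format(*sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_version)
        raise RuntimeError(
            "byeHTML requires Python {}+, found {}".format(wanted, found)
        )


# Run the check once, before importing anything that needs Python 3.6+.
require_python()
```

The message is built with `str.format` rather than an f-string, because f-strings are themselves a Python 3.6 feature and would turn the friendly error into a `SyntaxError` on older versions.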

Installation

From the root of a local clone of the repository, install the package in editable mode:

pip install -e .

Usage

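#!/usr/bin/env python3

from byeHTML import byeHTML

# Parse the HTML, preprocessing it with the "justext" option.
bye = byeHTML("<html> <head>This is a simple text. </head> <body> Body </body> </html>", preprocesshtml="justext")

# Python 3 print function (the parser requires Python 3.6+).
print(bye.get_html())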
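To illustrate what extracting content means for the sample input above, here is a minimal standard-library sketch that keeps only the visible text inside `<body>`. It is a deliberately simple stand-in: it does not reproduce byeHTML, Boilerpipe, or jusText, which classify text blocks heuristically to strip boilerplate. `BodyTextExtractor` and `extract_body_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Tags whose text is never visible page content.
SKIP_TAGS = {"script", "style"}


class BodyTextExtractor(HTMLParser):
    """Collect text that appears inside <body>, skipping script/style.

    Hypothetical illustration; not part of byeHTML.
    """

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when inside <body> and outside script/style.
        if self.in_body and self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_body_text(html):
    """Return the visible <body> text, joined by single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.chunks)


sample = "<html> <head>This is a simple text. </head> <body> Body </body> </html>"
print(extract_body_text(sample))  # prints: Body
```

For the README's sample, the text placed inside `<head>` is discarded and only `Body` remains.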
