
sunyam/Web_Scraper


This repository is primarily for web scraping with Python, using the BeautifulSoup library. To learn the basics, I recommend: http://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/

In this repo, I've written Python code that extracts the article from a given webpage. You pass a URL to the function and it returns the main article content, stripping unwanted material such as advertisements and links.
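As a rough illustration of the idea, here is a minimal sketch of URL-to-article extraction with BeautifulSoup. It is not the code from this repo: the names `NOISE_TAGS`, `extract_article_text`, and `fetch_article` are hypothetical, and the list of tags treated as clutter is an assumption that will not suit every site.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Tags that rarely hold article text. This list is an assumption for the
# sketch, not something taken from the repository.
NOISE_TAGS = ["script", "style", "nav", "header", "footer", "aside", "form", "iframe"]


def extract_article_text(html):
    """Return the paragraph text of a page with common clutter removed."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove one noise tag at a time, re-searching after each removal so that
    # nested noise tags are never touched after their parent is destroyed.
    tag = soup.find(NOISE_TAGS)
    while tag is not None:
        tag.decompose()
        tag = soup.find(NOISE_TAGS)

    # Prefer an <article> element when present, else use the whole page.
    container = soup.find("article") or soup

    paragraphs = []
    for p in container.find_all("p"):
        text = p.get_text(" ", strip=True)
        if text:
            paragraphs.append(text)
    return "\n\n".join(paragraphs)


def fetch_article(url):
    """Download a page and return its extracted article text."""
    # Some sites reject requests without a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        html = response.read()
    return extract_article_text(html)
```

Splitting the download from the parsing keeps `extract_article_text` usable on saved HTML, which makes it easy to try against a page without hitting the network.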

As you might know, each website wraps its articles in its own HTML structure, so I'll try to cover as many websites as possible.
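One way to handle per-site differences is a lookup table from hostname to a CSS selector for the article container. The sketch below assumes that approach; the domains, selectors, and the names `SITE_SELECTORS`, `selector_for`, and `extract_for_site` are all hypothetical and are not taken from this repo.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical domains and selectors, for illustration only.
SITE_SELECTORS = {
    "news.example.com": "div.story-body",
    "blog.example.org": "section#post-content",
}

# Fallback used when a site has no entry in the table (an assumption).
DEFAULT_SELECTOR = "article"


def selector_for(url):
    """Pick the CSS selector registered for the URL's hostname."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return SITE_SELECTORS.get(host, DEFAULT_SELECTOR)


def extract_for_site(url, html):
    """Return the text inside the site's article container, or '' if absent."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.select_one(selector_for(url))
    if container is None:
        return ""
    return container.get_text(" ", strip=True)
```

Supporting a new website then means adding one entry to the table rather than writing a new function.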

I have also added a script to parse Hinglish data (e.g., from the SantaBanta website).

About

Scraping news websites using BeautifulSoup
