GitHub - sunyam/Web_Scraper: Scraping news websites using BeautifulSoup

This repository is primarily for web scraping using Python. The library I use is BeautifulSoup. To learn the basics, I recommend: http://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/

In this repo, I've written Python code for extracting the article from a given webpage. You provide the URL to the function, and it returns the main article content, removing unwanted information such as advertisements and links.
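To give a rough idea of what such an extraction function can look like, here is a minimal sketch using BeautifulSoup. It is not the code from News_Scraper.py: the names `extract_article`, `extract_article_from_html`, and `NOISE_TAGS` are hypothetical, and the sketch assumes the article text sits in `<p>` tags, preferably inside an `<article>` element.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Tags that rarely hold article text (an assumed list; adjust per site).
NOISE_TAGS = ["script", "style", "nav", "header", "footer", "aside", "form"]


def extract_article_from_html(html):
    """Return the paragraph text of the main article as one string."""
    soup = BeautifulSoup(html, "html.parser")

    # Drop elements that usually carry ads, menus, or scripts.
    for tag in soup.find_all(NOISE_TAGS):
        tag.decompose()

    # Prefer an <article> element; fall back to the whole document.
    container = soup.find("article")
    if container is None:
        container = soup

    paragraphs = []
    for p in container.find_all("p"):
        text = p.get_text(" ", strip=True)
        if text:  # skip empty paragraphs
            paragraphs.append(text)
    return "\n".join(paragraphs)


def extract_article(url):
    """Download the page at url and return its article text (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        html = response.read()
    return extract_article_from_html(html)
```

Splitting the download from the parsing keeps the parsing step testable on a saved HTML string, without network access.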

As you might know, each website has its own HTML structure for enclosing an article, so I'll try to cover as many websites as possible.
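One way to handle per-site differences is to keep a lookup table from domain to the selector that wraps the article. The sketch below is only an illustration of that idea, not the approach the scripts in this repo necessarily take. The domains, the CSS selectors, and the names `SITE_SELECTORS`, `selector_for`, and `extract_with_selector` are all made up for the example.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical mapping: domain -> CSS selector of the element holding the article.
SITE_SELECTORS = {
    "news.example.com": "div.story-body",
    "blog.example.org": "section#post-content",
}
DEFAULT_SELECTOR = "article"  # assumed fallback for unknown sites


def selector_for(url):
    """Pick the article selector for a URL based on its domain."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return SITE_SELECTORS.get(host, DEFAULT_SELECTOR)


def extract_with_selector(html, url):
    """Return paragraph text from the site-specific article container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.select_one(selector_for(url))
    if container is None:
        return ""  # container not found: the site layout is not covered
    return "\n".join(p.get_text(" ", strip=True) for p in container.find_all("p"))
```

Supporting a new website then means adding one entry to the table rather than writing a new function.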

I have also added a script to parse Hinglish data (e.g. from the SantaBanta website).

Repository contents:
santabanta_data
Hinglish_Scraper.py
News_Scraper.py
README.md
