WebScraping

Web scraping is useful for gathering data that is not available through an API. This folder provides sample code for Beautiful Soup, an easy-to-use Python library for parsing and scraping web pages.

What is included?

  1. link_web.py: script that uses Beautiful Soup and NetworkX to create a graph representing the links between web pages, starting from a given page.
  2. preprocessing: Python script that scrapes a web page containing FAQs and prints them in JSONL format.
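
The idea behind the first script can be sketched as follows. This is a minimal illustration, not the actual contents of link_web.py: the function names `extract_links` and `build_link_graph`, the `max_pages` limit, and the `PAGES` / `fake_fetch` stand-ins are all hypothetical. It assumes that links are taken from `<a href>` tags and stored as edges of a directed NetworkX graph, with pages visited breadth-first from the start page. The page fetcher is passed in as a function so the example runs offline; a real crawl would fetch pages over HTTP instead.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin

import networkx as nx
from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return the absolute http(s) URLs of all <a href=...> tags in html."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links against the page URL and drop #fragments.
        url, _fragment = urldefrag(urljoin(base_url, anchor["href"]))
        if url.startswith(("http://", "https://")):
            links.append(url)
    return links


def build_link_graph(start_url, fetch, max_pages=10):
    """Breadth-first crawl from start_url; fetch(url) must return HTML text."""
    graph = nx.DiGraph()
    graph.add_node(start_url)
    queue = deque([start_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        for target in extract_links(fetch(url), url):
            # One directed edge per link: source page -> linked page.
            graph.add_edge(url, target)
            if target not in visited:
                queue.append(target)
    return graph


# Offline demo: a dict of hypothetical pages stands in for real HTTP requests.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/">Home</a>',
    "https://example.com/b": '<a href="mailto:me@example.com">Mail</a>',
}


def fake_fetch(url):
    return PAGES.get(url, "")


graph = build_link_graph("https://example.com/", fake_fetch)
print(sorted(graph.edges()))
```

A directed graph is used here because a link from page A to page B does not imply a link back. Non-HTTP links such as `mailto:` are skipped, and URL fragments are removed so that `/b` and `/b#top` count as the same page.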