
ilyesBoukraa/web_scraping

web_scraping CLI

I used Requests and Beautiful Soup to scrape the website.
This website is commonly used for scraping practice because it is easy to work with:
it is a plain HTML page, with nothing complicated such as JavaScript or login sessions.
I found it in: best-websites-to-practice-your-web-scraping-skills

Outline of what I did in this project:

1. Created a virtual environment.

2. Installed the required packages (see requirements.txt).

3. Used Requests and Beautiful Soup for scraping.

4. Fetched the HTML page and decoded it as UTF-8 (otherwise one symbol was saved incorrectly), then extracted the desired data: the name and price of every book on the main page.

5. Used the @dataclass decorator to define a ProductData class, which was then used to store the scraped data cleanly (in keeping with best practices).

6. Used pandas to save the extracted data to a CSV file and to display it with the .head() method (see displaying_csv.py).

7. Used the Click library to build the CLI.
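Steps 5 and 6 could look roughly like the sketch below. It is not the project's actual code: the field names `name` and `price` and the helpers `save_products` and `preview_csv` are hypothetical, and prices are kept as the raw strings shown on the page.

```python
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class ProductData:
    """One scraped book: its name and its price as displayed on the page."""
    name: str
    price: str


def save_products(products: list[ProductData], csv_path: str) -> None:
    """Convert the dataclass instances to rows and write them to a CSV file."""
    df = pd.DataFrame([asdict(p) for p in products], columns=["name", "price"])
    # Writing as UTF-8 keeps non-ASCII characters (e.g. currency symbols) intact.
    df.to_csv(csv_path, index=False, encoding="utf-8")


def preview_csv(csv_path: str, n: int = 5) -> pd.DataFrame:
    """Load the CSV back and return its first n rows via DataFrame.head()."""
    return pd.read_csv(csv_path, encoding="utf-8").head(n)
```

For example, `save_products([ProductData("Example Book A", "£10.00")], "books.csv")` followed by `print(preview_csv("books.csv"))` shows a one-row table (the record is made up for illustration). Using `dataclasses.asdict` lets pandas build the columns directly from the dataclass fields.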

To run this code:

Run the following command in the terminal:
python scraper.py <url_link> <the-csv-file-name.csv>
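The command above implies a Click entry point with two positional arguments. Below is a minimal, self-contained sketch of such a scraper.py, assuming a books listing page where each book sits in an `<article class="product_pod">` with the full title in the `title` attribute of its `h3 a` link and the price in `p.price_color`. Those selectors, the argument names `url` and `csv_name`, and the 10-second timeout are assumptions, not taken from the project; for brevity this sketch builds plain dicts instead of a dataclass.

```python
import click
import pandas as pd
import requests
from bs4 import BeautifulSoup


@click.command()
@click.argument("url")
@click.argument("csv_name")
def main(url: str, csv_name: str) -> None:
    """Scrape book names and prices from URL and save them to CSV_NAME."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Force UTF-8 decoding so non-ASCII symbols are not garbled in response.text.
    response.encoding = "utf-8"
    soup = BeautifulSoup(response.text, "html.parser")

    # Assumed markup: one <article class="product_pod"> per book.
    rows = [
        {
            "name": card.select_one("h3 a")["title"],
            "price": card.select_one("p.price_color").get_text(strip=True),
        }
        for card in soup.select("article.product_pod")
    ]
    pd.DataFrame(rows, columns=["name", "price"]).to_csv(
        csv_name, index=False, encoding="utf-8"
    )
    click.echo(f"Saved {len(rows)} product(s) to {csv_name}")


if __name__ == "__main__":
    main()
```

Click declares both values as required positional arguments, so running `python scraper.py` without them prints a usage error instead of failing inside the scraping code.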

Inspiration:

Special thanks to John Watson Rooney, whose video inspired this beginner project and gave me hands-on practice with the fundamentals of web scraping and building a CLI.