Sprints Web Scraping Lab

Prerequisites

Download the Python interpreter (version 3 or later): https://www.python.org/downloads/
1. Direct download link (Windows, 64-bit): python-3.11.5-amd64.exe
Download the PyCharm Community Edition IDE (Integrated Development Environment) from its main page https://www.jetbrains.com/pycharm/download/ (the Community Edition, in the black section at the bottom, is the free one)
1. Direct download link: pycharm
Create a GitHub account if you don't already have one:

Main site: https://github.com/
Direct sign-up link
Download and install Git locally on your laptop as well: Download Link

Web scraping using Scrapy and Beautiful Soup

Scrapy:

Scrapy is an open-source web crawling framework for Python. It downloads pages, follows links, and extracts data from websites, handling many requests concurrently. Built-in components such as downloader middleware and item pipelines make it a complete solution for web scraping tasks.

Scrapy Installation Steps

pip install scrapy
scrapy startproject myscrapyproject
cd myscrapyproject
scrapy genspider myspider "https://en.wikipedia.org/wiki/Python_(programming_language)"
scrapy crawl myspider

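Scrapy export to different file formats

The format is chosen from the file extension. -o appends to an existing file (so a second run leaves output.json as invalid JSON), while -O overwrites it (Scrapy 2.4+).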

scrapy crawl myspider -o output.json
scrapy crawl myspider -o output.csv
scrapy crawl myspider -o output.xml
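Instead of passing -o/-O on every run, the same exports can be declared once with the FEEDS setting (Scrapy 2.1+; the overwrite option needs Scrapy 2.4+). A sketch, assuming it is added to the project's settings.py; the file names simply mirror the commands above.

```python
# myscrapyproject/settings.py (excerpt)
# Sketch: with this in place, a plain `scrapy crawl myspider` writes all three
# files, replacing them on each run instead of appending.
FEEDS = {
    "output.json": {"format": "json", "encoding": "utf8", "overwrite": True},
    "output.csv": {"format": "csv", "overwrite": True},
    "output.xml": {"format": "xml", "overwrite": True},
}
```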

Scrapy Shell

scrapy shell "https://en.wikipedia.org/wiki/Python_(programming_language)"

>>> response.css('title::text').get()
>>> response.css('#firstHeading > span::text').get()
>>> response.css('#firstHeading').get()
>>> response.css('div#mw-content-text > div.mw-content-ltr.mw-parser-output > p:nth-child(6)').get()
>>> response.css('div#mw-content-text > div.mw-content-ltr.mw-parser-output > p').getall()
>>> response.css('div#mw-content-text > div.mw-content-ltr.mw-parser-output > p').getall()[4]
>>> response.css('div#mw-content-text > div.mw-content-ltr.mw-parser-output > p').getall()[4].strip().replace('\n', '')

Beautiful Soup:

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It provides Pythonic idioms for iterating, searching, and modifying the parse tree. Beautiful Soup transforms complex HTML documents into a tree of Python objects, simplifying web scraping tasks by offering intuitive methods to navigate and search the parsed content.

Beautiful Soup Installation Steps

pip install requests beautifulsoup4
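requests downloads the page and Beautiful Soup parses it. A minimal sketch that extracts the same title and paragraphs as the Scrapy shell session, assuming Wikipedia's current markup; the function names parse_page and fetch_page and the User-Agent string are hypothetical. Parsing is kept separate from fetching so it can be tried on any HTML string without network access.

```python
# Sketch: requests + Beautiful Soup on the same Wikipedia page.
# parse_page, fetch_page and the User-Agent value are illustrative names.
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/Python_(programming_language)"


def parse_page(html):
    """Return the page title and the non-empty body paragraphs as plain text."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.select_one("#firstHeading")
    title = heading.get_text(strip=True) if heading else None
    paragraphs = [
        # Join the text pieces of each <p> (including links, bold) with spaces.
        p.get_text(" ", strip=True)
        for p in soup.select("div#mw-content-text > div.mw-content-ltr.mw-parser-output > p")
    ]
    return title, [p for p in paragraphs if p]


def fetch_page(url=URL):
    # Wikipedia asks clients to send a descriptive User-Agent header.
    headers = {"User-Agent": "sprints-webscraping-lab/0.1"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text
```

For example, title, paragraphs = parse_page(fetch_page()) fetches the live page and returns its heading plus a list of paragraph strings.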

Install all Python modules/libraries listed in the requirements file

pip install -r requirements.txt

Git Commands

Initial push

git init
git remote add origin https://github.com/ahmedredahussien/sprints-webscrapping.git
git add .
git commit -m "Initial commit"
git pull origin master --allow-unrelated-histories
> Normal first-time push:
git push -u origin master

Ongoing changes

git checkout -b my-feature
> Only needed if the file is new (not yet tracked by Git):
git add README.md
git commit README.md -m "add git steps to feature branch"
> Normal push after the first time:
git push origin my-feature

Merging the feature branch into master

git checkout master
git merge my-feature
git push origin master
> Delete the merged branch:
git branch -d my-feature
> Force delete (even if the branch is not merged):
git branch -D my-feature

Force push (overwrites the remote history):

git push -u --force origin master

Cloning an existing repository to your local machine

git clone https://github.com/ahmedredahussien/WebScraping.git WebScraping

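Overwrite your local changes with the remote version (fetch first so that origin/master is up to date; uncommitted local changes are discarded):

git fetch origin
git reset --hard origin/master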
