Practice URLs:
- bookstore: http://books.toscrape.com/
- QQ Music: song lyrics, user reviews
- Douban: movie information
-
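BeautifulSoup: parses the HTML returned by the first request in the browser DevTools 'Network' tab (the page document itself), so data that JavaScript loads afterwards is not included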
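A minimal sketch of the BeautifulSoup workflow against the bookstore practice site: download the raw HTML (the same response as that first 'Network' request), then parse it. It assumes `beautifulsoup4` is installed and that each book is marked up as `article.product_pod` with the full title in the link's `title` attribute and the price in `p.price_color`; those selectors are assumptions about the page's current markup, and `fetch_html`/`parse_books` are hypothetical helper names.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

URL = "http://books.toscrape.com/"


def fetch_html(url: str) -> str:
    """Download the raw HTML document (what the first 'Network' request returns)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_books(html: str) -> list[tuple[str, str]]:
    """Return (title, price) pairs; the CSS selectors are assumed, not guaranteed."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    books = []
    for article in soup.select("article.product_pod"):
        title = article.h3.a["title"]  # full title lives in the link's title attribute
        price = article.select_one("p.price_color").get_text(strip=True)
        books.append((title, price))
    return books


if __name__ == "__main__":
    for title, price in parse_books(fetch_html(URL))[:5]:
        print(price, title)
```

Fetching and parsing are kept in separate functions so the parser can be tried on a saved copy of the page without hitting the site again.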
-
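Selenium: automates a real browser, so it can get everything shown in the DevTools 'Elements' tab (the rendered page, including content loaded by JavaScript)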
-
smtplib, email (MIMEMultipart): use Python to send emails automatically
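A minimal sketch of sending a scraped result by email: `email.mime` builds the message and `smtplib` delivers it. The SMTP host, port 465 (SMTP over SSL) and the credentials are hypothetical placeholders; use your mail provider's settings, which often require an app-specific password rather than the login password.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SMTP_HOST = "smtp.example.com"  # hypothetical: your provider's SMTP server
SMTP_PORT = 465                 # common SMTP-over-SSL port; check your provider


def build_message(sender, recipients, subject, body):
    """Build a multipart message with a plain-text body part."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain", "utf-8"))  # more parts (attachments) can be attached too
    return msg


def send_mail(msg, sender, password):
    """Log in over SSL and send; recipients are read from the To header."""
    with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT) as server:
        server.login(sender, password)
        server.send_message(msg)


if __name__ == "__main__":
    me = "me@example.com"  # hypothetical address
    message = build_message(me, ["you@example.com"], "Daily scrape", "Done!")
    send_mail(message, me, "app-password-here")  # hypothetical credential
```

MIMEMultipart is used instead of a bare MIMEText so that files (for example a CSV of scraped data) can be attached to the same message later.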
-
gevent: a coroutine-based Python networking library; Queue(): a first-in, first-out data structure for storing data and taking it out again, e.g. URLs waiting to be crawled (from gevent.queue import Queue)
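A sketch of the usual gevent crawling pattern: put every URL into a `Queue`, then spawn a few coroutines that each take URLs out until the queue is empty. It assumes `gevent` is installed; `crawl` and `fetch_status` are hypothetical helpers, and the catalogue page URLs are assumptions about the practice site's paging.

```python
from gevent import monkey

monkey.patch_all()  # make blocking socket calls (urllib) cooperative; do this first

from urllib.request import urlopen

import gevent
from gevent.queue import Queue


def fetch_status(url):
    """Download one page and return its HTTP status code."""
    with urlopen(url, timeout=10) as resp:
        return resp.status


def crawl(urls, fetch=fetch_status, n_workers=2):
    """Run n_workers coroutines that share one queue of URLs."""
    work = Queue()
    for url in urls:
        work.put_nowait(url)  # save every task in the queue first
    results = {}

    def worker():
        # No blocking call between empty() and get_nowait(), so no other
        # coroutine can switch in and empty the queue in between.
        while not work.empty():
            url = work.get_nowait()  # extract the next task
            results[url] = fetch(url)  # network I/O here lets other workers run

    gevent.joinall([gevent.spawn(worker) for _ in range(n_workers)])
    return results


if __name__ == "__main__":
    pages = [f"http://books.toscrape.com/catalogue/page-{i}.html" for i in range(1, 6)]
    print(crawl(pages))
```

The queue decouples "what to crawl" from "how many crawlers": changing `n_workers` changes concurrency without touching the task list.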
-
Scrapy: the Scrapy Engine coordinates 1. Scheduler: queues requests, 2. Downloader: gets the data (pages), 3. Spiders: extract the useful data, 4. Item Pipeline: saves the data