
hz2657/web-scraping-is-fun


web-scraping

Practice sites:

  1. Bookstore: http://books.toscrape.com/
  2. QQ Music: song lyrics and user reviews
  3. Douban: movie information

methods

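  • BeautifulSoup: parse the HTML of the first request listed in the browser's 'Network' tab (the page's initial HTML document); data loaded later by JavaScript does not appear there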

  • Selenium: automate a real browser, so everything shown in the 'Elements' tab (the rendered page, including content loaded by JavaScript) can be read

  • smtplib, email, MIMEMultipart: use Python to send email automatically

  • gevent: a coroutine-based Python networking library; Queue (from gevent.queue import Queue): a data structure for storing and retrieving tasks, such as URLs to crawl, shared between coroutines

  • scrapy: the Scrapy Engine coordinates four components: 1. Scheduler: queues requests, 2. Downloader: fetches data, 3. Spiders: extract useful data, 4. Item Pipeline: saves data
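The BeautifulSoup approach can be sketched against practice site 1. This is a minimal sketch, assuming each book on the listing page is an `<article class="product_pod">` with the full title in the link's `title` attribute and the price in `<p class="price_color">`. The helper names `fetch` and `parse_books` are hypothetical, and the download uses the standard-library `urllib.request` rather than any particular HTTP package.

```python
import urllib.request

from bs4 import BeautifulSoup


def fetch(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_books(html):
    """Return (title, price) pairs for each book card on a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.find_all("article", class_="product_pod"):
        # The visible link text is truncated, so read the full title attribute.
        title = card.h3.a["title"]
        price = card.find("p", class_="price_color").get_text(strip=True)
        books.append((title, price))
    return books
```

Usage: `parse_books(fetch("http://books.toscrape.com/"))` downloads the first listing page and returns its (title, price) pairs. Only the initial HTML document is parsed, which is why this works for static pages but not for content injected by JavaScript.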
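Sending scraped results by email with `smtplib` and `email.mime` might look like the sketch below. The helpers `build_message` and `send_email` are hypothetical names; the SMTP host, port, and credentials are placeholders that depend on the mail provider (many providers require an app-specific password or authorization code instead of the account password), and `SMTP_SSL` assumes the server accepts implicit TLS, typically on port 465.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, recipient, subject, body):
    """Assemble a multipart email whose single part is a UTF-8 plain-text body."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Further parts (e.g. a CSV attachment) could be attached the same way.
    msg.attach(MIMEText(body, "plain", "utf-8"))
    return msg


def send_email(msg, host, port, user, password):
    """Log in over SSL and send msg; all connection details are placeholders."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes it possible to inspect the message without contacting a mail server.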
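The gevent + Queue pattern can be sketched as follows: URLs go into a `gevent.queue.Queue`, and a few coroutines started with `gevent.spawn` take URLs off it until it is empty. `monkey.patch_all()` makes blocking standard-library sockets cooperative, so real downloads overlap instead of running one after another. The `crawl` helper and its injected `fetch` callable are hypothetical names for illustration.

```python
from gevent import monkey

monkey.patch_all()  # patch sockets before anything else uses them

import gevent
from gevent.queue import Queue


def crawl(urls, fetch, n_workers=2):
    """Fetch every URL with n_workers coroutines; return {url: result}."""
    work = Queue()
    for url in urls:
        work.put_nowait(url)
    results = {}

    def worker():
        # No coroutine switch happens between empty() and get_nowait(),
        # so another worker cannot take the item in between.
        while not work.empty():
            url = work.get_nowait()
            results[url] = fetch(url)  # a network call here yields to other workers

    gevent.joinall([gevent.spawn(worker) for _ in range(n_workers)])
    return results
```

For real pages, `fetch` could be a function that calls `urllib.request.urlopen(url, timeout=10).read()`; because of the monkey patching, each coroutine waiting on the network lets the others proceed.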
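Of the four Scrapy components, the Item Pipeline is written by the project as a plain class. A minimal sketch, assuming items carry `title` and `price` fields (hypothetical field names) and that they should be saved to a CSV file: Scrapy calls `open_spider` once when the spider starts, `process_item` for each item a spider yields, and `close_spider` at the end. The class name `CsvWriterPipeline` and the default path are hypothetical.

```python
import csv


class CsvWriterPipeline:
    """Item pipeline that writes every scraped item as a row of a CSV file."""

    def __init__(self, path="items.csv", fieldnames=("title", "price")):
        self.path = path
        self.fieldnames = list(fieldnames)
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Called once when the spider opens: create the file and header row.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(
            self.file, fieldnames=self.fieldnames, extrasaction="ignore"
        )
        self.writer.writeheader()

    def process_item(self, item, spider):
        # dict() accepts both plain dicts and scrapy.Item objects.
        self.writer.writerow(dict(item))
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Called once when the spider closes: flush and release the file.
        self.file.close()
```

To enable it, register the class in the project's `settings.py`, e.g. `ITEM_PIPELINES = {"myproject.pipelines.CsvWriterPipeline": 300}`, where `myproject` is a placeholder and the number sets the order among pipelines (lower values run first).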
