Scrapes the douban movie website for a specified region, extracts the information for each film currently showing, and sorts the films by score in descending order.
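The final sorting step can be sketched as follows. This is only an illustration: the record layout and the field names `title` and `score` are assumptions, not taken from the spider's actual code. Scores are treated as scraped text, so unrated films (empty score) are pushed to the end.

```python
# Hypothetical records; the field names "title" and "score" are assumptions.
films = [
    {"title": "Film A", "score": "7.9"},
    {"title": "Film B", "score": "9.1"},
    {"title": "Film C", "score": ""},  # not rated yet
]


def score_key(film):
    """Turn the scraped score text into a float; unrated films sort last."""
    try:
        return float(film["score"])
    except (KeyError, ValueError):
        return float("-inf")


# Highest score first.
ranked = sorted(films, key=score_key, reverse=True)
for film in ranked:
    print(film["title"], film["score"] or "n/a")
```

Converting to `float` before sorting matters because comparing the raw strings would rank "10.0" below "9.1".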
To execute:
Set the url in main_2.py (e.g. url="").
urllib2 ships with Python 2; install bs4 and chardet if you don't have them.
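As a rough idea of where chardet fits in, here is a minimal decoding sketch. It is written for Python 3 (the scripts themselves target Python 2), `decode_page` is a hypothetical helper name, and both the "try UTF-8 first" order and the gb18030 fallback are assumptions rather than the spider's actual behavior.

```python
def decode_page(raw):
    """Decode raw response bytes to text (hypothetical helper)."""
    # Try strict UTF-8 first; it fails loudly on most other encodings.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Otherwise ask chardet for a guess, if it is installed.
    try:
        import chardet  # third-party, optional in this sketch
        guess = chardet.detect(raw).get("encoding")
    except ImportError:
        guess = None
    # Assumed fallback: gb18030, a superset of the GB2312/GBK encodings.
    try:
        return raw.decode(guess or "gb18030", errors="replace")
    except LookupError:
        return raw.decode("gb18030", errors="replace")


print(decode_page("豆瓣电影".encode("utf-8")))
```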
Scrapes the dytt8 movie website for a specified region, extracts the information for each film currently showing, and sorts the films by score in descending order.
To execute:
Set the url in main_2.py (e.g. url="").
urllib2 ships with Python 2; install bs4 and chardet if you don't have them.
Set the url in main_2.py (e.g. url="").
main_2.py and spider_2.py are written for Python 2.x; spider_3.py is for Python 3.x (to be updated). Libraries used: urllib2, bs4, chardet. Install any you are missing.
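##Remark
dytt8 (电影天堂) uses a snippet of JavaScript as an anti-scraping measure, so the spider has to pick the JS function out of the page with a regular expression and then process it in Python.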
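The approach described in this remark could look roughly like the sketch below. The page snippet is entirely hypothetical (the real script served by the site is not reproduced in this README), and the assumption that the function hides a path behind `String.fromCharCode` is only for illustration.

```python
import re

# Hypothetical page snippet; the site's real anti-scraping script may differ.
html = """
<html><head><script>
function go(){ window.location = String.fromCharCode(47,105,110,100,101,120,46,104,116,109,108); }
go();
</script></head></html>
"""

# 1. Pick the JS function (name and body) out of the page with a regex.
func = re.search(r"function\s+(\w+)\s*\(\)\s*\{(.*?)\}", html, re.S)
name, body = func.group(1), func.group(2)

# 2. Redo in Python what the JS would do: decode the character codes.
codes = re.search(r"String\.fromCharCode\(([\d,\s]+)\)", body).group(1)
target = "".join(chr(int(c)) for c in codes.split(","))

print(name, target)
```

The non-greedy `(.*?)` stops at the first closing brace, so this pattern only suits functions without nested blocks; a longer script would need a more careful match.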
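WeChat official accounts (微信公众号) can now be searched on Sogou, so scraping them is no longer difficult. Here the spider collects the image links of the official account 碉堡. The finished demo is hosted on Coding.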
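Collecting image links from an article page might be sketched like this. The example uses the standard-library `html.parser` to stay dependency-free, whereas the scripts here use bs4; the sample markup, the `ImgLinkParser` class name, and the preference for a lazy-loading `data-src` attribute over `src` are all assumptions.

```python
from html.parser import HTMLParser


class ImgLinkParser(HTMLParser):
    """Collect the link of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Assumption: lazily loaded images keep the real link in data-src.
        src = attrs.get("data-src") or attrs.get("src")
        if src:
            self.links.append(src)


# Hypothetical article fragment.
sample = (
    '<div><img data-src="http://example.com/a.jpg">'
    '<img src="http://example.com/b.png"><p>text</p></div>'
)
parser = ImgLinkParser()
parser.feed(sample)
print(parser.links)
```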
##TODO
douban_movie and dytt8_movie for Python 3.x