A distributed Sina Weibo search spider based on Scrapy and Redis.
tpeng [email protected]
- put your keywords in items.txt
- scrapy crawl weibosearch -a username=your_weibo_account:your_weibo_pw
- add another spider with scrapy crawl weibosearch -a username=another_weibo_account:another_weibo_pw
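
Scrapy hands every `-a name=value` option to the spider's constructor as a keyword argument, so `username` arrives as one `account:password` string. The block below is a minimal sketch of how such a value and a keyword file could be parsed. The helper names `split_credentials` and `load_keywords` are hypothetical, and the one-keyword-per-line layout of items.txt is an assumption, not something this README confirms.

```python
def split_credentials(username_arg):
    """Split an 'account:password' string at the first colon only,
    so that passwords containing ':' survive intact."""
    account, sep, password = username_arg.partition(":")
    if not sep or not account or not password:
        raise ValueError("expected a value of the form account:password")
    return account, password


def load_keywords(lines):
    """Return stripped, non-empty keywords in their original order,
    dropping repeated entries (assumes one keyword per line)."""
    seen = set()
    keywords = []
    for line in lines:
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords
```

For example, `split_credentials("your_weibo_account:your_weibo_pw")` gives the two parts as a tuple, and `load_keywords(open("items.txt"))` would read the keyword file under the assumption stated above.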
- apt-get install python-setuptools
- apt-get install python-dev
- apt-get install libxml2-dev libxslt-dev
- apt-get install mysql-server redis-server
- install mingw32
- add mingw32 bin to the path (e.g. c:\mingw32\bin)
- create distutils.cfg under PYTHON\lib\distutils and add

      [build]
      compiler=mingw32

- remove -mno-cygwin from PYTHON\lib\distutils\cygwinccompiler.py
- install lxml
- easy_install pyquery
- easy_install scrapy
- sudo apt-get install python-mysqldb
- easy_install redis (redis-py: https://github.com/andymccurdy/redis-py)
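
After working through the install steps above, it can help to confirm that the Python packages actually import before starting a crawl. The following is a small sketch of such a check, not part of the project itself. The helper name `missing_modules` is hypothetical, and the module list assumes the usual import names for these packages (`MySQLdb` is the module installed by python-mysqldb).

```python
import importlib

# Import names assumed for the packages installed above.
REQUIRED_MODULES = ["scrapy", "redis", "pyquery", "lxml", "MySQLdb"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when everything is installed; any names it returns point at the install step that needs another look.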