update
lanbing510 committed May 20, 2015
1 parent 1d788de commit 38a651c
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -6,11 +6,13 @@ A Douban Books crawler written in Python, to help everyone hunt down all kinds of lovely books

 2 Stores books in order of rating rank

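-3 Stores each topic in a separate Excel sheet, which also makes filtering and searching easy, e.g. filtering for good books with more than 2000 ratings
+3 Stores results in Excel for easy filtering and searching, e.g. filtering for highly rated books with more than 1000 ratings; different topics can be stored in separate Excel sheets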

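-4 Added a User Agent to mimic browser behavior while crawling, to prevent IP blocks such as 403 Forbidden (updated 2015-5-20)
+4 Uses a User Agent to pose as a browser while crawling, and adds random delays to better mimic browser behavior and avoid the crawler being blocked (updated 2015-5-20)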
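
The approach in item 4 (a browser-like User Agent plus random delays) might look roughly like the sketch below. This is not the repository's code: it assumes Python 3 and the standard library only, and the names `USER_AGENTS`, `build_request`, `random_delay`, and `fetch`, the header strings, and the 1 to 3 second delay range are all hypothetical choices made for illustration.

```python
import random
import time
import urllib.request

# Hypothetical pool of browser-like User-Agent strings (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.5.17 "
    "(KHTML, like Gecko) Version/8.0.5 Safari/600.5.17",
]


def build_request(url):
    """Return a Request carrying a randomly chosen browser-like User-Agent."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


def random_delay(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Pause for a random interval and return the delay that was used.

    The `sleep` parameter exists only so the pause can be replaced in tests.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def fetch(url):
    """Wait a random interval, then fetch the page body as bytes."""
    random_delay()
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return response.read()
```

Calling `fetch` once per page keeps requests spaced out irregularly; the delay bounds would need tuning against whatever rate the target site tolerates.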

 A small trial run crawled 70,000 to 80,000 books; the results are in book_list.xlsx, and a screenshot follows:

![Aaron Swartz](https://github.com/lanbing510/DouBanSpider/raw/master/screenshots/douban.jpg)
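
The per-topic grouping, rating order, and rating-count filter described in items 2 and 3 can be approximated in plain Python. This is a minimal sketch under assumptions, not the README's implementation: the record fields `title`, `topic`, `rating`, and `num_ratings`, the function `group_by_topic`, and the sample rows are hypothetical, and writing the groups out to Excel sheets is deliberately left out.

```python
from collections import defaultdict


def group_by_topic(books, min_ratings=1000):
    """Group books by topic, one list per would-be sheet.

    Keeps only books with more than `min_ratings` ratings and sorts each
    group by rating, highest first.
    """
    sheets = defaultdict(list)
    for book in books:
        if book["num_ratings"] > min_ratings:
            sheets[book["topic"]].append(book)
    for rows in sheets.values():
        rows.sort(key=lambda b: b["rating"], reverse=True)
    return dict(sheets)


# Hypothetical sample rows, for illustration only.
SAMPLE_BOOKS = [
    {"title": "A", "topic": "fiction", "rating": 8.5, "num_ratings": 3000},
    {"title": "B", "topic": "fiction", "rating": 9.1, "num_ratings": 1500},
    {"title": "C", "topic": "fiction", "rating": 9.5, "num_ratings": 200},
    {"title": "D", "topic": "history", "rating": 8.0, "num_ratings": 1001},
]

sheets = group_by_topic(SAMPLE_BOOKS)
```

Each key of the returned dictionary corresponds to one sheet name, so a spreadsheet library could write one sheet per key in a follow-up step.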


 The code was written in just an hour; more features are still to be added