readme
xtuhcy committed Feb 2, 2016
1 parent 6c2a6f5 commit 564b8dc
Showing 2 changed files with 11 additions and 0 deletions.
8 changes: 8 additions & 0 deletions README.md
@@ -1,4 +1,12 @@
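# [GECCO](https://github.com/xtuhcy/gecco) (an easy-to-use, lightweight web crawler)
## Main features
1. Simple to use: elements are extracted with jQuery-style CSS selectors
2. Supports asynchronous ajax requests within a page
3. Supports extracting javascript variables from a page
4. Uses Redis for distributed crawling
5. Supports random selection of the UserAgent when downloading
6. Supports random selection of the download proxy server
7. Supports developing business logic with Spring
#### Motivation
>Application development today can hardly do without crawlers. The information on the web is vast, and making use of it is a technique every application now needs to master. Among the existing crawler software I have looked at, [scrapy](https://github.com/scrapy/scrapy), a crawler framework written in Python, has been fairly widely adopted. The design and architecture of gecco draw some inspiration from scrapy and, combined with the characteristics of the Java language, led to the framework shown below. Ease of use is the primary goal of gecco: with some basic Java development experience and the ability to write jQuery selectors, you can easily configure a crawler.
## Architecture diagram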
@@ -132,6 +132,9 @@ private void pipelineContext(SpiderBeanContext context, String[] pipelineNames)
        if(pipelineNames != null && pipelineNames.length > 0) {
            List<Pipeline> pipelines = new ArrayList<Pipeline>();
            for(String pipelineName : pipelineNames) {
                if(StringUtils.isEmpty(pipelineName)) {
                    continue;
                }
                Pipeline pipeline = pipelineFactory.getPipeline(pipelineName);
                if(pipeline != null) {
                    pipelines.add(pipeline);
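
The hunk above skips null or empty pipeline names before asking the factory for a pipeline, presumably so that a blank entry never reaches the lookup. The sketch below isolates that pattern in a self-contained form. The `Pipeline` interface, the `Map`-based registry and the `resolve` helper are hypothetical stand-ins for illustration, not the project's real `Pipeline` or `pipelineFactory`, and the plain `name == null || name.isEmpty()` test stands in for `StringUtils.isEmpty`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PipelineNameFilter {

    /** Hypothetical stand-in for the project's Pipeline type. */
    interface Pipeline {
        void process(String bean);
    }

    /**
     * Resolves pipeline names against a registry, skipping null or empty
     * names and names that have no registered pipeline.
     * The registry map is an assumption replacing the real factory.
     */
    static List<Pipeline> resolve(String[] pipelineNames, Map<String, Pipeline> registry) {
        List<Pipeline> pipelines = new ArrayList<>();
        if (pipelineNames == null || pipelineNames.length == 0) {
            return pipelines;
        }
        for (String name : pipelineNames) {
            // Same guard as the diff: ignore blank entries up front.
            if (name == null || name.isEmpty()) {
                continue;
            }
            Pipeline pipeline = registry.get(name);
            if (pipeline != null) {
                pipelines.add(pipeline);
            }
        }
        return pipelines;
    }

    public static void main(String[] args) {
        Map<String, Pipeline> registry = new HashMap<>();
        registry.put("consolePipeline", bean -> System.out.println(bean));

        // One valid name, one empty, one null, one unregistered.
        String[] names = {"consolePipeline", "", null, "unknownPipeline"};
        List<Pipeline> resolved = resolve(names, registry);
        System.out.println(resolved.size()); // prints 1
    }
}
```

Checking for the empty name before the lookup keeps the guard independent of how the factory behaves on blank input, which matters if the factory is backed by a container that rejects such names.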
