Redis is the default storage backend; other backends can be added by implementing the Store interface
- Docker
- Regular deployment
First, make sure the following are installed:
- Docker
- Docker-Compose
Then run the following in the project root:
docker-compose up -d
First, set up a working Go toolchain and a Redis instance
Then run the following in the project root:
go build
./proxygool
Or:
go run main.go
- Redis client
- Web url
redis-cli
- List all crawled proxies
SMEMBERS proxypool
- View detailed information about the proxies
HGETALL proxyinfo
- Get a random proxy
curl localhost:8888
- Get a random HTTPS proxy
curl localhost:8888/https
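The same endpoints can be called from Go. The sketch below assumes the service is reachable at http://localhost:8888 and that the response body carries the proxy as plain text; the response format is not specified here, so treat that as an assumption. `endpointURL` and `fetchProxy` are illustrative names, not part of the project.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// endpointURL builds the URL for a random proxy; httpsOnly selects the /https route.
func endpointURL(base string, httpsOnly bool) string {
	base = strings.TrimRight(base, "/")
	if httpsOnly {
		return base + "/https"
	}
	return base
}

// fetchProxy requests one proxy and returns the trimmed response body.
// Assumption: the body is plain text; adjust the parsing if the service returns something else.
func fetchProxy(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	proxy, err := fetchProxy(client, endpointURL("http://localhost:8888", true))
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("proxy:", proxy)
}
```

Setting a client timeout keeps a stalled service from blocking the caller indefinitely.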
The program periodically runs a heartbeat check and removes proxies that are no longer valid
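One way such a check could work is sketched below. This is not the project's implementation: it assumes proxies are stored as host:port strings and treats a successful TCP connection as a sign of life. `tcpAlive`, `pruneDead`, and `runHeartbeat` are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// tcpAlive reports whether a TCP connection to addr (host:port) succeeds within timeout.
func tcpAlive(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// pruneDead returns only the proxies for which alive reports true.
func pruneDead(proxies []string, alive func(string) bool) []string {
	kept := make([]string, 0, len(proxies))
	for _, p := range proxies {
		if alive(p) {
			kept = append(kept, p)
		}
	}
	return kept
}

// runHeartbeat prunes the list once per interval, for the given number of rounds.
func runHeartbeat(proxies []string, alive func(string) bool, interval time.Duration, rounds int) []string {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < rounds; i++ {
		<-ticker.C
		proxies = pruneDead(proxies, alive)
	}
	return proxies
}

func main() {
	// Example addresses only; a real pool would load them from the store.
	proxies := []string{"127.0.0.1:8888", "127.0.0.1:1"}
	check := func(addr string) bool { return tcpAlive(addr, 2*time.Second) }
	proxies = runHeartbeat(proxies, check, time.Second, 2)
	fmt.Println("proxies still alive:", proxies)
}
```

Passing the liveness check as a function keeps the pruning logic independent of how a proxy is actually probed, so a stricter check (for example, a real HTTP request through the proxy) could be swapped in.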
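- For a regular deployment, set docker to false
- pages sets how many pages to crawl from each proxy site
- fetch.proxy controls whether the crawler fetches pages through the proxy pool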
docker: true
logger:
  filename: proxygool.log
  level: 4
server:
  host: 0.0.0.0
  port: 8888
redis:
  network: tcp
  host: 127.0.0.1
  port: 6379
  password:
  MaxIdle: 100
  MaxActive: 0
  IdleTimeout: 5m
  testFrequency: 1m
  Wait: true
  proxyPool: proxypool
  proxyInfo: proxyinfo
fetch:
  proxy: true
xicidaili:
  pages: 5
kuaidaili:
  pages: 5
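Values such as IdleTimeout: 5m and testFrequency: 1m look like Go duration strings. Assuming they are parsed that way (the loading code is not shown here), the sketch below shows how they resolve and how each host and port pair combines into an address. The helper name `addr` is illustrative.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// addr joins a host and a numeric port into a host:port string.
func addr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	// Duration strings taken from the config above.
	for _, raw := range []string{"5m", "1m"} {
		d, err := time.ParseDuration(raw)
		if err != nil {
			fmt.Println("invalid duration:", raw, err)
			continue
		}
		fmt.Printf("%s parses to %v seconds\n", raw, d.Seconds())
	}

	// Address the server listens on and address of the Redis instance.
	fmt.Println("server:", addr("0.0.0.0", 8888))
	fmt.Println("redis:", addr("127.0.0.1", 6379))
}
```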
- Beyond the proxy sites already implemented, a new proxy source can be added in about a minute
func XXX() *model.Request {
	req := model.NewRequest()
	req.WebName = "xxx"
	req.WebURL = "http://www.xxx.cn/index_"
	req.TrRegular = ".table tbody tr"
	// Number of pages to crawl, read from the xxx.pages config key.
	req.Pages = viper.GetInt("xxx.pages")
	req.HostIndex = 0
	req.PortIndex = 1
	req.ProtIndex = 3
	req.Trim = true
	// Map the site's protocol text to a scheme: "no" means http, anything else https.
	req.Protocol = func(s string) string {
		if s == "no" {
			return "http"
		}
		return "https"
	}
	return req
}
- Add the new function under spider/site and register it in spider/run.go
requests = []*model.Request{
	site.Xici(),
	site.Kuai(),
	site.IP3366(),
	site.Qiyun(),
	//site.PLP(),
	//site.PLPSSL(),
	site.IP66(),
	site.IP89(),
	site.XXX(),
}
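To make the role of the index fields more concrete, here is a minimal sketch of how a crawler might turn one table row into a proxy address. It is an assumption about how such fields are typically used, not the project's code: the `rowSpec` type and `buildProxy` function are hypothetical and only mirror the field names of the XXX() request.

```go
package main

import (
	"fmt"
	"strings"
)

// rowSpec is a hypothetical stand-in for the index-related fields of a request.
type rowSpec struct {
	HostIndex int
	PortIndex int
	ProtIndex int
	Trim      bool
	Protocol  func(string) string
}

// buildProxy assembles "scheme://host:port" from the cells of one table row.
func buildProxy(cells []string, spec rowSpec) (string, error) {
	for _, i := range []int{spec.HostIndex, spec.PortIndex, spec.ProtIndex} {
		if i < 0 || i >= len(cells) {
			return "", fmt.Errorf("column %d out of range for a row with %d cells", i, len(cells))
		}
	}
	// pick returns the cell at index i, trimmed when the spec asks for it.
	pick := func(i int) string {
		if spec.Trim {
			return strings.TrimSpace(cells[i])
		}
		return cells[i]
	}
	scheme := spec.Protocol(pick(spec.ProtIndex))
	return scheme + "://" + pick(spec.HostIndex) + ":" + pick(spec.PortIndex), nil
}

func main() {
	spec := rowSpec{
		HostIndex: 0, PortIndex: 1, ProtIndex: 3, Trim: true,
		Protocol: func(s string) string {
			if s == "no" {
				return "http"
			}
			return "https"
		},
	}
	// A made-up row: host, port, an unused column, and the protocol column.
	row := []string{" 1.2.3.4 ", "8080", "anonymous", "no"}
	proxy, err := buildProxy(row, spec)
	if err != nil {
		fmt.Println("skip row:", err)
		return
	}
	fmt.Println(proxy)
}
```

Checking the indexes before use lets malformed rows be skipped instead of crashing the crawl.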
- Implement the store.Store interface
- Call store.SetCustomStore(s Store)