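HTTP Saber (a.k.a. the "Ahoge King"), from the developer-friendly Swoole component library
A high-performance PHP HTTP client built on Swoole's native coroutines. It supports several programming styles and provides high-performance solutions under the hood, so developers can focus on building features, freed from traditional cURL, which is synchronous, blocking, and tedious to configure.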
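- Built on Swoole's coroutine Client
- Developer-friendly style, a boon for ajax.js/axios.js/requests.py users, with PSR-style operation also supported
- Browser-grade, complete cookie management, well suited to crawlers and API proxy applications
- Request/response/exception interceptors
- Concurrent requests, concurrent redirect optimization, automatic keep-alive connection reuse
- Automatic encoding conversion of response bodies
- HTTPS connections with automatic CA certificate support
- HTTP/SOCKS5 proxy support
- Redirect control, automatic keep-alive connection reuse
- Automatic request data encoding and response data parsing
- Millisecond-precision timeout timer
- Very large file uploads with resumable uploads
- WebSocket connections
- Random User-Agent generator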
The recommended way to install is via the Composer package manager:
composer require swlib/saber:dev-master
- PHP7 or later
- Swoole 2.1.2 or later
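Swoole schedules coroutines under the hood, transparently to your business code: you write ordinary synchronous-style code and still get asynchronous I/O and very high performance, without the scattered logic and unmaintainable nested callbacks of traditional asynchronous programming.
Call Saber inside event callbacks such as onRequest, onReceive, and onConnect, or wrap the call with go (swoole.use_shortname, which provides the go short name, is enabled by default).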
require __DIR__ . '/vendor/autoload.php';
use Swlib\Saber;

go(function () {
echo Saber::get('http://httpbin.org/get');
});
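Automatic data packing: the data you pass in is converted automatically into the format given by content-type. The default is x-www-form-urlencoded; json and other formats are also supported.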
Saber::get('http://httpbin.org/get');
Saber::delete('http://httpbin.org/delete');
Saber::post('http://httpbin.org/post', ['foo' => 'bar']);
Saber::put('http://httpbin.org/put', ['foo' => 'bar']);
Saber::patch('http://httpbin.org/patch', ['foo' => 'bar']);
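To make the packing rule concrete, here is a minimal sketch in plain PHP of what encoding a data array by content type could look like. encode_body is a hypothetical helper, not part of Saber, and it only handles the two formats named above; Saber's own encoder may differ in details.

```php
<?php
// Hypothetical helper, not Saber's API: encodes an array body according to
// a content type, mirroring the packing rule described above.
function encode_body(array $data, string $contentType = 'application/x-www-form-urlencoded'): string
{
    // Ignore parameters such as "; charset=utf-8" when choosing the encoder.
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    switch ($mime) {
        case 'application/json':
            return json_encode($data);
        case 'application/x-www-form-urlencoded':
            return http_build_query($data);
        default:
            throw new InvalidArgumentException("No encoder for $mime");
    }
}

echo encode_body(['foo' => 'bar', 'n' => 1]), "\n";                     // foo=bar&n=1
echo encode_body(['foo' => 'bar', 'n' => 1], 'application/json'), "\n"; // {"foo":"bar","n":1}
```

Keeping the content type as the single switch point means callers pass the same PHP array whichever wire format is chosen.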
A reusable instance, suitable for API proxy services:
use Swlib\Http\ContentType;
use Swlib\Saber;

$saber = Saber::create([
'base_uri' => 'http://httpbin.org',
'headers' => [
'Accept-Language' => 'en,zh-CN;q=0.9,zh;q=0.8',
'Content-Type' => ContentType::JSON,
'DNT' => '1',
'User-Agent' => null
]
]);
echo $saber->get('/get');
echo $saber->delete('/delete');
echo $saber->post('/post', ['foo' => 'bar']);
echo $saber->patch('/patch', ['foo' => 'bar']);
echo $saber->put('/put', ['foo' => 'bar']);
A session saves cookies automatically, and its cookie handling is as complete as a browser's:
$session = Saber::session([
'base_uri' => 'http://httpbin.org',
'redirect' => 0
]);
$session->get('/cookies/set?foo=bar&k=v&apple=banana');
$session->get('/cookies/delete?k');
echo $session->get('/cookies')->body;
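Note: this uses the concurrent-redirect optimization, so when several requests redirect, the redirects are still followed concurrently and never degrade into queued one-at-a-time requests.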
$responses = Saber::requests([
['uri' => 'http://github.com/'],
['uri' => 'http://github.com/'],
['uri' => 'https://github.com/']
]);
echo "multi-requests [ {$responses->success_num} ok, {$responses->error_num} error ]:\n" ."consuming-time: {$responses->time}s\n";
// multi-requests [ 3 ok, 0 error ]:
// consuming-time: 0.79090881347656s
// with the alias mechanism you can omit option names: index 0 is method, index 1 is uri
$saber = Saber::create(['base_uri' => 'http://httpbin.org']);
echo $saber->requests([
['get','/get'],
['post','/post'],
['patch','/patch'],
['put','/put'],
['delete','/delete']
]);
Quick parsing is currently supported for four data formats: json, xml, html, and url-query.
[$json, $xml, $html] = Saber::list([
'uri' => [
'http://httpbin.org/get',
'http://www.w3school.com.cn/example/xmle/note.xml',
'http://httpbin.org/html'
]
]);
var_dump($json->getParsedJson());
var_dump($json->getParsedJsonObject());
var_dump($xml->getParsedXml());
var_dump($html->getParsedHtml()->getElementsByTagName('h1')->item(0)->textContent);
HTTP and SOCKS5 proxies are supported:
$uri = 'http://myip.ipip.net/';
echo Saber::get($uri, ['proxy' => 'http://127.0.0.1:1087'])->body;
echo Saber::get($uri, ['proxy' => 'socks5://127.0.0.1:1086'])->body;
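Coroutines are scheduled automatically under the hood, so very large files can be sent asynchronously, and interrupted uploads can be resumed.
Upload three files at once, using all three parameter styles (string | array | object):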
$file1 = __DIR__ . '/black.png';
$file2 = [
'path' => __DIR__ . '/black.png',
'name' => 'white.png',
'type' => ContentType::$Map['png'],
'offset' => null, //re-upload from break
'size' => null //upload a part of the file
];
$file3 = new SwUploadFile(
__DIR__ . '/black.png',
'white.png',
ContentType::$Map['png']
);
echo Saber::post('http://httpbin.org/post', null, [
'files' => [
'image1' => $file1,
'image2' => $file2,
'image3' => $file3
]
]
);
$response = Saber::psr()
->withMethod('POST')
->withUri(new Uri('http://httpbin.org/post?foo=bar'))
->withQueryParams(['foo' => 'option is higher-level than uri'])
->withHeader('content-type', ContentType::JSON)
->withBody(new BufferStream(json_encode(['foo' => 'bar'])))
->exec()->recv();
echo $response->getBody();
A WebSocket frame's __toString method lets you print the received data directly as a string:
$websocket = Saber::websocket('ws://127.0.0.1:9999');
while (true) {
echo $websocket->recv(1) . "\n";
$websocket->push("hello");
co::sleep(1);
}
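Benchmark: the test machine is a base-model MacBook Pro, and requests go to a local echo server.
6666 requests completed in 0.9 seconds with a 100% success rate.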
co::set(['max_coroutine' => 8191]);
go(function () {
$requests = [];
for ($i = 6666; $i--;) {
$requests[] = ['uri' => 'http://127.0.0.1'];
}
$res = Saber::requests($requests);
echo "use {$res->time}s\n";
echo "success: $res->success_num, error: $res->error_num";
});
// on MacOS
// use 0.91531705856323s
// success: 6666, error: 0
In real projects, requests are often configured as a list of URLs, so a list method is provided for convenience:
echo Saber::list([
'uri' => [
'http://www.qq.com/',
'https://www.baidu.com/',
'https://www.swoole.com/',
'http://httpbin.org/'
]
]);
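In real crawler projects we often need to limit the number of concurrent requests to avoid being blocked by the server's firewall. A single max_co option solves this: max_co pushes requests onto the queue in batches no larger than the limit and receives the responses for each batch.
// max_co is the maximum number of concurrent requests at a time; it helps avoid tripping server WAF limits.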
$requests = array_fill(0, 10, ['uri' => 'http://www.qq.com/']);
echo Saber::requests($requests, ['max_co' => 5])->time."\n";
echo Saber::requests($requests, ['max_co' => 1])->time."\n";
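The batching described above can be sketched without Swoole as follows. run_in_batches and the $sendBatch callback are hypothetical: the callback stands in for enqueueing one batch and receiving its responses, which Saber does internally with coroutines.

```php
<?php
// Sketch only: run_in_batches is a hypothetical helper, not Saber's API.
// It splits the request list into batches of at most $maxCo entries;
// $sendBatch stands in for "enqueue this batch and receive its responses".
function run_in_batches(array $requests, int $maxCo, callable $sendBatch): array
{
    if ($maxCo < 1) {
        throw new InvalidArgumentException('max_co must be at least 1');
    }
    $responses = [];
    foreach (array_chunk($requests, $maxCo) as $batch) {
        // Each call receives at most $maxCo requests.
        foreach ($sendBatch($batch) as $response) {
            $responses[] = $response;
        }
    }
    return $responses;
}

// Fake sender that records batch sizes instead of doing network I/O.
$requests = array_fill(0, 10, ['uri' => 'http://www.qq.com/']);
$batchSizes = [];
$responses = run_in_batches($requests, 4, function (array $batch) use (&$batchSizes) {
    $batchSizes[] = count($batch);
    return array_map(function (array $request) {
        return 'response for ' . $request['uri'];
    }, $batch);
});
echo implode(',', $batchSizes), "\n"; // 4,4,2
```

A smaller max_co trades total time for fewer simultaneous connections, which is why max_co => 1 in the example above takes the longest.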
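In the table below, the | symbol separates alternative values.
key | type | introduction | example | remark |
---|---|---|---|---|
protocol_version | string | HTTP protocol version | 1.1 | HTTP/2 is still being planned |
base_uri | string | Base URI | http://httpbin.org | Merged with uri according to RFC 3986 |
uri | string | Resource identifier | http://httpbin.org/get \| /get \| get | Both absolute and relative paths are accepted |
method | string | Request method | get \| post \| head \| patch \| put \| delete | Converted to uppercase internally |
headers | array | Request headers | ['DNT' => '1'] \| ['accept' => ['text/html', 'application/xml']] | Field names are case-insensitive, but the original casing is preserved; internally each field value is split into an array as PSR-7 requires |
cookies | array \| string | Cookies | ['foo' => 'bar'] \| 'foo=bar; foz=baz' | Converted internally to a Cookies object whose domain is set to the current uri, with complete browser-grade attributes |
useragent | string | User agent | | Defaults to Chrome on macOS |
redirect | int | Maximum number of redirects | 5 | Default 3; 0 disables redirects |
keep_alive | bool | Whether to keep the connection alive | true \| false | Default true; connections are reused automatically on redirect |
content_type | string | Content type of the request body | text/plain \| Swlib\Http\ContentType::JSON | Default application/x-www-form-urlencoded |
data | array \| string | Data to send | 'foo=bar&dog=cat' \| ['foo' => 'bar'] | Encoded automatically according to content_type |
before | callable \| array | Before-request interceptor | function(Request $request){} | See the interceptor section |
after | callable \| array | After-response interceptor | function(Response $response){} | See the interceptor section |
timeout | float | Timeout in seconds | 0.5 | Default 5 s; millisecond precision supported |
proxy | string | Proxy | http://127.0.0.1:1087 \| socks5://127.0.0.1:1087 | HTTP and SOCKS5 supported |
ssl | int | Whether to use SSL | 0 = off, 1 = on, 2 = auto | Default auto |
cafile | string | CA file | __DIR__ . '/cacert.pem' | A bundled CA file is used by default |
ssl_verify_peer | bool | Verify the server certificate | false \| true | Off by default |
ssl_allow_self_signed | bool | Allow self-signed certificates | true \| false | Allowed by default |
exception_report | int | Exception report level | HttpExceptionMask::E_ALL | All exceptions are reported by default |
exception_handle | callable \| array | Custom exception handler | function(Exception $e){} | Returning true from the handler suppresses the error |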
For convenience and fault tolerance, option keys support aliases; use the canonical names whenever possible:
key | aliases |
---|---|
method | 0 |
uri | 1 \| url |
data | 2 \| body |
base_uri | base_url |
after | callback |
content_type | content-type |
cookies | cookie |
headers | header |
redirect | follow |
form_data | query |
useragent | ua |
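A rough sketch of how such an alias table could be applied. It assumes that the canonical name wins when both a name and its alias are given; that precedence is an assumption of this sketch, not documented behaviour. normalize_options and OPTION_ALIASES are hypothetical names; only the mapping itself is copied from the table.

```php
<?php
// Hypothetical alias map copied from the table above (not Saber's internals).
const OPTION_ALIASES = [
    0 => 'method',
    1 => 'uri', 'url' => 'uri',
    2 => 'data', 'body' => 'data',
    'base_url' => 'base_uri',
    'callback' => 'after',
    'content-type' => 'content_type',
    'cookie' => 'cookies',
    'header' => 'headers',
    'follow' => 'redirect',
    'query' => 'form_data',
    'ua' => 'useragent',
];

function normalize_options(array $options): array
{
    $normalized = [];
    foreach ($options as $key => $value) {
        $canonical = OPTION_ALIASES[$key] ?? $key;
        // Assumption of this sketch: a canonical key overrides an alias,
        // and an alias never overwrites a canonical key already set.
        if (!array_key_exists($canonical, $normalized) || $canonical === $key) {
            $normalized[$canonical] = $value;
        }
    }
    return $normalized;
}

$options = normalize_options(['get', 'http://httpbin.org/get', 'ua' => 'demo']);
echo implode(', ', array_keys($options)), "\n"; // method, uri, useragent
```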
Interceptors are one of Saber's most powerful features and make many tasks easy, such as printing dev logs:
Saber::get('http://twosee.cn/', [
'before' => function (Saber\Request $request) {
$uri = $request->getUri();
echo "log: request $uri now...\n";
},
'after' => function (Saber\Response $response) {
if ($response->success) {
echo "log: success!\n";
} else {
echo "log: failed\n";
}
echo "use {$response->time}s";
}
]);
// log: request http://twosee.cn/ now...
// log: success!
// use 0.52036285400391s
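Even custom exception handlers and sessions are implemented as interceptors.
You can register multiple interceptors, and they run in registration order. To name an interceptor, wrap it in an array and give it a key; to remove it later, overwrite that key with null.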
[
'after' => [
'interceptor_new' => function(){},
'interceptor_old' => null
]
]
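The ordering and removal rules can be modeled with an ordered PHP array. This is a minimal sketch, not Saber's implementation; register_interceptors and call_interceptors are hypothetical helpers that rely on PHP arrays preserving insertion order.

```php
<?php
// Hypothetical helpers modeling the rules above: registration order is call
// order, string keys name interceptors, and null removes a named one.
function register_interceptors(array $registry, array $new): array
{
    foreach ($new as $name => $fn) {
        if (is_int($name)) {
            $registry[] = $fn;          // anonymous interceptor, appended
        } elseif ($fn === null) {
            unset($registry[$name]);    // null removes a named interceptor
        } else {
            $registry[$name] = $fn;     // add or replace a named interceptor
        }
    }
    return $registry;
}

function call_interceptors(array $registry, ...$args)
{
    foreach ($registry as $fn) {        // iterates in insertion order
        $fn(...$args);
    }
}

$log = [];
$registry = register_interceptors([], [
    'interceptor_old' => function ($msg) use (&$log) { $log[] = "old: $msg"; },
]);
$registry = register_interceptors($registry, [
    'interceptor_new' => function ($msg) use (&$log) { $log[] = "new: $msg"; },
    'interceptor_old' => null,
]);
call_interceptors($registry, 'hello');
// $log now holds only "new: hello"
```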
Saber's cookie handling is browser-grade and complete: it closely follows Chrome's implementation and its rules.
Cookies is a collection of Cookie objects, and each Cookie has the following attributes: name, value, expires, path, session, secure, httponly, and hostonly.
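The Cookies class also converts between several formats. Strings such as "foo=bar; apple=banana", headers such as "Set-Cookie: logged_in=no; domain=.github.com; path=/; expires=Tue, 06 Apr 2038 00:00:00 -0000; secure; HttpOnly", and arrays such as ['foo'=>'bar'] can all be turned into Cookie objects, and Cookie objects can be serialized back into any of these formats.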
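For the simplest of these formats, the conversion between a Cookie header string and an array can be sketched as below. parse_cookie_header and build_cookie_header are hypothetical helpers, not the Swlib\Http Cookies API, and they ignore attributes such as path or expires.

```php
<?php
// Hypothetical helpers (not the Swlib\Http API) for the
// "foo=bar; apple=banana" <=> ['foo' => 'bar', 'apple' => 'banana'] conversion.
function parse_cookie_header(string $header): array
{
    $cookies = [];
    foreach (explode(';', $header) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue; // skip empty segments and value-less flags such as "secure"
        }
        // Split on the first "=" only, so values may contain "=".
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = "$name=$value";
    }
    return implode('; ', $pairs);
}

echo build_cookie_header(parse_cookie_header('foo=bar;apple=banana')), "\n"; // foo=bar; apple=banana
```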
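Cookies also enforce domain and expiry checks without losing any information. For example, a cookie whose domain is github.com is not sent to help.github.com unless the cookie is not host-only (a domain of .github.com acts as a wildcard).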
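The domain rule can be sketched as a small predicate, loosely modeled on domain matching in RFC 6265 (section 5.1.3). cookie_matches_host is a hypothetical name, and the sketch skips details such as IP-address hosts and public-suffix checks.

```php
<?php
// Hypothetical predicate (not Saber's API): does a cookie apply to $requestHost?
// A host-only cookie matches exactly one host; a domain cookie (.github.com)
// also matches its subdomains.
function cookie_matches_host(string $cookieDomain, bool $hostOnly, string $requestHost): bool
{
    $domain = strtolower(ltrim($cookieDomain, '.'));
    $host = strtolower($requestHost);
    if ($host === $domain) {
        return true;
    }
    if ($hostOnly) {
        return false; // host-only cookies never reach subdomains
    }
    // Subdomain match: the host must end with "." followed by the domain,
    // so "evilgithub.com" does not match "github.com".
    $suffix = '.' . $domain;
    return substr($host, -strlen($suffix)) === $suffix;
}

var_dump(cookie_matches_host('github.com', true, 'help.github.com'));   // bool(false)
var_dump(cookie_matches_host('.github.com', false, 'help.github.com')); // bool(true)
```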
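For a session cookie (one with no expiry time, which expires when the browser closes), the expires attribute is set to the current time; you can use an interceptor to give it a specific time.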
By reading the raw attribute of Cookies, you can easily persist cookies to a database, which is ideal for crawlers that need to log in.
For more details, see the Swlib/Http library documentation and examples.
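Saber follows the principle of separating business logic from errors: by default, an exception is thrown whenever any stage of a request fails.
Saber's exception handling is also flexible, and as thorough as PHP's native exception handling.
The exception classes live in the Swlib\Http\Exception namespace.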
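Exception | Intro | Scenario |
---|---|---|
RequestException | Request failed | Invalid request configuration |
ConnectException | Connection failed | e.g. no network connection, DNS lookup failure, or timeout; errno equals the Linux errno, and socket_strerror converts the code into an error message |
TooManyRedirectsException | Too many redirects | The number of redirects exceeded the configured limit; the exception prints the redirect trace |
ClientException | Client error | The server returned a 4xx status code |
ServerException | Server error | The server returned a 5xx status code |
BadResponseException | Failed to get a response (unknown cause) | The server did not respond or returned an unrecognized status code |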
Besides the usual exception methods, every HTTP exception class also provides the following methods:
Method | Intro |
---|---|
getRequest | Get the request instance |
hasResponse | Whether a response was received |
getResponse | Get the response instance |
getResponseBodySummary | Get a summary of the response body |
try {
echo Saber::get('http://httpbin.org/redirect/10');
} catch (TooManyRedirectsException $e) {
var_dump($e->getCode());
var_dump($e->getMessage());
var_dump($e->hasResponse());
echo $e->getRedirectsTrace();
}
// int(302)
// string(28) "Too many redirects occurred!"
// bool(true)
#0 http://httpbin.org/redirect/10
#1 http://httpbin.org/relative-redirect/9
#2 http://httpbin.org/relative-redirect/8
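Saber can also handle exceptions gently, so that on an unstable network you are not forced to wrap every step in a try block.
Set the exception report level. The setting is global and does not affect instances that have already been created.
// report all exceptions except the too-many-redirects exception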
Saber::exceptionReport(
HttpExceptionMask::E_ALL ^ HttpExceptionMask::E_REDIRECT
);
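The following values (numeric or symbolic) build a bitmask that specifies which errors are reported. Combine them with bitwise operators, or use them to mask out specific error types. Flags and masks: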
Mask | Value | Intro |
---|---|---|
E_NONE | 0 | Ignore all exceptions |
E_REQUEST | 1 | Corresponds to RequestException |
E_CONNECT | 2 | Corresponds to ConnectException |
E_REDIRECT | 4 | Corresponds to TooManyRedirectsException |
E_BAD_RESPONSE | 8 | Corresponds to BadResponseException |
E_CLIENT | 16 | Corresponds to ClientException |
E_SERVER | 32 | Corresponds to ServerException |
E_ALL | 63 | All exceptions |
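The mask arithmetic itself is plain PHP integer math. The sketch below copies the values from the table into a hypothetical DemoMask class (so it runs without Saber installed; the real constants are on HttpExceptionMask) and checks which bits a level enables.

```php
<?php
// DemoMask is a hypothetical stand-in for HttpExceptionMask: it copies the
// values from the table above so the bit arithmetic runs without Saber.
final class DemoMask
{
    const E_NONE = 0;
    const E_REQUEST = 1;
    const E_CONNECT = 2;
    const E_REDIRECT = 4;
    const E_BAD_RESPONSE = 8;
    const E_CLIENT = 16;
    const E_SERVER = 32;
    const E_ALL = 63; // 1 | 2 | 4 | 8 | 16 | 32
}

// True when the exception type $flag is reported at level $level.
function is_reported(int $level, int $flag): bool
{
    return ($level & $flag) !== 0;
}

// Everything except too-many-redirects, as in the exceptionReport example.
$level = DemoMask::E_ALL ^ DemoMask::E_REDIRECT;       // 63 ^ 4 = 59
// Only HTTP status errors (4xx and 5xx).
$statusOnly = DemoMask::E_CLIENT | DemoMask::E_SERVER; // 16 | 32 = 48

var_dump(is_reported($level, DemoMask::E_REDIRECT));   // bool(false)
var_dump(is_reported($level, DemoMask::E_CONNECT));    // bool(true)
```

Note that ^ toggles a bit: E_ALL ^ E_REDIRECT clears the redirect bit only because E_ALL has it set. E_ALL & ~E_REDIRECT clears it regardless of the starting level.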
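This function lets you handle errors raised during HTTP requests in your own way, choosing exactly which exceptions to catch or ignore.
Note: unless the function returns TRUE (or another truthy value), the exception continues to be thrown rather than being caught by the custom function.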
Saber::exceptionHandle(function (\Exception $e) {
echo get_class($e) . " is caught!";
return true;
});
Saber::get('http://httpbin.org/redirect/10');
//output: Swlib\Http\Exception\TooManyRedirectsException is caught!
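The rule that only a truthy return value swallows the exception can be sketched as a try/catch wrapper. run_with_handler is a hypothetical helper illustrating the semantics described above, not Saber's internal code.

```php
<?php
// run_with_handler is a hypothetical helper, not Saber's code. The handler
// sees the exception first; only a truthy return value suppresses it,
// otherwise the exception keeps propagating.
function run_with_handler(callable $operation, callable $handler)
{
    try {
        return $operation();
    } catch (Exception $e) {
        if ($handler($e)) {
            return null; // handled: swallow the exception
        }
        throw $e;        // not handled: rethrow unchanged
    }
}

$result = run_with_handler(
    function () { throw new RuntimeException('Too many redirects occurred!'); },
    function (Exception $e) {
        echo get_class($e) . " is caught!\n";
        return true;
    }
);
// prints: RuntimeException is caught!
```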
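Because coroutines cannot be used inside magic methods (__call, __callStatic), every method in the source code is defined by hand.
For convenience, aliases are provided for all supported request methods.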
public static function create(array $options): Client { }
public static function session(array $options): Client { }
public static function psr(array $options): Request { }
public static function wait(array $options): Client { }
public static function request(array $options) { }
public static function requests(array $requests, array $default_options): ResponseMap { }
public static function get(string $uri, array $options) { }
public static function delete(string $uri, array $options) { }
public static function head(string $uri, array $options) { }
public static function options(string $uri, array $options) { }
public static function post(string $uri, $data, array $options) { }
public static function put(string $uri, $data, array $options) { }
public static function patch(string $uri, $data, array $options) { }
public static function default(array $options): void { }
public static function exceptionReport(int $level): void { }
public static function exceptionHandle(callable $handle): void { }
public function getExceptionReport(): int { }
public function setExceptionReport(int $level): self { }
public function isWaiting(): bool { }
public function getSSL(): int { }
public function withSSL(int $mode): self { }
public function getCAFile(): string { }
public function withCAFile(string $ca_file): self { }
public function withSSLVerifyPeer(bool $verify_peer, string $ssl_host_name): self { }
public function withSSLAllowSelfSigned(bool $allow): self { }
public function getSSLConf() { }
public function getKeepAlive() { }
public function withKeepAlive(bool $enable): self { }
public function getProxy(): array { }
public function withProxy(string $host, int $port): self { }
public function withSocks5(string $host, int $port, string $username, string $password): self { }
public function withoutProxy(): self { }
public function getTimeout(): float { }
public function withTimeout(float $timeout): self { }
public function getRedirect(): int { }
public function getName() { }
public function withName($name): self { }
public function withRedirect(int $time): self { }
public function getRedirectWait(): bool { }
public function withRedirectWait(bool $enable): self { }
public function resetClient($client) { }
public function exec() { }
public function recv() { }
public function getRequestTarget(): string { }
public function withRequestTarget($requestTarget): self { }
public function getMethod(): string { }
public function withMethod($method): self { }
public function getUri(): Psr\Http\Message\UriInterface { }
public function withUri(Psr\Http\Message\UriInterface $uri, $preserveHost): self { }
public function getCookieParams(): array { }
public function getCookieParam(string $name): string { }
public function withCookieParam(string $name, string $value): self { }
public function withCookieParams(array $cookies): self { }
public function getQueryParam(string $name): string { }
public function getQueryParams(): array { }
public function withQueryParam(string $name, string $value): self { }
public function withQueryParams(array $query): self { }
public function getParsedBody(string $name) { }
public function withParsedBody($data): self { }
public function getUploadFile(string $name): Swlib\Http\UploadFile { }
public function getUploadFiles(): array { }
public function withUploadFile(Swlib\Http\UploadFile $uploadFile): self { }
public function withUploadFiles(array $uploadFiles): self { }
public function getProtocolVersion(): string { }
public function withProtocolVersion($version): self { }
public function hasHeader($name): bool { }
public function getHeader($name): array { }
public function getHeaderLine($name): string { }
public function getHeaders(bool $implode, bool $ucwords): array { }
public function withHeader($raw_name, $value): self { }
public function withHeaders(array $headers): self { }
public function withAddedHeader($raw_name, $value): self { }
public function withoutHeader($name): self { }
public function getBody(): Swlib\Http\StreamInterface { }
public function withBody($body): self { }
public function initialization(bool $incremental) { }
public function getCookies() { }
public function setCookie(array $options): self { }
public function unsetCookie(string $name, string $path, string $domain): self { }
public function withInterceptor(string $name, array $interceptor) { }
public function withAddedInterceptor(string $name, array $functions): self { }
public function removeInterceptor(string $name): self { }
public function callInterceptor(string $name, $arguments) { }
public function getStatusCode() { }
public function withStatus($code, $reasonPhrase) { }
public function getReasonPhrase() { }
public function __toString() { }
public function getProtocolVersion(): string { }
public function withProtocolVersion($version): self { }
public function hasHeader($name): bool { }
public function getHeader($name): array { }
public function getHeaderLine($name): string { }
public function getHeaders(bool $implode, bool $ucwords): array { }
public function withHeader($raw_name, $value): self { }
public function withHeaders(array $headers): self { }
public function withAddedHeader($raw_name, $value): self { }
public function withoutHeader($name): self { }
public function getBody(): Swlib\Http\StreamInterface { }
public function withBody($body): self { }
public function initialization(bool $incremental) { }
public function getCookies() { }
public function setCookie(array $options): self { }
public function unsetCookie(string $name, string $path, string $domain): self { }
public function enqueue($request) { }
public function getMaxConcurrency(): int { }
public function withMaxConcurrency(int $num): self { }
public function recv(): Swlib\Saber\ResponseMap { }
public $time = 0.0;
public $status_map = [];
public $success_map = [];
public $success_num = 0;
public $error_num = 0;
public function offsetSet($index, $response) { }
public function __toString() { }
File Upload ✔ | WebSocket ✔ | AutoParser✔ | AutoRetry | Random-UA | Http2 |
---|---|---|---|---|---|
4 (High-priority) | 3 | 2 | 1 | .5 | .25 |
The main benefit of HTTP/2 is multiplexing many requests over a single connection, which [almost] removes the limit on the number of simultaneous requests, but there is no such limit when talking to your own backends. Moreover, using HTTP/2 to backends can even make things worse, because a single TCP connection is used instead of several. So Http2 will not be a priority. (#ref)
Add this project's source files to your IDE's Include Path (if you installed via Composer, you can include the whole vendor folder).
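Thanks to thorough doc comments, Saber works well with IDE autocompletion: type the arrow after an object to see all of its method names. The names are easy to understand, and many methods follow PSR standards or are modeled on the Guzzle project.
IDE hints for the underlying Swoole classes require swoole-ide-helper (Composer installs it by default in the dev environment); I will keep maintaining it and push the latest code to the main repository owned by eaglewu.
Issues and PRs are welcome.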