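## How to use java.net.URLConnection to fire and handle HTTP requests

First, a disclaimer: the code snippets below are all basic examples. Robust code should also handle the various exceptions that can occur, such as IOException, NullPointerException and ArrayIndexOutOfBoundsException.

### Preparation
First, you need to know the request URL and the charset (encoding). Which other parameters you need depends on what the particular URL expects.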
```java
import java.net.URLEncoder;

String url = "http://example.com";
String charset = "UTF-8"; // Or in Java 7+: java.nio.charset.StandardCharsets.UTF_8.name()
String param1 = "value1";
String param2 = "value2";
// ...
String query = String.format("param1=%s&param2=%s",
    URLEncoder.encode(param1, charset),
    URLEncoder.encode(param2, charset));
```
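The query parameters must be in `name=value` format, with the parameters joined by `&`. Normally you also [URL-encode](http://en.wikipedia.org/wiki/Percent-encoding) the parameter values with the chosen charset using [URLEncoder#encode()](http://docs.oracle.com/javase/6/docs/api/java/net/URLEncoder.html).
The `String#format()` call above is only there for convenience; it is simply my preferred way to build strings.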
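With more than a couple of parameters, the same encoding can be wrapped in a small helper. The sketch below assumes the parameters come in a `Map` (a `LinkedHashMap` keeps their order); the class `QueryStrings` and its `build()` method are hypothetical names, not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds "name1=value1&name2=value2" from a map,
// URL-encoding every name and every value with the given charset.
final class QueryStrings {
    private QueryStrings() {}

    static String build(Map<String, String> params, String charset) throws UnsupportedEncodingException {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> param : params.entrySet()) {
            // Names are encoded too, since they may also contain reserved characters.
            query.add(URLEncoder.encode(param.getKey(), charset) + "=" + URLEncoder.encode(param.getValue(), charset));
        }
        return query.toString();
    }
}
```

For example, the map `{param1=value 1, q=a&b}` becomes `param1=value+1&q=a%26b`, because URLEncoder encodes a space as `+` and `&` as `%26`.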

### Firing an [HTTP GET](http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.3) request with (optionally) query parameters
This is straightforward, since GET is the default request method:
```java
URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept-Charset", charset);
InputStream response = connection.getInputStream();
```
The query string is appended to the URL after a `?`. The [Accept-Charset](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.2) header may hint to the server which encoding the parameters are in. If you don't send any query string, you can leave out the Accept-Charset header. If you don't need to set any headers at all, you can even use the [URL#openStream()](http://docs.oracle.com/javase/6/docs/api/java/net/URL.html#openStream%28%29) shortcut instead of openConnection().
Either way, if the server side is an [HttpServlet](http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServlet.html), your GET request invokes its doGet() method, and the parameters are available via [HttpServletRequest#getParameter()](http://docs.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#getParameter%28java.lang.String%29).
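Putting the GET steps together, a minimal sketch that also reads the response body as text might look like this. It assumes the server answers with a 2xx status and sends text in the same `charset` (the response charset is discussed further below); the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper combining the GET steps above: append the query string,
// set Accept-Charset and read the response body as text in `charset`.
final class SimpleGet {
    private SimpleGet() {}

    static String get(String url, String query, String charset) throws IOException {
        String target = (query == null || query.isEmpty()) ? url : url + "?" + query;
        URLConnection connection = new URL(target).openConnection();
        connection.setRequestProperty("Accept-Charset", charset);
        // getInputStream() fires the request; no explicit connect() is needed.
        try (InputStream response = connection.getInputStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(response, charset))) {
            StringBuilder body = new StringBuilder();
            for (String line; (line = reader.readLine()) != null; ) {
                body.append(line).append('\n'); // readLine() drops terminators; re-add '\n'.
            }
            return body.toString();
        }
    }
}
```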

### Firing an [HTTP POST](http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.5) request with query parameters
Setting [URLConnection#setDoOutput()](http://docs.oracle.com/javase/6/docs/api/java/net/URLConnection.html#setDoOutput%28boolean%29) to `true` implicitly sets the request method to POST. A standard HTTP POST, as submitted by web forms, has the Content-Type `application/x-www-form-urlencoded` and carries the query string in the request body. In code:
```java
URLConnection connection = new URL(url).openConnection();
connection.setDoOutput(true); // Triggers POST.
connection.setRequestProperty("Accept-Charset", charset);
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded;charset=" + charset);

try (OutputStream output = connection.getOutputStream()) {
    output.write(query.getBytes(charset));
}

InputStream response = connection.getInputStream();
```
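
Note:
When you submit an HTML form programmatically, don't forget to also include the `name=value` pairs of any `<input type="hidden">` elements in the query string. The same applies to the `name=value` pair of the `<input type="submit">` element you want to "press", because the server side usually uses it to determine which button triggered the submit.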

You can also cast the [URLConnection](http://docs.oracle.com/javase/6/docs/api/java/net/URLConnection.html) to an [HttpURLConnection](http://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html) and call [HttpURLConnection#setRequestMethod()](http://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html#setRequestMethod%28java.lang.String%29) to set the request method to POST. If you want to write a request body, you still need to call setDoOutput(true) as well.
```java
HttpURLConnection httpConnection = (HttpURLConnection) new URL(url).openConnection();
httpConnection.setRequestMethod("POST");
httpConnection.setDoOutput(true); // Still needed before calling getOutputStream().
```
Likewise, if the server side is an [HttpServlet](http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServlet.html), its [doPost()](http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServlet.html#doPost%28javax.servlet.http.HttpServletRequest,%20javax.servlet.http.HttpServletResponse%29) method is invoked, and the POST parameters are available via [HttpServletRequest#getParameter()](http://docs.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#getParameter%28java.lang.String%29).
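A minimal sketch of the HttpURLConnection variant, posting a URL-encoded query string and reading the response body as text. It assumes a 2xx response (for error statuses `getInputStream()` throws an `IOException`) and uses `InputStream#readAllBytes()`, which needs Java 9 or later; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: sends `query` as an application/x-www-form-urlencoded POST body
// and returns the response body decoded with `charset`.
final class SimplePost {
    private SimplePost() {}

    static String post(String url, String query, String charset) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true); // Without this, getOutputStream() throws a ProtocolException.
        connection.setRequestProperty("Accept-Charset", charset);
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded;charset=" + charset);

        try (OutputStream output = connection.getOutputStream()) {
            output.write(query.getBytes(charset));
        }
        // getInputStream() fires the request and throws an IOException for 4xx/5xx responses.
        try (InputStream response = connection.getInputStream()) {
            return new String(response.readAllBytes(), charset);
        }
    }
}
```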

### Actually firing the HTTP request
You can fire the request explicitly with [URLConnection#connect()](http://docs.oracle.com/javase/6/docs/api/java/net/URLConnection.html#connect%28%29), but the request is also fired automatically as soon as you ask for any information about the HTTP response, such as the response body via [URLConnection#getInputStream()](http://docs.oracle.com/javase/6/docs/api/java/net/URLConnection.html#getInputStream%28%29). Calling connect() is therefore superfluous, which is why the examples above all call getInputStream() directly.
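The lazy behaviour can be seen in a sketch that never calls connect(). Request headers can still be set after openConnection(), because nothing has been sent yet, and asking for the status code fires the request. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class LazyFiring {
    private LazyFiring() {}

    // Hypothetical helper: returns the HTTP status without an explicit connect() call.
    static int statusOf(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        // openConnection() sends nothing, so request headers can still be configured here.
        connection.setRequestProperty("Accept-Charset", "UTF-8");
        // The first call that needs the response implicitly connects and fires the request.
        return connection.getResponseCode();
    }
}
```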

### Gathering HTTP response information
1. HTTP response status:
This assumes an [HttpURLConnection](http://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html); cast the URLConnection if necessary.
```java
int status = httpConnection.getResponseCode();
```
2. HTTP response headers:
```java
for (Map.Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
    System.out.println(header.getKey() + "=" + header.getValue());
}
```
3. HTTP response encoding:
When the Content-Type header contains a `charset` parameter, the response body is most likely text encoded in that charset, so you should decode it with the same charset on the client side.

```java
String contentType = connection.getHeaderField("Content-Type");
String responseCharset = null;

if (contentType != null) { // The header may be absent.
    for (String param : contentType.replace(" ", "").split(";")) {
        if (param.startsWith("charset=")) {
            responseCharset = param.split("=", 2)[1];
            break;
        }
    }
}

if (responseCharset != null) {
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(response, responseCharset))) {
        for (String line; (line = reader.readLine()) != null;) {
            // ... System.out.println(line) ?
        }
    }
} else {
    // It's likely binary content, use InputStream/OutputStream.
}
```
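The charset lookup above can be made a bit more defensive. Content-Type parameter names are case-insensitive and values may be quoted (for example `Charset="utf-8"`), so this sketch handles both, as well as a missing header. The class and method names are hypothetical.

```java
// Hypothetical helper: extracts the charset parameter from a Content-Type header value.
// Returns null when the header is missing or has no charset (likely binary content).
final class ContentTypes {
    private ContentTypes() {}

    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String[] pair = param.trim().split("=", 2);
            // Parameter names are case-insensitive: "Charset=UTF-8" is valid too.
            if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("charset")) {
                String value = pair[1].trim();
                // Strip surrounding quotes, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```

`ContentTypes.charsetOf(connection.getHeaderField("Content-Type"))` could then replace the loop above; a `null` result still means the body is probably binary.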

### Maintaining the session
Server-side sessions are usually backed by a cookie. You can use the [CookieHandler API](http://docs.oracle.com/javase/8/docs/api/java/net/CookieHandler.html) to maintain cookies. Before sending any HTTP requests, install a [CookieManager](http://docs.oracle.com/javase/6/docs/api/java/net/CookieManager.html) with a [CookiePolicy](http://docs.oracle.com/javase/6/docs/api/java/net/CookiePolicy.html) of [ACCEPT_ALL](http://docs.oracle.com/javase/6/docs/api/java/net/CookiePolicy.html#ACCEPT_ALL).
```java
// First set the default cookie manager.
CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
// All the following subsequent URLConnections will use the same cookie manager.
URLConnection connection = new URL(url).openConnection();
// ...
connection = new URL(url).openConnection();
// ...
connection = new URL(url).openConnection();
// ...
```
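Behind the scenes, the default CookieManager is consulted for each connection: `put()` stores the cookies from a response's `Set-Cookie` headers, and `get()` supplies the `Cookie` header for a later request. The sketch below calls both methods directly, so it needs no network. The URIs and cookie value are placeholders, and the helper name is hypothetical.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: store one Set-Cookie header as if it came from firstUri,
// then return the Cookie header values the manager would send to laterUri.
final class CookieManagerDemo {
    private CookieManagerDemo() {}

    static List<String> cookiesSentTo(String setCookieHeader, URI firstUri, URI laterUri) throws IOException {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        // What happens after a response arrives: Set-Cookie headers go into the cookie store.
        manager.put(firstUri, Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookieHeader)));
        // What happens before the next request: matching cookies become the Cookie header.
        Map<String, List<String>> requestHeaders =
                manager.get(laterUri, Collections.<String, List<String>>emptyMap());
        List<String> cookies = requestHeaders.get("Cookie");
        return cookies == null ? Collections.<String>emptyList() : cookies;
    }
}
```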
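
Note that this approach is known not to work properly in all circumstances. If it fails for you, you can gather and set the cookie headers manually: grab all `Set-Cookie` headers from the response of the first request (for example, the login), then pass them along in all subsequent requests.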
```java
// Gather all cookies on the first request.
URLConnection connection = new URL(url).openConnection();
List<String> cookies = connection.getHeaderFields().get("Set-Cookie");
// ...

// Then use the same cookies on all subsequent requests.
connection = new URL(url).openConnection();
if (cookies != null) { // null when the first response set no cookies.
    for (String cookie : cookies) {
        connection.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
    }
}
// ...
```
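The `split(";", 2)[0]` strips the cookie attributes that are irrelevant to the server side, such as `expires` and `path`. Alternatively, `cookie.substring(0, cookie.indexOf(';'))` achieves the same, but only when the cookie actually contains a `;` (otherwise `indexOf()` returns -1 and `substring()` throws).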
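Instead of adding one Cookie header per cookie, the pairs can also be combined into a single header of the form `name1=value1; name2=value2`, which is what RFC 6265 expects a client to send. A minimal sketch with a hypothetical helper name; it keeps the `split(";", 2)[0]` approach because that also copes with cookies that have no attributes.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: turns the Set-Cookie values of one response into a single
// Cookie request header value, keeping only each cookie's leading name=value pair.
final class CookieHeaders {
    private CookieHeaders() {}

    static String toCookieHeader(List<String> setCookieValues) {
        StringJoiner header = new StringJoiner("; ");
        if (setCookieValues != null) { // null when the response carried no Set-Cookie header.
            for (String setCookie : setCookieValues) {
                // Drops attributes such as Expires and Path; safe even without any ';'.
                header.add(setCookie.split(";", 2)[0].trim());
            }
        }
        return header.toString();
    }
}
```

When the result is non-empty, it could be sent with `connection.setRequestProperty("Cookie", header)`.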

### Streaming mode
By default, [HttpURLConnection](http://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html) buffers the entire request body before actually sending it, regardless of whether you have set a fixed content length yourself with `connection.setRequestProperty("Content-Length", contentLength)`. This may cause an `OutOfMemoryError` when you send large POST requests (for example, file uploads). To avoid this, set [HttpURLConnection#setFixedLengthStreamingMode()](http://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html#setFixedLengthStreamingMode%28int%29):
```java
httpConnection.setFixedLengthStreamingMode(contentLength);
```
If the content length is really not known beforehand, use chunked streaming mode via [HttpURLConnection#setChunkedStreamingMode()](http://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html#setChunkedStreamingMode%28int%29) instead. This sets the `Transfer-Encoding` header to `chunked` and sends the request body in chunks. The example below sends the body in chunks of 1 KB:
```java
httpConnection.setChunkedStreamingMode(1024);
```
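For a file upload whose size is known, fixed-length streaming combines well with `Files.copy()`, so the file is streamed rather than loaded into memory. A minimal sketch that assumes the server accepts the raw file as an `application/octet-stream` POST body (not the multipart format described further below). It uses the `setFixedLengthStreamingMode(long)` overload available since Java 7, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: streams a file as the raw request body without buffering it in memory.
final class StreamingUpload {
    private StreamingUpload() {}

    static int upload(String url, Path file) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", "application/octet-stream");
        // The length is known up front, so the body goes straight to the socket.
        // Use setChunkedStreamingMode(1024) instead when the length is unknown.
        connection.setFixedLengthStreamingMode(Files.size(file));
        try (OutputStream output = connection.getOutputStream()) {
            Files.copy(file, output);
        }
        return connection.getResponseCode(); // Reads the status line of the response.
    }
}
```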

### User-Agent
Sometimes a request returns the expected response only when it is fired from a real web browser, and fails otherwise. This may be related to the `User-Agent` request header. By default, URLConnection sends a User-Agent such as `Java/1.6.0_19`, that is, "Java/" followed by the JRE version. You can override it:
```java
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401"); // Do as if you're using Firefox 3.6.3.
```
A more complete list of browser User-Agent strings is available [here](http://www.useragentstring.com/pages/useragentstring.php).

### Error handling
If the HTTP response code is 4xx (client error) or 5xx (server error), getInputStream() throws an IOException. In that case you can read HttpURLConnection#getErrorStream() instead, because the server may have put useful error information there.
```java
InputStream error = ((HttpURLConnection) connection).getErrorStream();
```
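A helper that reads the body for any status therefore has to pick the right stream first. A minimal sketch, assuming the body is text in a known charset and Java 9+ for `readAllBytes()`; the helper name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper: returns the response body as text, whether the status is a success or an error.
final class ResponseBodies {
    private ResponseBodies() {}

    static String bodyOf(HttpURLConnection connection, String charset) throws IOException {
        int status = connection.getResponseCode(); // Fires the request if it has not been sent yet.
        // For 4xx/5xx, getInputStream() would throw, so read the error stream instead.
        InputStream stream = status >= 400 ? connection.getErrorStream() : connection.getInputStream();
        if (stream == null) {
            return ""; // getErrorStream() returns null when the server sent no body.
        }
        try (InputStream in = stream) {
            return new String(in.readAllBytes(), charset);
        }
    }
}
```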

### Uploading files
Normally you send the POST content with [multipart/form-data](http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2) encoding (see [RFC2388](http://www.faqs.org/rfcs/rfc2388.html) for details).
```java
String param = "value";
File textFile = new File("/path/to/file.txt");
File binaryFile = new File("/path/to/file.bin");
String boundary = Long.toHexString(System.currentTimeMillis()); // Just some value unlikely to appear in the content.
String CRLF = "\r\n"; // Line separator required by multipart/form-data.
URLConnection connection = new URL(url).openConnection();
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);

try (
    OutputStream output = connection.getOutputStream();
    PrintWriter writer = new PrintWriter(new OutputStreamWriter(output, charset), true);
) {
    // Send normal param.
    writer.append("--" + boundary).append(CRLF);
    writer.append("Content-Disposition: form-data; name=\"param\"").append(CRLF);
    writer.append("Content-Type: text/plain; charset=" + charset).append(CRLF);
    writer.append(CRLF).append(param).append(CRLF).flush();

    // Send text file.
    writer.append("--" + boundary).append(CRLF);
    writer.append("Content-Disposition: form-data; name=\"textFile\"; filename=\"" + textFile.getName() + "\"").append(CRLF);
    writer.append("Content-Type: text/plain; charset=" + charset).append(CRLF); // Text file itself must be saved in this charset!
    writer.append(CRLF).flush();
    Files.copy(textFile.toPath(), output);
    output.flush(); // Important before continuing with writer!
    writer.append(CRLF).flush(); // CRLF is important! It indicates end of boundary.

    // Send binary file.
    String binaryType = URLConnection.guessContentTypeFromName(binaryFile.getName());
    writer.append("--" + boundary).append(CRLF);
    writer.append("Content-Disposition: form-data; name=\"binaryFile\"; filename=\"" + binaryFile.getName() + "\"").append(CRLF);
    writer.append("Content-Type: " + (binaryType != null ? binaryType : "application/octet-stream")).append(CRLF); // Fall back when the type can't be guessed.
    writer.append("Content-Transfer-Encoding: binary").append(CRLF);
    writer.append(CRLF).flush();
    Files.copy(binaryFile.toPath(), output);
    output.flush(); // Important before continuing with writer!
    writer.append(CRLF).flush(); // CRLF is important! It indicates end of boundary.

    // End of multipart/form-data.
    writer.append("--" + boundary + "--").append(CRLF).flush();
}

// The request is only fired once response information is requested.
int responseCode = ((HttpURLConnection) connection).getResponseCode();
```

Assuming the server side is again an [HttpServlet](http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServlet.html), its doPost() method handles the request, and the parts are available via [HttpServletRequest#getPart()](http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html#getPart%28java.lang.String%29) (note: not getParameter()!). getPart() is relatively new; it was introduced in Servlet 3.0. On older Servlet versions you can use [Apache Commons FileUpload](http://commons.apache.org/fileupload) to parse a multipart/form-data request. See also this [example](http://stackoverflow.com/questions/2422468/upload-big-file-to-servlet/2424824#2424824).
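To inspect the exact bytes that the multipart code writes, the same layout can be produced in memory. This sketch covers plain text fields only. The class name is hypothetical, field names are assumed not to contain quotes or line breaks, and it uses a random UUID-based boundary instead of `currentTimeMillis()` so that the boundary is less likely to collide with the content.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper: encodes text fields as a multipart/form-data body in memory,
// following the same layout as the upload code above.
final class MultipartBody {
    private static final String CRLF = "\r\n"; // Line separator required by multipart/form-data.

    private MultipartBody() {}

    static String newBoundary() {
        // A random UUID without hyphens, prefixed with dashes.
        return "----" + UUID.randomUUID().toString().replace("-", "");
    }

    static byte[] encode(Map<String, String> fields, String boundary) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.append("--").append(boundary).append(CRLF);
            body.append("Content-Disposition: form-data; name=\"").append(field.getKey()).append('"').append(CRLF);
            body.append("Content-Type: text/plain; charset=UTF-8").append(CRLF);
            body.append(CRLF).append(field.getValue()).append(CRLF); // Blank line, then the value.
        }
        body.append("--").append(boundary).append("--").append(CRLF); // Closing delimiter.
        return body.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The resulting bytes would be written to a connection whose Content-Type is `"multipart/form-data; boundary=" + boundary`, using the same boundary.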

### Final words
All of the above is rather verbose. Apache provides a library that makes these tasks much more convenient,
[Apache HttpComponents HttpClient](http://hc.apache.org/httpcomponents-client-ga/):
- [HttpClient Tutorial](http://hc.apache.org/httpcomponents-client-ga/tutorial/html/)
- [HttpClient Examples](http://hc.apache.org/httpcomponents-client-ga/examples.html)

Google also offers a similar [library](https://code.google.com/p/google-http-java-client/).

### Parsing and extracting HTML
If you want to parse and extract data from HTML, use an HTML parser such as [Jsoup](http://jsoup.org/):
- [Pros and cons of some well-known Java HTML parsers](http://stackoverflow.com/questions/3152138/what-are-the-pros-and-cons-of-the-leading-java-html-parsers/3154281#3154281)
- [How to scan and parse a web page in Java](http://stackoverflow.com/questions/2835505/how-to-scan-a-website-or-page-for-info-and-bring-it-into-my-program/2835555#2835555)

Original Stack Overflow question:
http://stackoverflow.com/questions/2793150/using-java-net-urlconnection-to-fire-and-handle-http-requests