A simple way to check whether HttpURLConnection reuses the underlying socket

Whether the JDK's own HttpURLConnection reuses the underlying socket can be checked quickly with a REPL and lsof:

// Start a local HTTP server; it returns a dozen or so characters
 ➜  curl "http://localhost:8080/sleep?time=1000"
{"code":"ok"}

// In the REPL, request this URL several times in a row
scala> val is = new java.net.URL("http://localhost:8080/sleep?time=100").openConnection.getInputStream; for(i <- 1 to 15) is.read; is.close
is: java.io.InputStream = sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@1ba9117e

scala> val is = new java.net.URL("http://localhost:8080/sleep?time=100").openConnection.getInputStream; for(i <- 1 to 15) is.read; is.close
is: java.io.InputStream = sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@a82c5f1

Meanwhile, watch the sockets with lsof in another terminal, refreshing once per second. The client socket stays the same across requests:

 ➜  /usr/sbin/lsof -Pan -iTCP -r 1 -p 43280
=======
=======
COMMAND   PID      USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
java    43280 hongjiang   47u  IPv6 0x43acdfd2ea5b0c01      0t0  TCP 127.0.0.1:57304->127.0.0.1:8080 (ESTABLISHED)
=======
COMMAND   PID      USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
java    43280 hongjiang   47u  IPv6 0x43acdfd2ea5b0c01      0t0  TCP 127.0.0.1:57304->127.0.0.1:8080 (ESTABLISHED)
=======
COMMAND   PID      USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
java    43280 hongjiang   47u  IPv6 0x43acdfd2ea5b0c01      0t0  TCP 127.0.0.1:57304->127.0.0.1:8080 (ESTABLISHED)
=======
COMMAND   PID      USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
java    43280 hongjiang   47u  IPv6 0x43acdfd2ea5b0c01      0t0  TCP 127.0.0.1:57304->127.0.0.1:8080 (ESTABLISHED)  

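This topic came up from the question of whether to call close (on the input stream) or disconnect (on the connection) when finishing with a URLConnection. The JDK's keep-alive related parameters, such as the http.keepAlive and http.maxConnections system properties, are not covered here.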

Verifying the disconnect method:

// Run this several times
scala> val conn = new java.net.URL("http://localhost:8080/sleep?time=100").openConnection.asInstanceOf[java.net.HttpURLConnection]; val is=conn.getInputStream; for(i <- 1 to 15) is.read; conn.disconnect

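lsof is no help here: its minimum refresh interval is 1 second, and each connection is closed immediately, so there is no chance of catching it. Use tcpdump instead: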

 ➜  sudo tcpdump -i lo0  -s 1024 -l -A  port 8080

04:59:57.066577 IP localhost.57355 > localhost.http-alt: Flags [S]
 ...
 -`=.-`=.GET /sleep?time=100 HTTP/1.1
User-Agent: Java/1.8.0_51
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive

...

05:00:05.407691 IP localhost.57356 > localhost.http-alt: Flags [P.], seq 1:168, ack 1, win 12759, options [nop,nop,TS val 761290281 ecr 761290281], length 167: HTTP: GET /sleep?time=100 HTTP/1.1
E...LF@.@.........................1........
-`^)-`^)GET /sleep?time=100 HTTP/1.1
User-Agent: Java/1.8.0_51
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive 

... 

05:00:07.045830 IP localhost.57357 > localhost.http-alt: Flags [P.], seq 1:168, ack 1, win 12759, options [nop,nop,TS val 761291915 ecr 761291915], length 167: HTTP: GET /sleep?time=100 HTTP/1.1
E.....@.@................l.;.\.,..1........
-`d.-`d.GET /sleep?time=100 HTTP/1.1
User-Agent: Java/1.8.0_51
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive

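Across the three connections the client socket port is different each time (57355, 57356, 57357), so the socket was not reused.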
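
For reference, the two shutdown styles compared above look roughly like this in plain Java. This is only a minimal sketch: the class name ConnectionShutdown and its methods are made up for illustration, and the URL is whatever you pass in. The point is the difference between reading the body to the end and closing only the stream, which left the socket open for reuse in the lsof experiment, and calling disconnect, which closed it in the tcpdump experiment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, for illustration only.
class ConnectionShutdown {

    // Reads the stream to EOF and returns the number of bytes seen.
    static long drain(InputStream in) {
        try {
            byte[] buf = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Style 1: consume the whole body, then close only the stream.
    static long fetchAndClose(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return drain(in);
        }
    }

    // Style 2: consume the whole body, then disconnect the connection.
    static long fetchAndDisconnect(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            return drain(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling fetchAndClose("http://localhost:8080/sleep?time=100") in a loop while lsof or tcpdump is running should reproduce the first experiment, and fetchAndDisconnect the second.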

The If-xxx conditional headers in HTTP

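The previous post discussed implementing static resource access. Besides fetching part of a resource, HTTP also has conditional headers: the server processes the request only if the current state of the resource satisfies the conditions.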

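These conditions are If-Match, If-Modified-Since, If-Range, If-Unmodified-Since, and If-None-Match. The most common is If-None-Match, which sends the locally cached ETag along with the request; the resource is transferred only if the server's ETag differs.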

First, get a resource's ETag:

$ curl -I http://localhost:8080/chain.jpg

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"4932-1448300760000"
Last-Modified: Tue, 24 Nov 2015 01:46:00 CST
Content-Type: image/jpeg
Content-Length: 4932
Date: Mon, 23 Nov 2015 18:12:49 GMT
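
As a side note, the ETag in this response looks like it is assembled from the content length (4932) and the last-modified time in epoch milliseconds: 1448300760000 is 23 Nov 2015 17:46:00 UTC, which is 01:46:00 on the 24th at UTC+8. The Java sketch below builds a validator that way. The class and method names are hypothetical, and this is an inference from the headers shown here, not a claim about how servers in general build ETags.

```java
import java.time.Instant;

// Hypothetical helper, for illustration only.
class WeakEtag {

    // Builds a weak validator of the form W/"<length>-<lastModifiedMillis>".
    static String of(long contentLength, long lastModifiedMillis) {
        return "W/\"" + contentLength + "-" + lastModifiedMillis + "\"";
    }

    public static void main(String[] args) {
        long lastModified = Instant.parse("2015-11-23T17:46:00Z").toEpochMilli();
        System.out.println(of(4932, lastModified)); // W/"4932-1448300760000"
    }
}
```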

Then add If-None-Match to the request headers:

$ curl -v -H 'If-None-Match: W/"4932-1448300760000"'  http://localhost:8080/chain.jpg

* Hostname was NOT found in DNS cache
*   Trying ::1...
* Connected to localhost (::1) port 8080 (#0)
> GET /chain.jpg HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:8080
> Accept: */*
> If-None-Match: W/"4932-1448300760000"
>
< HTTP/1.1 304 Not Modified
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< ETag: W/"4932-1448300760000"
< Date: Mon, 23 Nov 2015 18:13:01 GMT
<
* Connection #0 to host localhost left intact

When the ETags are equal, the condition is not met, and the server returns 304 Not Modified without a body.
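
The server-side decision can be sketched in a few lines of Java. This is a simplified illustration, not Tomcat's implementation: the class IfNoneMatch and its methods are hypothetical, the header is split naively on commas, and tags are compared with the W/ prefix ignored (the weak comparison that If-None-Match uses).

```java
// Hypothetical helper, for illustration only.
class IfNoneMatch {

    // Drops the weak prefix so that W/"x" and "x" compare as equal.
    static String opaqueTag(String etag) {
        String t = etag.trim();
        return t.startsWith("W/") ? t.substring(2) : t;
    }

    // Status for a GET of an existing resource:
    // 304 if any listed tag matches the current ETag, otherwise 200.
    static int statusFor(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null || ifNoneMatch.trim().isEmpty()) {
            return 200; // no condition supplied
        }
        if (ifNoneMatch.trim().equals("*")) {
            return 304; // "*" matches any current representation
        }
        // Naive split: assumes the tags themselves contain no commas.
        for (String candidate : ifNoneMatch.split(",")) {
            if (opaqueTag(candidate).equals(opaqueTag(currentEtag))) {
                return 304;
            }
        }
        return 200;
    }
}
```

With the values from the curl session, statusFor("W/\"4932-1448300760000\"", "W/\"4932-1448300760000\"") gives 304, while any other tag gives 200 and the full image would be sent.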

The Range and Content-Range headers in HTTP

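This topic came up while implementing static access to HTTP resources. Starting with version 1.1, HTTP supports fetching part of a file, which is what makes parallel and resumable downloads possible. It works through two headers: Range in the client's request and Content-Range in the server's response. Let's look at both using Tomcat.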

I put an image, "chain.jpg", in the application's root directory; it is 4932 bytes. Use curl to simulate a segmented request, dumping the response headers to a file:

$ curl -D "resp-header1.txt" -H 'Range: bytes=0-2000' \
    http://localhost:8080/chain.jpg > /tmp/test.jpg 

$ cat resp-header1.txt

HTTP/1.1 206 Partial Content # the returned status code is 206
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"4932-1447753566000"
Last-Modified: Tue, 17 Nov 2015 09:46:06 GMT
Content-Range: bytes 0-2000/4932
Content-Type: image/jpeg
Content-Length: 2001
Date: Tue, 17 Nov 2015 17:27:45 GMT 

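Opening the image in Preview on a Mac at this point shows only part of it; the remaining data has to be downloaded and appended too. Byte offsets are zero-based, so the last byte of the 4932-byte file is at offset 4931; an end position past that, like the 4932 used below, is simply clamped to the end of the file: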

$ curl -H 'Range: bytes=2001-4932' \
    http://localhost:8080/chain.jpg >> /tmp/test.jpg
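
To make the arithmetic concrete, here is a minimal Java sketch of how a server might turn a single-range Range header into the Content-Range and Content-Length values seen above. ByteRange is a hypothetical class written for this post, not Tomcat's code; it handles only one range and returns null for anything it cannot satisfy.

```java
// Hypothetical helper, for illustration only.
class ByteRange {
    final long start;
    final long end;   // inclusive
    final long total; // full size of the resource

    ByteRange(long start, long end, long total) {
        this.start = start;
        this.end = end;
        this.total = total;
    }

    // Parses "bytes=0-2000", "bytes=2001-" or "bytes=-500" (the last 500 bytes).
    static ByteRange parse(String header, long total) {
        if (header == null || !header.startsWith("bytes=") || total <= 0) {
            return null;
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0 || spec.indexOf(',') >= 0) {
            return null; // malformed, or multi-range (not handled here)
        }
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        long start;
        long end;
        try {
            if (first.isEmpty()) { // suffix form: last N bytes
                if (last.isEmpty()) {
                    return null;
                }
                long n = Long.parseLong(last);
                if (n <= 0) {
                    return null;
                }
                start = Math.max(0, total - n);
                end = total - 1;
            } else {
                start = Long.parseLong(first);
                end = last.isEmpty() ? total - 1 : Long.parseLong(last);
            }
        } catch (NumberFormatException e) {
            return null;
        }
        if (end >= total) {
            end = total - 1; // clamp an end position past the last byte
        }
        if (start < 0 || start > end) {
            return null;
        }
        return new ByteRange(start, end, total);
    }

    long length() {
        return end - start + 1;
    }

    String contentRange() {
        return "bytes " + start + "-" + end + "/" + total;
    }
}
```

For the first request, ByteRange.parse("bytes=0-2000", 4932) yields "bytes 0-2000/4932" with a length of 2001, matching the headers in resp-header1.txt; for the second, the end position 4932 is clamped to 4931.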

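The Range header also supports multiple ranges, separated by commas. Below is a multi-range request against another file, "a.html", whose content is "hello world". The response's Content-Type is then no longer the file's own MIME type but multipart/byteranges, and each part carries its own Content-Type and Content-Range: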

$ curl -D 'resp-header' -H 'Range: bytes=0-5,6-10' http://localhost:8080/a.html 
--CATALINA_MIME_BOUNDARY
Content-Type: text/html
Content-Range: bytes 0-5/12

hello
--CATALINA_MIME_BOUNDARY
Content-Type: text/html
Content-Range: bytes 6-10/12

world
--CATALINA_MIME_BOUNDARY--

$ cat resp-header

HTTP/1.1 206 Partial Content
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"12-1447780011000"
Last-Modified: Tue, 17 Nov 2015 17:06:51 GMT
Content-Type: multipart/byteranges; boundary=CATALINA_MIME_BOUNDARY
Content-Length: 208
Date: Tue, 17 Nov 2015 17:39:30 GMT
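
The framing of such a body can be sketched in Java as follows. The boundary string, content type and ranges are taken from the response above; the class MultipartByteRanges is hypothetical, and the exact placement of the CRLF line breaks is my assumption, chosen so that the body comes to the 208 bytes reported in Content-Length. It is not copied from Tomcat's source.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class MultipartByteRanges {
    static final String BOUNDARY = "CATALINA_MIME_BOUNDARY";

    // ranges: one {start, endInclusive} pair per requested range.
    static byte[] body(byte[] content, String contentType, long[][] ranges) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long[] r : ranges) {
            // Each part has its own small header block, then the raw bytes.
            String partHeader = "\r\n--" + BOUNDARY + "\r\n"
                    + "Content-Type: " + contentType + "\r\n"
                    + "Content-Range: bytes " + r[0] + "-" + r[1]
                    + "/" + content.length + "\r\n"
                    + "\r\n";
            byte[] h = partHeader.getBytes(StandardCharsets.US_ASCII);
            out.write(h, 0, h.length);
            out.write(content, (int) r[0], (int) (r[1] - r[0] + 1));
        }
        // The closing delimiter ends with a trailing "--".
        byte[] closing = ("\r\n--" + BOUNDARY + "--").getBytes(StandardCharsets.US_ASCII);
        out.write(closing, 0, closing.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = "hello world\n".getBytes(StandardCharsets.US_ASCII); // 12 bytes
        byte[] body = body(file, "text/html", new long[][] {{0, 5}, {6, 10}});
        System.out.println(body.length); // 208
    }
}
```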

Keep-alive configuration between nginx and Tomcat

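Today I ran into a case where there were a large number of connections in TIME_WAIT between Tomcat and the nginx in front of it. My first thought was that keep-alive might not be configured. I asked ops, who said it was enabled; I got hold of the nginx configuration and found that the upstream block sets the keepalive parameter: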

upstream  tomcat {
    server x.x.x.x:8080;
    ...
    keepalive 16;
}

I wasn't sure whether this parameter is HTTP keep-alive, so I looked it up on the nginx website:

Syntax: keepalive connections;
Default:    —
Context:    upstream

The connections parameter sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process.

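It does not by itself turn on HTTP keep-alive toward the backend nodes. As a reverse proxy, nginx is not limited to HTTP backends; the keepalive setting here is the maximum number of idle keepalive connections kept in each worker's connection pool, which is not the same thing as HTTP keep-alive.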

The official documentation states that to use HTTP keep-alive with backend nodes, you set the HTTP version to 1.1 and clear the Connection header:

location /http/ {
    proxy_pass http://http_backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    ...
}

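Alternatively, keep using HTTP 1.0 but explicitly set the Connection request header to Keep-Alive. Persistent connections are the default in HTTP 1.1, whereas HTTP 1.0 did not originally include them and supports them only as an extension, so there they are enabled only when this header is set explicitly: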

location /http/ {
    proxy_pass http://http_backend;
    proxy_http_version 1.0;
    proxy_set_header Connection "Keep-Alive";
    ...
}
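
The rule behind the two configurations can be written down as a small Java function. This is a sketch of the protocol rule only, with a hypothetical class name, and says nothing about how nginx or Tomcat implement it: HTTP 1.1 connections are persistent unless Connection contains close, while HTTP 1.0 connections are persistent only if Connection contains Keep-Alive.

```java
// Hypothetical helper, for illustration only.
class PersistentConnection {

    // httpVersion: "1.0" or "1.1".
    // connectionHeader: value of the Connection header, or null/empty if absent.
    static boolean isPersistent(String httpVersion, String connectionHeader) {
        boolean hasClose = false;
        boolean hasKeepAlive = false;
        if (connectionHeader != null) {
            for (String token : connectionHeader.split(",")) {
                String t = token.trim();
                if (t.equalsIgnoreCase("close")) {
                    hasClose = true;
                }
                if (t.equalsIgnoreCase("keep-alive")) {
                    hasKeepAlive = true;
                }
            }
        }
        if (hasClose) {
            return false; // an explicit close always wins
        }
        if ("1.1".equals(httpVersion)) {
            return true;  // persistent by default in HTTP 1.1
        }
        return hasKeepAlive; // HTTP 1.0 needs an explicit opt-in
    }
}
```

isPersistent("1.1", "") corresponds to the first location block and isPersistent("1.0", "Keep-Alive") to the second; both return true. A request sent as HTTP 1.0 with Connection: close, which to my knowledge is what nginx's proxy module sent by default at the time, returns false, which would explain the pile of TIME_WAIT connections.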

To confirm that it takes effect, add a line to Tomcat's logging configuration:

org.apache.coyote.http11.level = FINE

The HTTP headers of each request then show up in the Tomcat log:

FINE: Received [GET / HTTP/1.0
Connection: Keep-Alive
Host: localhost:8080
User-Agent: curl/7.37.1
Accept: */*

The header has indeed been added. I would still recommend configuring HTTP 1.1, which supports features such as chunked transfer encoding.

P.S. The word keepalive may refer to a connection pool or to a TCP-level parameter, depending on context. In HTTP it is keep-alive; note the hyphen in the middle.