The If-xxx conditional headers in HTTP

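The previous post discussed serving static resources. Besides fetching part of a resource, HTTP also has conditional headers: the server handles the request only if the current state of the resource satisfies the stated condition.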

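The conditional headers are If-Match, If-Modified-Since, If-Range, If-Unmodified-Since, and If-None-Match. The most common one is If-None-Match: the client sends the ETag of its local copy, and the server transfers the resource only if its own ETag differs.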

First, fetch the ETag of a resource:

$ curl -I http://localhost:8080/chain.jpg

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"4932-1448300760000"
Last-Modified: Tue, 24 Nov 2015 01:46:00 CST
Content-Type: image/jpeg
Content-Length: 4932
Date: Mon, 23 Nov 2015 18:12:49 GMT

Then add If-None-Match to the request headers:

$ curl -v -H 'If-None-Match: W/"4932-1448300760000"'  http://localhost:8080/chain.jpg

* Hostname was NOT found in DNS cache
*   Trying ::1...
* Connected to localhost (::1) port 8080 (#0)
> GET /chain.jpg HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:8080
> Accept: */*
> If-None-Match: W/"4932-1448300760000"
>
< HTTP/1.1 304 Not Modified
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< ETag: W/"4932-1448300760000"
< Date: Mon, 23 Nov 2015 18:13:01 GMT
<
* Connection #0 to host localhost left intact

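When the ETags are equal, the "none match" condition is not satisfied, so the server returns 304 Not Modified with no body.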
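The decision the server makes here can be sketched in a few lines of Python. This is only an illustration of the comparison, not Tomcat's code: the function names are made up for this post, the resource is assumed to exist, and splitting the header on commas is a simplification. If-None-Match uses weak comparison, so a leading W/ is ignored on both sides.

```python
def strip_weak(tag):
    """Drop the optional W/ prefix so that weak comparison can be used."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_status(header_value, current_etag):
    """Status for a GET carrying If-None-Match (hypothetical helper).

    Returns 304 when any listed ETag matches the current one, else 200.
    """
    if header_value.strip() == "*":
        # "*" matches any existing representation of the resource
        return 304
    candidates = [strip_weak(t) for t in header_value.split(",")]
    return 304 if strip_weak(current_etag) in candidates else 200


current = 'W/"4932-1448300760000"'
print(if_none_match_status('W/"4932-1448300760000"', current))
print(if_none_match_status('W/"4932-1447753566000"', current))
```

The first call mirrors the curl request above (same ETag, so nothing is transferred); the second uses a stale ETag, so the full resource would be sent.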

The Range and Content-Range headers in HTTP

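This topic came up while implementing static access to HTTP resources. Starting with version 1.1, HTTP supports fetching part of a file, which is what makes parallel downloads and resumable downloads possible. Two headers carry the mechanism: Range in the client's request and Content-Range in the server's response. Let's look at both using Tomcat.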

I put an image, "chain.jpg", in the application's root directory. The image is 4932 bytes. Using curl to simulate a segmented request, I dump the response headers to a file:

$ curl -D "resp-header1.txt" -H 'Range: bytes=0-2000' \
    http://localhost:8080/chain.jpg > /tmp/test.jpg 

$ cat resp-header1.txt

HTTP/1.1 206 Partial Content # the status code is 206
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"4932-1447753566000"
Last-Modified: Tue, 17 Nov 2015 09:46:06 GMT
Content-Range: bytes 0-2000/4932
Content-Type: image/jpeg
Content-Length: 2001
Date: Tue, 17 Nov 2015 17:27:45 GMT 

At this point, opening the image with Preview on a Mac shows only part of it. The remaining bytes have to be downloaded and appended as well:

$ curl -H 'Range: bytes=2001-' \
    http://localhost:8080/chain.jpg >> /tmp/test.jpg
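How a single Range value maps onto Content-Range and Content-Length can be sketched as follows. This is a simplified illustration rather than Tomcat's implementation, and the function names are hypothetical. Byte positions are inclusive and zero-based, so a 4932-byte file ends at byte 4931; a last position at or beyond the end (4932, say) is clamped to the final byte, and an open-ended range such as bytes=2001- runs to the end.

```python
def resolve_range(header, length):
    """Resolve one 'bytes=first-last' value against a resource length.

    Returns an inclusive (first, last) pair, or None when the range is
    invalid or cannot be satisfied. Multi-range values are out of scope.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("this sketch handles a single bytes range only")
    first_s, _, last_s = spec.strip().partition("-")
    if first_s == "":
        # Suffix form, e.g. bytes=-500 means the last 500 bytes
        count = int(last_s)
        first, last = max(length - count, 0), length - 1
    else:
        first = int(first_s)
        # Open-ended or over-long ranges stop at the final byte
        last = min(int(last_s), length - 1) if last_s else length - 1
    if first > last:
        return None
    return first, last


def describe(header, length):
    """Return the (Content-Range, Content-Length) pair for a 206 response."""
    first, last = resolve_range(header, length)
    return f"bytes {first}-{last}/{length}", last - first + 1


print(describe("bytes=0-2000", 4932))
print(describe("bytes=2001-", 4932))
```

The two parts add up to 2001 + 2931 = 4932 bytes, which is why appending the second download to the first yields the complete image.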

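Range also accepts several ranges separated by commas. Below is a multi-range request for another file, "a.html", whose content is "hello world". In this case the response Content-Type is no longer the file's own MIME type; the body is sent as multipart/byteranges instead: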

$ curl -D 'resp-header' -H 'Range: bytes=0-5,6-10' http://localhost:8080/a.html 
--CATALINA_MIME_BOUNDARY
Content-Type: text/html
Content-Range: bytes 0-5/12

hello
--CATALINA_MIME_BOUNDARY
Content-Type: text/html
Content-Range: bytes 6-10/12

world
--CATALINA_MIME_BOUNDARY--

$ cat resp-header

HTTP/1.1 206 Partial Content
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"12-1447780011000"
Last-Modified: Tue, 17 Nov 2015 17:06:51 GMT
Content-Type: multipart/byteranges; boundary=CATALINA_MIME_BOUNDARY
Content-Length: 208
Date: Tue, 17 Nov 2015 17:39:30 GMT
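To make the multipart layout concrete, here is a small sketch that assembles such a body by hand. It rests on assumptions read off the output above: the file is taken to be "hello world" plus a trailing newline (12 bytes, as the /12 suggests), and each part is taken to be a CRLF, the boundary line, two header lines, and a blank line before the data. The function name is made up for illustration.

```python
BOUNDARY = b"CATALINA_MIME_BOUNDARY"


def multipart_byteranges(content, ranges, content_type, boundary=BOUNDARY):
    """Build a multipart/byteranges body for inclusive (first, last) ranges."""
    total = len(content)
    out = bytearray()
    for first, last in ranges:
        # Each part: boundary line, part headers, blank line, then the bytes
        out += b"\r\n--" + boundary + b"\r\n"
        out += b"Content-Type: " + content_type.encode("ascii") + b"\r\n"
        out += f"Content-Range: bytes {first}-{last}/{total}\r\n".encode("ascii")
        out += b"\r\n"
        out += content[first:last + 1]
    # The closing delimiter carries a trailing "--"
    out += b"\r\n--" + boundary + b"--"
    return bytes(out)


body = multipart_byteranges(b"hello world\n", [(0, 5), (6, 10)], "text/html")
print(len(body))
```

Under these assumptions the body comes to 208 bytes, which agrees with the Content-Length in the response headers above. Note that Content-Length covers the whole multipart body, not just the 11 bytes of file data.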

Simulating Tomcat turning off keep-alive in BIO mode when thread pool utilization exceeds 75%

Let's simulate the case where, in BIO mode, keep-alive is switched off automatically once thread utilization goes above 75%.

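We will observe this with curl. curl enables keep-alive by default, but it reuses a socket only when the same process requests the same address more than once. Two separate curl invocations do not share a connection, for example: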

$ curl http://localhost:7001/main
$ curl http://localhost:7001/main

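The two consecutive curl commands above do not reuse a socket: the socket is closed when the process exits, and the next process opens a new connection. tcpdump confirms that the two connections use different sockets: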

$ sudo tcpdump -l -i lo0 port 7001

23:43:19.236948 IP6 localhost.62625 > localhost.afs3-callback
......
23:43:26.071504 IP6 localhost.62626 > localhost.afs3-callback
......

When one curl process requests the same address several times, the socket is reused. The -v option makes this visible:

$ curl -v  http://localhost:7001/main  http://localhost:7001/main
* Hostname was NOT found in DNS cache
*   Trying ::1...
* Connected to localhost (::1) port 7001 (#0)
> GET /main HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:7001
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Transfer-Encoding: chunked
< Date: Mon, 18 Aug 2014 15:49:28 GMT
<
* Connection #0 to host localhost left intact
ok
* Found bundle for host localhost: 0x7fa7d8c08c50
* Re-using existing connection! (#0) with host localhost
* Connected to localhost (::1) port 7001 (#0)
> GET /main HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:7001
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Transfer-Encoding: chunked
< Date: Mon, 18 Aug 2014 15:49:28 GMT
<
* Connection #0 to host localhost left intact
ok

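Note the line at the start of the second request: "Re-using existing connection!". It shows that the socket from the first request was reused, and tcpdump likewise shows the same client port talking to Tomcat.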
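The same "one process, one socket" behavior can be reproduced without curl. The sketch below is an assumption-laden stand-in: it starts a tiny HTTP/1.1 server from the Python standard library (not Tomcat) on a port chosen by the OS, then sends two requests and records the client-side port used for each. The handler and function names are made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive is the default in HTTP/1.1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def client_ports(reuse):
    """Send two GETs and return the local port used for each request."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    ports = []
    conn = http.client.HTTPConnection(host, port)
    try:
        for _ in range(2):
            if not reuse:
                # Mimic two separate curl processes: fresh socket each time
                conn.close()
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/main")
            ports.append(conn.sock.getsockname()[1])
            conn.getresponse().read()
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


print(client_ports(reuse=True))
print(client_ports(reuse=False))
```

With reuse=True both requests report the same local port, which corresponds to the "Re-using existing connection!" case; with reuse=False each request opens its own socket, as two separate curl runs do.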

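By default the thread pool has a maximum of 200 threads. In BIO mode, once thread utilization exceeds 75%, the server stops using keep-alive for newly arriving connections. First we establish 151 connections (keep-alive is on by default):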

for i in {1..151}; do
    (
        # Send one request, then hold the connection open for 20 seconds
        { printf 'GET /main HTTP/1.1\nhost: localhost:7001\n\n'; sleep 20; } |
            telnet localhost 7001
    ) &
done

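The zsh script above simulates 151 connections. The for loop starts each simulated client as a background subprocess, which you can list with the jobs command. Each client opens a new socket and, after the server responds, keeps the connection open for 20 seconds, which is also the server's default keep-alive timeout. Tomcat keeps all 151 connections alive, and in BIO mode that ties up 151 threads: even when the request on a socket has been handled and no further request arrives, the thread is not released and stays blocked on that socket. This puts utilization at 151/200 ≈ 0.75, right at the threshold, so sockets established from here on no longer get keep-alive.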
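The threshold arithmetic can be written out as a short sketch. This models the rule as the text describes it and is not taken from Tomcat's source: the function and parameter names are hypothetical, and computing the ratio as a whole-number percentage is an assumption, chosen because it is consistent with 151 held connections sitting exactly at the threshold while the next one goes over it.

```python
def keep_alive_disabled(busy_threads, max_threads=200, disable_percentage=75):
    """Return True when a connection should be answered with Connection: close.

    busy_threads counts the threads in use, including the one serving
    the current connection.
    """
    if max_threads <= 0 or busy_threads <= 0:
        return False
    # Whole-number percentage (assumption): 151 of 200 gives 75, 152 gives 76
    ratio = busy_threads * 100 // max_threads
    return ratio > disable_percentage


print(keep_alive_disabled(151))
print(keep_alive_disabled(152))
```

With 151 threads busy the ratio is still 75, so those connections keep their keep-alive. The next connection makes it 152 busy threads, the ratio becomes 76, and that connection is the first to lose keep-alive.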

Now, sitting at this threshold, run curl again to simulate a new connection:

$  curl -v  http://localhost:7001/main http://localhost:7001/main

* Hostname was NOT found in DNS cache
*   Trying ::1...
* Connected to localhost (::1) port 7001 (#0)
> GET /main HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:7001
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Transfer-Encoding: chunked
< Date: Mon, 18 Aug 2014 15:22:50 GMT
< Connection: close
<
* Closing connection 0
ok
* Hostname was found in DNS cache
*   Trying ::1...
* Connected to localhost (::1) port 7001 (#1)
> GET /main HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:7001
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Transfer-Encoding: chunked
< Date: Mon, 18 Aug 2014 15:22:50 GMT
< Connection: close
<
* Closing connection 1
ok

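Note that this run again makes two requests, yet "Re-using existing connection!" does not appear. After each request the server explicitly closes the connection, as the Connection: close header in the response shows. In other words, once utilization is past 75%, newly established connections no longer use keep-alive.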