tcpdump -A -s 10240 'tcp port 8080 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'
Tag Archives: tcpdump
JDK在mac和linux上对connection reset的行为不一致?
遇到一个JDK(版本都是1.8.0_51-b16)在mac和linux上行为不一致的问题,这个问题是针对redis服务器端关闭连接时的状况处理;即在redis-server端设置连接的空闲时间,当超过这个空闲时间后server主动把该连接关闭掉。在我的mac上这个行为是符合预期的,对这个超时的连接再发起请求时会抛出 SocketException "Connection Reset",但在linux上却不会。
简单模拟一下,在Redis服务端配置timeout为3秒,客户端建立连接之后 sleep 4秒(或更久),让连接超时
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
public class ConnRestTest {
public static void main(String[] args) throws Exception {
Socket socket = new Socket();
socket.connect(new InetSocketAddress("localhost", 6379));
InputStream is = socket.getInputStream();
OutputStream os = socket.getOutputStream();
Thread.sleep(4000);
os.write("*1\r\n$4\r\nPING\r\n".getBytes());
os.flush();
for (int i = 0; i < 7; i++) {
int r = is.read();
System.out.print( r == -1 ? r : (char) r);
}
socket.close();
}
}
上面的代码在mac上运行后会看到如下异常(如果注释掉 Thread.sleep(4000) 会得到 "+PONG\r\n")
Exception in thread "main" java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:223)
at com.wacai.common.redis.io.Test.main(Test.java:23)
而在linux上运行后则是打印7个连续的"-1",即并不会抛出"Connection reset"
分别进行tcpdump,在mac上:
➜ sudo tcpdump -i lo0 port 6379
23:13:43.398003 IP localhost.63864 > localhost.6379: Flags [S], seq 1658673677, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1063188198 ecr 0,sackOK,eol], length 0
23:13:43.398064 IP localhost.6379 > localhost.63864: Flags [S.], seq 355134851, ack 1658673678, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1063188198 ecr 1063188198,sackOK,eol], length 0
23:13:43.398075 IP localhost.63864 > localhost.6379: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1063188198 ecr 1063188198], length 0
23:13:43.398085 IP localhost.6379 > localhost.63864: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1063188198 ecr 1063188198], length 0
23:13:47.063640 IP localhost.6379 > localhost.63864: Flags [F.], seq 1, ack 1, win 12759, options [nop,nop,TS val 1063191852 ecr 1063188198], length 0
23:13:47.063671 IP localhost.63864 > localhost.6379: Flags [.], ack 2, win 12759, options [nop,nop,TS val 1063191852 ecr 1063191852], length 0
23:13:48.403144 IP localhost.63864 > localhost.6379: Flags [P.], seq 1:15, ack 2, win 12759, options [nop,nop,TS val 1063193184 ecr 1063191852], length 14
23:13:48.403255 IP localhost.6379 > localhost.63864: Flags [R], seq 355134853, win 0, length 0
在linux上:
$ sudo tcpflow -p -c -i lo port 6379
00:26:13.303233 IP localhost.34609 > localhost.6379: Flags [S], seq 1094106697, win 43690, options [mss 65495,sackOK,TS val 8462190 ecr 0,nop,wscale 7], length 0
00:26:13.303272 IP localhost.6379 > localhost.34609: Flags [S.], seq 2755621045, ack 1094106698, win 43690, options [mss 65495,sackOK,TS val 8462190 ecr 8462190,nop,wscale 7], length 0
00:26:13.303298 IP localhost.34609 > localhost.6379: Flags [.], ack 1, win 342, options [nop,nop,TS val 8462190 ecr 8462190], length 0
00:26:17.037992 IP localhost.6379 > localhost.34609: Flags [F.], seq 1, ack 1, win 342, options [nop,nop,TS val 8465925 ecr 8462190], length 0
00:26:17.038450 IP localhost.34609 > localhost.6379: Flags [.], ack 2, win 342, options [nop,nop,TS val 8465926 ecr 8465925], length 0
00:26:18.305591 IP localhost.34609 > localhost.6379: Flags [P.], seq 1:15, ack 2, win 342, options [nop,nop,TS val 8467193 ecr 8465925], length 14
00:26:18.305630 IP localhost.6379 > localhost.34609: Flags [R], seq 2755621047, win 0, length 0
在mac上比linux上多了一次在连接建立后从server发给client端的ack,在3秒之后,连接因为超过空闲时间server端向client端发起了fin,client回复ack,之后client端程序对socket进行写操作,在tcpdump里看到标记为P的标记,然后server端发送了reset
在linux上看到连接状态的变化也符合预期:
$ netstat -antp | grep 6379
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:6379 127.0.0.1:34607 ESTABLISHED -
tcp6 0 0 127.0.0.1:34607 127.0.0.1:6379 ESTABLISHED 3683/java
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:6379 127.0.0.1:34607 FIN_WAIT2 -
tcp6 1 0 127.0.0.1:34607 127.0.0.1:6379 CLOSE_WAIT 3683/java
按说server端发送reset标记后,mac上抛出异常的行为更符合预期,可能是JDK网络层面的实现细节不同,或者tcpdump出的数据已经能解释原因只是我看不出来。
HttpURLConnection在底层是否复用socket的简单验证方式
关于JDK自身的HttpURLConnection
在底层是否复用socket的测试方式,可以快速用repl和lsof来检测:
// 本地启动一个 http server,它返回十几个字符
➜ curl "http://localhost:8080/sleep?time=1000"
{"code":"ok"}
// 在repl下连续请求这个url 若干次
scala> val is = new java.net.URL("http://localhost:8080/sleep?time=100").openConnection.getInputStream; for(i <- 1 to 15) is.read; is.close
is: java.io.InputStream = sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@1ba9117e
scala> val is = new java.net.URL("http://localhost:8080/sleep?time=100").openConnection.getInputStream; for(i <- 1 to 15) is.read; is.close
is: java.io.InputStream = sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@a82c5f1
与此同时在另一个终端用lsof查看socket,每秒刷新一次,可看到客户端socket是同一个
➜ /usr/sbin/lsof -Pan -iTCP -r 1 -p 43280
=======
=======
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 43280 hongjiang 47u IPv6 0x43acdfd2ea5b0c01 0t0 TCP 127.0.0.1:57304->127.0.0.1:8080 (ESTABLISHED)
=======
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 43280 hongjiang 47u IPv6 0x43acdfd2ea5b0c01 0t0 TCP 127.0.0.1:57304->127.0.0.1:8080 (ESTABLISHED)
=======
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 43280 hongjiang 47u IPv6 0x43acdfd2ea5b0c01 0t0 TCP 127.0.0.1:57304->127.0.0.1:8080 (ESTABLISHED)
=======
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 43280 hongjiang 47u IPv6 0x43acdfd2ea5b0c01 0t0 TCP 127.0.0.1:57304->127.0.0.1:8080 (ESTABLISHED)
这个话题是由URLConnection
在关闭的时候应该调用close
还是disConnect
所引起的,关于jdk里keep-alive相关的一些参数不展开了。
验证disconnect
方法:
// 执行若干次
scala> val conn = new java.net.URL("http://localhost:8080/sleep?time=100").openConnection.asInstanceOf[java.net.HttpURLConnection]; val is=conn.getInputStream; for(i <- 1 to 15) is.read; conn.disconnect
这时没法用lsof观察了,它最小刷新单位是1秒,因为每次连接立即关闭导致没机会看到,得用tcpdump来观察
➜ sudo tcpdump -i lo0 -s 1024 -l -A port 8080
^[[A04:59:57.066577 IP localhost.57355 > localhost.http-alt: Flags [S]
...
-`=.-`=.GET /sleep?time=100 HTTP/1.1
User-Agent: Java/1.8.0_51
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
...
05:00:05.407691 IP localhost.57356 > localhost.http-alt: Flags [P.], seq 1:168, ack 1, win 12759, options [nop,nop,TS val 761290281 ecr 761290281], length 167: HTTP: GET /sleep?time=100 HTTP/1.1
E...LF@.@.........................1........
-`^)-`^)GET /sleep?time=100 HTTP/1.1
User-Agent: Java/1.8.0_51
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
...
05:00:07.045830 IP localhost.57357 > localhost.http-alt: Flags [P.], seq 1:168, ack 1, win 12759, options [nop,nop,TS val 761291915 ecr 761291915], length 167: HTTP: GET /sleep?time=100 HTTP/1.1
E.....@.@................l.;.\.,..1........
-`d.-`d.GET /sleep?time=100 HTTP/1.1
User-Agent: Java/1.8.0_51
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
看到三次连接每次客户端socket端口都变了。
模拟tomcat bio模式下线程池利用率超过75%关闭keep-alive的情况
模拟一下在BIO模式下,当线程利用率超过75%时,将自动关闭keep-alive的场景。
通过curl命令来观察,默认情况下curl会开启keep-alive选项,不过注意curl复用socket的话是在同一进程内多次访问目标同一地址时才会复用,两次执行curl的话并不会复用,比如:
$ curl http://localhost:7001/main
$ curl http://localhost:7001/main
上面连续执行curl命令并不会复用socket,socket会随着进程的消失而关闭,下次新的进程会重新创建连接。可以通过tcpdump观察,上面两次连接是不同的socket:
$ sudo tcpdump -l -i lo0 port 7001
23:43:19.236948 IP6 localhost.62625 > localhost.afs3-callback
......
23:43:26.071504 IP6 localhost.62626 > localhost.afs3-callback
......
在同一个curl进程里多次访问同一地址的话,会复用socket,通过-v参数就可以观察到:
$ curl -v http://localhost:7001/main http://localhost:7001/main
* Hostname was NOT found in DNS cache
* Trying ::1...
* Connected to localhost (::1) port 7001 (#0)
> GET /main HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:7001
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Transfer-Encoding: chunked
< Date: Mon, 18 Aug 2014 15:49:28 GMT
<
* Connection #0 to host localhost left intact
ok
* Found bundle for host localhost: 0x7fa7d8c08c50
* Re-using existing connection! (#0) with host localhost
* Connected to localhost (::1) port 7001 (#0)
> GET /main HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:7001
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Transfer-Encoding: chunked
< Date: Mon, 18 Aug 2014 15:49:28 GMT
<
* Connection #0 to host localhost left intact
ok
注意,在第二次请求开头有一句:Re-using existing connection! 表明复用了上次的socket,使用tcpdump也会看到确实是同一个socket端口连的tomcat。
默认情况下,线程池的最大线程数是200个,BIO模式下当线程利用率超过75%的时候,server会对新来的连接不再使用keep-alive。我们先模拟建立151个连接(默认开启keep-alive的):
for i in {1..151}; do
(
{echo -ne "GET /main HTTP/1.1\nhost: localhost:7001\n\n"; sleep 20}
| telnet localhost 7001
)&;
done
上面的zsh脚本模拟了151个连接(脚本里for循环里使用后台子进程方式启动模拟任务,通过jobs命令也可查看到),每次新建立socket并在服务器端响应后保持连接20秒(这也是服务器端默认keep-alive的超时时间)。tomcat对这151个连接保持keep-alive,BIO模式下会有151个线程耗在上面,即使socket上请求已处理完,后续没有新的请求也不会让出线程,而一直阻塞在上面。这时刚好达到了 151/200 ≈ 0.75
的临界值,那么后续建立的socket将不能再享用keep-alive
现在在这个临界值上,再执行curl命令模拟新的连接:
$ curl -v http://localhost:7001/main http://localhost:7001/main
* Hostname was NOT found in DNS cache
* Trying ::1...
* Connected to localhost (::1) port 7001 (#0)
> GET /main HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:7001
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Transfer-Encoding: chunked
< Date: Mon, 18 Aug 2014 15:22:50 GMT
< Connection: close
<
* Closing connection 0
ok
* Hostname was found in DNS cache
* Trying ::1...
* Connected to localhost (::1) port 7001 (#1)
> GET /main HTTP/1.1
> User-Agent: curl/7.37.1
> Host: localhost:7001
> Accept: */*
>
< HTTP/1.1 200 OK
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Transfer-Encoding: chunked
< Date: Mon, 18 Aug 2014 15:22:50 GMT
< Connection: close
<
* Closing connection 1
ok
注意,这次连接有2次请求,但看不到Re-using existing connection! 关键字,每次请求结束,服务器都显式的关闭了连接,即在header里看到的:Connection: close字段。表明超过75%之后,新建立的连接都不会再使用keep-alive。