遇到一个JDK(版本都是1.8.0_51-b16)在mac和linux上行为不一致的问题,这个问题是针对redis服务器端关闭连接时的状况处理;即在redis-server端设置连接的空闲时间,当超过这个空闲时间后server主动把该连接关闭掉。在我的mac上这个行为是符合预期的,对这个超时的连接再发起请求时会抛出 SocketException "Connection Reset",但在linux上却不会。
简单模拟一下,在Redis服务端配置timeout为3秒,客户端建立连接之后 sleep 4秒(或更久),让连接超时
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
public class ConnRestTest {
public static void main(String[] args) throws Exception {
Socket socket = new Socket();
socket.connect(new InetSocketAddress("localhost", 6379));
InputStream is = socket.getInputStream();
OutputStream os = socket.getOutputStream();
Thread.sleep(4000);
os.write("*1\r\n$4\r\nPING\r\n".getBytes());
os.flush();
for (int i = 0; i < 7; i++) {
int r = is.read();
System.out.print( r == -1 ? r : (char) r);
}
socket.close();
}
}
上面的代码在mac上运行后会看到如下异常(如果注释掉 Thread.sleep(4000) 会得到 "+PONG\r\n")
Exception in thread "main" java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.net.SocketInputStream.read(SocketInputStream.java:223)
at com.wacai.common.redis.io.Test.main(Test.java:23)
而在linux上运行后则是打印7个连续的"-1",即并不会抛出"Connection reset"
分别进行tcpdump,在mac上:
➜ sudo tcpdump -i lo0 port 6379
23:13:43.398003 IP localhost.63864 > localhost.6379: Flags [S], seq 1658673677, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1063188198 ecr 0,sackOK,eol], length 0
23:13:43.398064 IP localhost.6379 > localhost.63864: Flags [S.], seq 355134851, ack 1658673678, win 65535, options [mss 16344,nop,wscale 5,nop,nop,TS val 1063188198 ecr 1063188198,sackOK,eol], length 0
23:13:43.398075 IP localhost.63864 > localhost.6379: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1063188198 ecr 1063188198], length 0
23:13:43.398085 IP localhost.6379 > localhost.63864: Flags [.], ack 1, win 12759, options [nop,nop,TS val 1063188198 ecr 1063188198], length 0
23:13:47.063640 IP localhost.6379 > localhost.63864: Flags [F.], seq 1, ack 1, win 12759, options [nop,nop,TS val 1063191852 ecr 1063188198], length 0
23:13:47.063671 IP localhost.63864 > localhost.6379: Flags [.], ack 2, win 12759, options [nop,nop,TS val 1063191852 ecr 1063191852], length 0
23:13:48.403144 IP localhost.63864 > localhost.6379: Flags [P.], seq 1:15, ack 2, win 12759, options [nop,nop,TS val 1063193184 ecr 1063191852], length 14
23:13:48.403255 IP localhost.6379 > localhost.63864: Flags [R], seq 355134853, win 0, length 0
在linux上:
$ sudo tcpflow -p -c -i lo port 6379
00:26:13.303233 IP localhost.34609 > localhost.6379: Flags [S], seq 1094106697, win 43690, options [mss 65495,sackOK,TS val 8462190 ecr 0,nop,wscale 7], length 0
00:26:13.303272 IP localhost.6379 > localhost.34609: Flags [S.], seq 2755621045, ack 1094106698, win 43690, options [mss 65495,sackOK,TS val 8462190 ecr 8462190,nop,wscale 7], length 0
00:26:13.303298 IP localhost.34609 > localhost.6379: Flags [.], ack 1, win 342, options [nop,nop,TS val 8462190 ecr 8462190], length 0
00:26:17.037992 IP localhost.6379 > localhost.34609: Flags [F.], seq 1, ack 1, win 342, options [nop,nop,TS val 8465925 ecr 8462190], length 0
00:26:17.038450 IP localhost.34609 > localhost.6379: Flags [.], ack 2, win 342, options [nop,nop,TS val 8465926 ecr 8465925], length 0
00:26:18.305591 IP localhost.34609 > localhost.6379: Flags [P.], seq 1:15, ack 2, win 342, options [nop,nop,TS val 8467193 ecr 8465925], length 14
00:26:18.305630 IP localhost.6379 > localhost.34609: Flags [R], seq 2755621047, win 0, length 0
在mac上比linux上多了一次在连接建立后从server发给client端的ack,在3秒之后,连接因为超过空闲时间server端向client端发起了fin,client回复ack,之后client端程序对socket进行写操作,在tcpdump里看到标记为P的标记,然后server端发送了reset
在linux上看到连接状态的变化也符合预期:
$ netstat -antp | grep 6379
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:6379 127.0.0.1:34607 ESTABLISHED -
tcp6 0 0 127.0.0.1:34607 127.0.0.1:6379 ESTABLISHED 3683/java
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:6379 127.0.0.1:34607 FIN_WAIT2 -
tcp6 1 0 127.0.0.1:34607 127.0.0.1:6379 CLOSE_WAIT 3683/java
按说server端发送reset标记后,mac上抛出异常的行为更符合预期,可能是JDK网络层面的实现细节不同,或者tcpdump出的数据已经能解释原因只是我看不出来。