再谈随机数引起的阻塞问题

Java的随机数实现有很多坑,记录一下这次使用jdk1.8里新增的加强版随机数实现SecureRandom.getInstanceStrong() 遇到的问题。

之前在维护ali-tomcat的时候曾发现过jvm随机数算法选用不当导致tomcat的SessionID生成非常慢的情况,可以参考JVM上的随机数与熵池策略Docker中apache-tomcat启动慢的问题 这两篇文章。不过当时没有太追究,以为使用了-Djava.security.egd=file:/dev/./urandom就可以避免了,在这次项目里再次遇到随机数导致所有线程阻塞之后发现这块还挺多规则。

本次项目中使用的是jdk1.8,启动参数里设置了

-Djava.security.egd=file:/dev/./urandom

使用的随机数方式是Java8新增的:

SecureRandom.getInstanceStrong();

碰到故障时,线程阻塞在

"DubboServerHandler-xxx:20880-thread-1789" #28440 daemon prio=5 os_prio=0 tid=0x0000000008ffd000 nid=0x5712 runnable [0x000000004cbd7000]
java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:246)
    at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:410)
    at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:427)
    - locked <0x00000000c03a3c90> (a java.lang.Object)
    at sun.security.provider.NativePRNG$RandomIO.access$500(NativePRNG.java:329)
    at sun.security.provider.NativePRNG$Blocking.engineGenerateSeed(NativePRNG.java:272)
    at java.security.SecureRandom.generateSeed(SecureRandom.java:522)

因为这个地方有加锁,locked <0x00000000c03a3c90>,所以其它线程调用到这里时会等待这个lock:

"DubboServerHandler-xxx:20880-thread-1790" #28441 daemon prio=5 os_prio=0 tid=0x0000000008fff000 nid=0x5713 waiting for monitor entry [0x000000004ccd8000]
java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:424)
    - waiting to lock <0x00000000c03a3c90> (a java.lang.Object)
    at sun.security.provider.NativePRNG$RandomIO.access$500(NativePRNG.java:329)
    at sun.security.provider.NativePRNG$Blocking.engineGenerateSeed(NativePRNG.java:272)
    at java.security.SecureRandom.generateSeed(SecureRandom.java:522)

去查 NativePRNG$Blocking代码,看到它的文档描述:

A NativePRNG-like class that uses /dev/random for both seed and random material. Note that it does not respect the egd properties, since we have no way of knowing what those qualities are.

奇怪怎么-Djava.security.egd=file:/dev/./urandom参数没起作用,仍使用/dev/random作为随机数的熵池,时间久或调用频繁的话熵池很容易不够用而导致阻塞;于是看了一下 SecureRandom.getInstanceStrong()的文档:

Returns a SecureRandom object that was selected by using the algorithms/providers specified in the securerandom.strongAlgorithms Security property.

原来有自己的算法,在 jre/lib/security/java.security 文件里,默认定义为:

securerandom.strongAlgorithms=NativePRNGBlocking:SUN

如果修改算法值为NativePRNGNonBlocking:SUN的话,会采用NativePRNG$NonBlocking里的逻辑,用/dev/urandom作为熵池,不会遇到阻塞问题。但这个文件是jdk系统文件,修改它或重新指定一个路径都有些麻烦,最好能通过系统环境变量来设置,可这个变量不像securerandom.source属性可以通过系统环境变量-Djava.security.egd=xxx来配置,找半天就是没有对应的系统环境变量。只好修改代码,不采用SecureRandom.getInstanceStrong这个新方法,改成了SecureRandom.getInstance("NativePRNGNonBlocking")

对于SecureRandom的两种算法实现:SHA1PRNGNativePRNGsecurerandom.source 变量的关系,找到一篇解释的很清楚的文章:Using the SecureRandom Class

On Linux:

1) when this value is “file:/dev/urandom” then the NativePRNG algorithm is registered by the Sun crypto provider as the default implementation; the NativePRNG algorithm then reads from /dev/urandom for nextBytes but /dev/random for generateSeed

2) when this value is “file:/dev/random” then the NativePRNG algorithm is not registered by the Sun crypto provider, but the SHA1PRNG system uses a NativeSeedGenerator which reads from /dev/random.

3) when this value is anything else then the SHA1PRNG is used with a URLSeedGenerator that reads from that source.

4) when the value is undefined, then SHA1PRNG is used with ThreadedSeedGenerator

5) when the code explicitly asks for “SHA1PRNG” and the value is either “file:/dev/urandom” or “file:/dev/random” then (2) also occurs

6) when the code explicitly asks for “SHA1PRNG” and the value is some other “file:” url, then (3) occurs

7) when the code explicitly asks for “SHA1PRNG” and the value is undefined then (4) occurs

至于SHA1PRNG算法里,为何用urandom时,不能直接设置为file:/dev/urandom而要用变通的方式设置为file:///dev/urandom或者 file:/dev/./urandom,参考这里

In SHA1PRNG, there is a SeedGenerator which does various things depending on the configuration.

  1. If java.security.egd or securerandom.source point to “file:/dev/random” or “file:/dev/urandom”, we will use NativeSeedGenerator, which calls super() which calls SeedGenerator.URLSeedGenerator(/dev/random). (A nested class within SeedGenerator.) The only things that changed in this bug was that urandom will also trigger use of this code path.

  2. If those properties point to another URL that exists, we’ll initialize SeedGenerator.URLSeedGenerator(url). This is why “file:///dev/urandom”, “file:/./dev/random”, etc. will work.