Symptom: The long running process throws java.io.IOException: Too many open files.

Find The problem

  • Read the exception

Somewhere in the exception, it will show which method in our code causes the exception thrown. This is the place to look at what the problem could be. For example, the exception below shows that the initialization from ThreadLocal.get causes the problem.

1492119228.171  xxxx.yahoo.com    45697/10007     -       Container.com.yahoo.jdisc.http.server.jetty.HttpRequestDispatch warning Request failed: /routed/v1/perfb
exception=
org.zeromq.ZMQException: Too many open files(0x18)
  at org.zeromq.ZMQ$Socket.construct(Native Method)
  at org.zeromq.ZMQ$Socket.<init>(ZMQ.java:1719)
  at org.zeromq.ZMQ$Context.socket(ZMQ.java:451)
  at com.yahoo.dhrainbow.smq.client.Connection.socket(Connection.java:38)
  at com.yahoo.dhrainbow.smq.client.Connection.publisher(Connection.java:88)
  at com.yahoo.dhrainbow.smq.client.Connection.publisher(Connection.java:85)
  at com.yahoo.dhrainbow.jdisc.util.SMQThreadUtils$1.initialValue(SMQThreadUtils.java:57)
  at com.yahoo.dhrainbow.jdisc.util.SMQThreadUtils$1.initialValue(SMQThreadUtils.java:50)
  at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
  at java.lang.ThreadLocal.get(ThreadLocal.java:170)
  ... ...
  at org.eclipse.jetty.io.ssl.SslConnection$3.succeeded(SslConnection.java:147)
  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
  at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
  at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:122)
  at org.eclipse.jetty.util.thread.strategy.ExecutingExecutionStrategy.invoke(ExecutingExecutionStrategy.java:58)
  at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:201)
  at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:133)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:672)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)
  at java.lang.Thread.run(Thread.java:745)

The other way is to find the number of open files periodically. If there is a leak in open file descriptor, the number will increase over time.

  • use lsof
$ watch -n 6 "sudo lsof -p 15675 | wc -l | tee >> fds.txt"
  • use /proc/{pid}/fd
$ watch -n 6 "sudo ls /proc/31901/fd | wc -l | tee >> fds.txt"

Triage and Fix the Problem

There is a thread pool, queuedthreadpool, used in jetty under the hood. But the question is why the ThreadLocal initialization runs periodically. This suggests that the threads in the pool are created and destroyed dynamically.

To validate this hypothesis, we could just log the thread id in the runnable where the thread is working on.

The next question is why these causes the open file descriptor leak?

ThreadLocal

ThreadLocal could be used to pool expensive object which is not thread-safe. The ThreadLocal variable copy is maintained in each thread and the ThreadLocal initialization code is run for each thread when the get() method is called at the first time in each thread. Thus each thread will not be able to access the value of ThreadLocal variable in another thread.

According to java doc:

Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).

One common issue with ThreadLocal is the class loader leak in applications where the thread pool does not belong to the application but the container, such as servlet container; see ThreadLocal & Memory Leak. The problem in this case is caused by threads in the pool not going way.

In the meanwhile, some containers try to renew its threads periodically which could prevent the memory leak issue afore mentioned. However, this could cause another issue such as open file descriptor leak, which is the main issue in my application.

Open File Descriptor leak

In my application, we use Jetty as web container. The QueuedThreadPool in Jetty renew its threads periodically so new thread comes and goes. We have a ThreadLocal variable which contains a ZMQ.Socket:

private static final ThreadLocal<ZMQ.Socket> sockets = new ThreadLocal<ZMQ.Socket> () {
    @Override
    protected ZMQ.Socket initialValue() {
		return createSocket();
	}
}

The issue is when the thread is destroyed by the pool, the ZMQ.Socket is not properly closed and cleaned up such that the file descriptor associated with the socket is still open.

Once the issue is clear, the method to fix it is also clear: need a way to properly close the socket once the thread is destroyed. Two ways:

  1. If the thread pool provide a hook during the shutdown, close the socket there.
  2. Replace the ThreadLocal variable with an object pool, such as Apache Common Pool’s ObjectPool<ZMQ.Socket>

In this case, it turns out the option 2 is more straight forward:

public ObjectPool<Socket> getPooledSockets(int poolMinIdle, int poolMaxIdle,
                                           int poolMaxTotal, long poolMaxWaitMillis) {

    final PooledObjectFactory<Socket> factory = new PooledObjectFactory<Socket>() {
        @Override
        public PooledObject<Socket> makeObject() throws Exception {
            return new DefaultPooledObject<>(getRebatchSendSocket());
        }

        @Override
        public void destroyObject(PooledObject<Socket> pooledObject) throws Exception {
            Socket socket = pooledObject.getObject();
            socket.close();
        }

        @Override
        public boolean validateObject(PooledObject<Socket> pooledObject) {
            return true;
        }

        @Override
        public void activateObject(PooledObject<Socket> pooledObject) throws Exception {

        }

        @Override
        public void passivateObject(PooledObject<Socket> pooledObject) throws Exception {

        }
    };

    final GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
    poolConfig.setMinIdle(poolMinIdle);
    poolConfig.setMaxIdle(poolMaxIdle);
    poolConfig.setMaxTotal(poolMaxTotal);
    poolConfig.setMaxWaitMillis(poolMaxWaitMillis);

    ObjectPool<Socket> sockets = new GenericObjectPool<>(factory, poolConfig);

    return sockets;
}

References