Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[proxy] Fix deadlock in pulsar proxy (apache#7690)
### Motivation When a broker server is restarted, the Pulsar proxy outputs the following warning: ``` 15:09:01.486 [main-EventThread] INFO o.a.p.z.ZooKeeperChildrenCache - reloadCache called in zookeeperChildrenCache for path /loadbalance/brokers 15:09:31.487 [main-EventThread] WARN o.a.p.p.s.util.ZookeeperCacheLoader - Error updating broker info after broker list changed. java.util.concurrent.TimeoutException: null at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) at org.apache.pulsar.proxy.server.util.ZookeeperCacheLoader.updateBrokerList(ZookeeperCacheLoader.java:118) at org.apache.pulsar.proxy.server.util.ZookeeperCacheLoader.lambda$new$0(ZookeeperCacheLoader.java:82) at org.apache.pulsar.zookeeper.ZooKeeperChildrenCache.lambda$0(ZooKeeperChildrenCache.java:89) at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$22(ZooKeeperCache.java:415) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:592) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) ``` This is due to a deadlock occurring in the proxy. While this deadlock occurs, `main-EventThread` is stopped in the following states: ``` "main-EventThread" apache#26 daemon prio=5 os_prio=0 tid=0x00007fa9b55ff000 nid=0x3d98 waiting on condition [0x00007fa923dfc000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000eef2ad80> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1709) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1788) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(ZooKeeperDataCache.java:97) at org.apache.pulsar.proxy.server.util.ZookeeperCacheLoader.updateBrokerList(ZookeeperCacheLoader.java:123) at org.apache.pulsar.proxy.server.util.ZookeeperCacheLoader.lambda$new$0(ZookeeperCacheLoader.java:84) at org.apache.pulsar.proxy.server.util.ZookeeperCacheLoader$$Lambda$40/1450652220.onUpdate(Unknown Source) at org.apache.pulsar.zookeeper.ZooKeeperChildrenCache.lambda$0(ZooKeeperChildrenCache.java:89) at org.apache.pulsar.zookeeper.ZooKeeperChildrenCache$$Lambda$365/191748085.accept(Unknown Source) at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670) at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$22(ZooKeeperCache.java:415) at org.apache.pulsar.zookeeper.ZooKeeperCache$$Lambda$46/1819738265.processResult(Unknown Source) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:592) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) ``` `ZookeeperCacheLoader` will try to get load information for all available brokers from ZK or its cache if there is a change in the znode `/loadbalance/brokers`. The problem is that it is `main-EventThread` that performs the above process, and it is also `main-EventThread` that gets the data from ZK. `main-EventThread` is blocked at the following line, so deadlock occurs. https://github.com/apache/pulsar/blob/9d3ab60efddd3f4064789293fd8e9c4d4388b6c7/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZooKeeperDataCache.java#L97 ### Modifications ~Make the method `ZookeeperCacheLoader#updateBrokerList` run on a thread separate from `main-EventThread`.~ Made `updateBrokerList()` method asynchronous so that `main-EventThread` is not blocked.
- Loading branch information