Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
cpp: fix race condition caused by consumer seek and close (apache#7819)
## Motivation User try a loop of create-exclusive-consumer, seek, consume and close of a consumer with cpp client, and some times will meet “consumer busy” errors, which means the broker side consumer still alive while creating new a consumer. Here are suspicion logs. ``` INFO:Client(88)Subscribing on Topic :public/default/jeff-test-21-partition-0Tue Aug 11 11:38:38 2020 INFO:HandlerBase(53)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Getting connection from poolTue Aug 11 11:38:38 2020 INFO:ConsumerImpl(175)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Created consumer on broker [10.88.109.77:44648 -> 10.88.109.71:31051] Tue Aug 11 11:38:38 2020 INFO:HandlerBase(130)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Schedule reconnection in 0.1 sTue Aug 11 11:38:38 2020 INFO:ConsumerImpl(1047)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Seek successfullyTue Aug 11 11:38:38 2020 INFO:HandlerBase(53)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Getting connection from poolTue Aug 11 11:38:38 2020 INFO:ConsumerImpl(175)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Created consumer on broker [10.88.109.77:44648 -> 10.88.109.71:31051] Tue Aug 11 11:38:40 2020 INFO:HandlerBase(130)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Schedule reconnection in 0.1 sTue Aug 11 11:38:41 2020 INFO:HandlerBase(53)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Getting connection from poolTue Aug 11 11:38:41 2020 INFO:ConsumerImpl(848)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Closing consumer for topic persistent://public/default/jeff-test-21-partition-0Tue Aug 11 11:38:42 2020 INFO:ConsumerImpl(175)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Created consumer on broker [10.88.109.77:44646 -> 10.88.109.71:31051] Tue Aug 11 11:38:42 2020 INFO:ConsumerImpl(104)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Destroyed consumer which was not properly closedTue Aug 11 11:38:42 2020 ``` - there is reconnection caused by seek command; - close operation happens at the same time of seek. - consumer destroyed with log `Destroyed consumer which was not properly closed`. The race condition happens like this: 1. seek command triggered consumer disconnect; ``` subscription.resetCursor \ disconnectFuture = dispatcher.disconnectActiveConsumers(true); \ disconnectAllConsumers(boolean isResetCursor) consumerList.forEach(consumer -> consumer.disconnect(isResetCursor)); ``` 2. client trigger disconnectConsumer, and triggered `connection_.reset();` ``` case BaseCommand::CLOSE_CONSUMER: { consumer->disconnectConsumer(); LOG_INFO("Broker notification of Closed consumer: " << consumerId_); \ void ConsumerImpl::disconnectConsumer() { LOG_INFO("Broker notification of Closed consumer: " << consumerId_); Lock lock(mutex_); connection_.reset(); < === lock.unlock(); } ``` 3. connection not ready, and close consumer happened, then it leaked send CloseConsumer command to broker. ``` void ConsumerImpl::closeAsync(ResultCallback callback) { ... LOG_INFO(getName() << "Closing consumer for topic " << topic_); state_ = Closing; ClientConnectionPtr cnx = getCnx().lock(); < === the seek operation caused cnx reset if (!cnx) { < === goes into this if, and set to Closed and returned directly without closeConsumer sent to broker state_ = Closed; lock.unlock(); // If connection is gone, also the consumer is closed on the broker side if (callback) { callback(ResultOk); } return; } ... Future<Result, ResponseData> future = cnx->sendRequestWithId(Commands::newCloseConsumer(consumerId_, requestId), requested); < ==== .... } ``` ### Modifications when consumer destroy, try to send another closeConsumer command if suitable. * fix race condition caused by consumer seek and close * fix format
- Loading branch information