Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
streaming: Fail streaming sessions during shutdown
Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test The test does: - Insert data on node1 only - Insert data on node2 only - Run repair on node1 and stop node1 once "starting user-requested repair" is seen The repair shutdown code may wait for the stream session to complete for a very long time if node 1 finishes sending data to node2 and is waiting for node2 to send data to it, when node1 is stopped. The stream session will not be closed in this case until stream session _keep_alive_timeout (10 minutes) expires. Instead of waiting for the stream_session keep alive timer to expire, we can fail all the stream sessions during shutdown. Before 1 - The bad case (repair shutdown will last for 10 minutes): INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Executing streaming plan for repair-in INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Starting streaming to 127.0.0.2 INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Beginning stream session with 127.0.0.2 INFO 2016-09-21 16:23:56,618 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0 INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: stop_gossiping done INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Thrift server stopped INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - CQL server stopped INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - messaging_service stopped INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: shutdown messaging_service done INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: auth shutdown INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: done INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: stop_transport done INFO 2016-09-21 16:23:58,626 [shard 0] tracing - Asked to shut down INFO 2016-09-21 16:23:58,626 [shard 0] tracing - Tracing is down INFO 2016-09-21 16:23:58,626 [shard 1] tracing - Asked to shut down INFO 2016-09-21 16:23:58,626 [shard 1] tracing - Tracing is down INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: tracing is stopped INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: flush column_families done INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: done INFO 2016-09-21 16:23:58,669 [shard 0] repair - Starting shutdown of repair INFO 2016-09-21 16:25:56,624 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] The session 0x600021516c00 made no progress with peer 127.0.0.2 Before 2 - The good case: INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Executing streaming plan for repair-in INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Starting streaming to 127.0.0.2 INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Beginning stream session with 127.0.0.2 INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0 INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: stop_gossiping done INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Thrift server stopped INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - CQL server stopped INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done INFO 2016-09-21 16:18:34,155 [shard 0] messaging_service - Retry verb=19 to 127.0.0.2:0, retry=10: rpc::closed_error (connection is closed) WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] COMPLETE_MESSAGE for 127.0.0.2 has failed: rpc::closed_error (connection is closed) WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Streaming error occurred INFO 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Session with 127.0.0.2 is complete, state=FAILED INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - messaging_service stopped INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: shutdown messaging_service done INFO 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] bytes_sent = 0, bytes_received = 245000 WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Stream failed, peers={127.0.0.2} WARN 2016-09-21 16:18:34,155 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed) INFO 2016-09-21 16:18:34,155 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed) INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: auth shutdown INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: done INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Drain on shutdown: stop_transport done INFO 2016-09-21 16:18:34,155 [shard 0] tracing - Asked to shut down INFO 2016-09-21 16:18:34,155 [shard 0] tracing - Tracing is down INFO 2016-09-21 16:18:34,156 [shard 1] tracing - Asked to shut down INFO 2016-09-21 16:18:34,156 [shard 1] tracing - Tracing is down INFO 2016-09-21 16:18:34,156 [shard 0] storage_service - Drain on shutdown: tracing is stopped INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: flush column_families done INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: done INFO 2016-09-21 16:18:34,199 [shard 0] repair - Starting shutdown of repair INFO 2016-09-21 16:18:34,199 [shard 0] repair - Completed shutdown of repair INFO 2016-09-21 16:18:34,199 [shard 0] compaction_manager - Asked to stop INFO 2016-09-21 16:18:34,199 [shard 1] compaction_manager - Asked to stop After: INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Executing streaming plan for repair-in INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Starting streaming to 127.0.0.2 INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Beginning stream session with 127.0.0.2 INFO 2016-09-21 16:06:21,685 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0 INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: stop_gossiping done INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Thrift server stopped INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - CQL server stopped INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - messaging_service stopped INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown messaging_service done INFO 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Session with 127.0.0.2 is complete, state=FAILED INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - stream_manager stopped INFO 2016-09-21 16:06:23,688 [shard 1] storage_service - stream_manager stopped INFO 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] bytes_sent = 0, bytes_received = 25725 INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown stream_manager done WARN 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Stream failed, peers={127.0.0.2} WARN 2016-09-21 16:06:23,688 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed) INFO 2016-09-21 16:06:23,688 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed) INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: auth shutdown INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: done INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: stop_transport done INFO 2016-09-21 16:06:23,688 [shard 0] tracing - Asked to shut down INFO 2016-09-21 16:06:23,688 [shard 0] tracing - Tracing is down INFO 2016-09-21 16:06:23,688 [shard 1] tracing - Asked to shut down INFO 2016-09-21 16:06:23,688 [shard 1] tracing - Tracing is down INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: tracing is stopped INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: flush column_families done INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: done INFO 2016-09-21 16:06:23,774 [shard 0] repair - Starting shutdown of repair INFO 2016-09-21 16:06:23,774 [shard 0] repair - Completed shutdown of repair INFO 2016-09-21 16:06:23,774 [shard 0] compaction_manager - Asked to stop INFO 2016-09-21 16:06:23,774 [shard 1] compaction_manager - Asked to stop
- Loading branch information