Skip to content

Commit

Permalink
Deadlock fix in acquireShards (cadence-workflow#5825)
Browse files Browse the repository at this point in the history
Fixing the deadlock demonstrated in cadence-workflow#5824.

I decided to move the channel-writing entirely before consuming so it's a bit more accidental-change-resistant: some kinds of simple incorrect changes will lead to an _immediate_ deadlock every time, rather than a random chance of one.

And if someone _does_ want to move it after and go back to a smaller buffer, more code will have to be changed, so hopefully people will pay more attention to the concurrency risks involved.

More generally, the atomic shutdown stuff is _highly_ prone to causing this kind of error because there's no way to wait on it safely, and I would really love for us to get rid of it.
  • Loading branch information
Groxx authored Mar 29, 2024
1 parent 831ebf5 commit 5003ffb
Showing 1 changed file with 8 additions and 9 deletions.
17 changes: 8 additions & 9 deletions service/history/shard/controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -383,8 +383,15 @@ func (c *controller) acquireShards() {
sw := c.metricsScope.StartTimer(metrics.AcquireShardsLatency)
defer sw.Stop()

numShards := c.config.NumberOfShards
shardActionCh := make(chan int, numShards)
// Submit all tasks to the channel.
for shardID := 0; shardID < numShards; shardID++ {
shardActionCh <- shardID // must be non-blocking as there is no other coordination with shutdown
}
close(shardActionCh)

concurrency := common.MaxInt(c.config.AcquireShardConcurrency(), 1)
shardActionCh := make(chan int, concurrency)
var wg sync.WaitGroup
wg.Add(concurrency)
// Spawn workers that would lookup and add/remove shards concurrently.
Expand All @@ -410,14 +417,6 @@ func (c *controller) acquireShards() {
}
}()
}
// Submit tasks to the channel.
for shardID := 0; shardID < c.config.NumberOfShards; shardID++ {
shardActionCh <- shardID
if c.isShuttingDown() {
return
}
}
close(shardActionCh)
// Wait until all shards are processed.
wg.Wait()

Expand Down

0 comments on commit 5003ffb

Please sign in to comment.