Commit
[FLINK-22493] Increase test stability in AdaptiveSchedulerITCase.
This addresses the following problem in the testStopWithSavepointFailOnFirstSavepointSucceedOnSecond() test. Once all tasks are running, the test triggers a savepoint that fails intentionally because of a test exception thrown in a task's checkpointing method. The test then waits for the savepoint future to fail and for the scheduler to restart the tasks. Once they are running again, it performs a sanity check that the savepoint directory has been removed. In the reported run, the savepoint directory still existed at that point.

The savepoint directory is removed by the PendingCheckpoint.discard() method, which is executed on the I/O executor pool of the CheckpointCoordinator. There is no guarantee that discard() has completed by the time the job is running again: the executor is shut down together with the dispatcher, so it is not bound to job restarts.
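
The race described above can be illustrated with a minimal, self-contained sketch. It is not the code changed by this commit, and it is not Flink API: the class SavepointCleanupWait and its methods are hypothetical names chosen for illustration. It shows one common way to make such a sanity check tolerant of asynchronous cleanup, namely polling the directory until it is empty or a timeout expires, instead of asserting once right after the restart.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.time.Duration;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: waits for an asynchronous
// cleanup (such as a discard running on an I/O executor) to finish.
final class SavepointCleanupWait {

    private SavepointCleanupWait() {}

    /** Returns true if the directory does not exist or has no entries. */
    static boolean isEmptyOrMissing(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return true;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        } catch (NoSuchFileException e) {
            // The directory was deleted between the exists() check and list().
            return true;
        }
    }

    /**
     * Polls until the directory is empty or missing, or the timeout expires.
     * Returns true if the cleanup was observed, false on timeout.
     */
    static boolean waitUntilEmpty(Path dir, Duration timeout)
            throws IOException, InterruptedException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() - deadline < 0) {
            if (isEmptyOrMissing(dir)) {
                return true;
            }
            Thread.sleep(50); // assumed polling interval for this sketch
        }
        // One final check after the deadline has passed.
        return isEmptyOrMissing(dir);
    }
}
```

In a test, the sanity check would then assert on the result of waitUntilEmpty(savepointDirectory, timeout) rather than on a single directory listing. The trade-off is that a genuine cleanup bug is reported only after the timeout elapses.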