[SPARK-18846][SCHEDULER] Fix flakiness in SchedulerIntegrationSuite
There is a small race in SchedulerIntegrationSuite.
The test assumes that the TaskScheduler thread
processing the last task will finish before the DAGScheduler processes
the task event and notifies the job waiter, but that is not
guaranteed.

I ran the test locally many times and it never failed, though admittedly
it never failed locally for me before this change either.  However, I am
nearly certain this race is what caused the failure of one Jenkins build:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68694/consoleFull
(the build log is long gone now, sorry -- I originally fixed this as part of
apache#14079)

Author: Imran Rashid <[email protected]>

Closes apache#16270 from squito/sched_integ_flakiness.
squito committed Dec 14, 2016
1 parent cccd643 commit ac013ea
Showing 1 changed file with 12 additions and 2 deletions.
@@ -28,6 +28,8 @@ import scala.reflect.ClassTag
 
 import org.scalactic.TripleEquals
 import org.scalatest.Assertions.AssertionsHelper
+import org.scalatest.concurrent.Eventually._
+import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
 import org.apache.spark.TaskState._
@@ -157,8 +159,16 @@ abstract class SchedulerIntegrationSuite[T <: MockBackend: ClassTag] extends Spa
       }
       // When a job fails, we terminate before waiting for all the task end events to come in,
       // so there might still be a running task set. So we only check these conditions
-      // when the job succeeds
-      assert(taskScheduler.runningTaskSets.isEmpty)
+      // when the job succeeds.
+      // When the final task of a taskset completes, we post
+      // the event to the DAGScheduler event loop before we finish processing in the taskscheduler
+      // thread. It's possible the DAGScheduler thread processes the event, finishes the job,
+      // and notifies the job waiter before our original thread in the task scheduler finishes
+      // handling the event and marks the taskset as complete. So its ok if we need to wait a
+      // *little* bit longer for the original taskscheduler thread to finish up to deal w/ the race.
+      eventually(timeout(1 second), interval(10 millis)) {
+        assert(taskScheduler.runningTaskSets.isEmpty)
+      }
       assert(!backend.hasTasks)
     } else {
       assert(failure != null)

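The race described in the commit message can be illustrated with a small standalone sketch. This is not Spark code and does not use ScalaTest: `pollUntil` is a hypothetical stand-in for ScalaTest's `eventually`, and `taskSetRunning` and `jobDone` are hypothetical stand-ins for `taskScheduler.runningTaskSets` and the job waiter. The 50 ms delay is an arbitrary assumption chosen to make the window visible.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

object EventuallySketch {
  // Hypothetical helper (not a ScalaTest API): re-evaluate `check` every
  // `intervalMs` until it returns true or `timeoutMs` has elapsed.
  def pollUntil(timeoutMs: Long, intervalMs: Long)(check: => Boolean): Boolean = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    var ok = check
    while (!ok && System.nanoTime() < deadline) {
      Thread.sleep(intervalMs)
      ok = check
    }
    ok
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for "there is still a running task set".
    val taskSetRunning = new AtomicBoolean(true)
    // Stand-in for the job waiter being notified.
    val jobDone = new CountDownLatch(1)

    // Stand-in for the task scheduler thread: it posts the completion event
    // first, then needs a little longer to finish its own bookkeeping.
    val schedulerThread = new Thread(new Runnable {
      def run(): Unit = {
        jobDone.countDown()       // the job can now be observed as finished
        Thread.sleep(50)          // bookkeeping still in progress
        taskSetRunning.set(false) // task set finally marked complete
      }
    })
    schedulerThread.start()

    jobDone.await()
    // Checking taskSetRunning immediately here could still see `true`.
    // Polling with a short timeout tolerates that window.
    val cleanedUp = pollUntil(timeoutMs = 1000, intervalMs = 10)(!taskSetRunning.get())
    assert(cleanedUp, "task set was not marked complete within the timeout")
    schedulerThread.join()
    println(s"cleaned up: $cleanedUp")
  }
}
```

The 1 second timeout and 10 millisecond interval mirror the values in the diff. A bounded retry keeps the test fast when cleanup has already finished, while still failing if the task set is never marked complete.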