[SPARK-16302][SQL] Set the right number of partitions for reading data from a local collection.

Follow-up to apache#13137. This PR sets the right number of partitions when reading data from a local collection. A query such as `val df = Seq((1, 2)).toDF("key", "value").count` always used defaultParallelism tasks, which produced many empty or small tasks.
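As a sketch of the arithmetic behind the fix (a hypothetical standalone snippet, not part of the patch), the new partition count clamps the number of rows to the range [1, defaultParallelism]:

```scala
// Hypothetical illustration of the clamping formula used by the patch.
object NumParallelismSketch {
  // Clamp the partition count to [1, defaultParallelism].
  def numParallelism(numRows: Int, defaultParallelism: Int): Int =
    math.min(math.max(numRows, 1), defaultParallelism)

  def main(args: Array[String]): Unit = {
    // A one-row local collection with defaultParallelism = 8:
    // before the patch, parallelize() used 8 partitions (7 of them empty);
    // after the patch, it uses min(max(1, 1), 8) = 1 partition.
    println(numParallelism(numRows = 1, defaultParallelism = 8))    // 1
    // A large local collection is still capped at defaultParallelism.
    println(numParallelism(numRows = 1000, defaultParallelism = 8)) // 8
    // An empty collection still gets a single (empty) partition.
    println(numParallelism(numRows = 0, defaultParallelism = 8))    // 1
  }
}
```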

Manually tested and checked.

Author: Lianhui Wang <[email protected]>

Closes apache#13979 from lianhuiwang/localTable-Parallel.
lianhuiwang authored and JoshRosen committed Sep 2, 2016
1 parent 5bea875 commit 06e3398
Showing 1 changed file with 4 additions and 1 deletion.
@@ -42,7 +42,10 @@ case class LocalTableScanExec(
     }
   }
 
-  private lazy val rdd = sqlContext.sparkContext.parallelize(unsafeRows)
+  private lazy val numParallelism: Int = math.min(math.max(unsafeRows.length, 1),
+    sqlContext.sparkContext.defaultParallelism)
+
+  private lazy val rdd = sqlContext.sparkContext.parallelize(unsafeRows, numParallelism)
 
   protected override def doExecute(): RDD[InternalRow] = {
     val numOutputRows = longMetric("numOutputRows")
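To observe the effect, a quick check in spark-shell could look like the following (a hypothetical verification sketch, not from the PR; it assumes a Spark 2.0-era shell where `spark` and `toDF` are in scope, and that the RDD returned by `df.rdd` reflects the scan's partitioning):

```scala
import spark.implicits._

val df = Seq((1, 2)).toDF("key", "value")

// Before this patch, the local-table scan's RDD had defaultParallelism
// partitions; with the patch, a one-row local collection yields a single
// partition, so count() no longer launches many empty tasks.
println(df.rdd.getNumPartitions)
df.count()
```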
