
Commit

SPARK-6044 [CORE] RDD.aggregate() should not use the closure serializer on the zero value

Use the configured serializer in RDD.aggregate, to match PairRDDFunctions.aggregateByKey, instead of the closure serializer.

Compare with https://github.com/apache/spark/blob/e60ad2f4c47b011be7a3198689ac2b82ee317d96/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L127
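For context, RDD.aggregate folds a zero value into every partition and then merges the per-partition results, so a copy of the zero value is serialized into each task. A minimal usage sketch (the local SparkContext setup here is illustrative, not part of the change):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("agg-example").setMaster("local[2]"))
    val nums = sc.parallelize(1 to 100, numSlices = 4)

    // Compute (sum, count) in one pass. The zero value (0, 0) is cloned and
    // shipped with every task, which is why the choice of serializer matters.
    val (sum, count) = nums.aggregate((0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1),  // seqOp: fold an element into the accumulator
      (a, b) => (a._1 + b._1, a._2 + b._2)   // combOp: merge per-partition accumulators
    )
    println(s"mean = ${sum.toDouble / count}")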

Author: Sean Owen <[email protected]>

Closes apache#5028 from srowen/SPARK-6044 and squashes the following commits:

a4040a7 [Sean Owen] Use configured serializer in RDD.aggregate, to match PairRDDFunctions.aggregateByKey, instead of closure serializer
srowen authored and rxin committed Mar 17, 2015
1 parent b3e6eca commit b2d8c02
Showing 1 changed file with 1 addition and 1 deletion.

core/src/main/scala/org/apache/spark/rdd/RDD.scala
@@ -960,7 +960,7 @@ abstract class RDD[T: ClassTag](
    */
   def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U = {
     // Clone the zero value since we will also be serializing it as part of tasks
-    var jobResult = Utils.clone(zeroValue, sc.env.closureSerializer.newInstance())
+    var jobResult = Utils.clone(zeroValue, sc.env.serializer.newInstance())
     val cleanSeqOp = sc.clean(seqOp)
     val cleanCombOp = sc.clean(combOp)
     val aggregatePartition = (it: Iterator[T]) => it.aggregate(zeroValue)(cleanSeqOp, cleanCombOp)
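The practical effect: sc.env.closureSerializer is the serializer used for closures (Spark's Java serializer by default), while sc.env.serializer is the data serializer configured via spark.serializer (for example Kryo). Using the configured serializer here matches PairRDDFunctions.aggregateByKey, so the zero value only has to be serializable by the serializer the job actually uses. Utils.clone is essentially a serialize/deserialize round trip; roughly (a sketch, not the exact Spark source):

    import scala.reflect.ClassTag
    import org.apache.spark.serializer.SerializerInstance

    // Deep-copy a value by round-tripping it through the given serializer.
    // The value must therefore be serializable by that serializer.
    def cloneValue[T: ClassTag](value: T, ser: SerializerInstance): T =
      ser.deserialize[T](ser.serialize(value))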
