[SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply
`GraphImpl.fromExistingRDDs` expects a preprocessed vertex RDD as input, but LDA calls it without validating this requirement, so it might introduce errors. Replacing it with `Graph.apply` is safer and more appropriate because it is a public API. The tests still pass, so either it is actually safe to use `fromExistingRDDs` here (though the implementation suggests otherwise) or the test cases are special. cc jkbradley ankurdave
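
For illustration, a minimal sketch (not taken from the Spark sources; the graph data and the `buildExample` helper are hypothetical) of how `Graph.apply` constructs a graph from plain RDDs, handling the vertex co-partitioning itself rather than assuming the caller has already done it, which is the precondition `GraphImpl.fromExistingRDDs` relies on:

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx.{Edge, Graph}

    // Hypothetical example, not part of this commit: build a small graph through
    // the public Graph.apply API. Graph.apply accepts ordinary RDDs of
    // (VertexId, attr) pairs and Edge objects and performs the partitioning and
    // deduplication itself, so no preprocessed VertexRDD is required.
    object GraphApplyExample {
      def buildExample(sc: SparkContext): Graph[Long, Int] = {
        val vertices = sc.parallelize(Seq((1L, 10L), (2L, 20L), (3L, 30L)))
        val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
        // Vertices referenced by edges but missing from `vertices` get this default attribute.
        Graph(vertices, edges, defaultVertexAttr = 0L)
      }
    }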

Author: Xiangrui Meng <[email protected]>

Closes apache#11226 from mengxr/SPARK-13355.
mengxr committed Feb 23, 2016
1 parent 72427c3 commit 764ca18
Showing 1 changed file with 1 addition and 2 deletions.
@@ -25,7 +25,6 @@ import breeze.stats.distributions.{Gamma, RandBasis}
 
 import org.apache.spark.annotation.{DeveloperApi, Since}
 import org.apache.spark.graphx._
-import org.apache.spark.graphx.impl.GraphImpl
 import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
 import org.apache.spark.mllib.linalg.{DenseVector, Matrices, SparseVector, Vector, Vectors}
 import org.apache.spark.rdd.RDD
@@ -188,7 +187,7 @@ final class EMLDAOptimizer extends LDAOptimizer {
       graph.aggregateMessages[(Boolean, TopicCounts)](sendMsg, mergeMsg)
         .mapValues(_._2)
     // Update the vertex descriptors with the new counts.
-    val newGraph = GraphImpl.fromExistingRDDs(docTopicDistributions, graph.edges)
+    val newGraph = Graph(docTopicDistributions, graph.edges)
     graph = newGraph
     graphCheckpointer.update(newGraph)
     globalTopicTotals = computeGlobalTopicTotals()
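
The substitution type-checks because `VertexRDD[VD]` extends `RDD[(VertexId, VD)]` and `EdgeRDD[ED]` extends `RDD[Edge[ED]]`, so the existing arguments already satisfy `Graph.apply`'s parameter types; `Graph.apply` then handles the vertex partitioning itself instead of trusting the caller. A rough, hypothetical sketch of that pattern (the `rebuild` helper is illustrative, not part of this commit):

    import scala.reflect.ClassTag
    import org.apache.spark.graphx.{EdgeRDD, Graph, VertexRDD}

    // Illustrative helper (not from the Spark sources): rebuild a graph from an
    // updated vertex RDD and the existing edges through the public API. Because
    // VertexRDD and EdgeRDD are subclasses of the RDD types Graph.apply expects,
    // no cast or extra preprocessing is needed.
    object RebuildExample {
      def rebuild[VD: ClassTag, ED: ClassTag](
          updatedVertices: VertexRDD[VD],
          edges: EdgeRDD[ED]): Graph[VD, ED] = {
        Graph(updatedVertices, edges)
      }
    }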
