Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-5565][ML] LDA wrapper for Pipelines API
This adds LDA to spark.ml, the Pipelines API. It follows the design doc in the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major change: * I eliminated doc IDs. These are not necessary with DataFrames since the user can add an ID column as needed. Note: This will conflict with [apache#9484], but I'll try to merge [apache#9484] first and then rebase this PR. CC: hhbyyh feynmanliang If you have a chance to make a pass, that'd be really helpful--thanks! Now that I'm done traveling & this PR is almost ready, I'll see about reviewing other PRs critical for 1.6. CC: mengxr Author: Joseph K. Bradley <[email protected]> Closes apache#9513 from jkbradley/lda-pipelines.
- Loading branch information