[SPARK-5565][ML] LDA wrapper for Pipelines API
This adds LDA to spark.ml, the Pipelines API. It follows the design doc in the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major change:
* I eliminated doc IDs. These are not necessary with DataFrames, since the user can add an ID column as needed.

Note: This will conflict with [apache#9484], but I'll try to merge [apache#9484] first and then rebase this PR.

CC: hhbyyh feynmanliang. If you have a chance to make a pass, that would be really helpful. Thanks! Now that I'm done traveling and this PR is almost ready, I'll look into reviewing other PRs that are critical for 1.6.

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes apache#9513 from jkbradley/lda-pipelines.
jkbradley committed Nov 11, 2015
1 parent 1dde39d, commit e281b87
Showing 3 changed files with 946 additions and 5 deletions.