[SPARK-5802][MLLIB] cache transformed data in glm
If we need to transform the input data, we should cache the output to avoid re-computing feature vectors every iteration. dbtsai

Author: Xiangrui Meng <[email protected]>

Closes apache#4593 from mengxr/SPARK-5802 and squashes the following commits:

ae3be84 [Xiangrui Meng] cache transformed data in glm
mengxr committed Feb 17, 2015
1 parent d380f32 commit fd84229
Showing 1 changed file with 15 additions and 14 deletions.
@@ -205,7 +205,7 @@ abstract class GeneralizedLinearAlgorithm[M <: GeneralizedLinearModel]
       throw new SparkException("Input validation failed.")
     }

-    /**
+    /*
      * Scaling columns to unit variance as a heuristic to reduce the condition number:
      *
      * During the optimization process, the convergence (rate) depends on the condition number of
@@ -225,26 +225,27 @@ abstract class GeneralizedLinearAlgorithm[M <: GeneralizedLinearModel]
      * Currently, it's only enabled in LogisticRegressionWithLBFGS
      */
     val scaler = if (useFeatureScaling) {
-      (new StandardScaler(withStd = true, withMean = false)).fit(input.map(x => x.features))
+      new StandardScaler(withStd = true, withMean = false).fit(input.map(_.features))
     } else {
       null
     }

     // Prepend an extra variable consisting of all 1.0's for the intercept.
-    val data = if (addIntercept) {
-      if (useFeatureScaling) {
-        input.map(labeledPoint =>
-          (labeledPoint.label, appendBias(scaler.transform(labeledPoint.features))))
-      } else {
-        input.map(labeledPoint => (labeledPoint.label, appendBias(labeledPoint.features)))
-      }
-    } else {
-      if (useFeatureScaling) {
-        input.map(labeledPoint => (labeledPoint.label, scaler.transform(labeledPoint.features)))
+    // TODO: Apply feature scaling to the weight vector instead of input data.
+    val data =
+      if (addIntercept) {
+        if (useFeatureScaling) {
+          input.map(lp => (lp.label, appendBias(scaler.transform(lp.features)))).cache()
+        } else {
+          input.map(lp => (lp.label, appendBias(lp.features))).cache()
+        }
       } else {
-        input.map(labeledPoint => (labeledPoint.label, labeledPoint.features))
+        if (useFeatureScaling) {
+          input.map(lp => (lp.label, scaler.transform(lp.features))).cache()
+        } else {
+          input.map(lp => (lp.label, lp.features))
+        }
       }
-    }

     /**
      * TODO: For better convergence, in logistic regression, the intercepts should be computed
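To illustrate why the commit adds `.cache()` to the transformed RDDs, here is a minimal sketch, not part of the commit, that counts how often a `map` function runs when an iterative job makes several passes over the result. It assumes Spark 2.0 or later (for `longAccumulator`) with `spark-core` on the classpath and a local master. The object name, the helper `countTransformCalls`, and the toy doubling transform are hypothetical stand-ins for `scaler.transform` and `appendBias`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: names and the toy transform are assumptions, not Spark's GLM code.
object CacheTransformedDataSketch {

  /** Makes `iterations` passes over a mapped RDD and returns how often the map function ran. */
  def countTransformCalls(sc: SparkContext, useCache: Boolean, iterations: Int): Long = {
    val calls = sc.longAccumulator("transformCalls")
    val input = sc.parallelize(1 to 100, numSlices = 4)

    // Stand-in for an expensive per-record transformation.
    val mapped = input.map { x =>
      calls.add(1L)
      x * 2.0
    }

    // Without cache(), every action below re-runs the map over the whole input.
    val data = if (useCache) mapped.cache() else mapped

    // Each sum() is an action, playing the role of one optimizer iteration.
    (1 to iterations).foreach(_ => data.sum())

    if (useCache) data.unpersist()
    calls.value
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"map calls without cache: ${countTransformCalls(sc, useCache = false, iterations = 5)}")
      println(s"map calls with cache:    ${countTransformCalls(sc, useCache = true, iterations = 5)}")
    } finally {
      sc.stop()
    }
  }
}
```

In a local run with no task retries and enough memory to hold the cached partitions, the uncached count should grow with the number of iterations while the cached count should stay at the input size. Accumulator updates made inside transformations are not guaranteed to be exactly-once, so treat the counts as indicative. In the diff itself, the branch that only re-wraps `lp.label` and `lp.features` is left uncached.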
