Commit 0b713e0

hhbyyh authored and MLnick committed

[SPARK-13512][ML] add example and doc for MaxAbsScaler

## What changes were proposed in this pull request?

jira: https://issues.apache.org/jira/browse/SPARK-13512

Add example and doc for ml.feature.MaxAbsScaler.

## How was this patch tested?

unit tests

Author: Yuhao Yang <[email protected]>

Closes apache#11392 from hhbyyh/maxabsdoc.

1 parent 6ca990f commit 0b713e0

File tree

3 files changed: +133 -0 lines changed

docs/ml-features.md (+32 lines)

@@ -773,6 +773,38 @@ for more details on the API.
 </div>
 </div>
+
+## MaxAbsScaler
+
+`MaxAbsScaler` transforms a dataset of `Vector` rows, rescaling each feature to the range [-1, 1]
+by dividing by the maximum absolute value in each feature. It does not shift/center the
+data, and thus does not destroy any sparsity.
+
+`MaxAbsScaler` computes summary statistics on a data set and produces a `MaxAbsScalerModel`. The
+model can then transform each feature individually to the range [-1, 1].
+
+The following example demonstrates how to load a dataset in libsvm format and then rescale each feature to [-1, 1].
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+Refer to the [MaxAbsScaler Scala docs](api/scala/index.html#org.apache.spark.ml.feature.MaxAbsScaler)
+and the [MaxAbsScalerModel Scala docs](api/scala/index.html#org.apache.spark.ml.feature.MaxAbsScalerModel)
+for more details on the API.
+
+{% include_example scala/org/apache/spark/examples/ml/MaxAbsScalerExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+
+Refer to the [MaxAbsScaler Java docs](api/java/org/apache/spark/ml/feature/MaxAbsScaler.html)
+and the [MaxAbsScalerModel Java docs](api/java/org/apache/spark/ml/feature/MaxAbsScalerModel.html)
+for more details on the API.
+
+{% include_example java/org/apache/spark/examples/ml/JavaMaxAbsScalerExample.java %}
+</div>
+</div>
+
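The scaling rule in the added doc text (divide each feature by its maximum absolute value, with no shifting, so sparsity survives) can be sketched without Spark. This is an illustrative sketch only; `MaxAbsSketch` and `maxAbsScale` are hypothetical names, not part of the Spark API:

```java
// Illustrative sketch of max-abs scaling for a single feature column
// (not the Spark API; names here are made up for demonstration).
public class MaxAbsSketch {
    // Divide every value by the column's maximum absolute value, mapping it
    // into [-1, 1]. No shift is applied, so zero entries stay zero.
    static double[] maxAbsScale(double[] column) {
        double maxAbs = 0.0;
        for (double v : column) {
            maxAbs = Math.max(maxAbs, Math.abs(v));
        }
        if (maxAbs == 0.0) {
            return column.clone(); // an all-zero feature is left unchanged
        }
        double[] scaled = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            scaled[i] = column[i] / maxAbs;
        }
        return scaled;
    }
}
```

For example, the column `{1.0, -8.0, 4.0}` has maximum absolute value 8.0 and scales to `{0.125, -1.0, 0.5}`; note the sign and the zeros of the input are preserved, which is why this transform keeps sparse vectors sparse.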
 ## Bucketizer

 `Bucketizer` transforms a column of continuous features to a column of feature buckets, where the buckets are specified by users. It takes a parameter:
JavaMaxAbsScalerExample.java (new file, +52 lines)

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.examples.ml;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
// $example on$
import org.apache.spark.ml.feature.MaxAbsScaler;
import org.apache.spark.ml.feature.MaxAbsScalerModel;
import org.apache.spark.sql.DataFrame;
// $example off$
import org.apache.spark.sql.SQLContext;

public class JavaMaxAbsScalerExample {

  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("JavaMaxAbsScalerExample");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    SQLContext jsql = new SQLContext(jsc);

    // $example on$
    DataFrame dataFrame = jsql.read().format("libsvm").load("data/mllib/sample_libsvm_data.txt");
    MaxAbsScaler scaler = new MaxAbsScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures");

    // Compute summary statistics and generate MaxAbsScalerModel
    MaxAbsScalerModel scalerModel = scaler.fit(dataFrame);

    // Rescale each feature to range [-1, 1].
    DataFrame scaledData = scalerModel.transform(dataFrame);
    scaledData.show();
    // $example off$
    jsc.stop();
  }

}
```
MaxAbsScalerExample.scala (new file, +49 lines)

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples.ml

import org.apache.spark.{SparkConf, SparkContext}
// $example on$
import org.apache.spark.ml.feature.MaxAbsScaler
// $example off$
import org.apache.spark.sql.SQLContext

object MaxAbsScalerExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MaxAbsScalerExample")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // $example on$
    val dataFrame = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    val scaler = new MaxAbsScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures")

    // Compute summary statistics and generate MaxAbsScalerModel
    val scalerModel = scaler.fit(dataFrame)

    // Rescale each feature to range [-1, 1]
    val scaledData = scalerModel.transform(dataFrame)
    scaledData.show()
    // $example off$
    sc.stop()
  }
}
// scalastyle:on println
```
