forked from byzer-org/byzer-lang
Merge pull request byzer-org#526 from allwefantasy/mlsql
Enhanced the MLlib algorithms, using RandomForest as an example
Showing 67 changed files with 648 additions and 103 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
## RandomForest Example | ||
|
||
Suppose you want to use RandomForest to train a model. Here is the MLSQL script. | ||
|
||
```sql | ||
|
||
-- create test data | ||
set jsonStr=''' | ||
{"features":[5.1,3.5,1.4,0.2],"label":0.0} | ||
{"features":[5.1,3.5,1.4,0.2],"label":1.0} | ||
{"features":[5.1,3.5,1.4,0.2],"label":0.0} | ||
{"features":[4.4,2.9,1.4,0.2],"label":0.0} | ||
{"features":[5.1,3.5,1.4,0.2],"label":1.0} | ||
{"features":[5.1,3.5,1.4,0.2],"label":0.0} | ||
{"features":[5.1,3.5,1.4,0.2],"label":0.0} | ||
{"features":[4.7,3.2,1.3,0.2],"label":1.0} | ||
{"features":[5.1,3.5,1.4,0.2],"label":0.0} | ||
{"features":[5.1,3.5,1.4,0.2],"label":0.0} | ||
'''; | ||
load jsonStr.`jsonStr` as data; | ||
select vec_dense(features) as features ,label as label from data | ||
as data1; | ||
|
||
-- use RandomForest | ||
train data1 as RandomForest.`/tmp/model` where | ||
|
||
-- once set to true, every time you run this script MLSQL will generate a new directory for your model | ||
keepVersion="true" | ||
|
||
-- specify the test dataset that is fed to the evaluator to generate metrics, e.g. F1, accuracy | ||
and evaluateTable="data1" | ||
|
||
-- specify group 0 parameters | ||
and `fitParam.0.featuresCol`="features" | ||
and `fitParam.0.labelCol`="label" | ||
and `fitParam.0.maxDepth`="2" | ||
|
||
-- specify group 1 parameters (featuresCol and labelCol keep the Spark defaults, "features" and "label") | ||
and `fitParam.1.maxDepth`="10" | ||
; | ||
|
||
``` | ||
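
The `fitParam.<group>.<name>` keys in the `where` clause can be read as a flat encoding of several parameter groups, with one model trained per group. The sketch below is only an illustration in plain Python of how such keys could be grouped. `group_fit_params` is a hypothetical helper, not part of MLSQL, and the options dictionary is modeled on the script above; MLSQL's real parsing may differ.

```python
from collections import defaultdict


def group_fit_params(options):
    """Group flat `fitParam.<index>.<name>` options by their group index.

    Hypothetical helper for illustration; not MLSQL's actual implementation.
    """
    groups = defaultdict(dict)
    for key, value in options.items():
        parts = key.split(".", 2)
        # Skip options that are not fitParam entries, e.g. keepVersion.
        if len(parts) != 3 or parts[0] != "fitParam" or not parts[1].isdigit():
            continue
        groups[int(parts[1])][parts[2]] = value
    # Return a plain dict ordered by group index.
    return {index: groups[index] for index in sorted(groups)}


# Options modeled on the `where` clause of the training script (assumed values).
options = {
    "keepVersion": "true",
    "evaluateTable": "data1",
    "fitParam.0.featuresCol": "features",
    "fitParam.0.labelCol": "label",
    "fitParam.0.maxDepth": "2",
    "fitParam.1.maxDepth": "10",
}

for index, params in group_fit_params(options).items():
    print(index, params)
```

Each group index corresponds to the `algIndex` reported in the training result, which is why the two groups appear there as two separate models.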
|
||
When this script is executed, the following result is shown in the web console: | ||
|
||
|
||
|
||
``` | ||
name value | ||
--------------------------------- | ||
modelPath /tmp/model/_model_10/model/1 | ||
algIndex 1 | ||
alg org.apache.spark.ml.classification.RandomForestClassifier | ||
metrics f1: 0.7625000000000001 weightedPrecision: 0.8444444444444446 weightedRecall: 0.7999999999999999 accuracy: 0.8 | ||
status success | ||
startTime 20180913 59:15:32:685 | ||
endTime 20180913 59:15:36:317 | ||
trainParams Map(maxDepth -> 10) | ||
--------------------------------- | ||
modelPath /tmp/model/_model_10/model/0 | ||
algIndex 0 | ||
alg org.apache.spark.ml.classification.RandomForestClassifier | ||
metrics f1: 0.7625000000000001 weightedPrecision: 0.8444444444444446 weightedRecall: 0.7999999999999999 accuracy: 0.8 | ||
status success | ||
startTime 20180913 59:15:36:318 | ||
endTime 20180913 59:15:38:024 | ||
trainParams Map(maxDepth -> 2, featuresCol -> features, labelCol -> label) | ||
``` | ||
|
||
If the results look acceptable, register and use the model: | ||
|
||
```sql | ||
register RandomForest.`/tmp/model` as rf_predict; | ||
|
||
-- you can specify which model you want to use: | ||
register RandomForest.`/tmp/model` as rf_predict where | ||
algIndex="0"; | ||
|
||
-- you can specify which metric MLSQL should use to select the best model | ||
register RandomForest.`/tmp/model` as rf_predict where | ||
autoSelectByMetric="f1"; | ||
|
||
select rf_predict(features) as predict_label, label from data1 as output; | ||
``` | ||
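
To make the two selection options concrete, here is a minimal sketch in plain Python. It assumes that `algIndex` picks a model by its index and that `autoSelectByMetric` picks the model with the highest value of the named metric. `select_model` and the result records are hypothetical, with values copied from the training output shown earlier. How MLSQL breaks ties, and which model the plain `register` statement uses by default, is not stated here, so the sketch leaves the default case out.

```python
def select_model(results, alg_index=None, metric=None):
    """Pick one training result by explicit index or by best metric value.

    Hypothetical helper for illustration; not MLSQL's actual implementation.
    """
    # Assumption: only successfully trained models are candidates.
    candidates = [r for r in results if r["status"] == "success"]
    if alg_index is not None:
        return next(r for r in candidates if r["algIndex"] == alg_index)
    if metric is not None:
        # max() keeps the first candidate on ties; MLSQL may behave differently.
        return max(candidates, key=lambda r: r["metrics"][metric])
    raise ValueError("specify alg_index or metric")


# Records mirroring the training output (subset of the reported metrics).
results = [
    {"algIndex": 1, "status": "success",
     "modelPath": "/tmp/model/_model_10/model/1",
     "metrics": {"f1": 0.7625000000000001, "accuracy": 0.8}},
    {"algIndex": 0, "status": "success",
     "modelPath": "/tmp/model/_model_10/model/0",
     "metrics": {"f1": 0.7625000000000001, "accuracy": 0.8}},
]

print(select_model(results, alg_index=0)["modelPath"])
print(select_model(results, metric="f1")["modelPath"])
```

Because both models report the same F1 in this run, selecting by metric cannot distinguish them; passing an explicit `algIndex` is the unambiguous choice in that situation.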
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" | ||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
<parent> | ||
<artifactId>streamingpro</artifactId> | ||
<groupId>streaming.king</groupId> | ||
<version>1.1.3</version> | ||
</parent> | ||
<modelVersion>4.0.0</modelVersion> | ||
|
||
<artifactId>streamingpro-automl</artifactId> | ||
<dependencies> | ||
<dependency> | ||
<groupId>com.salesforce.transmogrifai</groupId> | ||
<artifactId>transmogrifai-core_2.11</artifactId> | ||
<version>0.3.4</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.spark</groupId> | ||
<artifactId>spark-sql_${scala.binary.version}</artifactId> | ||
<version>${spark.version}</version> | ||
<scope>${scope}</scope> | ||
</dependency> | ||
|
||
|
||
<dependency> | ||
<groupId>org.apache.spark</groupId> | ||
<artifactId>spark-core_${scala.binary.version}</artifactId> | ||
<version>${spark.version}</version> | ||
<scope>${scope}</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.spark</groupId> | ||
<artifactId>spark-mllib_${scala.binary.version}</artifactId> | ||
<version>${spark.version}</version> | ||
<scope>${scope}</scope> | ||
</dependency> | ||
</dependencies> | ||
|
||
</project> |
26 changes: 26 additions & 0 deletions
streamingpro-automl/src/main/java/streaming/example/AutoMLExample.scala
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
package streaming.example | ||
|
||
import com.salesforce.op._ | ||
import com.salesforce.op.evaluators.Evaluators | ||
import com.salesforce.op.readers._ | ||
import com.salesforce.op.features._ | ||
import com.salesforce.op.features.types._ | ||
import com.salesforce.op.stages.impl.classification._ | ||
import com.salesforce.op.test.Passenger | ||
import org.apache.spark.SparkConf | ||
import org.apache.spark.sql.SparkSession | ||
import com.salesforce.op.features.FeatureBuilder | ||
import com.salesforce.op.features.types._ | ||
|
||
/** | ||
* Created by allwefantasy on 12/9/2018. | ||
*/ | ||
object AutoMLExample { | ||
def main(args: Array[String]): Unit = { | ||
|
||
|
||
|
||
} | ||
} | ||
|
||
|
10 changes: 10 additions & 0 deletions
streamingpro-mlsql/src/main/java/streaming/dsl/load/batch/ModelSource.scala
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
package streaming.dsl.load.batch | ||
|
||
/** | ||
* Created by allwefantasy on 12/9/2018. | ||
*/ | ||
class ModelSource { | ||
def output() = { | ||
|
||
} | ||
} |