StreamingPro is designed mainly to run on Apache Spark, but it also supports Apache Flink as a runtime. It can therefore be considered a cross-platform, distributed system that combines a Big Data platform with an AI platform, where you can run both Big Data processing and Machine Learning scripts.
MLSQL is a DSL akin to SQL, but more powerful, built on the StreamingPro platform. StreamingPro has already integrated many ML frameworks, including Spark MLlib, DL4J, and Python ML frameworks such as Sklearn and TensorFlow (cluster mode supported), so you can use MLSQL to operate all of these popular Machine Learning frameworks.
MLSQL gives you the power to use a single SQL-like language to build your entire Machine Learning pipeline. It also provides many modules and functions that reduce the complexity of building Machine Learning applications.
- MLSQL is the only language you need to learn.
- Data preprocessing created in the training phase can be reused directly in streaming, batch, and API service without additional coding.
- Server mode frees you from environment setup problems.
Step 1:
Download the following jars from the release page:
- streamingpro-mlsql-1.x.x.jar
- ansj_seg-5.1.6.jar
- nlp-lang-1.7.8.jar
Step 2:
Visit the Spark downloads page, download Apache Spark 2.2.0, and unarchive it.
Step 3:
cd spark-2.2.0-bin-hadoop2.7/
./bin/spark-submit --class streaming.core.StreamingApp \
--master local[*] \
--name sql-interactive \
--jars ansj_seg-5.1.6.jar,nlp-lang-1.7.8.jar \
streamingpro-mlsql-1.1.2.jar \
-streaming.name sql-interactive \
-streaming.job.file.path file:///tmp/query.json \
-streaming.platform spark \
-streaming.rest true \
-streaming.driver.port 9003 \
-streaming.spark.service true \
-streaming.thrift false \
-streaming.enableHiveSupport true
query.json is a JSON file that contains "{}".
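The job file passed to -streaming.job.file.path in Step 3 only needs to hold an empty JSON object. A minimal sketch of creating it, assuming you kept the /tmp/query.json path used in the command above:

```shell
# Create the empty job file that -streaming.job.file.path points to.
# The path matches the spark-submit command in Step 3; adjust it if you changed the flag.
printf '{}\n' > /tmp/query.json

# Print the file to confirm its content.
cat /tmp/query.json
```

Create the file before running spark-submit, since the command reads it at startup.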
Step 4:
Open Chrome and go to the following URL:
http://127.0.0.1:9003
Enjoy.
Run your first Machine Learning script in MLSQL.
-- load data from spark distribution
load libsvm.`/spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt` as data;
-- train a NaiveBayes model and save it in /tmp/bayes_model.
-- the algorithm used here is based on Spark MLlib
train data as NaiveBayes.`/tmp/bayes_model`;
-- register your model
register NaiveBayes.`/tmp/bayes_model` as bayes_predict;
-- predict all data
select bayes_predict(features) as predict_label, label from data as result;
-- save predicted result in /tmp/result with json format
save overwrite result as json.`/tmp/result`;
-- show predict label in web table.
select * from result as res;
Please make sure the path /spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt
is correct.
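One way to check the path before running the script is a small shell test. This is only a sketch; check_data_file is a hypothetical helper name, and the path is the one used in the load statement above.

```shell
# check_data_file PATH: report whether the sample data file exists.
# (Hypothetical helper, not part of StreamingPro.)
check_data_file() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the path used in the load statement; change it to match your Spark directory.
check_data_file /spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt \
  || echo "update the path in the load statement"
```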
Copy and paste the script into the web page and click 运行 (Run); you will then see the label and predict_label columns.
Congratulations, you have completed the first Machine Learning script!
Run your first ETL script in MLSQL.
select "a" as a,"b" as b
as abc;
-- here we just copy all from table abc and then create a new table newabc.
select * from abc
as newabc;
-- save the newabc table to mysql.
save overwrite newabc
as jdbc.`db.abc`
options truncate="true"
and driver="com.mysql.jdbc.Driver"
and url="jdbc:mysql://127.0.0.1:3306/...."
and user="..."
and password="....";
Congratulations, you have completed the first ETL script!
- Application mode: run StreamingPro as an application that executes a JSON file.
- Server mode: run StreamingPro as a server that you interact with over HTTP.
We strongly recommend deploying StreamingPro in server mode, which is under active development.
To avoid compilation problems, please use a release version directly.
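Because server mode speaks HTTP, a script can also be submitted without the browser. The sketch below assumes the server accepts MLSQL at a /run/script endpoint through a form field named sql; both names are assumptions to verify against the release you run. mlsql_run and MLSQL_URL are hypothetical helper names.

```shell
# ASSUMPTION: the server accepts MLSQL scripts at /run/script in a form
# field named "sql". Verify both names against your StreamingPro release.
MLSQL_URL="${MLSQL_URL:-http://127.0.0.1:9003/run/script}"

# mlsql_run SCRIPT: POST an MLSQL script to the server and print the response.
mlsql_run() {
  # --data-urlencode keeps quotes, backticks and semicolons intact in transit.
  curl --silent --show-error --request POST "$MLSQL_URL" \
       --data-urlencode "sql=$1"
}

# Example (requires the server from Step 3 to be running):
#   mlsql_run 'select "a" as a, "b" as b as abc;'
```

Single quotes around the script matter here: MLSQL paths use backticks, which the shell would otherwise try to execute as a command.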
If you really want to use application mode, StreamingPro supports the batch.mlsql keyword in the JSON file, so you can still use MLSQL grammar. (This feature is available from v1.1.2.)
{
  "mlsql": {
    "desc": "test",
    "strategy": "spark",
    "algorithm": [],
    "ref": [],
    "compositor": [
      {
        "name": "batch.mlsql",
        "params": [
          {
            "sql": [
              "select 'a' as a as table1;",
              "save overwrite table1 as parquet.`/tmp/kk`;"
            ]
          }
        ]
      }
    ],
    "configParams": {
    }
  }
}
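When writing this job file from a shell, the backticks in the MLSQL path need protecting from command substitution. A minimal sketch using a quoted here-document; the file location /tmp/batch-mlsql.json and the JOB_FILE variable are hypothetical choices for illustration:

```shell
# Hypothetical location for the application-mode job file.
JOB_FILE="${JOB_FILE:-/tmp/batch-mlsql.json}"

# The quoted delimiter ('EOF') disables shell expansion, so the backticks
# in the MLSQL path are written literally instead of being executed.
cat > "$JOB_FILE" <<'EOF'
{
  "mlsql": {
    "desc": "test",
    "strategy": "spark",
    "algorithm": [],
    "ref": [],
    "compositor": [
      {
        "name": "batch.mlsql",
        "params": [
          {
            "sql": [
              "select 'a' as a as table1;",
              "save overwrite table1 as parquet.`/tmp/kk`;"
            ]
          }
        ]
      }
    ],
    "configParams": {}
  }
}
EOF

# Confirm the backticks survived by printing the matching line.
grep -F 'parquet.`/tmp/kk`' "$JOB_FILE"
```

The resulting file can then be supplied through the -streaming.job.file.path option shown in Step 3, for example as file:///tmp/batch-mlsql.json.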
- MLSQL Grammar
- Using Built-in Algorithms
- Scala/Python UDF
- Stream Jobs
- Using Python ML Framework To Train And Predict Within MLSQL
- How to use Spark MLlib in MLSQL
- How to use TensorFlow in MLSQL
- How to use SKlearn in MLSQL
- SKlearn example
- How to receive logs from StreamingPro