What's StreamingPro and MLSQL?

StreamingPro is designed mainly to run on Apache Spark, but it also supports Apache Flink as a runtime. It can therefore be considered a cross-runtime, distributed platform that combines a Big Data platform and an AI platform, on which you can run both Big Data processing and Machine Learning scripts.

MLSQL is a SQL-like DSL, built on the StreamingPro platform, that is more powerful than plain SQL. Because StreamingPro already integrates many ML frameworks, including Spark MLlib, DL4J, and Python ML frameworks such as Sklearn and TensorFlow (with cluster-mode support), you can use MLSQL to drive all of these popular Machine Learning frameworks.

Why MLSQL

MLSQL lets you build an entire Machine Learning pipeline with a single SQL-like language. It also provides many modules and functions that reduce the complexity of building Machine Learning applications.

MLSQL is the only language you need to learn.
Data preprocessing defined in the training phase can be reused directly in streaming, batch, and API services without extra coding.
Server mode frees you from environment setup trouble.

Quick Tutorial

Step 1:

Download the following jars from the Release page:

streamingpro-mlsql-1.1.2.jar
ansj_seg-5.1.6.jar
nlp-lang-1.7.8.jar

Step 2:

Visit the Spark downloads page to download Apache Spark 2.2.0, then unarchive it.

Step 3:

cd spark-2.2.0-bin-hadoop2.7/

./bin/spark-submit --class streaming.core.StreamingApp \
--master "local[*]" \
--name sql-interactive \
--jars ansj_seg-5.1.6.jar,nlp-lang-1.7.8.jar \
streamingpro-mlsql-1.1.2.jar \
-streaming.name sql-interactive \
-streaming.job.file.path file:///tmp/query.json \
-streaming.platform spark \
-streaming.rest true \
-streaming.driver.port 9003 \
-streaming.spark.service true \
-streaming.thrift false \
-streaming.enableHiveSupport true

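Here /tmp/query.json is a JSON file whose only content is "{}". The jar paths in the command are relative, so copy the three jars from Step 1 into spark-2.2.0-bin-hadoop2.7/ first, or replace them with absolute paths.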
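If you prefer to create the job file from a script, the following minimal Python sketch writes "{}" to the path you later pass to -streaming.job.file.path. The helper name write_empty_job_file is hypothetical; any other way of creating a file that contains "{}" works just as well.

```python
import json
from pathlib import Path


def write_empty_job_file(path: str = "/tmp/query.json") -> Path:
    """Write an empty JSON object ("{}") to `path`, creating parent dirs.

    In the Quick Tutorial setup the job file only has to contain "{}";
    scripts are submitted afterwards through the web page.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps({}))  # json.dumps({}) produces "{}"
    return target
```

Call write_empty_job_file() once before running spark-submit.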

Step 4:

Open your Chrome browser and go to the following URL:

http://127.0.0.1:9003

Enjoy.

Run your first Machine Learning script in MLSQL

-- load data from the Spark distribution
load libsvm.`/spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt` as data;

-- train a NaiveBayes model and save it in /tmp/bayes_model.
-- The algorithm used here is based on Spark MLlib.
train data as NaiveBayes.`/tmp/bayes_model`;

-- register the model as a function named bayes_predict
register NaiveBayes.`/tmp/bayes_model` as bayes_predict;

-- predict all rows
select bayes_predict(features) as predict_label, label from data as result;

-- save the predicted result to /tmp/result in JSON format
save overwrite result as json.`/tmp/result`;

-- show the predicted labels in the web table
select * from result as res;

Make sure the path /spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt matches where you unarchived Spark in Step 2, and adjust it if necessary.
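The load libsvm. statement reads the LIBSVM text format: one example per line, a numeric label followed by sparse index:value pairs with ascending, 1-based indices. To sanity-check a file before running the script, you could use a plain-Python parser like the sketch below. It is not part of StreamingPro, and the names parse_libsvm_line and check_libsvm_file are hypothetical.

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one LIBSVM line such as '0 128:51 129:159' into (label, features)."""
    parts = line.split()
    if not parts:
        raise ValueError("empty line")
    label = float(parts[0])
    features: Dict[int, float] = {}
    last_index = 0
    for token in parts[1:]:
        # a token without ":" fails to unpack and raises ValueError
        index_str, value_str = token.split(":", 1)
        index = int(index_str)
        # indices must be >= 1 and strictly ascending
        if index <= last_index:
            raise ValueError(f"indices must be ascending and >= 1: {token}")
        features[index] = float(value_str)
        last_index = index
    return label, features


def check_libsvm_file(path: str) -> int:
    """Return the number of examples; raise ValueError on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                parse_libsvm_line(line)
            except ValueError as err:
                raise ValueError(f"{path}:{line_no}: {err}") from err
            count += 1
    return count
```

For example, check_libsvm_file("/spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt") returns the number of examples when the path is correct, and raises FileNotFoundError (wrong path) or ValueError (malformed line) otherwise.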

Copy and paste the script into the web page and click 运行 ("Run"); you will then see the label and predict_label columns.

Congratulations, you have completed the first Machine Learning script!

Run your first ETL script in MLSQL

select "a" as a,"b" as b
as abc;

-- copy everything from table abc into a new table newabc.

select * from abc
as newabc;

-- save the newabc table to MySQL.
save overwrite newabc
as jdbc.`db.abc`
options truncate="true"
and driver="com.mysql.jdbc.Driver"
and url="jdbc:mysql://127.0.0.1:3306/...."
and user="..."
and password="....";
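The options clause is a list of key="value" pairs joined by and, and each key should appear only once. If you generate such statements from code, a helper like the hypothetical render_jdbc_save below keeps the options in a single dictionary so that duplicate keys cannot occur. It only builds the MLSQL text: it does not connect to MySQL, and it does not escape quotes inside values.

```python
from typing import Dict


def render_jdbc_save(table: str, target: str, options: Dict[str, str]) -> str:
    """Build a `save overwrite ... as jdbc.` MLSQL statement.

    Options come from a dict, so each key is emitted exactly once.
    Values are inserted verbatim between double quotes (no escaping).
    """
    if not options:
        raise ValueError("at least one option is required")
    pairs = [f'{key}="{value}"' for key, value in options.items()]
    lines = [f"save overwrite {table}", f"as jdbc.`{target}`"]
    lines.append("options " + pairs[0])
    lines.extend("and " + pair for pair in pairs[1:])
    return "\n".join(lines) + ";"
```

Passing "newabc", "db.abc", and the truncate, driver, url, user, and password options from the script above reproduces that save statement.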

Congratulations, you have completed the first ETL script!

Run as Application or Server

Application mode: Run StreamingPro as an application that executes a JSON file.
Server mode: Run StreamingPro as a server that you interact with over HTTP.

We strongly recommend deploying StreamingPro in Server mode, which is under active development.

To avoid compilation problems, use a release version directly.
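In Server mode you may also want to submit scripts without the web page. The sketch below builds such an HTTP request with the Python standard library only. The endpoint path /run/script and the form field name sql are assumptions based on later StreamingPro/MLSQL releases; check the REST controller of the release you deployed before relying on them. Port 9003 matches -streaming.driver.port 9003 in the Quick Tutorial command.

```python
import urllib.parse
import urllib.request

# Assumption: the server accepts POST /run/script and reads the MLSQL
# script from a form field named "sql". Verify against your release.
SCRIPT_PATH = "/run/script"


def build_script_request(
    script: str, host: str = "127.0.0.1", port: int = 9003
) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST carrying an MLSQL script."""
    body = urllib.parse.urlencode({"sql": script}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{SCRIPT_PATH}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_script(script: str, timeout: float = 60.0) -> str:
    """Send the script to a running server and return the raw response body."""
    request = build_script_request(script)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server from the Quick Tutorial running, print(run_script('select "a" as a,"b" as b as abc;')) would submit the first line of the ETL example; the response format depends on the release.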

If you do want to use Application mode, StreamingPro supports the batch.mlsql compositor in the JSON file, so you can still use MLSQL grammar there (available since v1.1.2):

{
  "mlsql": {
    "desc": "test",
    "strategy": "spark",
    "algorithm": [],
    "ref": [],
    "compositor": [
      {
        "name": "batch.mlsql",
        "params": [
          {
            "sql": [
              "select 'a' as a as table1;",
              "save overwrite table1 as parquet.`/tmp/kk`;"
            ]
          }
        ]
      }
    ],
    "configParams": {
    }
  }
}
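If you generate Application-mode job files from code, the following sketch builds the same structure as the JSON above from a list of MLSQL statements. The function name build_mlsql_job is hypothetical; the keys mirror the example exactly, and the job name (mlsql above) and description are parameters.

```python
import json
from typing import List


def build_mlsql_job(statements: List[str], job_name: str = "mlsql", desc: str = "test") -> str:
    """Return an Application-mode job file that runs `statements` via batch.mlsql."""
    job = {
        job_name: {
            "desc": desc,
            "strategy": "spark",
            "algorithm": [],
            "ref": [],
            "compositor": [
                {
                    "name": "batch.mlsql",
                    # each entry of "sql" is one MLSQL statement, run in order
                    "params": [{"sql": list(statements)}],
                }
            ],
            "configParams": {},
        }
    }
    return json.dumps(job, indent=2)
```

Write the result to a file; presumably it is then passed to StreamingPro through the same -streaming.job.file.path flag used in the Quick Tutorial.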

Learning MLSQL

MLSQL Grammar
Using Built-in Algorithms
Stream Jobs

Compiling

How to compile
How to compile the DSL module

Advanced Programming

How to implement a user-defined algorithm in MLSQL

Machine Learning

How to use Spark MLlib in MLSQL
How to use TensorFlow in MLSQL
How to use SKlearn in MLSQL
SKlearn example
How to receive logs from StreamingPro

Model deploy

How to deploy your prediction service

MLSQL

Datasources
How to use CarbonData as storage
Preprocessing modules in the train statement
Preprocessing functions in the select statement
How to use text analyzer in MLSQL
How to use MLSQL to crawl web pages
How to use MLSQL to do batch processing
How to use MLSQL to do streaming processing

Tools

StreamingPro Manager
StreamingPro json editor

experiment

Flink support
