gcstar/streamingpro: Build a Big Data processing and Machine Learning platform with MLSQL

What's StreamingPro and MLSQL?

StreamingPro is designed mainly to run on Apache Spark, and it also supports Apache Flink as a runtime. It can therefore be seen as a cross-engine, distributed platform that combines a Big Data platform with an AI platform, where you can run both Big Data processing and Machine Learning scripts.

MLSQL is a SQL-like DSL, more powerful than plain SQL, built on the StreamingPro platform. StreamingPro already integrates many ML frameworks, including Spark MLlib, DL4J, and Python ML frameworks such as Sklearn and TensorFlow (with cluster mode support), so you can use MLSQL to drive all of these popular Machine Learning frameworks.

Why MLSQL

MLSQL gives you a single SQL-like language for your entire Machine Learning pipeline. It also provides many modules and functions that reduce the complexity of building Machine Learning applications.

MLSQL is the only language you need to learn.
Data preprocessing created in the training phase can be reused directly in streaming, batch, and API services without extra coding.
Server mode spares you environment setup trouble.

Quick Tutorial

Step 1:

Download the following jars from the Release page:

streamingpro-mlsql-1.x.x.jar
ansj_seg-5.1.6.jar
nlp-lang-1.7.8.jar

Step 2:

Visit the Spark downloads page, download Apache Spark 2.2.0, and unarchive it.

Step 3:

cd spark-2.2.0-bin-hadoop2.7/

./bin/spark-submit   --class streaming.core.StreamingApp \
--master local[*] \
--name sql-interactive \
--jars ansj_seg-5.1.6.jar,nlp-lang-1.7.8.jar \
streamingpro-mlsql-1.1.2.jar    \
-streaming.name sql-interactive    \
-streaming.job.file.path file:///tmp/query.json \
-streaming.platform spark   \
-streaming.rest true   \
-streaming.driver.port 9003   \
-streaming.spark.service true \
-streaming.thrift false \
-streaming.enableHiveSupport true

query.json is a JSON file that contains only "{}".
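Step 3 passes file:///tmp/query.json as the job file, so that file needs to exist before spark-submit runs. A minimal sketch for creating it, assuming a POSIX shell and a writable /tmp:

```shell
# Write an empty JSON object to the path given to -streaming.job.file.path.
printf '{}\n' > /tmp/query.json

# Show the file so you can confirm its content.
cat /tmp/query.json
```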

Step 4:

Open your Chrome browser and go to the following URL:

http://127.0.0.1:9003

Enjoy.

Run the first Machine Learning Script in MLSQL.

-- load sample data from the Spark distribution
load libsvm.`/spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt` as data;

-- train a NaiveBayes model and save it in /tmp/bayes_model.
-- The algorithm used here comes from Spark MLlib.
train data as NaiveBayes.`/tmp/bayes_model`;

-- register your model
register NaiveBayes.`/tmp/bayes_model` as bayes_predict;

-- predict all data 
select bayes_predict(features) as predict_label, label  from data as result;

-- save predicted result in /tmp/result with json format
save overwrite result as json.`/tmp/result`;

-- show the predicted labels in the web table.
select * from result as res;

Please make sure the path /spark-2.2.0-bin-hadoop2.7/data/mllib/sample_libsvm_data.txt is correct.
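A quick way to check the path before running the script is sketched below. The helper name check_sample_data is hypothetical and not part of StreamingPro; it only tests whether the sample file exists under the Spark directory you pass in.

```shell
# Hypothetical helper: report whether the libsvm sample exists under a Spark directory.
check_sample_data() {
  sample="$1/data/mllib/sample_libsvm_data.txt"
  if [ -f "$sample" ]; then
    echo "found: $sample"
  else
    echo "missing: $sample"
    return 1
  fi
}

# Use the directory where you unarchived Spark; this is the path from the script above.
check_sample_data /spark-2.2.0-bin-hadoop2.7 || true
```

If the helper reports the file as missing, adjust the path in the load statement to match where Spark was unarchived.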

Copy and paste the script into the web page and click 运行 (Run). You will then see the label and predict_label columns.

Congratulations, you have completed the first Machine Learning script!
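The save statement goes through Spark, which normally writes a directory of part-* files rather than a single JSON file. As a rough sketch, assuming the server runs in local mode so that /tmp/result is on the local filesystem, the hypothetical helper show_result below prints the first lines of each part file:

```shell
# Hypothetical helper: print the first rows from a Spark JSON output directory.
show_result() {
  dir="${1:-/tmp/result}"
  # Spark writes one part-* file per partition, plus a _SUCCESS marker.
  for f in "$dir"/part-*; do
    if [ -f "$f" ]; then
      head -n 5 "$f"
    fi
  done
}

# Prints nothing if the directory does not exist yet.
show_result /tmp/result
```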

Run the first ETL Script In MLSQL.

select "a" as a,"b" as b
as abc;

-- here we just copy all from table abc and then create a new table newabc.

select * from abc
as newabc;

-- save the newabc table to mysql.
save overwrite newabc
as jdbc.`db.abc`
options truncate="true"
and driver="com.mysql.jdbc.Driver"
and url="jdbc:mysql://127.0.0.1:3306/...."
and user="..."
and password="....";

Congratulations, you have completed the first ETL script!

Run as Application or Server

Application mode: run StreamingPro as an application that executes a JSON file.
Server mode: run StreamingPro as a server that you interact with over HTTP.

We strongly recommend deploying StreamingPro in server mode, which is under active development.

To avoid compilation problems, please use a release version directly.

If you really want to use application mode, StreamingPro supports the batch.mlsql keyword in the JSON file, so you can still use MLSQL grammar. (This feature is available from v1.1.2.)

{
  "mlsql": {
    "desc": "test",
    "strategy": "spark",
    "algorithm": [],
    "ref": [],
    "compositor": [
      {
        "name": "batch.mlsql",
        "params": [
          {
            "sql": [
              "select 'a' as a as table1;",
              "save overwrite table1 as parquet.`/tmp/kk`;"
            ]
          }
        ]
      }
    ],
    "configParams": {
    }
  }
}
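One way to try this, sketched under assumptions: save the configuration to a file and, presumably, pass its location through -streaming.job.file.path, the same option used for query.json in the Quick Tutorial. The path /tmp/batch-mlsql.json is a made-up example, and the python3 check is optional and only confirms that the file is well-formed JSON.

```shell
# Save the application-mode configuration; the quoted 'EOF' keeps backticks literal.
cat > /tmp/batch-mlsql.json <<'EOF'
{
  "mlsql": {
    "desc": "test",
    "strategy": "spark",
    "algorithm": [],
    "ref": [],
    "compositor": [
      {
        "name": "batch.mlsql",
        "params": [
          {
            "sql": [
              "select 'a' as a as table1;",
              "save overwrite table1 as parquet.`/tmp/kk`;"
            ]
          }
        ]
      }
    ],
    "configParams": {}
  }
}
EOF

# Optional sanity check: parse the file if python3 happens to be installed.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool /tmp/batch-mlsql.json >/dev/null && echo "valid JSON"
fi
```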
