refactor: Update README.md and others
jiachuan.zhu committed Jul 20, 2021
1 parent 1317d00 commit 6805af9
Showing 10 changed files with 228 additions and 37 deletions.
86 changes: 60 additions & 26 deletions README.md
@@ -1,58 +1,92 @@
## MLSQL

MLSQL is a Programming Language Designed For Big Data and AI, and it also have a distributed runtime.
MLSQL is a programming language designed for Big Data and AI; it also has a distributed runtime.

![](http://docs.mlsql.tech/upload_images/WechatIMG67.png)

## Official WebSite

[http://www.mlsql.tech](http://www.mlsql.tech)


Find more examples on [our user guide](http://docs.mlsql.tech/en).

1. [中文文档](http://docs.mlsql.tech/mlsql-stack/)
2. [English Docs](http://docs.mlsql.tech/en)


## Get PreBuild Distribution

## <a id="Download"></a>Download MLSQL
* The latest version is MLSQL v2.0.1
* You can download it from the [MLSQL Website](http://download.mlsql.tech/2.0.1/)
* Tested with Spark 2.4.3 and 3.0.0

***Naming conventions***

Run PreBuild Distribution:

mlsql-engine_${spark_major_version}-${mlsql_version}.tar.gz
```shell
cp streamingpro-spark_2.x-x.x.x.tar.gz /tmp
cd /tmp && tar xzvf streamingpro-spark_2.x-x.x.x.tar.gz
cd /tmp/streamingpro-spark_2.x-x.x.x
## Pre-built for Spark 2.4.x
mlsql-engine_2.4-2.1.0-SNAPSHOT.tar.gz

## make sure spark distribution is available
## visit http://127.0.0.1:9003
export SPARK_HOME="....." ; ./start-default.sh
```
## Pre-built for Spark 3.0.x
mlsql-engine_3.0-2.1.0-SNAPSHOT.tar.gz
```

## Build Distribution
## <a id="Build"></a>Building a Distribution
### Prerequisites
- JDK 8+
- Maven
- Linux or MacOS

### Downloading Source Code
```shell
## Clone the code base
git clone https://github.com/allwefantasy/mlsql.git
cd mlsql
```
### Building Spark 2.3.x Bundle
```shell
export MLSQL_SPARK_VERSION=2.3
./dev/make-distribution.sh
```
### Building Spark 2.4.x Bundle
```shell
# clone project
git clone https://github.com/allwefantasy/streamingpro .
cd streamingpro
export MLSQL_SPARK_VERSION=2.4
./dev/make-distribution.sh
```

## configure build envs
export MLSQL_SPARK_VERSIOIN=2.4
export DRY_RUN=false
export DISTRIBUTION=false
### Building Spark 3.0.x Bundle
```shell
export MLSQL_SPARK_VERSION=3.0
./dev/make-distribution.sh
```
### Building without Chinese Analyzer
```shell
## Chinese analyzer is enabled by default.
export ENABLE_CHINESE_ANALYZER=false
./dev/make-distribution.sh <spark_version>
```
### Building with Aliyun OSS Support
```shell
## Aliyun OSS support is disabled by default
export OSS_ENABLE=true
./dev/make-distribution.sh <spark_version>
```
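The naming convention in the Download section can be expressed as a small helper that predicts the archive name for a given build. This is only a sketch: the function name `mlsql_artifact_name` is hypothetical and not part of the MLSQL scripts, and the `.tar.gz` suffix is assumed from the pre-built examples listed in this README.

```shell
#!/usr/bin/env bash

# Hypothetical helper (an assumption, not shipped with MLSQL).
# Composes mlsql-engine_<spark_major_version>-<mlsql_version>.tar.gz
mlsql_artifact_name() {
  local spark_major_version="$1"
  local mlsql_version="$2"
  printf 'mlsql-engine_%s-%s.tar.gz\n' "$spark_major_version" "$mlsql_version"
}

mlsql_artifact_name 2.4 2.1.0-SNAPSHOT   # prints mlsql-engine_2.4-2.1.0-SNAPSHOT.tar.gz
mlsql_artifact_name 3.0 2.1.0-SNAPSHOT   # prints mlsql-engine_3.0-2.1.0-SNAPSHOT.tar.gz
```

The actual file produced by the Maven assembly may carry an extra suffix depending on the assembly configuration, so treat the output as the expected name rather than a guarantee.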

## build
./dev/package.sh
## Deploying
1. [Download](#Download) or [build a distribution](#Build)
2. Install Spark and set the environment variable SPARK_HOME
3. Deploy the tarball
   - Set the environment variable MLSQL_HOME
   - Copy the distribution tarball over and untar it
4. Start MLSQL in local mode
```shell
cd $MLSQL_HOME
## Run process in background
nohup ./bin/start-local.sh > ./local_mlsql.log 2>&1 &
```
5. Open http://localhost:9003 in a browser and have fun.
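When starting the engine in the background, note that the shell applies redirections from left to right, so the position of `2>&1` decides whether stderr ends up in the log file. The following sketch contrasts the two orders; `emit_both`, `wrong_log`, and `right_log` are illustrative names only.

```shell
#!/usr/bin/env bash

# Emits one line on stdout and one on stderr.
emit_both() {
  echo "to stdout"
  echo "to stderr" >&2
}

wrong_log="$(mktemp)"
right_log="$(mktemp)"

# stderr is duplicated onto the current stdout first, and only then is
# stdout pointed at the file: the stderr line does not reach the log.
emit_both 2>&1 > "$wrong_log"

# stdout is pointed at the file first, then stderr is duplicated onto it:
# both lines reach the log.
emit_both > "$right_log" 2>&1

echo "lines in wrong_log: $(grep -c . "$wrong_log")"
echo "lines in right_log: $(grep -c . "$right_log")"
```

For a long-running process started with `nohup`, the second form keeps startup errors in the same log file as the regular output.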

## Fork and Contribute
## Contributing to MLSQL

If you are planning to contribute to this repository, we first request you to create an issue at [our Issue page](https://github.com/allwefantasy/streamingpro/issues)
If you are planning to contribute to this repository, please first create an issue on [our Issue page](https://github.com/allwefantasy/streamingpro/issues),
even if the topic is not related to the source code itself (e.g., documentation, a new idea, or a proposal).

This is an active open source project for everyone,
Binary file added dev/ansj_seg-5.1.6.jar
Binary file not shown.
81 changes: 79 additions & 2 deletions dev/make-distribution.sh
@@ -1,3 +1,80 @@
#!/bin/bash
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

function exit_with_usage {
  cat << EOF
Environment variables
MLSQL_SPARK_VERSION        Spark major version 2.3 2.4 3.0  default 2.4
OSS_ENABLE                 Aliyun OSS                       default false
ENABLE_JYTHON              Jython                           default true
ENABLE_CHINESE_ANALYZER    Chinese NLP                      default true
ENABLE_HIVE_THRIFT_SERVER  Hive ThriftServer                default true
EOF
  exit 1
}

if [[ $@ == *"help"* ]]; then
  exit_with_usage
fi

export LC_ALL=zh_CN.UTF-8
export LANG=zh_CN.UTF-8

## Spark major version
export MLSQL_SPARK_VERSION=${MLSQL_SPARK_VERSION:-2.4}
## Enable Aliyun OSS support, default to false
export OSS_ENABLE=${OSS_ENABLE:-false}
## Enable Jython support
export ENABLE_JYTHON=${ENABLE_JYTHON:-true}
## Including Chinese NLP jars
export ENABLE_CHINESE_ANALYZER=${ENABLE_CHINESE_ANALYZER:-true}
## Including Hive ThriftServer jars
export ENABLE_HIVE_THRIFT_SERVER=${ENABLE_HIVE_THRIFT_SERVER:-true}

## DATASOURCE_INCLUDED is for testing purposes only; therefore false
export DATASOURCE_INCLUDED=false

export DRY_RUN=false
## True means making a distribution package
export DISTRIBUTION=true
./dev/package.sh

echo "Environment variables
MLSQL_SPARK_VERSION ${MLSQL_SPARK_VERSION}
OSS_ENABLE ${OSS_ENABLE}
ENABLE_JYTHON ${ENABLE_JYTHON}
ENABLE_CHINESE_ANALYZER ${ENABLE_CHINESE_ANALYZER}
ENABLE_HIVE_THRIFT_SERVER ${ENABLE_HIVE_THRIFT_SERVER}"

SELF=$(cd $(dirname $0) && pwd)
cd $SELF

if [[ ${MLSQL_SPARK_VERSION} = "2.3" || ${MLSQL_SPARK_VERSION} = "2.4" ]]
then
  ./change-scala-version.sh 2.11
elif [[ ${MLSQL_SPARK_VERSION} = "3.0" ]]
then
  ./change-scala-version.sh 2.12
else
  echo "Spark-${MLSQL_SPARK_VERSION} is not supported"
  exit_with_usage
  exit 1
fi

## Start building
./package.sh
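The Spark-to-Scala selection in this script amounts to a small lookup: Spark 2.3 and 2.4 map to Scala 2.11, and Spark 3.0 maps to Scala 2.12. As a sketch, the same mapping can be written as a `case` statement; the function name `scala_version_for_spark` is hypothetical and does not exist in the repository.

```shell
#!/usr/bin/env bash

# Hypothetical helper mirroring the if/elif chain in make-distribution.sh.
# Prints the Scala version for a Spark major version, or fails for
# versions the script does not list.
scala_version_for_spark() {
  case "$1" in
    2.3|2.4) echo "2.11" ;;
    3.0)     echo "2.12" ;;
    *)
      echo "Spark-$1 is not supported" >&2
      return 1
      ;;
  esac
}

scala_version_for_spark 2.4   # prints 2.11
scala_version_for_spark 3.0   # prints 2.12
```

Keeping the mapping in one function would make it easier to extend when another Spark version is added, since only one `case` arm has to change.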
Binary file added dev/nlp-lang-1.7.8.jar
Binary file not shown.
24 changes: 23 additions & 1 deletion dev/package.sh
@@ -1,5 +1,22 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

function exit_with_usage {
cat << EOF
usage: package
@@ -26,7 +43,6 @@ cd $SELF
cd ..

MLSQL_SPARK_VERSION=${MLSQL_SPARK_VERSION:-2.4}
# SCALA_VERSION=${SCALA_VERSION:-2.11}
DRY_RUN=${DRY_RUN:-false}
DISTRIBUTION=${DISTRIBUTION:-false}
OSS_ENABLE=${OSS_ENABLE:-false}
@@ -45,6 +61,12 @@ for env in MLSQL_SPARK_VERSION DRY_RUN DISTRIBUTION; do
fi
done

## Fail if either of the two required jars is missing
if [[ ${ENABLE_CHINESE_ANALYZER} = true && ( ! -f $SELF/../dev/ansj_seg-5.1.6.jar || ! -f $SELF/../dev/nlp-lang-1.7.8.jar ) ]]
then
  echo "When ENABLE_CHINESE_ANALYZER=true, ansj_seg-5.1.6.jar && nlp-lang-1.7.8.jar should be in ./dev/"
  exit 1
fi

# before we compile and package, correct the version in MLSQLVersion
#---------------------
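The Chinese analyzer check requires both jars to be present in `./dev/`, so the build should stop if either one is missing. One way to express that requirement is a helper that tests each file in turn. This is a sketch under assumptions: `all_jars_present` is a hypothetical name, not a function in `package.sh`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 only if every named jar exists in the
# given directory; reports each missing file on stderr.
all_jars_present() {
  local dir="$1"
  shift
  local jar missing=0
  for jar in "$@"; do
    if [[ ! -f "${dir}/${jar}" ]]; then
      echo "Missing ${dir}/${jar}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Possible use inside a build script:
#   if [[ ${ENABLE_CHINESE_ANALYZER} = true ]] &&
#      ! all_jars_present ./dev ansj_seg-5.1.6.jar nlp-lang-1.7.8.jar; then
#     exit 1
#   fi
```

Checking files one by one also tells the user exactly which jar is absent, which a single combined condition cannot do.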

36 changes: 31 additions & 5 deletions dev/start-local.sh
@@ -1,5 +1,30 @@
#!/bin/bash
#set -x
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

## Environment variables:
## SPARK_HOME
## MLSQL_HOME
##

set -u
set -e
set -o pipefail

for env in SPARK_HOME ; do
if [ -z "${!env}" ]; then
@@ -8,8 +33,10 @@ for env in SPARK_HOME ; do
fi
done

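## This script is deployed in the ${MLSQL_HOME}/bin directory
## ${MLSQL_HOME:-} avoids an "unbound variable" error under set -u
if [ -z "${MLSQL_HOME:-}" ]; then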
export MLSQL_HOME="$(cd "`dirname "$0"`"/.; pwd)"
export MLSQL_HOME="$(cd "`dirname "$0"`"/..; pwd)"
echo "MLSQL_HOME is not set, default to ${MLSQL_HOME}"
fi

JARS=$(echo ${MLSQL_HOME}/libs/*.jar | tr ' ' ',')
@@ -20,7 +47,6 @@ echo
echo "#############"
echo "Run with spark : $SPARK_HOME"
echo "With DRIVER_MEMORY=${DRIVER_MEMORY:-2g}"
echo "Try mannualy to copy https://github.com/allwefantasy/mlsql/blob/master/streamingpro-mlsql/src/main/resources-online/log4j.properties to your SPARK_HOME/CONF"
echo
echo "JARS: ${JARS}"
echo "MAIN_JAR: ${MLSQL_HOME}/libs/${MAIN_JAR}"
@@ -47,4 +73,4 @@ $SPARK_HOME/bin/spark-submit --class streaming.core.StreamingApp \
-streaming.driver.port 9003 \
-streaming.spark.service true \
-streaming.thrift false \
-streaming.enableHiveSupport true
-streaming.enableHiveSupport true
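Because this script enables `set -u`, expanding a variable that is not set aborts the script, so a fallback for `MLSQL_HOME` only takes effect if the check itself uses a default expansion such as `${MLSQL_HOME:-}`. A minimal sketch of the fallback, assuming the script lives in `<home>/bin`; `resolve_home` and `ensure_mlsql_home` are hypothetical names.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical helper: given the path of a script located in <home>/bin,
# print the absolute path of <home>.
resolve_home() {
  (cd "$(dirname "$1")/.." && pwd)
}

# Hypothetical helper: export MLSQL_HOME derived from the script path,
# but only when it is unset or empty.
ensure_mlsql_home() {
  if [ -z "${MLSQL_HOME:-}" ]; then
    MLSQL_HOME="$(resolve_home "$1")"
    export MLSQL_HOME
    echo "MLSQL_HOME is not set, default to ${MLSQL_HOME}"
  fi
}

# Possible use at the top of a start script:
#   ensure_mlsql_home "$0"
```

The subshell in `resolve_home` keeps the `cd` from changing the caller's working directory.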
31 changes: 31 additions & 0 deletions dev/stop-local.sh
@@ -0,0 +1,31 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

############################################################
## Kills Locally run StreamingApp process
##
############################################################

set -u
set -e
set -o pipefail

pid=$(ps aux | grep '[S]treamingApp' | awk '{print $2}' | head -1)
echo "Killing StreamingApp ${pid}"
kill ${pid}
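With `set -e` and `set -o pipefail` enabled, a `grep` that matches nothing makes the whole pipeline fail, so the script exits at the `pid=` assignment without printing anything when no StreamingApp process is running. A possible guard is sketched below; `first_pid_matching` and `stop_streaming_app` are hypothetical names, and the sketch assumes `ps aux`-style output where the PID is the second column.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `ps aux`-style lines on stdin and prints the
# PID (second column) of the first line matching the given regex.
# It always exits 0, so an empty result does not trip `set -e`.
first_pid_matching() {
  awk -v pat="$1" '$0 ~ pat && !found { print $2; found = 1 }'
}

# Hypothetical wrapper: kill the process if one is found, otherwise
# report that nothing is running.
stop_streaming_app() {
  local pid
  pid="$(ps aux | first_pid_matching '[S]treamingApp')"
  if [ -z "${pid}" ]; then
    echo "No StreamingApp process found"
    return 0
  fi
  echo "Killing StreamingApp ${pid}"
  kill "${pid}"
}
```

The `[S]treamingApp` pattern keeps the trick from the original script: the literal text of the pattern does not match itself, so the `awk` process is not reported as a hit.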
2 changes: 1 addition & 1 deletion pom.xml
@@ -746,7 +746,7 @@
</exclusion>
<exclusion>
<groupId>tech.mlsql</groupId>
<artifactId>common-utils_2.12</artifactId>
<artifactId>common-utils_2.11</artifactId>
</exclusion>
<exclusion>
<groupId>org.scala-lang</groupId>
2 changes: 1 addition & 1 deletion streamingpro-assembly/pom.xml
@@ -13,7 +13,7 @@
<packaging>pom</packaging>

<build>
<finalName>${project.parent.artifactId}_${version}</finalName>
<finalName>mlsql-engine_${spark.bigversion}-${project.version}</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
3 changes: 2 additions & 1 deletion streamingpro-assembly/src/main/assembly/assembly.xml
@@ -1,7 +1,7 @@
<assembly>
<id>bin-${project.version}</id>
<formats>
<format>tgz</format>
<format>tar.gz</format>
</formats>
<includeBaseDirectory>true</includeBaseDirectory>

@@ -31,6 +31,7 @@
<directory>${project.parent.basedir}/dev</directory>
<includes>
<include>start-local.sh</include>
<include>stop-local.sh</include>
</includes>
<outputDirectory>bin</outputDirectory>
</fileSet>
