
Commit: update doc
allwefantasy committed Apr 26, 2019
1 parent 07ec9ca commit 223011c
Showing 8 changed files with 602 additions and 1 deletion.
10 changes: 9 additions & 1 deletion docs/gitbook/en/SUMMARY.md
Expand Up @@ -33,7 +33,15 @@
* [Python UDF](guide/udf/python-udf.md)
* [Scala UDF](guide/udf/scala-udf.md)
* [Scala UDAF](guide/udf/scala-udaf.md)
* [JAVA UDF](guide/udf/java-udf.md)
* [JAVA UDF](guide/udf/java-udf.md)

* [Built-in UDF](guide/system_udf/README.md)
* [Http UDF](guide/system_udf/http-udf.md)
* [Common UDF](guide/system_udf/http-udf.md)

* [Python project](guide/python/README.md)
* [Project Standard](guide/python/project.md)
* [Python environment](guide/python/env.md)

* [API Docs](test2/a.md)
* [More]()
Expand Down
4 changes: 4 additions & 0 deletions docs/gitbook/en/guide/python/README.md
@@ -0,0 +1,4 @@
# Python Project Support

MLSQL supports not only Python UDFs but also Python projects. We use conda to resolve the Python
environment, and this is transparent to users.
38 changes: 38 additions & 0 deletions docs/gitbook/en/guide/python/env.md
@@ -0,0 +1,38 @@
# Python Environment

Before you can run your Python project, you should create the environment on which your project
depends.

It looks like this:

```sql
set dependencies='''
name: tutorial4
dependencies:
- python=3.6
- pip
- pip:
- --index-url https://mirrors.aliyun.com/pypi/simple/
- numpy==1.14.3
- kafka==1.3.5
- pyspark==2.3.2
- pandas==0.22.0
''';

load script.`dependencies` as dependencies;

run command as PythonEnvExt.`/tmp/jack` where condaFile="dependencies" and command="create";
```

If you want to remove this environment, set `command` to `remove`. Note that you should make sure all your machines have conda installed
and a working internet connection.
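For instance, removing the environment created above could look like this (a sketch that reuses the `dependencies` script and the `/tmp/jack` path from the example above):

```sql
-- drop the conda environment that was created above
load script.`dependencies` as dependencies;

run command as PythonEnvExt.`/tmp/jack` where condaFile="dependencies" and command="remove";
```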

You can also specify `condaYamlFilePath`, which is the location of your conda.yaml file.

If running a Python project fails with errors like `Could not find Conda executable at conda`, you can add this configuration to
PythonAlg/PythonParallelExt:

```sql
-- anaconda3 local path
and systemParam.envs='''{"MLFLOW_CONDA_HOME":"/anaconda3"}''';
```
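Note that the fragment above is not a standalone statement; it is appended to a statement's `where` clause. The statement below is a hypothetical sketch (the table name `data` and the path `/tmp/model` are illustrative, not part of the original example):

```sql
-- hypothetical train statement; only the systemParam.envs clause is from the snippet above
train data as PythonAlg.`/tmp/model`
where systemParam.envs='''{"MLFLOW_CONDA_HOME":"/anaconda3"}''';
```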
31 changes: 31 additions & 0 deletions docs/gitbook/en/guide/python/process.md
@@ -0,0 +1,31 @@
# Distributed Execution of Python Projects in MLSQL

## Prerequisites

If you run in YARN mode, please make sure you start the MLSQL Engine with the following configuration:

```
-streaming.ps.cluster.enable should be enabled.

Please make sure you have the uber-jar of MLSQL placed in:

1. --jars
2. --conf "spark.executor.extraClassPath=[your jar name in --jars]"

For example:

--jars ./streamingpro-mlsql-spark_2.x-x.x.x-SNAPSHOT.jar
--conf "spark.executor.extraClassPath=streamingpro-mlsql-spark_2.x-x.x.x-SNAPSHOT.jar"

Otherwise the executors will fail to start and the whole application will fail.
```

If you run in Standalone mode, please copy the MLSQL jar to every node and then configure:

```
--conf "spark.executor.extraClassPath=[MLSQL jar path]"
```

## How to use

52 changes: 52 additions & 0 deletions docs/gitbook/en/guide/python/project.md
@@ -0,0 +1,52 @@
# Python project standard

MLSQL is minimally invasive to your Python project: add two description files and the
project becomes MLSQL-compatible.

Here is the structure of project:

```
examples/sklearn_elasticnet_wine/
├── MLproject
├── batchPredict.py
├── conda.yaml
├── predict.py
├── train.py
```

MLproject describes how to execute the project, and conda.yaml describes how to build the Python
environment.

MLproject contains:

```yaml
name: tutorial

conda_env: conda.yaml

entry_points:
main:
train:
command: "python train.py"
batch_predict:
command: "python batchPredict.py"
api_predict:
command: "python predict.py"
```
conda.yaml:

```yaml
name: tutorial
dependencies:
- python=3.6
- pip
- pip:
- --index-url https://mirrors.aliyun.com/pypi/simple/
- numpy==1.14.3
- kafka==1.3.5
- pyspark==2.3.2
- pandas==0.22.0
```
3 changes: 3 additions & 0 deletions docs/gitbook/en/guide/system_udf/README.md
@@ -0,0 +1,3 @@
# Built-in UDF List

MLSQL ships with many built-in UDFs.
40 changes: 40 additions & 0 deletions docs/gitbook/en/guide/system_udf/http-udf.md
@@ -0,0 +1,40 @@
# HTTP UDFs

Before you can use these functions, please add this line to your startup script:

```
-streaming.udf.clzznames streaming.crawler.udf.Functions
```

HTTP UDFs make MLSQL more powerful: you can invoke any external or internal API to help you achieve your goal.

For example:

```sql
select crawler_http("http://www.csdn.net","GET",map("k1","v1","k2","v2")) as c as output;
```

The second parameter supports:

* GET
* POST

MLSQL also supports downloading images:

```sql
select crawler_request_image("http://www.csdn.net","GET",map("k1","v1","k2","v2")) as c as output;
```

The resulting column `c` is of type array[byte].

We also provide UDFs that you can use to extract the title and body from HTML:

* crawler_auto_extract_body
* crawler_auto_extract_title
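For example, these two UDFs can be combined with `crawler_http` (a sketch reusing the URL from the examples above):

```sql
select crawler_auto_extract_title(crawler_http("http://www.csdn.net","GET",map("k1","v1"))) as title,
       crawler_auto_extract_body(crawler_http("http://www.csdn.net","GET",map("k1","v1"))) as body
as output;
```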

Or you can use XPath to extract whatever you need:

```sql
crawler_extract_xpath(html, xpath)
```