forked from byzer-org/byzer-lang
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 07ec9ca, commit 223011c
Showing 8 changed files with 602 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Python Project Support | ||
|
||
MLSQL supports not only Python UDFs but also Python projects. We use conda to resolve the Python environment, and this | ||
is transparent to users. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Python Environment | ||
|
||
Before you can run your Python project, you should create the environment that your project | ||
depends on. | ||
|
||
It looks like this: | ||
|
||
```sql | ||
set dependencies=''' | ||
name: tutorial4 | ||
dependencies: | ||
  - python=3.6 | ||
  - pip | ||
  - pip: | ||
    - --index-url https://mirrors.aliyun.com/pypi/simple/ | ||
    - numpy==1.14.3 | ||
    - kafka==1.3.5 | ||
    - pyspark==2.3.2 | ||
    - pandas==0.22.0 | ||
'''; | ||
|
||
load script.`dependencies` as dependencies; | ||
|
||
run command as PythonEnvExt.`/tmp/jack` where condaFile="dependencies" and command="create"; | ||
``` | ||
|
||
To remove this environment, set command to `remove`. Make sure that all your machines have conda installed | ||
and that they have a working internet connection. | ||
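One way to check the conda prerequisite on a machine is to look the executable up on the PATH. The following is a minimal sketch using only the Python standard library; the helper name `conda_path` is hypothetical and not part of MLSQL.

```python
import shutil


def conda_path(executable="conda"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = conda_path()
    if path is None:
        print("conda not found on PATH; install it before creating the env")
    else:
        print("conda found at", path)
```

Running it on each node before issuing the `create` command gives an early hint about which machines still need conda.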
|
||
You can also specify condaYamlFilePath, which is the location of conda.yaml. | ||
|
||
If running a Python project fails with an error like `Could not find Conda executable at conda`, you can add the following config to | ||
PythonAlg/PythonParallelExt. | ||
|
||
```sql | ||
-- anaconda3 local path | ||
and systemParam.envs='''{"MLFLOW_CONDA_HOME":"/anaconda3"}'''; | ||
``` |
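The value of `systemParam.envs` shown above is a JSON object mapping environment variable names to values. If you generate MLSQL scripts from Python, a small helper can produce that string. This is only a sketch; the function name `build_envs_param` is hypothetical, and the conda home path is whatever applies on your machines.

```python
import json


def build_envs_param(conda_home):
    """Serialize the environment map as compact JSON for systemParam.envs."""
    return json.dumps({"MLFLOW_CONDA_HOME": conda_home}, separators=(",", ":"))


print(build_envs_param("/anaconda3"))
```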
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# Run a Python Project in Distributed Mode in MLSQL | ||
|
||
## Prerequisites | ||
|
||
If you run in YARN mode, please make sure you start the MLSQL Engine with the following configuration: | ||
|
||
``` | ||
-streaming.ps.cluster.enable should be enabled. | ||
Please make sure | ||
you have the uber-jar of mlsql placed in | ||
1. --jars | ||
2. --conf "spark.executor.extraClassPath=[your jar name in jars]" | ||
for example: | ||
--jars ./streamingpro-mlsql-spark_2.x-x.x.x-SNAPSHOT.jar | ||
--conf "spark.executor.extraClassPath=streamingpro-mlsql-spark_2.x-x.x.x-SNAPSHOT.jar" | ||
Otherwise the executor will | ||
fail to start and the whole application will fail. | ||
``` | ||
|
||
If you run in Standalone mode, please copy the MLSQL jar to every node and then configure: | ||
|
||
``` | ||
--conf "spark.executor.extraClassPath=[MLSQL jar path]" | ||
``` | ||
|
||
## How to use | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Python project standard | ||
|
||
MLSQL is minimally invasive to your Python project. You only need to add two description files to convert the | ||
project into an MLSQL-compatible project. | ||
|
||
Here is the structure of the project: | ||
|
||
``` | ||
examples/sklearn_elasticnet_wine/ | ||
├── MLproject | ||
├── batchPredict.py | ||
├── conda.yaml | ||
├── predict.py | ||
├── train.py | ||
``` | ||
|
||
MLproject describes how to execute the project. | ||
conda.yaml describes how to build the Python environment. | ||
|
||
MLproject contains: | ||
|
||
```yaml | ||
name: tutorial | ||
|
||
conda_env: conda.yaml | ||
|
||
entry_points: | ||
  main: | ||
    train: | ||
      command: "python train.py" | ||
    batch_predict: | ||
      command: "python batchPredict.py" | ||
    api_predict: | ||
      command: "python predict.py" | ||
``` | ||
conda.yaml: | ||
``` | ||
name: tutorial | ||
dependencies: | ||
  - python=3.6 | ||
  - pip | ||
  - pip: | ||
    - --index-url https://mirrors.aliyun.com/pypi/simple/ | ||
    - numpy==1.14.3 | ||
    - kafka==1.3.5 | ||
    - pyspark==2.3.2 | ||
    - pandas==0.22.0 | ||
``` | ||
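Before handing a project to MLSQL, it can help to confirm that the two description files are present. The sketch below checks only for the files named in this document; the helper name `missing_description_files` is hypothetical, and the check says nothing about whether the file contents are valid.

```python
from pathlib import Path

# The two description files this document says a project needs.
REQUIRED_FILES = ("MLproject", "conda.yaml")


def missing_description_files(project_dir):
    """Return the required description files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_description_files("examples/sklearn_elasticnet_wine")
    print("missing:", missing if missing else "none")
```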
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Built-in UDF List | ||
|
||
MLSQL has many built-in UDFs. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# HTTP UDFs | ||
|
||
Before you can use these functions, please add this line to your startup script: | ||
|
||
```sql | ||
-streaming.udf.clzznames streaming.crawler.udf.Functions | ||
``` | ||
|
||
HTTP UDFs make MLSQL more powerful: you can invoke any external or internal API to help you achieve your goal. | ||
|
||
For example: | ||
|
||
```sql | ||
select crawler_http("http://www.csdn.net","GET",map("k1","v1","k2","v2")) as c as output; | ||
``` | ||
|
||
The second parameter supports: | ||
|
||
* GET | ||
* POST | ||
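For readers who want to reproduce such a call outside MLSQL, here is a rough Python sketch of building a GET or POST request with the same three inputs. How `crawler_http` actually transmits the map is not stated here, so the query-string and form-body handling below are assumptions, and `build_request` is a hypothetical helper. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, method, params):
    """Build (but do not send) a GET or POST request carrying params."""
    encoded = urlencode(params)
    if method == "GET":
        # Assumption: GET parameters travel in the query string.
        separator = "&" if "?" in url else "?"
        return Request(url + separator + encoded, method="GET")
    if method == "POST":
        # Assumption: POST parameters travel as a form-encoded body.
        return Request(url, data=encoded.encode("utf-8"), method="POST")
    raise ValueError("unsupported method: " + method)


request = build_request("http://www.csdn.net", "GET", {"k1": "v1", "k2": "v2"})
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call.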
|
||
MLSQL also supports downloading images: | ||
|
||
```sql | ||
select crawler_request_image("http://www.csdn.net","GET",map("k1","v1","k2","v2")) as c as output; | ||
``` | ||
|
||
The type of `c` is array[byte]. | ||
|
||
We also provide UDFs that you can use to extract the title and body from HTML: | ||
|
||
* crawler_auto_extract_body | ||
* crawler_auto_extract_title | ||
|
||
Or you can use XPath to extract whatever you need: | ||
|
||
```sql | ||
|
||
crawler_extract_xpath(html,xpath) | ||
``` |
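To get a feel for what an XPath query selects before using it in MLSQL, you can try it in plain Python. The sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup, so it illustrates the idea rather than the behavior of `crawler_extract_xpath`. The sample page and the helper name `extract_xpath` are made up for this example.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample page used only for this illustration.
HTML = """<html>
  <head><title>Demo page</title></head>
  <body>
    <div class="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
    <div class="footer"><p>Footer text.</p></div>
  </body>
</html>"""


def extract_xpath(html, xpath):
    """Return the text of every element matched by xpath in well-formed markup."""
    root = ET.fromstring(html)
    return [element.text for element in root.findall(xpath)]


print(extract_xpath(HTML, ".//div[@class='content']/p"))
print(extract_xpath(HTML, "./head/title"))
```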