add doc
allwefantasy committed Jan 10, 2019
1 parent 3a8fb29 commit e29dc49
Showing 18 changed files with 770 additions and 10 deletions.
170 changes: 163 additions & 7 deletions docs/gitbook/zh/SUMMARY.md
@@ -1,10 +1,166 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Summary

* [Overview](README.md)
* [User Guide](test1/a.md)
* [Data Sources](guide/datasource/README.md)
* [JDBC](guide/datasource/jdbc.md)
* [Elasticsearch](guide/datasource/elasticsearch.md)
* [API Documentation](test2/a.md)
* [More]()
## TABLE OF CONTENTS

* Overview
* [Introduction to MLSQL](getting_started/README.md)
* [MLSQL-Engine](getting_started/mlsql-engine.md)
* [MLSQL-Cluster](getting_started/mlsql-cluster.md)
* [MLSQL-Console](getting_started/mlsql-console.md)

* User Guide
* [Build, Run & Deploy](installation/README.md)
* [Building from Source](installation/compile.md)
* [Using Docker](installation/docker.md)
* [How to Run](installation/run.md)
* [Startup Parameters](installation/startup-configuration.md)

* [Data Sources](datasource/README.md)
* [JDBC](datasource/jdbc.md)
* [ElasticSearch](datasource/es.md)
* [Solr](datasource/solr.md)
* [HBase](datasource/hbase.md)
* [MongoDB](datasource/mongodb.md)
* [Parquet/Json/Text/Xml/Csv]()
* [jsonStr/script]()
* [mlsqlAPI/mlsqlConf]()
* [Others]()

* [Variable Settings]()
* [Conf]()
* [Shell]()
* [Sql]()
* [String]()

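* [Data Processing]()
* [Select Syntax]()
* [Run Syntax]()
* [Train Syntax]()
* [Save Syntax]()
* [Built-in Estimators/Transformers]()
* [Operating on MySQL Directly]()
* [Computing Complex Parent-Child Relationships]()
* [Changing the Number of Table Partitions]()
* [How to Send Email]()
* [How to Cache a Table]()

* [Creating UDFs/UDAFs]()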
* [Python UDF]()
* [Python UDAF]()
* [Scala UDF]()
* [Scala UDAF]()

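* [List of System UDFs]()
* [HTTP Requests]()
* [Vector Operations]()

* [Python Project Support]()
* [Python Project Conventions]()
* [Running Python Projects in Distributed Mode]()
* [Running Python Projects on a Single Instance]()
* [How to Bundle Resource Files]()

* [Project-Based Scripts]()
* [How Scripts Reference Each Other]()

* [Stream Processing]()
* [Introduction to MLSQL Streaming Concepts]()
* [Data Sources]()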
* [Kafka]()
* [Mock]()
* [How to Convert JSON/CSV into Tables]()
* [Writing Data]()

* [Feature Engineering Components]()

* [Text Vectorization: TfIdf]()
* [Text Vectorization: Word2Vec]()
* [ScalerInPlace]()
* [ConfusionMatrix]()
* [FeatureExtract]()
* [NormalizeInPlace]()
* [ModelExplainInPlace]()
* [Discretizer]()
* [bucketizer]()
* [quantile]()
* [OpenCVImage]()
* [VecMapInPlace]()
* [JavaImage]()
* [TokenExtract / TokenAnalysis]()
* [RateSampler]()
* [RowMatrix]()
* [CommunityBasedSimilarityInPlace]()
* [Word2ArrayInPlace]()
* [WaterMarkInPlace]()
* [MapValues]()

* [Python Algorithms]()
* [SKlearn Integration Example]()
* [TensorFlow Integration Example]()
* [TensorFlow Cluster Support]()

* [MLSQL Built-in Algorithms]()
* [NaiveBayes]()
* [ALS]()
* [RandomForest]()
* [GBTRegressor]()
* [LDA]()
* [KMeans]()
* [FPGrowth]()
* [GBTs]()
* [LSVM]()
* [PageRank]()
* [LogisticRegressor]()
* [XGBoost]()

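* [Deep Learning]()
* [Loading Image Data]()
* [Cifar10 Example]()

* [Deploying Algorithm API Services]()
* [Design and Principles]()
* [Case Study]()

* [Crawler]()
* [Crawler Example]()
* [Design of an MLSQL-Based Crawler System]()

* [Ensuring Data Security]()
* [MLSQL Unified Authorization System]()
* [How to Develop Custom Authorization Rules]()

* [Managing Multiple MLSQL Instances]()
* [MLSQL-Cluster Design and Principles]()
* [MLSQL-Cluster Deployment]()

* Developer Guide
* [How to Contribute]()
* [Developer List]()
* MLSQL in Practice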







33 changes: 33 additions & 0 deletions docs/gitbook/zh/datasource/README.md
@@ -0,0 +1,33 @@
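# Data Sources

MLSQL uses the load syntax to load data sources. Once loaded, a data source can be referenced and queried by its table name.
In essence, the load syntax maps a data source to a table, which can then be processed with standard SQL or by other means.

When testing, we often need to make up some data ourselves, which can be done like this: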

```sql
set rawData='''
{"a":1,"b":2}
{"a":1,"b":3}
''';

load jsonStr.`rawData` as data;

```

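First, we use the set syntax to define a variable. Its value can be wrapped in three `'` characters to hold a large block of text, including line breaks.
Next, we use the jsonStr data source to load this plain text as a table, which we name data.

Read literally, the load statement says: using jsonStr as the data source, load the string variable rawData; the result is a table, which we name data.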
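As a rough mental model of what jsonStr does with the variable, the following Python sketch parses the same text line by line into rows and registers them under a table name. The `load_json_str` helper and the `tables` dictionary are hypothetical names used only for illustration; this is not how MLSQL is implemented.

```python
import json


def load_json_str(text):
    """Parse newline-delimited JSON into a list of row dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Mirrors: set rawData='''...''';
raw_data = '''
{"a":1,"b":2}
{"a":1,"b":3}
'''

# Mirrors: load jsonStr.`rawData` as data;
# (a plain dict stands in for the table registry, which is an assumption)
tables = {"data": load_json_str(raw_data)}
print(tables["data"])
```

Each non-empty line becomes one row, and the keys of each JSON object become the column names.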

We can then use it like this:

```sql
select * from data as newdata;
```

The newdata after the as clause is also a table name; it refers to the new table produced by the select.
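One way to read the `as newdata` clause is as registering the query result under a new name. The sketch below imitates that with Python's built-in `sqlite3` module and a temporary view. It is an analogy under that assumption, not a description of how MLSQL executes the statement, and the column types are made up to match the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the table produced by: load jsonStr.`rawData` as data;
conn.execute("CREATE TABLE data (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [(1, 2), (1, 3)])

# Rough analogue of: select * from data as newdata;
conn.execute("CREATE TEMP VIEW newdata AS SELECT * FROM data")

# The new name can now be queried like any other table.
rows = conn.execute("SELECT a, b FROM newdata ORDER BY b").fetchall()
print(rows)
```

Later statements can refer to newdata in the same way they refer to data.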

Besides jsonStr, we can load ElasticSearch, JDBC (for example MySQL), and other sources in the same way. Later chapters cover them in detail.

37 changes: 37 additions & 0 deletions docs/gitbook/zh/datasource/es.md
@@ -0,0 +1,37 @@
# ElasticSearch

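ElasticSearch is a widely used data system. MLSQL supports loading one of its indices as a table.

Note that the ES package is not included in the default MLSQL distribution, so you need to supply the relevant dependencies via `--jars` before you can use it.

## Loading Data

Example: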

```sql
set data='''
{"jack":"cool"}
''';

load jsonStr.`data` as data1;

save overwrite data1 as es.`twitter/cool` where
`es.index.auto.create`="true"
and es.nodes="127.0.0.1";

load es.`twitter/cool` where
es.nodes="127.0.0.1"
as table1;
select * from table1 as output1;

connect es where `es.index.auto.create`="true"
and es.nodes="127.0.0.1" as es_instance;

load es.`es_instance/twitter/cool`
as table1;
select * from table1 as output2;
```

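In ES, the separator between the connection reference and the table is `/` rather than `.`, because ES index names may contain ".".
For ES-related parameters, see the driver's [official documentation](https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html).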
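To illustrate why `/` is a workable separator, here is a small Python sketch that resolves a path such as `es_instance/twitter/cool` into connection options and a resource. The `connections` registry and the `resolve_es_path` helper are hypothetical, and the rule that the first segment is checked against known connection names is an assumption; MLSQL's actual resolution logic may differ.

```python
# Hypothetical registry standing in for whatever `connect ... as es_instance` stores.
connections = {
    "es_instance": {"es.index.auto.create": "true", "es.nodes": "127.0.0.1"},
}


def resolve_es_path(path, connections):
    """Split off a leading connection reference if the first segment names one."""
    head, sep, rest = path.partition("/")
    if sep and head in connections:
        return connections[head], rest
    return {}, path


opts, resource = resolve_es_path("es_instance/twitter/cool", connections)
print(opts["es.nodes"], resource)

# An index name containing "." stays intact because only "/" is treated as a separator.
opts, resource = resolve_es_path("es_instance/logs.2019/doc", connections)
print(resource)

# Without a known connection reference, the whole path is the resource.
print(resolve_es_path("twitter/cool", connections))
```

Had `.` been the separator, a name like `logs.2019` could not be told apart from a connection reference followed by an index.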

36 changes: 36 additions & 0 deletions docs/gitbook/zh/datasource/hbase.md
@@ -0,0 +1,36 @@
# HBase

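HBase is a widely used storage system. MLSQL supports loading an HBase table as an MLSQL table.

Note that the HBase package is not included in the default MLSQL distribution, so you need to supply the relevant dependencies via `--jars` before you can use it.

MLSQL implements a corresponding driver. The jar can be built as follows: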

```
git clone https://github.com/allwefantasy/streamingpro .
mvn -Pshade -am -pl external/streamingpro-hbase -Pspark-2.4.0 -Pscala-2.11 -Ponline clean package
```

Then pass `external/streamingpro-hbase/target/streamingpro-hbase-x.x.x-SNAPSHOT.jar` via `--jars`.

## Loading Data

Example:

```sql
connect hbase where `zk`="127.0.0.1:2181"
and `family`="cf" as hbase1;

load hbase.`hbase1:mlsql_example`
as mlsql_example;

select * from mlsql_example as show_data;


select '2' as rowkey, 'insert test data' as name as insert_table;

save insert_table as hbase.`hbase1:mlsql_example`;
```

In HBase, the separator between the connection reference and the table is `:` rather than `.`.
