
Commit

[docs-sync] Synchronize the latest documentation changes (commits to 56c6b48) into Chinese documents
wuchong committed Aug 11, 2019
1 parent f400fbb commit edc6826
Showing 18 changed files with 428 additions and 135 deletions.
2 changes: 1 addition & 1 deletion docs/dev/api_concepts.zh.md
@@ -758,7 +758,7 @@ Restrictions apply to classes containing fields that cannot be serialized, like
resources. Classes that follow the Java Beans conventions work well in general.

All classes that are not identified as POJO types (see POJO requirements above) are handled by Flink as general class types.
Flink treats these data types as black boxes and is not able to access their content (i.e., for efficient sorting). General types are de/serialized using the serialization framework [Kryo](https://github.com/EsotericSoftware/kryo).
Flink treats these data types as black boxes and is not able to access their content (e.g., for efficient sorting). General types are de/serialized using the serialization framework [Kryo](https://github.com/EsotericSoftware/kryo).
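As an illustration of this fallback (the class and serializer names below are hypothetical, not part of this page): a type that violates one of the POJO requirements is handled as a general class type, and a custom Kryo serializer can optionally be registered for it.

{% highlight java %}
// Not a POJO: it has no public no-arg constructor, so Flink cannot analyze it
// and treats it as a general ("black box") type serialized via Kryo.
public class SensorReading {
    private final String id;
    private final double value;

    public SensorReading(String id, double value) {
        this.id = id;
        this.value = value;
    }
}

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Optionally plug a custom Kryo serializer into the generic path
// (MySensorReadingKryoSerializer is a placeholder for a com.esotericsoftware.kryo.Serializer).
env.getConfig().registerTypeWithKryoSerializer(SensorReading.class, MySensorReadingKryoSerializer.class);
{% endhighlight %}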

#### Values

5 changes: 5 additions & 0 deletions docs/dev/connectors/filesystem_sink.zh.md
@@ -23,6 +23,11 @@ specific language governing permissions and limitations
under the License.
-->

<div class="alert alert-info" markdown="span">
The `BucketingSink` has been **deprecated since Flink 1.9** and will be removed in subsequent releases.
Please use the [__StreamingFileSink__]({{site.baseurl}}/dev/connectors/streamfile_sink.html) instead.
</div>
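A minimal sketch of the suggested replacement (the output path and the `stream` variable are placeholders; see the StreamingFileSink page linked above for the authoritative API):

{% highlight java %}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Row-encoded sink that writes each String record as a line (placeholder path).
StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("hdfs:///flink/output"), new SimpleStringEncoder<String>("UTF-8"))
    .build();

stream.addSink(sink);   // stream: a DataStream<String> defined elsewhere
{% endhighlight %}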

This connector can write partitioned files to any filesystem supported by [Hadoop FileSystem](http://hadoop.apache.org).
Before using it, add the following dependency to your project:

7 changes: 6 additions & 1 deletion docs/dev/connectors/guarantees.zh.md
@@ -61,6 +61,11 @@ Please read the documentation of each connector to understand the details of the
<td>at most once</td>
<td></td>
</tr>
<tr>
<td>Google PubSub</td>
<td>at least once</td>
<td></td>
</tr>
<tr>
<td>Collections</td>
<td>exactly once</td>
@@ -104,7 +109,7 @@ state updates) of Flink coupled with bundled sinks:
</tr>
<tr>
<td>Kafka producer</td>
<td>at least once/ exactly once</td>
<td>at least once / exactly once</td>
<td>exactly once with transactional producers (v 0.11+)</td>
</tr>
<tr>
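For the Kafka producer row above, a hedged sketch of enabling the transactional exactly-once mode (constructor overloads differ between connector versions, so treat this as an illustration rather than the authoritative API; the topic and broker address are placeholders):

{% highlight java %}
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
// Keep the transaction timeout below the broker's transaction.max.timeout.ms.
props.setProperty("transaction.timeout.ms", "900000");

FlinkKafkaProducer011<String> producer = new FlinkKafkaProducer011<>(
    "my-topic",
    new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
    props,
    FlinkKafkaProducer011.Semantic.EXACTLY_ONCE);   // transactional writes

stream.addSink(producer);   // stream: a DataStream<String> defined elsewhere
{% endhighlight %}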
1 change: 1 addition & 0 deletions docs/dev/connectors/index.zh.md
@@ -46,6 +46,7 @@ under the License.
* [RabbitMQ](rabbitmq.html) (source/sink)
* [Apache NiFi](nifi.html) (source/sink)
* [Twitter Streaming API](twitter.html) (source)
* [Google PubSub](pubsub.html) (source/sink)

Keep in mind that using a connector usually requires additional third-party components, such as a data storage server or a message queue.
Also note that while the connectors listed here are part of the Flink project and are included in the source releases, they are not included in the binary distribution.
10 changes: 6 additions & 4 deletions docs/dev/connectors/pubsub.zh.md
@@ -94,7 +94,7 @@ DataStream<SomeObject> dataStream = (...);

SerializationSchema<SomeObject> serializationSchema = (...);
SinkFunction<SomeObject> pubsubSink = PubSubSink.newBuilder()
.withDeserializationSchema(deserializer)
.withSerializationSchema(serializationSchema)
.withProjectName("project")
.withSubscriptionName("subscription")
.build()
@@ -120,18 +120,20 @@ The following example shows how you would create a source to read messages from
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
String hostAndPort = "localhost:1234";
DeserializationSchema<SomeObject> deserializationSchema = (...);
SourceFunction<SomeObject> pubsubSource = PubSubSource.newBuilder()
.withDeserializationSchema(deserializationSchema)
.withProjectName("my-fake-project")
.withSubscriptionName("subscription")
.withPubSubSubscriberFactory(new PubSubSubscriberFactoryForEmulator("localhost:1234", "my-fake-project", "subscription", 10, Duration.ofSeconds(15), 100))
.withPubSubSubscriberFactory(new PubSubSubscriberFactoryForEmulator(hostAndPort, "my-fake-project", "subscription", 10, Duration.ofSeconds(15), 100))
.build();
SerializationSchema<SomeObject> serializationSchema = (...);
SinkFunction<SomeObject> pubsubSink = PubSubSink.newBuilder()
.withDeserializationSchema(deserializationSchema)
.withSerializationSchema(serializationSchema)
.withProjectName("my-fake-project")
.withSubscriptionName("subscription")
.withHostAndPortForEmulator(getPubSubHostPort())
.withHostAndPortForEmulator(hostAndPort)
.build()

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
22 changes: 18 additions & 4 deletions docs/dev/custom_serializers.zh.md
@@ -62,13 +62,20 @@ env.getConfig().addDefaultKryoSerializer(MyCustomType.class, TBaseSerializer.class);
<dependency>
<groupId>com.twitter</groupId>
<artifactId>chill-thrift</artifactId>
<version>0.5.2</version>
<version>0.7.6</version>
<!-- exclusions for dependency convergence -->
<exclusions>
<exclusion>
<groupId>com.esotericsoftware.kryo</groupId>
<artifactId>kryo</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- libthrift is required by chill-thrift -->
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.6.1</version>
<version>0.11.0</version>
<exclusions>
<exclusion>
<groupId>javax.servlet</groupId>
@@ -90,13 +97,20 @@ env.getConfig().addDefaultKryoSerializer(MyCustomType.class, TBaseSerializer.class);
<dependency>
<groupId>com.twitter</groupId>
<artifactId>chill-protobuf</artifactId>
<version>0.5.2</version>
<version>0.7.6</version>
<!-- exclusions for dependency convergence -->
<exclusions>
<exclusion>
<groupId>com.esotericsoftware.kryo</groupId>
<artifactId>kryo</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- We need protobuf for chill-protobuf -->
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>2.5.0</version>
<version>3.7.0</version>
</dependency>

{% endhighlight %}
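Once the dependencies above are in place, the generated Protobuf class can be registered with Flink's Kryo fallback. A short sketch (the `MyProtobufType` class is a placeholder for a protoc-generated type):

{% highlight java %}
import com.twitter.chill.protobuf.ProtobufSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Route (de)serialization of the generated Protobuf class through chill-protobuf.
env.getConfig().registerTypeWithKryoSerializer(MyProtobufType.class, ProtobufSerializer.class);
{% endhighlight %}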
8 changes: 8 additions & 0 deletions docs/dev/stream/state/schema_evolution.zh.md
@@ -95,4 +95,12 @@ Flink fully supports evolving Avro state types, as long as the schema change is

One exception is that if the classes generated from the new Avro schema cannot be relocated or use a different namespace, the state data will be considered incompatible when the job is restored.

{% warn Attention %} Schema evolution of keys is not supported.

Example: the RocksDB state backend relies on binary object identity rather than on the `hashCode` method implementation. Any change to the key object's structure could lead to non-deterministic behaviour.
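As a sketch of the distinction (all types and variables below are hypothetical): the value type of keyed state may evolve under the supported serializers, while the key type must keep a stable binary layout.

{% highlight java %}
// Hypothetical POJO used as the keyed-state VALUE; new fields may be added in a
// later version, because schema evolution is supported for value types.
public class UserProfile {
    public String name;
    public long visits;

    public UserProfile() {}   // POJO rules: public no-arg constructor
}

// State descriptor for the evolving value type ...
ValueStateDescriptor<UserProfile> profileState =
    new ValueStateDescriptor<>("profile", UserProfile.class);

// ... while the KEY type (here a plain String produced by keyBy) must never change
// its structure, because RocksDB matches keys on their serialized bytes.
KeyedStream<Event, String> keyed = events.keyBy(event -> event.getUserId());
{% endhighlight %}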

{% warn Attention %} **Kryo** cannot be used for schema evolution.

When Kryo is used, the framework cannot verify whether any incompatible changes have been made.

{% top %}
69 changes: 36 additions & 33 deletions docs/dev/table/catalog.zh.md
@@ -63,7 +63,9 @@ select * from mydb2.myTable2

`CatalogManager` always has a built-in `GenericInMemoryCatalog` named `default_catalog`, which has a built-in default database named `default_database`. If no other catalog and database are explicitly set, they will be the current catalog and current database by default. All temporary meta-objects, such as those defined by `TableEnvironment#registerTable`, are registered to this catalog.

Users can set current catalog and database via `TableEnvironment.useCatalog(...)` and `TableEnvironment.useDatabase(...)` in Table API, or `USE CATALOG ...` and `USE DATABASE ...` in Flink SQL.
Users can set current catalog and database via `TableEnvironment.useCatalog(...)` and
`TableEnvironment.useDatabase(...)` in Table API, or `USE CATALOG ...` and `USE ...` in Flink SQL
Client.
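A short sketch of switching the current catalog and database (the catalog and database names are placeholders):

{% highlight java %}
TableEnvironment tableEnv = ...;   // created from EnvironmentSettings elsewhere

// Table API
tableEnv.useCatalog("myhive");
tableEnv.useDatabase("mydb");

// Flink SQL Client equivalent:
//   USE CATALOG myhive;
//   USE mydb;
{% endhighlight %}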


Catalog Types
@@ -95,25 +97,17 @@ The ultimate goal for integrating Flink with Hive metadata is that:

2. Meta-objects created by `HiveCatalog` can be written back to Hive metastore such that Hive and other Hive-compatible applications can consume.

## User-configured Catalog

Catalogs are pluggable. Users can develop custom catalogs by implementing the `Catalog` interface, which defines a set of APIs for reading and writing catalog meta-objects such as database, tables, partitions, views, and functions.


HiveCatalog
-----------

## Supported Hive Versions
### Supported Hive Versions

Flink's `HiveCatalog` officially supports Hive 2.3.4 and 1.2.1.

The Hive version is explicitly specified as a String, either by passing it to the constructor when creating `HiveCatalog` instances directly in the Table API or by specifying it in the yaml config file in the SQL CLI. The Hive version strings are `2.3.4` and `1.2.1`.
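A brief sketch of passing the version string to the constructor (paths and names are placeholders; the constructor shape follows the 1.9 catalog documentation, but check it against your release):

{% highlight java %}
String name            = "myhive";
String defaultDatabase = "default";
String hiveConfDir     = "/opt/hive-conf";   // directory containing hive-site.xml
String version         = "2.3.4";            // the explicit Hive version string

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
tableEnv.registerCatalog(name, hive);
{% endhighlight %}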

## Case Insensitive to Meta-Object Names
### Case Insensitive to Meta-Object Names

Note that Hive Metastore stores meta-object names in lower case. Thus, unlike `GenericInMemoryCatalog`, `HiveCatalog` is case-insensitive to meta-object names, and users need to be cautious about that.

## Dependencies
### Dependencies

To use `HiveCatalog`, users need to include the following dependency jars.

@@ -138,6 +132,7 @@ For Hive 1.2.1, users need:
- hive-metastore-1.2.1.jar
- hive-exec-1.2.1.jar
- libfb303-0.9.3.jar
// Hadoop dependencies
@@ -154,35 +149,36 @@ If you don't have Hive dependencies at hand, they can be found at [mvnrepository.
Note that users need to make sure their Hive and Hadoop versions are compatible. Otherwise, there may be potential problems; for example, Hive 2.3.4 is compiled against Hadoop 2.7.2, so you may run into problems when using it with Hadoop 2.4.


## Data Type Mapping
### Data Type Mapping

For both Flink and Hive tables, `HiveCatalog` stores table schemas by mapping them to Hive table schemas with Hive data types. Types are dynamically mapped back on read.

Currently `HiveCatalog` supports most Flink data types with the following mapping:

| Flink Data Type | Hive Data Type |
|---|---|
| CHAR(p) | char(p)* |
| VARCHAR(p) | varchar(p)** |
| STRING | string |
| BOOLEAN | boolean |
| BYTE | tinyint |
| SHORT | smallint |
| INT | int |
| BIGINT | long |
| FLOAT | float |
| DOUBLE | double |
| DECIMAL(p, s) | decimal(p, s) |
| DATE | date |
| TIMESTAMP_WITHOUT_TIME_ZONE | Timestamp |
| CHAR(p) | CHAR(p)* |
| VARCHAR(p) | VARCHAR(p)** |
| STRING | STRING |
| BOOLEAN | BOOLEAN |
| TINYINT | TINYINT |
| SMALLINT | SMALLINT |
| INT | INT |
| BIGINT | LONG |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| DECIMAL(p, s) | DECIMAL(p, s) |
| DATE | DATE |
| TIMESTAMP_WITHOUT_TIME_ZONE | TIMESTAMP |
| TIMESTAMP_WITH_TIME_ZONE | N/A |
| TIMESTAMP_WITH_LOCAL_TIME_ZONE | N/A |
| INTERVAL | N/A |
| BINARY | binary |
| VARBINARY(p) | binary |
| ARRAY\<E> | list\<E> |
| MAP<K, V> | map<K, V> |
| ROW | struct |
| INTERVAL | N/A*** |
| BINARY | N/A |
| VARBINARY(p) | N/A |
| BYTES | BINARY |
| ARRAY\<E> | ARRAY\<E> |
| MAP<K, V> | MAP<K, V> ****|
| ROW | STRUCT |
| MULTISET | N/A |


Expand All @@ -194,6 +190,13 @@ The following limitations in Hive's data types impact the mapping between Flink

\** maximum length is 65535

\*** The `INTERVAL` type cannot be mapped to the Hive `INTERVAL` type for now.

\**** Hive's map key type only allows primitive types, while Flink's map key can be any data type.

## User-configured Catalog

Catalogs are pluggable. Users can develop custom catalogs by implementing the `Catalog` interface, which defines a set of APIs for reading and writing catalog meta-objects such as database, tables, partitions, views, and functions.

Catalog Registration
--------------------
@@ -333,7 +336,7 @@ default_catalog
# ------ Set default catalog and database ------

Flink SQL> use catalog myHive1;
Flink SQL> use database myDb;
Flink SQL> use myDb;

# ------ Access Hive metadata ------

103 changes: 103 additions & 0 deletions docs/dev/table/config.zh.md
@@ -0,0 +1,103 @@
---
title: "配置"
nav-parent_id: tableapi
nav-pos: 150
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

By default, the Table & SQL API is preconfigured for producing accurate results with acceptable
performance.

Depending on the requirements of a table program, it might be necessary to adjust
certain parameters for optimization. For example, unbounded streaming programs may need to ensure
that the required state size is capped (see [streaming concepts](./streaming/query_configuration.html)).

* This will be replaced by the TOC
{:toc}

### Overview

In every table environment, the `TableConfig` offers options for configuring the current session.

For common or important configuration options, the `TableConfig` provides getter and setter methods
with detailed inline documentation.

For more advanced configuration, users can directly access the underlying key-value map. The following
sections list all available options that can be used to adjust Flink Table & SQL API programs.

<span class="label label-danger">Attention</span> Because options are read at different point in time
when performing operations, it is recommended to set configuration options early after instantiating a
table environment.

<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
// instantiate table environment
TableEnvironment tEnv = ...

// access high-level configuration
TableConfig tableConfig = tEnv.getConfig();
// set low-level key-value options
Configuration configuration = tableConfig.getConfiguration();
configuration.setString("table.exec.mini-batch.enabled", "true");
configuration.setString("table.exec.mini-batch.allow-latency", "5 s");
configuration.setString("table.exec.mini-batch.size", "5000");
{% endhighlight %}
</div>

<div data-lang="scala" markdown="1">
{% highlight scala %}
// instantiate table environment
val tEnv: TableEnvironment = ...

// access high-level configuration
val tableConfig = tEnv.getConfig
// set low-level key-value options
val configuration = tableConfig.getConfiguration
configuration.setString("table.exec.mini-batch.enabled", "true")
configuration.setString("table.exec.mini-batch.allow-latency", "5 s")
configuration.setString("table.exec.mini-batch.size", "5000")
{% endhighlight %}
</div>

<div data-lang="python" markdown="1">
{% highlight python %}
# instantiate table environment
t_env = ...

# access high-level configuration
table_config = t_env.get_config()
# set low-level key-value options
configuration = table_config.get_configuration()
configuration.set_string("table.exec.mini-batch.enabled", "true")
configuration.set_string("table.exec.mini-batch.allow-latency", "5 s")
configuration.set_string("table.exec.mini-batch.size", "5000")
{% endhighlight %}
</div>
</div>

<span class="label label-danger">Attention</span> Currently, key-value options are only supported
for the Blink planner.

### Execution Options

The following options can be used to tune the performance of the query execution.

{% include generated/execution_config_configuration.html %}

### Optimizer Options

The following options can be used to adjust the behavior of the query optimizer to get a better execution plan.

{% include generated/optimizer_config_configuration.html %}