
Commit

[docs-sync] Synchronize the latest documentation changes (commits to 56c6b48) into Chinese documents
wuchong committed Aug 11, 2019
1 parent f400fbb commit edc6826
Showing 18 changed files with 428 additions and 135 deletions.
2 changes: 1 addition & 1 deletion docs/dev/api_concepts.zh.md
@@ -758,7 +758,7 @@ Restrictions apply to classes containing fields that cannot be serialized, like
resources. Classes that follow the Java Beans conventions work well in general.

All classes that are not identified as POJO types (see POJO requirements above) are handled by Flink as general class types.
Flink treats these data types as black boxes and is not able to access their content (i.e., for efficient sorting). General types are de/serialized using the serialization framework [Kryo](https://github.com/EsotericSoftware/kryo).
Flink treats these data types as black boxes and is not able to access their content (e.g., for efficient sorting). General types are de/serialized using the serialization framework [Kryo](https://github.com/EsotericSoftware/kryo).
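As an illustration of this fallback (the class and serializer names below are hypothetical, not part of this page): a type that violates one of the POJO requirements is handled as a general class type, and a custom Kryo serializer can optionally be registered for it.

{% highlight java %}
// Not a POJO: it has no public no-arg constructor, so Flink cannot analyze it
// and treats it as a general ("black box") type serialized via Kryo.
public class SensorReading {
    private final String id;
    private final double value;

    public SensorReading(String id, double value) {
        this.id = id;
        this.value = value;
    }
}

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Optionally plug a custom Kryo serializer into the generic path
// (MySensorReadingKryoSerializer is a placeholder for a com.esotericsoftware.kryo.Serializer).
env.getConfig().registerTypeWithKryoSerializer(SensorReading.class, MySensorReadingKryoSerializer.class);
{% endhighlight %}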

#### Values

5 changes: 5 additions & 0 deletions docs/dev/connectors/filesystem_sink.zh.md
@@ -23,6 +23,11 @@ specific language governing permissions and limitations
under the License.
-->

<div class="alert alert-info" markdown="span">
The `BucketingSink` has been **deprecated since Flink 1.9** and will be removed in subsequent releases.
Please use the [__StreamingFileSink__]({{site.baseurl}}/dev/connectors/streamfile_sink.html) instead.
</div>
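A minimal sketch of the suggested replacement (the output path and the `stream` variable are placeholders; see the StreamingFileSink page linked above for the authoritative API):

{% highlight java %}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Row-encoded sink that writes each String record as a line (placeholder path).
StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("hdfs:///flink/output"), new SimpleStringEncoder<String>("UTF-8"))
    .build();

stream.addSink(sink);   // stream: a DataStream<String> defined elsewhere
{% endhighlight %}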

This connector can write partitioned files to any filesystem supported by [Hadoop FileSystem](http://hadoop.apache.org).
Before using it, add the following dependency to your project:

7 changes: 6 additions & 1 deletion docs/dev/connectors/guarantees.zh.md
@@ -61,6 +61,11 @@ Please read the documentation of each connector to understand the details of the
<td>at most once</td>
<td></td>
</tr>
<tr>
<td>Google PubSub</td>
<td>at least once</td>
<td></td>
</tr>
<tr>
<td>Collections</td>
<td>exactly once</td>
@@ -104,7 +109,7 @@ state updates) of Flink coupled with bundled sinks:
</tr>
<tr>
<td>Kafka producer</td>
<td>at least once/ exactly once</td>
<td>at least once / exactly once</td>
<td>exactly once with transactional producers (v 0.11+)</td>
</tr>
<tr>
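For the Kafka producer row above, a hedged sketch of enabling the transactional exactly-once mode (constructor overloads differ between connector versions, so treat this as an illustration rather than the authoritative API; the topic and broker address are placeholders):

{% highlight java %}
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
// Keep the transaction timeout below the broker's transaction.max.timeout.ms.
props.setProperty("transaction.timeout.ms", "900000");

FlinkKafkaProducer011<String> producer = new FlinkKafkaProducer011<>(
    "my-topic",
    new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
    props,
    FlinkKafkaProducer011.Semantic.EXACTLY_ONCE);   // transactional writes

stream.addSink(producer);   // stream: a DataStream<String> defined elsewhere
{% endhighlight %}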
1 change: 1 addition & 0 deletions docs/dev/connectors/index.zh.md
@@ -46,6 +46,7 @@ under the License.
* [RabbitMQ](rabbitmq.html) (source/sink)
* [Apache NiFi](nifi.html) (source/sink)
* [Twitter Streaming API](twitter.html) (source)
* [Google PubSub](pubsub.html) (source/sink)

Keep in mind that using a connector usually requires additional third-party components, such as a data storage server or a message queue.
Also note that while the connectors listed here are part of the Flink project and are included in the source releases, they are not included in the binary distribution.
10 changes: 6 additions & 4 deletions docs/dev/connectors/pubsub.zh.md
@@ -94,7 +94,7 @@ DataStream<SomeObject> dataStream = (...);

SerializationSchema<SomeObject> serializationSchema = (...);
SinkFunction<SomeObject> pubsubSink = PubSubSink.newBuilder()
.withDeserializationSchema(deserializer)
.withSerializationSchema(serializationSchema)
.withProjectName("project")
.withSubscriptionName("subscription")
.build()
@@ -120,18 +120,20 @@ The following example shows how you would create a source to read messages from
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
String hostAndPort = "localhost:1234";
DeserializationSchema<SomeObject> deserializationSchema = (...);
SourceFunction<SomeObject> pubsubSource = PubSubSource.newBuilder()
.withDeserializationSchema(deserializationSchema)
.withProjectName("my-fake-project")
.withSubscriptionName("subscription")
.withPubSubSubscriberFactory(new PubSubSubscriberFactoryForEmulator("localhost:1234", "my-fake-project", "subscription", 10, Duration.ofSeconds(15), 100))
.withPubSubSubscriberFactory(new PubSubSubscriberFactoryForEmulator(hostAndPort, "my-fake-project", "subscription", 10, Duration.ofSeconds(15), 100))
.build();
SerializationSchema<SomeObject> serializationSchema = (...);
SinkFunction<SomeObject> pubsubSink = PubSubSink.newBuilder()
.withDeserializationSchema(deserializationSchema)
.withSerializationSchema(serializationSchema)
.withProjectName("my-fake-project")
.withSubscriptionName("subscription")
.withHostAndPortForEmulator(getPubSubHostPort())
.withHostAndPortForEmulator(hostAndPort)
.build()

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
22 changes: 18 additions & 4 deletions docs/dev/custom_serializers.zh.md
@@ -62,13 +62,20 @@ env.getConfig().addDefaultKryoSerializer(MyCustomType.class, TBaseSerializer.class);
<dependency>
<groupId>com.twitter</groupId>
<artifactId>chill-thrift</artifactId>
<version>0.5.2</version>
<version>0.7.6</version>
<!-- exclusions for dependency convergence -->
<exclusions>
<exclusion>
<groupId>com.esotericsoftware.kryo</groupId>
<artifactId>kryo</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- libthrift is required by chill-thrift -->
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libthrift</artifactId>
<version>0.6.1</version>
<version>0.11.0</version>
<exclusions>
<exclusion>
<groupId>javax.servlet</groupId>
@@ -90,13 +97,20 @@ env.getConfig().addDefaultKryoSerializer(MyCustomType.class, TBaseSerializer.class);
<dependency>
<groupId>com.twitter</groupId>
<artifactId>chill-protobuf</artifactId>
<version>0.5.2</version>
<version>0.7.6</version>
<!-- exclusions for dependency convergence -->
<exclusions>
<exclusion>
<groupId>com.esotericsoftware.kryo</groupId>
<artifactId>kryo</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- We need protobuf for chill-protobuf -->
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>2.5.0</version>
<version>3.7.0</version>
</dependency>

{% endhighlight %}
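Once the dependencies above are in place, the generated Protobuf class can be registered with Flink's Kryo fallback. A short sketch (the `MyProtobufType` class is a placeholder for a protoc-generated type):

{% highlight java %}
import com.twitter.chill.protobuf.ProtobufSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Route (de)serialization of the generated Protobuf class through chill-protobuf.
env.getConfig().registerTypeWithKryoSerializer(MyProtobufType.class, ProtobufSerializer.class);
{% endhighlight %}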
8 changes: 8 additions & 0 deletions docs/dev/stream/state/schema_evolution.zh.md
@@ -95,4 +95,12 @@ Flink fully supports evolving Avro state types, as long as the schema change is

One exception is that if the classes generated from the new Avro schema cannot be relocated or use a different namespace, the state data will be considered incompatible when the job is restored.

{% warn Attention %} Schema evolution of keys is not supported.

Example: the RocksDB state backend relies on binary object identity rather than on the `hashCode` method implementation. Any change to the key object's structure could lead to non-deterministic behaviour.
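As a sketch of the distinction (all types and variables below are hypothetical): the value type of keyed state may evolve under the supported serializers, while the key type must keep a stable binary layout.

{% highlight java %}
// Hypothetical POJO used as the keyed-state VALUE; new fields may be added in a
// later version, because schema evolution is supported for value types.
public class UserProfile {
    public String name;
    public long visits;

    public UserProfile() {}   // POJO rules: public no-arg constructor
}

// State descriptor for the evolving value type ...
ValueStateDescriptor<UserProfile> profileState =
    new ValueStateDescriptor<>("profile", UserProfile.class);

// ... while the KEY type (here a plain String produced by keyBy) must never change
// its structure, because RocksDB matches keys on their serialized bytes.
KeyedStream<Event, String> keyed = events.keyBy(event -> event.getUserId());
{% endhighlight %}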

{% warn Attention %} **Kryo** cannot be used for schema evolution.

When Kryo is used, the framework cannot verify whether any incompatible changes have been made.

{% top %}
69 changes: 36 additions & 33 deletions docs/dev/table/catalog.zh.md
@@ -63,7 +63,9 @@ select * from mydb2.myTable2

`CatalogManager` always has a built-in `GenericInMemoryCatalog` named `default_catalog`, which has a built-in default database named `default_database`. If no other catalog and database are explicitly set, they will be the current catalog and current database by default. All temporary meta-objects, such as those defined by `TableEnvironment#registerTable`, are registered to this catalog.

Users can set current catalog and database via `TableEnvironment.useCatalog(...)` and `TableEnvironment.useDatabase(...)` in Table API, or `USE CATALOG ...` and `USE DATABASE ...` in Flink SQL.
Users can set current catalog and database via `TableEnvironment.useCatalog(...)` and
`TableEnvironment.useDatabase(...)` in Table API, or `USE CATALOG ...` and `USE ...` in Flink SQL
Client.
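A short sketch of switching the current catalog and database (the catalog and database names are placeholders):

{% highlight java %}
TableEnvironment tableEnv = ...;   // created from EnvironmentSettings elsewhere

// Table API
tableEnv.useCatalog("myhive");
tableEnv.useDatabase("mydb");

// Flink SQL Client equivalent:
//   USE CATALOG myhive;
//   USE mydb;
{% endhighlight %}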


Catalog Types
@@ -95,25 +97,17 @@ The ultimate goal for integrating Flink with Hive metadata is that:

2. Meta-objects created by `HiveCatalog` can be written back to Hive metastore such that Hive and other Hive-compatible applications can consume.

## User-configured Catalog

Catalogs are pluggable. Users can develop custom catalogs by implementing the `Catalog` interface, which defines a set of APIs for reading and writing catalog meta-objects such as database, tables, partitions, views, and functions.


HiveCatalog
-----------

## Supported Hive Versions
### Supported Hive Versions

Flink's `HiveCatalog` officially supports Hive 2.3.4 and 1.2.1.

The Hive version is explicitly specified as a String, either by passing it to the constructor when creating `HiveCatalog` instances directly in the Table API or by specifying it in the yaml config file in the SQL CLI. The Hive version strings are `2.3.4` and `1.2.1`.
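A brief sketch of passing the version string to the constructor (paths and names are placeholders; the constructor shape follows the 1.9 catalog documentation, but check it against your release):

{% highlight java %}
String name            = "myhive";
String defaultDatabase = "default";
String hiveConfDir     = "/opt/hive-conf";   // directory containing hive-site.xml
String version         = "2.3.4";            // the explicit Hive version string

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
tableEnv.registerCatalog(name, hive);
{% endhighlight %}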

## Case Insensitive to Meta-Object Names
### Case Insensitive to Meta-Object Names

Note that Hive Metastore stores meta-object names in lower case. Thus, unlike `GenericInMemoryCatalog`, `HiveCatalog` is case-insensitive to meta-object names, and users need to be cautious about that.

## Dependencies
### Dependencies

To use `HiveCatalog`, users need to include the following dependency jars.

@@ -138,6 +132,7 @@ For Hive 1.2.1, users need:
- hive-metastore-1.2.1.jar
- hive-exec-1.2.1.jar
- libfb303-0.9.3.jar
// Hadoop dependencies
@@ -154,35 +149,36 @@ If you don't have Hive dependencies at hand, they can be found at [mvnrepository.
Note that users need to make sure their Hive and Hadoop versions are compatible. Otherwise, there may be potential problems; for example, Hive 2.3.4 is compiled against Hadoop 2.7.2, so you may run into problems when using it with Hadoop 2.4.


## Data Type Mapping
### Data Type Mapping

For both Flink and Hive tables, `HiveCatalog` stores table schemas by mapping them to Hive table schemas with Hive data types. Types are dynamically mapped back on read.

Currently `HiveCatalog` supports most Flink data types with the following mapping:

| Flink Data Type | Hive Data Type |
|---|---|
| CHAR(p) | char(p)* |
| VARCHAR(p) | varchar(p)** |
| STRING | string |
| BOOLEAN | boolean |
| BYTE | tinyint |
| SHORT | smallint |
| INT | int |
| BIGINT | long |
| FLOAT | float |
| DOUBLE | double |
| DECIMAL(p, s) | decimal(p, s) |
| DATE | date |
| TIMESTAMP_WITHOUT_TIME_ZONE | Timestamp |
| CHAR(p) | CHAR(p)* |
| VARCHAR(p) | VARCHAR(p)** |
| STRING | STRING |
| BOOLEAN | BOOLEAN |
| TINYINT | TINYINT |
| SMALLINT | SMALLINT |
| INT | INT |
| BIGINT | LONG |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| DECIMAL(p, s) | DECIMAL(p, s) |
| DATE | DATE |
| TIMESTAMP_WITHOUT_TIME_ZONE | TIMESTAMP |
| TIMESTAMP_WITH_TIME_ZONE | N/A |
| TIMESTAMP_WITH_LOCAL_TIME_ZONE | N/A |
| INTERVAL | N/A |
| BINARY | binary |
| VARBINARY(p) | binary |
| ARRAY\<E> | list\<E> |
| MAP<K, V> | map<K, V> |
| ROW | struct |
| INTERVAL | N/A*** |
| BINARY | N/A |
| VARBINARY(p) | N/A |
| BYTES | BINARY |
| ARRAY\<E> | ARRAY\<E> |
| MAP<K, V> | MAP<K, V> ****|
| ROW | STRUCT |
| MULTISET | N/A |


Expand All @@ -194,6 +190,13 @@ The following limitations in Hive's data types impact the mapping between Flink

\** maximum length is 65535

\*** The `INTERVAL` type cannot be mapped to the Hive `INTERVAL` type for now.

\**** Hive's map key type only allows primitive types, while Flink's map key can be any data type.

## User-configured Catalog

Catalogs are pluggable. Users can develop custom catalogs by implementing the `Catalog` interface, which defines a set of APIs for reading and writing catalog meta-objects such as database, tables, partitions, views, and functions.

Catalog Registration
--------------------
@@ -333,7 +336,7 @@ default_catalog
# ------ Set default catalog and database ------

Flink SQL> use catalog myHive1;
Flink SQL> use database myDb;
Flink SQL> use myDb;

# ------ Access Hive metadata ------

103 changes: 103 additions & 0 deletions docs/dev/table/config.zh.md
@@ -0,0 +1,103 @@
---
title: "配置"
nav-parent_id: tableapi
nav-pos: 150
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

By default, the Table & SQL API is preconfigured for producing accurate results with acceptable
performance.

Depending on the requirements of a table program, it might be necessary to adjust
certain parameters for optimization. For example, unbounded streaming programs may need to ensure
that the required state size is capped (see [streaming concepts](./streaming/query_configuration.html)).

* This will be replaced by the TOC
{:toc}

### Overview

In every table environment, the `TableConfig` offers options for configuring the current session.

For common or important configuration options, the `TableConfig` provides getter and setter methods
with detailed inline documentation.

For more advanced configuration, users can directly access the underlying key-value map. The following
sections list all available options that can be used to adjust Flink Table & SQL API programs.

<span class="label label-danger">Attention</span> Because options are read at different point in time
when performing operations, it is recommended to set configuration options early after instantiating a
table environment.

<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
// instantiate table environment
TableEnvironment tEnv = ...

// access high-level configuration
TableConfig tableConfig = tEnv.getConfig();
// set low-level key-value options
Configuration configuration = tableConfig.getConfiguration();
configuration.setString("table.exec.mini-batch.enabled", "true");
configuration.setString("table.exec.mini-batch.allow-latency", "5 s");
configuration.setString("table.exec.mini-batch.size", "5000");
{% endhighlight %}
</div>

<div data-lang="scala" markdown="1">
{% highlight scala %}
// instantiate table environment
val tEnv: TableEnvironment = ...

// access high-level configuration
val tableConfig = tEnv.getConfig
// set low-level key-value options
val configuration = tableConfig.getConfiguration
configuration.setString("table.exec.mini-batch.enabled", "true")
configuration.setString("table.exec.mini-batch.allow-latency", "5 s")
configuration.setString("table.exec.mini-batch.size", "5000")
{% endhighlight %}
</div>

<div data-lang="python" markdown="1">
{% highlight python %}
# instantiate table environment
t_env = ...

# access high-level configuration
table_config = t_env.get_config()
# set low-level key-value options
configuration = table_config.get_configuration()
configuration.set_string("table.exec.mini-batch.enabled", "true")
configuration.set_string("table.exec.mini-batch.allow-latency", "5 s")
configuration.set_string("table.exec.mini-batch.size", "5000")
{% endhighlight %}
</div>
</div>

<span class="label label-danger">Attention</span> Currently, key-value options are only supported
for the Blink planner.

### Execution Options

The following options can be used to tune the performance of the query execution.

{% include generated/execution_config_configuration.html %}

### Optimizer Options

The following options can be used to adjust the behavior of the query optimizer to get a better execution plan.

{% include generated/optimizer_config_configuration.html %}