[FLINK-17269][docs-zh] Translate new Training Overview to Chinese
This closes apache#12311
yangyichao-mango authored and wuchong committed Jun 17, 2020
1 parent acd0de8 commit 5d7ea5d
Showing 2 changed files with 50 additions and 159 deletions.
61 changes: 10 additions & 51 deletions docs/concepts/index.zh.md
@@ -26,65 +26,24 @@ specific language governing permissions and limitations
under the License.
-->

The [Hands-on Training]({% link learn-flink/index.zh.md %}) explains the basic concepts
of stateful and timely stream processing that underlie Flink's APIs, and provides examples of how
these mechanisms are used in applications. Stateful stream processing is introduced in the context
of [Data Pipelines & ETL]({% link learn-flink/etl.zh.md %}#stateful-transformations)
and is further developed in the section on [Fault Tolerance]({% link learn-flink/fault_tolerance.zh.md %}). Timely stream processing is introduced in the section on
[Streaming Analytics]({% link learn-flink/streaming_analytics.zh.md %}).
[实践练习]({% link learn-flink/index.zh.md %})章节介绍了作为 Flink API 根基的有状态实时流处理的基本概念,并且举例说明了如何在 Flink 应用中使用这些机制。其中 [Data Pipelines & ETL]({% link learn-flink/etl.zh.md %}#stateful-transformations) 小节介绍了有状态流处理的概念,并且在 [Fault Tolerance]({% link learn-flink/fault_tolerance.zh.md %}) 小节中进行了深入介绍。[Streaming Analytics]({% link learn-flink/streaming_analytics.zh.md %}) 小节介绍了实时流处理的概念。

This _Concepts in Depth_ section provides a deeper understanding of how Flink's architecture and runtime
implement these concepts.
本章将深入分析 Flink 分布式运行时架构如何实现这些概念。

## Flink's APIs
## Flink 中的 API

Flink offers different levels of abstraction for developing streaming/batch applications.
Flink 为流式/批式处理应用程序的开发提供了不同级别的抽象。

<img src="{{ site.baseurl }}/fig/levels_of_abstraction.svg" alt="Programming levels of abstraction" class="offset" width="80%" />

- The lowest level abstraction simply offers **stateful and timely stream processing**. It is
embedded into the [DataStream API]({{ site.baseurl}}{% link
dev/datastream_api.zh.md %}) via the [Process Function]({{ site.baseurl }}{%
link dev/stream/operators/process_function.zh.md %}). It allows users to freely
process events from one or more streams, and provides consistent, fault tolerant
*state*. In addition, users can register event time and processing time
callbacks, allowing programs to realize sophisticated computations.
- Flink API 最底层的抽象为**有状态实时流处理**。其抽象实现是 [Process Function]({{ site.baseurl }}{% link dev/stream/operators/process_function.zh.md %}),并且 **Process Function** 被 Flink 框架集成到了 [DataStream API]({{ site.baseurl}}{% link dev/datastream_api.zh.md %}) 中来为我们使用。它允许用户在应用程序中自由地处理来自单流或多流的事件(数据),并提供具有全局一致性和容错保障的*状态*。此外,用户可以在此层抽象中注册事件时间(event time)和处理时间(processing time)回调方法,从而允许程序可以实现复杂计算。

- In practice, many applications do not need the low-level
abstractions described above, and can instead program against the **Core APIs**: the
[DataStream API]({% link dev/datastream_api.zh.md %})
(bounded/unbounded streams) and the [DataSet API]({% link
dev/batch/index.zh.md %}) (bounded data sets). These fluent APIs offer the
common building blocks for data processing, like various forms of
user-specified transformations, joins, aggregations, windows, state, etc.
Data types processed in these APIs are represented as classes in the
respective programming languages.
- Flink API 第二层抽象是 **Core APIs**。实际上,许多应用程序不需要使用到上述最底层抽象的 API,而是可以使用 **Core APIs** 进行编程:其中包含 [DataStream API]({% link dev/datastream_api.zh.md %})(应用于有界/无界数据流场景)和 [DataSet API]({% link dev/batch/index.zh.md %})(应用于有界数据集场景)两部分。Core APIs 提供的流式 API(Fluent API)为数据处理提供了通用的模块组件,例如各种形式的用户自定义转换(transformations)、联接(joins)、聚合(aggregations)、窗口(windows)和状态(state)操作等。此层 API 中处理的数据类型在每种编程语言中都有其对应的类。

The low level *Process Function* integrates with the *DataStream API*,
making it possible to use the lower-level abstraction on an as-needed basis.
The *DataSet API* offers additional primitives on bounded data sets,
like loops/iterations.
*Process Function* 这类底层抽象和 *DataStream API* 的相互集成使得用户可以选择使用更底层的抽象 API 来实现自己的需求。*DataSet API* 还额外提供了一些原语,比如循环/迭代(loop/iteration)操作。

- The **Table API** is a declarative DSL centered around *tables*, which may
be dynamically changing tables (when representing streams). The [Table
API]({% link dev/table/index.zh.md %}) follows the
(extended) relational model: Tables have a schema attached (similar to
tables in relational databases) and the API offers comparable operations,
such as select, project, join, group-by, aggregate, etc. Table API
programs declaratively define *what logical operation should be done*
rather than specifying exactly *how the code for the operation looks*.
Though the Table API is extensible by various types of user-defined
functions, it is less expressive than the *Core APIs*, and more concise to
use (less code to write). In addition, Table API programs also go through
an optimizer that applies optimization rules before execution.
- Flink API 第三层抽象是 **Table API**。**Table API** 是以表(Table)为中心的声明式编程(DSL)API,例如在流式数据场景下,它可以表示一张正在动态改变的表。[Table API]({% link dev/table/index.zh.md %}) 遵循(扩展)关系模型:即表拥有 schema(类似于关系型数据库中的 schema),并且 Table API 也提供了类似于关系模型中的操作,比如 select、project、join、group-by 和 aggregate 等。Table API 程序是以声明的方式定义*应执行的逻辑操作*,而不是确切地指定程序*应该执行的代码*。尽管 Table API 使用起来很简洁并且可以由各种类型的用户自定义函数扩展功能,但还是比 Core API 的表达能力差。此外,Table API 程序在执行之前还会使用优化器中的优化规则对用户编写的表达式进行优化。

One can seamlessly convert between tables and *DataStream*/*DataSet*,
allowing programs to mix the *Table API* with the *DataStream* and
*DataSet* APIs.
表和 *DataStream*/*DataSet* 可以进行无缝切换,Flink 允许用户在编写应用程序时将 *Table API* 与 *DataStream*/*DataSet* API 混合使用。

- The highest level abstraction offered by Flink is **SQL**. This abstraction
is similar to the *Table API* both in semantics and expressiveness, but
represents programs as SQL query expressions. The [SQL]({{ site.baseurl
}}{% link dev/table/index.zh.md %}#sql) abstraction closely interacts with the
Table API, and SQL queries can be executed over tables defined in the
*Table API*.
- Flink API 最顶层抽象是 **SQL**。这层抽象在语义和程序表达式上都类似于 *Table API*,但是其程序实现都是 SQL 查询表达式。[SQL]({{ site.baseurl}}{% link dev/table/index.zh.md %}#sql) 抽象与 Table API 抽象之间的关联是非常紧密的,并且 SQL 查询语句可以在 *Table API* 中定义的表上执行。
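
The levels of abstraction described in the translated list can be illustrated with a small analogy. The sketch below is plain Python and deliberately does not use any Flink API: the event data, the function names, and the `events` table with its `sensor` and `reading` columns are all hypothetical, chosen only for illustration. It computes the same per-key sum three ways: with hand-managed keyed state (roughly the spirit of the lowest level), with reusable grouping and aggregation building blocks (roughly the spirit of the Core APIs), and with a declarative SQL query (roughly the spirit of the SQL level).

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Hypothetical input: (sensor, reading) pairs standing in for a bounded stream.
EVENTS = [("a", 10), ("b", 5), ("a", 7), ("b", 1)]


def per_event_with_state(events):
    """Lowest level (analogy): handle one event at a time, manage keyed state by hand."""
    state = {}  # explicit per-key state, updated on every event
    for sensor, reading in events:
        state[sensor] = state.get(sensor, 0) + reading
    return state


def with_building_blocks(events):
    """Middle level (analogy): compose ready-made grouping and aggregation blocks."""
    ordered = sorted(events, key=itemgetter(0))  # groupby needs key-sorted input
    return {
        sensor: sum(reading for _, reading in group)
        for sensor, group in groupby(ordered, key=itemgetter(0))
    }


def with_sql(events):
    """Highest level (analogy): declare what to compute and let the engine decide how."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (sensor TEXT, reading INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", events)
        rows = conn.execute(
            "SELECT sensor, SUM(reading) FROM events GROUP BY sensor"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


if __name__ == "__main__":
    for compute in (per_event_with_state, with_building_blocks, with_sql):
        print(compute.__name__, compute(EVENTS))
```

All three functions return the same mapping for the sample input. What changes is how much of the "how" the caller spells out, which is the trade-off between expressiveness and conciseness that the list describes. This analogy leaves out event-time and processing-time callbacks, fault-tolerant state, and query optimization, all of which the documented APIs provide.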