[FLINK-34247][doc] Update the usage of flink-conf.yaml in doc.
This closes apache#24251.
JunRuiLee authored and zhuzhurk committed Feb 5, 2024
1 parent 04dd91f commit e9bea09
Showing 68 changed files with 189 additions and 189 deletions.
4 changes: 2 additions & 2 deletions docs/content.zh/docs/connectors/table/filesystem.md
@@ -241,8 +241,8 @@ CREATE TABLE MyUserTableWithFilepath (

**Note:** For bulk formats (parquet, orc, avro), the rolling policy in combination with the checkpoint interval (pending files are finalized on the next checkpoint) controls the size and number of part files.

**Note:** For row formats (csv, json), if you want partition files to become visible in the file system sooner, you can set the `sink.rolling-policy.file-size` and `sink.rolling-policy.rollover-interval` properties as well as the `execution.checkpointing.interval` property in flink-conf.yaml.
For other formats (avro, orc), it is enough to set the `execution.checkpointing.interval` property in flink-conf.yaml.
**Note:** For row formats (csv, json), if you want partition files to become visible in the file system sooner, you can set the `sink.rolling-policy.file-size` and `sink.rolling-policy.rollover-interval` properties as well as the `execution.checkpointing.interval` property in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}).
For other formats (avro, orc), it is enough to set the `execution.checkpointing.interval` property in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}).
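For illustration, a minimal sketch of the checkpointing side of this in the Flink configuration file (the interval value is hypothetical, not a recommended default); the `sink.rolling-policy.*` options are connector options set on the table definition itself:

```yaml
# Hypothetical example: complete pending part files roughly every minute
execution.checkpointing.interval: 1 min
```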

<a name="file-compaction"></a>

2 changes: 1 addition & 1 deletion docs/content.zh/docs/deployment/advanced/historyserver.md
@@ -58,7 +58,7 @@ bin/historyserver.sh (start|start-foreground|stop)

**JobManager**

Archiving of completed jobs happens on the JobManager, which uploads the archived job information to a file system directory. You can configure the directory for archiving completed jobs via `jobmanager.archive.fs.dir` in the `flink-conf.yaml` file.
Archiving of completed jobs happens on the JobManager, which uploads the archived job information to a file system directory. You can configure the directory for archiving completed jobs via `jobmanager.archive.fs.dir` in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}).

```yaml
# Directory to which the information of completed jobs is uploaded
6 changes: 3 additions & 3 deletions docs/content.zh/docs/deployment/cli.md
@@ -30,7 +30,7 @@ under the License.
Flink provides a Command-Line Interface (CLI) `bin/flink` to run programs that
are packaged as JAR files and to control their execution. The CLI is part of any
Flink setup, available in local single node setups and in distributed setups.
It connects to the running JobManager specified in `conf/flink-conf.yaml`.
It connects to the running JobManager specified in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}).
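As a rough sketch of what that looks like (the host value is a placeholder; 8081 is the default REST port), the CLI's client is pointed at the JobManager's REST endpoint via entries such as:

```yaml
# Address and port of the JobManager's REST endpoint used by bin/flink
rest.address: jobmanager-host
rest.port: 8081
```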



@@ -366,7 +366,7 @@ Here's an overview of actions supported by Flink's CLI tool:
This action can be used to create or dispose of savepoints for a given job. It might be
necessary to specify a savepoint directory besides the JobID, if the
<a href="{{< ref "docs/deployment/config" >}}#state-savepoints-dir">state.savepoints.dir</a>
parameter was not specified in <code class="highlighter-rouge">conf/flink-conf.yaml</code>.
parameter was not specified in <code class="highlighter-rouge">Flink configuration file</code>.
</td>
</tr>
<tr>
@@ -431,7 +431,7 @@ parameter combinations:
* `./bin/flink run --target remote`: Submission to an already running Flink cluster

The `--target` option will overwrite the [execution.target]({{< ref "docs/deployment/config" >}}#execution-target)
specified in the `conf/flink-conf.yaml`.
specified in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}).
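For example (a sketch; the chosen target is arbitrary), a default could be pinned in the configuration file and still be overridden per submission with `--target` on the command line:

```yaml
# Default deployment target picked up by bin/flink run
execution.target: remote
```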

For more details on the commands and the available options, please refer to the Resource Provider-specific
pages of the documentation.
2 changes: 1 addition & 1 deletion docs/content.zh/docs/deployment/config.md
@@ -31,7 +31,7 @@ All configuration can be set in Flink configuration file in the `conf/` director

The configuration is parsed and evaluated when the Flink processes are started. Changes to the configuration file require restarting the relevant processes.

The out-of-the-box configuration will use your default Java installation. You can manually set the environment variable `JAVA_HOME` or the configuration key `env.java.home` in the Flink configuration file if you want to manually override the Java runtime to use.
The out-of-the-box configuration will use your default Java installation. You can manually set the environment variable `JAVA_HOME` or the configuration key `env.java.home` in the Flink configuration file if you want to manually override the Java runtime to use. Note that the configuration key `env.java.home` must be specified in a flattened format (i.e. one-line key-value format) in the configuration file.
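A minimal sketch of what that flattened entry looks like (the path is a placeholder):

```yaml
# Must stay in one-line key-value form, not a nested YAML mapping
env.java.home: /usr/lib/jvm/java-17-openjdk
```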

You can specify a different configuration directory location by defining the `FLINK_CONF_DIR` environment variable. For resource providers which provide non-session deployments, you can specify per-job configurations this way. Make a copy of the `conf` directory from the Flink distribution and modify the settings on a per-job basis. Note that this is not supported in Docker or standalone Kubernetes deployments. On Docker-based deployments, you can use the `FLINK_PROPERTIES` environment variable for passing configuration values.

6 changes: 3 additions & 3 deletions docs/content.zh/docs/deployment/filesystems/azure.md
@@ -86,13 +86,13 @@ cp ./opt/flink-azure-fs-hadoop-{{< version >}}.jar ./plugins/azure-fs-hadoop/
### WASB

Hadoop's WASB Azure file system supports configuring credentials via the Hadoop configuration, as described in the [Hadoop Azure Blob Storage documentation](https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Configuring_Credentials).
For convenience, Flink forwards all Flink configuration entries with the key prefix `fs.azure` to the Hadoop configuration of the file system. Consequently, the Azure Blob Storage key can be configured in `flink-conf.yaml` as follows:
For convenience, Flink forwards all Flink configuration entries with the key prefix `fs.azure` to the Hadoop configuration of the file system. Consequently, the Azure Blob Storage key can be configured in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) as follows:

```yaml
fs.azure.account.key.<account_name>.blob.core.windows.net: <azure_storage_key>
```
Alternatively, the file system can be configured to read the Azure Blob Storage key from the environment variable `AZURE_STORAGE_KEY` by setting the following configuration key in `flink-conf.yaml`:
Alternatively, the file system can be configured to read the Azure Blob Storage key from the environment variable `AZURE_STORAGE_KEY` by setting the following configuration key in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):

```yaml
fs.azure.account.keyprovider.<account_name>.blob.core.windows.net: org.apache.flink.fs.azurefs.EnvironmentVariableKeyProvider
@@ -107,7 +107,7 @@ Azure recommends using Azure managed identities to access ADLS Gen2 storage accounts with abfs
{{< /hint >}}

##### Accessing ABFS with storage keys (discouraged)
The Azure Blob storage key can be configured in `flink-conf.yaml` as follows:
The Azure Blob storage key can be configured in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) as follows:

```yaml
fs.azure.account.key.<account_name>.dfs.core.windows.net: <azure_storage_key>
6 changes: 3 additions & 3 deletions docs/content.zh/docs/deployment/filesystems/gcs.md
@@ -71,13 +71,13 @@ cp ./opt/flink-gs-fs-hadoop-{{< version >}}.jar ./plugins/gs-fs-hadoop/

### Configuration

The underlying Hadoop file system can be configured using the [Hadoop configuration keys](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.18/gcs/CONFIGURATION.md) for `gcs-connector` by adding the configurations to your `flink-conf.yaml`.
The underlying Hadoop file system can be configured using the [Hadoop configuration keys](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.18/gcs/CONFIGURATION.md) for `gcs-connector` by adding the configurations to your [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}).

For example, `gcs-connector` has a `fs.gs.http.connect-timeout` configuration key. If you want to change it, you need to set `gs.http.connect-timeout: xyz` in `flink-conf.yaml`. Flink will internally translate this back to `fs.gs.http.connect-timeout`.
For example, `gcs-connector` has a `fs.gs.http.connect-timeout` configuration key. If you want to change it, you need to set `gs.http.connect-timeout: xyz` in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}). Flink will internally translate this back to `fs.gs.http.connect-timeout`.
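A sketch of such an entry (the timeout value is made up purely for illustration):

```yaml
# Forwarded to the gcs-connector as fs.gs.http.connect-timeout (milliseconds)
gs.http.connect-timeout: 20000
```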

You can also set `gcs-connector` options directly in the Hadoop `core-site.xml` configuration file, so long as the Hadoop configuration directory is made known to Flink via the `env.hadoop.conf.dir` Flink option or via the `HADOOP_CONF_DIR` environment variable.

`flink-gs-fs-hadoop` can also be configured by setting the following options in `flink-conf.yaml`:
`flink-gs-fs-hadoop` can also be configured by setting the following options in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):

| Key | Description |
|---------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
6 changes: 3 additions & 3 deletions docs/content.zh/docs/deployment/filesystems/oss.md
@@ -70,19 +70,19 @@ cp ./opt/flink-oss-fs-hadoop-{{< version >}}.jar ./plugins/oss-fs-hadoop/

After setting up the OSS file system wrapper, some configuration needs to be added to make sure that Flink is allowed to access the OSS buckets.

For easy setup, you can use the same configuration keys in `flink-conf.yaml` as in Hadoop's `core-site.xml`.
For easy setup, you can use the same configuration keys in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) as in Hadoop's `core-site.xml`.

You can see the configuration keys in the [Hadoop OSS documentation](http://hadoop.apache.org/docs/current/hadoop-aliyun/tools/hadoop-aliyun/index.html).

Some configuration must be added to `flink-conf.yaml` (**the other configuration defined in the Hadoop OSS documentation consists of advanced settings used for performance tuning**):
Some configuration must be added to the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) (**the other configuration defined in the Hadoop OSS documentation consists of advanced settings used for performance tuning**):

```yaml
fs.oss.endpoint: Aliyun OSS endpoint to connect to
fs.oss.accessKeyId: Aliyun access key ID
fs.oss.accessKeySecret: Aliyun access key secret
```
An alternative `CredentialsProvider` can also be configured in `flink-conf.yaml`, for example:
An alternative `CredentialsProvider` can also be configured in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}), for example:
```yaml
# Read credentials from the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables
fs.oss.credentials.provider: com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider
12 changes: 6 additions & 6 deletions docs/content.zh/docs/deployment/filesystems/s3.md
@@ -66,12 +66,12 @@ env.configure(config);
Flink provides two file systems for talking to S3: `flink-s3-fs-presto` and `flink-s3-fs-hadoop`. Both implementations are self-contained with no dependencies, so there is no need to add Hadoop to the classpath when using them.

- `flink-s3-fs-presto`, registered under the schemes *s3://* and *s3p://*, is based on the [Presto project](https://prestodb.io/).
  It can be configured using [the same configuration keys as the Presto file system](https://prestodb.io/docs/0.272/connector/hive.html#amazon-s3-configuration) by adding the configuration to `flink-conf.yaml`. The Presto S3 file system is recommended if you want to use checkpointing against S3.
  It can be configured using [the same configuration keys as the Presto file system](https://prestodb.io/docs/0.272/connector/hive.html#amazon-s3-configuration) by adding the configuration to the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}). The Presto S3 file system is recommended if you want to use checkpointing against S3.

- `flink-s3-fs-hadoop`, registered under the schemes *s3://* and *s3a://*, is based on the [Hadoop Project](https://hadoop.apache.org/).
  This file system can be configured using [Hadoop's S3A configuration keys](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A) by adding the configuration to `flink-conf.yaml`.
  This file system can be configured using [Hadoop's S3A configuration keys](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#S3A) by adding the configuration to the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}).

  For example, Hadoop has a `fs.s3a.connection.maximum` configuration key. If you want to change it from your Flink program, you need to add `s3.connection.maximum: xyz` to `flink-conf.yaml`. Flink will internally translate this back to `fs.s3a.connection.maximum`; there is no need to pass parameters via Hadoop's XML configuration files.
  For example, Hadoop has a `fs.s3a.connection.maximum` configuration key. If you want to change it from your Flink program, you need to add `s3.connection.maximum: xyz` to the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}). Flink will internally translate this back to `fs.s3a.connection.maximum`; there is no need to pass parameters via Hadoop's XML configuration files.

  In addition, it is the only S3 file system that supports the [FileSystem]({{< ref "docs/connectors/datastream/filesystem" >}}) connector (see the configuration sketch just below this list).
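Picking up the `s3.connection.maximum` example from the entry above, a minimal sketch of such a setting in the Flink configuration file (the value is arbitrary, for illustration only):

```yaml
# Translated internally to fs.s3a.connection.maximum
s3.connection.maximum: 100
```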

@@ -99,7 +99,7 @@ cp ./opt/flink-s3-fs-presto-{{< version >}}.jar ./plugins/s3-fs-presto/

Access to S3 can be granted via an **access and secret key pair**. Please note that, following the [Introduction of IAM roles](https://blogs.aws.amazon.com/security/post/Tx1XG3FX6VMU6O5/A-safer-way-to-distribute-AWS-credentials-to-EC2), this approach is discouraged.

Both `s3.access-key` and `s3.secret-key` need to be configured in Flink's `flink-conf.yaml`:
Both `s3.access-key` and `s3.secret-key` need to be configured in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):

```yaml
s3.access-key: your-access-key
@@ -108,15 +108,15 @@ s3.secret-key: your-secret-key
## Configure Non-S3 Endpoint
The S3 file systems also support S3-compliant object stores such as [IBM's Cloud Object Storage](https://www.ibm.com/cloud/object-storage) and [Minio](https://min.io/). The endpoint to use can be configured in `flink-conf.yaml`:
The S3 file systems also support S3-compliant object stores such as [IBM's Cloud Object Storage](https://www.ibm.com/cloud/object-storage) and [Minio](https://min.io/). The endpoint to use can be configured in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}):
```yaml
s3.endpoint: your-endpoint-hostname
```
## Configure Path Style Access
Some S3-compliant object stores may not have virtual-host-style addressing enabled by default. In that case, you have to add the following configuration in `flink-conf.yaml` to enable path-style access:
Some S3-compliant object stores may not have virtual-host-style addressing enabled by default. In that case, you have to add the following configuration in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) to enable path-style access:
```yaml
s3.path.style.access: true
2 changes: 1 addition & 1 deletion docs/content.zh/docs/deployment/ha/kubernetes_ha.md
@@ -67,7 +67,7 @@ kubernetes.cluster-id: cluster1337

### Example configuration

`conf/flink-conf.yaml` 中配置高可用模式:
[Flink 配置文件]({{< ref "docs/deployment/config#flink-configuration-file" >}}) 中配置高可用模式:

```yaml
kubernetes.cluster-id: <cluster-id>
4 changes: 2 additions & 2 deletions docs/content.zh/docs/deployment/ha/zookeeper_ha.md
@@ -68,7 +68,7 @@ Flink leverages **[ZooKeeper](http://zookeeper.apache.org)** for distributed coordination between all running JobManager

### Example configuration

`conf/flink-conf.yaml` 中配置高可用模式和 ZooKeeper 复制组(quorum):
[Flink 配置文件]({{< ref "docs/deployment/config#flink-configuration-file" >}}) 中配置高可用模式和 ZooKeeper 复制组(quorum):

```bash
high-availability.type: zookeeper
@@ -82,7 +82,7 @@ high-availability.storageDir: hdfs:///flink/recovery

## ZooKeeper Security Configuration

If ZooKeeper is running in secure mode with Kerberos, you can override the following configurations in `flink-conf.yaml` as necessary:
If ZooKeeper is running in secure mode with Kerberos, you can override the following configurations in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) as necessary:

```bash
# The default is "zookeeper". If the ZooKeeper quorum is configured with a different service name,
8 changes: 4 additions & 4 deletions docs/content.zh/docs/deployment/memory/mem_migration.md
@@ -53,7 +53,7 @@ under the License.

<br/>

The [default flink-conf.yaml](#default-configuration-in-flink-confyaml) shipped with Flink specifies [`taskmanager.memory.process.size`]({{< ref "docs/deployment/config" >}}#taskmanager-memory-process-size) (*>= 1.10*) and [`jobmanager.memory.process.size`]({{< ref "docs/deployment/config" >}}#jobmanager-memory-process-size) (*>= 1.11*) in order to stay consistent with the previous behavior.
The [default Flink configuration file](#default-configuration-in-flink-confyaml) shipped with Flink specifies [`taskmanager.memory.process.size`]({{< ref "docs/deployment/config" >}}#taskmanager-memory-process-size) (*>= 1.10*) and [`jobmanager.memory.process.size`]({{< ref "docs/deployment/config" >}}#jobmanager-memory-process-size) (*>= 1.11*) in order to stay consistent with the previous behavior.

You can use this [spreadsheet](https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE) to estimate and compare the results of the original and the new memory configuration.

@@ -288,9 +288,9 @@ Flink makes memory leak problems easier to troubleshoot by setting the above JVM memory limits

<a name="default-configuration-in-flink-confyaml" />

## Default configuration in flink-conf.yaml
## Default configuration in the Flink configuration file

This section describes the changes in the default `flink-conf.yaml` file shipped with Flink.
This section describes the changes in the default [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) shipped with Flink.

The previous total TaskManager memory option (`taskmanager.heap.size`) has been replaced by the new option [`taskmanager.memory.process.size`]({{< ref "docs/deployment/config" >}}#taskmanager-memory-process-size).
Its default value has increased from 1024MB to 1728MB.
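In the shipped default configuration file this corresponds to an entry along the following lines (a sketch; consult the file in your distribution for the authoritative value):

```yaml
taskmanager.memory.process.size: 1728m
```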
@@ -301,5 +301,5 @@ Flink makes memory leak problems easier to troubleshoot by setting the above JVM memory limits
Please refer to [how to configure total memory]({{< ref "docs/deployment/memory/mem_setup" >}}#configure-total-memory).

{{< hint warning >}}
**Note:** Using the new default `flink-conf.yaml` may change the sizes of the individual memory components and thus affect performance.
**Note:** Using the new default [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) may change the sizes of the individual memory components and thus affect performance.
{{< /hint >}}
2 changes: 1 addition & 1 deletion docs/content.zh/docs/deployment/metric_reporters.md
@@ -31,7 +31,7 @@ under the License.
Flink allows reporting its runtime metrics to external systems.
For more information about the metric system, see the [metrics documentation]({{< ref "docs/ops/metrics" >}}).

You can configure one or more reporters in `conf/flink-conf.yaml` to expose runtime metrics to external systems.
You can configure one or more reporters in the [Flink configuration file]({{< ref "docs/deployment/config#flink-configuration-file" >}}) to expose runtime metrics to external systems.
Reporters are instantiated when the TaskManagers and Flink jobs start.

The parameters listed below apply to all reporters and are set via `metrics.reporter.<reporter_name>.<property>` entries in the configuration file. Some reporters have their own specific settings, which are described in the respective reporter's section.
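As an illustration of this naming pattern (the reporter name `my_jmx_reporter` and the port range are examples; the factory class is the JMX reporter shipped with Flink):

```yaml
metrics.reporters: my_jmx_reporter
metrics.reporter.my_jmx_reporter.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.my_jmx_reporter.port: 9020-9040
```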
