[hotfix][docs] Fix various broken links in the docs
I used this to identify broken links in the zh.md files:

	git grep -E "[^z][^h]\.md " -- '*.zh.md'
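A bulk rewrite along those lines could then be applied (a rough sketch, assuming GNU sed; the resulting diff should still be reviewed manually):

	git grep -lE "[^z][^h]\.md %}" -- '*.zh.md' \
	  | xargs sed -i -E 's/([^z][^h])\.md %}/\1.zh.md %}/g'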
rmetzger committed Dec 1, 2020
1 parent 8b2e4c2 commit 493a700
Showing 8 changed files with 50 additions and 39 deletions.
30 changes: 15 additions & 15 deletions docs/deployment/index.zh.md
@@ -28,7 +28,7 @@ under the License.
Flink is a versatile framework, supporting many different deployment scenarios in a mix and match fashion.

Below, we briefly explain the building blocks of a Flink cluster, their purpose and available implementations.
-If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link deployment/resource-providers/standalone/index.md %}).
+If you just want to start Flink locally, we recommend setting up a [Standalone Cluster]({% link deployment/resource-providers/standalone/index.zh.md %}).
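For reference, bringing up such a local standalone cluster typically comes down to the following (a sketch, assuming an unpacked Flink distribution):

{% highlight bash %}
# Start a local cluster (one JobManager and one TaskManager)
./bin/start-cluster.sh

# Stop it again once you are done
./bin/stop-cluster.sh
{% endhighlight %}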

* This will be replaced by the TOC
{:toc}
@@ -63,11 +63,11 @@ When deploying Flink, there are often multiple options available for each buildi
</td>
<td>
<ul>
<li><a href="{% link deployment/cli.md %}">Command Line Interface</a></li>
<li><a href="{% link ops/rest_api.md %}">REST Endpoint</a></li>
<li><a href="{% link dev/table/sqlClient.md %}">SQL Client</a></li>
<li><a href="{% link deployment/repls/python_shell.md %}">Python REPL</a></li>
<li><a href="{% link deployment/repls/scala_shell.md %}">Scala REPL</a></li>
<li><a href="{% link deployment/cli.zh.md %}">Command Line Interface</a></li>
<li><a href="{% link ops/rest_api.zh.md %}">REST Endpoint</a></li>
<li><a href="{% link dev/table/sqlClient.zh.md %}">SQL Client</a></li>
<li><a href="{% link deployment/repls/python_shell.zh.md %}">Python REPL</a></li>
<li><a href="{% link deployment/repls/scala_shell.zh.md %}">Scala REPL</a></li>
</ul>
</td>
</tr>
Expand All @@ -84,11 +84,11 @@ When deploying Flink, there are often multiple options available for each buildi
</td>
<td>
<ul id="jmimpls">
<li><a href="{% link deployment/resource-providers/standalone/index.md %}">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with <a href="{% link deployment/resource-providers/standalone/docker.md %}">Docker, Docker Swarm / Compose</a>, <a href="{% link deployment/resource-providers/standalone/kubernetes.md %}">non-native Kubernetes</a> and other models is possible through manual setup in this mode)
<li><a href="{% link deployment/resource-providers/standalone/index.zh.md %}">Standalone</a> (this is the barebone mode that requires just JVMs to be launched. Deployment with <a href="{% link deployment/resource-providers/standalone/docker.zh.md %}">Docker, Docker Swarm / Compose</a>, <a href="{% link deployment/resource-providers/standalone/kubernetes.zh.md %}">non-native Kubernetes</a> and other models is possible through manual setup in this mode)
</li>
<li><a href="{% link deployment/resource-providers/native_kubernetes.md %}">Kubernetes</a></li>
<li><a href="{% link deployment/resource-providers/yarn.md %}">YARN</a></li>
<li><a href="{% link deployment/resource-providers/mesos.md %}">Mesos</a></li>
<li><a href="{% link deployment/resource-providers/native_kubernetes.zh.md %}">Kubernetes</a></li>
<li><a href="{% link deployment/resource-providers/yarn.zh.md %}">YARN</a></li>
<li><a href="{% link deployment/resource-providers/mesos.zh.md %}">Mesos</a></li>
</ul>
</td>
</tr>
Expand All @@ -112,8 +112,8 @@ When deploying Flink, there are often multiple options available for each buildi
</td>
<td>
<ul>
<li><a href="{% link deployment/ha/zookeeper_ha.md %}">Zookeeper</a></li>
<li><a href="{% link deployment/ha/kubernetes_ha.md %}">Kubernetes HA</a></li>
<li><a href="{% link deployment/ha/zookeeper_ha.zh.md %}">Zookeeper</a></li>
<li><a href="{% link deployment/ha/kubernetes_ha.zh.md %}">Kubernetes HA</a></li>
</ul>
</td>
</tr>
Expand All @@ -122,7 +122,7 @@ When deploying Flink, there are often multiple options available for each buildi
<td>
For checkpointing (recovery mechanism for streaming jobs) Flink relies on external file storage systems
</td>
<td>See <a href="{% link deployment/filesystems/index.md %}">FileSystems</a> page.</td>
<td>See <a href="{% link deployment/filesystems/index.zh.md %}">FileSystems</a> page.</td>
</tr>
<tr>
<td>Resource Provider</td>
Expand All @@ -136,7 +136,7 @@ When deploying Flink, there are often multiple options available for each buildi
<td>
Flink components report internal metrics and Flink jobs can report additional, job specific metrics as well.
</td>
<td>See <a href="{% link deployment/metric_reporters.md %}">Metrics Reporter</a> page.</td>
<td>See <a href="{% link deployment/metric_reporters.zh.md %}">Metrics Reporter</a> page.</td>
</tr>
<tr>
<td>Application-level data sources and sinks</td>
Expand All @@ -151,7 +151,7 @@ When deploying Flink, there are often multiple options available for each buildi
<li>ElasticSearch</li>
<li>Apache Cassandra</li>
</ul>
See <a href="{% link dev/connectors/index.md %}">Connectors</a> page.
See <a href="{% link dev/connectors/index.zh.md %}">Connectors</a> page.
</td>
</tr>
</tbody>
24 changes: 12 additions & 12 deletions docs/deployment/resource-providers/yarn.zh.md
@@ -42,7 +42,7 @@ Flink can dynamically allocate and de-allocate TaskManager resources depending o
This *Getting Started* section assumes a functional YARN environment, starting from version 2.4.1. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following through this *Getting Started* tutorial.

- Make sure your YARN cluster is ready for accepting Flink applications by running `yarn top`. It should show no error messages.
-- Download a recent Flink distribution from the [download page]({{ site.download_url }}) and unpack it.
+- Download a recent Flink distribution from the [download page]({{ site.zh_download_url }}) and unpack it.
- **Important** Make sure that the `HADOOP_CLASSPATH` environment variable is set up (it can be checked by running `echo $HADOOP_CLASSPATH`). If not, set it up using

{% highlight bash %}
@@ -78,7 +78,7 @@ Congratulations! You have successfully run a Flink application by deploying Flin

## Deployment Modes Supported by Flink on YARN

-For production use, we recommend deploying Flink Applications in the [Per-job or Application Mode]({% link deployment/index.md %}#deployment-modes), as these modes provide a better isolation for the Applications.
+For production use, we recommend deploying Flink Applications in the [Per-Job or Application Mode]({% link deployment/index.zh.md %}#deployment-modes), as these modes provide a better isolation for the Applications.

### Application Mode

Expand Down Expand Up @@ -117,7 +117,7 @@ client.

### Per-Job Cluster Mode

-The Per-job Cluster mode will launch a Flink cluster on YARN, then run the provided application jar locally and finally submit the JobGraph to the JobManager on YARN. If you pass the `--detached` argument, the client will stop once the submission is accepted.
+The Per-Job Cluster mode will launch a Flink cluster on YARN, then run the provided application jar locally and finally submit the JobGraph to the JobManager on YARN. If you pass the `--detached` argument, the client will stop once the submission is accepted.

The YARN cluster will stop once the job has stopped.
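A per-job submission might look roughly like the following (a sketch; the bundled example job and the detached submission are for illustration):

{% highlight bash %}
# Launch a dedicated YARN cluster for this one job and detach once the submission is accepted
./bin/flink run -t yarn-per-job --detached ./examples/streaming/TopSpeedWindowing.jar
{% endhighlight %}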

@@ -159,7 +159,7 @@ You can **re-attach to a YARN session** using the following command:
./bin/yarn-session.sh -id application_XXXX_YY
```

-Besides passing [configuration]({% link deployment/config.md %}) via the `conf/flink-conf.yaml` file, you can also pass any configuration at submission time to the `./bin/yarn-session.sh` client using `-Dkey=value` arguments.
+Besides passing [configuration]({% link deployment/config.zh.md %}) via the `conf/flink-conf.yaml` file, you can also pass any configuration at submission time to the `./bin/yarn-session.sh` client using `-Dkey=value` arguments.

The YARN session client also has a few "shortcut arguments" for commonly used settings. They can be listed with `./bin/yarn-session.sh -h`.
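For example, a detached session with explicit memory and slot settings could be started roughly like this (flag names as reported by `-h`; the values are illustrative):

{% highlight bash %}
# Detached session: 2 GB JobManager, 4 GB TaskManagers, 4 slots per TaskManager
./bin/yarn-session.sh -d -jm 2048m -tm 4096m -s 4 -nm my-flink-session
{% endhighlight %}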

Expand All @@ -169,7 +169,7 @@ The YARN session client also has a few "shortcut arguments" for commonly used se

### Configuring Flink on YARN

-The YARN-specific configurations are listed on the [configuration page]({% link deployment/config.md %}#yarn).
+The YARN-specific configurations are listed on the [configuration page]({% link deployment/config.zh.md %}#yarn).

The following configuration parameters are managed by Flink on YARN, as they might get overwritten by the framework at runtime:
- `jobmanager.rpc.address` (dynamically set to the address of the JobManager container by Flink on YARN)
Expand All @@ -182,17 +182,17 @@ If you need to pass additional Hadoop configuration files to Flink, you can do s

A JobManager running on YARN will request additional TaskManagers, if it can not run all submitted jobs with the existing resources. In particular when running in Session Mode, the JobManager will, if needed, allocate additional TaskManagers as additional jobs are submitted. Unused TaskManagers are freed up again after a timeout.

-The memory configurations for JobManager and TaskManager processes will be respected by the YARN implementation. The number of reported VCores is by default equal to the number of configured slots per TaskManager. The [yarn.containers.vcores]({% link deployment/config.md %}#yarn-containers-vcores) allows overwriting the number of vcores with a custom value. In order for this parameter to work you should enable CPU scheduling in your YARN cluster.
+The memory configurations for JobManager and TaskManager processes will be respected by the YARN implementation. The number of reported VCores is by default equal to the number of configured slots per TaskManager. The [yarn.containers.vcores]({% link deployment/config.zh.md %}#yarn-containers-vcores) allows overwriting the number of vcores with a custom value. In order for this parameter to work you should enable CPU scheduling in your YARN cluster.
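As a sketch, the vcore count could be overridden at session start-up via a dynamic property (the values are illustrative):

{% highlight bash %}
# Report 4 vcores per TaskManager container instead of deriving the value from the slot count
./bin/yarn-session.sh -Dyarn.containers.vcores=4 -Dtaskmanager.numberOfTaskSlots=2
{% endhighlight %}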

-Failed containers (including the JobManager) are replaced by YARN. The maximum number of JobManager container restarts is configured via [yarn.application-attempts]({% link deployment/config.md %}#yarn-application-attempts) (default 1). The YARN Application will fail once all attempts are exhausted.
+Failed containers (including the JobManager) are replaced by YARN. The maximum number of JobManager container restarts is configured via [yarn.application-attempts]({% link deployment/config.zh.md %}#yarn-application-attempts) (default 1). The YARN Application will fail once all attempts are exhausted.

### High-Availability on YARN

-High-Availability on YARN is achieved through a combination of YARN and a [high availability service]({% link deployment/ha/index.md %}).
+High-Availability on YARN is achieved through a combination of YARN and a [high availability service]({% link deployment/ha/index.zh.md %}).

Once a HA service is configured, it will persist JobManager metadata and perform leader elections.

-YARN is taking care of restarting failed JobManagers. The maximum number of JobManager restarts is defined through two configuration parameters. First Flink's [yarn.application-attempts]({% link deployment/config.md %}#yarn-application-attempts) configuration will default 2. This value is limited by YARN's [yarn.resourcemanager.am.max-attempts](https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml), which also defaults to 2.
+YARN is taking care of restarting failed JobManagers. The maximum number of JobManager restarts is defined through two configuration parameters. First Flink's [yarn.application-attempts]({% link deployment/config.zh.md %}#yarn-application-attempts) configuration will default 2. This value is limited by YARN's [yarn.resourcemanager.am.max-attempts](https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml), which also defaults to 2.

Note that Flink is managing the `high-availability.cluster-id` configuration parameter when running on YARN. **You should not overwrite this parameter when running an HA cluster on YARN**. The cluster ID is used to distinguish multiple HA clusters in the HA backend (for example Zookeeper). Overwriting this configuration parameter can lead to multiple YARN clusters affecting each other.
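A ZooKeeper-based HA setup could, for example, be sketched with dynamic properties like the following (hostnames and paths are placeholders):

{% highlight bash %}
# Enable ZooKeeper HA and allow additional application attempts;
# high-availability.cluster-id is intentionally left to Flink on YARN.
./bin/yarn-session.sh \
    -Dhigh-availability=zookeeper \
    -Dhigh-availability.zookeeper.quorum=zk-1:2181,zk-2:2181,zk-3:2181 \
    -Dhigh-availability.storageDir=hdfs:///flink/ha \
    -Dyarn.application-attempts=4
{% endhighlight %}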

Expand All @@ -213,19 +213,19 @@ For providing Flink with the required Hadoop dependencies, we recommend setting

If that is not possible, the dependencies can also be put into the `lib/` folder of Flink.

-Flink also offers pre-bundled Hadoop fat jars for placing them in the `lib/` folder, on the [Downloads / Additional Components]({{site.download_url}}#additional-components) section of the website. These pre-bundled fat jars are shaded to avoid dependency conflicts with common libraries. The Flink community is not testing the YARN integration against these pre-bundled jars.
+Flink also offers pre-bundled Hadoop fat jars for placing them in the `lib/` folder, on the [Downloads / Additional Components]({{site.zh_download_url}}#additional-components) section of the website. These pre-bundled fat jars are shaded to avoid dependency conflicts with common libraries. The Flink community is not testing the YARN integration against these pre-bundled jars.

### Running Flink on YARN behind Firewalls

Some YARN clusters use firewalls for controlling the network traffic between the cluster and the rest of the network.
In those setups, Flink jobs can only be submitted to a YARN session from within the cluster's network (behind the firewall).
If this is not feasible for production use, Flink allows to configure a port range for its REST endpoint, used for the client-cluster communication. With this range configured, users can also submit jobs to Flink crossing the firewall.

-The configuration parameter for specifying the REST endpoint port is [rest.bind-port]({% link deployment/config.md %}#rest-bind-port). This configuration option accepts single ports (for example: "50010"), ranges ("50000-50025"), or a combination of both.
+The configuration parameter for specifying the REST endpoint port is [rest.bind-port]({% link deployment/config.zh.md %}#rest-bind-port). This configuration option accepts single ports (for example: "50010"), ranges ("50000-50025"), or a combination of both.
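For instance, restricting the REST endpoint to a range that is open in the firewall could look like this (the range is illustrative):

{% highlight bash %}
# Bind the REST endpoint to a port from the range 50000-50025
./bin/yarn-session.sh -Drest.bind-port=50000-50025
{% endhighlight %}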

### User jars & Classpath

-By default Flink will include the user jars into the system classpath when running a single job. This behavior can be controlled with the [yarn.per-job-cluster.include-user-jar]({% link deployment/config.md %}#yarn-per-job-cluster-include-user-jar) parameter.
+By default Flink will include the user jars into the system classpath when running a single job. This behavior can be controlled with the [yarn.per-job-cluster.include-user-jar]({% link deployment/config.zh.md %}#yarn-per-job-cluster-include-user-jar) parameter.

When setting this to `DISABLED` Flink will include the jar in the user classpath instead.
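A sketch of disabling this behavior for a per-job submission (the example jar is illustrative):

{% highlight bash %}
# Keep the user jar on the user classpath rather than the system classpath
./bin/flink run -t yarn-per-job \
    -Dyarn.per-job-cluster.include-user-jar=DISABLED \
    ./examples/streaming/TopSpeedWindowing.jar
{% endhighlight %}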

2 changes: 1 addition & 1 deletion docs/dev/batch/hadoop_compatibility.md
@@ -64,7 +64,7 @@ and Reducers.
</dependency>
{% endhighlight %}

-If you want to run your Flink application locally (from your IDE), you also need to add
+If you want to run your Flink application locally (e.g. from your IDE), you also need to add
a `hadoop-client` dependency such as:

{% highlight xml %}
3 changes: 3 additions & 0 deletions docs/dev/batch/hadoop_compatibility.zh.md
@@ -64,6 +64,9 @@ and Reducers.
</dependency>
{% endhighlight %}

+If you want to run your Flink application locally (e.g. from your IDE), you also need to add
+a `hadoop-client` dependency such as:
+
{% highlight xml %}
<dependency>
<groupId>org.apache.hadoop</groupId>
6 changes: 5 additions & 1 deletion docs/dev/project-configuration.md
@@ -153,7 +153,7 @@ for details on how to build Flink for a specific Scala version.

If you want to use Flink with Hadoop, you need to have a Flink setup that includes the Hadoop dependencies, rather than
adding Hadoop as an application dependency. Flink will use the Hadoop dependencies specified by the `HADOOP_CLASSPATH`
-environment variable, which can usually be set by calling `export HADOOP_CLASSPATH=``hadoop classpath```
+environment variable, which can be set in the following way:
+
+{% highlight bash %}
+export HADOOP_CLASSPATH=`hadoop classpath`
+{% endhighlight %}

There are two main reasons for that design:

6 changes: 5 additions & 1 deletion docs/dev/project-configuration.zh.md
@@ -153,7 +153,7 @@ for details on how to build Flink for a specific Scala version.

If you want to use Flink with Hadoop, you need to have a Flink setup that includes the Hadoop dependencies, rather than
adding Hadoop as an application dependency. Flink will use the Hadoop dependencies specified by the `HADOOP_CLASSPATH`
-environment variable, which can usually be set by calling `export HADOOP_CLASSPATH=``hadoop classpath```
+environment variable, which can be set in the following way:
+
+{% highlight bash %}
+export HADOOP_CLASSPATH=`hadoop classpath`
+{% endhighlight %}

There are two main reasons for that design:

2 changes: 1 addition & 1 deletion docs/dev/table/connectors/hive/index.md
@@ -92,7 +92,7 @@ to make the integration work in Table API program or SQL in SQL Client.
Alternatively, you can put these dependencies in a dedicated folder, and add them to classpath with the `-C`
or `-l` option for Table API program or SQL Client respectively.

-Apache Hive is built on Hadoop, so you need to provide Hadoop dependenies, by setting the `HADOOP_CLASSPATH`
+Apache Hive is built on Hadoop, so you need to provide Hadoop dependencies, by setting the `HADOOP_CLASSPATH`
environment variable:
```
export HADOOP_CLASSPATH=`hadoop classpath`