[hotfix][docs] Remove HADOOP_CONF_DIR
The Hadoop configuration page was quite confusing because it suggested that the user has to
set `HADOOP_CONF_DIR` as well. All jars and configs can and should
be passed through `HADOOP_CLASSPATH`. Period.

This closes apache#12491
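The recommendation in the message above can be sketched as a short shell snippet. This is a hedged illustration, not part of the commit: the literal fallback path is hypothetical and only keeps the snippet runnable on machines without a Hadoop client.

```shell
# Recommended setup per the commit message: pass jars *and* configs to
# Flink through the single HADOOP_CLASSPATH variable.
# `hadoop classpath` prints the jar and configuration directories of the
# local Hadoop installation; the literal below is a hypothetical fallback
# so the snippet also runs where no `hadoop` binary is installed.
HADOOP_CLASSPATH=$(hadoop classpath 2>/dev/null || echo "/etc/hadoop/conf:/opt/hadoop/share/hadoop/common/*")
export HADOOP_CLASSPATH
echo "$HADOOP_CLASSPATH"
```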
rmetzger committed Jun 5, 2020
1 parent 13bae63 commit b48d330
Showing 2 changed files with 10 additions and 32 deletions.
21 changes: 5 additions & 16 deletions docs/ops/deployment/hadoop.md
@@ -32,7 +32,7 @@ under the License.
In order to use Hadoop features (e.g., YARN, HDFS) it is necessary to provide Flink with the required Hadoop classes,
as these are not bundled by default.

-This can be done by adding the Hadoop classpath to Flink through the `HADOOP_CLASSPATH` environment variable.
+The recommended approach is adding the Hadoop classpath to Flink through the `HADOOP_CLASSPATH` environment variable.

Flink will use the environment variable `HADOOP_CLASSPATH` to augment the
classpath that is used when starting Flink components such as the Client,
@@ -44,27 +44,16 @@ that are running Flink components.
When running on YARN, this is usually not a problem because the components
running inside YARN will be started with the Hadoop classpaths, but it can
happen that the Hadoop dependencies must be in the classpath when submitting a
-job to YARN. For this, it's usually enough to do a
+job to YARN. For this, it's usually enough to run

{% highlight bash %}
export HADOOP_CLASSPATH=`hadoop classpath`
{% endhighlight %}

-in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an argument that will make it print the configured Hadoop classpath.
+in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an argument that will make it print the configured Hadoop classpath. The classpath returned by `hadoop classpath` also includes the Hadoop configuration directories.

Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.

-## Referencing a Hadoop configuration
-
-You can reference a Hadoop configuration by setting the environment variable `HADOOP_CONF_DIR`.
-
-```sh
-HADOOP_CONF_DIR=/path/to/etc/hadoop
-```
-
-Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl }}/ops/config.html#hdfs) is deprecated.
-
-Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details above.
+If you are manually assembling the `HADOOP_CLASSPATH` variable, we recommend
+adding the Hadoop configuration directories as well.

## Running a job locally

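The "manually assembling" advice added in the diff above can be illustrated with a short sketch. All paths and variable names here are hypothetical examples introduced for illustration; they do not come from the patch.

```shell
# Sketch: a hand-built HADOOP_CLASSPATH that includes the configuration
# directory, so Flink picks up core-site.xml / hdfs-site.xml alongside
# the Hadoop jars. All paths below are hypothetical.
HADOOP_HOME=/opt/hadoop
HADOOP_CONF_EXAMPLE=$HADOOP_HOME/etc/hadoop   # config dir, listed first
HADOOP_CLASSPATH="$HADOOP_CONF_EXAMPLE"
for d in common hdfs yarn; do
  # append the jar wildcard for each Hadoop subproject
  HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/$d/*"
done
export HADOOP_CLASSPATH
echo "$HADOOP_CLASSPATH"
```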
21 changes: 5 additions & 16 deletions docs/ops/deployment/hadoop.zh.md
@@ -32,7 +32,7 @@ under the License.
In order to use Hadoop features (e.g., YARN, HDFS) it is necessary to provide Flink with the required Hadoop classes,
as these are not bundled by default.

-This can be done by adding the Hadoop classpath to Flink through the `HADOOP_CLASSPATH` environment variable.
+The recommended approach is adding the Hadoop classpath to Flink through the `HADOOP_CLASSPATH` environment variable.

Flink will use the environment variable `HADOOP_CLASSPATH` to augment the
classpath that is used when starting Flink components such as the Client,
@@ -44,27 +44,16 @@ that are running Flink components.
When running on YARN, this is usually not a problem because the components
running inside YARN will be started with the Hadoop classpaths, but it can
happen that the Hadoop dependencies must be in the classpath when submitting a
-job to YARN. For this, it's usually enough to do a
+job to YARN. For this, it's usually enough to run

{% highlight bash %}
export HADOOP_CLASSPATH=`hadoop classpath`
{% endhighlight %}

-in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an argument that will make it print the configured Hadoop classpath.
+in the shell. Note that `hadoop` is the hadoop binary and that `classpath` is an argument that will make it print the configured Hadoop classpath. The classpath returned by `hadoop classpath` also includes the Hadoop configuration directories.

Putting the Hadoop configuration in the same class path as the Hadoop libraries makes Flink pick up that configuration.

-## Referencing a Hadoop configuration
-
-You can reference a Hadoop configuration by setting the environment variable `HADOOP_CONF_DIR`.
-
-```sh
-HADOOP_CONF_DIR=/path/to/etc/hadoop
-```
-
-Referencing the HDFS configuration in the [Flink configuration]({{ site.baseurl }}/ops/config.html#hdfs) is deprecated.
-
-Another way to provide the Hadoop configuration is to have it on the class path of the Flink process, see more details above.
+If you are manually assembling the `HADOOP_CLASSPATH` variable, we recommend
+adding the Hadoop configuration directories as well.

## Running a job locally

