[FLINK-15698][docs] Restructure the Configuration docs
  - Grouping options by semantics and functionality, rather than by defined class.
  - Splitting between "normal/common options" and options that should only be necessary
    for trouble-shooting.
StephanEwen committed Feb 3, 2020
1 parent c8ade87 commit afb11a9
Showing 48 changed files with 2,210 additions and 397 deletions.
66 changes: 66 additions & 0 deletions docs/_includes/generated/all_jobmanager_section.html
@@ -0,0 +1,66 @@
<table class="table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Key</th>
<th class="text-left" style="width: 15%">Default</th>
<th class="text-left" style="width: 10%">Type</th>
<th class="text-left" style="width: 55%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>jobmanager.archive.fs.dir</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>Directory in which the JobManager stores the archives of completed jobs.</td>
</tr>
<tr>
<td><h5>jobmanager.execution.attempts-history-size</h5></td>
<td style="word-wrap: break-word;">16</td>
<td>Integer</td>
<td>The maximum number of prior execution attempts kept in history.</td>
</tr>
<tr>
<td><h5>jobmanager.execution.failover-strategy</h5></td>
<td style="word-wrap: break-word;">region</td>
<td>String</td>
<td>This option specifies how the job computation recovers from task failures. Accepted values are:<ul><li>'full': Restarts all tasks to recover the job.</li><li>'region': Restarts all tasks that could be affected by the task failure. More details can be found <a href="../dev/task_failure_recovery.html#restart-pipelined-region-failover-strategy">here</a>.</li></ul></td>
</tr>
<tr>
<td><h5>jobmanager.heap.size</h5></td>
<td style="word-wrap: break-word;">"1024m"</td>
<td>String</td>
<td>JVM heap size for the JobManager.</td>
</tr>
<tr>
<td><h5>jobmanager.rpc.address</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>The config parameter defining the network address to connect to for communication with the job manager. This value is only interpreted in setups where a single JobManager with static name or address exists (simple standalone setups, or container setups with dynamic service name resolution). It is not used in many high-availability setups, when a leader-election service (like ZooKeeper) is used to elect and discover the JobManager leader from potentially multiple standby JobManagers.</td>
</tr>
<tr>
<td><h5>jobmanager.rpc.port</h5></td>
<td style="word-wrap: break-word;">6123</td>
<td>Integer</td>
<td>The config parameter defining the network port to connect to for communication with the job manager. Like jobmanager.rpc.address, this value is only interpreted in setups where a single JobManager with static name/address and port exists (simple standalone setups, or container setups with dynamic service name resolution). This config option is not used in many high-availability setups, when a leader-election service (like ZooKeeper) is used to elect and discover the JobManager leader from potentially multiple standby JobManagers.</td>
</tr>
<tr>
<td><h5>jobstore.cache-size</h5></td>
<td style="word-wrap: break-word;">52428800</td>
<td>Long</td>
<td>The job store cache size in bytes which is used to keep completed jobs in memory.</td>
</tr>
<tr>
<td><h5>jobstore.expiration-time</h5></td>
<td style="word-wrap: break-word;">3600</td>
<td>Long</td>
<td>The time in seconds after which a completed job expires and is purged from the job store.</td>
</tr>
<tr>
<td><h5>jobstore.max-capacity</h5></td>
<td style="word-wrap: break-word;">2147483647</td>
<td>Integer</td>
<td>The max number of completed jobs that can be kept in the job store.</td>
</tr>
</tbody>
</table>
96 changes: 96 additions & 0 deletions docs/_includes/generated/all_taskmanager_network_section.html
@@ -0,0 +1,96 @@
<table class="table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Key</th>
<th class="text-left" style="width: 15%">Default</th>
<th class="text-left" style="width: 10%">Type</th>
<th class="text-left" style="width: 55%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>taskmanager.network.blocking-shuffle.compression.enabled</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Boolean flag indicating whether the shuffle data will be compressed for blocking shuffle mode. Note that data is compressed per buffer and compression can incur extra CPU overhead, so it is most effective in IO-bound scenarios where the data compression ratio is high. Shuffle data compression is currently an experimental feature, and this config option may change in the future.</td>
</tr>
<tr>
<td><h5>taskmanager.network.blocking-shuffle.type</h5></td>
<td style="word-wrap: break-word;">"file"</td>
<td>String</td>
<td>The blocking shuffle type, either "mmap" or "file". The value "auto" selects the proper type automatically based on the system memory architecture (mmap on 64-bit, file on 32-bit). Note that the memory used by mmap is not accounted for by the configured memory limits, but some resource frameworks such as YARN track this memory usage and kill the container once it exceeds some threshold. Also note that this option is experimental and might change in the future.</td>
</tr>
<tr>
<td><h5>taskmanager.network.detailed-metrics</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Boolean flag to enable/disable more detailed metrics about inbound/outbound network queue lengths.</td>
</tr>
<tr>
<td><h5>taskmanager.network.memory.buffers-per-channel</h5></td>
<td style="word-wrap: break-word;">2</td>
<td>Integer</td>
<td>Maximum number of network buffers to use for each outgoing/incoming channel (subpartition/input channel). In credit-based flow control mode, this indicates how many credits are exclusive in each input channel. It should be set to at least 2 for good performance: 1 buffer is for receiving in-flight data in the subpartition and 1 buffer is for parallel serialization.</td>
</tr>
<tr>
<td><h5>taskmanager.network.memory.floating-buffers-per-gate</h5></td>
<td style="word-wrap: break-word;">8</td>
<td>Integer</td>
<td>Number of extra network buffers to use for each outgoing/incoming gate (result partition/input gate). In credit-based flow control mode, this indicates how many floating credits are shared among all the input channels. The floating buffers are distributed based on backlog (real-time output buffers in the subpartition) feedback, and can help relieve back-pressure caused by unbalanced data distribution among the subpartitions. This value should be increased in case of higher round trip times between nodes and/or larger number of machines in the cluster.</td>
</tr>
<tr>
<td><h5>taskmanager.network.netty.client.connectTimeoutSec</h5></td>
<td style="word-wrap: break-word;">120</td>
<td>Integer</td>
<td>The Netty client connection timeout.</td>
</tr>
<tr>
<td><h5>taskmanager.network.netty.client.numThreads</h5></td>
<td style="word-wrap: break-word;">-1</td>
<td>Integer</td>
<td>The number of Netty client threads.</td>
</tr>
<tr>
<td><h5>taskmanager.network.netty.num-arenas</h5></td>
<td style="word-wrap: break-word;">-1</td>
<td>Integer</td>
<td>The number of Netty arenas.</td>
</tr>
<tr>
<td><h5>taskmanager.network.netty.sendReceiveBufferSize</h5></td>
<td style="word-wrap: break-word;">0</td>
<td>Integer</td>
<td>The Netty send and receive buffer size. This defaults to the system buffer size (cat /proc/sys/net/ipv4/tcp_[rw]mem) and is 4 MiB in modern Linux.</td>
</tr>
<tr>
<td><h5>taskmanager.network.netty.server.backlog</h5></td>
<td style="word-wrap: break-word;">0</td>
<td>Integer</td>
<td>The Netty server connection backlog.</td>
</tr>
<tr>
<td><h5>taskmanager.network.netty.server.numThreads</h5></td>
<td style="word-wrap: break-word;">-1</td>
<td>Integer</td>
<td>The number of Netty server threads.</td>
</tr>
<tr>
<td><h5>taskmanager.network.netty.transport</h5></td>
<td style="word-wrap: break-word;">"auto"</td>
<td>String</td>
<td>The Netty transport type, either "nio" or "epoll". The value "auto" selects the proper mode automatically based on the platform. Note that the "epoll" mode can offer better performance, less GC, and more advanced features, which are only available on modern Linux.</td>
</tr>
<tr>
<td><h5>taskmanager.network.request-backoff.initial</h5></td>
<td style="word-wrap: break-word;">100</td>
<td>Integer</td>
<td>Minimum backoff in milliseconds for partition requests of input channels.</td>
</tr>
<tr>
<td><h5>taskmanager.network.request-backoff.max</h5></td>
<td style="word-wrap: break-word;">10000</td>
<td>Integer</td>
<td>Maximum backoff in milliseconds for partition requests of input channels.</td>
</tr>
</tbody>
</table>
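As a rough illustration of how the two buffer options in the table above combine, the following sketch estimates how many network buffers, and how much memory, a single gate may use. The formula (channels × buffers-per-channel + floating-buffers-per-gate) and the function names `buffers_per_gate` and `gate_memory_bytes` are assumptions made for illustration and are not part of this commit. The defaults are the ones listed in the generated tables (2 buffers per channel, 8 floating buffers per gate, 32 kb segment size).

```python
def buffers_per_gate(num_channels: int,
                     buffers_per_channel: int = 2,
                     floating_buffers_per_gate: int = 8) -> int:
    """Estimate the number of network buffers one gate may hold.

    Assumed formula (not taken from the commit):
    exclusive buffers for every channel plus the shared floating buffers.
    """
    if num_channels < 0:
        raise ValueError("num_channels must be non-negative")
    return num_channels * buffers_per_channel + floating_buffers_per_gate


def gate_memory_bytes(num_channels: int,
                      segment_size_bytes: int = 32 * 1024,
                      **buffer_options: int) -> int:
    """Estimate the memory for one gate, assuming one segment per buffer."""
    return buffers_per_gate(num_channels, **buffer_options) * segment_size_bytes


if __name__ == "__main__":
    # A gate with 100 channels and the default settings.
    print(buffers_per_gate(100))
    print(gate_memory_bytes(100))
```

Under these assumptions, raising `taskmanager.network.memory.floating-buffers-per-gate` adds a fixed number of buffers per gate, while raising `taskmanager.network.memory.buffers-per-channel` scales with the number of channels.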
115 changes: 115 additions & 0 deletions docs/_includes/generated/all_taskmanager_section.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
<table class="table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Key</th>
<th class="text-left" style="width: 15%">Default</th>
<th class="text-left" style="width: 10%">Type</th>
<th class="text-left" style="width: 55%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>task.cancellation.interval</h5></td>
<td style="word-wrap: break-word;">30000</td>
<td>Long</td>
<td>Time interval between two successive task cancellation attempts in milliseconds.</td>
</tr>
<tr>
<td><h5>task.cancellation.timeout</h5></td>
<td style="word-wrap: break-word;">180000</td>
<td>Long</td>
<td>Timeout in milliseconds after which a task cancellation times out and leads to a fatal TaskManager error. A value of 0 deactivates the watch dog.</td>
</tr>
<tr>
<td><h5>task.cancellation.timers.timeout</h5></td>
<td style="word-wrap: break-word;">7500</td>
<td>Long</td>
<td>Time in milliseconds to wait for the timers to finish all pending timer threads when the stream task is cancelled.</td>
</tr>
<tr>
<td><h5>taskmanager.data.port</h5></td>
<td style="word-wrap: break-word;">0</td>
<td>Integer</td>
<td>The task manager’s port used for data exchange operations.</td>
</tr>
<tr>
<td><h5>taskmanager.data.ssl.enabled</h5></td>
<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>Enable SSL support for the TaskManager data transport. This is applicable only when the global flag for internal SSL (security.ssl.internal.enabled) is set to true.</td>
</tr>
<tr>
<td><h5>taskmanager.debug.memory.log</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Flag indicating whether to start a thread, which repeatedly logs the memory usage of the JVM.</td>
</tr>
<tr>
<td><h5>taskmanager.debug.memory.log-interval</h5></td>
<td style="word-wrap: break-word;">5000</td>
<td>Long</td>
<td>The interval (in ms) for the log thread to log the current memory usage.</td>
</tr>
<tr>
<td><h5>taskmanager.host</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>The address of the network interface that the TaskManager binds to. This option can be used to define explicitly a binding address. Because different TaskManagers need different values for this option, usually it is specified in an additional non-shared TaskManager-specific config file.</td>
</tr>
<tr>
<td><h5>taskmanager.jvm-exit-on-oom</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether to kill the TaskManager when the task thread throws an OutOfMemoryError.</td>
</tr>
<tr>
<td><h5>taskmanager.memory.segment-size</h5></td>
<td style="word-wrap: break-word;">32 kb</td>
<td>MemorySize</td>
<td>Size of memory buffers used by the network stack and the memory manager.</td>
</tr>
<tr>
<td><h5>taskmanager.network.bind-policy</h5></td>
<td style="word-wrap: break-word;">"ip"</td>
<td>String</td>
<td>The automatic address binding policy used by the TaskManager if "taskmanager.host" is not set. The value should be one of the following:
<ul><li>"name" - uses hostname as binding address</li><li>"ip" - uses host's ip address as binding address</li></ul></td>
</tr>
<tr>
<td><h5>taskmanager.numberOfTaskSlots</h5></td>
<td style="word-wrap: break-word;">1</td>
<td>Integer</td>
<td>The number of parallel operator or user function instances that a single TaskManager can run. If this value is larger than 1, a single TaskManager takes multiple instances of a function or operator. That way, the TaskManager can utilize multiple CPU cores, but at the same time, the available memory is divided between the different operator or function instances. This value is typically proportional to the number of physical CPU cores that the TaskManager's machine has (e.g., equal to the number of cores, or half the number of cores).</td>
</tr>
<tr>
<td><h5>taskmanager.registration.initial-backoff</h5></td>
<td style="word-wrap: break-word;">500 ms</td>
<td>Duration</td>
<td>The initial registration backoff between two consecutive registration attempts. The backoff is doubled for each new registration attempt until it reaches the maximum registration backoff.</td>
</tr>
<tr>
<td><h5>taskmanager.registration.max-backoff</h5></td>
<td style="word-wrap: break-word;">30 s</td>
<td>Duration</td>
<td>The maximum registration backoff between two consecutive registration attempts. The max registration backoff requires a time unit specifier (ms/s/min/h/d).</td>
</tr>
<tr>
<td><h5>taskmanager.registration.refused-backoff</h5></td>
<td style="word-wrap: break-word;">10 s</td>
<td>Duration</td>
<td>The backoff after a registration has been refused by the job manager before retrying to connect.</td>
</tr>
<tr>
<td><h5>taskmanager.registration.timeout</h5></td>
<td style="word-wrap: break-word;">5 min</td>
<td>Duration</td>
<td>Defines the timeout for the TaskManager registration. If the duration is exceeded without a successful registration, then the TaskManager terminates.</td>
</tr>
<tr>
<td><h5>taskmanager.rpc.port</h5></td>
<td style="word-wrap: break-word;">"0"</td>
<td>String</td>
<td>The task manager’s IPC port. Accepts a list of ports (“50100,50101”), ranges (“50100-50200”) or a combination of both. It is recommended to set a range of ports to avoid collisions when multiple TaskManagers are running on the same machine.</td>
</tr>
</tbody>
</table>
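The registration backoff described in the table above (start at `taskmanager.registration.initial-backoff`, double on each new attempt, stop growing at `taskmanager.registration.max-backoff`) can be sketched as follows. This is a minimal illustration, not Flink's implementation: the function name `registration_backoffs` is hypothetical, and clamping the doubled value to exactly the maximum is an assumption about what "until it reaches the maximum" means.

```python
from itertools import islice
from typing import Iterator


def registration_backoffs(initial_ms: int = 500,
                          max_ms: int = 30_000) -> Iterator[int]:
    """Yield the wait time in milliseconds before each registration attempt.

    Defaults mirror the table: 500 ms initial backoff, 30 s maximum.
    The backoff doubles after every attempt and is capped at max_ms
    (the capping rule is an assumption for this sketch).
    """
    if initial_ms <= 0 or max_ms < initial_ms:
        raise ValueError("expected 0 < initial_ms <= max_ms")
    backoff = initial_ms
    while True:
        yield backoff
        backoff = min(backoff * 2, max_ms)


if __name__ == "__main__":
    # First eight waits with the default settings.
    print(list(islice(registration_backoffs(), 8)))
```

With the defaults, the wait reaches the 30 s cap on the seventh attempt and stays there. In Flink itself, `taskmanager.registration.timeout` (5 min by default) bounds how long registration may go unanswered before the TaskManager terminates.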
30 changes: 30 additions & 0 deletions docs/_includes/generated/common_high_availability_section.html
@@ -0,0 +1,30 @@
<table class="table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Key</th>
<th class="text-left" style="width: 15%">Default</th>
<th class="text-left" style="width: 10%">Type</th>
<th class="text-left" style="width: 55%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>high-availability</h5></td>
<td style="word-wrap: break-word;">"NONE"</td>
<td>String</td>
<td>Defines the high-availability mode used for the cluster execution. To enable high-availability, set this mode to "ZOOKEEPER" or specify the fully qualified name (FQN) of a factory class.</td>
</tr>
<tr>
<td><h5>high-availability.cluster-id</h5></td>
<td style="word-wrap: break-word;">"/default"</td>
<td>String</td>
<td>The ID of the Flink cluster, used to separate multiple Flink clusters from each other. Needs to be set for standalone clusters but is automatically inferred in YARN and Mesos.</td>
</tr>
<tr>
<td><h5>high-availability.storageDir</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>File system path (URI) where Flink persists metadata in high-availability setups.</td>
</tr>
</tbody>
</table>
24 changes: 24 additions & 0 deletions docs/_includes/generated/common_high_availability_zk_section.html
@@ -0,0 +1,24 @@
<table class="table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Key</th>
<th class="text-left" style="width: 15%">Default</th>
<th class="text-left" style="width: 10%">Type</th>
<th class="text-left" style="width: 55%">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>high-availability.zookeeper.path.root</h5></td>
<td style="word-wrap: break-word;">"/flink"</td>
<td>String</td>
<td>The root path under which Flink stores its entries in ZooKeeper.</td>
</tr>
<tr>
<td><h5>high-availability.zookeeper.quorum</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>The ZooKeeper quorum to use, when running Flink in a high-availability mode with ZooKeeper.</td>
</tr>
</tbody>
</table>