[hotfix][docs] Add note about Kinesis producer limitations
This closes apache#2229
rmetzger committed Jul 12, 2016
1 parent f0387ac commit 971dcc5
Showing 2 changed files with 11 additions and 9 deletions.
11 changes: 6 additions & 5 deletions docs/apis/streaming/connectors/kinesis.md
@@ -50,9 +50,8 @@ mvn clean install -Pinclude-kinesis -DskipTests
{% endhighlight %}



-Note that the streaming connectors are not part of the binary distribution.
-See how to link with them for cluster execution [here]({{site.baseurl}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).
+The streaming connectors are not part of the binary distribution. See how to link with them for cluster
+execution [here]({{site.baseurl}}/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution).

### Using the Amazon Kinesis Streams Service
Follow the instructions from the [Amazon Kinesis Streams Developer Guide](https://docs.aws.amazon.com/streams/latest/dev/learning-kinesis-module-one-create-stream.html)
@@ -240,8 +239,10 @@ consumer when calling this API can also be modified by using the other keys pref
### Kinesis Producer

The `FlinkKinesisProducer` is used for putting data from a Flink stream into a Kinesis stream. Note that the producer is not participating in
-Flink's checkpointing and doesn't provide exactly-once processing guarantees. In case of a failure, data will be written again
-to Kinesis, leading to duplicates. This behavior is usually called "at-least-once" semantics.
+Flink's checkpointing and doesn't provide exactly-once processing guarantees.
+Also, the Kinesis producer does not guarantee that records are written in order to the shards (See [here](https://github.com/awslabs/amazon-kinesis-producer/issues/23) and [here](http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html#API_PutRecord_RequestSyntax) for more details).
+
+In case of a failure or a resharding, data will be written again to Kinesis, leading to duplicates. This behavior is usually called "at-least-once" semantics.

To put data into a Kinesis stream, make sure the stream is marked as "ACTIVE" in the AWS dashboard.
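
For readers of the changed passage, below is a minimal sketch of how a `FlinkKinesisProducer` sink is typically wired into a job. It is illustrative only and not part of this commit; the package, constant, and setter names (`ProducerConfigConstants`, `setFailOnError`, `setDefaultStream`, `setDefaultPartition`) are assumptions based on the Kinesis connector API of this era, and the region, credentials, and stream name are placeholders.

{% highlight java %}
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.ProducerConfigConstants;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KinesisProducerSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Region and credentials for the producer; the values here are placeholders,
        // and the constant names are assumptions for this connector version.
        Properties producerConfig = new Properties();
        producerConfig.put(ProducerConfigConstants.AWS_REGION, "us-east-1");
        producerConfig.put(ProducerConfigConstants.AWS_ACCESS_KEY_ID, "aws_access_key_id");
        producerConfig.put(ProducerConfigConstants.AWS_SECRET_ACCESS_KEY, "aws_secret_access_key");

        FlinkKinesisProducer<String> kinesis =
                new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
        kinesis.setFailOnError(true);                     // fail the job on write errors instead of only logging them
        kinesis.setDefaultStream("kinesis_stream_name");  // must be marked "ACTIVE" in the AWS dashboard
        kinesis.setDefaultPartition("0");

        DataStream<String> events = env.fromElements("a", "b", "c");

        // At-least-once: on a failure or a resharding, some records may be written
        // to the stream more than once, and per-shard ordering is not guaranteed.
        events.addSink(kinesis);

        env.execute("Kinesis producer sketch");
    }
}
{% endhighlight %}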

9 changes: 5 additions & 4 deletions docs/apis/streaming/fault_tolerance.md
@@ -103,8 +103,9 @@ env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)
### Fault Tolerance Guarantees of Data Sources and Sinks

Flink can guarantee exactly-once state updates to user-defined state only when the source participates in the
-snapshotting mechanism. This is currently guaranteed for the Kafka source and AWS Kinesis Streams source (and internal number generators), but
-not for other sources. The following table lists the state update guarantees of Flink coupled with the bundled sources:
+snapshotting mechanism. The following table lists the state update guarantees of Flink coupled with the bundled connectors.
+
+Please read the documentation of each connector to understand the details of the fault tolerance guarantees.

<table class="table table-bordered">
<thead>
@@ -142,8 +143,8 @@ not for other sources. The following table lists the state update guarantees of
</tr>
<tr>
<td>Files</td>
-<td>at least once</td>
-<td>At failure the file will be read from the beginning</td>
+<td>exactly once</td>
+<td></td>
</tr>
<tr>
<td>Sockets</td>
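
As a reminder of the snapshotting mechanism this section refers to, below is a minimal sketch of enabling checkpointing, including the `setMaxConcurrentCheckpoints` call shown in the hunk header above (written in Java rather than Scala). The checkpoint interval, the exactly-once mode, and the placeholder pipeline are illustrative assumptions, not part of this commit.

{% highlight java %}
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot operator state every 5 seconds in exactly-once mode. End-to-end
        // guarantees still depend on whether the sources and sinks participate,
        // as the table above explains.
        env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);

        // Java equivalent of the Scala line referenced by the hunk header:
        // never run more than one checkpoint at a time.
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        // Trivial placeholder pipeline so the job graph is not empty.
        env.fromElements(1, 2, 3).print();

        env.execute("Checkpointing sketch");
    }
}
{% endhighlight %}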
