updated tiered storage demo for cp6 compatibility
updated Tiered Storage demo with new docs and docker-compose file for compatibility with CP 6.0
Marcinthecloud committed Oct 6, 2020
1 parent 0e46f05 commit 0495956
Showing 7 changed files with 109 additions and 46 deletions.
51 changes: 29 additions & 22 deletions tiered-storage/README.md
@@ -1,59 +1,63 @@
# Overview

This demo showcases Confluent's Tiered Storage capability built into Confluent Server.
This demo walks through how to get started with Tiered Storage in Confluent Platform 6.0. Docker must be installed and running.

In Confluent Platform 5.4, Tiered Storage is in preview and is not intended for production. There are several limitations of the preview since not all features are yet supported. For more information:
For more information:

* [Link to documentation](https://docs.confluent.io/current/kafka/tiered-storage-preview.html)
* [Link to documentation](https://docs.confluent.io/current/kafka/tiered-storage.html)

## Concepts

* Hotset - The recent log segments that remain on disk. When the hotset time interval expires for a log segment, it will be deleted from disk, but will still exist in object storage.

<kbd><img src="images/ts-overview.png" /></kbd>
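For reference, the hotset can also be tuned per topic once the demo's broker is up, via the `confluent.tier.local.hotset.ms` topic config. A minimal sketch (the topic name is just an example):

```
# Shrink the hotset for one topic to 1 minute (60000 ms); tiered
# segments older than this are removed from the local disk.
kafka-configs \
  --bootstrap-server localhost:9091 \
  --alter \
  --entity-type topics \
  --entity-name test-topic \
  --add-config confluent.tier.local.hotset.ms=60000
```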

# Run the Demo

## Requirements

* An S3 bucket and AWS credentials to access it


* AWS CLI configured to properly pull from ECS
* An S3 bucket

### Start a Broker with Tiered Storage Enabled

1. Set the following environment variables:

```
export AWS_ACCESS_KEY_ID=<YOUR AWS ACCESS KEY>
export AWS_SECRET_ACCESS_KEY=<YOUR AWS SECRET>
export AWS_ACCESS_KEY_ID=<AWS ACCESS KEY>
export AWS_SECRET_ACCESS_KEY=<AWS SECRET KEY>
export BUCKET_NAME=<S3 BUCKET NAME>
export REGION=<S3 REGION>
export REGION=<REGION>
```
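As a quick sanity check (assuming the AWS CLI is installed and picks up the same credentials), confirm the bucket is reachable before starting the broker:

```
# Lists the bucket contents; an error here means the broker won't be able to tier segments
aws s3 ls "s3://$BUCKET_NAME" --region "$REGION"
```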

2. Run docker compose
```
docker-compose up
```
`$ docker-compose up`, or `$ docker-compose up -d` if you'd like to hide the output and run in the background
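To verify that tiering actually came up (a rough check; the exact log lines vary by version), grep the broker logs for tier-related activity:

```
docker-compose logs broker | grep -i tier | head
```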

3. Confluent Control Center will be available at localhost:9021


## Create a Topic

To observe the results of the demo within a reasonable time frame, we create a topic with a short hotset (1 minute), a short retention period (10 minutes), and smaller log segments (100 MB). These configurations were passed to the broker through the [docker-compose.yml](docker-compose.yml) file. Messages that are produced to this topic will be uploaded to the specified S3 bucket.

```
docker-compose exec broker kafka-topics \
kafka-topics \
--bootstrap-server localhost:9091 \
--create \
--topic test-topic \
--partitions 1
```
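If you would rather scope these settings to a single topic instead of the whole broker, the equivalent topic-level configs look roughly like this (a sketch mirroring the demo's broker settings; `tiered-demo-topic` is a hypothetical name):

```
# 1 minute hotset, 10 minute retention, 100 MB segments, tiering on
kafka-topics \
  --bootstrap-server localhost:9091 \
  --create \
  --topic tiered-demo-topic \
  --partitions 1 \
  --config confluent.tier.enable=true \
  --config confluent.tier.local.hotset.ms=60000 \
  --config retention.ms=600000 \
  --config segment.bytes=104857600
```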

Additionally, you can create a topic through Confluent Control Center and configure Tiered Storage from there.

<kbd><img src="images/c3-ts-settings.png" /></kbd>

## Produce Messages to the Topic

After creating the topic, we should produce enough messages to the topic to ensure that log segments will fill to the 100 MB limit and be uploaded to the S3 bucket.

```
docker-compose exec broker kafka-producer-perf-test --topic test-topic \
kafka-producer-perf-test --topic test-topic \
--num-records 5000000 \
--record-size 5000 \
--throughput -1 \
@@ -91,21 +95,21 @@ Finalized UploadComplete(2569b931-7bed-462a-a048-413f27c43ea3) for VPTViNheT4mUB
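Once uploads like the one above complete, the tiered segments should be visible in the bucket. The exact object layout is an internal detail, so treat this as a rough check:

```
aws s3 ls "s3://$BUCKET_NAME" --recursive | head
```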
Because the topic has a short hotset period, log segments that are uploaded to the S3 bucket will not remain on disk for long. The log segments with the earliest offsets will start to be deleted from disk, since a copy of them resides in object storage. We can still consume these messages that now reside only in the S3 bucket. We can create a consumer that is configured to read messages from the beginning of the topic:

```
docker-compose exec broker kafka-consumer-perf-test --topic test-topic \
kafka-consumer-perf-test --topic test-topic \
--messages 5000 \
--threads 1 \
--broker-list localhost:9091 \
--timeout 20000 \
--consumer.config /etc/kafka/demo/consumer.config
--consumer.config config/consumer.config
```
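A lighter-weight spot check (assuming the console tools are available wherever you run this) is to read a few of the earliest records directly:

```
# Reads 10 records from the start of the topic, which by now live only in S3
kafka-console-consumer \
  --bootstrap-server localhost:9091 \
  --topic test-topic \
  --from-beginning \
  --max-messages 10
```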

## Monitor JMX Metrics
## Monitoring Tiered Storage

It is likely that there is no obvious difference between reading messages delivered from the S3 bucket versus reading messages from log segments on disk. We can query metrics from the broker to verify that the consumer was reading messages the broker fetched from object storage.

```
./scripts/jmx_metrics.sh
```
You can monitor various Tiered Storage metrics through the Confluent Control Center (C3) dashboard.

<kbd><img src="images/ts-metrics.png" /></kbd>
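If you prefer raw JMX over the dashboard, this demo exposes JMX on port 8091. A sketch using Kafka's bundled JmxTool (the `TierFetcher` MBean name is taken from the tiered storage metrics documentation; adjust if your version differs):

```
# Poll tier-fetch metrics every 5 seconds over the JMX port from docker-compose.yml
docker-compose exec broker kafka-run-class kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:8091/jmxrmi \
  --object-name kafka.server:type=TierFetcher \
  --reporting-interval 5000
```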

## Delete the Topic

@@ -114,7 +118,7 @@ The log segment files will remain in the S3 bucket for the duration of the topic
Alternatively, if we delete the topic from the broker, the broker will delete the topic's log segments from the S3 bucket before the retention period has finished. The broker scans for log segments that need to be deleted on a time interval, which was configured to 5 minutes for this demo (default interval is 3 hours). We can run the following command to delete the topic:

```
docker-compose exec broker kafka-topics \
kafka-topics \
--bootstrap-server localhost:9091 \
--delete \
--topic test-topic
@@ -132,3 +136,6 @@ Completed partition deletion for cnHmnjURSwuq_yqRvv6Xow-test-topic-0 (kafka.tier
Stopping deletion process for cnHmnjURSwuq_yqRvv6Xow-test-topic-0 after task completion (kafka.tier.tasks.delete.DeletionTask)
Completed deleting segments for cnHmnjURSwuq_yqRvv6Xow-test-topic-0 (kafka.tier.TierDeletedPartitionsCoordinator)
```
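After the delete-check interval fires, the bucket should drain as well; a quick way to confirm nothing is left behind:

```
# Expect the object count to drop to zero once cleanup has run
aws s3 ls "s3://$BUCKET_NAME" --recursive | wc -l
```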


Docs: https://github.com/confluentinc/docs/pull/2884/files
26 changes: 26 additions & 0 deletions tiered-storage/demo.md
@@ -0,0 +1,26 @@
# Key Benefits

Faster rebalancing

Cheaper to run

Easier to meet regulatory requirements

Easier ML training and deployment



# Key Questions


Will infinite retention be made available in Confluent Cloud?

With Tiered Storage, can the cluster use fewer nodes?

# Links

go/internal-demos

go/artifacts

go/tiered-storage-docs
58 changes: 34 additions & 24 deletions tiered-storage/docker-compose.yml
@@ -1,53 +1,63 @@
version: '3'
version: "3"

services:
zookeeper:
image: confluentinc/cp-zookeeper:5.4.0
image: confluentinc/cp-zookeeper:latest
hostname: zookeeper
container_name: zookeeper
networks:
- n1
ports:
- "2181:2181"
environment:
ZOOKEEPER_SERVER_ID: 1
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_SERVERS: zookeeper:2888:3888
ZOOKEEPER_SERVER_ID: 1
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_SERVERS: zookeeper:2888:3888

broker:
image: confluentinc/cp-server:5.4.0
image: confluentinc/cp-server:latest
hostname: broker
container_name: broker
networks:
- n1
ports:
- "9091:9091"
- "8091:8091"
volumes:
- ./config:/etc/kafka/demo
environment:
KAFKA_BROKER_ID: 1
KAFKA_CONFLUENT_TIER_FEATURE: 'true'
KAFKA_CONFLUENT_TIER_ENABLE: 'true'
KAFKA_CONFLUENT_TIER_FEATURE: "true"
KAFKA_CONFLUENT_TIER_ENABLE: "true"
KAFKA_CONFLUENT_TIER_BACKEND: S3
KAFKA_CONFLUENT_TIER_S3_BUCKET: ${BUCKET_NAME}
KAFKA_CONFLUENT_TIER_S3_REGION: ${REGION}
KAFKA_CONFLUENT_TIER_S3_AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
KAFKA_CONFLUENT_TIER_S3_AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
KAFKA_CONFLUENT_TIER_LOCAL_HOTSET_MS: 60000 # hotset of 1 minute
KAFKA_CONFLUENT_TIER_TOPIC_DELETE_CHECK_INTERVAL_MS: 300000 # check every 5 min for topic deletion
KAFKA_CONFLUENT_TIER_METADATA_REPLICATION_FACTOR: 1 # only one broker, so replication factor is one
KAFKA_CONFLUENT_TIER_TOPIC_DELETE_CHECK_INTERVAL_MS: 300000 # check every 5 min for topic deletion
KAFKA_CONFLUENT_TIER_METADATA_REPLICATION_FACTOR: 1
KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://broker:19091,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9091
KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
KAFKA_CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: localhost:9091
KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
KAFKA_LOG_SEGMENT_BYTES: 104857600 # 100 MB log segments
KAFKA_LOG_RETENTION_MS: 600000 # 10 minute retention
KAFKA_JMX_PORT: 8091
depends_on:
- zookeeper

networks:
n1:
control-center:
image: confluentinc/cp-enterprise-control-center:latest
hostname: control-center
container_name: control-center
depends_on:
- zookeeper
- broker
ports:
- "9021:9021"
environment:
CONTROL_CENTER_BOOTSTRAP_SERVERS: 'broker:19091'
CONTROL_CENTER_ZOOKEEPER_CONNECT: 'zookeeper:2181'
CONTROL_CENTER_REPLICATION_FACTOR: 1
CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
CONFLUENT_METRICS_TOPIC_REPLICATION: 1
CONTROL_CENTER_CONFLUENT_CONTROL_CENTER_INTERNAL_TOPICS_REPLICATION: 1
PORT: 9021
Binary file added tiered-storage/images/c3-ts-settings.png
Binary file added tiered-storage/images/ts-metrics.png
Binary file added tiered-storage/images/ts-overview.png
20 changes: 20 additions & 0 deletions tiered-storage/produce.sh
@@ -0,0 +1,20 @@
#! /bin/bash

echo -e "\n\n==> Creating a demo topic \n"

docker exec broker kafka-topics \
--bootstrap-server localhost:9091 \
--create \
--topic demo-topic \
--partitions 1

echo -e "\n\n==> Producing to demo-topic \n"

docker exec broker kafka-producer-perf-test --topic demo-topic \
--num-records 5000000 \
--record-size 5000 \
--throughput -1 \
--producer-props \
acks=all \
bootstrap.servers=localhost:9091 \
batch.size=8196
