title	description	services	documentationcenter	author	manager	editor	ms.service	ms.custom	ms.devlang	ms.topic	ms.tgt_pltfrm	ms.workload	ms.date	ms.author
Apache Kafka increase scale - Azure HDInsight \| Microsoft Docs	Learn how to configure managed disks for Apache Kafka cluster on Azure HDInsight to increase scalability.	hdinsight		Blackmist	jhubbard	cgronlun	hdinsight	hdinsightactive		hero-article	na	big-data	09/07/2017	larryfr

Configure storage and scalability for Apache Kafka on HDInsight

Learn how to configure the number of managed disks used by Apache Kafka on HDInsight.

Kafka on HDInsight uses the local disk of the virtual machines in the HDInsight cluster. Because Kafka is very I/O heavy, Azure Managed Disks is used to provide high throughput and more storage per node. If traditional virtual hard drives (VHD) were used for Kafka, each node would be limited to 1 TB. With managed disks, you can use multiple disks to reach up to 16 TB for each node in the cluster.

The following diagram provides a comparison between Kafka on HDInsight before managed disks, and Kafka on HDInsight with managed disks:

Configure managed disks: Azure portal

Follow the steps in the Create an HDInsight cluster document to learn the common steps for creating a cluster in the portal. Do not complete the portal creation process.
From the Cluster size section, use the Disks per worker node field to configure the number of disks.

[!NOTE] The type of managed disk can be either Standard (HDD) or Premium (SSD). Premium disks are used with DS and GS series VMs. All other VM types use Standard disks.

Configure managed disks: Resource Manager template

To control the number of disks used by the worker nodes in a Kafka cluster, use the following section of the template:

"dataDisksGroups": [
    {
        "disksPerNode": "[variables('disksPerWorkerNode')]"
    }
],
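
For context, here is a minimal sketch of where this fragment might sit in a template. It assumes the `disksPerWorkerNode` variable is defined in the template's `variables` section and that `dataDisksGroups` belongs to the `workernode` role of the cluster's compute profile. The disk count, instance count, and VM size are illustrative values only, and other required properties (such as `apiVersion`, `name`, `location`, `clusterDefinition`, the storage profile, and `osProfile`) are omitted for brevity.

```json
{
    "variables": {
        "disksPerWorkerNode": 2
    },
    "resources": [
        {
            "type": "Microsoft.HDInsight/clusters",
            "properties": {
                "computeProfile": {
                    "roles": [
                        {
                            "name": "workernode",
                            "targetInstanceCount": 4,
                            "hardwareProfile": {
                                "vmSize": "Standard_D4_v2"
                            },
                            "dataDisksGroups": [
                                {
                                    "disksPerNode": "[variables('disksPerWorkerNode')]"
                                }
                            ]
                        }
                    ]
                }
            }
        }
    ]
}
```

Keeping the disk count in a variable (or a parameter) lets you change the storage per worker node in one place without editing the role definition itself. Treat this as an outline only and check it against the complete template before deploying.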

You can find a complete template that demonstrates how to configure managed disks at https://hditutorialdata.blob.core.windows.net/armtemplates/create-linux-based-kafka-mirror-cluster-in-vnet-v2.1.json.

Next steps

For more information on working with Kafka on HDInsight, see the following documents:

Use MirrorMaker to create a replica of Kafka on HDInsight
Use Apache Storm with Kafka on HDInsight
Use Apache Spark with Kafka on HDInsight
Connect to Kafka through an Azure Virtual Network
HDInsight blog on managed disks with Kafka
