---
title: 'Quickstart: Ingest data from Kafka into Azure Data Explorer'
description: In this quickstart, you learn how to ingest (load) data into Azure Data Explorer from Kafka.
services: data-explorer
author: orspod
ms.author: v-orspod
ms.reviewer: mblythe
ms.service: data-explorer
ms.topic: quickstart
ms.date: 11/19/2018
---

Quickstart: Ingest data from Kafka into Azure Data Explorer

Azure Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. Azure Data Explorer offers ingestion (data loading) from Kafka. Kafka is a distributed streaming platform that lets you build real-time streaming data pipelines that reliably move data between systems or applications.

Prerequisites

  * An Azure subscription. If you don't have one, create a free Azure account before you begin.

  * A test cluster and database, as created in the quickstart Create an Azure Data Explorer cluster and database.

Kafka connector setup

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. The ADX Kafka Sink serves as the connector from Kafka.

Bundle

Kafka Connect can load a .jar file as a plugin that acts as a custom connector. To produce such a .jar, clone the code locally and build it with Maven.

Clone

git clone https://github.com/Azure/kafka-sink-azure-kusto.git
cd ./kafka-sink-azure-kusto/

Build

Build locally with Maven to produce a .jar complete with dependencies.

Inside the root directory kafka-sink-azure-kusto, run:

mvn clean compile assembly:single
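
A successful build writes the bundled connector under the target directory. As a hedged sketch (the exact .jar file name depends on the project version, and the plugin directory below is only an example), you can stage the artifact for Kafka Connect like this:

# The assembly plugin emits a *-jar-with-dependencies.jar under target/
ls target/*-jar-with-dependencies.jar

# Copy it to a directory that the Connect worker will scan for plugins
mkdir -p /opt/connectors/kusto-sink
cp target/*-jar-with-dependencies.jar /opt/connectors/kusto-sink/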

Deploy

Load the plugin into Kafka Connect. A deployment example that uses Docker can be found in the kafka-sink-azure-kusto repository.

Detailed documentation on Kafka connectors and how to deploy them is available in the Kafka Connect documentation.
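
As one way to make the worker find the plugin, a standalone Kafka Connect worker configuration can point plugin.path at the directory used above. The file below is a sketch with assumed values (broker address and file paths are illustrations, not values from this quickstart):

# connect-standalone.properties (example worker configuration)
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
offset.storage.file.filename=/tmp/connect.offsets
# Directory containing the Kusto sink .jar built earlier
plugin.path=/opt/connectors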

Example configuration

name=KustoSinkConnector 
connector.class=com.microsoft.azure.kusto.kafka.connect.sink.KustoSinkConnector 
kusto.sink.flush_interval_ms=300000 
key.converter=org.apache.kafka.connect.storage.StringConverter 
value.converter=org.apache.kafka.connect.storage.StringConverter 
tasks.max=1 
topics=testing1 
kusto.tables.topics_mapping=[{'topic': 'testing1','db': 'daniel', 'table': 'TestTable','format': 'json', 'mapping':'TestMapping'}] 
kusto.auth.authority=XXX 
kusto.url=https://ingest-{mycluster}.kusto.windows.net/ 
kusto.auth.appid=XXX 
kusto.auth.appkey=XXX 
kusto.sink.tempdir=/var/tmp/ 
kusto.sink.flush_size=1000
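
Assuming the configuration above is saved as kusto-sink.properties alongside the worker file sketched earlier (both file names are assumptions), a standalone deployment can then be started with the connect-standalone script that ships with Apache Kafka:

bin/connect-standalone.sh connect-standalone.properties kusto-sink.properties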

Create a target table in ADX

Create a table in ADX to which Kafka can send data. Create the table in the cluster and database provisioned in the Prerequisites.

  1. In the Azure portal, navigate to your cluster and select Query.

    Query application link

  2. Copy the following command into the window and select Run.

    .create table TestTable (TimeStamp: datetime, Name: string, Metric: int, Source: string)

    Run create query

  3. Copy the following command into the window and select Run.

    .create table TestTable ingestion json mapping 'TestMapping' '[{"column":"TimeStamp","path":"$.timeStamp","datatype":"datetime"},{"column":"Name","path":"$.name","datatype":"string"},{"column":"Metric","path":"$.metric","datatype":"int"},{"column":"Source","path":"$.source","datatype":"string"}]'

    This command maps incoming JSON data to the column names and data types of the table (TestTable).
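
    For reference, an incoming JSON message shaped like the following (values invented for illustration) would be mapped into a single row of TestTable:

    {"timeStamp": "2018-11-19T14:00:00Z", "name": "demo", "metric": 99, "source": "kafka-sample-app"}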

Generate sample data

Now that the Kafka cluster is connected to ADX, use the sample app to generate data.

Clone

Clone the sample app locally:

git clone https://github.com/Azure/azure-kusto-samples-dotnet.git
cd ./azure-kusto-samples-dotnet/kafka/

Run the app

  1. Open the sample app solution in Visual Studio.

  2. In the Program.cs file, update the connectionString constant to your Kafka connection string.

    const string connectionString = @"<YourConnectionString>";
  3. Build and run the app. The app sends messages to the Kafka cluster, and it prints its status every ten seconds; a sketch of the kind of producer loop involved follows this list.

  4. After the app has sent a few messages, move on to the next step.
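
For context, the sample app is essentially a Kafka producer that serializes records matching TestMapping and sends them to the testing1 topic. The following is a minimal, self-contained sketch of that idea using the Confluent.Kafka client; it is not the quickstart app itself, and the broker address and field values are assumptions:

using System;
using System.Threading.Tasks;
using Confluent.Kafka;

class Program
{
    static async Task Main()
    {
        // Assumed broker address; the real app reads it from connectionString.
        var config = new ProducerConfig { BootstrapServers = "localhost:9092" };

        using var producer = new ProducerBuilder<Null, string>(config).Build();
        var rand = new Random();

        for (var i = 0; i < 10; i++)
        {
            // JSON shaped to match the TestMapping ingestion mapping above.
            var json = $"{{\"timeStamp\": \"{DateTime.UtcNow:o}\", \"name\": \"demo\", "
                     + $"\"metric\": {rand.Next(0, 100)}, \"source\": \"kafka-sample-app\"}}";

            var result = await producer.ProduceAsync(
                "testing1", new Message<Null, string> { Value = json });
            Console.WriteLine($"Delivered to {result.TopicPartitionOffset}");
        }
    }
}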

Query and review the data

  1. To make sure no errors occurred during ingestion:

    .show ingestion failures
  2. To see the newly ingested data:

    TestTable 
    | count
  3. To see the content of the messages:

    TestTable

    The result set should look like the following:

    Message result set
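
    To look at just the most recent records instead of the whole table, a query along the following lines (using the TimeStamp column from the table schema) also works:

    TestTable
    | top 10 by TimeStamp desc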

Next steps

[!div class="nextstepaction"] Quickstart: Query data in Azure Data Explorer