Lab 5: Ingest and Analyse real-time data with Event Hubs and Stream Analytics

In this lab you will use an Azure Logic App to simulate a high-frequency stream of NYSE stock market transactions. Every second the Logic App generates a random number of stock purchase transactions with variable amounts for 5 of the biggest tech companies in the world, then sends the resulting stock transaction messages to Event Hubs. You will use Stream Analytics queries to join hot and cold data streams, process the high volume of transactions, and generate aggregate calculations. The results will be sent to a real-time dataset in Power BI.

IMPORTANT: This lab requires you have a valid Power BI account. If you don’t have one you can register for a 60-day trial here: https://powerbi.microsoft.com/en-us/power-bi-pro/

The estimated time to complete this lab is: 60 minutes.

Microsoft Learn & Technical Documentation

The following Azure services will be used in this lab. If you need further training resources or access to technical documentation, the table below has links to Microsoft Learn and to each service's Technical Documentation.

| Azure Service | Microsoft Learn | Technical Documentation |
| --- | --- | --- |
| Azure Data Lake Gen2 | Large Scale Data Processing with Azure Data Lake Storage Gen2 | Azure Data Lake Gen2 Technical Documentation |
| Azure Logic Apps | Build automated workflows to integrate data and apps with Azure Logic Apps | Azure Logic Apps Technical Documentation |
| Azure Event Hubs | Enable reliable messaging for Big Data applications using Azure Event Hubs | Azure Event Hubs Technical Documentation |
| Azure Stream Analytics | Implement a Data Streaming Solution with Azure Streaming Analytics | Azure Stream Analytics Technical Documentation |
| Power BI | Create and use analytics reports with Power BI | Power BI Technical Documentation |

Lab Architecture


| Step | Description |
| --- | --- |
| 1 | Review the Azure Logic App logic that simulates the NYSE transaction stream sent to Event Hubs |
| 2 | Save simulated NYSE stock transaction messages into your data lake for future analysis (cold path) |
| 3 | Send the stream of NYSE stock transaction messages to Stream Analytics for real-time analytics (hot path) |
| 4 | Incorporate Stock Company reference data into your stream processing logic |
| 5 | Visualize real-time data generated by Stream Analytics with Power BI |

IMPORTANT: Some of the Azure services provisioned require globally unique names, so a "-suffix" has been appended to their names to ensure uniqueness. Please take note of the generated suffix as you will need it for the following resources in this lab:

| Name | Type |
| --- | --- |
| synapsedatalakesuffix | Storage Account |
| ADPEventHubs-suffix | Event Hubs Namespace |
| SynapseStreamAnalytics-suffix | Stream Analytics job |

Review the Azure LogicApp implementation

In this section you will review the implementation of the LogicApp used to simulate high-frequency stock purchase transactions. These transactions are formatted as JSON messages and sent to Event Hubs for processing. All steps required to generate the stock transaction messages have already been done for you and no further changes are required in this section.

IMPORTANT
Execute these steps on your host computer
  1. In the Azure Portal, go to the lab resource group and locate the Logic App resource ADPLogicApp.

  2. On the ADPLogicApp menu, click Logic app designer to open the design panel.

  3. On the Logic app designer panel, note that the Recurrence trigger is set to execute every 1 second:

  4. The next two steps Initialize Config Settings and Parse Config Settings define the parameters used to generate the stock purchase messages.

  5. The Initialize messageCount step initializes a variable used to count the number of messages generated. Its initial value is set to 1.

  6. In the Until numberOfMessages is achieved loop, a piece of JavaScript is executed to generate each stock purchase transaction message. The message is then sent to the nysestocktrade Event Hub. The loop generates a random number of messages, between 1 and 10, on every execution of the LogicApp.

  7. The format of each stock purchase transaction message generated looks like this:

{
  "StockTicker": "MSFT",
  "Quantity": 120,
  "Price": 83.63,
  "TradeTimestamp": "2019-11-24T00:21:50.207Z"
}
  8. Return to the Overview panel. Note that the LogicApp is disabled by default. Click Enable to enable the LogicApp to start firing every second.
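The inline JavaScript that builds each message is not shown here; a rough Python sketch of the equivalent logic is below. The ticker list, quantity range, and price range are illustrative assumptions, not the LogicApp's actual values:

```python
import json
import random
from datetime import datetime, timezone

# Hypothetical stand-in for the LogicApp's inline JavaScript step.
# Tickers and value ranges are assumptions for illustration only.
TICKERS = ["MSFT", "AAPL", "GOOG", "AMZN", "FB"]

def generate_trade():
    """Build one stock purchase transaction message (see JSON sample above)."""
    return {
        "StockTicker": random.choice(TICKERS),
        "Quantity": random.randint(1, 1000),
        "Price": round(random.uniform(50.0, 500.0), 2),
        "TradeTimestamp": datetime.now(timezone.utc).isoformat(),
    }

def generate_batch():
    """Mimic the Until loop: between 1 and 10 messages per LogicApp run."""
    return [json.dumps(generate_trade()) for _ in range(random.randint(1, 10))]
```

Each string returned by `generate_batch` corresponds to one message body sent to the nysestocktrade Event Hub.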

Configure Capture for Event Hubs

In this section you will prepare Event Hubs to ingest NYSE stock trade messages generated by the LogicApp and save them to your Synapse Data Lake account.

IMPORTANT
Execute these steps on your host computer
  1. In the Azure Portal, go to the lab resource group and locate the Event Hubs resource ADPEventHubs-suffix.

  2. On the Event Hubs panel, note that the nysestocktrade Event Hub has already been created for you. This is the same Event Hub you saw referenced by the LogicApp in the previous section.

  3. Click on the nysestocktrade Event Hub to open its settings. Then click on the Capture item on the left-hand side menu.

  4. Enter the following details:
    - Capture: On
    - Time window (minutes): 1
    - Do not emit empty files when no events occur during the capture time window: Checked.
    - Capture Provider: Azure Storage
    - Azure Storage Container: [select the nysestocktrade container in your synapsedatalakesuffix storage account]

  5. Leave remaining fields with their default values.

  6. Click Save Changes.

Optional: Investigate nysestocktrade Container contents

  1. On your SynapseDataLake account, navigate to the nysestocktrade container you configured as the Capture destination in the previous section.

  2. You should be able to see the folder structure created by Event Hubs Capture, with AVRO files containing the individual stock purchase transaction messages generated by the LogicApp. These files can then be used in other analytics workloads whenever granular detail about each individual transaction is required.
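With the default naming convention, Capture writes one Avro blob per partition per time window, typically under a path of the form {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}. The sketch below builds such a path using this lab's resource names; the helper function is hypothetical, for illustration only:

```python
from datetime import datetime

def capture_blob_path(namespace, eventhub, partition, ts):
    """Build the blob path Event Hubs Capture typically uses by default:
    {Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}.avro
    This helper is illustrative; verify the pattern in your own container."""
    return (f"{namespace}/{eventhub}/{partition}/"
            f"{ts:%Y/%m/%d/%H/%M/%S}.avro")

# Example: a one-minute capture window written at 10:42:00 UTC.
# "suffix" is the placeholder from your lab resource names.
path = capture_blob_path("ADPEventHubs-suffix", "nysestocktrade", 0,
                         datetime(2019, 11, 24, 10, 42, 0))
```

Browsing the container for paths like this is a quick way to confirm Capture is running before moving on.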

Configure Stream Analytics Job

In this section you will configure your Stream Analytics job to join hot and cold data streams, execute queries on the data sent by Event Hubs, and generate outputs to Power BI.

IMPORTANT
Execute these steps on your host computer
  1. In the Azure Portal, go to the lab resource group and locate the Stream Analytics resource SynapseStreamAnalytics-suffix.

  2. On the Inputs panel, click + Add stream input button and select Event Hub to create a new input stream.

  3. On the Event Hub New input blade enter the following details:
    - Input alias: NYSEStockTrades
    - Event Hub namespace: ADPEventHubs-suffix
    - Event Hub name > Use existing: nysestocktrade
    - Event Hub policy name > Use existing: RootManageSharedAccessKey
    - Event Hub consumer group > Use existing: $Default

  4. Leave remaining fields with their default values.

  5. Now click + Add reference input button and select SQL Database to create a new reference data input.

  6. On the SQL Database New input blade enter the following details:
    - Input alias: NYSEStockCompanies
    - Storage account for this job: [select your synapsedatalakesuffix storage account]
    - Select SQL Database from your subscriptions: Checked
    - Subscription: your Azure subscription
    - Database: NYCDataSets
    - Server name: operationalsql-suffix.database.windows.net
    - User name: adpadmin
    - Password: P@ssw0rd123!
    - Refresh periodically: Off
    - Snapshot query:

    select [StockTicker]
      ,[CompanyName]
    from [NYC].[NYSE_StockTickerLookup]

    This reference table contains static data about the companies.

  7. Click Save to save your reference input stream and return to the Inputs panel.

  8. Click on the Outputs panel on the left-hand side menu. Once it is loaded, click + Add button and select Power BI to create a new output stream.

  9. On the Power BI New Output blade, click Authorize to authenticate with Power BI. Enter your credentials to authenticate.

  10. Once authenticated, enter the following details:
    - Output alias: StockTradeByCompany
    - Authentication Mode: User Token
    - Group Workspace: My Workspace
    - Dataset name: StockTradeByCompany
    - Table name: StockTradeByCompany

IMPORTANT: You must set Authentication Mode to User Token before you can select My Workspace as the Group Workspace.

  11. Leave remaining fields with their default values.

  12. Click Save to save your output stream and return to the Outputs panel.

  13. Repeat the process to create another Power BI Output. This time enter the following details:
    - Output alias: StockTradeTotals
    - Authentication Mode: User Token
    - Group Workspace: My Workspace
    - Dataset name: StockTradeTotals
    - Table name: StockTradeTotals

  14. Click Save to save your output stream and return to the Outputs panel.

  15. On the Query panel, note the inputs and outputs you created in the previous steps.

  16. Enter the following SQL commands in the query window.

--Total amount traded broken down by company in the last 30 seconds and calculated every 5 seconds
SELECT
    Company.CompanyName
    , sum(Trade.Quantity * Trade.Price) as TradedAmount
    , System.Timestamp as WindowDateTime
INTO
    [StockTradeByCompany]
FROM
    [NYSEStockTrades] as Trade TIMESTAMP BY TradeTimestamp
    INNER JOIN [NYSEStockCompanies] as Company
        on Trade.StockTicker = Company.StockTicker
GROUP BY Company.CompanyName, HoppingWindow(second, 30, 5)

--Total amount traded and total number of transactions in the last 30 seconds and calculated every 5 seconds
SELECT
    sum(Trade.Quantity * Trade.Price) as TotalTradedAmount
    , count(*) as TotalTradeCount
    , System.Timestamp as WindowDateTime
INTO
    [StockTradeTotals]
FROM
    [NYSEStockTrades] as Trade TIMESTAMP BY TradeTimestamp
GROUP BY HoppingWindow(second, 30, 5)

  17. Click Save query.

  18. On the Overview panel, click Start to start the Stream Analytics job.

  19. On the Start job blade, select Now and click the Start button.
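HoppingWindow(second, 30, 5) in both queries means each output row aggregates the trailing 30 seconds of events, and a new row is emitted every 5 seconds, so consecutive windows overlap. The Python sketch below emulates the StockTradeTotals aggregation over an in-memory list purely to illustrate the window semantics; it is not how Stream Analytics executes, and window boundaries are treated here as end-inclusive:

```python
def hopping_window_totals(trades, window=30, hop=5, horizon=60):
    """Emit (window_end, total_amount, trade_count) every `hop` seconds,
    aggregating trades from the trailing `window` seconds.
    `trades` is a list of (timestamp_seconds, quantity, price) tuples."""
    results = []
    for end in range(hop, horizon + 1, hop):
        # Window covers (end - window, end]; older events age out.
        in_window = [(q, p) for t, q, p in trades if end - window < t <= end]
        results.append((end,
                        round(sum(q * p for q, p in in_window), 2),
                        len(in_window)))
    return results

trades = [(3, 100, 80.0), (12, 50, 90.0), (41, 10, 100.0)]
res = hopping_window_totals(trades)
# By window end = 35s the trade at t = 3s has aged out of the 30s window.
```

Each tuple in `res` corresponds to one row Stream Analytics would push to the StockTradeTotals Power BI dataset.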

Create Power BI Dashboard to Visualise Real-Time Data

In this section you will log on to the Power BI portal and create a dashboard to visualize the real-time stock transaction statistics sent by Stream Analytics.

IMPORTANT
Execute these steps on your host computer
  1. Open a new browser tab and navigate to https://www.powerbi.com

  2. Enter your credentials to authenticate with the Power BI service.

  3. Once authenticated, open the Workspaces menu and click My Workspace at the top of the Workspaces list.

  4. Navigate to the Datasets tab and verify that two datasets have been created by Stream Analytics: StockTradeByCompany and StockTradeTotals.

  5. On the top right-hand side corner click + Create and then click Dashboard from the dropdown menu to create a new dashboard.

  6. Type NYSE Trade Activity in the Dashboard name field and click Create.

  7. Click the ellipsis (...) button on the toolbar and then click the + Add tile menu item.

  8. On the Add tile blade, select Custom Streaming Data under the Real-Time Data section.

  9. Click Next.

  10. On the Add a custom streaming data tile blade, select the StockTradeTotals dataset.

  11. Click Next.

  12. On the Visualization Type field select Card.

  13. On the Fields field select TotalTradedAmount.

  14. Click on the brush icon to set the Value decimal places field to 2.

  15. Click Next.

  16. On the Tile details blade, enter the following details:
    - Title: Total Traded Amount
    - Subtitle: in the last 30 seconds

  17. Leave remaining fields with their default values. Click Apply.

  18. Repeat the process to create another tile, this time to display the total trade count. Use the following details:
    - Dataset: StockTradeTotals
    - Visualization Type: Card
    - Fields: TotalTradeCount
    - Details > Title: Total Trade Count
    - Details > Subtitle: in the last 30 seconds

  19. You should be able to see the values for both tiles changing every 5 seconds.

  20. Repeat the process to create another tile, this time to display the historical values for TotalTradeCount over the last 5 minutes.
    - Dataset: StockTradeTotals
    - Visualization Type: Line Chart
    - Axis: WindowDateTime
    - Value: TotalTradeCount
    - Time window to display: 5 minutes
    - Details > Title: Total Trade Count
    - Details > Subtitle: 5 min history window

  21. Repeat the process to create another tile, this time to display the total traded amount broken down by company.
    - Dataset: StockTradeByCompany
    - Visualization Type: Clustered bar chart
    - Axis: CompanyName
    - Legend: CompanyName
    - Value: TradedAmount
    - Details > Title: Traded Amount by Company
    - Details > Subtitle: in the last 30 seconds

  22. Your real-time dashboard should look similar to the picture below. Every tile should refresh approximately every 5 seconds.