Lab 2 Completed
fabragaMS committed May 15, 2019
1 parent adcef9b commit b574e85
Showing 6 changed files with 77 additions and 50 deletions.
98 changes: 49 additions & 49 deletions Lab/Lab1/Lab1.md
Expand Up @@ -11,7 +11,7 @@ Step | Description
![4](./Media/Black4.png) | Load data to an Azure SQL Data Warehouse table using Polybase
![5](./Media/Black5.png) | Visualize data from Azure SQL Data Warehouse using Power BI

**IMPORTANT**: Some of the Azure services provisioned by Lab0 require globally unique name and a “-suffix” has been added to their names to ensure this uniqueness. Please take note of the suffix generated as you will need it for the following resources:
**IMPORTANT**: Some of the Azure services provisioned require globally unique names, and a “-suffix” has been added to their names to ensure this uniqueness. Please take note of the generated suffix, as you will need it for the following resources:

Name |Type
-----------------------------|--------------------
Expand All @@ -33,16 +33,16 @@ In this section you are going to establish a Remote Desktop Connection to MDWDes
1. In the Azure Portal, navigate to the **MDW-Lab** resource group and click the **MDWDesktop** virtual machine.
2. On the MDWDesktop blade, from the Overview menu, click the Connect button.

![](./Media/Lab1-Image02.png)
![](./Media/Lab1-Image02.png)

3. On the **Connect to virtual machine** blade, click **Download RDP File**. This will download a .rdp file that you can use to establish a Remote Desktop Connection with the virtual machine.

![](./Media/Lab1-Image03.png)
![](./Media/Lab1-Image03.png)

## Install required software onto MDWDesktop
In this section you are going to install Power BI Desktop and Azure Data Studio on MDWDesktop.

![](./Media/Lab1-Image04.jpg)
![](./Media/Lab1-Image04.jpg)

**IMPORTANT**|
-------------|
Expand All @@ -66,20 +66,20 @@ In this section you are going to install Power BI Desktop and Azure Data Studio
## Restore NYCDataSets database onto MDWSQLServer
In this section you are going to connect to MDWSQLServer to restore the NYCDataSets database from a backup stored in the MDWResources storage account.

![](./Media/Lab1-Image07.jpg)
![](./Media/Lab1-Image07.jpg)

**IMPORTANT**|
-------------|
**Execute these steps inside the MDWDesktop remote desktop connection**|

1. Open Azure Data Studio and establish a new connection to MDWSQLServer using Windows Authentication

![](./Media/Lab1-Image08.png)
![](./Media/Lab1-Image08.png)

2. Press **Ctrl+G** to expand the Servers panel
3. Right-click the **MDWSQLServer** server name on the SERVERS panel and select **New Query**

![](./Media/Lab1-Image09.png)
![](./Media/Lab1-Image09.png)

4. On the **Query Editor** window, create a new credential named [https://mdwresources.blob.core.windows.net/nycdatasets] using a Shared Access Signature (SAS). Use this SQL command:

Expand All @@ -101,15 +101,15 @@ go
## Create Azure SQL Data Warehouse database objects
In this section you will connect to Azure SQL Data Warehouse to create the database objects used to host and process data.

![](./Media/Lab1-Image10.png)
![](./Media/Lab1-Image10.png)

**IMPORTANT**|
-------------|
**Execute these steps inside the MDWDesktop remote desktop connection**|

1. Open Azure Data Studio. On the Servers panel, click **New Connection**.

![](./Media/Lab1-Image11.png)
![](./Media/Lab1-Image11.png)

2. On the **Connection Details** panel, enter the following connection details:
<br> - **Server**: mdwsqlvirtualserver-suffix.database.windows.net
Expand All @@ -120,7 +120,7 @@ In this section you will connect to Azure SQL Data Warehouse to create the datab
3. Click **Connect**.
4. Right-click the server name and click **New Query**.

![](./Media/Lab1-Image12.png)
![](./Media/Lab1-Image12.png)

5. On the new query window, create a new database schema named [NYC]. Use this SQL Command:

Expand Down Expand Up @@ -181,34 +181,34 @@ In this section you are going to install and configure required software onto MD
1. In the Azure Portal, navigate to the MDW-Lab resource group and locate the Azure Data Factory **MDWDataFactory-*suffix***.
2. On the **MDWDataFactory-*suffix*** blade, click the **Author & Monitor** button. The Azure Data Factory portal will open on a new browser tab.

![](./Media/Lab1-Image13.png)
![](./Media/Lab1-Image13.png)

3. On the **Azure Data Factory** portal, click the **Author *(pencil icon)*** button on the left-hand side menu. On the **Connections** tab, click **Integration Runtimes**.
4. Click the **+ New** button to create a new Integration Runtime.

![](./Media/Lab1-Image14.png)
![](./Media/Lab1-Image14.png)

5. On the **Integration Runtime Setup** blade, select **Perform data movement and dispatch activities to external computers** and click **Next**.

![](./Media/Lab1-Image15.png)
![](./Media/Lab1-Image15.png)

6. When prompted to choose what network environment the integration runtime will connect to, select **Self-Hosted** and click **Next**.

![](./Media/Lab1-Image16.png)
![](./Media/Lab1-Image16.png)

7. Type MDWDataGateway in the **Name** text box and give it a meaningful description such as the example here. Click **Next**.

![](./Media/Lab1-Image17.png)
![](./Media/Lab1-Image17.png)

8. Copy any of the generated **Authentication Key** keys (Key 1 or Key 2) to Notepad. You are going to need it in the next step.
9. Click **Finish**.

![](./Media/Lab1-Image18.png)
![](./Media/Lab1-Image18.png)

## Connect to MDWDataGateway and register the Self Hosted Integration Runtime with Azure Data Factory
In this section you are going to establish a Remote Desktop Connection to the MDWDataGateway virtual machine.

![](./Media/Lab1-Image19.png)
![](./Media/Lab1-Image19.png)

**IMPORTANT**|
-------------|
Expand All @@ -217,11 +217,11 @@ In this section you are going to establish a Remote Desktop Connection to MDWDat
1. On the Azure Portal, navigate to the MDW-Lab resource group and locate the **MDWDataGateway** virtual machine.
2. On the **MDWDataGateway** blade, from the **Overview** menu, click the **Connect** button.

![](./Media/Lab1-Image20.png)
![](./Media/Lab1-Image20.png)

3. On the **Connect to virtual machine** blade, click **Download RDP File**. This will download a .rdp file that you can use to establish a Remote Desktop Connection with the virtual machine.

![](./Media/Lab1-Image21.png)
![](./Media/Lab1-Image21.png)

4. Once the file is downloaded, click the file to establish the RDP connection with MDWDataGateway.
5. Use the following credentials to authenticate:
Expand All @@ -236,7 +236,7 @@ In this section you are going to establish a Remote Desktop Connection to MDWDat
2. Turn the setting **Off** for both **Administrators** and **Users**.
3. Close **Server Manager**.

![](./Media/Lab1-Image22.png)
![](./Media/Lab1-Image22.png)

4. Open the browser and download and execute the latest version of the Azure Data Factory Integration Runtime.

Expand All @@ -247,12 +247,12 @@ https://www.microsoft.com/en-ie/download/details.aspx?id=39717
6. Enter the authentication key generated in the previous exercise and click Register.
7. Once registration is confirmed, click Finish.

![](./Media/Lab1-Image23.png)
![](./Media/Lab1-Image23.png)

## Create Staging Container on Azure Blob Storage
In this section you create a staging container in your MDWDataLake that will be used as a staging environment for Polybase before data can be copied to Azure SQL Data Warehouse.

![](./Media/Lab1-Image24.jpg)
![](./Media/Lab1-Image24.jpg)

**IMPORTANT**|
-------------|
Expand All @@ -261,23 +261,23 @@ In this section you create a staging container in your MDWDataLake that will be
1. In the Azure Portal, go to the lab resource group and locate the Azure Storage account **mdwdatalake*suffix***.
2. On the **Overview** panel, click **Blobs**.

![](./Media/Lab1-Image25.png)
![](./Media/Lab1-Image25.png)

3. On the **mdwdatalake*suffix* – Blobs** blade, click **+ Container**.

![](./Media/Lab1-Image26.png)
![](./Media/Lab1-Image26.png)

4. On the **New container** blade, enter the following details:
<br>- **Name**: polybase
    <br>- **Public access level**: Private (no anonymous access)
5. Click **OK** to create the new container.

![](./Media/Lab1-Image27.png)
![](./Media/Lab1-Image27.png)
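
The container name entered in the steps above has to satisfy the Azure Blob Storage naming rules: 3 to 63 characters, lowercase letters, numbers and single hyphens only, starting and ending with a letter or number. The following is a minimal Python sketch of that check, not part of the original lab; the function name `is_valid_container_name` is hypothetical and only illustrates why a name such as `polybase` is accepted.

```python
import re

# Lowercase letters and digits, with single hyphens allowed only between them.
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` follows the Azure Blob Storage container naming rules."""
    if not 3 <= len(name) <= 63:
        return False
    return _CONTAINER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("polybase", "Polybase", "poly--base", "ab"):
        print(candidate, is_valid_container_name(candidate))
```

Only the first candidate passes; the others fail on the uppercase letter, the consecutive hyphens and the minimum length respectively.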

## Create Azure Data Factory Pipeline to Copy Relational Data
In this section you will build an Azure Data Factory pipeline to copy a table from MDWSQLServer to Azure SQL Data Warehouse.

![](./Media/Lab1-Image28.jpg)
![](./Media/Lab1-Image28.jpg)

### Create Linked Service connections

Expand All @@ -287,11 +287,11 @@ In this section you will build an Azure Data Factory pipeline to copy a table fr

1. Open the **Azure Data Factory** portal and click the **Author *(pencil icon)*** option on the left-hand side panel. Under the **Connections** tab, click **Linked Services** and then click **+ New** to create a new linked service connection.

![](./Media/Lab1-Image29.png)
![](./Media/Lab1-Image29.png)

2. On the **New Linked Service** blade, type “SQL Server” in the search box to find the **SQL Server** linked service. Click **Continue**.

![](./Media/Lab1-Image30.png)
![](./Media/Lab1-Image30.png)

3. On the **New Linked Service (SQL Server)** blade, enter the following details:
<br>- **Name**: MDWSQLServer_NYCDataSets
Expand All @@ -303,11 +303,11 @@ In this section you will build an Azure Data Factory pipeline to copy a table fr
<br>- **Password**: P@ssw0rd123!
4. Click **Test connection** to make sure you entered the correct connection details and then click **Finish**.

![](./Media/Lab1-Image31.png)
![](./Media/Lab1-Image31.png)

5. Repeat the process to create an **Azure SQL Data Warehouse** linked service connection.

![](./Media/Lab1-Image32.png)
![](./Media/Lab1-Image32.png)

6. On the New Linked Service (Azure SQL Data Warehouse) blade, enter the following details:
<br>- **Name**: MDWVirtualSQLServer_MDWASQLDW
Expand All @@ -321,11 +321,11 @@ In this section you will build an Azure Data Factory pipeline to copy a table fr
<br>- **Password**: P@ssw0rd123!
7. Click **Test connection** to make sure you entered the correct connection details and then click **Finish**.

![](./Media/Lab1-Image33.png)
![](./Media/Lab1-Image33.png)

8. Repeat the process once again to create an **Azure Blob Storage** linked service connection.

![](./Media/Lab1-Image34.png)
![](./Media/Lab1-Image34.png)

9. On the **New Linked Service (Azure Blob Storage)** blade, enter the following details:
    <br>- **Name**: MDWDataLake
Expand All @@ -336,11 +336,11 @@ In this section you will build an Azure Data Factory pipeline to copy a table fr
    <br>- **Storage account name**: mdwdatalake*suffix*
10. Click **Test connection** to make sure you entered the correct connection details and then click **Finish**.

![](./Media/Lab1-Image35.png)
![](./Media/Lab1-Image35.png)

11. You should now see three linked service connections that will be used as source, destination and staging.

![](./Media/Lab1-Image36.png)
![](./Media/Lab1-Image36.png)

### Create Source and Destination Data Sets

Expand All @@ -350,35 +350,35 @@ In this section you will build an Azure Data Factory pipeline to copy a table fr

1. Open the **Azure Data Factory** portal and click the **Author *(pencil icon)*** option on the left-hand side panel. Under the **Factory Resources** tab, click the ellipsis **(…)** next to **Datasets** and then click **Add Dataset** to create a new dataset.

![](./Media/Lab1-Image37.png)
![](./Media/Lab1-Image37.png)

2. Type SQL Server in the search box and select **SQL Server**. Click **Finish**.

![](./Media/Lab1-Image38.png)
![](./Media/Lab1-Image38.png)

3. On the **New Data Set** tab, enter the following details:
<br>- **Name**: NYCDataSets_MotorVehicleCollisions
<br>- **Linked Service**: MDWSQLServer_NYCDataSets
<br>- **Table**: [NYC].[NYPD_MotorVehicleCollisions]
4. Leave remaining fields with default values and click **Continue**.

![](./Media/Lab1-Image39.png)
![](./Media/Lab1-Image39.png)

5. Repeat the process to create a new **Azure SQL Data Warehouse** data set.

![](./Media/Lab1-Image40.png)
![](./Media/Lab1-Image40.png)

6. On the **New Data Set** tab, enter the following details:
<br>- **Name**: MDWASQLDW_MotorVehicleCollisions
<br>- **Linked Service**: MDWSQLVirtualServer_MDWASQLDW
<br>- **Table**: [NYC].[NYPD_MotorVehicleCollisions]
7. Leave remaining fields with default values and click Continue.

![](./Media/Lab1-Image41.png)
![](./Media/Lab1-Image41.png)

8. Publish your dataset changes by clicking the Publish All button on the top of the screen.

![](./Media/Lab1-Image42.png)
![](./Media/Lab1-Image42.png)

### Create and Execute Pipeline

Expand All @@ -391,7 +391,7 @@ In this section you will build an Azure Data Factory pipeline to copy a table fr
<br>- **General > Name**: Copy Relational Data
3. Leave remaining fields with default values.

![](./Media/Lab1-Image43.png)
![](./Media/Lab1-Image43.png)

4. From the **Activities** panel, type “Copy Data” in the search box. Drag the **Copy Data** activity on to the design surface.
5. Select the **Copy Data** activity and enter the following details:
Expand All @@ -404,23 +404,23 @@ In this section you will build an Azure Data Factory pipeline to copy a table fr
<br>- **Settings > Storage Path**: polybase
6. Leave remaining fields with default values.

![](./Media/Lab1-Image44.png)
![](./Media/Lab1-Image45.png)
![](./Media/Lab1-Image46.png)
![](./Media/Lab1-Image44.png)
![](./Media/Lab1-Image45.png)
![](./Media/Lab1-Image46.png)

7. Publish your pipeline changes by clicking the **Publish all** button.

![](./Media/Lab1-Image47.png)
![](./Media/Lab1-Image47.png)

8. To execute the pipeline, click the **Add trigger** menu and then **Trigger Now**.
9. On the **Pipeline Run** blade, click **Finish**.

![](./Media/Lab1-Image48.png)
![](./Media/Lab1-Image48.png)

10. To monitor the execution of your pipeline, click on the **Monitor** menu on the left-hand side panel.
11. You should be able to see the Status of your pipeline execution on the right-hand side panel.

![](./Media/Lab1-Image49.png)
![](./Media/Lab1-Image49.png)
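
The pipeline built in the portal is stored by Azure Data Factory as a JSON definition. The sketch below assembles a dictionary with roughly that shape in Python, to show how the two datasets, the PolyBase option and the `polybase` staging path relate to each other. It is an illustration under assumptions, not the JSON the portal actually generates: the activity name `CopyMotorVehicleCollisions` and the helper `build_copy_activity` are hypothetical, the source type `SqlServerSource` is assumed, and using the `MDWDataLake` linked service for staging is inferred from the linked services created earlier.

```python
import json


def build_copy_activity(source_dataset, sink_dataset, staging_linked_service, staging_path):
    """Return a dict shaped like an ADF Copy activity that stages data for PolyBase."""
    return {
        "name": "CopyMotorVehicleCollisions",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "SqlServerSource"},  # assumed source type
            "sink": {"type": "SqlDWSink", "allowPolyBase": True},
            # Staged copy: data lands in Blob Storage first, then PolyBase loads it.
            "enableStaging": True,
            "stagingSettings": {
                "linkedServiceName": {
                    "referenceName": staging_linked_service,
                    "type": "LinkedServiceReference",
                },
                "path": staging_path,
            },
        },
    }


activity = build_copy_activity(
    source_dataset="NYCDataSets_MotorVehicleCollisions",
    sink_dataset="MDWASQLDW_MotorVehicleCollisions",
    staging_linked_service="MDWDataLake",  # assumed staging linked service
    staging_path="polybase",
)
pipeline = {"name": "Copy Relational Data", "properties": {"activities": [activity]}}

if __name__ == "__main__":
    print(json.dumps(pipeline, indent=2))
```

Comparing this output with the **Code** view of the pipeline in the Azure Data Factory portal is a quick way to confirm which settings were actually saved.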

## Visualize Data with Power BI
In this section you are going to use Power BI to visualize data from Azure SQL Data Warehouse. The Power BI report will use an Import connection to query Azure SQL Data Warehouse and visualise Motor Vehicle Collision data from the table you loaded in the previous exercise.
Expand All @@ -437,9 +437,9 @@ In this section you are going to use Power BI to visualize data from Azure SQL D
<br>- **User Name**: MDWAdmin
<br>- **Password**: P@ssw0rd123!

![](./Media/Lab1-Image50.png)
![](./Media/Lab1-Image50.png)

6. Once the data is finished loading, interact with the report by changing the CollisionDate slicer and by clicking on the other visualisations.
7. Save your work and close Power BI Desktop.

![](./Media/Lab1-Image51.png)
![](./Media/Lab1-Image51.png)
27 changes: 27 additions & 0 deletions Lab/Lab2/Lab2.md
Expand Up @@ -710,3 +710,30 @@ In this section you create a data factory pipeline to copy data in the following s

![](./Media/Lab2-Image36.png)
![](./Media/Lab2-Image37.png)

## Visualize Data with Power BI
In this section you are going to use Power BI to visualize data from Azure SQL Data Warehouse. The Power BI report will use an Import connection to query Azure SQL Data Warehouse and visualise Motor Vehicle Collision data from the table you loaded in the previous exercise.

**IMPORTANT**|
-------------|
**Execute these steps inside the MDWDesktop remote desktop connection**|

1. On MDWDesktop, download the Power BI report from the link https://aka.ms/MDWLab2 and save it on the Desktop.
2. Open the file MDWLab2.pbit with Power BI Desktop.
3. When prompted to enter the value of the MDWSQLVirtualServer parameter, type the full server name: **mdwsqlvirtualserver-*suffix*.database.windows.net**
4. Click **Load**.

![](./Media/Lab2-Image38.png)

5. When prompted to enter credentials, select **Database** from the left-hand side panel and enter the following details:
<br>- **User name**: mdwadmin
<br>- **Password**: P@ssw0rd123!
6. Leave remaining fields with their default values.
7. Click **Connect**.

![](./Media/Lab2-Image39.png)

8. Once the data has finished loading, interact with the report by changing the PickUpDate slicer and by clicking on the other visualisations.
9. Save your work and close Power BI Desktop.

![](./Media/Lab2-Image40.png)
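
The full server name requested by the MDWSQLVirtualServer parameter is the fixed prefix `mdwsqlvirtualserver-`, followed by your suffix and the `.database.windows.net` domain. A small Python sketch of that composition is shown here as an illustration only; the function name `sql_server_fqdn` and the sample suffix `abc123` are hypothetical.

```python
def sql_server_fqdn(suffix: str) -> str:
    """Build the full Azure SQL server name used in the lab from a resource suffix."""
    # Accept the suffix with or without its leading hyphen.
    cleaned = suffix.strip().lstrip("-")
    if not cleaned:
        raise ValueError("suffix must not be empty")
    return f"mdwsqlvirtualserver-{cleaned}.database.windows.net"


if __name__ == "__main__":
    print(sql_server_fqdn("abc123"))
```

Replace `abc123` with the suffix generated for your own deployment before pasting the result into Power BI Desktop.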
Binary file added Lab/Lab2/Media/Lab2-Image38.png
Binary file added Lab/Lab2/Media/Lab2-Image39.png
Binary file added Lab/Lab2/Media/Lab2-Image40.png
2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -49,7 +49,7 @@ Step | Description
![4](./Media/Black4.png) | Load data to an Azure SQL Data Warehouse table using Polybase
![5](./Media/Black5.png) | Visualize data from Azure SQL Data Warehouse using Power BI

### Lab 2: Transform Big Data using Azure Data Factory and Azure SQL Data Warehouse
### [Lab 2: Transform Big Data using Azure Data Factory and Azure SQL Data Warehouse](./Lab/Lab2/Lab2.md)
In this lab you will use Azure Data Factory to download large data files into your data lake and use an Azure SQL Data Warehouse stored procedure to generate a summary dataset and store it in the final table. The dataset you will use contains detailed New York City Yellow Taxi rides for 2018. You will generate a daily aggregated summary of all rides and save the result in your data warehouse. You will then use Power BI to visualise summarised data.

The estimated time to complete this lab is: **45 minutes**.