In this lab you will configure the Azure environment to allow relational data to be transferred from a SQL Server 2017 database to an Azure SQL Data Warehouse database using Azure Data Factory. The dataset you will use contains data about motor vehicle collisions that happened in New Your City from 2012 to 2019. You will use Power BI to visualise collision data loaded from Azure SQL Data Warehouse.
The estimated time to complete this lab is: 75 minutes.
IMPORTANT: Some of the Azure services provisioned require globally unique name and a “-suffix” has been added to their names to ensure this uniqueness. Please take note of the suffix generated as you will need it for the following resources in this lab:
Name | Type |
---|---|
MDWDataFactory-suffix | Data Factory (V2) |
mdwdatalakesuffix | Storage Account |
mdwsqlvirtualserver-suffix | SQL server |
In this section you are going to establish a Remote Desktop Connection to MDWDesktop virtual machine.
IMPORTANT |
---|
Execute these steps on your host computer |
-
In the Azure Portal, navigate to the lab resource group and click the MDWDesktop virtual machine.
-
On the MDWDesktop blade, from the Overview menu, click the Connect button.
-
On the Connect to virtual machine blade, click Download RDP File. This will download a .rdp file that you can use to establish a Remote Desktop Connection with the virtual machine.
In this section you are going to install Power BI Desktop and Azure Data Studio on MDWDesktop.
IMPORTANT |
---|
Execute these steps inside the MDWDesktop remote desktop connection |
-
Once the RDP file is downloaded, click on it to establish an RDP connection with MDWDesktop
-
User the following credentials to authenticate:
- User Name: MDWAdmin
- Password: P@ssw0rd123! -
Once logged in, accept the default privacy settings.
-
Using the browser, download and install the latest version of following software. During the setup, accept all default settings:
- Azure Data Studio (User Installer)
https://docs.microsoft.com/en-us/sql/azure-data-studio/download
Power BI Desktop (64-bit)
https://www.microsoft.com/en-us/download/details.aspx?id=45331
In this section you are going to connect to MDWSQLServer to restore the NYCDataSets database from backup stored in the MDWResources storage Account.
IMPORTANT |
---|
Execute these steps inside the MDWDesktop remote desktop connection |
-
Open Azure Data Studio and establish a new connection to MDWSQLServer using Windows Authentication
-
Press Ctrl+G to expand the Servers panel
-
Right-click the MDWSQLServer server name on the SERVERS panel and select New Query
-
On the Query Editor window, create a new credential named [https://mdwresources.blob.core.windows.net/nycdatasets] using a Shared Access Signature (SAS). Use this SQL command:
create credential [https://mdwresources.blob.core.windows.net/nycdatasets]
with identity = 'SHARED ACCESS SIGNATURE',
secret = 'sv=2018-03-28&ss=b&srt=sco&sp=rwl&se=2050-12-30T17:25:52Z&st=2019-04-05T09:25:52Z&spr=https&sig=4qrD8NmhaSmRFu2gKja67ayohfIDEQH3LdVMa2Utykc%3D'
go
- Restore the NYCDataSets database from the backup file stored in the Azure Storage Account. The backup file name is NYCDataSets.Full.bak. The restore command should move the data file to the ‘F:\Data’ folder and the log file to the ‘F:\Log’ folder. Use this SQL command:
restore database NYCDataSets from url = 'https://mdwresources.blob.core.windows.net/nycdatasets/NYCDataSets.Full.bak'
with move 'NYCDataSets' to 'F:\Data\NYCDataSets.mdf'
, move 'NYCDataSets_log' to 'F:\Log\NYCDataSets_log.ldf'
, stats = 5
go
IMPORTANT |
---|
In a classroom environment concurrent restore database operations cannot occur using the same backup file. If you get an error message when restoring the database, please change the backup file name to "NYCDataSets.Full_nn.bak", where nn is a number between 01 and 10. See example below: |
restore database NYCDataSets from url = 'https://mdwresources.blob.core.windows.net/nycdatasets/NYCDataSets.Full_01.bak'
with move 'NYCDataSets' to 'F:\Data\NYCDataSets.mdf'
, move 'NYCDataSets_log' to 'F:\Log\NYCDataSets_log.ldf'
, stats = 5
go
In this section you will connect to Azure SQL Data Warehouse to create the database objects used to host and process data.
IMPORTANT |
---|
Execute these steps inside the MDWDesktop remote desktop connection |
-
Open Azure Data Studio. On the Servers panel, click New Connection.
-
On the Connection Details panel, enter the following connection details:
- Server: mdwsqlvirtualserver-suffix.database.windows.net
- Authentication Type: SQL Login
- User Name: mdwadmin
- Password: P@ssw0rd123!
- Database: MDWASQLDW -
Click Connect.
-
Right-click the server name and click New Query.
-
On the new query window, create a new database schema named [NYC]. Use this SQL Command:
create schema [NYC]
go
- Create a new round robin distributed table named NYC.NYPD_MotorVehicleCollisions, see column definitions on the SQL Command:
create table [NYC].[NYPD_MotorVehicleCollisions](
[UniqueKey] [int] NULL,
[CollisionDate] [date] NULL,
[CollisionDayOfWeek] [varchar](9) NULL,
[CollisionTime] [time](7) NULL,
[CollisionTimeAMPM] [varchar](2) NOT NULL,
[CollisionTimeBin] [varchar](11) NULL,
[Borough] [varchar](200) NULL,
[ZipCode] [varchar](20) NULL,
[Latitude] [float] NULL,
[Longitude] [float] NULL,
[Location] [varchar](200) NULL,
[OnStreetName] [varchar](200) NULL,
[CrossStreetName] [varchar](200) NULL,
[OffStreetName] [varchar](200) NULL,
[NumberPersonsInjured] [int] NULL,
[NumberPersonsKilled] [int] NULL,
[IsFatalCollision] [int] NOT NULL,
[NumberPedestriansInjured] [int] NULL,
[NumberPedestriansKilled] [int] NULL,
[NumberCyclistInjured] [int] NULL,
[NumberCyclistKilled] [int] NULL,
[NumberMotoristInjured] [int] NULL,
[NumberMotoristKilled] [int] NULL,
[ContributingFactorVehicle1] [varchar](200) NULL,
[ContributingFactorVehicle2] [varchar](200) NULL,
[ContributingFactorVehicle3] [varchar](200) NULL,
[ContributingFactorVehicle4] [varchar](200) NULL,
[ContributingFactorVehicle5] [varchar](200) NULL,
[VehicleTypeCode1] [varchar](200) NULL,
[VehicleTypeCode2] [varchar](200) NULL,
[VehicleTypeCode3] [varchar](200) NULL,
[VehicleTypeCode4] [varchar](200) NULL,
[VehicleTypeCode5] [varchar](200) NULL
)
with (distribution = round_robin)
go
In this section you are going to install and configure required software onto MDWDataGateway.
IMPORTANT |
---|
Execute these steps on your host computer |
-
In the Azure Portal, navigate to the lab resource group and locate the Azure Data Factory MDWDataFactory-suffix.
-
On the MDWDataFactory-suffix blade, click the Author & Monitor button. The Azure Data Factory portal will open on a new browser tab.
-
On the Azure Data Factory portal, click the Author (pencil icon) button on the left-hand side menu. On the Connections tab, click Integration Runtimes.
-
Click the + New button to create a new Integration Runtime.
-
On the Integration Runtime Setup blade, select Perform data movement and dispatch activities to external computers and click Next.
-
When prompted to choose what network environment the integration runtime will connect to, select Self-Hosted and click Next.
-
Type MDWDataGateway in the Name text box and give it a meaningful description such as the example here. Click Next.
-
Copy any of the generated Authentication Key keys (Key 1 or Key 2) to Notepad. You are going to need it in the next step.
-
Click Finish.
In this section you are going to establish a Remote Desktop Connection to MDWDataGateway virtual machine.
IMPORTANT |
---|
Execute these steps on your host computer |
-
On the Azure Portal, navigate to the lab resource group and locate the MDWDataGateway virtual machine.
-
On the MDWDataGateway blade, from the Overview menu, click the Connect button.
-
On the Connect to virtual machine blade, click Download RDP File. This will download a .rdp file that you can use to establish a Remote Desktop Connection with the virtual machine.
-
Once the file is downloaded, click on file to establish the RDP connection with MDWDataGateway
-
User the following credentials to authenticate:
- User Name: MDWAdmin
- Password: P@ssw0rd123!
IMPORTANT |
---|
Execute these steps inside the MDWDataGateway remote desktop connection |
-
Once logged in, on the Server Manager, select Local Server on the left-hand side menu. On the right-hand side panel, locate the IE Enhanced Security Configuration and click the On link.
-
Turn the setting Off for both Administrators and Users.
-
Close Server Manager.
-
Open the browser and download and execute the latest version of the Azure Data Factory Integration Runtime.
Azure Data Factory Integration Runtime https://www.microsoft.com/en-ie/download/details.aspx?id=39717
-
Accept all default options during the setup wizard. Once the setup if finished, the Microsoft Integration Runtime Configuration Manager will pop up asking you to enter a valid authentication key.
-
Enter the authentication key generated in the previous exercise and click Register.
-
Once registration is confirmed, click Finish.
In this section you create a staging container in your MDWDataLake that will be used as a staging environment for Polybase before data can be copied to Azure SQL Data Warehouse.
IMPORTANT |
---|
Execute these steps on your host computer |
-
In the Azure Portal, go to the lab resource group and locate the Azure Storage account mdwdatalakesuffix.
-
On the Overview panel, click Blobs.
-
On the mdwdalalakesuffix – Blobs blade, click + Container.
-
On the New container blade, enter the following details:
- Name: polybase
- Public access level: Private (no anynymous access) -
Click OK to create the new container.
In this section you will build an Azure Data Factory pipeline to copy a table from MDWSQLServer to Azure SQL Data Warehouse.
IMPORTANT |
---|
Execute these steps on your host computer |
-
Open the Azure Data Factory portal and click the Author (pencil icon) option on the left-hand side panel. Under Connections tab, click Linked Services and then click + New to create a new linked service connection.
-
On the New Linked Service blade, type “SQL Server” in the search box to find the SQL Server linked service. Click Continue.
-
On the New Linked Service (SQL Server) blade, enter the following details:
- Name: MDWSQLServer_NYCDataSets
- Connect via integration runtime: MDWDataGateway
- Server Name: MDWSQLServer
- Database Name: NYCDataSets
- Authentication Type: Windows Authentication
- User Name: MDWAdmin
- Password: P@ssw0rd123! -
Click Test connection to make sure you entered the correct connection details and then click Finish.
-
Repeat the process to create an Azure SQL Data Warehouse linked service connection.
-
On the New Linked Service (Azure SQL Data Warehouse) blade, enter the following details:
- Name: MDWVirtualSQLServer_MDWASQLDW
- Connect via integration runtime: AutoResolveIntegrationRuntime
- Account selection method: From Azure subscription
- Azure subscription:
- Server Name: mdwsqlvirtualserver-suffix
- Database Name: MDWASQLDW
- Authentication Type: SQL Authentication
- User Name: MDWAdmin
- Password: P@ssw0rd123! -
Click Test connection to make sure you entered the correct connection details and then click Finish.
-
Repeat the process once again to create an Azure Blob Storage linked service connection.
-
On the New Linked Service (Azure Blob Storage) blade, enter the following details:
- Name: MDWDataLake
- Connect via integration runtime: AutoResolveIntegrationRuntime
- Authentication method: Account key
- Account selection method: From Azure subscription
- Azure subscription:
- Storage account name: mdwdatalakesuffix -
Click Test connection to make sure you entered the correct connection details and then click Finish.
-
You should now see 3 linked services connections that will be used as source, destination and staging.
IMPORTANT |
---|
Execute these steps on your host computer |
-
Open the Azure Data Factory portal and click the Author (pencil icon) option on the left-hand side panel. Under Factory Resources tab, click the ellipsis (…) next to Datasets and then click Add Dataset to create a new dataset.
-
Type SQL Server in the search box and select SQL Server. Click Finish.
-
On the New Data Set tab, enter the following details:
- Name: NYCDataSets_MotorVehicleCollisions
- Linked Service: MDWSQLServer_NYCDataSets
- Table: [NYC].[NYPD_MotorVehicleCollisions]Alternatively you can copy and paste the dataset JSON definition below:
{ "name": "NYCDataSets_MotorVehicleCollisions", "properties": { "linkedServiceName": { "referenceName": "MDWSQLServer_NYCDataSets", "type": "LinkedServiceReference" }, "type": "SqlServerTable", "typeProperties": { "tableName": "[NYC].[NYPD_MotorVehicleCollisions]" } }, "type": "Microsoft.DataFactory/factories/datasets" }
-
Leave remaining fields with default values and click Continue.
-
Repeat the process to create a new Azure SQL Data Warehouse data set.
-
On the New Data Set tab, enter the following details:
- Name: MDWASQLDW_MotorVehicleCollisions
- Linked Service: MDWSQLVirtualServer_MDWASQLDW
- Table: [NYC].[NYPD_MotorVehicleCollisions]Alternatively you can copy and paste the dataset JSON definition below:
{ "name": "MDWASQLDW_MotorVehicleCollisions", "properties": { "linkedServiceName": { "referenceName": "MDWSQLVirtualServer_MDWASQLDW", "type": "LinkedServiceReference" }, "type": "AzureSqlDWTable", "typeProperties": { "tableName": "[NYC].[NYPD_MotorVehicleCollisions]" } }, "type": "Microsoft.DataFactory/factories/datasets" }
-
Leave remaining fields with default values and click Continue.
-
Publish your dataset changes by clicking the Publish All button on the top of the screen.
IMPORTANT |
---|
Execute these steps on your host computer |
-
Open the Azure Data Factory portal and click the Author (pencil icon) option on the left-hand side panel. Under Factory Resources tab, click the ellipsis (…) next to Pipelines and then click Add Pipeline to create a new pipeline.
-
On the New Pipeline tab, enter the following details:
- General > Name: Copy Relational Data -
Leave remaining fields with default values.
-
From the Activities panel, type “Copy Data” in the search box. Drag the Copy Data activity on to the design surface.
-
Select the Copy Data activity and enter the following details:
- General > Name: CopyMotorVehicleCollisions
- Source > Source dataset: NYCDataSets_MotorVehicleCollisions
- Sink > Sink dataset: MDWASQLDW_MotorVehicleCollisions
- Sink > Allow PolyBase: Checked
- Settings > Enable staging: Checked
- Settings > Staging account linked service: MDWDataLake
- Settings > Storage Path: polybase -
Leave remaining fields with default values.
-
Publish your pipeline changes by clicking the Publish all button.
-
To execute the pipeline, click on Add trigger menu and then Trigger Now.
-
On the Pipeline Run blade, click Finish.
-
To monitor the execution of your pipeline, click on the Monitor menu on the left-hand side panel.
-
You should be able to see the Status of your pipeline execution on the right-hand side panel.
In this section you are going to use Power BI to visualize data from Azure SQL Data Warehouse. The Power BI report will use an Import connection to query Azure SQL Data Warehouse and visualise Motor Vehicle Collision data from the table you loaded in the previous exercise.
IMPORTANT |
---|
Execute these steps inside the MDWDesktop remote desktop connection |
-
On MDWDesktop, download the Power BI report from the link https://aka.ms/MDWLab1 and save it on the Desktop.
-
Open the file MDWLab1.pbit with Power BI Desktop. Optionally sign up for the Power BI tips and tricks email, or to dismiss this, click to sign in with an existing account, and then hit the escape key.
-
When prompted to enter the value of the MDWSQLVirtualServer parameter, type the full server name: mdwsqlvirtualserver-suffix.database.windows.net
-
Click Load, and then Run to acknowledge the Native Database Query message
-
When prompted, enter the Database credentials:
- User Name: MDWAdmin
- Password: P@ssw0rd123! -
Once the data is finished loading, interact with the report by changing the CollisionDate slicer and by clicking on the other visualisations.
-
Save your work and close Power BI Desktop.