# Lab 1: Load Data into Azure Synapse Analytics using Azure Data Factory Pipelines

In this lab you will configure the Azure environment to allow relational data to be transferred from an Azure SQL Database to an Azure Synapse Analytics data warehouse using Azure Data Factory. The dataset you will use contains data about motor vehicle collisions that happened in New York City from 2012 to 2019. You will use Power BI to visualize collision data loaded from your Azure Synapse Analytics data warehouse.

The estimated time to complete this lab is: **45 minutes**.

## Microsoft Learn & Technical Documentation

The following Azure services will be used in this lab. If you need further training resources or access to technical documentation, the table below provides links to Microsoft Learn and to each service's technical documentation.

Azure Service | Microsoft Learn | Technical Documentation
--------------|-----------------|------------------------
Azure SQL Database | [Work with relational data in Azure](https://docs.microsoft.com/en-us/learn/paths/work-with-relational-data-in-azure/) | [Azure SQL Database Technical Documentation](https://docs.microsoft.com/en-us/azure/sql-database/)
Azure Data Factory | [Data ingestion with Azure Data Factory](https://docs.microsoft.com/en-us/learn/modules/data-ingestion-with-azure-data-factory/) | [Azure Data Factory Technical Documentation](https://docs.microsoft.com/en-us/azure/data-factory/)
Azure Synapse Analytics | [Implement a Data Warehouse with Azure Synapse Analytics](https://docs.microsoft.com/en-us/learn/paths/implement-sql-data-warehouse/) | [Azure Synapse Analytics Technical Documentation](https://docs.microsoft.com/en-us/azure/sql-data-warehouse/)
Azure Data Lake Storage Gen2 | [Large Scale Data Processing with Azure Data Lake Storage Gen2](https://docs.microsoft.com/en-us/learn/paths/data-processing-with-azure-adls/) | [Azure Data Lake Storage Gen2 Technical Documentation](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)

## Lab Architecture

![Lab Architecture](./Media/Lab1-Image01.png)

Step | Description
-------- | -----
![1](./Media/Black1.png) | Build an Azure Data Factory pipeline to copy data from an Azure SQL Database table
![2](./Media/Black2.png) | Use Azure Data Lake Storage Gen2 as a staging area for PolyBase
![3](./Media/Black3.png) | Load data into an Azure Synapse Analytics table using PolyBase
![4](./Media/Black4.png) | Visualize data from Azure Synapse Analytics using Power BI

**IMPORTANT**: Some of the Azure services provisioned require globally unique names, so a "-suffix" has been appended to their names to ensure uniqueness. Please take note of the suffix generated, as you will need it for the following resources in this lab:

Name | Type
-----------------------------|--------------------
SynapseDataFactory-*suffix* | Data Factory (V2)
synapsedatalake*suffix* | Data Lake Storage Gen2
synapsesql-*suffix* | SQL server
operationalsql-*suffix* | SQL server

## Install required software onto ADPDesktop

In this section you are going to install Power BI Desktop and Azure Data Studio on ADPDesktop.

![](./Media/Lab1-Image04.jpg)

**IMPORTANT**|
-------------|
**Execute these steps inside the ADPDesktop remote desktop connection**|

1. Once logged on to the VM, accept the default privacy settings.

2. Using the browser, download and install the latest version of the following software. During the setup, accept all default settings:
<br>
<br> **Azure Data Studio (User Installer)**
<br>https://docs.microsoft.com/en-us/sql/azure-data-studio/download
<br>![](./Media/Lab1-Image05.png)
<br>
<br>**Power BI Desktop (64-bit)**
<br>https://aka.ms/pbiSingleInstaller
<br>![](./Media/Lab1-Image06.png)

## Create Azure Synapse Analytics data warehouse objects

In this section you will connect to Azure Synapse Analytics to create the database objects used to host and process data.

![](./Media/Lab1-Image10.png)

**IMPORTANT**|
-------------|
**Execute these steps inside the ADPDesktop remote desktop connection**|

1. Open Azure Data Studio. On the Servers panel, click **New Connection**.

![](./Media/Lab1-Image11.png)

2. On the **Connection Details** panel, enter the following connection details:
<br>- **Server**: synapsesql-*suffix*.database.windows.net
<br>- **Authentication Type**: SQL Login
<br>- **User Name**: ADPAdmin
<br>- **Password**: P@ssw0rd123!
<br>- **Database**: SynapseDW

3. Click **Connect**.

![](./Media/Lab1-Image08.png)

4. Right-click the server name and click **New Query**.

![](./Media/Lab1-Image12.png)

5. In the new query window, create a new database schema named [NYC] using the following SQL command:

```sql
create schema [NYC]
go
```

6. Create a new round-robin distributed table named NYC.NYPD_MotorVehicleCollisions, using the column definitions in the SQL command below:

```sql
create table [NYC].[NYPD_MotorVehicleCollisions](
    [UniqueKey] [int] NULL,
    [CollisionDate] [date] NULL,
    [CollisionDayOfWeek] [varchar](9) NULL,
    [CollisionTime] [time](7) NULL,
    [CollisionTimeAMPM] [varchar](2) NOT NULL,
    [CollisionTimeBin] [varchar](11) NULL,
    [Borough] [varchar](200) NULL,
    [ZipCode] [varchar](20) NULL,
    [Latitude] [float] NULL,
    [Longitude] [float] NULL,
    [Location] [varchar](200) NULL,
    [OnStreetName] [varchar](200) NULL,
    [CrossStreetName] [varchar](200) NULL,
    [OffStreetName] [varchar](200) NULL,
    [NumberPersonsInjured] [int] NULL,
    [NumberPersonsKilled] [int] NULL,
    [IsFatalCollision] [int] NOT NULL,
    [NumberPedestriansInjured] [int] NULL,
    [NumberPedestriansKilled] [int] NULL,
    [NumberCyclistInjured] [int] NULL,
    [NumberCyclistKilled] [int] NULL,
    [NumberMotoristInjured] [int] NULL,
    [NumberMotoristKilled] [int] NULL,
    [ContributingFactorVehicle1] [varchar](200) NULL,
    [ContributingFactorVehicle2] [varchar](200) NULL,
    [ContributingFactorVehicle3] [varchar](200) NULL,
    [ContributingFactorVehicle4] [varchar](200) NULL,
    [ContributingFactorVehicle5] [varchar](200) NULL,
    [VehicleTypeCode1] [varchar](200) NULL,
    [VehicleTypeCode2] [varchar](200) NULL,
    [VehicleTypeCode3] [varchar](200) NULL,
    [VehicleTypeCode4] [varchar](200) NULL,
    [VehicleTypeCode5] [varchar](200) NULL
)
with (distribution = round_robin)
go
```

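If you want to confirm the table was created with the intended distribution, you can optionally run a quick check like the one below from the same query window. This is a minimal sketch that is not part of the original lab steps; it assumes you are still connected to the SynapseDW database.

```sql
-- Optional check: confirm NYC.NYPD_MotorVehicleCollisions exists
-- and was created with ROUND_ROBIN distribution.
select s.name as schema_name,
       t.name as table_name,
       dp.distribution_policy_desc
from sys.tables t
join sys.schemas s on s.schema_id = t.schema_id
join sys.pdw_table_distribution_properties dp on dp.object_id = t.object_id
where s.name = 'NYC' and t.name = 'NYPD_MotorVehicleCollisions';
```
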
## Create Azure Data Factory Pipeline to Copy Relational Data

In this section you will build an Azure Data Factory pipeline to copy a table from the NYCDataSets database to the Azure Synapse Analytics data warehouse.

![](./Media/Lab1-Image28.jpg)

### Create Linked Service connections

**IMPORTANT**|
-------------|
**Execute these steps on your host computer**|

1. In the Azure Portal, go to the lab resource group and locate the Azure Data Factory resource **SynapseDataFactory-*suffix***.

2. On the **Overview** panel, click **Author & Monitor**. The **Azure Data Factory** portal will open in a new browser tab.

![](./Media/Lab1-Image55.png)

3. In the **Azure Data Factory** portal, click the **Manage *(toolcase icon)*** option on the left-hand side panel. Under the **Linked services** menu item, click **+ New** to create a new linked service connection.

![](./Media/Lab1-Image29.png)

4. On the **New Linked Service** blade, type "Azure SQL Database" in the search box to find the **Azure SQL Database** linked service. Click **Continue**.

![](./Media/Lab1-Image30.png)

5. On the **New Linked Service (Azure SQL Database)** blade, enter the following details:
<br>- **Name**: OperationalSQL_NYCDataSets
<br>- **Account selection method**: From Azure subscription
<br>- **Azure subscription**: *[your subscription]*
<br>- **Server Name**: operationalsql-*suffix*
<br>- **Database Name**: NYCDataSets
<br>- **Authentication Type**: SQL Authentication
<br>- **User Name**: ADPAdmin
<br>- **Password**: P@ssw0rd123!

6. Click **Test connection** to make sure you entered the correct connection details, and then click **Finish**.

![](./Media/Lab1-Image31.png)

7. Repeat the process to create an **Azure Synapse Analytics** linked service connection.

![](./Media/Lab1-Image32.png)

8. On the **New Linked Service (Azure Synapse Analytics)** blade, enter the following details:
<br>- **Name**: SynapseSQL_SynapseDW
<br>- **Connect via integration runtime**: AutoResolveIntegrationRuntime
<br>- **Account selection method**: From Azure subscription
<br>- **Azure subscription**: *[your subscription]*
<br>- **Server Name**: synapsesql-*suffix*
<br>- **Database Name**: SynapseDW
<br>- **Authentication Type**: SQL Authentication
<br>- **User Name**: ADPAdmin
<br>- **Password**: P@ssw0rd123!

9. Click **Test connection** to make sure you entered the correct connection details, and then click **Finish**.

![](./Media/Lab1-Image33.png)

10. Repeat the process once again to create an **Azure Blob Storage** linked service connection.

![](./Media/Lab1-Image34.png)

11. On the **New Linked Service (Azure Blob Storage)** blade, enter the following details:
<br>- **Name**: synapsedatalake
<br>- **Connect via integration runtime**: AutoResolveIntegrationRuntime
<br>- **Authentication method**: Account key
<br>- **Account selection method**: From Azure subscription
<br>- **Azure subscription**: *[your subscription]*
<br>- **Storage account name**: synapsedatalake*suffix*

12. Click **Test connection** to make sure you entered the correct connection details, and then click **Finish**.

![](./Media/Lab1-Image35.png)

13. You should now see the three linked service connections that will be used as source, destination, and staging.

![](./Media/Lab1-Image36.png)

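For reference, a linked service created through the UI corresponds to a JSON definition in the factory. The sketch below is an illustrative approximation of the Azure SQL Database linked service above; the exact connection string format and credential handling (for example, using Azure Key Vault instead of an inline password) depend on your environment, and you do not need to paste this anywhere to complete the lab.

```json
{
    "name": "OperationalSQL_NYCDataSets",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:operationalsql-<suffix>.database.windows.net,1433;Database=NYCDataSets;User ID=ADPAdmin;Password=<your password>;"
        }
    }
}
```
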
### Create Source and Destination Data Sets

**IMPORTANT**|
-------------|
**Execute these steps on your host computer**|

1. Open the **Azure Data Factory** portal and click the **Author *(pencil icon)*** option on the left-hand side panel. Under the **Factory Resources** tab, click the ellipsis **(…)** next to **Datasets** and then click **New Dataset** to create a new dataset.

![](./Media/Lab1-Image37.png)

2. Type "Azure SQL Database" in the search box and select **Azure SQL Database**. Click **Finish**.

![](./Media/Lab1-Image38.png)

3. On the **New Data Set** tab, enter the following details:
<br>- **Name**: NYCDataSets_MotorVehicleCollisions
<br>- **Linked Service**: OperationalSQL_NYCDataSets
<br>- **Table**: [NYC].[NYPD_MotorVehicleCollisions]

Alternatively, you can copy and paste the dataset JSON definition below:

```json
{
    "name": "NYCDataSets_MotorVehicleCollisions",
    "properties": {
        "linkedServiceName": {
            "referenceName": "OperationalSQL_NYCDataSets",
            "type": "LinkedServiceReference"
        },
        "folder": {
            "name": "Lab1"
        },
        "annotations": [],
        "type": "AzureSqlTable",
        "schema": [],
        "typeProperties": {
            "schema": "NYC",
            "table": "NYPD_MotorVehicleCollisions"
        }
    }
}
```

4. Leave the remaining fields with their default values and click **Continue**.

![](./Media/Lab1-Image39.png)

5. Repeat the process to create a new **Azure Synapse Analytics** dataset.

![](./Media/Lab1-Image40.png)

6. On the **New Data Set** tab, enter the following details:
<br>- **Name**: SynapseDW_MotorVehicleCollisions
<br>- **Linked Service**: SynapseSQL_SynapseDW
<br>- **Table**: [NYC].[NYPD_MotorVehicleCollisions]

Alternatively, you can copy and paste the dataset JSON definition below:

```json
{
    "name": "SynapseDW_MotorVehicleCollisions",
    "properties": {
        "linkedServiceName": {
            "referenceName": "SynapseSQL_SynapseDW",
            "type": "LinkedServiceReference"
        },
        "folder": {
            "name": "Lab1"
        },
        "annotations": [],
        "type": "AzureSqlDWTable",
        "schema": [],
        "typeProperties": {
            "schema": "NYC",
            "table": "NYPD_MotorVehicleCollisions"
        }
    }
}
```

7. Leave the remaining fields with their default values and click **Continue**.

![](./Media/Lab1-Image41.png)

8. Under the **Factory Resources** tab, click the ellipsis **(…)** next to **Datasets** and then click **New folder**. Name the new folder **Lab1**.

9. Drag the two datasets you created into the **Lab1** folder.

![](./Media/Lab1-Image53.png)

10. Publish your dataset changes by clicking the **Publish All** button at the top of the screen.

![](./Media/Lab1-Image42.png)

### Create and Execute Pipeline

**IMPORTANT**|
-------------|
**Execute these steps on your host computer**|

1. Open the **Azure Data Factory** portal and click the **Author *(pencil icon)*** option on the left-hand side panel. Under the **Factory Resources** tab, click the ellipsis **(…)** next to **Pipelines** and then click **New Pipeline** to create a new pipeline.
2. On the **New Pipeline** tab, enter the following details:
<br>- **General > Name**: Lab1 - Copy Collision Data
3. Leave the remaining fields with their default values.

![](./Media/Lab1-Image43.png)

4. From the **Activities** panel, type "Copy Data" in the search box. Drag the **Copy Data** activity onto the design surface.
5. Select the **Copy Data** activity and enter the following details:
<br>- **General > Name**: CopyMotorVehicleCollisions
<br>- **Source > Source dataset**: NYCDataSets_MotorVehicleCollisions
<br>- **Sink > Sink dataset**: SynapseDW_MotorVehicleCollisions
<br>- **Sink > Allow PolyBase**: Checked
<br>- **Settings > Enable staging**: Checked
<br>- **Settings > Staging account linked service**: synapsedatalake
<br>- **Settings > Storage Path**: polybase
6. Leave the remaining fields with their default values. A JSON sketch of the resulting Copy activity is included after the screenshots below for reference.

![](./Media/Lab1-Image44.png)
![](./Media/Lab1-Image45.png)
![](./Media/Lab1-Image46.png)

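The JSON below is an illustrative sketch of how the Copy activity settings configured above are typically represented inside the pipeline definition. It is an approximation for reference only (property details can vary between Data Factory versions) and does not need to be pasted anywhere to complete the lab.

```json
{
    "name": "CopyMotorVehicleCollisions",
    "type": "Copy",
    "inputs": [
        { "referenceName": "NYCDataSets_MotorVehicleCollisions", "type": "DatasetReference" }
    ],
    "outputs": [
        { "referenceName": "SynapseDW_MotorVehicleCollisions", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": { "type": "AzureSqlSource" },
        "sink": { "type": "SqlDWSink", "allowPolyBase": true },
        "enableStaging": true,
        "stagingSettings": {
            "linkedServiceName": { "referenceName": "synapsedatalake", "type": "LinkedServiceReference" },
            "path": "polybase"
        }
    }
}
```
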
7. Publish your pipeline changes by clicking the **Publish all** button.

![](./Media/Lab1-Image47.png)

8. To execute the pipeline, click the **Add trigger** menu and then **Trigger Now**.
9. On the **Pipeline Run** blade, click **Finish**.

![](./Media/Lab1-Image48.png)

10. To monitor the execution of your pipeline, click the **Monitor** menu on the left-hand side panel.
11. You should be able to see the status of your pipeline execution on the right-hand side panel.

![](./Media/Lab1-Image49updated.png)

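Once the pipeline run shows as succeeded, you can optionally verify the load from Azure Data Studio (connected to SynapseDW as earlier in this lab) with a quick row count. This check is a small optional addition, not part of the original lab steps.

```sql
-- Optional check: confirm rows were copied into the Synapse table.
select count(*) as LoadedRows
from [NYC].[NYPD_MotorVehicleCollisions];
```
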
## Visualize Data with Power BI

In this section you are going to use Power BI to visualize data from Azure Synapse Analytics. The Power BI report uses an Import connection to query Azure Synapse Analytics and visualize Motor Vehicle Collision data from the table you loaded in the previous exercise.

**IMPORTANT**|
-------------|
**Execute these steps inside the ADPDesktop remote desktop connection**|

1. On ADPDesktop, download the Power BI report from https://aka.ms/ADPLab1 and save it on the Desktop.
2. Open the file ADPLab1.pbit with Power BI Desktop. Optionally sign up for the Power BI tips and tricks email; to dismiss the welcome screen instead, click the option to sign in with an existing account and then press the Escape key.
3. When prompted for the value of the **SynapseSQLEnpoint** parameter, type the full server name: synapsesql-*suffix*.database.windows.net

![](./Media/Lab1-Image50.png)

4. Click **Load**, and then click **Run** to acknowledge the Native Database Query message.
5. When prompted, enter the **Database** credentials:
<br>- **User Name**: adpadmin
<br>- **Password**: P@ssw0rd123!

![](./Media/Lab1-Image52.png)

6. Once the data has finished loading, interact with the report by changing the CollisionDate slicer and by clicking the other visualizations.
7. Save your work and close Power BI Desktop.

![](./Media/Lab1-Image51.png)
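
If you would like to explore the loaded data outside Power BI, a simple aggregation such as the sketch below (run from Azure Data Studio against SynapseDW) illustrates the kind of summary the report visuals are built on. It is an optional, illustrative query only; the report itself may use different queries.

```sql
-- Illustrative only: summarize collisions by borough from the loaded table.
select [Borough],
       count(*) as TotalCollisions,
       sum([NumberPersonsInjured]) as TotalInjured,
       sum([IsFatalCollision]) as FatalCollisions
from [NYC].[NYPD_MotorVehicleCollisions]
group by [Borough]
order by TotalCollisions desc;
```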
