diff --git a/Lab/Lab2/Lab2.md b/Lab/Lab2/Lab2.md
index 5495433..e1bf80f 100644
--- a/Lab/Lab2/Lab2.md
+++ b/Lab/Lab2/Lab2.md
@@ -211,10 +211,11 @@ In this section you are going to create 5 datasets that will be used by your dat
Dataset | Description
--------|---------------
**MDWResources_NYCTaxiData**| References MDWResources shared storage account container that contains source data files.
-**MDWResources_NYCTaxiLookup**| References MDWResources shared storage account that contains a .csv file with all taxi location codes and names.
+**MDWResources_NYCTaxiLookup_CSV**| References MDWResources shared storage account that contains a .csv file with all taxi location codes and names.
**MDWASQLDW_StagingNYCTaxiData**| References the table Staging.NYCTaxiData in the Azure SQL Data Warehouse database MDWASQLDW.
**MDWASQLDW_StagingNYCLocationLookup**| References the table [Staging].[NYCTaxiLocationLookup] in the Azure SQL Data Warehouse database MDWASQLDW and acts as destination of lookup data copied from MDWResources_NYCTaxiLookup.
-**MDWDataLake_NYCTaxiData**| References your MDWDataLake-*suffix* storage account. It acts as the destination for the files copied from MDWResources_NYCTaxiData. It also functions as a data source when copying data to MDWASQLDW_StagingNYCTaxiData.
+**MDWDataLake_NYCTaxiData_Binary**| References your MDWDataLake-*suffix* storage account. It acts as the destination for the files copied from MDWResources_NYCTaxiData.
+**MDWDataLake_NYCTaxiData_CSV**| References your MDWDataLake-*suffix* storage account. It functions as a data source when copying data to MDWASQLDW_StagingNYCTaxiData.
**IMPORTANT**|
-------------|
@@ -232,36 +233,40 @@ Dataset | Description
![](./Media/Lab2-Image14.png)
-4. On the New Data Set tab, enter the following details:
-
- **General > Name**: MDWResources_NYCTaxiData
-
- **Connection > Linked Service**: MDWResources
-
- **Connection > File Path**: nyctaxidata / *.csv
-
- **Connection > Binary Copy**: Checked
+4. On the **Set Properties** blade, enter the following details:
+
- **Name**: MDWResources_NYCTaxiData
+
- **Linked service**: MDWResources
+
- **File Path**: **Container**: nyctaxidata, **Directory**: [blank], **File**: [blank]
+
+ ![](./Media/Lab2-Image41.png)
Alternatively you can copy and paste the Dataset JSON definition below:
```json
{
- "name": "MDWResources_NYCTaxiData",
- "properties": {
- "linkedServiceName": {
- "referenceName": "MDWResources",
- "type": "LinkedServiceReference"
- },
- "type": "AzureBlob",
- "typeProperties": {
- "fileName": "*.csv",
- "folderPath": "nyctaxidata"
+ "name": "MDWResources_NYCTaxiData",
+ "properties": {
+ "linkedServiceName": {
+ "referenceName": "MDWResources",
+ "type": "LinkedServiceReference"
+ },
+ "annotations": [],
+ "type": "Binary",
+ "typeProperties": {
+ "location": {
+ "type": "AzureBlobStorageLocation",
+ "container": "nyctaxidata"
+ }
+ }
}
- },
- "type": "Microsoft.DataFactory/factories/datasets"
}
```
5. Leave remaining fields with default values.
![](./Media/Lab2-Image15.png)
-6. Repeat the process to create another dataset, this time referencing the NYCTaxiData container in your MDWDataLake storage account.
+6. Repeat the process to create another Azure Storage Binary dataset, this time referencing the NYCTaxiData container in your MDWDataLake storage account. This dataset acts as the destination for the NYC taxi data files you will copy from the previous dataset.
+
7. Type “Azure Blob Storage” in the search box and select **Azure Blob Storage**. Click **Continue**.
![](./Media/Lab2-Image13.png)
@@ -270,97 +275,151 @@ Dataset | Description
![](./Media/Lab2-Image14.png)
-11. On the New Data Set tab, enter the following details:
-
- **General > Name**: MDWDataLake_NYCTaxiData
-
- **Connection > Linked Service**: MDWDataLake
-
- **Connection > File Path**: nyctaxidata
-
- **Connection > Binary Copy**: Unchecked
-
- **Connection > Column names in the first row**: Checked
+9. On the **Set Properties** blade, enter the following details:
+
- **Name**: MDWDataLake_NYCTaxiData_Binary
+
- **Linked Service**: MDWDataLake
+
- **File Path**: **Container**: nyctaxidata, **Directory**: [blank], **File**: [blank]
+
+ ![](./Media/Lab2-Image42.png)
+
+ Click **Continue**.
Alternatively you can copy and paste the Dataset JSON definition below:
```json
{
- "name": "MDWDataLake_NYCTaxiData",
- "properties": {
- "linkedServiceName": {
- "referenceName": "MDWDataLake",
- "type": "LinkedServiceReference"
- },
- "type": "AzureBlob",
- "typeProperties": {
- "format": {
- "type": "TextFormat",
- "columnDelimiter": ",",
- "rowDelimiter": "",
- "nullValue": "",
- "treatEmptyAsNull": true,
- "firstRowAsHeader": true
+ "name": "MDWDataLake_NYCTaxiData_Binary",
+ "properties": {
+ "linkedServiceName": {
+ "referenceName": "MDWDataLake",
+ "type": "LinkedServiceReference"
},
- "fileName": "",
- "folderPath": "nyctaxidata"
+ "annotations": [],
+ "type": "Binary",
+ "typeProperties": {
+ "location": {
+ "type": "AzureBlobStorageLocation",
+ "container": "nyctaxidata"
+ }
+ }
}
- },
- "type": "Microsoft.DataFactory/factories/datasets"
}
```
-12. Leave remaining fields with default values.
+10. Leave remaining fields with default values.
![](./Media/Lab2-Image16.png)
-13. Repeat the process to create another dataset, this time referencing the NYCTaxiLookup container in your MDWResources storage account.
-14. Type “Azure Blob Storage” in the search box and select **Azure Blob Storage**. Click **Continue**.
+11. Repeat the process to create a new Azure Storage CSV dataset referencing the NYCTaxiData container in your MDWDataLake storage account. This dataset acts as the data source of NYC taxi records (CSV) you will copy to your Azure SQL Data Warehouse.
+
+12. Type “Azure Blob Storage” in the search box and select **Azure Blob Storage**. Click **Continue**.
![](./Media/Lab2-Image13.png)
-15. On the **Select Format** blade, select **Binary** and click **Continue**.
+13. On the **Select Format** blade, select **DelimitedText** and click **Continue**.
- ![](./Media/Lab2-Image14.png)
+ ![](./Media/Lab2-Image43.png)
-16. On the New Data Set tab, enter the following details:
-
- **General > Name**: MDWResources_NYCTaxiLookup
-
- **Connection > Linked Service**: MDWResources
-
- **Connection > File Path**: nyctaxilookup / taxi_zone_lookup.csv
-
- **Connection > Binary Copy**: Unchecked
-
- **Connection > Column names in the first row**: Checked.
-
- **Connection > Quote character**: “ (double-quote) *expand Advanced to see this field*
+14. On the **Set Properties** blade, enter the following details:
+
- **Name**: MDWDataLake_NYCTaxiData_CSV
+
- **Linked Service**: MDWDataLake
+
- **File Path**: **Container**: nyctaxidata, **Directory**: [blank], **File**: [blank]
+
- **First row as header**: Checked
+
- **Import schema**: None
+
+ ![](./Media/Lab2-Image44.png)
+
+ Click **Continue**.
+
+15. In the dataset properties, set the following values:
+
- **Connection > Escape character**: No escape character
+
- **Connection > Quote character**: No quote character
+
+ Leave remaining fields with default values.
+
+ ![](./Media/Lab2-Image46.png)
Alternatively you can copy and paste the Dataset JSON definition below:
```json
{
- "name": "MDWResources_NYCTaxiLookup",
- "properties": {
- "linkedServiceName": {
- "referenceName": "MDWResources",
- "type": "LinkedServiceReference"
- },
- "type": "AzureBlob",
- "typeProperties": {
- "format": {
- "type": "TextFormat",
+ "name": "MDWDataLake_NYCTaxiData_CSV",
+ "properties": {
+ "linkedServiceName": {
+ "referenceName": "MDWDataLake",
+ "type": "LinkedServiceReference"
+ },
+ "annotations": [],
+ "type": "DelimitedText",
+ "typeProperties": {
+ "location": {
+ "type": "AzureBlobStorageLocation",
+ "container": "nyctaxidata"
+ },
"columnDelimiter": ",",
- "rowDelimiter": "",
- "quoteChar": "\"",
- "nullValue": "\\N",
- "treatEmptyAsNull": true,
- "firstRowAsHeader": true
+ "escapeChar": "",
+ "firstRowAsHeader": true,
+ "quoteChar": ""
},
- "fileName": "taxi_zone_lookup.csv",
- "folderPath": "nyctaxilookup"
+ "schema": []
}
- },
- "type": "Microsoft.DataFactory/factories/datasets"
}
```
-17. Leave remaining fields with default values.
-18. Repeat the process to create another dataset, this time referencing the Staging.NYCTaxiData in your Azure SQL Data Warehouse database.
-19. Type “Azure SQL Data Warehouse” in the search box and select **Azure SQL Data Warehouse**. Click **Continue**.
+16. Repeat the process to create another Azure Blob CSV dataset, this time referencing the NYCTaxiLookup container in your MDWResources storage account.
+
+17. Type “Azure Blob Storage” in the search box and select **Azure Blob Storage**. Click **Continue**.
+
+ ![](./Media/Lab2-Image13.png)
+
+18. On the **Select Format** blade, select **DelimitedText** and click **Continue**.
+
+ ![](./Media/Lab2-Image43.png)
+
+19. On the **Set Properties** blade, enter the following details:
+
- **Name**: MDWResources_NYCTaxiLookup_CSV
+
- **Linked Service**: MDWResources
+
- **File Path**: **Container**: nyctaxilookup, **Directory**: [blank], **File**: [blank]
+
- **First row as header**: Checked
+
- **Import schema**: None
+
+ ![](./Media/Lab2-Image47.png)
+
+20. Leave remaining fields with default values.
+
+ Alternatively you can copy and paste the Dataset JSON definition below:
+
+ ```json
+ {
+ "name": "MDWResources_NYCTaxiLookup_CSV",
+ "properties": {
+ "linkedServiceName": {
+ "referenceName": "MDWResources",
+ "type": "LinkedServiceReference"
+ },
+ "annotations": [],
+ "type": "DelimitedText",
+ "typeProperties": {
+ "location": {
+ "type": "AzureBlobStorageLocation",
+ "container": "nyctaxilookup"
+ },
+ "columnDelimiter": ",",
+ "escapeChar": "\\",
+ "firstRowAsHeader": true,
+ "quoteChar": "\""
+ },
+ "schema": []
+ }
+ }
+ ```
+
+
+21. Repeat the process to create another dataset, this time referencing the Staging.NYCTaxiData in your Azure SQL Data Warehouse database.
+22. Type “Azure SQL Data Warehouse” in the search box and select **Azure SQL Data Warehouse**. Click **Continue**.
![](./Media/Lab2-Image17.png)
-20. On the Set Properties blade, enter the following details:
+23. On the Set Properties blade, enter the following details:
- **Name**: MDWASQLDW_StagingNYCTaxiData
- **Linked Service**: MDWSQLVirtualServer_MDWASQLDW
- **Table**: [Staging].[NYCTaxiData]
@@ -384,17 +443,17 @@ Dataset | Description
}
```
-21. Leave remaining fields with default values.
+24. Leave remaining fields with default values.
![](./Media/Lab2-Image18.png)
-22. Repeat the process to create another dataset, this time referencing the Staging.NYCLocationLookup in your Azure SQL Data Warehouse database.
+25. Repeat the process to create another dataset, this time referencing the Staging.NYCLocationLookup in your Azure SQL Data Warehouse database.
-23. Type “Azure SQL Data Warehouse” in the search box and select **Azure SQL Data Warehouse**. Click **Finish**.
+26. Type “Azure SQL Data Warehouse” in the search box and select **Azure SQL Data Warehouse**. Click **Finish**.
![](./Media/Lab2-Image17.png)
-24. On the Set Properties blade, enter the following details:
+27. On the Set Properties blade, enter the following details:
-**Name**: MDWASQLDW_StagingNYCLocationLookup
-**Linked Service**: MDWSQLVirtualServer_MDWASQLDW
-**Table**: [Staging].[NYCTaxiLocationLookup]
@@ -418,11 +477,11 @@ Dataset | Description
}
```
-25. Leave remaining fields with default values.
+28. Leave remaining fields with default values.
![](./Media/Lab2-Image19.png)
-26. Publish your dataset changes by clicking the **Publish all** button.
+29. Publish your dataset changes by clicking the **Publish all** button.
![](./Media/Lab2-Image20.png)
@@ -447,6 +506,7 @@ In this section you create a data factory pipeline to copy data in the following
{
"name": "CopyTaxiDataFiles",
"type": "Copy",
+ "dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
@@ -454,14 +514,21 @@ In this section you create a data factory pipeline to copy data in the following
"secureOutput": false,
"secureInput": false
},
+ "userProperties": [],
"typeProperties": {
"source": {
- "type": "BlobSource",
- "recursive": true
+ "type": "BinarySource",
+ "storeSettings": {
+ "type": "AzureBlobStorageReadSettings",
+ "recursive": true
+ }
},
"sink": {
- "type": "BlobSink",
- "copyBehavior": "PreserveHierarchy"
+ "type": "BinarySink",
+ "storeSettings": {
+ "type": "AzureBlobStorageWriteSettings",
+ "copyBehavior": "PreserveHierarchy"
+ }
},
"enableStaging": false
},
@@ -473,7 +540,7 @@ In this section you create a data factory pipeline to copy data in the following
],
"outputs": [
{
- "referenceName": "MDWDataLake_NYCTaxiData",
+ "referenceName": "MDWDataLake_NYCTaxiData_Binary",
"type": "DatasetReference"
}
]
@@ -496,15 +563,22 @@ In this section you create a data factory pipeline to copy data in the following
"secureOutput": false,
"secureInput": false
},
+ "userProperties": [],
"typeProperties": {
"source": {
- "type": "BlobSource",
- "recursive": true
+ "type": "DelimitedTextSource",
+ "storeSettings": {
+ "type": "AzureBlobStorageReadSettings",
+ "recursive": true,
+ "wildcardFileName": "*.*"
+ },
+ "formatSettings": {
+ "type": "DelimitedTextReadSettings"
+ }
},
"sink": {
"type": "SqlDWSink",
"allowPolyBase": true,
- "writeBatchSize": 10000,
"preCopyScript": "truncate table Staging.NYCTaxiData",
"polyBaseSettings": {
"rejectValue": 0,
@@ -516,7 +590,7 @@ In this section you create a data factory pipeline to copy data in the following
},
"inputs": [
{
- "referenceName": "MDWDataLake_NYCTaxiData",
+ "referenceName": "MDWDataLake_NYCTaxiData_CSV",
"type": "DatasetReference"
}
],
@@ -530,6 +604,7 @@ In this section you create a data factory pipeline to copy data in the following
{
"name": "CopyTaxiLookupDataToDW",
"type": "Copy",
+ "dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
@@ -537,16 +612,23 @@ In this section you create a data factory pipeline to copy data in the following
"secureOutput": false,
"secureInput": false
},
+ "userProperties": [],
"typeProperties": {
"source": {
- "type": "BlobSource",
- "recursive": true
+ "type": "DelimitedTextSource",
+ "storeSettings": {
+ "type": "AzureBlobStorageReadSettings",
+ "recursive": true,
+ "wildcardFileName": "*.*"
+ },
+ "formatSettings": {
+ "type": "DelimitedTextReadSettings"
+ }
},
"sink": {
"type": "SqlDWSink",
"allowPolyBase": true,
- "writeBatchSize": 10000,
- "preCopyScript": "truncate table Staging.NYCTaxiLocationLookup",
+ "preCopyScript": "truncate table [Staging].[NYCTaxiLocationLookup]",
"polyBaseSettings": {
"rejectValue": 0,
"rejectType": "value",
@@ -564,7 +646,7 @@ In this section you create a data factory pipeline to copy data in the following
},
"inputs": [
{
- "referenceName": "MDWResources_NYCTaxiLookup",
+ "referenceName": "MDWResources_NYCTaxiLookup_CSV",
"type": "DatasetReference"
}
],
@@ -599,6 +681,7 @@ In this section you create a data factory pipeline to copy data in the following
"secureOutput": false,
"secureInput": false
},
+ "userProperties": [],
"typeProperties": {
"storedProcedureName": "[Staging].[spNYCLoadTaxiDataSummary]"
},
@@ -607,12 +690,13 @@ In this section you create a data factory pipeline to copy data in the following
"type": "LinkedServiceReference"
}
}
- ]
- },
- "type": "Microsoft.DataFactory/factories/pipelines"
+ ],
+ "annotations": []
+ }
}
```
+
**IMPORTANT**|
-------------|
@@ -629,7 +713,7 @@ In this section you create a data factory pipeline to copy data in the following
5. Select the Copy Data activity and enter the following details:
- **General > Name**: CopyTaxiDataFiles
- **Source > Source dataset**: MDWResources_NYCTaxiData
-
- **Sink > Sink dataset**: MDWDataLake_NYCTaxiData
+
- **Sink > Sink dataset**: MDWDataLake_NYCTaxiData_Binary
- **Sink > Copy Behavior**: Preserve Hierarchy
6. Leave remaining fields with default values.
@@ -640,7 +724,7 @@ In this section you create a data factory pipeline to copy data in the following
8. From the Activities panel, type “Copy Data” in the search box. Drag the Copy Data activity on to the design surface.
9. Select the Copy Data activity and enter the following details:
- **General > Name**: CopyTaxiDataToDW
-
- **Source > Source dataset**: MDWDataLake_NYCTaxiData
+
- **Source > Source dataset**: MDWDataLake_NYCTaxiData_CSV
- **Sink > Sink dataset**: MDWASQLDW_StagingNYCTaxiData
- **Sink > Pre Copy Script**:
```sql
@@ -659,7 +743,7 @@ In this section you create a data factory pipeline to copy data in the following
13. From the Activities panel, type “Copy Data” in the search box. Drag the Copy Data activity on to the design surface.
14. Select the Copy Data activity and enter the following details:
- **General > Name**: CopyTaxiLookupDataToDW
-
- **Source > Source dataset**: MDWResources_NYCTaxiLookup
+
- **Source > Source dataset**: MDWResources_NYCTaxiLookup_CSV
- **Sink > Sink dataset**: MDWASQLDW_StagingNYCLocationLookup
- **Sink > Pre Copy Script**:
```sql
diff --git a/Lab/Lab2/Media/Lab2-Image15.png b/Lab/Lab2/Media/Lab2-Image15.png
index 54dc807..470ce10 100644
Binary files a/Lab/Lab2/Media/Lab2-Image15.png and b/Lab/Lab2/Media/Lab2-Image15.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image16.png b/Lab/Lab2/Media/Lab2-Image16.png
index 7edb96d..0a6943e 100644
Binary files a/Lab/Lab2/Media/Lab2-Image16.png and b/Lab/Lab2/Media/Lab2-Image16.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image22.png b/Lab/Lab2/Media/Lab2-Image22.png
index 4b4c318..c9112ae 100644
Binary files a/Lab/Lab2/Media/Lab2-Image22.png and b/Lab/Lab2/Media/Lab2-Image22.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image23.png b/Lab/Lab2/Media/Lab2-Image23.png
index a0f1951..cb6e350 100644
Binary files a/Lab/Lab2/Media/Lab2-Image23.png and b/Lab/Lab2/Media/Lab2-Image23.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image24.png b/Lab/Lab2/Media/Lab2-Image24.png
index 0fb697f..ef972ae 100644
Binary files a/Lab/Lab2/Media/Lab2-Image24.png and b/Lab/Lab2/Media/Lab2-Image24.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image27.png b/Lab/Lab2/Media/Lab2-Image27.png
index 4d9636a..441688a 100644
Binary files a/Lab/Lab2/Media/Lab2-Image27.png and b/Lab/Lab2/Media/Lab2-Image27.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image41.png b/Lab/Lab2/Media/Lab2-Image41.png
new file mode 100644
index 0000000..5cb1f28
Binary files /dev/null and b/Lab/Lab2/Media/Lab2-Image41.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image42.png b/Lab/Lab2/Media/Lab2-Image42.png
new file mode 100644
index 0000000..e35efc5
Binary files /dev/null and b/Lab/Lab2/Media/Lab2-Image42.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image43.png b/Lab/Lab2/Media/Lab2-Image43.png
new file mode 100644
index 0000000..d33e71f
Binary files /dev/null and b/Lab/Lab2/Media/Lab2-Image43.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image44.png b/Lab/Lab2/Media/Lab2-Image44.png
new file mode 100644
index 0000000..25f769a
Binary files /dev/null and b/Lab/Lab2/Media/Lab2-Image44.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image45.png b/Lab/Lab2/Media/Lab2-Image45.png
new file mode 100644
index 0000000..8e83c87
Binary files /dev/null and b/Lab/Lab2/Media/Lab2-Image45.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image46.png b/Lab/Lab2/Media/Lab2-Image46.png
new file mode 100644
index 0000000..8e83c87
Binary files /dev/null and b/Lab/Lab2/Media/Lab2-Image46.png differ
diff --git a/Lab/Lab2/Media/Lab2-Image47.png b/Lab/Lab2/Media/Lab2-Image47.png
new file mode 100644
index 0000000..3a8ae55
Binary files /dev/null and b/Lab/Lab2/Media/Lab2-Image47.png differ
diff --git a/Lab/Lab4/Lab4.md b/Lab/Lab4/Lab4.md
index af64ca1..535cb79 100644
--- a/Lab/Lab4/Lab4.md
+++ b/Lab/Lab4/Lab4.md
@@ -262,9 +262,9 @@ In this section you will create 4 Azure Data Factory data sets that will be used
Dataset | Description
--------|---------------
-**MDWResources_NYCImages**| References MDWResources shared storage account container that contains source image files.
-**MDWDataLake_NYCImages**| References your MDWDataLake-*suffix* storage account and it acts as the destination for the image files copied from MDWResources_NYCImages.
-**MDWDataLake_NYCImageMetadata**|References your MDWDataLake-*suffix* storage account and it acts as the destination for the image metadata files generated by Databricks.
+**MDWResources_NYCImages_Binary**| References MDWResources shared storage account container that contains source image files.
+**MDWDataLake_NYCImages_Binary**| References your MDWDataLake-*suffix* storage account and it acts as the destination for the image files copied from MDWResources_NYCImages_Binary.
+**MDWDataLake_NYCImageMetadata_JSON**|References your MDWDataLake-*suffix* storage account and it acts as the source of image metadata files (JSON) generated by Databricks and Computer Vision.
**MDWCosmosDB_ImageMetadata**| References MDWCosmosDB-*suffix* database that will save the metadata info for all images.
![](./Media/Lab4-Image23.png)
@@ -285,36 +285,41 @@ Dataset | Description
![](./Media/Lab4-Image35.png)
-4. On the **New Data Set** tab, enter the following details:
-
- **General > Name**: MDWResources_NYCImages
-
- **Connection > Linked Service**: MDWResources
-
- **Connection > File Path**: nycimages
-
- **Connection > Binary Copy**: Checked
+4. On the **Set Properties** blade, enter the following details:
+
- **Name**: MDWResources_NYCImages_Binary
+
- **Linked Service**: MDWResources
+
- **File Path**: **Container**: nycimages, **Directory**: [blank], **File**: [blank]
+
+ ![](./Media/Lab4-Image71.png)
+
+ Click **Continue**.
+
+5. Leave remaining fields with default values.
+
+ ![](./Media/Lab4-Image36.png)
Alternatively you can copy and paste the dataset JSON definition below:
```json
{
- "name": "MDWResources_NYCImages",
+ "name": "MDWResources_NYCImages_Binary",
"properties": {
"linkedServiceName": {
"referenceName": "MDWResources",
"type": "LinkedServiceReference"
},
- "type": "AzureBlob",
+ "annotations": [],
+ "type": "Binary",
"typeProperties": {
- "fileName": "",
- "folderPath": "nycimages"
+ "location": {
+ "type": "AzureBlobStorageLocation",
+ "container": "nycimages"
+ }
}
- },
- "type": "Microsoft.DataFactory/factories/datasets"
+ }
}
```
-5. Leave remaining fields with default values.
-
- ![](./Media/Lab4-Image36.png)
-
6. Repeat the process to create another dataset, this time referencing the **NYCImages** container in your **MDWDataLake-*suffix*** storage account.
7. Type “Azure Blob Storage” in the search box and click **Azure Blob Storage**.
@@ -325,32 +330,39 @@ Dataset | Description
![](./Media/Lab4-Image35.png)
-9. On the New Data Set tab, enter the following details:
-
- **General > Name**: MDWDataLake_NYCImages
-
- **Connection > Linked Service**: MDWDataLake
-
- **Connection > File Path**: nycimages
+9. On the **Set Properties** blade, enter the following details:
+
- **Name**: MDWDataLake_NYCImages_Binary
+
- **Linked Service**: MDWDataLake
+
- **File Path**: **Container**: nycimages, **Directory**: [blank], **File**: [blank]
+
+ ![](./Media/Lab4-Image72.png)
+
+ Click **Continue**.
+10. Leave remaining fields with default values.
+
+ ![](./Media/Lab4-Image37.png)
Alternatively you can copy and paste the dataset JSON definition below:
```json
{
- "name": "MDWDataLake_NYCImages",
+ "name": "MDWDataLake_NYCImages_Binary",
"properties": {
"linkedServiceName": {
"referenceName": "MDWDataLake",
"type": "LinkedServiceReference"
},
- "type": "AzureBlob",
+ "annotations": [],
+ "type": "Binary",
"typeProperties": {
- "folderPath": "nycimages"
+ "location": {
+ "type": "AzureBlobStorageLocation",
+ "container": "nycimages"
+ }
}
- },
- "type": "Microsoft.DataFactory/factories/datasets"
+ }
}
```
-10. Leave remaining fields with default values.
-
- ![](./Media/Lab4-Image37.png)
11. Repeat the process to create another dataset, this time referencing the **NYCImageMetadata** container in your **MDWDataLake-*suffix*** storage account.
@@ -363,7 +375,7 @@ Dataset | Description
![](./Media/Lab4-Image38.png)
14. On the **New Data Set** tab, enter the following details:
-
- **General > Name**: MDWDataLake_NYCImageMetadata
+
- **General > Name**: MDWDataLake_NYCImageMetadata_JSON
- **Connection > Linked Service**: MDWDataLake
- **Connection > File Path**: nycimagemetadata
- **File format**: JSON format
@@ -372,17 +384,17 @@ Dataset | Description
```json
{
- "name": "MDWDataLake_NYCImageMetadata",
+ "name": "MDWDataLake_NYCImageMetadata_JSON",
"properties": {
"linkedServiceName": {
"referenceName": "MDWDataLake",
"type": "LinkedServiceReference"
},
+ "annotations": [],
"type": "AzureBlob",
"typeProperties": {
"format": {
- "type": "JsonFormat",
- "filePattern": "setOfObjects"
+ "type": "JsonFormat"
},
"fileName": "",
"folderPath": "nycimagemetadata"
@@ -450,7 +462,7 @@ In this section you will create an Azure Data Factory pipeline to copy New York
- **General > Name**: Copy NYC Images
- **Variables > [click + New] >**
- **Name**: ImageMetadataContainerUrl
-
- **Default Value**: https://mdwdatalake*suffix*.blob.core.windows.net/nycimages/
+
- **Default Value**: https://[your data lake account name].blob.core.windows.net/nycimages/
3. Leave remaining fields with default values.
@@ -460,8 +472,8 @@ In this section you will create an Azure Data Factory pipeline to copy New York
5. Select the **Copy Data** activity and enter the following details:
- **General > Name**: CopyImageFiles
-
- **Source > Source dataset**: MDWResources_NYCImages
-
- **Sink > Sink dataset**: MDWDataLake_NYCImages
+
- **Source > Source dataset**: MDWResources_NYCImages_Binary
+
- **Sink > Sink dataset**: MDWDataLake_NYCImages_Binary
- **Sink > Copy Behavior**: Preserve Hierarchy
6. Leave remaining fields with default values.
@@ -473,7 +485,7 @@ In this section you will create an Azure Data Factory pipeline to copy New York
8. Select the **Get Metadata** activity and enter the following details:
- **General > Name**: GetImageFileList
-
- **Dataset**: MDWDataLake_NYCImages
+
- **Dataset**: MDWDataLake_NYCImages_Binary
- **Source > Field list**: Child Items
9. Leave remaining fields with default values.
@@ -529,7 +541,7 @@ In this section you will create an Azure Data Factory pipeline to copy New York
21. Select the Copy Data activity and enter the following details:
- **General > Name**: ServeImageMetadata
-
- **Source > Source dataset**: MDWDataLake_NYCImageMetadata
+
- **Source > Source dataset**: MDWDataLake_NYCImageMetadata_JSON
- **Sink > Sink dataset**: MDWCosmosDB_NYCImageMetadata
22. Leave remaining fields with default values.
diff --git a/Lab/Lab4/Media/Lab4-Image36.png b/Lab/Lab4/Media/Lab4-Image36.png
index 79949f3..712d227 100644
Binary files a/Lab/Lab4/Media/Lab4-Image36.png and b/Lab/Lab4/Media/Lab4-Image36.png differ
diff --git a/Lab/Lab4/Media/Lab4-Image39.png b/Lab/Lab4/Media/Lab4-Image39.png
index 2b0e1e5..07bdfd4 100644
Binary files a/Lab/Lab4/Media/Lab4-Image39.png and b/Lab/Lab4/Media/Lab4-Image39.png differ
diff --git a/Lab/Lab4/Media/Lab4-Image46.png b/Lab/Lab4/Media/Lab4-Image46.png
index 8c9e25d..af7e298 100644
Binary files a/Lab/Lab4/Media/Lab4-Image46.png and b/Lab/Lab4/Media/Lab4-Image46.png differ
diff --git a/Lab/Lab4/Media/Lab4-Image47.png b/Lab/Lab4/Media/Lab4-Image47.png
index fdee95d..96abc50 100644
Binary files a/Lab/Lab4/Media/Lab4-Image47.png and b/Lab/Lab4/Media/Lab4-Image47.png differ
diff --git a/Lab/Lab4/Media/Lab4-Image48.png b/Lab/Lab4/Media/Lab4-Image48.png
index 122a205..c2e81ca 100644
Binary files a/Lab/Lab4/Media/Lab4-Image48.png and b/Lab/Lab4/Media/Lab4-Image48.png differ
diff --git a/Lab/Lab4/Media/Lab4-Image71.png b/Lab/Lab4/Media/Lab4-Image71.png
new file mode 100644
index 0000000..880e1b5
Binary files /dev/null and b/Lab/Lab4/Media/Lab4-Image71.png differ
diff --git a/Lab/Lab4/Media/Lab4-Image72.png b/Lab/Lab4/Media/Lab4-Image72.png
new file mode 100644
index 0000000..0796577
Binary files /dev/null and b/Lab/Lab4/Media/Lab4-Image72.png differ
diff --git a/Slides/Azure Data Platform End2End - 1Day.pptx b/Slides/Azure Data Platform End2End - 1Day.pptx
index 02c0a6d..6d384d5 100644
Binary files a/Slides/Azure Data Platform End2End - 1Day.pptx and b/Slides/Azure Data Platform End2End - 1Day.pptx differ
diff --git a/Slides/Azure Data Platform End2End - 2Day.pptx b/Slides/Azure Data Platform End2End - 2Day.pptx
index f04afed..233b212 100644
Binary files a/Slides/Azure Data Platform End2End - 2Day.pptx and b/Slides/Azure Data Platform End2End - 2Day.pptx differ
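
Note on the dataset changes in this diff: the updated Blob Storage datasets all share one shape, with a format-specific `type` (`Binary` or `DelimitedText`) and a `location` object under `typeProperties`, replacing the older `AzureBlob` type with `folderPath`/`fileName`. The Python sketch below rebuilds two of the definitions from this diff to make that shared shape explicit. The helper name `blob_dataset` and its parameters are hypothetical and exist only for illustration; they are not part of Azure Data Factory or of the labs.

```python
import json


def blob_dataset(name, linked_service, container, fmt="Binary",
                 schema=None, **format_props):
    """Build a dataset definition dict in the shape used by the updated labs.

    Hypothetical helper: it only assembles a dict; it does not call Azure.
    """
    type_properties = {
        "location": {
            "type": "AzureBlobStorageLocation",
            "container": container,
        }
    }
    # Format-specific settings (delimiter, quote character, ...) sit next to
    # "location" inside "typeProperties".
    type_properties.update(format_props)

    properties = {
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "annotations": [],
        "type": fmt,
        "typeProperties": type_properties,
    }
    # The DelimitedText definitions in the labs also carry an empty schema.
    if schema is not None:
        properties["schema"] = schema

    return {"name": name, "properties": properties}


# Binary dataset: destination for the copied files.
binary_ds = blob_dataset(
    "MDWDataLake_NYCTaxiData_Binary", "MDWDataLake", "nyctaxidata"
)

# DelimitedText dataset over the same container: source for the load to the DW.
csv_ds = blob_dataset(
    "MDWDataLake_NYCTaxiData_CSV", "MDWDataLake", "nyctaxidata",
    fmt="DelimitedText",
    schema=[],
    columnDelimiter=",",
    escapeChar="",
    firstRowAsHeader=True,
    quoteChar="",
)

print(json.dumps(csv_ds, indent=4))
```

Both datasets point at the same container; only `type` and the format settings differ, which is why the lab now defines separate `_Binary` and `_CSV` datasets for the file copy and for the load into the data warehouse.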