Commit

Data Factory UI Changes
Lab 2 Updates to Binary datasets
Lab 4 Updates to Binary datasets
fabragaMS committed Aug 24, 2019
1 parent 18214bf commit 2dc0ad5
Showing 24 changed files with 241 additions and 145 deletions.
296 changes: 190 additions & 106 deletions Lab/Lab2/Lab2.md

Large diffs are not rendered by default.

Binary file modified Lab/Lab2/Media/Lab2-Image15.png
Binary file modified Lab/Lab2/Media/Lab2-Image16.png
Binary file modified Lab/Lab2/Media/Lab2-Image22.png
Binary file modified Lab/Lab2/Media/Lab2-Image23.png
Binary file modified Lab/Lab2/Media/Lab2-Image24.png
Binary file modified Lab/Lab2/Media/Lab2-Image27.png
Binary file added Lab/Lab2/Media/Lab2-Image41.png
Binary file added Lab/Lab2/Media/Lab2-Image42.png
Binary file added Lab/Lab2/Media/Lab2-Image43.png
Binary file added Lab/Lab2/Media/Lab2-Image44.png
Binary file added Lab/Lab2/Media/Lab2-Image45.png
Binary file added Lab/Lab2/Media/Lab2-Image46.png
Binary file added Lab/Lab2/Media/Lab2-Image47.png
90 changes: 51 additions & 39 deletions Lab/Lab4/Lab4.md
@@ -262,9 +262,9 @@ In this section you will create 4 Azure Data Factory data sets that will be used

Dataset | Description
--------|---------------
-**MDWResources_NYCImages**| References MDWResources shared storage account container that contains source image files.
-**MDWDataLake_NYCImages**| References your MDWDataLake-*suffix* storage account and it acts as the destination for the image files copied from MDWResources_NYCImages.
-**MDWDataLake_NYCImageMetadata**|References your MDWDataLake-*suffix* storage account and it acts as the destination for the image metadata files generated by Databricks.
+**MDWResources_NYCImages_Binary**| References MDWResources shared storage account container that contains source image files.
+**MDWDataLake_NYCImages_Binary**| References your MDWDataLake-*suffix* storage account and it acts as the destination for the image files copied from MDWResources_NYCImages.
+**MDWDataLake_NYCImageMetadata_JSON**|References your MDWDataLake-*suffix* storage account and it acts as the source of image metadata files (JSON) generated by Databricks and Computer Vision.
**MDWCosmosDB_ImageMetadata**| References MDWCosmosDB-*suffix* database that will save the metadata info for all images.

![](./Media/Lab4-Image23.png)
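
Because three of the four datasets in the table above are renamed, every pipeline activity that references them has to pick up the new names too. A minimal Python sketch of that bookkeeping follows. The rename map is taken from the table; the activity layout (`inputs`/`outputs` holding `DatasetReference` objects), the sample `copy_activity` dict, and the helper name `rename_dataset_references` are assumptions for illustration and are not part of this commit.

```python
# Old dataset name -> new dataset name, as listed in the table above.
RENAMED_DATASETS = {
    "MDWResources_NYCImages": "MDWResources_NYCImages_Binary",
    "MDWDataLake_NYCImages": "MDWDataLake_NYCImages_Binary",
    "MDWDataLake_NYCImageMetadata": "MDWDataLake_NYCImageMetadata_JSON",
}


def rename_dataset_references(node):
    """Return a copy of `node` with renamed dataset references applied.

    Hypothetical helper: walks nested dicts and lists and rewrites only
    objects whose "type" is "DatasetReference".
    """
    if isinstance(node, dict):
        updated = {key: rename_dataset_references(value) for key, value in node.items()}
        name = updated.get("referenceName")
        if updated.get("type") == "DatasetReference" and isinstance(name, str):
            updated["referenceName"] = RENAMED_DATASETS.get(name, name)
        return updated
    if isinstance(node, list):
        return [rename_dataset_references(item) for item in node]
    return node


# Assumed shape of a Copy activity; only the dataset names come from the lab.
copy_activity = {
    "name": "CopyImageFiles",
    "type": "Copy",
    "inputs": [{"referenceName": "MDWResources_NYCImages", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "MDWDataLake_NYCImages", "type": "DatasetReference"}],
}

updated_activity = rename_dataset_references(copy_activity)
print(updated_activity["inputs"][0]["referenceName"])
print(updated_activity["outputs"][0]["referenceName"])
```

The function returns a new structure rather than editing in place, so the original definition stays available for comparison. Linked service references are left alone because their `type` is not `DatasetReference`.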
@@ -285,36 +285,41 @@ Dataset | Description

![](./Media/Lab4-Image35.png)

-4. On the **New Data Set** tab, enter the following details:
-<br>- **General > Name**: MDWResources_NYCImages
-<br>- **Connection > Linked Service**: MDWResources
-<br>- **Connection > File Path**: nycimages
-<br>- **Connection > Binary Copy**: Checked
+4. On the **Set Properties** blade, enter the following details:
+<br>- **Name**: MDWResources_NYCImages_Binary
+<br>- **Linked Service**: MDWResources
+<br>- **File Path**: **Container**: nycimages, **Directory**: [blank], **File**: [blank]

+![](./Media/Lab4-Image71.png)
+
+Click **Continue**.
+
+5. Leave remaining fields with default values.
+
+![](./Media/Lab4-Image36.png)
+
Alternatively you can copy and paste the dataset JSON definition below:

```json
{
-"name": "MDWResources_NYCImages",
+"name": "MDWResources_NYCImages_Binary",
"properties": {
"linkedServiceName": {
"referenceName": "MDWResources",
"type": "LinkedServiceReference"
},
-"type": "AzureBlob",
+"annotations": [],
+"type": "Binary",
"typeProperties": {
-"fileName": "",
-"folderPath": "nycimages"
+"location": {
+"type": "AzureBlobStorageLocation",
+"container": "nycimages"
+}
}
-},
-"type": "Microsoft.DataFactory/factories/datasets"
+}
}
```

-5. Leave remaining fields with default values.
-
-![](./Media/Lab4-Image36.png)
-
6. Repeat the process to create another dataset, this time referencing the **NYCImages** container in your **MDWDataLake-*suffix*** storage account.

7. Type “Azure Blob Storage” in the search box and click **Azure Blob Storage**.
@@ -325,32 +330,39 @@ Dataset | Description

![](./Media/Lab4-Image35.png)

-9. On the New Data Set tab, enter the following details:
-<br>- **General > Name**: MDWDataLake_NYCImages
-<br>- **Connection > Linked Service**: MDWDataLake
-<br>- **Connection > File Path**: nycimages
+9. On the **Set Properties** blade, enter the following details:
+<br>- **Name**: MDWDataLake_NYCImages_Binary
+<br>- **Linked Service**: MDWDataLake
+<br>- **File Path**: **Container**: nycimages, **Directory**: [blank], **File**: [blank]

+![](./Media/Lab4-Image72.png)
+
+Click **Continue**.
+
+10. Leave remaining fields with default values.
+
+![](./Media/Lab4-Image37.png)
Alternatively you can copy and paste the dataset JSON definition below:

```json
{
-"name": "MDWDataLake_NYCImages",
+"name": "MDWDataLake_NYCImages_Binary",
"properties": {
"linkedServiceName": {
"referenceName": "MDWDataLake",
"type": "LinkedServiceReference"
},
-"type": "AzureBlob",
+"annotations": [],
+"type": "Binary",
"typeProperties": {
-"folderPath": "nycimages"
+"location": {
+"type": "AzureBlobStorageLocation",
+"container": "nycimages"
+}
}
-},
-"type": "Microsoft.DataFactory/factories/datasets"
+}
}
```
-10. Leave remaining fields with default values.
-
-![](./Media/Lab4-Image37.png)

11. Repeat the process to create another dataset, this time referencing the **NYCImageMetadata** container in your **MDWDataLake-*suffix*** storage account.
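
The two Binary dataset changes above follow the same pattern: the legacy `AzureBlob` type with a `folderPath` becomes a `Binary` type with an `AzureBlobStorageLocation`. As a rough illustration only, the Python sketch below performs that conversion on a plain dict. The helper name `to_binary_dataset` is hypothetical, and the sketch assumes `folderPath` holds just a container name, as it does in this lab.

```python
import json


def to_binary_dataset(legacy: dict) -> dict:
    """Convert a legacy AzureBlob dataset definition to the Binary shape.

    Hypothetical helper. Assumes folderPath is a bare container name and
    appends the "_Binary" suffix used by this lab's naming convention.
    """
    props = legacy["properties"]
    container = props["typeProperties"]["folderPath"].strip("/")
    if "/" in container:
        raise ValueError("sub-directories in folderPath are not handled by this sketch")
    return {
        "name": legacy["name"] + "_Binary",
        "properties": {
            "linkedServiceName": props["linkedServiceName"],
            "annotations": [],
            "type": "Binary",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                }
            },
        },
    }


# The pre-commit definition of the source dataset, as shown in the diff.
legacy_dataset = {
    "name": "MDWResources_NYCImages",
    "properties": {
        "linkedServiceName": {
            "referenceName": "MDWResources",
            "type": "LinkedServiceReference",
        },
        "type": "AzureBlob",
        "typeProperties": {"fileName": "", "folderPath": "nycimages"},
    },
    "type": "Microsoft.DataFactory/factories/datasets",
}

binary_dataset = to_binary_dataset(legacy_dataset)
print(json.dumps(binary_dataset, indent=4))
```

The printed JSON can be compared field by field with the new definition in the lab text; the top-level `"type": "Microsoft.DataFactory/factories/datasets"` entry is dropped, matching the post-commit version.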

@@ -363,7 +375,7 @@ Dataset | Description
![](./Media/Lab4-Image38.png)

14. On the **New Data Set** tab, enter the following details:
-<br>- **General > Name**: MDWDataLake_NYCImageMetadata
+<br>- **General > Name**: MDWDataLake_NYCImageMetadata_JSON
<br>- **Connection > Linked Service**: MDWDataLake
<br>- **Connection > File Path**: nycimagemetadata
<br>- **File format**: JSON format
@@ -372,17 +384,17 @@ Dataset | Description

```json
{
-"name": "MDWDataLake_NYCImageMetadata",
+"name": "MDWDataLake_NYCImageMetadata_JSON",
"properties": {
"linkedServiceName": {
"referenceName": "MDWDataLake",
"type": "LinkedServiceReference"
},
+"annotations": [],
"type": "AzureBlob",
"typeProperties": {
"format": {
-"type": "JsonFormat",
-"filePattern": "setOfObjects"
+"type": "JsonFormat"
},
"fileName": "",
"folderPath": "nycimagemetadata"
@@ -450,7 +462,7 @@ In this section you will create an Azure Data Factory pipeline to copy New York
<br>- **General > Name**: Copy NYC Images
<br>- **Variables > [click + New] >**
<br> - **Name**: ImageMetadataContainerUrl
-<br> - **Default Value**: https://mdwdatalake*suffix*.blob.core.windows.net/nycimages/
+<br> - **Default Value**: https://[your data lake account name].blob.core.windows.net/nycimages/

3. Leave remaining fields with default values.

@@ -460,8 +472,8 @@ In this section you will create an Azure Data Factory pipeline to copy New York

5. Select the **Copy Data** activity and enter the following details:
<br>- **General > Name**: CopyImageFiles
-<br>- **Source > Source dataset**: MDWResources_NYCImages
-<br>- **Sink > Sink dataset**: MDWDataLake_NYCImages
+<br>- **Source > Source dataset**: MDWResources_NYCImages_Binary
+<br>- **Sink > Sink dataset**: MDWDataLake_NYCImages_Binary
<br>- **Sink > Copy Behavior**: Preserve Hierarchy

6. Leave remaining fields with default values.
@@ -473,7 +485,7 @@ In this section you will create an Azure Data Factory pipeline to copy New York

8. Select the **Get Metadata** activity and enter the following details:
<br>- **General > Name**: GetImageFileList
-<br>- **Dataset**: MDWDataLake_NYCImages
+<br>- **Dataset**: MDWDataLake_NYCImages_Binary
<br>- **Source > Field list**: Child Items

9. Leave remaining fields with default values.
@@ -529,7 +541,7 @@ In this section you will create an Azure Data Factory pipeline to copy New York

21. Select the Copy Data activity and enter the following details:
<br>- **General > Name**: ServeImageMetadata
-<br>- **Source > Source dataset**: MDWDataLake_NYCImageMetadata
+<br>- **Source > Source dataset**: MDWDataLake_NYCImageMetadata_JSON
<br>- **Sink > Sink dataset**: MDWCosmosDB_NYCImageMetadata

22. Leave remaining fields with default values.
Binary file modified Lab/Lab4/Media/Lab4-Image36.png
Binary file modified Lab/Lab4/Media/Lab4-Image39.png
Binary file modified Lab/Lab4/Media/Lab4-Image46.png
Binary file modified Lab/Lab4/Media/Lab4-Image47.png
Binary file modified Lab/Lab4/Media/Lab4-Image48.png
Binary file added Lab/Lab4/Media/Lab4-Image71.png
Binary file added Lab/Lab4/Media/Lab4-Image72.png
Binary file modified Slides/Azure Data Platform End2End - 1Day.pptx
Binary file not shown.
Binary file modified Slides/Azure Data Platform End2End - 2Day.pptx
Binary file not shown.
