title | description | services | author | ms.service | ms.devlang | ms.topic | ms.date | ms.author | ms.custom | ms.component |
---|---|---|---|---|---|---|---|---|---|---|
Download large amounts of random data from Azure Storage \| Microsoft Docs | Learn how to use the Azure SDK to download large amounts of random data from an Azure Storage account | storage | roygara | storage | dotnet | tutorial | 02/20/2018 | rogarana | mvc | blobs |
This tutorial is part three of a series. It shows you how to download large amounts of data from Azure Storage.
In part three of the series, you learn how to:
[!div class="checklist"]
- Update the application
- Run the application
- Validate the number of connections
To complete this tutorial, you must have completed the previous Storage tutorial: Upload large amounts of random data in parallel to Azure storage.
To create a remote desktop session with the virtual machine, use the following command on your local machine. Replace <publicIpAddress> with the public IP address of your virtual machine. When prompted, enter the credentials you used when creating the virtual machine.
mstsc /v:<publicIpAddress>
In the previous tutorial, you only uploaded files to the storage account. Open D:\git\storage-dotnet-perf-scale-app\Program.cs in a text editor. Replace the Main method with the following sample. This example comments out the upload task and uncomments the download task and the task to delete the content in the storage account when complete.
public static void Main(string[] args)
{
    Console.WriteLine("Azure Blob storage performance and scalability sample");
    // Set threading and default connection limit to 100 to ensure multiple threads and connections can be opened.
    // This is in addition to parallelism with the storage client library that is defined in the functions below.
    ThreadPool.SetMinThreads(100, 4);
    ServicePointManager.DefaultConnectionLimit = 100; // (Or More)
    bool exception = false;
    try
    {
        // The call to UploadFilesAsync is commented out because the files were uploaded in the previous tutorial.
        // UploadFilesAsync().GetAwaiter().GetResult();
        // Download the files from the storage account. This call is commented out initially in the sample
        // to support the tutorial at https://docs.microsoft.com/azure/storage/blobs/storage-blob-scalable-app-download-files.
        DownloadFilesAsync().GetAwaiter().GetResult();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        exception = true;
    }
    finally
    {
        // Delete the containers and all the files they contain. This call is commented out initially in the sample
        // because the tutorial at https://docs.microsoft.com/azure/storage/blobs/storage-blob-scalable-app-download-files has you upload only in one tutorial and download in the other.
        if (!exception)
        {
            DeleteExistingContainersAsync().GetAwaiter().GetResult();
        }
        Console.WriteLine("Press any key to exit the application");
        Console.ReadKey();
    }
}
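The thread pool minimums that Main sets can be read back at run time to confirm that the runtime accepted them. The following is a minimal sketch and is not part of the sample application; the class name ThreadPoolSettingsSketch and the ApplyAndRead helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the tutorial's sample application.
public static class ThreadPoolSettingsSketch
{
    // Applies the requested thread pool minimums, then returns whether the
    // runtime accepted them together with the values it reports afterwards.
    public static (bool Applied, int Worker, int Io) ApplyAndRead(int workerThreads, int completionPortThreads)
    {
        // SetMinThreads returns false (and changes nothing) if a value is
        // negative or larger than the pool's configured maximum.
        bool applied = ThreadPool.SetMinThreads(workerThreads, completionPortThreads);
        ThreadPool.GetMinThreads(out int actualWorker, out int actualIo);
        return (applied, actualWorker, actualIo);
    }
}
```

Calling ThreadPoolSettingsSketch.ApplyAndRead(100, 4) mirrors the values used in Main. If Applied is false, the requested minimums were rejected and the pool keeps its previous settings.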
After the application has been updated, you need to build the application again. Open a Command Prompt and navigate to D:\git\storage-dotnet-perf-scale-app. Rebuild the application by running dotnet build, as seen in the following example:
dotnet build
Now that the application has been rebuilt, it is time to run the application with the updated code. If not already open, open a Command Prompt and navigate to D:\git\storage-dotnet-perf-scale-app.
Type dotnet run to run the application.
dotnet run
The application reads the containers located in the storage account specified in the storageconnectionstring. It iterates through the blobs in each container 10 at a time using the ListBlobsSegmentedAsync method and downloads them to the local machine using the DownloadToFileAsync method. The following table shows the BlobRequestOptions that are defined for each blob as it is downloaded.
Property | Value | Description |
---|---|---|
DisableContentMD5Validation | true | This property disables checking the MD5 hash of the downloaded content. Disabling MD5 validation produces a faster transfer, but does not confirm the validity or integrity of the files being transferred. |
StoreBlobContentMD5 | false | This property determines if an MD5 hash is calculated and stored. |
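Because MD5 validation is disabled, the integrity of a downloaded file is not checked during the transfer. If you need that check, one option is to hash the file yourself afterwards. The following is a minimal sketch under stated assumptions: Md5CheckSketch and its methods are hypothetical names, and the expected value is assumed to be a base64-encoded MD5 hash, which is the encoding used for a blob's Content-MD5 value.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, not part of the tutorial's sample application.
public static class Md5CheckSketch
{
    // Returns the base64-encoded MD5 hash of the given bytes.
    public static string ComputeBase64Md5(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(data));
        }
    }

    // Returns true when the hash of the downloaded bytes equals the
    // expected base64-encoded MD5 value (assumed to be supplied by the caller).
    public static bool Matches(byte[] downloaded, string expectedBase64Md5)
    {
        return string.Equals(ComputeBase64Md5(downloaded), expectedBase64Md5, StringComparison.Ordinal);
    }
}
```

This check adds CPU and disk read time for every file, which is the cost that the tutorial avoids by disabling validation.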
The DownloadFilesAsync task is shown in the following example:
private static async Task DownloadFilesAsync()
{
    CloudBlobClient blobClient = GetCloudBlobClient();
    // Define the BlobRequestOptions on the download, including disabling MD5 hash validation for this example. This improves the download speed.
    BlobRequestOptions options = new BlobRequestOptions
    {
        DisableContentMD5Validation = true,
        StoreBlobContentMD5 = false
    };
    // Retrieve the list of containers in the storage account. Create a directory and configure variables for use later.
    BlobContinuationToken continuationToken = null;
    List<CloudBlobContainer> containers = new List<CloudBlobContainer>();
    do
    {
        var listingResult = await blobClient.ListContainersSegmentedAsync(continuationToken);
        continuationToken = listingResult.ContinuationToken;
        containers.AddRange(listingResult.Results);
    }
    while (continuationToken != null);
    var directory = Directory.CreateDirectory("download");
    BlobResultSegment resultSegment = null;
    Stopwatch time = Stopwatch.StartNew();
    // Download the blobs
    try
    {
        List<Task> tasks = new List<Task>();
        int max_outstanding = 100;
        int completed_count = 0;
        // Create a new instance of the SemaphoreSlim class to limit the number of downloads that run concurrently.
        SemaphoreSlim sem = new SemaphoreSlim(max_outstanding, max_outstanding);
        // Iterate through the containers
        foreach (CloudBlobContainer container in containers)
        {
            do
            {
                // Return the blobs from the container lazily 10 at a time.
                resultSegment = await container.ListBlobsSegmentedAsync(null, true, BlobListingDetails.All, 10, continuationToken, null, null);
                continuationToken = resultSegment.ContinuationToken;
                foreach (var blobItem in resultSegment.Results)
                {
                    if (((CloudBlob)blobItem).Properties.BlobType == BlobType.BlockBlob)
                    {
                        // Get the blob and add a task to download the blob asynchronously from the storage account.
                        CloudBlockBlob blockBlob = container.GetBlockBlobReference(((CloudBlockBlob)blobItem).Name);
                        Console.WriteLine("Downloading {0} from container {1}", blockBlob.Name, container.Name);
                        // Wait for a free slot before starting the next download.
                        await sem.WaitAsync();
                        tasks.Add(blockBlob.DownloadToFileAsync(directory.FullName + "\\" + blockBlob.Name, FileMode.Create, null, options, null).ContinueWith((t) =>
                        {
                            sem.Release();
                            Interlocked.Increment(ref completed_count);
                        }));
                    }
                }
            }
            while (continuationToken != null);
        }
        // Creates an asynchronous task that completes when all the downloads complete.
        await Task.WhenAll(tasks);
    }
    catch (Exception e)
    {
        Console.WriteLine("\nError encountered during transfer: {0}", e.Message);
    }
    time.Stop();
    Console.WriteLine("Download has been completed in {0} seconds. Press any key to continue", time.Elapsed.TotalSeconds.ToString());
    Console.ReadLine();
}
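The two patterns that DownloadFilesAsync combines, paging through a listing with a continuation token and capping concurrent work with SemaphoreSlim, can be tried without a storage account. The following is a minimal sketch under stated assumptions: ThrottledDownloadSketch, ListSegment, and RunAsync are hypothetical names, the in-memory list stands in for a container, and Task.Delay stands in for a real download. Unlike the sample, it releases the semaphore in a finally block rather than in ContinueWith.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical simulation, not part of the tutorial's sample application.
public static class ThrottledDownloadSketch
{
    // Stand-in for a segmented listing call: returns up to pageSize names
    // starting at the token, plus the token for the next page (null when done).
    public static (List<string> Results, int? NextToken) ListSegment(IReadOnlyList<string> all, int? token, int pageSize)
    {
        int start = token ?? 0;
        List<string> page = all.Skip(start).Take(pageSize).ToList();
        int next = start + page.Count;
        return (page, next < all.Count ? next : (int?)null);
    }

    // Pages through the names and starts one simulated download per name,
    // never allowing more than maxOutstanding to run at the same time.
    // Returns the number of completed downloads and the peak concurrency seen.
    public static async Task<(int Completed, int Peak)> RunAsync(IReadOnlyList<string> names, int pageSize, int maxOutstanding)
    {
        var sem = new SemaphoreSlim(maxOutstanding, maxOutstanding);
        var tasks = new List<Task>();
        object gate = new object();
        int completed = 0, inFlight = 0, peak = 0;
        int? token = null;
        do
        {
            var segment = ListSegment(names, token, pageSize);
            token = segment.NextToken;
            foreach (string name in segment.Results)
            {
                // Wait for a free slot before starting the next download.
                await sem.WaitAsync();
                tasks.Add(SimulateDownloadAsync(name));
            }
        }
        while (token != null);
        await Task.WhenAll(tasks);
        return (completed, peak);

        async Task SimulateDownloadAsync(string blobName)
        {
            lock (gate) { inFlight++; if (inFlight > peak) peak = inFlight; }
            try
            {
                await Task.Delay(10); // placeholder for the real transfer
            }
            finally
            {
                lock (gate) { inFlight--; completed++; }
                sem.Release();
            }
        }
    }
}
```

Running RunAsync with a small maxOutstanding value makes the cap easy to observe: the reported peak never exceeds the semaphore's count, however many names are listed.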
While the files are being downloaded, you can verify the number of concurrent connections to your storage account. Open a Command Prompt and type netstat -a | find /c "blob:https". This command uses netstat to show the number of connections that are currently open. The following example shows output similar to what you see when running the tutorial yourself. As you can see from the example, over 280 connections were open when downloading the random files from the storage account.
C:\>netstat -a | find /c "blob:https"
289
C:\>
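A similar count can be taken from .NET code instead of netstat. The following is a minimal sketch and is not part of the sample application; ConnectionCountSketch and CountEstablished are hypothetical names. It is also a coarser filter than the netstat command: it counts every established connection to the given remote port, not only connections to blob endpoints.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

// Hypothetical helper, not part of the tutorial's sample application.
public static class ConnectionCountSketch
{
    // Counts established TCP connections from this machine to the given
    // remote port (443 for HTTPS), regardless of the remote host.
    public static int CountEstablished(int remotePort)
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Count(c => c.State == TcpState.Established && c.RemoteEndPoint.Port == remotePort);
    }
}
```

Calling ConnectionCountSketch.CountEstablished(443) while the download is running gives a number you can compare against the netstat result, keeping in mind that other HTTPS traffic on the machine is included.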
In part three of the series, you learned about downloading large amounts of random data from a storage account, such as how to:
[!div class="checklist"]
- Run the application
- Validate the number of connections
Advance to part four of the series to verify throughput and latency metrics in the portal.
[!div class="nextstepaction"] Verify throughput and latency metrics in the portal