Skip to content

Latest commit

 

History

History
109 lines (78 loc) · 7.07 KB

tutorial-batch-functions.md

File metadata and controls

109 lines (78 loc) · 7.07 KB
title description ms.devlang ms.topic ms.date ms.author ms.custom
Trigger a Batch job using Azure Functions
Tutorial - Apply OCR to scanned documents as they're added to a storage blob
dotnet
tutorial
05/30/2019
peshultz
mvc

Tutorial: Trigger a Batch job using Azure Functions

In this tutorial, you'll learn how to trigger a Batch job using Azure Functions. We'll walk through an example in which documents added to an Azure Storage blob container have optical character recognition (OCR) applied to them via Azure Batch. To streamline the OCR processing, we will configure an Azure function that runs a Batch OCR job each time a file is added to the blob container.

Prerequisites

Sign in to Azure

Sign in to the Azure portal.

Create a Batch pool and Batch job using Batch Explorer

In this section, you'll use Batch Explorer to create the Batch pool and Batch job that will run OCR tasks.

Create a pool

  1. Sign in to Batch Explorer using your Azure credentials.
  2. Create a pool by selecting Pools on the left side bar, then the Add button above the search form.
    1. Choose an ID and display name. We'll use ocr-pool for this example.
    2. Set the scale type to Fixed size, and set the dedicated node count to 3.
    3. Select Ubuntu 18.04-LTS as the operating system.
    4. Choose Standard_f2s_v2 as the virtual machine size.
    5. Enable the start task and add the command /bin/bash -c "sudo update-locale LC_ALL=C.UTF-8 LANG=C.UTF-8; sudo apt-get update; sudo apt-get -y install ocrmypdf". Be sure to set the user identity as Task default user (Admin), which allows start tasks to include commands with sudo.
    6. Select OK.

Create a job

  1. Create a job on the pool by selecting Jobs on the left side bar, then the Add button above the search form.
    1. Choose an ID and display name. We'll use ocr-job for this example.
    2. Set the pool to ocr-pool, or whatever name you chose for your pool.
    3. Select OK.

Create blob containers

Here you'll create blob containers that will store your input and output files for the OCR Batch job.

  1. Sign in to Storage Explorer using your Azure credentials.
  2. Using the storage account linked to your Batch account, create two blob containers (one for input files, one for output files) by following the steps at Create a blob container.

In this example, the input container is named input and is where all documents without OCR are initially uploaded for processing. The output container is named output and is where the Batch job writes processed documents with OCR.
* In this example, we'll call our input container input, and our output container output.
* The input container is where all documents without OCR are initially uploaded.
* The output container is where the Batch job writes documents with OCR.

Create a shared access signature for your output container in Storage Explorer. Do this by right-clicking on the output container and selecting Get Shared Access Signature.... Under Permissions, check Write. No other permissions are necessary.

Create an Azure Function

In this section you'll create the Azure Function that triggers the OCR Batch job whenever a file is uploaded to your input container.

  1. Follow the steps in Create a function triggered by Azure Blob storage to create a function.
    1. When prompted for a storage account, use the same storage account that you linked to your Batch account.
    2. For runtime stack, choose .NET. We'll write our function in C# to leverage the Batch .NET SDK.
  2. Once the blob-triggered function is created, use the run.csx and function.proj from GitHub in the Function.
    • run.csx is run when a new blob is added to your input blob container.
    • function.proj lists the external libraries in your Function code, for example, the Batch .NET SDK.
  3. Change the placeholder values of the variables in the Run() function of the run.csx file to reflect your Batch and storage credentials. You can find your Batch and storage account credentials in the Azure portal in the Keys section of your Batch account.
    • Retrieve your Batch and storage account credentials in the Azure portal in the Keys section of your Batch account.

Trigger the function and retrieve results

Upload any or all of the scanned files from the input_files directory on GitHub to your input container. Monitor Batch Explorer to confirm that a task gets added to ocr-pool for each file. After a few seconds, the file with OCR applied is added to the output container. The file is then visible and retrievable on Storage Explorer.

Additionally, you can watch the logs file at the bottom of the Azure Functions web editor window, where you'll see messages like this for every file you upload to your input container:

2019-05-29T19:45:25.846 [Information] Creating job...
2019-05-29T19:45:25.847 [Information] Accessing input container <inputContainer>...
2019-05-29T19:45:25.847 [Information] Adding <fileName> as a resource file...
2019-05-29T19:45:25.848 [Information] Name of output text file: <outputTxtFile>
2019-05-29T19:45:25.848 [Information] Name of output PDF file: <outputPdfFile>
2019-05-29T19:45:26.200 [Information] Adding OCR task <taskID> for <fileName> <size of fileName>...

To download the output files from Storage Explorer to your local machine, first select the files you want and then select the Download on the top ribbon.

Tip

The downloaded files are searchable if opened in a PDF reader.

Next steps

In this tutorial you learned how to:

[!div class="checklist"]

  • Use Batch Explorer to create pools and jobs
  • Use Storage Explorer to create blob containers and a shared access signature (SAS)
  • Create a blob-triggered Azure Function
  • Upload input files to Storage
  • Monitor task execution
  • Retrieve output files