title | description | ms.devlang | ms.topic | ms.date | ms.author | ms.custom |
---|---|---|---|---|---|---|
Trigger a Batch job using Azure Functions | Tutorial - Apply OCR to scanned documents as they're added to a storage blob | dotnet | tutorial | 05/30/2019 | peshultz | mvc |
In this tutorial, you'll learn how to trigger a Batch job using Azure Functions. We'll walk through an example in which documents added to an Azure Storage blob container have optical character recognition (OCR) applied to them via Azure Batch. To streamline the OCR processing, we will configure an Azure function that runs a Batch OCR job each time a file is added to the blob container.
- An Azure subscription. If you don't have one, create a free account before you begin.
- An Azure Batch account and a linked Azure Storage account. See Create a Batch account for more information on how to create and link accounts.
- Batch Explorer
- Azure Storage Explorer
Sign in to the Azure portal.
In this section, you'll use Batch Explorer to create the Batch pool and Batch job that will run OCR tasks.
- Sign in to Batch Explorer using your Azure credentials.
- Create a pool by selecting Pools on the left side bar, then the Add button above the search form.
  - Choose an ID and display name. We'll use ocr-pool for this example.
  - Set the scale type to Fixed size, and set the dedicated node count to 3.
  - Select Ubuntu 18.04-LTS as the operating system.
  - Choose Standard_f2s_v2 as the virtual machine size.
  - Enable the start task and add the command /bin/bash -c "sudo update-locale LC_ALL=C.UTF-8 LANG=C.UTF-8; sudo apt-get update; sudo apt-get -y install ocrmypdf". Be sure to set the user identity as Task default user (Admin), which allows start tasks to include commands with sudo.
  - Select OK.
- Create a job on the pool by selecting Jobs on the left side bar, then the Add button above the search form.
  - Choose an ID and display name. We'll use ocr-job for this example.
  - Set the pool to ocr-pool, or whatever name you chose for your pool.
  - Select OK.
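The start task command packs three shell steps into a single bash -c argument. The Python sketch below is only an illustration (it is not part of the tutorial's code, and the function name start_task_steps is hypothetical): it uses the standard library's shlex module to show how the command line is tokenized and which steps the shell runs. Because the steps are separated by ;, each one runs even if the previous one fails; chaining them with && instead would stop at the first failure.

```python
import shlex

# Start task command for ocr-pool, exactly as entered in Batch Explorer.
START_TASK = (
    '/bin/bash -c "sudo update-locale LC_ALL=C.UTF-8 LANG=C.UTF-8; '
    'sudo apt-get update; sudo apt-get -y install ocrmypdf"'
)


def start_task_steps(command: str) -> list[str]:
    """Return the shell steps that `/bin/bash -c` will run, in order."""
    argv = shlex.split(command)
    # Expect three tokens: the shell, the -c flag, and the quoted script.
    if len(argv) != 3 or argv[:2] != ["/bin/bash", "-c"]:
        raise ValueError('expected: /bin/bash -c "<script>"')
    # A naive split on ';' is fine here because the script has no quoted semicolons.
    # Steps separated by ';' run sequentially whether or not the previous one succeeded.
    return [step.strip() for step in argv[2].split(";") if step.strip()]


if __name__ == "__main__":
    for number, step in enumerate(start_task_steps(START_TASK), start=1):
        print(f"{number}. {step}")
```

Running it lists the three steps in order: setting a UTF-8 locale, refreshing the package index, and installing ocrmypdf.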
Here you'll create blob containers that will store your input and output files for the OCR Batch job.
- Sign in to Storage Explorer using your Azure credentials.
- Using the storage account linked to your Batch account, create two blob containers (one for input files, one for output files) by following the steps at Create a blob container.
In this example, the input container is named input and is where all documents without OCR are initially uploaded for processing. The output container is named output and is where the Batch job writes processed documents with OCR.
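Container names must follow Azure Storage naming rules: 3 to 63 characters, only lowercase letters, digits, and hyphens, starting with a letter or digit, with every hyphen immediately preceded and followed by a letter or digit. The following Python sketch encodes those rules as a quick pre-check; the function name is hypothetical, and Storage Explorer and the service still perform their own validation.

```python
import re

# Lowercase letters and digits; a hyphen is allowed only when the next
# character is a letter or digit (no "--" and no trailing "-").
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9]))*")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` is a legal Azure blob container name."""
    return 3 <= len(name) <= 63 and _CONTAINER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["input", "output", "Input", "in--put", "ab"]:
        print(candidate, is_valid_container_name(candidate))
```

The simple names used in this tutorial, input and output, pass; mixed-case names, doubled hyphens, and names shorter than three characters are rejected.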
Create a shared access signature for your output container in Storage Explorer. Do this by right-clicking on the output container and selecting Get Shared Access Signature.... Under Permissions, check Write. No other permissions are necessary.
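Storage Explorer returns the SAS as a URL whose query string carries the grant: sr is the signed resource (c for a container), sp the signed permissions (w for Write), and se the expiry. As a sanity check before pasting the URL anywhere, the Python sketch below parses it with the standard library and confirms it is a write-only container SAS. The function name, account name, and example URL are hypothetical placeholders, not values from the tutorial.

```python
from urllib.parse import parse_qs, urlsplit


def check_output_sas(sas_url: str) -> dict[str, str]:
    """Check that a SAS URL grants write-only access to a container.

    Returns the parsed query parameters; raises ValueError on a problem.
    """
    parts = urlsplit(sas_url)
    if parts.scheme != "https":
        raise ValueError("SAS URLs should use HTTPS")
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # sr: signed resource ("c" = container, "b" = blob).
    if params.get("sr") != "c":
        raise ValueError("expected a container-level SAS (sr=c)")
    # sp: signed permissions; the output container only needs Write ("w").
    if params.get("sp") != "w":
        raise ValueError(f"expected write-only permissions, got sp={params.get('sp')!r}")
    # An ad hoc SAS (no stored access policy, si) must carry an expiry (se).
    if "se" not in params and "si" not in params:
        raise ValueError("SAS has neither an expiry (se) nor a stored policy (si)")
    return params


if __name__ == "__main__":
    example = (
        "https://mystorageaccount.blob.core.windows.net/output"
        "?st=2019-05-29T00%3A00%3A00Z&se=2019-05-30T00%3A00%3A00Z"
        "&sp=w&sv=2018-03-28&sr=c&sig=REDACTED"
    )
    print(check_output_sas(example)["se"])
```

Keeping the permission set to Write alone limits the damage if the URL leaks: holders can add blobs to the output container but cannot list, read, or delete its contents.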
In this section you'll create the Azure Function that triggers the OCR Batch job whenever a file is uploaded to your input container.
- Follow the steps in Create a function triggered by Azure Blob storage to create a function.
- When prompted for a storage account, use the same storage account that you linked to your Batch account.
- For runtime stack, choose .NET. We'll write our function in C# to leverage the Batch .NET SDK.
- Once the blob-triggered function is created, add the run.csx and function.proj files from GitHub to the function. run.csx is run when a new blob is added to your input blob container, and function.proj lists the external libraries that your function code uses, for example, the Batch .NET SDK.
- Change the placeholder values of the variables in the Run() function of the run.csx file to reflect your Batch and storage credentials. You can find your Batch and storage account credentials in the Azure portal in the Keys section of your Batch account.
Upload any or all of the scanned files from the input_files directory on GitHub to your input container. Monitor Batch Explorer to confirm that a task is added for each file and runs on ocr-pool. After a few seconds, the file with OCR applied is added to the output container. The file is then visible and retrievable in Storage Explorer.
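Because one task is added per uploaded file, each task needs an ID that Batch accepts: IDs may contain only alphanumeric characters, hyphens, and underscores, are limited to 64 characters, are case-insensitive, and must be unique within the job. File names such as "scan (1).pdf" break those rules, so some mapping is needed. The Python sketch below shows one possible approach (hypothetical, not necessarily what run.csx does): replace invalid characters, truncate, and append a short hash of the original name so that names which collapse to the same text still get distinct IDs.

```python
import hashlib
import re

_INVALID = re.compile(r"[^A-Za-z0-9_-]")
MAX_ID_LENGTH = 64


def is_valid_batch_id(identifier: str) -> bool:
    """Check the Batch ID rules: 1-64 chars, alphanumerics, '-' and '_' only."""
    return 1 <= len(identifier) <= MAX_ID_LENGTH and _INVALID.search(identifier) is None


def task_id_for_blob(blob_name: str, prefix: str = "ocr") -> str:
    """Derive a valid, deterministic Batch task ID from a blob name."""
    readable = _INVALID.sub("-", blob_name)
    # Hash the original name so "A.pdf" and "a.pdf" (equal once case is ignored,
    # as Batch does) or long names sharing a prefix still map to different IDs.
    digest = hashlib.sha256(blob_name.encode("utf-8")).hexdigest()[:8]
    budget = MAX_ID_LENGTH - len(prefix) - len(digest) - 2  # two '-' separators
    return f"{prefix}-{readable[:budget]}-{digest}"


if __name__ == "__main__":
    print(task_id_for_blob("scan (1).pdf"))
    print(is_valid_batch_id("ocr-pool"), is_valid_batch_id("ocr job"))
```

Since the ID is deterministic, uploading another blob with the same name would produce the same ID, and Batch would reject the second task because task IDs must be unique within a job; appending a timestamp is one way to allow reruns.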
You can also watch the log output at the bottom of the Azure Functions web editor window, where you'll see messages like these for every file you upload to your input container:
2019-05-29T19:45:25.846 [Information] Creating job...
2019-05-29T19:45:25.847 [Information] Accessing input container <inputContainer>...
2019-05-29T19:45:25.847 [Information] Adding <fileName> as a resource file...
2019-05-29T19:45:25.848 [Information] Name of output text file: <outputTxtFile>
2019-05-29T19:45:25.848 [Information] Name of output PDF file: <outputPdfFile>
2019-05-29T19:45:26.200 [Information] Adding OCR task <taskID> for <fileName> <size of fileName>...
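The log lines trace the per-file work: the function names an output text file and an output PDF file, then adds a task whose command runs ocrmypdf. The Python sketch below mirrors that flow under stated assumptions: the naming scheme (stem.txt and stem_ocr.pdf) and the function name plan_ocr_task are hypothetical, not taken from run.csx. It relies on ocrmypdf's --sidecar option, which writes the recognized text to a separate file. Because Batch does not run a task's command line under a shell, the sketch wraps it in /bin/bash -c, matching the style of the pool's start task, and quotes every argument so file names with spaces survive.

```python
import shlex
from pathlib import PurePosixPath
from typing import NamedTuple


class OcrTaskPlan(NamedTuple):
    output_txt: str
    output_pdf: str
    command_line: str


def plan_ocr_task(file_name: str) -> OcrTaskPlan:
    """Derive output names and a task command line for one input PDF."""
    stem = PurePosixPath(file_name).stem
    output_txt = f"{stem}.txt"      # hypothetical naming scheme
    output_pdf = f"{stem}_ocr.pdf"  # hypothetical naming scheme
    # ocrmypdf [options] INPUT OUTPUT; --sidecar writes the OCR text to its own file.
    script = shlex.join(["ocrmypdf", "--sidecar", output_txt, file_name, output_pdf])
    # Quote the whole script so bash receives it as the single -c argument.
    return OcrTaskPlan(output_txt, output_pdf, "/bin/bash -c " + shlex.quote(script))


if __name__ == "__main__":
    plan = plan_ocr_task("scan 1.pdf")
    print(plan.output_txt, plan.output_pdf)
    print(plan.command_line)
```

Uploading the two outputs to the output container (the job the write-only SAS created earlier is meant for) is not shown here.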
To download the output files from Storage Explorer to your local machine, select the files you want, and then select Download on the top ribbon.
Tip
The downloaded files are searchable if opened in a PDF reader.
In this tutorial you learned how to:
[!div class="checklist"]
- Use Batch Explorer to create pools and jobs
- Use Storage Explorer to create blob containers and a shared access signature (SAS)
- Create a blob-triggered Azure Function
- Upload input files to Storage
- Monitor task execution
- Retrieve output files
- For more examples of using the .NET API to schedule and process Batch workloads, see the samples on GitHub.
- To see more Azure Functions triggers that you can use to run Batch workloads, see the Azure Functions documentation.