title | description | author | services | ms.service | ms.topic | ms.date | ms.author | ms.custom |
---|---|---|---|---|---|---|---|---|
Search unstructured data in Azure cloud storage |
Searching unstructured data using Azure search. |
roygara |
storage |
storage |
tutorial |
10/12/2017 |
rogarana |
mvc |
In this tutorial, you learn how to search unstructured data using Azure Search using data stored in Azure blobs. Unstructured data is data that either is not organized in a pre-defined manner or does not have a data model. An example would be a .txt file.
In this tutorial you learn how to:
[!div class="checklist"]
- Create a resource group
- Create a storage account
- Create a container
- Upload data to your container
- Create a search service through the portal
- Use the search service to search your container
A sample data set has been prepared for you. Download clinical-trials.zip and unzip it to its own folder.
The sample consists of text files obtained from clinicaltrials.gov. You can use them as example text files to search using Azure.
Log in to the Azure portal.
A storage account provides a unique location to store and access your Azure Storage data objects.
Currently, there are two types of storage accounts, Blob and General-purpose. For this tutorial, you create a General-purpose storage account.
If you are not familiar with the process of creating a General-purpose storage account, here's how to create one:
-
From the left menu, select Storage Accounts, then select Add.
-
Enter a unique name for your storage account.
-
Select Resource Manager for your Deployment model and select General purpose from the Account kind drop-down.
-
Select Locally-redundant storage (LRS) from the Replication drop-down.
-
Under Resource group, select Create new and enter a unique name.
-
Select an appropriate subscription.
-
Choose a Location and select Create.
Containers are similar to folders and are used to store blobs.
For this tutorial, you use a single container to store the text files obtained from clinicaltrials.gov.
-
Navigate to your storage account in the Azure portal.
-
Select Browse blobs under Blob Service.
-
Add a new container.
-
Name the container "data" and select Container for the public access level.
-
Select OK to create the container.
Now that you have a container, you can upload your example data to it.
-
Select your container and select Upload.
-
Select the blue folder icon pictured next to the Files field and browse to the local folder where you extracted the sample data.
-
Select all of the extracted files and select Open
-
Select Upload to begin the upload process.
The upload process may take a moment.
After it completes, navigate back into your data container to confirm the text files uploaded.
Azure Search is a search-as-a-service cloud solution that gives developers APIs and tools for adding a rich search experience over your data in web, mobile, and enterprise applications.
If you are not familiar with the process of creating a search service, here's how to create one:
-
Navigate to your storage account in the Azure portal.
-
Scroll down and click Add Azure Search under BLOB SERVICE.
-
Under Import Data, select Pick your service.
-
Select Click here to create a new search service.
-
Inside New Search Service enter a unique name for your search service in the URL field.
-
Under Resource group, select Use existing and choose the resource group you created earlier.
-
For the Pricing tier, select the Free tier and click Select.
-
Select Create to create the search service.
Now that you have a search service, you can attach it to your blob storage. This section walks you through the process of choosing a data source, creating an index, and creating an indexer.
-
Navigate to your storage account.
-
Select Add Azure Search under BLOB SERVICE.
-
Select Search Service inside Import Data and then click the search service you created in the preceding section. This opens up New data source.
A data source specifies which data to index and how to access the data. A data source can be used multiple times by the same search service.
-
Enter a name for the data source. Under Data to extract, select Content and Metadata. The data source specifies which parts of the blob are indexed.
a. In your own future scenarios, you can also select Storage metadata only. You would make that selection if you wish to limit the data that is indexed to standard blob properties or user-defined properties.
b. You can also choose All metadata to obtain both standard blob properties and all content-type specific metadata.
-
Since the blobs you're using are text files, set Parsing Mode to Text.
a. In your own future scenarios, you may wish to select other parsing modes depending on the contents of your blobs.
-
Select Storage Container to list the available storage accounts.
-
Select your storage account, then select the container you created previously.
-
Click Select to return to New data source and select OK to continue.
An index is a collection of fields from your data source that can be searched. The index is how your search service knows in what ways your data should be searched.
-
In Import data select Customize target index.
-
Enter a name for your index in the Index name field.
-
Select the Retrievable attribute's checkbox under metadata_storage_name.
- Click OK, which brings up Create an Indexer.
The parameters of your index and the attributes you give those parameters are important. The parameters specify what data to store, the attributes specify how to store that data.
The FIELD NAME column contains the parameters. The following table provides a listing of the available attributes and their descriptions.
Attribute | Description |
---|---|
Key | A string that provides the unique ID of each document, used for document lookup. Every index must have one key. Only one field can be the key, and its type must be set to Edm.String. |
Retrievable | Specifies whether a field can be returned in a search result. |
Filterable | Allows the field to be used in filter queries. |
Sortable | Allows a query to sort search results using this field. |
Facetable | Allows a field to be used in a faceted navigation structure for user self-directed filtering. Typically fields containing repetitive values that you can use to group multiple documents together (for example, multiple documents that fall under a single brand or service category) work best as facets. |
Searchable | Marks the field as full-text searchable. |
An indexer connects a data source with a search index, and provides a schedule to reindex your data.
- Enter a name in the Name field and select OK.
- You are brought back to Import Data, select OK to complete the connection process.
You've now successfully connected your blob to your search service. It takes a few minutes for the portal to show that the index is populated. However, the search service begins indexing immediately so you can begin searching right away.
To search your files, open the search explorer inside the index of your newly created search service.
The following steps show you where to find the search explorer and provides you some example queries:
- Navigate to all resources and find your newly created search service.
- Select your index to open it up.
- Select Search Explorer to open up the search explorer, where you can make live queries on your data.
- Select Search while the query string field is empty. An empty query returns all the data from your blobs.
Enter "Myopia" in the Query string field and select Search. Initiating a search of the contents of the files and returning a subset of them, which contains the word "Myopia."
You can also create queries that search by system properties using the $select
parameter. Enter $select=metadata_storage_name
into the query string and press enter, returning only that particular field.
The query string is directly modifying the URL, so spaces are not permitted. To search multiple fields, use a comma, such as: $select=metadata_storage_name,metadata_storage_path
The $select
parameter can only be used with fields that are marked retrievable when defining your index.
You have now completed this tutorial and have a searchable set of unstructured data.
In this tutorial, you learned about searching unstructured data using Azure Search, such as how to:
[!div class="checklist"]
- Create a resource group
- Create a storage account
- Create a container
- Uploading data to your container
- Create a Search Service
- Using the Search Service to search your container
Follow this link to learn more about indexing documents with Azure Search.
[!div class="nextstepaction"] Indexing Documents in Azure Blob Storage with Azure Search