Commit

Merge pull request #30245 from HeidiSteen/heidist-azsearch
how to model complex data types
jomolnar authored Sep 8, 2016
2 parents 7e8ff11 + 0510ef6 commit b87c361
Showing 3 changed files with 146 additions and 0 deletions.
146 changes: 146 additions & 0 deletions articles/search/search-howto-complex-data-types.md
@@ -0,0 +1,146 @@
<properties
pageTitle="How to model complex data types in Azure Search | Microsoft Azure Search"
description="Nested or hierarchical data structures can be modeled in an Azure Search index using a flattened rowset and the Collection data type."
services="search"
documentationCenter=""
authors="LiamCa"
manager="pablocas"
editor=""
tags="complex data types; compound data types; aggregate data types"
/>

<tags
ms.service="search"
ms.devlang="na"
ms.workload="search"
ms.topic="article"
ms.tgt_pltfrm="na"
ms.date="09/07/2016"
ms.author="liamca"
/>

# How to model complex data types in Azure Search

External datasets used to populate an Azure Search index sometimes include hierarchical or nested substructures that do not break down neatly into a tabular rowset. Examples of such structures might include multiple locations and phone numbers for a single customer, multiple colors and sizes for a single SKU, multiple authors of a single book, and so on. In modeling terms, you might see these structures referred to as *complex data types*, *compound data types*, *composite data types*, or *aggregate data types*, to name a few.

Complex data types are not natively supported in Azure Search, but a proven workaround is a two-step process: flatten the structure, and then use a **Collection** data type to reconstitute the interior structure. Following the technique described in this article allows the content to be searched, faceted, filtered, and sorted.

## Example of a complex data structure

Typically, the data in question resides as a set of JSON or XML documents, or as items in a NoSQL store such as DocumentDB. Structurally, the challenge stems from having multiple child items that need to be searched and filtered. As a starting point for illustrating the workaround, take the following JSON document that lists a set of contacts as an example:

~~~~~
[
{
"id": "1",
"name": "John Smith",
"company": "Adventureworks",
"locations": [
{
"id": "1",
"description": "Adventureworks Headquarters"
},
{
"id": "2",
"description": "Home Office"
}
]
},
{
"id": "2",
"name": "Jen Campbell",
"company": "Northwind",
"locations": [
{
"id": "3",
"description": "Northwind Headquarter"
},
{
"id": "4",
"description": "Home Office"
}
]
}]
~~~~~

While the fields named `id`, `name`, and `company` can easily be mapped one-to-one to fields within an Azure Search index, the `locations` field contains an array of objects, each with a location ID and a location description. Azure Search does not have a data type that supports this structure, so we need a different way to model it.

> [AZURE.NOTE] This technique is also described by Kirk Evans in the blog post [Indexing DocumentDB with Azure Search](https://blogs.msdn.microsoft.com/kaevans/2015/03/09/indexing-documentdb-with-azure-seach/), which shows a technique called "flattening the data", whereby you create two fields called `locationsId` and `locationsDescription` that are both [collections](https://msdn.microsoft.com/library/azure/dn798938.aspx) (arrays of strings).

## Part 1: Flatten the array into individual fields

To create an Azure Search index that accommodates this dataset, create individual fields for the nested substructure: `locationsId` and `locationsDescription`, each with a data type of [collections](https://msdn.microsoft.com/library/azure/dn798938.aspx) (an array of strings). For John Smith, you would index the values ‘1’ and ‘2’ into the `locationsId` field; for Jen Campbell, you would index the values ‘3’ and ‘4’. The matching descriptions go into the `locationsDescription` field in the same way.

Your data within Azure Search would look like this:

![sample data, 2 rows](./media/search-howto-complex-data-types/sample-data.png)
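
The flattening step can be sketched in plain C# as follows. This is a minimal illustration, not code from the sample repository: the `Contact`, `Location`, `FlatContact`, and `ContactFlattener` types are hypothetical names introduced here, and mapping the properties to the index field names is left to whatever serialization you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source types that mirror the JSON contacts document.
public class Location
{
    public string Id { get; set; }
    public string Description { get; set; }
}

public class Contact
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Company { get; set; }
    public List<Location> Locations { get; set; }
}

// Hypothetical flattened shape: one string collection per nested property.
public class FlatContact
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Company { get; set; }
    public string[] LocationsId { get; set; }
    public string[] LocationsDescription { get; set; }
}

public static class ContactFlattener
{
    // Projects each nested property into its own array, preserving order.
    public static FlatContact Flatten(Contact contact)
    {
        return new FlatContact
        {
            Id = contact.Id,
            Name = contact.Name,
            Company = contact.Company,
            LocationsId = contact.Locations.Select(l => l.Id).ToArray(),
            LocationsDescription = contact.Locations.Select(l => l.Description).ToArray()
        };
    }
}
```

Both arrays keep the order of the source list, but Azure Search treats each collection independently, so the positional link between an ID and its description is not available at query time.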


## Part 2: Add a collection field in the index definition

In the index schema, the field definitions might look similar to this example:

~~~~
// Index, Field, and DataType come from the Azure Search .NET SDK.
using Microsoft.Azure.Search.Models;

// indexName is assumed to be a string defined elsewhere.
var index = new Index()
{
    Name = indexName,
    Fields = new[]
    {
        new Field("id", DataType.String) { IsKey = true },
        new Field("name", DataType.String) { IsSearchable = true, IsFilterable = false, IsSortable = false, IsFacetable = false },
        new Field("company", DataType.String) { IsSearchable = true, IsFilterable = false, IsSortable = false, IsFacetable = false },
        // Collection fields hold the flattened location IDs and descriptions.
        new Field("locationsId", DataType.Collection(DataType.String)) { IsSearchable = true, IsFilterable = true, IsFacetable = true },
        new Field("locationsDescription", DataType.Collection(DataType.String)) { IsSearchable = true, IsFilterable = true, IsFacetable = true }
    }
};
~~~~

## Validate search behaviors and optionally extend the index

Assuming you created the index and loaded the data, you can now test the solution to verify search query execution against the dataset. Each **collection** field should be **searchable**, **filterable** and **facetable**. You should be able to run queries like:

* Find all people who work at the ‘Adventureworks Headquarters’.
* Get a count of the number of people who work in a ‘Home Office’.
* Of the people who work at a ‘Home Office’, show which other offices they work at, along with a count of the people in each location.
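
Queries like these are typically expressed as an OData filter that uses the `any` operator over the collection field. The helper below is a small sketch that builds such a filter string; `CollectionFilters` and `AnyEquals` are hypothetical names introduced for illustration, not part of the Azure Search SDK.

```csharp
using System;

public static class CollectionFilters
{
    // Builds an OData filter that matches documents whose string collection
    // contains the given value. Single quotes inside the value are escaped
    // by doubling them, as OData string literals require.
    public static string AnyEquals(string field, string value)
    {
        string escaped = value.Replace("'", "''");
        return $"{field}/any(v: v eq '{escaped}')";
    }
}
```

For example, `CollectionFilters.AnyEquals("locationsDescription", "Home Office")` yields `locationsDescription/any(v: v eq 'Home Office')`. Assuming the REST query parameters `$filter`, `facet`, and `$count`, passing that filter together with `facet=locationsDescription` and `$count=true` should return the matching people, the total count, and a per-location breakdown.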

This technique falls apart when a search needs to combine the location ID with the location description. For example:

* Find all people who have a Home Office with a location ID of 4.

If you recall, the original content looked like this:

~~~~
{
"id": "4",
"description": "Home Office"
}
~~~~

However, now that we have split the data into separate fields, we have no way of knowing whether the Home Office for Jen Campbell relates to location ID 3 or location ID 4.
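
The problem can be reproduced in memory. The sketch below mimics a query that tests each collection independently and combines the results with AND; `IndependentFilterDemo` and `Matches` are hypothetical names, and the method only approximates how two separate collection filters behave.

```csharp
using System;
using System.Linq;

public static class IndependentFilterDemo
{
    // Returns true when the ID appears somewhere in the ID collection AND the
    // description appears somewhere in the description collection. Nothing
    // checks that the two values came from the same nested location.
    public static bool Matches(string[] locationsId, string[] locationsDescription,
                               string id, string description)
    {
        return locationsId.Contains(id) && locationsDescription.Contains(description);
    }
}
```

With Jen Campbell's flattened values (IDs ‘3’ and ‘4’, plus her two descriptions), `Matches(ids, descriptions, "3", "Home Office")` returns `true` even though location 3 is her company headquarters, which is exactly the false positive described above.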

To handle this case, define another field in the index that combines all of the data into a single collection. For our example, we will call this field `locationsCombined` and separate the content with `||`, although you can choose any separator that is guaranteed not to appear in your content. For example:

![sample data, 2 rows with separator](./media/search-howto-complex-data-types/sample-data-2.png)

Using this `locationsCombined` field, we can now accommodate even more queries, such as:

* Show a count of people who work at a ‘Home Office’ with a location ID of ‘4’.
* Search for people who work at a ‘Home Office’ with a location ID of ‘4’.
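
As a rough sketch, the combined values and a matching filter could be built like this. The `CombinedField` class and its members are hypothetical, and the `id||description` ordering is an assumption made for illustration; use whatever ordering your index data actually follows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombinedField
{
    // Separator from the article; pick one that cannot occur in your content.
    public const string Separator = "||";

    // Joins each (id, description) pair into a single string so that the
    // pairing survives flattening. Assumed ordering: id first, then description.
    public static string[] Build(IEnumerable<KeyValuePair<string, string>> locations)
    {
        return locations.Select(l => l.Key + Separator + l.Value).ToArray();
    }

    // Builds an OData filter that matches one exact id/description pair.
    public static string ExactPairFilter(string id, string description)
    {
        string value = (id + Separator + description).Replace("'", "''");
        return $"locationsCombined/any(c: c eq '{value}')";
    }
}
```

`CombinedField.ExactPairFilter("4", "Home Office")` produces `locationsCombined/any(c: c eq '4||Home Office')`, which matches only documents where that ID and that description belong to the same location.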

## Limitations

This technique is useful for a number of scenarios, but it is not applicable in every case. For example:

1. Your complex data type does not have a static set of fields, and there is no way to map all of the possible types to a single field.
2. Updating the nested objects requires some extra work to determine exactly what needs to be updated in the Azure Search index.

## Sample code

You can see an example of how to index a complex JSON dataset into Azure Search and perform a number of queries over this dataset in this [GitHub repo](https://github.com/liamca/AzureSearchComplexTypes).

## Next step

[Vote for native support for complex data types](https://feedback.azure.com/forums/263029-azure-search) on the Azure Search UserVoice page and provide any additional input that you’d like us to consider regarding feature implementation. You can also reach out to me directly on Twitter at @liamca.


