---
title: Advanced aggregations in Azure Log Analytics queries | Microsoft Docs
description: Describes some of the more advanced aggregation options available to Log Analytics queries.
services: log-analytics
author: bwren
manager: carmonm
ms.service: log-analytics
ms.workload: na
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: conceptual
ms.date: 08/16/2018
ms.author: bwren
---
> [!NOTE]
> You should complete the lesson on aggregations in Log Analytics queries before completing this lesson.
[!INCLUDE log-analytics-demo-environment]
This article describes some of the more advanced aggregation options available to Log Analytics queries.
You can use `makelist` to pivot data by the order of values in a particular column. For example, you may want to explore the most common order in which events occur on your machines. You can essentially pivot the data by the order of `EventID` values on each machine.
```Kusto
Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated desc
| summarize makelist(EventID) by Computer
```
Computer | list_EventID |
---|---|
computer1 | [704,701,1501,1500,1085,704,704,701] |
computer2 | [326,105,302,301,300,102] |
... | ... |
`makelist` generates a list in the order that data was passed into it. To sort events from oldest to newest, use `asc` in the `order` statement instead of `desc`.
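For instance, this sketch of the same query sorts in ascending order, so each machine's list runs from its oldest event to its newest:

```Kusto
Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated asc
| summarize makelist(EventID) by Computer
```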
It is also useful to create a list of just the distinct values. This is called a set, and you can generate one with `makeset`:
```Kusto
Event
| where TimeGenerated > ago(12h)
| order by TimeGenerated desc
| summarize makeset(EventID) by Computer
```
Computer | set_EventID |
---|---|
computer1 | [704,701,1501,1500,1085] |
computer2 | [326,105,302,301,300,102] |
... | ... |
Like `makelist`, `makeset` also works with ordered data and generates the arrays based on the order of the rows that are passed into it.
The inverse operation of `makelist` or `makeset` is `mvexpand`, which expands a list of values into separate rows. It can expand across any number of dynamic columns, both JSON and array. For example, you could check the `Heartbeat` table for solutions sending data from computers that sent a heartbeat in the last hour:
```Kusto
Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, Solutions
```
Computer | Solutions |
---|---|
computer1 | "security", "updates", "changeTracking" |
computer2 | "security", "updates" |
computer3 | "antiMalware", "changeTracking" |
... | ... |
Use `mvexpand` to show each value in a separate row instead of a comma-separated list:
```Kusto
Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, split(Solutions, ",")
| mvexpand Solutions
```
Computer | Solutions |
---|---|
computer1 | "security" |
computer1 | "updates" |
computer1 | "changeTracking" |
computer2 | "security" |
computer2 | "updates" |
computer3 | "antiMalware" |
computer3 | "changeTracking" |
... | ... |
You could then use `makelist` again to group items together, this time to see the list of computers per solution:
```Kusto
Heartbeat
| where TimeGenerated > ago(1h)
| project Computer, split(Solutions, ",")
| mvexpand Solutions
| summarize makelist(Computer) by tostring(Solutions)
```
Solutions | list_Computer |
---|---|
"security" | ["computer1", "computer2"] |
"updates" | ["computer1", "computer2"] |
"changeTracking" | ["computer1", "computer3"] |
"antiMalware" | ["computer3"] |
... | ... |
A useful application of `mvexpand` is filling in default values for missing bins. For example, suppose you're looking at the uptime of a particular machine by exploring its heartbeat. You also want to see the source of the heartbeat, which is in the `Category` column. Normally, we would use a basic `summarize` statement:
```Kusto
Heartbeat
| where TimeGenerated > ago(12h)
| summarize count() by Category, bin(TimeGenerated, 1h)
```
Category | TimeGenerated | count_ |
---|---|---|
Direct Agent | 2017-06-06T17:00:00Z | 15 |
Direct Agent | 2017-06-06T18:00:00Z | 60 |
Direct Agent | 2017-06-06T20:00:00Z | 55 |
Direct Agent | 2017-06-06T21:00:00Z | 57 |
Direct Agent | 2017-06-06T22:00:00Z | 60 |
... | ... | ... |
In these results, the bucket associated with "2017-06-06T19:00:00Z" is missing because there isn't any heartbeat data for that hour. Use the `make-series` function to assign a default value to empty buckets. This generates a row for each category with two extra array columns: one for the values and one for the matching time buckets:
```Kusto
Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category
```
Category | count_ | TimeGenerated |
---|---|---|
Direct Agent | [15,60,0,55,60,57,60,...] | ["2017-06-06T17:00:00.0000000Z","2017-06-06T18:00:00.0000000Z","2017-06-06T19:00:00.0000000Z","2017-06-06T20:00:00.0000000Z","2017-06-06T21:00:00.0000000Z",...] |
... | ... | ... |
The third element of the *count_* array is 0 as expected, and the *TimeGenerated* array has a matching timestamp of "2017-06-06T19:00:00.0000000Z". This array format is difficult to read, though. Use `mvexpand` to expand the arrays and produce the same output format as generated by `summarize`:
```Kusto
Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category
| mvexpand TimeGenerated, count_
| project Category, TimeGenerated, count_
```
Category | TimeGenerated | count_ |
---|---|---|
Direct Agent | 2017-06-06T17:00:00Z | 15 |
Direct Agent | 2017-06-06T18:00:00Z | 60 |
Direct Agent | 2017-06-06T19:00:00Z | 0 |
Direct Agent | 2017-06-06T20:00:00Z | 55 |
Direct Agent | 2017-06-06T21:00:00Z | 57 |
Direct Agent | 2017-06-06T22:00:00Z | 60 |
... | ... | ... |
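Because `make-series` produces aligned value and time arrays, its output can also be charted directly without expanding the rows first. As a sketch, assuming you are running the query in a client that supports chart rendering:

```Kusto
Heartbeat
| make-series count() default=0 on TimeGenerated in range(ago(1d), now(), 1h) by Category
| render timechart
```

Since empty buckets are filled with the default value of 0, gaps in the data appear as dips to zero rather than missing points on the chart.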
A common scenario is to select the names of specific entities based on a set of criteria, and then filter a different data set down to that set of entities. For example, you might find computers that are known to have missing updates and identify the IP addresses that these computers called out to:
```Kusto
let ComputersNeedingUpdate = toscalar(
    Update
    | summarize makeset(Computer)
    | project set_Computer
);
WindowsFirewall
| where Computer in (ComputersNeedingUpdate)
```
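Note that the inner query above collects every computer that appears in the `Update` table. To restrict it to machines that actually have updates pending, you would typically add a filter on the update state. As a sketch, assuming the `Update` table's `UpdateState` column reports a value such as "Needed" for missing updates:

```Kusto
let ComputersNeedingUpdate = toscalar(
    Update
    // Assumed filter: keep only records for updates that are still needed
    | where UpdateState == "Needed"
    | summarize makeset(Computer)
    | project set_Computer
);
WindowsFirewall
| where Computer in (ComputersNeedingUpdate)
```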
See other lessons for using the Log Analytics query language: