title | description | services | documentationcenter | author | manager | editor | tags | ms.service | ms.custom | ms.workload | ms.tgt_pltfrm | ms.devlang | ms.topic | ms.date | ms.author |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Query Azure Log Analytics to monitor Azure HDInsight clusters | Microsoft Docs |
Learn how to run queries on Azure Log Analytics to monitor jobs running in an HDInsight cluster. |
hdinsight |
nitinme |
jhubbard |
cgronlun |
azure-portal |
hdinsight |
hdinsightactive |
big-data |
na |
na |
article |
11/17/2017 |
nitinme |
Learn some basic scenarios on how to use Azure Log Analytics to monitor Azure HDInsight clusters:
-
You must have configured an HDInsight cluster to use Azure Log Analytics. For instructions, see Use Azure Log Analytics with HDInsight clusters.
-
You must have added the HDInsight cluster-specific management solutions to the Operations Management Suite (OMS) workspace as described in Add HDInsight cluster management solutions to Log Analytics.
Learn how to look for specific metrics for your HDInsight cluster.
-
Open an HDInsight cluster that you have associated with Azure Log Analytics in the Azure portal.
-
Click Monitoring, and then click Open OMS Dashboard.
-
Click Log Search on the left menu.
-
Type the following query in the search box to search for all metrics for all available metrics for all HDInsight clusters configured to use Azure Log Analytics, and then press ENTER.
`search *`
The output shall look like:
-
From the left pane, under Type, select a metric that you want to dig deep into, and then click Apply. The following screenshot shows the
metrics_resourcemanager_queue_root_default_CL
type is selected.[!NOTE] You may need to click the [+]More button to find the metric you are looking for. Also, the Apply button is at the bottom of the list so you must scroll down to see it.
Notice that the query in the text box changes to one shown in the highlighted box in the following screenshot:
-
To dig deeper into this specific metric. For example, you can refine the existing output based on the average of resources used in a 10-minute interval, categorized by cluster name using the following query:
search in (metrics_resourcemanager_queue_root_default_CL) * | summarize AggregatedValue = avg(UsedAMResourceMB_d) by ClusterName_s, bin(TimeGenerated, 10m)
-
Instead of refining based on the average of resources used, you can use the following query to refine the results based on when the maximum resources were used (as well as 90th and 95th percentile) in a 10-minute window:
search in (metrics_resourcemanager_queue_root_default_CL) * | summarize ["max(UsedAMResourceMB_d)"] = max(UsedAMResourceMB_d), ["pct95(UsedAMResourceMB_d)"] = percentile(UsedAMResourceMB_d, 95), ["pct90(UsedAMResourceMB_d)"] = percentile(UsedAMResourceMB_d, 90) by ClusterName_s, bin(TimeGenerated, 10m)
Learn how to look error messages during a specific time window. The steps here are just one example on how you can arrive at the error message you are interested in. You can use any property that is available to look for the errors you are trying to find.
-
Open an HDInsight cluster that you have associated with Azure Log Analytics in the Azure portal.
-
Click Monitoring, and the click Open OMS Dashboard.
-
In the OMS dashboard, from the home screen, click Log Search.
-
Type the following query to search for all error messages for all HDInsight clusters configured to use Azure Log Analytics, and then press ENTER.
search "Error"
You shall see an output like the following output:
-
From the left pane, under Type category, select an error type that you want to dig deep into, and then click Apply. Notice the results are refined to only show the error of the type you selected.
-
You can dig deeper into this specific error list by using the options available in the left pane. For example,
-
To see the specific error. You can click [+]show more to look at the actual error message.
The first step to create an alert is to arrive at a query based on which the alert is triggered. For simplicity, let's use the following query that provides list of failed applications running on HDInsight clusters.
metrics_resourcemanager_queue_root_default_CL | where AppsFailed_d > 0
You can use any query that you want to create an alert.
-
Open an HDInsight cluster that you have associated with Azure Log Analytics in the Azure portal.
-
Click Monitoring, and then click Open OMS Dashboard.
-
In the OMS dashboard, from the home screen, click Log Search.
-
Run the following query on which you want to create an alert, and then press ENTER.
metrics_resourcemanager_queue_root-default-CL | where AppsFailed_d > 0
-
Click Alert on the top of the page.
-
In the Add Alert Rule window, enter the query and other details to create an alert, and then click Save.
The screenshot shows the configuration for sending an e-mail notification when the alert query returns an output.
-
You can also edit or delete an existing alert. To do so, from any page in the OMS portal, click the Settings icon.
-
From the Settings page, click Alerts to see the alerts you have created. You can also enable or disable an alert, edit it, or delete it. For more information, see Working with alert rules in Log Analytics.