Commit 00e3abf

committed
stack trace articles
1 parent 932c0e8 commit 00e3abf

4 files changed

+128
-116
lines changed
@@ -0,0 +1,123 @@
<properties
   pageTitle="Out of memory error (OOM) - Hive settings | Microsoft Azure"
   description="Fix an out of memory error (OOM) from a Hive query in Hadoop in HDInsight. The customer scenario is a query across many large tables."
   keywords="out of memory error, OOM, Hive settings"
   services="service-name"
   documentationCenter=""
   authors="rashimg"
   manager="paulettm"
   editor="cgronlun"/>

<tags
   ms.service="hdinsight"
   ms.devlang="na"
   ms.topic="article"
   ms.tgt_pltfrm="na"
   ms.workload="big-data"
   ms.date="12/10/2015"
   ms.author="rashimg;cgronlun"/>

# Fix an Out of Memory (OOM) error with Hive memory settings in Hadoop in Azure HDInsight

A common problem our customers face is an Out of Memory (OOM) error when running Hive queries. This article describes a customer scenario and the Hive settings we recommended to fix the issue.

## Scenario: Hive query across large tables

A customer ran the following query using Hive.

    SELECT
        COUNT (T1.COLUMN1) as DisplayColumn1,
        ….
    FROM
        TABLE1 T1,
        TABLE2 T2,
        TABLE3 T3,
        TABLE5 T4,
        TABLE6 T5,
        TABLE7 T6
    where (T1.KEY1 = T2.KEY1….

Some nuances of this query:

* T1 is an alias for a big table, TABLE1, which has many STRING columns.
* The other tables are smaller but have a large number of columns.
* All of the tables are joined to each other, in some cases on multiple columns in TABLE1 and the other tables.

When the customer ran the query using Hive on MapReduce on a 24-node A3 cluster, it completed in about 26 minutes and produced the following warning messages:

    Warning: Map Join MAPJOIN[428][bigTable=?] in task 'Stage-21:MAPRED' is a cross product
    Warning: Shuffle Join JOIN[8][tables = [t1933775, t1932766]] in Stage 'Stage-4:MAPRED' is a cross product

Because the query finished in about 26 minutes, the customer ignored these warnings and focused instead on improving the query's performance further.

The customer consulted [Optimize Hive queries for Hadoop in HDInsight](hdinsight-hadoop-optimize-hive-query.md) and decided to use the Tez execution engine. With Tez enabled, the same query ran for 15 minutes and then threw the following error:

    Status: Failed
    Vertex failed, vertexName=Map 5, vertexId=vertex_1443634917922_0008_1_05, diagnostics=[Task failed, taskId=task_1443634917922_0008_1_05_000006, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.OutOfMemoryError: Java heap space

The customer then moved to a larger VM size (D12), expecting a larger VM to provide more heap space. The error persisted, so the customer reached out to the HDInsight team for help debugging the issue.

## Debug the Out of Memory (OOM) error

Our support and engineering teams found that one cause of the Out of Memory (OOM) error was a [known issue described in the Apache JIRA](https://issues.apache.org/jira/browse/HIVE-8306). From the description in the JIRA:

    When hive.auto.convert.join.noconditionaltask = true we check noconditionaltask.size and if the sum of tables sizes in the map join is less than noconditionaltask.size the plan would generate a Map join, the issue with this is that the calculation doesnt take into account the overhead introduced by different HashTable implementation as results if the sum of input sizes is smaller than the noconditionaltask size by a small margin queries will hit OOM.
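
To see why a small margin matters, here is a minimal Python sketch of the size check the JIRA describes. It is an illustrative model, not Hive's planner code: the function names, the table sizes, the threshold value, and the 2x hash table overhead factor are all assumptions chosen for the example.

```python
# Illustrative model of the check described in HIVE-8306.
# This is NOT Hive's implementation; all names and numbers are hypothetical.

MB = 1024 * 1024


def fits_map_join(small_table_bytes, threshold_bytes):
    """Compare the summed on-disk input sizes with the configured threshold,
    as the JIRA describes for hive.auto.convert.join.noconditionaltask.size."""
    return sum(small_table_bytes) < threshold_bytes


def estimated_hash_table_bytes(small_table_bytes, overhead_factor):
    """Rough in-memory footprint once hash table overhead is included.
    overhead_factor is an assumed value, not a documented Hive constant."""
    return int(sum(small_table_bytes) * overhead_factor)


threshold = 100 * MB           # hypothetical noconditionaltask.size
tables = [40 * MB, 55 * MB]    # hypothetical small tables, 95 MB in total

converted = fits_map_join(tables, threshold)
needed = estimated_hash_table_bytes(tables, overhead_factor=2)

print("map join planned:", converted)
print("estimated hash table MB:", needed // MB)
print("exceeds threshold:", needed > threshold)
```

In this model the inputs pass the size check by a small margin, yet the estimated in-memory footprint is well above the threshold, which is the situation the JIRA says leads to OOM.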

We confirmed that **hive.auto.convert.join.noconditionaltask** was indeed set to **true** by checking the hive-site.xml file:

    <property>
        <name>hive.auto.convert.join.noconditionaltask</name>
        <value>true</value>
        <description>
            Whether Hive enables the optimization about converting common join into mapjoin based on the input file size.
            If this parameter is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than the
            specified size, the join is directly converted to a mapjoin (there is no conditional task).
        </description>
    </property>

Based on the warnings and the JIRA, we hypothesized that the map join was causing the Java heap space OOM error, so we dug deeper into the issue.

As explained in the blog post [Hadoop Yarn memory settings in HDInsight](http://blogs.msdn.com/b/shanyu/archive/2014/07/31/hadoop-yarn-memory-settings-in-hdinsigh.aspx), when the Tez execution engine is used, the heap space belongs to the Tez container. The following image describes the Tez container memory.

![Tez container memory diagram: Hive out of memory error OOM](./media/hdinsight-hadoop-hive-out-of-memory-error-oom/hive-out-of-memory-error-oom-tez-container-memory.png)

As the blog post suggests, two memory settings define the container memory for the heap: **hive.tez.container.size** and **hive.tez.java.opts**. In our experience, an OOM exception does not mean that the container size is too small. It means that the Java heap size (**hive.tez.java.opts**) is too small. So when you see an OOM error, first try increasing **hive.tez.java.opts**. If that is not enough, you might also have to increase **hive.tez.container.size**. The **java.opts** setting should be around 80% of **container.size**.

> [AZURE.NOTE] The setting **hive.tez.java.opts** must always be smaller than **hive.tez.container.size**.

Because a D12 machine has 28 GB of memory, we decided to use a container size of 10 GB (10240 MB) and assign 80% of it (8192 MB) to java.opts. We applied the following settings on the Hive console:

    SET hive.tez.container.size=10240;
    SET hive.tez.java.opts=-Xmx8192m;

With these settings, the query ran successfully in under ten minutes.
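
The arithmetic behind these two values can be checked with a short Python sketch. The helper name `tez_memory_settings` and its `heap_percent` parameter are hypothetical, introduced only for illustration; the 80% ratio and the 10240 MB container size come from the text above.

```python
def tez_memory_settings(container_mb, heap_percent=80):
    """Derive a Java heap size from a Tez container size.

    Hypothetical helper: it applies the article's guideline that the heap
    should be around 80% of the container and always smaller than it.
    """
    if not 0 < heap_percent < 100:
        raise ValueError("heap_percent must be between 1 and 99")
    # Integer arithmetic avoids floating-point rounding surprises.
    heap_mb = container_mb * heap_percent // 100
    return {
        "hive.tez.container.size": str(container_mb),
        "hive.tez.java.opts": f"-Xmx{heap_mb}m",
    }


settings = tez_memory_settings(10240)
statements = [f"SET {name}={value};" for name, value in settings.items()]
for statement in statements:
    print(statement)
```

Keeping the ratio in one place makes it easier to scale both settings together if a larger container is needed later.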

## Conclusion: OOM errors and container size

Getting an OOM error doesn't necessarily mean the container size is too small. Instead, first increase the Java heap size (**hive.tez.java.opts**) so that it is around 80% of the container memory size (**hive.tez.container.size**), while keeping it smaller than the container size.

articles/hdinsight/hdinsight-hadoop-hive-out-of-memory-error.md

-113
This file was deleted.

articles/hdinsight/hdinsight-hadoop-stack-trace-error-messages.md

+5-3
@@ -17,8 +17,10 @@
1717
ms.date="12/09/2015"
1818
ms.author="rashimg;cgronlun"/>
1919

20-
# Hadoop stack trace errors in HDInsight: Index of troubleshooting articles
21-
22-
Use this index of Hadoop stack trace errors to troubleshoot in HDInsight. Click a link below to go to troubleshooting documentation.
20+
# Hadoop stack trace errors in HDInsight: Index of troubleshooting articles
2321

22+
Use this index of Hadoop stack trace errors to troubleshoot in HDInsight. Articles are organized by types of error messages.
2423

24+
## Out of Memory error messages
25+
* [Fix an Out of Memory (OOM) error with Hive settings](hdinsight-hadoop-hive-out-of-memory-error-oom.md):
26+
Fix an out of memory error (OOM) from a Hive query. The customer scenario includes a query across many large tables.
