
A more general and accurate implementation of MetricsAggregator for Spark #144

Open
wants to merge 231 commits into master

Conversation

ericsahit

A more general and accurate implementation of MetricsAggregator for Spark, compatible with Spark on YARN as well as executor dynamic allocation mode.

What's new:

  1. Memory: Use the actual executor memory request, including memoryOverhead, instead of only spark.executor.memory.
  2. Time: Use the running time of every executor that ever existed, not the duration (sum of task execution time) of only those executors that are still alive when the application ends. (A sketch of this aggregation follows the list.)
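To make the two points above concrete, here is a minimal illustrative sketch of the aggregation (the class and field names are made up; this is not the PR's actual code):

/** Hypothetical sketch of the "resource used" aggregation described above. */
final class ResourceUsedSketch {
  static final class ExecutorInfo {
    final long memoryMb;    // spark.executor.memory
    final long overheadMb;  // memoryOverhead requested on top of the heap
    final long runTimeMs;   // executor wall-clock lifetime (finishTime - startTime)
    ExecutorInfo(long memoryMb, long overheadMb, long runTimeMs) {
      this.memoryMb = memoryMb;
      this.overheadMb = overheadMb;
      this.runTimeMs = runTimeMs;
    }
  }

  /** Sum over every executor that ever existed of (full container size) x (lifetime). */
  static long resourceUsedMbSeconds(Iterable<ExecutorInfo> executors) {
    long total = 0L;
    for (ExecutorInfo e : executors) {
      total += (e.memoryMb + e.overheadMb) * (e.runTimeMs / 1000L);
    }
    return total;
  }
}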

I tested this feature on three real Spark applications on our YARN cluster and compared the Aggregate Memory Resource Allocation metric (Resource used in dr-elephant) as reported by YARN (shown on the AM page when a Spark application finishes), by the original dr-elephant (dr-original), and by this patch (dr-optimized). Below is the result for one application as an example:

  • YARN: 410620516 MB-seconds
  • dr-original: 1394513510.4 MB-seconds
  • dr-optimized: 405405696 MB-seconds

Taking the value reported by YARN as the reference:

  • deviation of dr-original: |410620516 - 1394513510.4| / 410620516 = 239.61%
  • deviation of dr-optimized: |410620516 - 405405696| / 410620516 = 1.27%

Across the three test applications, the average deviation of this feature is 1.88%, while the average deviation of dr-original is 280.86%.

This is our first PR, so there is still some TODO work around code style, code structure, etc. Please help us review it, and feel free to give us advice.

Future work we plan:

  • Add CPU-time resource usage, since CPU can also be scheduled and is often the bottleneck resource in our company.
  • Capture the real physical memory usage and improve the Resource wasted metric.

Thanks!

Akshay Rai and others added 30 commits September 9, 2014 01:57
…eue limit.

Currently disabled. It will be enabled through pluggable heuristics later.
YannByron and others added 24 commits July 7, 2016 14:46
Conflicts:
	app/com/linkedin/drelephant/ElephantRunner.java
	app/com/linkedin/drelephant/util/Utils.java
	app/controllers/Application.java
	test/com/linkedin/drelephant/util/UtilsTest.java
…nables metrics by default. (linkedin#115)

Following endpoints are exposed
	1) /ping
	2) /healthcheck
	3) /metrics

Following stats are available at present
  1. Application status - /ping
  2. Queue Size
  3. Skipped Jobs Count
  4. Retry Queue Size
  5. Thread Deadlocks - /healthcheck
  6. GC Count
  7. GC Time
  8. GC Count to Uptime ratio
  9. Heap Memory Usage stats
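As a rough illustration of how these endpoints might be consumed (the host and port below are placeholders, not values from this commit):

import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative only: poll the /ping endpoint listed above.
public class PingCheck {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://dr-elephant-host:8080/ping");  // placeholder host and port
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    System.out.println("/ping responded with HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}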
Fixed it by using a static initializer which I think is a good choice because

1. We don't have to add a call to initializer code anywhere (either within or outside of InfoExtractor)
2. I was able to make the instance variable final
3. Automatic thread safety is provided at the language level
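For context, a minimal sketch of the static-initializer pattern that commit describes (the class and field names are illustrative, not InfoExtractor's actual code):

// Illustrative sketch: the JVM runs the static block exactly once, before first use of the
// class, so the field can be final and no explicit init call or locking is needed.
public final class SchedulerRegistry {
  private static final SchedulerRegistry INSTANCE;  // final is possible thanks to the static block

  static {
    INSTANCE = new SchedulerRegistry();  // runs once, thread-safely, at class initialization
  }

  private SchedulerRegistry() { }

  public static SchedulerRegistry getInstance() {
    return INSTANCE;
  }
}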
…linkedin#121)

This commit brings Dr. Elephant's main producer/consumer run loop in line with standard Java best practices by submitting tasks to a work queue as Runnables, making things much more robust.

Previously, if a worker thread's main while loop exited for any reason, that thread would live on but would stop processing data forever, since the Executor would sit waiting for additional work to be submitted. Essentially, the problem was that things were not being done in the standard way and there were two queues. We noticed that every once in a while a worker thread's while loop would exit, and over time, after they had all done this, progress stalled.

I tested this on CDH 5.3 with MapReduce 2.5.0 and Spark 1.5.2.
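A minimal sketch of the pattern that commit describes, assuming a plain ExecutorService (the class, method, and work-item names are illustrative, not Dr. Elephant's actual run loop):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: each unit of work is submitted as a Runnable, so a failure
// affects only that task and the pool keeps draining the queue.
public class AnalysisDispatcher {
  private final ExecutorService workers = Executors.newFixedThreadPool(4);

  public void submitJob(final String appId) {  // appId stands in for the real work item
    workers.execute(new Runnable() {
      @Override
      public void run() {
        try {
          analyze(appId);  // placeholder for the real analysis work
        } catch (Exception e) {
          e.printStackTrace();  // only this task fails; the worker thread lives on
        }
      }
    });
  }

  private void analyze(String appId) { /* ... */ }
}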
This Fetcher fetches MapReduce data directly from logs in HDFS and doesn't depend on the job history server, which is the bottleneck of the analysis flow.
* adding more metrics for monitoring process jobs count and latency

* Fix metrics null ptr API usage issues

* Add jobs processed in last 24 hours metric
Includes resource usage, resource wastage and runtime metrics for Spark.
…e is to publish the metrics to other applications. The property is disabled by default, and users who wish to make use of this specify their own agent jar.
…or Spark which is compatible with Spark on YARN as well as executor dynamic allocation mode.
…or Spark which is compatible with Spark on YARN as well as executor dynamic allocation mode.
@akshayrai
Contributor

@ericsahit, thanks for the PR. I'll take a look at it and add my comments.

@ericsahit
Author

@akshayrai, OK. I have some login problems with Gitter, so please feel free to give me any suggestions here.

private static final Logger logger = LoggerFactory.getLogger(YarnSparkMetricsAggregator.class);

private AggregatorConfigurationData _aggregatorConfigurationData;
private double _storageMemWastageBuffer = 0.5;
Contributor

_storageMemWastageBuffer should be configurable.

Author

I think this variable will eventually be unnecessary in the resource aggregation for Spark, because, as with MapReduce, we should figure out a way to identify the actual physical memory usage of each executor and then use (1 - physicalMemoryUsage / executorMemory) as the resource wasted. In our company we added a feature for checking the actual physical memory usage, but it is embedded in our own Spark branch (different from the official Spark version).
That is just a suggestion about how the resource wasted metric could be calculated; for now I will make _storageMemWastageBuffer configurable.
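As an illustrative sketch of that idea, assuming peak physical memory usage per executor were available (the method and parameter names are made up):

final class MemoryWasteSketch {
  /** Illustrative only: 1 - physicalMemoryUsage / executorMemory, clamped to [0, 1]. */
  static double wastedMemoryFraction(long peakPhysicalMemoryMb, long requestedExecutorMemoryMb) {
    if (requestedExecutorMemoryMb <= 0) {
      return 0.0;
    }
    double used = Math.min(peakPhysicalMemoryMb, requestedExecutorMemoryMb);
    return 1.0 - (used / requestedExecutorMemoryMb);
  }
}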

resourceUsed += (executorDuration / Statistics.SECOND_IN_MS) * (perExecutorMem / FileUtils.ONE_MB);
// maxMem is the maximum available storage memory
// memUsed is how much storage memory is used.
// any difference is wasted after a buffer of 50% is wasted
Contributor

rewrite as "any difference after a buffer of 50% is wasted"
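One plausible reading of that comment, as an illustrative sketch only (the formula and the treatment of the 50% buffer are assumptions about intent, not the PR's verified logic):

final class StorageMemWasteSketch {
  /** Only storage memory left unused beyond the buffer counts as wasted. */
  static long wastedStorageMemBytes(long maxMem, long memUsed, double wastageBuffer) {
    long unused = maxMem - memUsed;                          // storage memory never used
    long allowedHeadroom = (long) (wastageBuffer * maxMem);  // e.g. 0.5 * maxMem tolerated as headroom
    return Math.max(0L, unused - allowedHeadroom);
  }
}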

val eid = info.executorId
taskEnd.reason match {
case Resubmitted =>
// Note: For resubmitted tasks, we continue to use the metrics that belong to the
Contributor

Since you return when the task was resubmitted, you are using the last attempt's metrics, not the first attempt's metrics, right?

executorToTasksComplete(eid) = executorToTasksComplete.getOrElse(eid, 0) + 1
}

executorToTasksActive(eid) = executorToTasksActive.getOrElse(eid, 1) - 1
Contributor

You are adding the task's duration to the executor's duration. I have a couple of questions here.

  1. If there are multiple tasks running in parallel inside an executor, wouldn't that count the same duration multiple times?
  2. In the reverse case, where the executor is idle, you are not counting that time. Shouldn't you be?

Author

@ericsahit ericsahit Oct 3, 2016


  1. Yes. In MapReduce, the basic resource allocation unit in the resource manager (e.g. YARN) is one map or reduce task, but in Spark the corresponding unit is the executor rather than the computing task, because tasks run as threads and share the memory and CPU resources of a single executor. (We can see this difference in the "Spark vs MapReduce" picture.)
  2. As described above, even when an executor is idle it still occupies logical CPU and memory resources in the resource manager (YARN). So I think we should calculate this metric according to the lifecycle of the executor instead of the task, as in the sketch below.
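To make that distinction concrete, a hypothetical sketch contrasting the two accounting approaches (the method names are illustrative):

final class DurationAccountingSketch {
  /** Summing task durations: parallel tasks on one executor each add their own time. */
  static long taskBasedSeconds(Iterable<Long> taskDurationsMs) {
    long total = 0L;
    for (long d : taskDurationsMs) {
      total += d / 1000L;
    }
    return total;
  }

  /** Executor-lifetime accounting: the container is charged for its whole lifetime, busy or idle. */
  static long executorLifetimeSeconds(long startTimeMs, long finishTimeMs) {
    return (finishTimeMs - startTimeMs) / 1000L;
  }
}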

if (isYARNMode(envData.getSparkProperty(SPARK_MASTER, SPARK_MASTER_DEFAULT))) {
val memoryOverheadFactor = envData.getSparkProperty(MEMORY_OVERHEAD_FACTOR, "0.20").toDouble
val memoryOverhead: Int = envData.getSparkProperty(MEMORY_OVERHEAD,
  math.max((memoryOverheadFactor * executorMemory).toInt, 384).toString).toInt // 384 MB matches Spark's minimum YARN memory overhead
Contributor

What's the magic number 384? Also, can you create constants for each of these?

info.totalTasks = info.activeTasks + info.failedTasks + info.completedTasks
info.duration = executorsListener.executorToDuration.getOrElse(info.execId, 0L)
info.inputBytes = executorsListener.executorToInputBytes.getOrElse(info.execId, 0L)
info.shuffleRead = executorsListener.executorToShuffleRead.getOrElse(info.execId, 0L)
Contributor

Where are you using shuffleRead, inputBytes, etc.? Only startTime and finishTime are used, right? Do you plan to use the rest later?

@cwsteinbach

Hi @ericsahit @shankar37, is this PR still relevant or has it been superseded by one of the other Spark PRs?

@fusonghe

/**
Controls the Compare Feature
*/
public static Result compare() {
DynamicForm form = Form.form().bindFromRequest(request());
String partialFlowExecId1 = form.get(COMPARE_FLOW_ID1);
partialFlowExecId1 = (partialFlowExecId1 != null) ? partialFlowExecId1.trim() : null;
String partialFlowExecId2 = form.get(COMPARE_FLOW_ID2);
partialFlowExecId2 = (partialFlowExecId2 != null) ? partialFlowExecId2.trim() : null;

List<AppResult> results1 = null;
List<AppResult> results2 = null;
if (partialFlowExecId1 != null && !partialFlowExecId1.isEmpty() && partialFlowExecId2 != null && !partialFlowExecId2.isEmpty()) {
  IdUrlPair flowExecIdPair1 = bestSchedulerInfoMatchGivenPartialId(partialFlowExecId1, AppResult.TABLE.FLOW_EXEC_ID);
  IdUrlPair flowExecIdPair2 = bestSchedulerInfoMatchGivenPartialId(partialFlowExecId2, AppResult.TABLE.FLOW_EXEC_ID);
  results1 = AppResult.find
      .select(AppResult.getSearchFields() + "," + AppResult.TABLE.JOB_DEF_ID + "," + AppResult.TABLE.JOB_DEF_URL
          + "," + AppResult.TABLE.FLOW_EXEC_ID + "," + AppResult.TABLE.FLOW_EXEC_URL)
      .where().eq(AppResult.TABLE.FLOW_EXEC_ID, flowExecIdPair1.getId()).setMaxRows(100)
      .fetch(AppResult.TABLE.APP_HEURISTIC_RESULTS, AppHeuristicResult.getSearchFields())
      .findList();
  results2 = AppResult.find
      .select(
          AppResult.getSearchFields() + "," + AppResult.TABLE.JOB_DEF_ID + "," + AppResult.TABLE.JOB_DEF_URL + ","
              + AppResult.TABLE.FLOW_EXEC_ID + "," + AppResult.TABLE.FLOW_EXEC_URL)
      .where().eq(AppResult.TABLE.FLOW_EXEC_ID, flowExecIdPair2.getId()).setMaxRows(100)
      .fetch(AppResult.TABLE.APP_HEURISTIC_RESULTS, AppHeuristicResult.getSearchFields())
      .findList();
}
return ok(comparePage.render(compareResults.render(compareFlows(results1, results2))));

}

How do I configure the Azkaban scheduler in scheduler.xml? @ericsahit
