BestConf for Hadoop+Hive

Experimental Settings

We ran BestConf on a Hadoop cluster of 4 nodes: 1 master node and 3 slave nodes. The nodes used in our experiment are listed below.

Node	OS	CPU	Memory
Master	CentOS	16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz	32G
Slave 1	CentOS	16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz	32G
Slave 2	CentOS	16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz	32G
Slave 3	CentOS	16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz	32G

Performance Surface

We use HiBench, a widely adopted benchmark suite, as the workload generator to produce the target workload. Figure 1 plots the performance surface of Hadoop+Hive under the HiBench Join workload.

Figure 1: The performance surface of Hadoop+Hive under the HiBench Join workload

Test Results

Test results of Hadoop under the Join workload: hadoopJoin.arff.
Test results of Hadoop under the PageRank workload: hadoopPageRank.arff.
Test results of Hadoop under the Join workload with 500 samples: join-trainingBestConf.arff and join-BestConfig.arff.
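
The `.arff` result files use Weka's ARFF text format: `@attribute` header lines followed by comma-separated rows after `@data`. The sketch below is one possible way to inspect such a file. It assumes that every attribute is numeric and that the last attribute holds the measured performance; neither assumption has been checked against the files listed above. The sample content and the names `paramA`, `paramB`, and `performance` are made up for illustration and are not data from the experiments.

```python
import io

def read_arff(stream):
    """Parse a dense, all-numeric ARFF stream into (attribute_names, rows)."""
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([float(v) for v in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])      # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
    return names, rows

def best_row(names, rows, maximize=True):
    """Return the row with the best last-column value as a name -> value dict."""
    pick = max if maximize else min
    return dict(zip(names, pick(rows, key=lambda r: r[-1])))

# Hypothetical sample; NOT taken from hadoopJoin.arff or any other result file.
SAMPLE = """\
% illustrative data only
@relation demo
@attribute paramA numeric
@attribute paramB numeric
@attribute performance numeric
@data
1,10,120.5
2,20,98.0
4,40,143.25
"""

names, rows = read_arff(io.StringIO(SAMPLE))
best = best_row(names, rows)
print(best)
```

To use it on a real file, pass `open("hadoopJoin.arff")` instead of the `io.StringIO` sample, and set `maximize=False` if the recorded metric is one where lower is better (for example, a duration).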

Interface Impl

The source files HadoopConfigReadin and HadoopConfigWrite implement the ConfigReadin and ConfigWrite interfaces, respectively.
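
To illustrate what reading in and writing out a Hadoop configuration involves, here is a minimal sketch. It is not the BestConf Java implementation and does not mirror the signatures of ConfigReadin or ConfigWrite; the function names `read_config` and `write_config` are hypothetical. It relies only on the standard Hadoop `*-site.xml` layout, in which a `<configuration>` element holds `<property>` entries with `<name>` and `<value>` children. The value 200 is an arbitrary example, not a tuned setting.

```python
import xml.etree.ElementTree as ET

def read_config(xml_text):
    """Read <property> name/value pairs from Hadoop-style XML into a dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

def write_config(settings):
    """Serialize a dict of settings into Hadoop-style XML text."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

# Illustrative input in the layout used by files such as mapred-site.xml.
original = """<configuration>
  <property><name>mapreduce.task.io.sort.mb</name><value>100</value></property>
</configuration>"""

settings = read_config(original)              # read the current setting
settings["mapreduce.task.io.sort.mb"] = "200" # apply a candidate value (arbitrary)
updated = write_config(settings)              # write the new configuration
print(updated)
```

A tuner built along these lines would read the current settings, substitute each candidate configuration, write the file back to every node, and restart the affected services before running the workload.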

Download

git clone https://github.com/zhuyuqing/bestconf.git
wget https://github.com/zhuyuqing/bestconf/archive/master.zip
