This repo contains a self-made (admittedly quick-and-dirty) way of installing Spark, Hive and Hadoop, with plug-and-play instructions.
Versions used:
- Hadoop 3.2.1
- Spark 2.4.5
- Hive 3.1.2
For Ubuntu 18.04 (not tested on other versions).
My hostname is shockWAVE and my username is pankaj - use the same values if you don't want to edit some of the Spark/Hive/Hadoop configuration files yourself.
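If your hostname/username differ, the baked-in values have to be swapped out of the configs. A hedged sketch of how that could be done (the file paths are my assumptions about where the configs end up - check the files this script actually installs):

    # Hypothetical cleanup - replace my hostname/username with yours.
    # Paths are examples; adjust to wherever the configs actually live.
    sed -i "s/shockWAVE/$(hostname)/g" $HADOOP_HOME/etc/hadoop/core-site.xml
    sed -i "s/pankaj/$USER/g" $HADOOP_HOME/etc/hadoop/hdfs-site.xml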
-
1. Open a terminal and run: "bash script.sh"
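For reference, a script like this typically boils down to unpacking the tarballs and exporting environment variables. A minimal sketch (illustrative only - the actual script.sh in this repo may use different install paths):

    # Roughly the kind of lines such a script appends to ~/.bashrc
    export HADOOP_HOME=/usr/local/hadoop
    export SPARK_HOME=/usr/local/spark
    export HIVE_HOME=/usr/local/hive
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$HIVE_HOME/bin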
-
2. Check the installs: "java -version" and "pyspark --version" (note the single dash for java - Java 8, which Spark 2.4 needs, does not accept "--version")
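Roughly what to expect (a loose sketch, not exact output):

    java -version        # should report a 1.8.x (Java 8) runtime
    pyspark --version    # the version banner should mention 2.4.5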
-
3. From the terminal, run:
   3.1 "hadoop namenode -format"
       3.1.1 Check that HDFS is working (see the smoke test after this step).
   3.2 "start-dfs.sh && start-yarn.sh"
       3.2.1 Check that all services are running with "jps".
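A quick sanity check, assuming a standard single-node (pseudo-distributed) setup; the HDFS path below is just an example:

    # 'jps' should list roughly: NameNode, DataNode, SecondaryNameNode,
    # ResourceManager, NodeManager (plus Jps itself); PIDs will differ.
    jps
    # HDFS smoke test - create a directory and list it back:
    hdfs dfs -mkdir -p /user/$USER
    hdfs dfs -ls /user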
-
4. Install VS Code (my editor of choice).
   4.1 Check the Python version being used: "python --version" and "python3 --version" (should be the same) - 3.6.9
-
5. "git clone" SparkPractices from vpankaj97 (self-advertisement, I know).
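The exact URL here is my assumption from the username above - adjust if the repo lives elsewhere:

    git clone https://github.com/vpankaj97/SparkPractices.git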
-
6. Open any notebook from SparkPractices inside VS Code.
-
7. Install the Python extension - it is what runs the notebooks.
-
8. When prompted, choose a Python interpreter - 3.6.x.
-
9. VS Code will ask you to install the data-science libraries - SAY YES.
Then RUN IT ALL!!
TROUBLESHOOTING - Hive and Spark
- I have used the built-in Apache Derby database as the metastore.
Problem: Hive 3.1.2 and Spark 2.4.5 expect different metastore schema versions (Spark 2.4 embeds a much older Hive client), so a metastore_db initialized by one can break the other. Solution - steps:
- If it exists, delete the "metastore_db" directory in "$HIVE_HOME/metaStore/".
- In a terminal, run "pyspark".
- In the PySpark shell, run: spark.sql("show databases")
- A fresh metastore_db will automagically be created, and it works with both Spark and Hive.
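To confirm the regenerated metastore really is shared, one quick check (the database name "demo" is just an example) is to create a database from PySpark and then look for it from the Hive CLI:

    pyspark
    >>> spark.sql("create database if not exists demo")
    >>> spark.sql("show databases").show()
    >>> exit()
    hive
    hive> show databases;    -- "demo" should show up here too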