Prerequisites, in case we don't have internet connectivity at the location.
- Do the initial setup of the HDP sandbox. The initial root password is hadoop, but you are required to change it.
ssh [email protected] -p 2222
[email protected]'s password:
You are required to change your password immediately (root enforced)
Last login: Tue Mar 1 21:05:47 2016 from
Changing password for root.
(current) UNIX password:
New password:
Retype new password:
[root@sandbox ~]# ambari-admin-password-reset
Please set the password for admin:
Please retype the password for admin:
The admin password has been set.
Restarting ambari-server to make the password change effective...
Using python /usr/bin/python2
Restarting ambari-server
Using python /usr/bin/python2
Stopping ambari-server
Ambari Server stopped
Using python /usr/bin/python2
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.
It is suggested that you set the admin Ambari user's password to admin for consistency with the other web UIs.
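The restart triggered by the password reset can take a little while, as the "Waiting for server start" line in the output above suggests. A minimal sketch of a helper that polls a URL until it answers is shown below. The function name `wait_for_url` is hypothetical, and the Ambari address in the usage comment (`http://localhost:8080`) is an assumption about the sandbox, so adjust it to your setup.

```shell
# Poll a URL until the server answers or the attempts run out.
# Usage: wait_for_url <url> [attempts] [delay-seconds]
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # curl exits 0 as soon as it receives any HTTP response
        if curl -s -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (address is an assumption, not taken from this guide):
#   wait_for_url http://localhost:8080 && echo "Ambari is answering"
```

The helper only checks that something responds at the URL; it does not verify that the new admin password works.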
Install NumPy, since we will use it in PySpark:
sudo yum install -y numpy
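To confirm the install worked before the lab starts, something like the following could be used. It is only a sketch: the helper name `check_module` is hypothetical, and `python` is an assumption about which interpreter PySpark picks up on the sandbox.

```shell
# Return success if the given Python interpreter can import the module.
# Usage: check_module <python-binary> <module-name>
check_module() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# "python" is an assumption; point this at the interpreter PySpark uses.
if check_module python numpy; then
    echo "numpy is importable"
else
    echo "numpy is NOT importable; re-run the yum install step"
fi
```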
For the meetup lab, you will need to download the datasets. First, switch from root to the zeppelin user:
sudo su - zeppelin
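Because the datasets are uploaded under /user/zeppelin, it may help to confirm the switch actually happened before running the download commands. A small sketch, with the hypothetical helper name `require_user`:

```shell
# Return success only when the current login name matches the expected one.
require_user() {
    [ "$(id -un)" = "$1" ]
}

if require_user zeppelin; then
    echo "running as zeppelin"
else
    echo "running as $(id -un), expected zeppelin; run: sudo su - zeppelin"
fi
```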
#################################################
#FOR PHILLY CRIME DATA Analysis
#################################################
wget
ls
pwd
hadoop fs -mkdir /user/zeppelin/crime
hadoop fs -put philadelphia-crime-data-2015-ytd.csv /user/zeppelin/crime/
hadoop fs -ls /user/zeppelin
hadoop fs -ls /user/zeppelin/crime
#################################################
#dataset=""
#dataset=""
#Get the dataset
echo ""
rm -rf movielens
hadoop fs -rm -r -f /user/zeppelin/movielens/
mkdir movielens
cd movielens
wget $dataset -o /dev/null -O
unzip
rm
mv ml* ml
gzip ml/ratings.dat
hadoop fs -mkdir /user/zeppelin/movielens
hadoop fs -put ml/movies.dat /user/zeppelin/movielens/movies.dat
hadoop fs -put ml/ratings.dat.gz /user/zeppelin/movielens/ratings.dat.gz
echo "Files in HDFS:/user/zeppelin/movielens"
hadoop fs -ls /user/zeppelin/movielens
wget
unzip
hadoop fs -mkdir -p /user/zeppelin/SensorDemo
hadoop fs -copyFromLocal -f SensorFiles/HVAC.csv /user/zeppelin/SensorDemo/
hadoop fs -tail /user/zeppelin/SensorDemo/HVAC.csv
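Since the lab may run without internet access, it is worth checking that every dataset really landed in HDFS. The sketch below uses `hadoop fs -test -e` on the target paths from the upload commands above; the function name `check_hdfs_paths` is hypothetical.

```shell
# Report, for each HDFS path given, whether it exists.
# Returns non-zero if any path is missing.
check_hdfs_paths() {
    missing=0
    for path in "$@"; do
        if hadoop fs -test -e "$path"; then
            echo "OK       $path"
        else
            echo "MISSING  $path"
            missing=1
        fi
    done
    return "$missing"
}

# Paths taken from the upload commands above.
if command -v hadoop >/dev/null 2>&1; then
    check_hdfs_paths \
        /user/zeppelin/crime/philadelphia-crime-data-2015-ytd.csv \
        /user/zeppelin/movielens/movies.dat \
        /user/zeppelin/movielens/ratings.dat.gz \
        /user/zeppelin/SensorDemo/HVAC.csv \
        || echo "some files are missing; re-run the matching upload step"
else
    echo "hadoop is not on PATH; run this on the sandbox"
fi
```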
Do the next step on your local machine, NOT on the HDP sandbox.
Clone my repo to pull the Zeppelin notebooks:
git clone
4) Optional: It is recommended to increase the YARN memory per node for the Spark jobs.
Log in to the sandbox Ambari UI.
Click YARN -> Configs and increase the YARN memory per node to 5120 MB.
Then restart all components that show the restart symbol.
Also stop any unused components, such as Atlas, Flume, and Ranger, to conserve memory on your HDP sandbox.
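After the restart, the new memory setting can be double-checked from the sandbox shell. The sketch below reads a property out of a Hadoop-style XML config file. It rests on two assumptions not stated in this guide: that the "memory per node" setting corresponds to `yarn.nodemanager.resource.memory-mb`, and that the client config lives at `/etc/hadoop/conf/yarn-site.xml`. The function name `get_hadoop_property` is hypothetical.

```shell
# Print the <value> of a named property from a Hadoop-style XML config file.
# Usage: get_hadoop_property <config-file> <property-name>
get_hadoop_property() {
    awk -v name="$2" '
        # Remember when the line holding the wanted <name> has been seen
        index($0, "<name>" name "</name>") { found = 1 }
        # Print the first <value> on or after that line, then stop
        found && match($0, /<value>[^<]*<\/value>/) {
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Both the path and the property name are assumptions; adjust as needed.
conf=/etc/hadoop/conf/yarn-site.xml
prop=yarn.nodemanager.resource.memory-mb
if [ -r "$conf" ]; then
    echo "$prop = $(get_hadoop_property "$conf" "$prop")"
else
    echo "$conf not found; run this on the sandbox"
fi
```

If the printed value is still the old one, the affected components have probably not been restarted yet.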
- Open a new tab in your browser and navigate to:
Then click import notebook.
If you cloned this GitHub repo, you can point to the file on your local filesystem: phillyCrimeAnalysis.json. Otherwise, you can point to this URL:
Do the same for the second notebook: