Prerequisites, in case we don't have internet connectivity at the location.
- Do the initial setup of the HDP sandbox. The initial root password is hadoop, but you are required to change it.
https://github.com/HortonworksUniversity/Essentials/blob/master/demos/SandboxSetup.md
ssh [email protected] -p 2222
[email protected]'s password:
You are required to change your password immediately (root enforced)
Last login: Tue Mar 1 21:05:47 2016 from 10.0.2.2
Changing password for root.
(current) UNIX password:
New password:
Retype new password:
[root@sandbox ~]# ambari-admin-password-reset
Please set the password for admin:
Please retype the password for admin:
The admin password has been set.
Restarting ambari-server to make the password change effective...
Using python /usr/bin/python2
Restarting ambari-server
Using python /usr/bin/python2
Stopping ambari-server
Ambari Server stopped
Using python /usr/bin/python2
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.
It is suggested that you set the Ambari admin user's password to admin, for consistency with the other web UIs.
Install numpy, as we will use it in PySpark:
sudo yum install -y numpy
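Optional: a quick sanity check, assuming the sandbox's default python is the interpreter PySpark picks up:
# Confirm numpy is importable and print its version
python -c "import numpy; print(numpy.__version__)"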
- For the meetup lab, you will need to download the datasets. Switch from root to the zeppelin user:
sudo su - zeppelin
#################################################
# FOR PHILLY CRIME DATA Analysis
#################################################
wget https://raw.githubusercontent.com/zeltovhorton/sparkworkshop/master/data/philadelphia-crime-data-2015-ytd.csv
ls
pwd
hadoop fs -mkdir /user/zeppelin/crime
hadoop fs -put philadelphia-crime-data-2015-ytd.csv /user/zeppelin/crime/
hadoop fs -ls /user/zeppelin
hadoop fs -ls /user/zeppelin/crime
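Optional: eyeball the download before (or after) pushing it to HDFS; the header line and row count tell you whether the CSV came down intact:
# Inspect the CSV header and count rows
head -1 philadelphia-crime-data-2015-ytd.csv
wc -l philadelphia-crime-data-2015-ytd.csv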
#################################################
# FOR RECOMMENDATION ENGINE example
#################################################
dataset="http://files.grouplens.org/datasets/movielens/ml-1m.zip"
#dataset="http://files.grouplens.org/datasets/movielens/ml-10m.zip"
#dataset="http://files.grouplens.org/datasets/movielens/ml-20m.zip"
#################################################
# Get the dataset
#################################################
echo "" rm -rf movielens hadoop fs -rm -r -f /user/zeppelin/movielens/
mkdir movielens
cd movielens
wget $dataset -o /dev/null -O ml.zip
unzip ml.zip
rm ml.zip
mv ml* ml
gzip ml/ratings.dat
hadoop fs -mkdir /user/zeppelin/movielens
hadoop fs -put ml/movies.dat /user/zeppelin/movielens/movies.dat
hadoop fs -put ml/ratings.dat.gz /user/zeppelin/movielens/ratings.dat.gz
echo "Files in HDFS:/user/zeppelin/movielens" hadoop fs -ls /user/zeppelin/movielens
#################################################
# FOR SENSOR DATA example
#################################################
wget http://s3.amazonaws.com/hw-sandbox/tutorial14/SensorFiles.zip
unzip SensorFiles.zip
hadoop fs -mkdir -p /user/zeppelin/SensorDemo
hadoop fs -copyFromLocal -f SensorFiles/HVAC.csv /user/zeppelin/SensorDemo/
hadoop fs -tail /user/zeppelin/SensorDemo/HVAC.csv
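Optional: compare line counts locally and in HDFS to confirm the copy is intact:
# Local vs. HDFS row count for HVAC.csv
wc -l SensorFiles/HVAC.csv
hadoop fs -cat /user/zeppelin/SensorDemo/HVAC.csv | wc -l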
- On your local machine, NOT the HDP sandbox:
Clone my repo to pull the Zeppelin notebooks:
git clone https://github.com/zeltovhorton/njmeetup_spark.git
- Optional: it is recommended to increase the YARN memory per node for the Spark jobs.
Log in to the sandbox Ambari UI.
Click YARN -> Configs and increase the YARN memory per node to 5120 MB.
Then restart all components that show the restart symbol.
Also stop unused components such as Atlas, Flume, and Ranger to conserve memory on your HDP sandbox; a command-line alternative is sketched below.
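If you would rather script these steps than click through the UI, a rough sketch follows; it assumes the sandbox's default cluster name (Sandbox), Ambari listening on localhost:8080, and admin/admin credentials, so adjust to your setup:
# Set the YARN NodeManager memory with Ambari's bundled configs.sh helper
cd /var/lib/ambari-server/resources/scripts
./configs.sh set localhost Sandbox yarn-site "yarn.nodemanager.resource.memory-mb" "5120"
# Stop an unused service (e.g. FLUME) via the Ambari REST API; repeat for ATLAS, RANGER, etc.
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://localhost:8080/api/v1/clusters/Sandbox/services/FLUME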
- Open a new tab in your browser and navigate to the Zeppelin UI.
Then click import notebook.
If you cloned the GitHub repo, you can point to phillyCrimeAnalysis.json on your local filesystem; otherwise you can point to this URL:
https://raw.githubusercontent.com/zeltovhorton/njmeetup_spark/master/phillyCrimeAnalysis.json
Do the same for the 2nd notebook:
https://raw.githubusercontent.com/zeltovhorton/njmeetup_spark/master/recommendation_engine_als.json
Written with StackEdit.