Setting up a Hadoop ecosystem for learning, development, and testing can be time-consuming. Cloudera offers a guide for installing a proof-of-concept version of its Hadoop platform, Cloudera Distribution for Hadoop (CDH). This repository automates the configuration of a cluster on a local machine using VirtualBox and Vagrant. Note: The guide explains how to install CDH Express Edition, which is suitable for learning and testing but not for production or commercial use.
This repository was inspired by ambari-vagrant, and most of the configuration is taken from there.
- A computer with at least 16GB of RAM and 8 CPU threads.
- This works with macOS, Linux, and Windows 10.
- For Windows 10:
  - Execute the scripts in the Windows Subsystem for Linux (WSL).
  - Install the same version of Vagrant in Windows and WSL.
  - VirtualBox must be installed only on Windows 10.
  - Add the following to your `.bashrc`: `export VAGRANT_WSL_ENABLE_WINDOWS_ACCESS="1"`
  - Clone the repository to `C:/cdh-vagrant/`, which is accessible from `/mnt/c/cdh-vagrant/` in WSL.
  - For more details, check Vagrant and Windows Subsystem for Linux.
- Ideally 4 VMs with 8GB of RAM and 4 cores should work fine if you want to use more than the Essentials
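For the WSL setup described above, the `.bashrc` change can be applied so that running the setup twice does not add the line twice. The following is a minimal sketch; the helper name `ensure_line` is hypothetical and not part of this repository.

```shell
# Append a line to a file only if that exact line is not already present.
# ensure_line is a hypothetical helper, not something shipped with cdh-vagrant.
ensure_line() {
  file=$1
  line=$2
  touch "$file"
  # If the file is non-empty and does not end with a newline, add one first
  # so the new line is not glued to the previous one.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  # -x matches whole lines only, -F treats the text literally (no regex).
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (commented out so that pasting this block changes nothing):
# ensure_line "$HOME/.bashrc" 'export VAGRANT_WSL_ENABLE_WINDOWS_ACCESS="1"'
```

Open a new WSL shell (or source `.bashrc`) afterwards so the variable is visible to Vagrant.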
- Clone this repository using `git clone https://github.com/frederickayala/cdh-vagrant.git`
  - If you are using Windows, the repository must be cloned to a Windows drive path (e.g., `C:/cdh-vagrant` or `/mnt/c/cdh-vagrant`).
- Go to the repository folder using `cd cdh-vagrant`
- Append the lines in `append-to-hosts-file` to your hosts file.
  - For macOS, Linux, and WSL the file is located in `/etc/hosts`
  - For Windows the file is located in `C:\Windows\System32\Drivers\etc\hosts`
  - Windows users must append the lines in both WSL and Windows.
- Check the HARDWARE NOTE in the Vagrantfile to verify that the settings match your computer's capabilities.
- Run the command `bash up.sh 3` to start three VMs.
  - You can add up to 8 hosts.
  - If this is the first time you are running Vagrant, run `vagrant init`
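Before continuing, it can help to confirm that every line from `append-to-hosts-file` actually made it into the hosts file. The sketch below assumes the entries were copied verbatim, so it compares whole lines; `missing_hosts_entries` is a hypothetical helper, not part of this repository.

```shell
# Print every non-blank, non-comment line of ENTRIES_FILE that does not
# appear verbatim in HOSTS_FILE. Assumption: entries were appended unchanged.
missing_hosts_entries() {
  entries_file=$1
  hosts_file=$2
  # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    grep -qxF -- "$line" "$hosts_file" || printf '%s\n' "$line"
  done < "$entries_file"
}

# Example (run from the repository folder):
# missing_hosts_entries append-to-hosts-file /etc/hosts
```

No output means all entries are present. On Windows, the same check can be run from WSL against both `/etc/hosts` and `/mnt/c/Windows/System32/Drivers/etc/hosts`.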
- SSH to the node with `vagrant ssh c7401` (the password is `vagrant`).
- Switch to root using `sudo su -`
- Download the Cloudera Manager installer: `wget https://archive.cloudera.com/cm6/6.2.0/cloudera-manager-installer.bin`
- Change the permissions: `chmod u+x cloudera-manager-installer.bin`
- Run the Cloudera Manager Installer: `./cloudera-manager-installer.bin`
- Accept the terms and conditions
- When the installation is finished, you will see a message telling you to open your web browser to `http://c7401.cdh.testlab:7180/`
- Open your browser and go to `http://c7401.cdh.testlab:7180/`. If it does not work, check your hosts file or wait a couple of minutes for the process to start.
  - The default username and password are both `admin`
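Because Cloudera Manager may need a couple of minutes before the web UI answers on port 7180, a small retry loop can stand in for refreshing the browser by hand. This is only a sketch: `wait_until` is a hypothetical helper, and the commented example assumes `curl` is installed on the machine where you run it.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt is still coming.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (not executed here): poll the UI every 10 seconds, up to 30 times.
# wait_until 30 10 curl -fsS -o /dev/null http://c7401.cdh.testlab:7180/
```

If the loop gives up, the hosts file is the first thing to re-check, as noted above.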
- Click `Continue` and accept the terms and conditions.
- Select `Cloudera Express`, click `Continue`, and name your cluster `testlab`
- In the `Specify Hosts` section, enter `c74[01-03].cdh.testlab` and click `Search`. If you are using more than 3 VMs, modify the pattern accordingly. A list of the three hosts should appear saying `Host was successfully scanned.` Click `Continue`.
- Leave the `Select Repository` section as it is. Click `Continue`
- Select `Install Oracle Java SE Development Kit (JDK 8)` and click `Continue`
- In the `Enter Login Credentials` section:
  - In Authentication Method select: `All hosts accept same private key`
  - Click `File Upload`, Choose File, and select the `insecure_private_key` file from the repository folder.
  - Click `Continue`
- In the `Install Agents` section, wait until the three agents say `Installation completed successfully` and click `Continue`
- In the `Install Parcels` section, wait for the selected parcels to download, distribute, unpack, and activate.
- In the `Inspect Cluster` section, click `Inspect Network Performance` and `Inspect Hosts`. After both have the green mark, select `I understand the risks, let me continue with cluster creation.` and click `Continue`.
- In the `Select services` section you can pick the services that you want.
  - For example, click custom services and select: HBase, HDFS, Hue, Spark, YARN (MR2 Included), Oozie, and ZooKeeper.
  - Note that 3 VMs provide very little computing power for such services, so choose only what you want to try.
- In the `Assign Roles` section you might need to select additional hosts for the services if there are warnings. Click `Continue`
- In the `Setup Database` section, ignore the warning about the embedded database, write down the passwords, click `Test Connection`, and then click `Continue`
- In `Review Changes` you might need to add additional configuration if there are warnings. Do so and click `Continue`.
- Note: In `First Run Command` things might fail because of timeouts, since too much is running on the same machine. It is usually enough to click `Resume`
  - It will take some time for this step to finish, and you might also be logged out of Cloudera Manager. If this happens, refresh the window and log in again using the `admin` username and password.
  - Again, if things fail, it is usually enough to click `Resume`
- Eventually you will see `Status Finished`. Click on the `Cloudera Manager` logo or go to `http://c7401.cdh.testlab:7180/cmf/home`
- That's it, you can start using the services.
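If you start more than three VMs, the host pattern entered in the `Specify Hosts` step has to change with the node count. The sketch below derives the pattern and the individual hostnames from that count. It assumes the nodes are named `c7401`, `c7402`, and so on, as the hostnames used in this guide suggest; `host_pattern` and `host_list` are hypothetical helpers, not part of this repository.

```shell
# Print the bracket pattern for the first N nodes.
# Example: host_pattern 3 prints c74[01-03].cdh.testlab
host_pattern() {
  printf 'c74[01-%02d].cdh.testlab\n' "$1"
}

# Print one fully qualified hostname per line for the first N nodes.
# Assumption: node numbering starts at 01 and has no gaps.
host_list() {
  n=1
  while [ "$n" -le "$1" ]; do
    printf 'c74%02d.cdh.testlab\n' "$n"
    n=$((n + 1))
  done
}

# Example for a 5-node cluster (commented out; uncomment to try):
# host_pattern 5
# host_list 5
```

The output of `host_list` can also be compared against the hosts file to confirm that each node has an entry.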
- SSH to a node:
vagrant ssh
- Suspend the cluster:
vagrant suspend
- Resume the cluster:
vagrant resume
- To erase all the VMs:
vagrant destroy -f
- Check the status of the VMs:
vagrant status
- We will add exercises for the most common tools. Coming soon!