Prerequisites:
- Ansible configured
- Hadoop software
- Java JDK software
Use the links above to download the packages.
Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain. It automates everything from code deployment to network configuration to cloud management, in a language that approaches plain English, using SSH, with no agents to install on remote systems. It covers application deployment, configuration management, and continuous delivery.
Ansible is an open-source software provisioning, configuration management, and application-deployment tool that enables infrastructure as code. It runs on many Unix-like systems and can configure both Unix-like systems and Microsoft Windows.
Playbooks are the files where Ansible code is written. They are one of the core features of Ansible and tell Ansible what to execute. A playbook is like a to-do list: it contains the tasks that the user wants to run on a particular machine.
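As a minimal sketch of what such a to-do list looks like, the playbook below runs three tasks on every host in the inventory. The task names, the `uptime` command, and the `uptime_result` variable are illustrative assumptions and are not part of the Hadoop setup described in this post.

```yaml
# Minimal illustrative playbook; "all" targets every host in the inventory.
- hosts: all
  tasks:
    # ping only checks that Ansible can connect and run Python on the host
    - name: Check that Ansible can reach the host
      ansible.builtin.ping:

    # register stores the module's return data in a variable
    - name: Run a command and keep its result
      ansible.builtin.command: uptime
      register: uptime_result
      changed_when: false

    # debug prints the captured standard output
    - name: Show the registered result
      ansible.builtin.debug:
        var: uptime_result.stdout
```

Saved under a hypothetical name such as `example.yml`, it would be run with `ansible-playbook example.yml`.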
Ansible modules are reusable, standalone scripts that can be used by the Ansible API or by the ansible and ansible-playbook programs. They return information to Ansible by printing a JSON string to stdout before exiting, and they take arguments in one of several ways.
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept, but it does not store the data of these files itself. The NameNode is a single point of failure for the HDFS cluster.
A DataNode stores data in a Hadoop cluster; DataNode is also the name of the daemon that manages that data. File data is replicated on multiple DataNodes for reliability and so that computation can run close to the data.
Setting up an HDFS cluster by hand is time-consuming and involves repetitive steps. To avoid that, we automate the setup with Ansible. It involves these steps:
At name-node
1. Download the packages (JDK and Hadoop)
2. Install the packages (JDK and Hadoop)
3. Configure the hdfs-site.xml file
4. Configure the core-site.xml file
5. Create a directory
6. Format that directory
7. Start the namenode service
At data-node
1. Download the packages (JDK and Hadoop)
2. Install the packages (JDK and Hadoop)
3. Configure the hdfs-site.xml file
4. Configure the core-site.xml file
5. Create a directory
6. Start the datanode service
7. Check the report of the cluster
variables.yml:
start_datanode: "hadoop-daemon.sh start datanode"
hdfs_loc: "/etc/hadoop/hdfs-site.xml"
hdfs_datanode: "/root/hadoop_ansible/hdfs1.xml"
coresite_location: "/etc/hadoop/core-site.xml"
coresite_file: "/root/hadoop_ansible/core-site.xml.j2"
hadoop_install: "rpm -ivh /root/hadoop-1.2.1-1.x86_64.rpm --force"
jdk_install: "rpm -ivh /root/jdk.rpm"
packages: "/root/hadoop_ansible/packages/*"
start_namenode: "hadoop-daemon.sh start namenode"
format_namenode: "yes Y | hadoop namenode -format"
hdfs_namenode: "/root/hadoop_ansible/hdfs.xml"
directory_namenode: "/name_node"
directory_datanode: "/data_node"
# used by the "report of cluster" task in the data_node play
hadoop_report: "hadoop dfsadmin -report"
playbook.yml:
- hosts: name_node
  vars_files:
    - variables.yml
  tasks:
    # copy the JDK and Hadoop packages from the controller to the node
    - name: Copying the necessary softwares
      copy:
        src: "{{ item }}"
        dest: /root/
      with_fileglob:
        - /root/hadoop_ansible/packages/*
    - name: Creating directory for name_node to store metadata
      file:
        path: "{{ directory_namenode }}"
        state: directory
        mode: '0755'
    # rpm exits non-zero when the package is already installed, hence ignore_errors
    - name: Installing the java software
      shell: "{{ jdk_install }}"
      register: result
      ignore_errors: yes
    - name: Installing hadoop software
      shell: "{{ hadoop_install }}"
      register: result1
      ignore_errors: yes
    - name: Status check
      debug:
        msg:
          - "{{ result }}"
          - "{{ result1 }}"
    # core-site.xml is a Jinja2 template, so it is rendered rather than copied
    - name: Copying the coresite file
      template:
        src: "{{ coresite_file }}"
        dest: "{{ coresite_location }}"
    - name: Copying hdfs-site.xml file
      copy:
        src: "{{ hdfs_namenode }}"
        dest: "{{ hdfs_loc }}"
    - name: disabling the firewalld
      service:
        name: firewalld
        state: stopped
        enabled: False
    - name: Format the folder
      shell: "{{ format_namenode }}"
      register: format_status
      ignore_errors: yes
    - name: start the namenode
      shell: "{{ start_namenode }}"
      register: namenode_status
      ignore_errors: yes
- hosts: data_node
  vars_files:
    - variables.yml
  tasks:
    # copy the JDK and Hadoop packages from the controller to the node
    - name: Copying the necessary softwares
      copy:
        src: "{{ item }}"
        dest: /root/
      with_fileglob:
        - "{{ packages }}"
    - name: creating directory for data_node to share the storage
      file:
        path: "{{ directory_datanode }}"
        state: directory
        mode: '0755'
    # rpm exits non-zero when the package is already installed, hence ignore_errors
    - name: Installing java
      shell: "{{ jdk_install }}"
      register: result2
      ignore_errors: yes
    - name: Installing hadoop software
      shell: "{{ hadoop_install }}"
      register: result3
      ignore_errors: yes
    - name: testing
      debug:
        msg:
          - "{{ result2 }}"
          - "{{ result3 }}"
    # core-site.xml is a Jinja2 template, so it is rendered rather than copied
    - name: Copying the coresite file
      template:
        src: "{{ coresite_file }}"
        dest: "{{ coresite_location }}"
    - name: Copying hdfs-site.xml
      copy:
        src: "{{ hdfs_datanode }}"
        dest: "{{ hdfs_loc }}"
    - name: disabling the firewalld
      service:
        name: firewalld
        state: stopped
        enabled: False
    - name: starting data-node
      shell: "{{ start_datanode }}"
      register: datanode_status
      ignore_errors: yes
    - name: report of cluster
      shell: "{{ hadoop_report }}"
      register: report_status
      ignore_errors: yes
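One thing to keep in mind with the playbook above is that the format step runs on every execution, and formatting a NameNode that already holds metadata wipes it. A possible refinement, shown here only as a sketch, is to guard the task with the `creates` argument of the shell module so that it is skipped once the directory has been formatted. It assumes that the hdfs-site.xml deployed to the name node points `dfs.name.dir` at `{{ directory_namenode }}`, in which case a formatted NameNode has a `current/VERSION` file under that directory.

```yaml
# Sketch only: assumes dfs.name.dir is set to {{ directory_namenode }},
# so a formatted NameNode has a current/VERSION file in that directory.
- hosts: name_node
  vars_files:
    - variables.yml
  tasks:
    - name: Format the NameNode directory only if it is not formatted yet
      ansible.builtin.shell: "{{ format_namenode }}"
      args:
        # the task is skipped when this file already exists
        creates: "{{ directory_namenode }}/current/VERSION"
```

With this guard the playbook can be re-run after a configuration change without losing the existing file system metadata.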
core-site.xml.j2:
{% for i in groups['name_node'] %}
<value>hdfs://{{ i }}:9001</value>
{% endfor %}
hosts.txt:
[name_node]
<ip_address> ansible_user=root ansible_ssh_pass=<password> ansible_connection=ssh
[data_node]
<ip_address> ansible_user=root ansible_ssh_pass=<password> ansible_connection=ssh
Run the playbook:
ansible-playbook playbook.yml
Ansible then applies all the changes to the systems listed in the hosts.txt file.
That's it, our cluster has been configured. We can verify it in the following ways:
1. Running jps and hadoop
2. Running hadoop dfsadmin -report
3. Going to the web UI at http://<ip_of_namenode>:50070
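The same checks could also be run from Ansible instead of by hand. The play below is a rough sketch under some assumptions: `jps` and `hadoop` are on the PATH of the remote user, the web UI is reachable from the name node itself, and the registered variable names (`jps_output`, `dfs_report`) are illustrative.

```yaml
# Sketch only: task names and registered variable names are illustrative.
- hosts: name_node
  tasks:
    - name: List running Java daemons (NameNode should appear)
      ansible.builtin.command: jps
      register: jps_output
      changed_when: false

    - name: Ask HDFS for the cluster report
      ansible.builtin.command: hadoop dfsadmin -report
      register: dfs_report
      changed_when: false

    # the task fails unless the web UI answers with HTTP 200
    - name: Check that the NameNode web UI answers on port 50070
      ansible.builtin.uri:
        url: "http://{{ inventory_hostname }}:50070"
        status_code: 200

    - name: Show the command output
      ansible.builtin.debug:
        msg:
          - "{{ jps_output.stdout_lines }}"
          - "{{ dfs_report.stdout_lines }}"
```

Setting `changed_when: false` keeps these read-only checks from being reported as changes in the play recap.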
Thanks for reading..... :)