Administration Lab 6: Multi-node Cluster Setup
Create Additional VM Image
Repeat the steps from Lab 1 to create a new virtual machine and install the JDK and Hadoop
Configure Bridged Adapter
- Power off both Virtual Machines (Machine => ACPI Shutdown)
- Select each VM and click Settings => Network
- In the “Attached To” field, select “Bridged Adapter”
- In the “Name” field, select the network adapter (wired or wireless) that your host operating system uses to connect to the Internet
- Under Settings => System => Boot Order, uncheck all devices except “Hard Disk”
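If you prefer the command line, the same settings can be applied from the host with VBoxManage while the VMs are powered off. This is a sketch that assumes the "Hard Disk" step refers to the VM's boot order; `configure_bridged_vm` is a hypothetical helper name, and the VM and adapter names you pass in are your own.

```shell
# Hypothetical helper: apply the bridged-network and boot-order settings
# from the GUI steps above to one powered-off VM.
# Usage: configure_bridged_vm <vm-name> <host-adapter>
configure_bridged_vm() {
  local vm="$1" host_nic="$2"
  if [ -z "$vm" ] || [ -z "$host_nic" ]; then
    echo "usage: configure_bridged_vm <vm-name> <host-adapter>" >&2
    return 1
  fi
  # Attach network adapter 1 to the host's physical NIC (Bridged Adapter)
  VBoxManage modifyvm "$vm" --nic1 bridged --bridgeadapter1 "$host_nic" || return 1
  # Boot from the hard disk only; all other boot slots are disabled
  VBoxManage modifyvm "$vm" --boot1 disk --boot2 none --boot3 none --boot4 none
}
```

`VBoxManage list bridgedifs` prints the adapter names the host offers, and `VBoxManage controlvm <vm-name> acpipowerbutton` performs the ACPI shutdown from the command line. Run the helper once per VM.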
Find Out Master & Slave IPs
- Run ifconfig on the master and on the slave
- Write down the IP address of each host
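ifconfig prints one block per interface; the address you need is on the `inet` line of the bridged interface (typically `eth0`), not the loopback `127.0.0.1`. A small filter, sketched under the assumption that the output uses either the older `inet addr:` layout or the newer `inet` layout, prints only the non-loopback IPv4 addresses (`extract_ipv4` is a hypothetical name):

```shell
# Hypothetical filter: read ifconfig output on stdin and print every
# non-loopback IPv4 address, one per line.
# Handles "inet addr:192.168.1.10" (older net-tools) and
# "inet 192.168.1.10" (newer net-tools); inet6 lines do not match.
extract_ipv4() {
  sed -n 's/.*inet \(addr:\)\{0,1\}\([0-9.]*\).*/\2/p' | grep -v '^127\.'
}
```

Usage on each node: `ifconfig | extract_ipv4`.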
Execute Installation from Lab 1
- On the slave node, skip the following steps:
- sudo apt-get install hadoop-0.20-namenode
- sudo apt-get install hadoop-0.20-jobtracker
- The NameNode and JobTracker run only on the master, so they are not needed on the slave node
Configuration For Both Nodes
- /etc/hadoop-0.20/conf/mapred-site.xml: set mapred.job.tracker to the master's hostname and JobTracker port
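For a two-node cluster the key setting in mapred-site.xml is `mapred.job.tracker`, which must name the master instead of `localhost` so that the slave's TaskTracker can reach the JobTracker. The sketch below writes a minimal version of the file to the current directory. The hostname `master` is hypothetical (it must match your /etc/hosts entries), and port 8021 is an assumption based on the CDH3 pseudo-distributed default, so keep whatever port your Lab 1 configuration used. If your existing file has other properties, edit the value in place rather than overwriting the file. A multi-node setup normally also needs `fs.default.name` in core-site.xml to name the master (for example `hdfs://master:8020`) so the slave's DataNode can find the NameNode; check whether your configuration already covers that.

```shell
# Write a minimal mapred-site.xml that points every TaskTracker at the
# JobTracker on the master node. Hostname "master" and port 8021 are assumptions.
cat > mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:8021</value>
  </property>
</configuration>
EOF
```

Install it on both nodes with `sudo cp mapred-site.xml /etc/hadoop-0.20/conf/mapred-site.xml`.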
Provisioning IPs
- /etc/hosts
- Add entries for the master and slave nodes so that each host can reach the other by hostname
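Each entry maps a hostname to the IP address found with ifconfig. A sketch, with hypothetical hostnames `master` and `slave` and placeholder addresses that you replace with your own:

```shell
# Placeholder addresses: replace with the IPs written down from ifconfig.
MASTER_IP="192.168.1.10"
SLAVE_IP="192.168.1.11"

# Build the two lines to add to /etc/hosts on BOTH nodes.
# Hostnames "master" and "slave" are hypothetical; use the names
# your Hadoop configuration files refer to.
cat > hosts.cluster <<EOF
$MASTER_IP master
$SLAVE_IP slave
EOF
```

Append the lines on each machine with `sudo sh -c 'cat hosts.cluster >> /etc/hosts'`. Ubuntu also maps the machine's own hostname to `127.0.1.1`; if that line names `master`, the daemons may bind to the loopback address and the slave cannot connect, so it is common to comment that line out.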
Reformat NameNode and Delete Data from Data Directories
- sudo su hdfs
- Execute hadoop-0.20 namenode -format (master only)
- Delete data from the data directories on both machines:
- /var/lib/hadoop-0.20/cache/hdfs/dfs/data
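The data directories have to be emptied because formatting gives the NameNode a new namespaceID; a DataNode whose directory still carries the old ID refuses to start and logs an "Incompatible namespaceIDs" error. Stop the Hadoop daemons before formatting and cleaning. A sketch of the cleanup as a shell function (the name `clear_datanode_dir` is hypothetical) that defaults to the path above:

```shell
# Hypothetical helper: empty a DataNode data directory without removing
# the directory itself. Defaults to the CDH3 path used in this lab.
clear_datanode_dir() {
  local dir="${1:-/var/lib/hadoop-0.20/cache/hdfs/dfs/data}"
  # ${dir:?} aborts if dir is somehow empty, so this never becomes rm -rf /*
  rm -rf "${dir:?}"/*
}
```

Define and run `clear_datanode_dir` inside the `sudo su hdfs` shell on each machine, since the directory belongs to the hdfs user.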
Start Distributed Cluster
- Start the master node as in Lab 1
- On the slave node, start only the TaskTracker and DataNode:
- sudo /etc/init.d/hadoop-0.20-tasktracker start
- sudo /etc/init.d/hadoop-0.20-datanode start
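Both halves of this step can be wrapped in one helper that runs the right init scripts for each role. This is a sketch: `start_node` is a hypothetical name, and the master's daemon list assumes Lab 1 ran the NameNode, DataNode, JobTracker and TaskTracker on that machine (add secondarynamenode if you installed it). HDFS daemons are started before the MapReduce daemons because the JobTracker keeps its system directory in HDFS.

```shell
# Hypothetical helper: start the Hadoop 0.20 daemons for one role.
# Usage: start_node master|slave
start_node() {
  local daemons d
  case "$1" in
    master) daemons="namenode datanode jobtracker tasktracker" ;;
    slave)  daemons="datanode tasktracker" ;;  # no NameNode/JobTracker on the slave
    *)
      echo "usage: start_node master|slave" >&2
      return 1
      ;;
  esac
  # HDFS daemons first, then MapReduce daemons
  for d in $daemons; do
    sudo /etc/init.d/hadoop-0.20-"$d" start || return 1
  done
}
```

Run `start_node master` on the master first, then `start_node slave` on the slave.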
Verification
- Check the web interfaces (NameNode on port 50070, JobTracker on port 50030 of the master) to verify that the second node is connected
- Run the sleep example to make sure that MapReduce is working properly
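The same checks can be made from a terminal on the master. In this sketch, `verify_cluster` is a hypothetical name, the examples jar path is an assumption based on the CDH3 package layout (look for `*examples*.jar` under /usr/lib/hadoop-0.20 if it differs), and the sleep parameters are arbitrary: four map and two reduce tasks that each sleep for one second, so that tasks can be scheduled on both TaskTrackers.

```shell
# Hypothetical helper: command-line checks for the two-node cluster.
verify_cluster() {
  # Should report 2 available DataNodes once the slave has registered
  # (exact wording of the report can vary between versions)
  sudo -u hdfs hadoop-0.20 dfsadmin -report | grep 'Datanodes available' || return 1
  # SleepJob from the examples jar: -m/-r set the map/reduce task counts,
  # -mt/-rt set each task's sleep time in milliseconds (jar path is an assumption)
  hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-examples.jar sleep -m 4 -r 2 -mt 1000 -rt 1000
}
```

While the job runs, the JobTracker page should show tasks running on both hosts.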