Tuesday, December 27, 2011

Multi-Node Cluster Setup

Administration Lab 6: Multi-node Cluster Setup

Create Additional VM Image

Repeat the steps outlined in Lab 1 to create a new virtual machine and install the JDK and Hadoop.


Configure Bridged Adapter
  1. Power off both Virtual Machines (Machine => ACPI Shutdown)
  2. Highlight each VM and click on Settings => Network
  3. In the “Attached To” field, select “Bridged Adapter”
  4. In the “Name” field, select the network adapter (wired or wireless) that your primary operating system uses to connect to the Internet
  5. Uncheck all devices except “Hard Disk”
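
The network change can also be scripted instead of clicked through. The sketch below assumes the VMs run under VirtualBox (the menu names above suggest it) and uses its VBoxManage command-line tool. The VM names and the host adapter name eth0 are placeholders, not values from this lab.

```shell
# Attach the first network card of a powered-off VM to a bridged adapter.
# $1 = VM name, $2 = host network adapter to bridge to (both placeholders).
set_bridged() {
  vm="$1"
  host_adapter="$2"
  VBoxManage modifyvm "$vm" --nic1 bridged --bridgeadapter1 "$host_adapter"
}

# Example (run on the host while the VM is powered off):
#   set_bridged "hadoop-master" eth0
#   set_bridged "hadoop-slave" eth0
```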


Find out Master & Slave IPs
  1. Run ifconfig on master and slave
  2. Write down IPs for both hosts
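
To avoid reading the address out of the full ifconfig listing by eye, a small filter can help. This is a minimal sketch; it assumes the output marks IPv4 addresses with either "inet addr:" (older net-tools) or "inet " (newer versions), and the function name extract_ips is made up for illustration.

```shell
# Print the IPv4 addresses found in ifconfig-style output read from stdin,
# skipping loopback addresses (127.x.x.x).
extract_ips() {
  grep -Eo 'inet (addr:)?[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
    | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
    | grep -v '^127\.'
}

# Usage on each VM:
#   ifconfig | extract_ips
```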
Execute Installation from Lab 1
  1. On the slave node, follow the Lab 1 installation but skip the following steps:
    1. sudo apt-get install hadoop-0.20-namenode
    2. sudo apt-get install hadoop-0.20-jobtracker
  2. These packages are not needed on the slave node
Configuration For Both Nodes
  1. Edit /etc/hadoop-0.20/conf/mapred-site.xml on both nodes
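
The lab names the file but not the change. In Hadoop 0.20, the property that tells TaskTrackers where the JobTracker runs is mapred.job.tracker, so a plausible edit is sketched below. The hostname master, the port 8021, and the helper name write_mapred_site are assumptions; use whatever your Lab 1 configuration and /etc/hosts entries define.

```shell
# Write a minimal mapred-site.xml that points at the JobTracker.
# $1 = output file, $2 = JobTracker host (assumed), $3 = port (assumed).
write_mapred_site() {
  out_file="$1"
  jt_host="${2:-master}"
  jt_port="${3:-8021}"
  cat > "$out_file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${jt_host}:${jt_port}</value>
  </property>
</configuration>
EOF
}

# Example: generate the file locally, review it, then copy it into
# /etc/hadoop-0.20/conf/ on both nodes with sudo.
#   write_mapred_site ./mapred-site.xml master 8021
```

Generating the file in a scratch location first keeps an existing configuration from being overwritten before it has been reviewed.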
Provisioning IPs
  1. Edit /etc/hosts
  2. Add entries for the master and slave nodes to ensure proper network communication
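
As an illustration of what those entries look like, the sketch below prints one line per node. The hostnames master and slave, the helper name hosts_entries, and the addresses are placeholders; substitute the IPs noted from ifconfig.

```shell
# Print /etc/hosts entries for the two nodes.
# $1 = master IP, $2 = slave IP. Hostnames are illustrative.
hosts_entries() {
  master_ip="$1"
  slave_ip="$2"
  printf '%s\tmaster\n%s\tslave\n' "$master_ip" "$slave_ip"
}

# Example with placeholder addresses:
hosts_entries 192.168.1.10 192.168.1.11

# To apply on a node, append the output to /etc/hosts:
#   hosts_entries 192.168.1.10 192.168.1.11 | sudo tee -a /etc/hosts
```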
Reformat namenode and delete data from data directory
  1. sudo su hdfs
  2. Execute hadoop-0.20 namenode -format
  3. Delete data from the data directory on both machines:
    1. /var/lib/hadoop-0.20/cache/hdfs/dfs/data
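
The old block data has to go because formatting gives the namenode a new namespace ID, and a datanode that still holds data from the previous namespace will refuse to register. A minimal sketch of the cleanup follows; the helper name clear_datanode_dir is made up, and it empties the directory rather than removing it so that ownership and permissions stay intact.

```shell
# Remove everything inside a datanode data directory but keep the
# directory itself.
# $1 = data directory; defaults to the path used in this lab.
clear_datanode_dir() {
  data_dir="${1:-/var/lib/hadoop-0.20/cache/hdfs/dfs/data}"
  [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }
  find "$data_dir" -mindepth 1 -delete
}

# On the master, as the hdfs user:
#   hadoop-0.20 namenode -format
# On both machines, as a user allowed to modify the directory:
#   clear_datanode_dir
```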
Start Distributed Cluster
  1. Start the master node normally
  2. On the slave node, start only the datanode and tasktracker:
    1. sudo /etc/init.d/hadoop-0.20-tasktracker start
    2. sudo /etc/init.d/hadoop-0.20-datanode start
Verification
  1. Check the web interface to verify that the second node is connected
  2. Run the sleep example to make sure that MapReduce is working properly
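
Both checks can also be done from a terminal. The sketch below assumes that dfsadmin -report prints a line of the form "Datanodes available: N (...)", as Hadoop 0.20 releases do, and that the examples jar sits at the path shown; the jar path and the helper name count_datanodes are assumptions to adjust for your installation. By default the web interfaces are on port 50070 (NameNode) and port 50030 (JobTracker) of the master.

```shell
# Extract the live datanode count from `dfsadmin -report` output on stdin.
count_datanodes() {
  sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p'
}

# Assumed location of the examples jar; adjust to your installation.
EXAMPLES_JAR="${EXAMPLES_JAR:-/usr/lib/hadoop-0.20/hadoop-examples.jar}"

# Only attempt the checks where the hadoop-0.20 command is installed.
if command -v hadoop-0.20 >/dev/null 2>&1; then
  # Run as a user that is allowed to query HDFS.
  live=$(hadoop-0.20 dfsadmin -report | count_datanodes)
  echo "Live datanodes: ${live:-unknown}"
  # Short sleep job: 2 map tasks and 1 reduce task, 1 second each.
  hadoop-0.20 jar "$EXAMPLES_JAR" sleep -m 2 -r 1 -mt 1000 -rt 1000
fi
```

With the master started normally and the slave's datanode running, the count should include both machines.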
