Hadoop Tutorial: Multi-Node Cluster Setup

Administration Lab 6: Multi-node Cluster Setup

Create Additional VM Image

Repeat steps outlined in Lab 1 to create a new Virtual Machine and install JDK and Hadoop

Configure Bridged Adapter

Power off both Virtual Machines (Machine => ACPI Shutdown)

Highlight each VM and click on Settings => Network

In the “Attached To” field, select “Bridged Adapter”

In the “Name” field, select the correct network card adapter (wired/wireless) that you use on your primary Operating System to connect to the Internet

Uncheck all devices except “Hard Disk”

Find out Master & Slave IPs

Run ifconfig on master and slave

Write down IPs for both hosts
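
As a small convenience, the addresses can be pulled out of the ifconfig output instead of being read by eye. This is a minimal sketch: extract_ipv4 is a hypothetical helper name, not part of the lab, and it assumes ifconfig prints IPv4 addresses as either "inet addr:A.B.C.D" (older net-tools) or "inet A.B.C.D".

```shell
# Print the non-loopback IPv4 addresses found in ifconfig output read from stdin.
# Handles both "inet addr:192.168.1.10" (older net-tools) and "inet 192.168.1.10".
# extract_ipv4 is a hypothetical helper name used only for this illustration.
extract_ipv4() {
    sed -n 's/.*inet \(addr:\)\{0,1\}\([0-9][0-9.]*\).*/\2/p' | grep -v '^127\.'
}

# Usage, on the master and then on the slave:
#   ifconfig | extract_ipv4
```

With the bridged adapter configured, both addresses would normally fall in the same subnet as the host machine.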

Execute Installation from Lab 1

Except the following steps:

sudo apt-get install hadoop-0.20-namenode
sudo apt-get install hadoop-0.20-jobtracker

They are not needed on the slave node

Configuration For Both Nodes

/etc/hadoop-0.20/conf/mapred-site.xml

Provisioning IPs

/etc/hosts

Add entries for the master and slave nodes to ensure proper network communication
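
The /etc/hosts step might be scripted as below. This is only a sketch under stated assumptions: the hostnames master and slave, the 192.168.1.x addresses, and the helper name add_host_entry are all placeholders, so substitute the IPs written down earlier.

```shell
# Append "IP<TAB>NAME" to a hosts file unless NAME is already mapped there.
# add_host_entry is a hypothetical helper, not part of the lab.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
        echo "$name already present in $hosts_file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example with placeholder addresses; run as root on BOTH nodes:
#   add_host_entry /etc/hosts 192.168.1.10 master
#   add_host_entry /etc/hosts 192.168.1.11 slave
```

Running it twice is harmless, because an existing mapping for the same name is left untouched.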

Reformat namenode and delete data from data directory

sudo su hdfs

Execute hadoop-0.20 namenode -format on the master

Delete data from data directories on both machines

/var/lib/hadoop-0.20/cache/hdfs/dfs/data
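
The cleanup matters because a datanode that still holds blocks from the previously formatted namenode will typically refuse to join the new one. A minimal sketch of the step follows, assuming the Hadoop daemons are stopped first. clear_dir_contents is a hypothetical helper that empties a directory while keeping the directory itself, so its ownership and permissions stay as the installer left them.

```shell
# Remove everything inside a directory but keep the directory itself.
# clear_dir_contents is a hypothetical helper used only for this illustration.
clear_dir_contents() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -mindepth 1 -delete
}

# On the master, as the hdfs user (this erases existing HDFS metadata):
#   hadoop-0.20 namenode -format
# On both machines, with sufficient privileges:
#   clear_dir_contents /var/lib/hadoop-0.20/cache/hdfs/dfs/data
```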

Start Distributed Cluster

Start master node normally

Start datanode and tasktracker on slave node only

sudo /etc/init.d/hadoop-0.20-tasktracker start

sudo /etc/init.d/hadoop-0.20-datanode start

Verification

Check the NameNode web interface on the master (port 50070 by default) to verify that the second node is connected

Run sleep example to make sure that mapreduce is working properly
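
The same checks can be sketched from the command line. The assumptions here are that dfsadmin -report prints a line of the form "Datanodes available: N (...)", as Hadoop 0.20 releases generally do, and that the examples jar sits at /usr/lib/hadoop-0.20/hadoop-examples.jar; adjust both to your installation. count_live_datanodes is a hypothetical helper name.

```shell
# Read "hadoop dfsadmin -report" output on stdin and print the number of
# datanodes reported as available. Prints nothing if the line is absent.
count_live_datanodes() {
    sed -n 's/^Datanodes available: \([0-9][0-9]*\).*/\1/p'
}

# Usage on the master (report format and jar path are assumptions):
#   sudo -u hdfs hadoop-0.20 dfsadmin -report | count_live_datanodes
#   hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-examples.jar sleep -m 2 -r 1 -mt 1000 -rt 1000
```

If the master also runs a datanode, a count of 2 would indicate that the slave has joined. The sleep job does no real work, so it finishing successfully mainly shows that the jobtracker can schedule tasks on the tasktrackers.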

Tuesday, December 27, 2011
