Hadoop: from Single-Node Mode to Cluster Mode

The information provided on this page might be out of date. Please see a newer version at Step-by-Step Guide to Setting Up an R-Hadoop System.

After setting up an R-Hadoop system on a single computer, it is easy to switch Hadoop from single-node mode to cluster mode by following the instructions on the Tech Tots blog, which are summarized in six steps below.

1. SSH

Run the commands below in a terminal to create a public key, and then copy it to all slave machines (a sketch of the copy step follows the commands).

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
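
The two commands above enable passwordless SSH from the master to itself. To copy the key to the slaves as well, something like the following sketch can be used, where slave1 and slave2 are placeholder hostnames to be replaced with your own:

# Append the master's public key to each slave's authorized_keys
# (slave1 and slave2 are placeholders for your slave machines)
ssh-copy-id -i ~/.ssh/id_dsa.pub slave1
ssh-copy-id -i ~/.ssh/id_dsa.pub slave2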

2. Set up the name node (master machine)

2.1 Configure the following three files on the master machine (a sketch of the key settings follows the list)

- core-site.xml

- hdfs-site.xml

- mapred-site.xml
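
As a rough sketch, the key cluster settings in these three files might look as follows for Hadoop 1.x, assuming the master machine's hostname is master (the hostname, ports, and replication factor below are examples to be replaced with your own values):

core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- URI of the namenode; "master" is a placeholder hostname -->
    <value>hdfs://master:9000</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- example replication factor; at most the number of datanodes -->
    <value>2</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- host and port of the jobtracker; "master" is a placeholder -->
    <value>master:9001</value>
  </property>
</configuration>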

2.2 Masters and slaves files

- file "masters": IP address or hostname of namenode (master machine)

- file "slaves": a list of IP addresses or hostnames of datanodes (slave machines)

3. Set up the data nodes (slave machines)

Tar the Hadoop directory, copy it to all slaves, and then untar it there, as sketched below.
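
A sketch of this step, assuming Hadoop is installed in ~/hadoop-1.0.3 (placeholder path and version) and a slave is reachable as slave1:

# on the master: pack the Hadoop installation directory
cd ~
tar czf hadoop.tar.gz hadoop-1.0.3

# copy the archive to a slave and unpack it there;
# repeat for every slave listed in the "slaves" file
scp hadoop.tar.gz slave1:~/
ssh slave1 'tar xzf hadoop.tar.gz'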

4. Firewall

Enable incoming connections for Java on all machines; otherwise, the slaves will not be able to receive any jobs.
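
How to open the firewall depends on the operating system. As one sketch only, on a Linux machine with iptables, allowing the default Hadoop 1.x ports might look like the line below (run as root; the port list follows common defaults plus the ports configured in step 2, and should be adjusted to your setup):

# 9000/9001: namenode and jobtracker; 50010/50020: datanode;
# 50030/50070: jobtracker and namenode web interfaces
iptables -A INPUT -p tcp -m multiport --dports 9000,9001,50010,50020,50030,50070 -j ACCEPT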

5. Format name node

Go to the Hadoop directory on the master machine and run the command below. Note that formatting the namenode erases any data already stored in HDFS.

bin/hadoop namenode -format

6. Start Hadoop

Start HDFS and MapReduce

bin/start-dfs.sh

bin/start-mapred.sh

Monitor nodes and jobs in a browser:

- http://IP_OF_NAMENODE:50030 (MapReduce JobTracker)

- http://IP_OF_NAMENODE:50070 (HDFS NameNode)
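
To double-check from the command line that all datanodes have joined the cluster, the report below can be run on the master; it lists the live datanodes and their capacity.

bin/hadoop dfsadmin -report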

Stop HDFS and MapReduce

bin/stop-dfs.sh

bin/stop-mapred.sh

For more details, please check

- http://techtots.blogspot.com.au/2011/12/setting-up-hadoop-in-clustered-mode.html; and

- http://hadoop.apache.org/docs/stable/cluster_setup.html.