Hadoop: from Single-Node Mode to Cluster Mode

The information provided in this page might be out-of-date. Please see a newer version at Step-by-Step Guide to Setting Up an R-Hadoop System.

After setting up an R-Hadoop system on a single computer, it is easy to switch Hadoop from single-node mode to cluster mode by following the instructions on the Tech Tots blog, which are summarized in the six steps below.

1. SSH

Run the commands below in a terminal to create a public key, and then copy it to all slave machines, so that the master can log into them without a password.

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
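One way to copy the key to the slaves is with ssh-copy-id. A sketch is below; the hostnames slave1 and slave2 and the user name hadoop are placeholders, so replace them with your own.

```shell
# Append the master's public key to each slave's authorized_keys,
# so that Hadoop can start daemons on the slaves without a password.
# "slave1", "slave2" and user "hadoop" are placeholders.
for host in slave1 slave2; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@$host
done

# Verify that password-less login now works:
ssh hadoop@slave1 hostname
```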

2. Set up the name node (master machine)

2.1 Configure the following 3 files on the master machine

- core-site.xml

- hdfs-site.xml

- mapred-site.xml

2.2 Masters and slaves files

- file "masters": IP address or hostname of namenode (master machine)

- file "slaves": a list of IP addresses or hostnames of datanodes (slave machines)
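A minimal sketch of the three configuration files is below, using the Hadoop 1.x property names; the hostname master and the ports 9000 and 9001 are placeholders to be replaced with your own values.

```xml
<!-- conf/core-site.xml: where to find the namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: how many copies of each HDFS block to keep -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: where to find the jobtracker -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```

With the same placeholder hostnames, the "masters" file then contains the single line master, and the "slaves" file lists slave1 and slave2, one per line.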

3. Set up data nodes (slave machines)

Tar the Hadoop directory, copy it to all slaves, and then untar it there.
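The step above can be sketched as follows; the installation path /usr/local/hadoop, the hostnames and the user name are placeholders.

```shell
# On the master: pack the Hadoop installation directory.
cd /usr/local
tar -czf hadoop.tar.gz hadoop

# Copy the archive to each slave and unpack it in the same location,
# so that all machines share an identical Hadoop setup.
for host in slave1 slave2; do
    scp hadoop.tar.gz hadoop@$host:/usr/local/
    ssh hadoop@$host "cd /usr/local && tar -xzf hadoop.tar.gz"
done
```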

4. Firewall

Enable incoming connections for Java on all machines; otherwise, the slaves will not be able to receive any jobs.
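On Linux, one way to do this is to open the relevant ports with iptables. The sketch below uses the Hadoop 1.x default ports and assumes the placeholder ports 9000 and 9001 from the configuration files; adjust the numbers to your own setup and run the commands as root.

```shell
# Allow the namenode and jobtracker RPC ports (placeholders from the config).
iptables -A INPUT -p tcp --dport 9000 -j ACCEPT   # HDFS namenode
iptables -A INPUT -p tcp --dport 9001 -j ACCEPT   # MapReduce jobtracker

# Allow the default Hadoop 1.x daemon and web UI ports (50010, 50030, 50070, ...).
iptables -A INPUT -p tcp --dport 50000:50100 -j ACCEPT
```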

5. Format name node

Go to the Hadoop directory and run

bin/hadoop namenode -format

6. Start Hadoop

Start HDFS and MapReduce
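With the Hadoop 1.x scripts, this is done from the Hadoop directory on the master:

```shell
bin/start-dfs.sh      # starts the namenode, plus the datanodes on all slaves
bin/start-mapred.sh   # starts the jobtracker, plus the tasktrackers on all slaves

# List the running Hadoop daemons (Java processes) on this machine:
jps
```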



Monitor nodes and jobs with browser:

- http://IP_OF_NAMENODE:50030 (MapReduce JobTracker)

- http://IP_OF_NAMENODE:50070 (HDFS NameNode)

Stop MapReduce and HDFS
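Again with the Hadoop 1.x scripts, run from the Hadoop directory on the master:

```shell
bin/stop-mapred.sh   # stops the jobtracker and the tasktrackers
bin/stop-dfs.sh      # stops the namenode and the datanodes
```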



For more details, please check

- http://techtots.blogspot.com.au/2011/12/setting-up-hadoop-in-clustered-mode.html; and

- http://hadoop.apache.org/docs/stable/cluster_setup.html.