Hadoop: from Single-Node Mode to Cluster Mode
The information provided in this page might be out-of-date. Please see a newer version at Step-by-Step Guide to Setting Up an R-Hadoop System.
After setting up an R Hadoop system on a single computer, it is easy to switch Hadoop from single-node mode to cluster mode by following the instructions on the Tech Tots blog, which are summarized in the 6 steps below.
1. Set up passphraseless SSH
Run the commands below in Terminal to create a public key, and then copy it to all slave machines.
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
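The second half of this step, copying the key to the slaves, could look like the sketch below. The hostnames slave1 and slave2 and the DRY_RUN switch are placeholders introduced here for illustration, not part of the original instructions; with DRY_RUN=1 (the default in this sketch) nothing is sent and the script only reports what it would do.

```shell
#!/bin/sh
# Hypothetical slave hostnames (an assumption); replace with your own machines.
SLAVES="${SLAVES:-slave1 slave2}"
# DRY_RUN=1 only prints what would happen; set DRY_RUN=0 to actually copy the key.
DRY_RUN="${DRY_RUN:-1}"

# Append the local public key to authorized_keys on one remote host.
push_key() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would append ~/.ssh/id_dsa.pub to $1:~/.ssh/authorized_keys"
    else
        ssh "$1" 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < ~/.ssh/id_dsa.pub
    fi
}

for host in $SLAVES; do
    push_key "$host"
done
```

Once the key is in place, running ssh against a slave from the master should log in without asking for a password.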
2. Set up the name node (master machine)
2.1 Configure the following 3 files on the master machine
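The list of files is not reproduced here. As a hedged sketch only: in a Hadoop 1.x layout the three site files are usually conf/core-site.xml, conf/hdfs-site.xml and conf/mapred-site.xml, and the snippet below prints a minimal version of each. The hostname master, the ports 9000 and 9001, the replication factor 2, and the helper name site_xml are all assumptions for illustration; take the real values from the Tech Tots post.

```shell
#!/bin/sh
# Hypothetical master hostname (an assumption); replace with your name node.
MASTER="${MASTER:-master}"

# Print a one-property Hadoop site file to stdout.
# $1 = property name, $2 = property value
site_xml() {
    cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$1</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}

# Review the output, then redirect each call into the matching file under conf/.
echo "--- conf/core-site.xml ---"
site_xml fs.default.name "hdfs://$MASTER:9000"
echo "--- conf/hdfs-site.xml ---"
site_xml dfs.replication 2
echo "--- conf/mapred-site.xml ---"
site_xml mapred.job.tracker "$MASTER:9001"
```

Printing to the terminal first makes it easy to compare the values against the single-node configuration before overwriting anything.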
2.2 Masters and slaves files
- file "masters": IP address or hostname of namenode (master machine)
- file "slaves": a list of IP addresses or hostnames of datanodes (slave machines)
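As an illustration, the two files could be generated as below. The hostnames master, slave1 and slave2 are placeholders, and the sketch writes into a scratch directory (CONF_DIR, also an assumption) so that the result can be checked before it is copied into Hadoop's conf directory.

```shell
#!/bin/sh
# Scratch directory for the generated files (an assumption of this sketch).
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# One hostname or IP address per line; the names here are placeholders.
printf '%s\n' master > "$CONF_DIR/masters"
printf '%s\n' slave1 slave2 > "$CONF_DIR/slaves"

echo "masters:"; cat "$CONF_DIR/masters"
echo "slaves:";  cat "$CONF_DIR/slaves"
```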
3. Set up the data nodes (slave machines)
Tar the hadoop directory, copy it to all slaves and then untar it.
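A minimal sketch of this step, assuming Hadoop lives in a directory named hadoop under the home directory on the master and should end up in the same place on each slave. The path, the hostnames, and the run helper with its DRY_RUN switch are assumptions made here; by default the commands are only printed.

```shell
#!/bin/sh
SLAVES="${SLAVES:-slave1 slave2}"   # hypothetical hostnames
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pack the Hadoop directory (assumed to be ~/hadoop) into an archive.
run tar -czf "$HOME/hadoop.tar.gz" -C "$HOME" hadoop

for host in $SLAVES; do
    # Copy the archive to the slave's home directory and unpack it there.
    run scp "$HOME/hadoop.tar.gz" "$host:"
    run ssh "$host" 'tar -xzf hadoop.tar.gz'
done
```

Because the slaves receive an exact copy of the master's directory, they also inherit its configuration files.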
4. Configure the firewall
Enable incoming connections for Java on all machines; otherwise, the slaves will not be able to receive any jobs.
5. Format name node
Go to the Hadoop directory and run:
bin/hadoop namenode -format
6. Start Hadoop
Start HDFS and MapReduce
Monitor nodes and jobs with browser:
Stop HDFS and MapReduce
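The commands and URLs for this step are not shown above. As a sketch under the assumption of a Hadoop 1.x installation (consistent with bin/hadoop namenode -format in step 5), the stock scripts in bin/ and the default web ports would be used as follows. The hostname master, the function names, and the run helper are illustrative; run the script from the Hadoop directory on the master, and check the Tech Tots post for the exact commands the original guide used.

```shell
#!/bin/sh
MASTER="${MASTER:-master}"   # hypothetical master hostname
DRY_RUN="${DRY_RUN:-1}"      # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start HDFS first, then MapReduce, and show where to monitor them.
start_cluster() {
    run bin/start-dfs.sh
    run bin/start-mapred.sh
    # Default Hadoop 1.x web interfaces.
    echo "NameNode:   http://$MASTER:50070/"
    echo "JobTracker: http://$MASTER:50030/"
}

# Stop in reverse order: MapReduce first, then HDFS.
stop_cluster() {
    run bin/stop-mapred.sh
    run bin/stop-dfs.sh
}

case "${1:-}" in
    start) start_cluster ;;
    stop)  stop_cluster ;;
    *)     echo "usage: $0 start|stop" ;;
esac
```

Saved under a name of your choice, the script takes start or stop as its only argument; with DRY_RUN=0 it actually invokes the Hadoop scripts.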
For more details, please check
- http://techtots.blogspot.com.au/2011/12/setting-up-hadoop-in-clustered-mode.html; and