Hadoop: from Single-Node Mode to Cluster Mode
The information provided on this page might be out of date. Please see a newer version at Step-by-Step Guide to Setting Up an R-Hadoop System.
After setting up an R Hadoop system on a single computer, it is easy to switch Hadoop from single-node mode to cluster mode by following the instructions on the Tech Tots blog, which are summarized in the six steps below.
1. SSH
Run the commands below in a terminal to create a public key and authorize it locally; the key then needs to be copied to all slave machines, as sketched after the commands.
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
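The copy step itself is not shown above. Assuming the slave machines are reachable as slave1 and slave2 under a hadoop user (hypothetical names; substitute your own), one way to install the key on each slave is with ssh-copy-id:

# Hostnames and user name are assumptions; replace with your own.
for host in slave1 slave2; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@$host
done

After that, ssh hadoop@slave1 should log in without asking for a password.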
2. Set up name node (master machine)
2.1 Configure the following 3 files on the master machine (minimal examples follow the list)
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
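As a minimal sketch of what these three files might contain on a Hadoop 1.x cluster: the hostname master and the ports 9000 and 9001 are illustrative assumptions, while fs.default.name, dfs.replication and mapred.job.tracker are the standard Hadoop 1.x property names.

core-site.xml:
<configuration>
  <property>
    <!-- URI of the HDFS namenode; hostname and port are assumptions -->
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <!-- number of HDFS block replicas; 2 is an illustrative choice -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <!-- host:port of the MapReduce jobtracker; assumptions as above -->
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>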
2.2 Masters and slaves files (example contents below)
- file "masters": IP address or hostname of namenode (master machine)
- file "slaves": a list of IP addresses or hostnames of datanodes (slave machines)
3. Set up data nodes (slave machines)
Tar the hadoop directory, copy it to all slaves, and then untar it there, as sketched below.
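Assuming the Hadoop installation lives in ~/hadoop on the master and the slaves are those from step 1 (all hypothetical names), the copy could be done as follows:

# Paths, hostnames and user are assumptions; adjust to your setup.
cd ~
tar -czf hadoop.tar.gz hadoop
for host in slave1 slave2; do
    scp hadoop.tar.gz hadoop@$host:~/
    ssh hadoop@$host 'tar -xzf hadoop.tar.gz'
done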
4. Firewall
Enable incoming connections for Java on all machines; otherwise, the slaves will not be able to receive any jobs.
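How exactly to do this depends on the operating system. As one sketch, on a Linux machine using iptables the relevant ports could be opened like this; the port list assumes the example configuration above plus Hadoop's default web UI and datanode ports, so adjust it to your setup:

# Assumes Linux with iptables; ports match the example configuration above.
# 9000/9001: namenode and jobtracker RPC; 50010: datanode data transfer;
# 50030/50070: web UIs.
for port in 9000 9001 50010 50030 50070; do
    iptables -A INPUT -p tcp --dport $port -j ACCEPT
done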
5. Format name node
Go to the Hadoop directory and run
bin/hadoop namenode -format
6. Start Hadoop
Start HDFS and MapReduce
bin/start-dfs.sh
bin/start-mapred.sh
Monitor nodes and jobs with a browser:
- http://IP_OF_NAMENODE:50030 (MapReduce JobTracker)
- http://IP_OF_NAMENODE:50070 (HDFS NameNode)
Stop HDFS and MapReduce
bin/stop-dfs.sh
bin/stop-mapred.sh
For more details, please check
- http://techtots.blogspot.com.au/2011/12/setting-up-hadoop-in-clustered-mode.html