peachyo

Create Ubuntu VM

  • Download and install VirtualBox
  • Download an Ubuntu installation image
  • Create a new VM named “hadoop1” and install Ubuntu from the image
  • Change the VM's network settings to use a bridged adapter

Create a dedicated hadoop user

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
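To confirm that the account landed in the right group, a small check like the following can help. It is a minimal sketch; user_in_group is a hypothetical helper, not part of Ubuntu or Hadoop.

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2.
# `id -nG USER` prints all of the user's group names, space-separated.
user_in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Example, on hadoop1 after the two commands above:
#   user_in_group hduser hadoop && echo "hduser is in group hadoop"
```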

Configure SSH

$ sudo apt-get install openssh-server
$ sudo update-rc.d ssh defaults

$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
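If ssh localhost still asks for a password, file permissions are a frequent cause: sshd's default StrictModes setting refuses keys when ~/.ssh or authorized_keys is writable by group or others. The sketch below is a hypothetical helper (authorize_key is not a standard command) that appends the key only once and tightens permissions; it assumes the key pair was already created with ssh-keygen as above.

```shell
# Hypothetical helper: add the public key in $1 to $2/authorized_keys
# exactly once, then set the permissions sshd expects.
authorize_key() {
  pubkey_file=$1
  ssh_dir=$2
  key=$(cat "$pubkey_file") || return 1
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  # Append only if an identical line is not already present.
  if ! grep -qxF "$key" "$ssh_dir/authorized_keys"; then
    printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}

# Example, as hduser:
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh
#   ssh localhost
```

Because it is idempotent, it can be rerun safely after cloning the VM.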

Disable IPv6

Open /etc/sysctl.conf as root and add the following lines to the end of the file; they take effect after a reboot or after running sudo sysctl -p

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
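The same edit can be scripted so that rerunning it does not duplicate lines, followed by a check of the live kernel value. This is a sketch with hypothetical helper names (ensure_line, ipv6_disabled); it assumes the standard /proc/sys path for the sysctl value.

```shell
# Hypothetical helper: append line $2 to file $1 unless it is already there.
ensure_line() {
  file=$1
  line=$2
  touch "$file"
  grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: succeed if the given sysctl value file reads "1".
# Defaults to the live kernel setting.
ipv6_disabled() {
  [ "$(cat "${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}")" = "1" ]
}

# Example, as root:
#   for iface in all default lo; do
#     ensure_line /etc/sysctl.conf "net.ipv6.conf.$iface.disable_ipv6 = 1"
#   done
#   sysctl -p          # reload /etc/sysctl.conf without rebooting
#   ipv6_disabled && echo "IPv6 is disabled"
```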

Install Hadoop

Extract the tarball

$ cd /usr/local
$ sudo tar xzf hadoop-2.5.0.tar.gz
$ sudo mv hadoop-2.5.0 hadoop
$ sudo chown -R hduser:hadoop hadoop
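The three commands above can be wrapped in one function that refuses to clobber an existing install. This is a sketch, not part of Hadoop: install_hadoop is a hypothetical name, and it assumes the tarball's first entry is its top-level directory (as with hadoop-2.5.0.tar.gz).

```shell
# Hypothetical helper: unpack a Hadoop release tarball ($1) into a prefix
# directory ($2), rename the versioned directory to "hadoop", and optionally
# chown it ($3, e.g. hduser:hadoop).
install_hadoop() {
  tarball=$1
  prefix=$2
  owner=$3
  if [ -e "$prefix/hadoop" ]; then
    echo "$prefix/hadoop already exists" >&2
    return 1
  fi
  # First archive entry is the top-level directory, e.g. "hadoop-2.5.0/".
  top=$(tar tzf "$tarball" | head -n 1 | cut -d/ -f1)
  tar xzf "$tarball" -C "$prefix" || return 1
  mv "$prefix/$top" "$prefix/hadoop" || return 1
  if [ -n "$owner" ]; then
    chown -R "$owner" "$prefix/hadoop"
  fi
}

# Example, as root:
#   install_hadoop hadoop-2.5.0.tar.gz /usr/local hduser:hadoop
```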

Add the following lines to the end of hduser's ~/.bashrc file

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
export PATH=$PATH:$HADOOP_HOME/bin
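After reloading .bashrc, two quick checks catch most typos in these lines: JAVA_HOME must contain an executable bin/java, and $HADOOP_HOME/bin must be on PATH. Both helpers below are hypothetical sketches, not Hadoop commands.

```shell
# Hypothetical helper: succeed if JAVA_HOME points at a JDK/JRE with bin/java.
check_java_home() {
  [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]
}

# Hypothetical helper: succeed if directory $1 is an entry of PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, as hduser:
#   . ~/.bashrc
#   check_java_home || echo "JAVA_HOME does not point at a Java install"
#   path_contains "$HADOOP_HOME/bin" && hadoop version
```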

Update the Hadoop configuration files in $HADOOP_HOME/etc/hadoop: see Link
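The linked guide is not reproduced here. Purely as an illustration, a minimal two-node configuration could be generated as below. The property names are standard Hadoop 2.x keys, but the values are assumptions: hadoop1 as the NameNode and ResourceManager host, HDFS on port 9000, and a replication factor of 2. write_site and write_cluster_conf are hypothetical helpers.

```shell
# Hypothetical helper: write a Hadoop *-site.xml file ($1) from name=value pairs.
write_site() {
  out=$1; shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    for pair in "$@"; do
      # Text before the first "=" is the name, the rest is the value.
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "${pair%%=*}" "${pair#*=}"
    done
    echo '</configuration>'
  } > "$out"
}

# Hypothetical helper: write the four site files into config dir $1,
# using $2 as the master host. Existing files are overwritten.
write_cluster_conf() {
  conf=$1
  master=$2
  write_site "$conf/core-site.xml"   "fs.defaultFS=hdfs://$master:9000"
  write_site "$conf/hdfs-site.xml"   "dfs.replication=2"
  write_site "$conf/yarn-site.xml"   "yarn.resourcemanager.hostname=$master" \
                                     "yarn.nodemanager.aux-services=mapreduce_shuffle"
  write_site "$conf/mapred-site.xml" "mapreduce.framework.name=yarn"
}

# Example, as hduser:
#   write_cluster_conf "$HADOOP_HOME/etc/hadoop" hadoop1
```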

Edit the “slaves” file ($HADOOP_HOME/etc/hadoop/slaves) to list all nodes

hadoop1.example.com
hadoop2.example.com
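The start scripts log in over SSH to every host listed in this file, so it is worth writing the file and then testing a non-interactive login to each entry. Both functions are hypothetical sketches. Note that with BatchMode an unknown host key also counts as a failure, so log in to each host interactively once first.

```shell
# Hypothetical helper: write one hostname per line to the slaves file $1.
write_slaves() {
  out=$1; shift
  printf '%s\n' "$@" > "$out"
}

# Hypothetical helper: try a password-less SSH login to every host in file $1.
# -n keeps ssh from consuming the loop's stdin.
check_ssh_to_slaves() {
  while IFS= read -r host; do
    [ -z "$host" ] && continue
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
      || echo "cannot ssh to $host" >&2
  done < "$1"
}

# Example, as hduser on hadoop1 (after both VMs are up):
#   write_slaves "$HADOOP_HOME/etc/hadoop/slaves" hadoop1.example.com hadoop2.example.com
#   check_ssh_to_slaves "$HADOOP_HOME/etc/hadoop/slaves"
```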

Format HDFS

$ cd /usr/local/hadoop
$ ./bin/hdfs namenode -format
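Formatting wipes the NameNode's metadata, so it should only be done once. A formatted NameNode directory contains current/VERSION, which gives a simple guard. This sketch assumes dfs.namenode.name.dir and hadoop.tmp.dir were left at their defaults (/tmp/hadoop-USER/dfs/name); namenode_formatted is a hypothetical helper. Because Ubuntu clears /tmp at boot by default, many setups point these directories elsewhere.

```shell
# Hypothetical helper: succeed if the NameNode directory ($1, or the assumed
# default location) has already been formatted.
namenode_formatted() {
  [ -f "${1:-/tmp/hadoop-$(id -un)/dfs/name}/current/VERSION" ]
}

# Example, as hduser in /usr/local/hadoop:
#   namenode_formatted || ./bin/hdfs namenode -format
```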

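Clone the VM, checking the option to reinitialize the MAC address of all network cards and choosing a linked clone. Then:

  • update /etc/hostname on the clone to hadoop2
  • replace hadoop1 with hadoop2 in the clone's /etc/hosts
  • add the IP addresses of hadoop1 and hadoop2 to /etc/hosts on both nodes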
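The /etc/hosts step can be made repeatable with a small helper. This is a sketch: add_host_entry is hypothetical, and the 192.168.1.x addresses in the example are placeholders for the bridged addresses each VM actually receives (check with ip addr). A commonly reported pitfall is Ubuntu's default "127.0.1.1 hostname" line, which can make daemons bind to the loopback address; removing or commenting it out is often advised.

```shell
# Hypothetical helper: add the line "IP NAME [ALIASES...]" to hosts file $1
# unless an identical line already exists.
add_host_entry() {
  hosts_file=$1
  ip=$2
  shift 2
  entry="$ip $*"
  grep -qxF "$entry" "$hosts_file" || printf '%s\n' "$entry" >> "$hosts_file"
}

# Example, as root on BOTH nodes (placeholder addresses):
#   add_host_entry /etc/hosts 192.168.1.101 hadoop1.example.com hadoop1
#   add_host_entry /etc/hosts 192.168.1.102 hadoop2.example.com hadoop2
```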

Start the cluster (on hadoop1, as hduser, from /usr/local/hadoop)

$ ./sbin/start-dfs.sh
$ ./sbin/start-yarn.sh
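To see whether the daemons actually came up, the JDK's jps tool lists running Java processes as "PID ClassName". The sketch below (missing_daemons is a hypothetical helper) reads that output and prints any expected daemon that is absent. The expected set is an assumption for this layout, where hadoop1 is both the master and, per the slaves file, a worker.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon that is not running.
missing_daemons() {
  running=$(awk '{print $2}')
  for d in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
    printf '%s\n' "$running" | grep -qxF "$d" || echo "$d"
  done
}

# Example, as hduser on hadoop1:
#   jps | missing_daemons        # no output means all five are running
```

On hadoop2 only DataNode and NodeManager are expected, so the list would need trimming there; the daemons' log files under $HADOOP_HOME/logs are the place to look when something is missing.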