Tutorial for setting up a Hadoop environment for learning and testing in VirtualBox

With this tutorial you can set up your own Hadoop environment using virtual machines. To get started, download and install VirtualBox.

Next, you will need the CentOS virtual machine image. Once it has downloaded, create a directory called VMs at the root of your drive and unzip the contents of the CentOS zip file into it.

In VirtualBox, create a new virtual machine. Choose Linux as the type and Red Hat (64-bit) as the version. Allocate 2048 MB of memory and, for the hard disk, choose "Use an existing virtual hard disk file" and navigate to the CentOS image directory inside the VMs directory. Choose the vmdk file whose filename does not end with a number. Once you have done that, the machine is ready to start up.
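The same steps can also be scripted with the VBoxManage command-line tool that ships with VirtualBox. The following is only a sketch: it assumes a bash-like shell on the host with VBoxManage on the PATH, and the VM name and vmdk path are hypothetical placeholders, not values from this tutorial. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch: create the VM from the command line instead of the GUI.
# VM_NAME and VMDK are hypothetical placeholders; adjust them to your setup.
VM_NAME="${VM_NAME:-hadoop-centos}"
VMDK="${VMDK:-/VMs/centos/centos.vmdk}"

# With DRY_RUN=1 (the default) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_vm() {
  # Register a 64-bit Red Hat type VM.
  run VBoxManage createvm --name "$VM_NAME" --ostype RedHat_64 --register
  # Allocate 2048 MB of memory, as in the GUI steps.
  run VBoxManage modifyvm "$VM_NAME" --memory 2048
  # Add a SATA controller and attach the existing vmdk file to it.
  run VBoxManage storagectl "$VM_NAME" --name SATA --add sata
  run VBoxManage storageattach "$VM_NAME" --storagectl SATA \
      --port 0 --device 0 --type hdd --medium "$VMDK"
}

create_vm
```

Printing first makes it easy to review the four commands before anything is changed on the host.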

Next, install the VirtualBox Guest Additions, which give you better performance among other enhancements. Before you install the Guest Additions, run the following commands as root to prepare for the installation and reboot:

# yum update
# yum install gcc
# yum install kernel-devel
# yum install bzip2
# shutdown -r now
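After the reboot, it may be worth checking that the installed kernel-devel package matches the running kernel, since the Guest Additions build kernel modules against those headers. This is a minimal sketch, assuming the usual CentOS naming where `rpm -q kernel-devel` prints `kernel-devel-` followed by the output of `uname -r`. The helper name is made up for illustration.

```shell
#!/usr/bin/env bash
# has_matching_devel RELEASE PACKAGES
# Succeeds if PACKAGES (one package name per line) contains a
# kernel-devel package for the given kernel release.
has_matching_devel() {
  printf '%s\n' "$2" | grep -Fxq "kernel-devel-$1"
}

# Only run the real check on systems that have rpm.
if command -v rpm >/dev/null 2>&1; then
  if has_matching_devel "$(uname -r)" "$(rpm -q kernel-devel)"; then
    echo "kernel-devel matches the running kernel"
  else
    echo "kernel-devel does not match $(uname -r); try: yum install kernel-devel-$(uname -r)"
  fi
fi
```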

If you need to return the cursor to Windows, press the host key, which is the right Ctrl key by default. From the VirtualBox menu choose Devices, then Insert Guest Additions CD Image, and follow the prompts to install. Restart the virtual machine for the changes to take effect.


Installing Apache Bigtop

Bigtop is ideal for learning Big Data components like Hadoop. Let's get started with the installation.

First, get the repo file, which points to the download location for Hadoop and its dependencies.

wget -O /etc/yum.repos.d/bigtop.repo \

Next, select and install the Hadoop components

yum install hadoop\* mahout\* oozie\* hbase\* hive\* hue\* pig\* zookeeper\*

Answer yes to the code-signing prompts. Once Hadoop and the selected components are installed, the next step is to configure Hadoop. After the configuration, Hadoop will be ready to start.

Download and install Java

yum install java-1.7.0-openjdk-devel.x86_64
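To confirm which Java the system picked up, you can read the version string that `java -version` prints. This is a small sketch: the helper name is hypothetical, and it assumes the usual output format with the version in double quotes after the word "version".

```shell
#!/usr/bin/env bash
# java_version_of TEXT
# Extracts the quoted version from `java -version` style output.
java_version_of() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# `java -version` writes to stderr, so redirect it to stdout.
if command -v java >/dev/null 2>&1; then
  echo "Detected Java: $(java_version_of "$(java -version 2>&1)")"
else
  echo "java not found on PATH"
fi
```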

Format the namenode

sudo /etc/init.d/hadoop-hdfs-namenode init

Start the Hadoop services for your cluster

for i in hadoop-hdfs-namenode hadoop-hdfs-datanode ; do sudo service $i start ; done


Create a sub-directory structure in HDFS

sudo /usr/lib/hadoop/libexec/init-hdfs.sh

Start the YARN daemons

sudo service hadoop-yarn-resourcemanager start;
sudo service hadoop-yarn-nodemanager start

If everything went well, you now have a working Hadoop installation.
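A quick way to check the result is to ask HDFS and YARN for something simple. The following is a rough sketch, not an official Bigtop check: the `smoke` helper is made up for illustration, and it assumes the `hadoop` and `yarn` commands are on the PATH. It only reports whether each command exited successfully.

```shell
#!/usr/bin/env bash
# smoke LABEL COMMAND [ARGS...]
# Runs COMMAND quietly and reports OK, FAIL, or SKIP (if not installed).
smoke() {
  smoke_label="$1"; shift
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "SKIP $smoke_label: $1 not found"
    return 0
  fi
  if "$@" </dev/null >/dev/null 2>&1; then
    echo "OK   $smoke_label"
  else
    echo "FAIL $smoke_label"
  fi
}

# List the HDFS root directory (needs the namenode and datanode running).
smoke "HDFS root listing" hadoop fs -ls /
# List the node managers registered with the resource manager.
smoke "YARN node list" yarn node -list
```

If a check reports FAIL, run the same command by hand to see the error output.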

