VirtualBox
1 INTRODUCTION
2 UNZIPPING THE SANDBOX OVA FILE IN VIRTUALBOX
3 CONFIGURING REMOTE SHELL ACCESS AND FILE TRANSFER FOR THE SANDBOX VM
4 USER ACCOUNT AND PASSWORDS
1 Introduction
Hortonworks Sandbox (http://hortonworks.com/products/sandbox/) is a portable, self-contained environment with a
complete installation of the latest version of Hadoop and its ecosystem components, in particular
those bundled with the Hadoop distribution that Hortonworks provides: the Hortonworks
Data Platform (HDP) (http://hortonworks.com/products/). It gives full control of and access to all aspects
of the Hadoop environment without time-consuming administrative tasks such
as installing individual components or deploying to a cloud service. It is well suited to experimenting
with the features of the Hadoop ecosystem components before employing them in a real-life big data
application, and thus serves as a convenient platform for learning about Hadoop.
This lab tutorial will guide you through the process of setting up the Hortonworks Sandbox. It is derived from the
online version available at:
http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
2 Unzipping the Sandbox OVA File in VirtualBox
Download the OVA file for the current version of HDP (HDP_xx.xx_virtualbox.ova). Choose a
suitable directory on your machine to unpack the OVA file into: ideally, the partition holding this directory
should have around 20GB of free space for the VM disk image files contained within the OVA file.
Start Oracle VirtualBox and set this directory as the location for VM files (File -> Preferences).
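The free-space requirement mentioned above can be checked before unpacking with a short script. This is only a minimal sketch: the 20GB threshold is the figure quoted in the text, and the directory passed in (here the current directory) is a placeholder for whichever location you chose.

```python
import shutil

REQUIRED_GB = 20  # figure quoted in the tutorial; adjust if your OVA differs


def free_space_gb(directory: str) -> float:
    """Return the free space (in GB, 10^9 bytes) on the partition holding `directory`."""
    return shutil.disk_usage(directory).free / 1e9


def has_enough_space(directory: str, required_gb: float = REQUIRED_GB) -> bool:
    """True if the partition holding `directory` has at least `required_gb` free."""
    return free_space_gb(directory) >= required_gb


# "." is a placeholder: replace it with the directory chosen for the VM files.
print(f"{free_space_gb('.'):.1f} GB free; enough for the Sandbox: {has_enough_space('.')}")
```

If the check reports too little space, pick a directory on a different partition before importing the OVA file.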
Once the OVA file has been imported, the VM disk image files (*.vmdk) for the Hortonworks Sandbox should be
present in the destination directory you designated earlier.
Before starting the VM, ensure that you have allocated sufficient resources from the host system to it so
that it can run smoothly. Right-click on the Hortonworks Sandbox item in the left panel and choose
Settings.
Ensure that a minimum of 8GB RAM and 4 CPUs are allocated to the VM. You can allocate more
memory and processors for faster operation if you have enough resources to spare on your host
OS.
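The same allocation can also be made from the command line with VirtualBox's VBoxManage tool while the VM is powered off. The sketch below only builds and prints the command; the VM name "Hortonworks Sandbox" is an assumption and should be replaced with the name shown in the left panel of VirtualBox.

```python
import shlex
from typing import List


def modifyvm_command(vm_name: str, memory_mb: int = 8192, cpus: int = 4) -> List[str]:
    """Build a VBoxManage command that sets the RAM (in MB) and CPU count of a VM."""
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--memory", str(memory_mb),
        "--cpus", str(cpus),
    ]


# Hypothetical VM name: use the name listed in your VirtualBox Manager window.
command = modifyvm_command("Hortonworks Sandbox")
print(shlex.join(command))

# To apply the settings, run the printed command in a terminal,
# or pass `command` to subprocess.run(command, check=True).
```

The defaults match the minimum quoted above (8GB is 8192MB); pass larger values if the host has resources to spare.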
A successful bootup should complete with a confirmation screen. The Sandbox
will be reachable from the host OS (Windows) at a particular IP address. Note down this IP address, as we will
use it to connect to the Sandbox via a browser in the host OS and to configure the MobaXTerm
client.
Click on Launch Dashboard and at the login prompt, enter the following for both the username and
password: maria_dev
You should now be able to access the main Ambari dashboard. Ambari is an open source management
platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. It provides a GUI
that simplifies interaction with many key components of the Hadoop ecosystem, and we will
be using it in the lab sessions that follow.
You can also access the main Ambari dashboard directly by typing: http://Sandbox-IP-address:8080 into
a browser tab.
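If the dashboard does not load, it can help to confirm that port 8080 on the Sandbox is reachable from the host. The following is a minimal sketch using only the Python standard library; the IP address shown is a documentation placeholder, not the address of your Sandbox.

```python
import socket

AMBARI_PORT = 8080  # port quoted in the tutorial for the Ambari dashboard


def ambari_url(sandbox_ip: str, port: int = AMBARI_PORT) -> str:
    """Return the dashboard URL for a given Sandbox IP address."""
    return f"http://{sandbox_ip}:{port}"


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return False


# Placeholder address: substitute the IP noted from the confirmation screen.
print(ambari_url("192.0.2.10"))
# Example check once the VM is running:
#   port_is_open("<Sandbox-IP-address>", AMBARI_PORT)
```

A result of False right after bootup may simply mean that the Ambari service is still starting.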
3 Configuring Remote Shell Access and File Transfer for the Sandbox VM
After you have downloaded and installed MobaXTerm, start it and select Session, then select SSH to create
a new SSH session for connecting to the Sandbox VM.
Set up the session using the Sandbox IP address and port 2222, and log in with the root account (i.e.
username: root).
When done, click OK, and enter the following when prompted for the password: hadoop
Select Yes to save the password for this session.
You will then be prompted to change the password. Type in the default password (hadoop) and enter a
new one (hortonworks) twice to confirm. You can choose another password if you wish, but please
ensure you remember it for the remaining lab sessions as this is the password for the root / superuser
account.
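The session settings used above (Sandbox IP address, port 2222, user root) correspond to an ordinary OpenSSH invocation, which may be useful on a host without MobaXTerm. The sketch below only builds and prints the command; the IP address is a placeholder.

```python
import shlex
from typing import List

SANDBOX_SSH_PORT = 2222  # SSH port quoted in the tutorial


def ssh_command(sandbox_ip: str, user: str = "root",
                port: int = SANDBOX_SSH_PORT) -> List[str]:
    """Build an OpenSSH command equivalent to the MobaXTerm session settings."""
    return ["ssh", "-p", str(port), f"{user}@{sandbox_ip}"]


# Placeholder address: substitute the IP noted from the confirmation screen.
print(shlex.join(ssh_command("192.0.2.10")))
print(shlex.join(ssh_command("192.0.2.10", user="maria_dev")))
```

The password prompt and the forced password change for root behave the same way regardless of which SSH client is used.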
Right-clicking in an active session window in MobaXTerm gives access to a variety of functions
that you will find useful in the later lab sessions. These include copying and pasting information between
the session window and text editors on the Windows host system, saving an archive of the commands typed
in the shell to a file, recording a macro of typed commands for future replay, and increasing or decreasing
the font size.
Exit the shell by typing exit and pressing Return; this closes the tab.
Once you are logged in, you should be able to view the contents of the current directory in the Linux VM
(/root) in the left tab. The SFTP functionality allows you to transfer files between the Linux VM and
Windows by dragging and dropping, and also provides other related remote file access, control and
navigation functionality via the toolbar icons on the left tab.
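The drag-and-drop transfer described above has a command-line counterpart in scp, which uses the same SSH port. The sketch below builds the commands for copying a file to and from the VM; the file name data.csv and the IP address are hypothetical placeholders. Note that scp takes the port as an uppercase -P, unlike ssh.

```python
import shlex
from typing import List


def scp_upload_command(local_path: str, sandbox_ip: str, remote_dir: str = "/root",
                       user: str = "root", port: int = 2222) -> List[str]:
    """Build an scp command that copies a host file into a directory on the VM."""
    return ["scp", "-P", str(port), local_path, f"{user}@{sandbox_ip}:{remote_dir}/"]


def scp_download_command(remote_path: str, sandbox_ip: str, local_dir: str = ".",
                         user: str = "root", port: int = 2222) -> List[str]:
    """Build an scp command that copies a file from the VM into a host directory."""
    return ["scp", "-P", str(port), f"{user}@{sandbox_ip}:{remote_path}", local_dir]


# Hypothetical file name and placeholder IP address, for illustration only.
print(shlex.join(scp_upload_command("data.csv", "192.0.2.10")))
print(shlex.join(scp_download_command("/root/data.csv", "192.0.2.10")))
```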
Note that you can open multiple shell sessions to the Sandbox VM with the same or different user
accounts; this will facilitate navigation in different directories and working with different applications
simultaneously.
4 User Account and Passwords
There are 4 user accounts (maria_dev, raj_ops, holger_gov and amy_ds) already present in
the Sandbox with different administrative rights and roles (see Section 3 in
http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox). The
password for each of these accounts is the same as its username. You will use them to log in to the main
Ambari dashboard.
We will need to set the password for the Ambari admin account, as we will be using it in a later
lab exercise. Log in using the root session created earlier and type:
The ambari-server will then restart and do a database consistency check. This will take some time to
complete.
For most of the lab exercises that follow, we will use the maria_dev account, which reflects the
developer role in a big data application pipeline and provides access to the relevant
Hadoop components such as Hive, Pig, Falcon, Oozie and Spark.
Create 3 more SSH sessions in MobaXTerm for connecting to the Sandbox VM with this username
and log in with these sessions (the password is also maria_dev).
You can rename these sessions to make them clearer if you wish.