Setting up Hortonworks Sandbox using Oracle VirtualBox

1 INTRODUCTION
2 UNZIPPING THE SANDBOX OVA FILE IN VIRTUALBOX
3 CONFIGURING REMOTE SHELL ACCESS AND FILE TRANSFER FOR THE SANDBOX VM
4 USER ACCOUNT AND PASSWORDS

1 Introduction
Hortonworks Sandbox (http://hortonworks.com/products/sandbox/) is a portable container holding a
complete installation of the latest version of Hadoop and its ecosystem components, in particular
those bundled with the Hadoop distribution that Hortonworks provides: the Hortonworks
Data Platform (HDP) (http://hortonworks.com/products/). It provides full control over and access to all
aspects of the Hadoop environment without time-consuming administrative tasks such
as installing individual components or deploying to a cloud service. It is well suited to experimenting
with features of the Hadoop ecosystem components before employing them in a real-life big data
application, and thus serves as a good platform for learning about Hadoop.

This lab tutorial will guide you through the process of setting up the Hortonworks Sandbox. It is derived
from the online version available at:
http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/

2 Unzipping the Sandbox OVA file in VirtualBox


There are 3 distributions of the Hortonworks Sandbox available at this point in time
(http://hortonworks.com/downloads/#sandbox): 2 are Open Virtual Appliance (OVA) files that contain VM
images for the VirtualBox and VMware virtualization platforms, while the 3rd is a Docker wrapper. For this
workshop, we will be using VirtualBox to run the VM containing the Sandbox.

Download the OVA file for the current version of HDP (HDP_xx.xx_virtualbox.ova). Determine a
suitable directory on your machine to unzip the OVA file into: you should ideally have around 20GB of free
space on the partition for this directory to hold the VM disk image files contained within the OVA file.
Start Oracle VirtualBox and set this directory location (File -> Preferences):
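
The free-space recommendation above can be checked before importing. The following is a minimal Python sketch that is not part of the original tutorial: the function name has_enough_space and the example path are hypothetical, and only the standard library is used.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def has_enough_space(path, required_gib=20):
    """Return True if the partition holding `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * GIB


if __name__ == "__main__":
    # Hypothetical location; replace with the directory chosen for the VM files.
    vm_dir = "."
    free_gib = shutil.disk_usage(vm_dir).free / GIB
    print(f"Free space: {free_gib:.1f} GiB, enough for the Sandbox: {has_enough_space(vm_dir)}")
```

The 20 GiB default mirrors the figure suggested in the text; raise it if you plan to load large data sets into the VM.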

Victor Tan 2017 Do not distribute without permission


Next, choose File -> Import Appliance, and open the OVA file that you downloaded earlier:

Keep the default settings:

The import process will unzip the OVF package structure in the OVA file and will take some time to
complete. The Hortonworks Docker Sandbox will be shown as an active item in the VirtualBox main view
upon successful completion.

The VM disk image files (*.vmdk) for the Hortonworks Sandbox should now be present in the
destination directory you designated earlier:

Before starting the VM, ensure that you have allocated sufficient host resources to it so
that it can run smoothly. Right-click on the Hortonworks Sandbox item in the left panel and choose
Settings:

Ensure that a minimum of 8GB RAM and 4 CPUs are allocated to the VM. You can allocate more
memory and processors for faster operation if you have enough resources to spare on your host
OS.
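
The import and the resource settings can also be applied from the command line with VirtualBox's VBoxManage tool instead of the GUI. The sketch below only builds and prints the command lines so that they can be inspected before anything is run. The VM name is an assumption (use whatever name `VBoxManage list vms` reports on your machine), the OVA file name is the placeholder used earlier in this tutorial, and the helper function names are hypothetical.

```python
import shlex
import subprocess


def import_command(ova_path):
    """Command line to import an OVA file as a new VM."""
    return ["VBoxManage", "import", ova_path]


def resources_command(vm_name, memory_mb=8192, cpus=4):
    """Command line to set RAM (in MB) and CPU count for a powered-off VM."""
    return ["VBoxManage", "modifyvm", vm_name,
            "--memory", str(memory_mb), "--cpus", str(cpus)]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Both names are placeholders; substitute your own file and VM names.
    run(import_command("HDP_xx.xx_virtualbox.ova"))
    run(resources_command("Hortonworks Docker Sandbox"))
```

Keeping dry_run=True by default means nothing changes on the host until you have confirmed the printed commands; the defaults of 8192 MB and 4 CPUs match the minimum stated above.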

Highlight the Hortonworks Sandbox item and click Start to run the VM. The guest OS (CentOS Linux 7) will
take a couple of minutes to boot up.

A successful bootup should complete with the confirmation screen shown below. The Sandbox
is exposed to the host OS (Windows) at a particular IP address. Note down this IP address as we will
be using it to connect to the Sandbox via a browser in the host OS as well as to configure the MobaXTerm
client.

Open a browser and point it to the specified address and port number (e.g.
http://127.0.0.1:8888/), from which you can then access Ambari (Launch Dashboard) or view
other services in the HDP environment (Quick Links).

Click on Launch Dashboard and at the login prompt, enter the following for both the username and
password: maria_dev

You should now be able to access the main Ambari dashboard. Ambari is an open source management
platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. It provides a GUI
that simplifies interaction with many key components of the Hadoop ecosystem, and we will
be using it in the lab sessions that follow.

You can also access the main Ambari dashboard directly by typing: http://Sandbox-IP-address:8080 into
a browser tab.
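
If the browser cannot reach either page, it helps to check whether the two ports are reachable from the host at all. This is a minimal sketch, assuming the Sandbox is at 127.0.0.1 as in the example URL above; the function name port_open is hypothetical.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    sandbox_ip = "127.0.0.1"  # assumed; use the address shown on the VM's boot screen
    for name, port in [("welcome page", 8888), ("Ambari", 8080)]:
        state = "reachable" if port_open(sandbox_ip, port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

A port that is not reachable shortly after boot often just means the services inside the VM are still starting, so it is worth retrying after a few minutes.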

3 Configuring remote shell access and file transfer for the Sandbox VM
To open a shell in the Sandbox VM for typing Linux CLI commands, we will need to use an SSH client. The
best-known open source SSH client for Windows is PuTTY (www.putty.org). We will also need an
SFTP and SCP client to transfer files between the host OS (Windows) and the Linux file system of the
Sandbox VM. There are several available; a popular one is WinSCP
(https://winscp.net/eng/download.php). For the lab sessions in this workshop, we will use MobaXTerm
instead. This provides all the protocol functionality that we require (SSH, SFTP, SCP), a tabbed SSH
client interface as well as a variety of other network tools that greatly simplify remote Linux/Unix
administration. A free version is downloadable from
http://mobaxterm.mobatek.net/download.html

After you have downloaded and installed MobaXTerm, start it and select Session, then select SSH to create
a new SSH session for connecting to the Sandbox VM.

Set up the session using the Sandbox IP address and port 2222, and log in with the root account (i.e.
username: root)
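
The same connection can be made with a command-line OpenSSH client, which is a quick way to confirm that the address and port are right. The sketch below only assembles the command; it assumes an ssh executable is available on the host and that the Sandbox is at 127.0.0.1, and the helper name ssh_command is hypothetical.

```python
import shlex


def ssh_command(user, host, port=2222):
    """Command line for an interactive SSH login to the Sandbox VM."""
    return ["ssh", "-p", str(port), f"{user}@{host}"]


if __name__ == "__main__":
    # 127.0.0.1 is an assumed address; use the one noted at boot.
    print(shlex.join(ssh_command("root", "127.0.0.1")))
```

Run as a script, this prints `ssh -p 2222 root@127.0.0.1`, which can be pasted into a terminal on the host.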

You can customize the settings for the session to make working in the remote shell easier. Some of the
parameters you might wish to change are the font size and type, and the background and foreground colors
of the shell.

When done, click OK, and enter the following when prompted for the password: hadoop
Select Yes to save the password for this session.

You will then be prompted to change the password. Type in the default password (hadoop) and enter a
new one (hortonworks) twice to confirm. You can choose another password if you wish, but please
ensure you remember it for the remaining lab sessions as this is the password for the root / superuser
account.

You are now logged in to the Sandbox VM and can interact with it via the Linux terminal shell.
Determine the current version of Hadoop installed on the Sandbox by typing at the shell:
sandbox-version

Right-clicking in an active session window in MobaXTerm gives you access to a variety of functions
that you will find useful in the later lab sessions. These include copying and pasting text between
the session window and text editors on the Windows host system, saving the commands typed
in the shell to a file, recording a macro of typed commands for future replay, and increasing or decreasing
the font size.

Exit the shell by typing exit, then press Return to close the tab.

Double-click the saved session in the left tab to reopen it. You will be prompted for the new
password; enter it and choose to update the saved password.

Once you are logged in, you should be able to view the contents of the current directory in the Linux VM
(/root) in the left tab. The SFTP functionality allows you to transfer files between the Linux VM and
Windows by dragging and dropping, and also provides other related remote file access, control and
navigation functionality via the toolbar icons on the left tab.
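
File transfers can also be scripted rather than dragged and dropped. The sketch below assembles an upload command for an OpenSSH scp client, which is assumed to be available on the host; note that scp takes the port with an uppercase -P. The file name data.csv, the default address 127.0.0.1 and the helper name are hypothetical.

```python
import shlex


def scp_upload_command(local_path, remote_dir, user="root", host="127.0.0.1", port=2222):
    """Command line to copy a local file into a directory on the Sandbox VM."""
    return ["scp", "-P", str(port), local_path, f"{user}@{host}:{remote_dir}"]


if __name__ == "__main__":
    # data.csv is a made-up file name used only for illustration.
    print(shlex.join(scp_upload_command("data.csv", "/root/")))
```

Swapping the last two arguments (remote path first, local path second) gives the corresponding download command.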

Note that you can open multiple shell sessions to the Sandbox VM with the same or different user
accounts; this will facilitate navigation in different directories and working with different applications
simultaneously.

There is also a client running on the VM which provides remote shell access via a browser. Open a new
tab in your browser, point it to http://sandbox-IP-address:4200 and enter the username
and password in the usual manner. However, you should use the tabbed SSH client in MobaXTerm as far
as possible, as it provides a lot of additional functionality that is not available here.

4 User account and passwords

There are 4 user accounts (maria_dev, raj_ops, holger_gov and amy_ds) already present in
the Sandbox with different administrative rights and roles (see Section 3 in
http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox). The
password for each of these accounts is the same as its username. You will use them to log in to the main
Ambari dashboard.

We will need to set the password for the Ambari admin account as we will be using it in a later
lab exercise. Log in using the root session created earlier and type:

[root@sandbox ~]# ambari-admin-password-reset

When prompted, set the new password as: admin

Please set the password for admin:
Please retype the password for admin:

The ambari-server will then restart and do a database consistency check. This will take some time to
complete.

The admin password has been set.
Restarting ambari-server to make the password change effective...

Using python /usr/bin/python
Restarting ambari-server
Using python /usr/bin/python
Stopping ambari-server
Ambari Server stopped
Using python /usr/bin/python
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
No errors were found.
Ambari database consistency check finished
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.
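
Once the server is back up, the new admin credentials can be checked without opening the browser. The sketch below assumes the Sandbox answers at 127.0.0.1:8080 and queries Ambari's REST endpoint /api/v1/clusters using HTTP Basic authentication; the helper names are hypothetical and the address should be replaced with your own Sandbox IP address.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def ambari_login_ok(base_url, username, password, timeout=10):
    """Return True if Ambari answers 200 for these credentials, False otherwise."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/clusters",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (such as a rejected password), refused
        # connections and timeouts; urllib's errors derive from OSError.
        return False


if __name__ == "__main__":
    # Assumed address and the admin password set in the step above.
    print(ambari_login_ok("http://127.0.0.1:8080", "admin", "admin"))
```

A False result does not distinguish a wrong password from a server that is still starting, so wait for the restart to finish before concluding that the reset failed.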

For most of the lab exercises that follow, we will use the maria_dev account, which reflects the
developer role in a big data application pipeline and provides access to the relevant
Hadoop components such as Hive, Pig, Falcon, Oozie and Spark.

Create 3 more SSH sessions in MobaXTerm for connecting to the Sandbox VM with this username
and log in with these sessions (the password is also maria_dev).

You can rename these sessions to make them clearer if you wish.

In the later lab exercises, it will be useful to have several sessions open simultaneously so that you can
stay in a different directory in each shell and type the relevant commands for each location
without needing to constantly change directories, as you would if you
were working through a single shell.
