Relational Database SQL to Java Map Reduce, and Back

Ben Filippo, Rick Shingu, Nick Zarzosa, Dean Cowart


Client: Naval Postgraduate School
Mentor: Arijit Das
Overview

Project Objective

The goal of our project is to provide an
affordable solution for consolidating large
amounts of data. While the currently available
methods work well, most are not cost-effective.

While Sqoop is an open-source solution that is
compatible with the Hadoop Distributed File
System, it is clearly not a perfect solution when
compared with its payware counterpart.

Our approach is to implement a working man's
solution to a corporate-sized problem. Sqoop is
an open-source application that we use to
address the problem of operating costs.

- Use of Sqoop as a freeware alternative.
- Further automation of the porting process.


There were two solutions at our disposal:

Our Approach

Several former CSUMB students are currently
working on this project with NPS. Most of the
project is already implemented, but it needs to
be expanded upon. We will follow NPS guidelines
and work cooperatively to complete the tasks
that NPS needs.

As contributors, we will attempt to create a
cost-effective solution with Apache Sqoop, as
this was not successfully implemented in the
past. The earlier attempt failed because data
went missing during the transfer. Along the way,
we have documented the procedures and steps
taken for future use.

NPS uses Hadoop to take advantage of commodity
hardware (existing hardware) and avoid buying
expensive specialized hardware for a distributed
file system. Using Sqoop is necessary to keep
costs to a minimum while leaving functionality
unchanged.

Future Work
We will provide documentation of the process
and methods that we used while applying Sqoop
as a solution for connecting Hadoop with
relational databases. This documentation will
consist of instructions on how to run the
process, the pitfalls we encountered with this
open-source solution, and other considerations
that our group examined.
- Detailed comparisons between current and
alternative methods.

- Lossless transfer between file system and
database.

Our answer works through two major concepts:
the reduction of a database into the Hadoop
Distributed File System (HDFS), and filling a
table from the HDFS without losing the data's
integrity.
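
The two directions described above correspond to Sqoop's `import` and `export` tools. As a minimal sketch, the Python helpers below only assemble the command lines for each direction; they do not run anything. The JDBC URL, user name, table names, and HDFS directory are hypothetical placeholders, not values taken from our project.

```python
from typing import List


def sqoop_import_cmd(jdbc_url: str, user: str, table: str,
                     target_dir: str, mappers: int = 4) -> List[str]:
    """Build the argument list for copying one database table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "-P",                          # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,    # HDFS directory that receives the rows
        "--num-mappers", str(mappers),
    ]


def sqoop_export_cmd(jdbc_url: str, user: str, table: str,
                     export_dir: str) -> List[str]:
    """Build the argument list for filling a database table from HDFS."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "-P",
        "--table", table,              # destination table, must already exist
        "--export-dir", export_dir,    # HDFS directory that holds the rows
    ]


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    url = "jdbc:mysql://dbhost.example/demo"
    print(" ".join(sqoop_import_cmd(url, "demo_user", "orders", "/user/demo/orders")))
    print(" ".join(sqoop_export_cmd(url, "demo_user", "orders_copy", "/user/demo/orders")))
```

Building the commands as argument lists keeps them easy to log and to pass to `subprocess.run` later, which is one possible route toward further automating the porting process.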

Background

Solution Design

Our initial approach was to use an Oracle XE
database with Sqoop to transfer data to and
from the Hadoop Distributed File System. After
running into issues with CLASSPATH settings and
drivers, our team chose to take another
approach.

1. Use Oracle's Big Data Loader to bring data
into the Hadoop Distributed File System.
(Payware solution)
2. Utilize the Sqoop software to bring data into
the Hadoop Distributed File System.
(Freeware solution)

Results

We first gathered information from the initial
failed attempt, including screenshots and error
codes. We then gave this information to our
mentor to send to Cloudera support, so that the
original research and project work did not go to
waste. Through collaboration with our mentor,
our team realized that we needed to use a more
compatible database. After extensive research,
we were able to use Sqoop in conjunction with a
MySQL database and successfully connect the
database with the HDFS (Hadoop Distributed File
System).
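
One way to check a round trip for data loss is to compare an order-independent fingerprint of the source table with one taken from the table refilled from HDFS. The sketch below is an illustration under our own assumptions, not a record of the exact check used in the project: `table_fingerprint` is a hypothetical helper, and the rows are assumed to arrive as tuples from whatever database client is in use.

```python
import hashlib
from typing import Iterable, Sequence, Tuple

_MODULUS = 1 << 256  # keeps the running sum the size of one SHA-256 digest


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, int]:
    """Return (row count, order-independent digest) for a stream of rows.

    Each row is hashed on its own and the digests are added together, so
    rows may arrive in any order and never need to be held in memory at once.
    """
    count = 0
    digest_sum = 0
    for row in rows:
        # repr() keeps None distinct from the string 'None'; the unit
        # separator keeps ('ab', 'c') distinct from ('a', 'bc').
        encoded = "\x1f".join(repr(value) for value in row).encode("utf-8")
        row_digest = int.from_bytes(hashlib.sha256(encoded).digest(), "big")
        digest_sum = (digest_sum + row_digest) % _MODULUS
        count += 1
    return count, digest_sum


if __name__ == "__main__":
    # Hypothetical rows standing in for a source table and its round-trip copy.
    source = [(1, "alice", 9.5), (2, "bob", None)]
    round_trip = [(2, "bob", None), (1, "alice", 9.5)]  # same rows, new order
    print(table_fingerprint(source) == table_fingerprint(round_trip))
```

Adding the digests rather than XOR-ing them means a duplicated row changes the fingerprint instead of cancelling out. The comparison is only meaningful if both sides are read through the same client, since different drivers may represent the same value differently.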

Autobiographical Info

Ben Filippo
Network and Security Concentration.
Semi-professional High Fiver
Video Game Enthusiast
Rick Shingu
Network and Security Concentration.
Photographer | Musician
Nickolas Zarzosa
Software Engineering Concentration.
Video games and tabletops
Dean Cowart
Network and Security Concentration.

Test Pools:
~5 GB (no data loss)
~200 GB (no data loss)

- SQL database configured.
- Connection between databases and file
system established.
- Data imported and exported between
databases (no data loss).
- Testing database with larger data for
integrity.

Acknowledgements

Our group would like to thank:


- CSUMB for giving us this opportunity.
- Arijit Das for assisting us every step of the
way.
- Sathya Narayanan for project assistance.
- The internet, for its endless resources.
