BIG DATA FUNDAMENTALS – An Introductory 0

Big Data Fundamentals Final Report

Brian Brooks, Devorah Akhamzadeh, Seema Khan

California State University Monterey Bay

CST 499

June 17, 2018


Executive Summary

This project responds to the growing industry demand for big data skills among

graduates of Bachelor of Science in Computer Science (BSCS) programs. Big data

is a growing field, and any BSCS program should offer such a course as an

option so that students gain a complete understanding of the landscape of technologies

available to them when they graduate. Currently, the available courses do not meet the needs of

BSCS students for several reasons: they are not designed for the online learning modality, they are based on

a standard semester or quarter system, and they do not take into account the unique needs of working students on an eight-week

schedule. As of our cohort, the team had not found any course available to the

students that meets this critical need. This introductory course, “Big Data Fundamentals,”

is intended to meet the needs of future BSCS online students. It will provide a thorough understanding

of the concepts in the big data field, the technologies used and the skills needed to be successful

in careers in the field.


Table of Contents
Executive Summary
Table of Contents
Introduction
Project Goals and Objectives
Background and Environmental Scan
Stakeholders and Community
Methodology
Experimental Project and Design
    Design Details
Ethical and Legal Considerations
Informational Documents
Final Deliverables
Timeline and Budget
    Timeline
    Project Budget
Usability Testing and Evaluation
    Phase One: In-process testing and evaluation
    Phase Two: Internal functionality testing
    Phase Three: External Testing and Content Evaluation
    Phase Four: Immediate changes
    Phase 5: Long-term changes
Final Implementation
Course Website Screenshots
    Website URL
Conclusion
Appendix A
    Module Evaluation Form
Appendix B
    Team Members
Appendix C
    Usability and Testing Report-Summary
Appendix D
    Focus Group Results

Big Data Fundamentals

An Introductory Course for Big Data and Related Technologies

Introduction

Big data is a growing field, not merely hype or a buzzword used by individuals and corporations to attract

attention. Big data has revolutionized our lives in unimaginable ways,

and it has become one of the fastest growing industries across the globe. Big data skills have

become some of the most sought-after in the high-tech industry. Corporations and academia

are pursuing ways to address this need by developing courses and publishing papers on

the latest and most cutting-edge technology related to the field.

However, before corporations and academia focus on the deep technical

aspects, it is important to educate individuals who are new to the field about the basic concepts

and technologies related to Big Data. Equally important is to educate individuals about the

resources available to learn and practice these skills, so that they do not become overwhelmed

and discouraged from pursuing careers in the field. Given the ubiquitous need for “Big Data”

skills, it is critical that every student enrolled in an undergraduate program in a technical

field, whether IT or computer science, has such a course available through the program.

However, before we delve further into what is available and what is not and the quality of

resources available, we need to define “Big Data”. According to Gartner, Big Data is defined as

“high-volume, high-velocity and/or high-variety information assets that demand cost-effective,

innovative forms of information processing that enable enhanced insight, decision making, and

process automation.” Forbes magazine defines it as “a collection of data from traditional and

digital sources that represent a source for ongoing discovery and analysis”.

Big data is playing a key role in almost every field including healthcare, sports, retail,

entertainment, policy making, business management, and social media. Organizations are using

analytics to harness data in order to identify new opportunities, run more effective operations, reduce

costs, and improve customer satisfaction. Alistair Croll, author of Lean Analytics: Use Data

to Build a Better Startup Faster, casts the ability to make sense of data as a superpower:

“With an abundance of information, knowing what to pay attention to is a superpower.” Clearly,

Big Data has become that magical power that is not only at the forefront of all the industries

mentioned but it is also at the forefront of the top ten tech skills in demand. Looking at the graph

below from Google Trends, it is evident that interest in data has grown exponentially; interest in

the topic of big data continues to grow.

Bentley University commissioned a study to find which skills are growing in demand. By

looking at millions of job listings posted on more than 40,000 online job sites, jobs analytics firm

Burning Glass determined which skills saw the biggest increases in demand when comparing

2011 to 2015. Based on the information from the study and recent job market reports, Forbes

published a report, Technical Skills with the Biggest Increases in Demand.

According to the report, demand for individuals with Big Data skills increased by 3,977%,

the biggest increase compared to other technical skills, including

programming languages, data analytics and visualization, Apache Hadoop, etc. To be successful,

individuals are expected to have a solid grounding in computer science, be comfortable with

numbers, have an in-depth understanding of particular analytic models and algorithms, and have

an overall idea of different big data visualization and analytics tools. According to an article

published by William Fry in Forbes magazine, the vast majority of companies are making use of

big data as is evident from the chart below.

Given all this information, it is critical that every BSCS program provides a course that

covers all these concepts and is a part of the learning path so that students gain Big Data

knowledge and are well prepared for jobs in this industry. Every field needs to understand big

data and how it can help them do better in their domain – the possibilities are endless when it

comes to the impact and potential of big data in every possible domain: healthcare, financial,

technical, industrial internet of things, etc.

Our research explored this idea of learning about Big Data technologies further.

We examined what options are available to students for Big Data

learning and job skills preparation and, where such options exist, whether these courses and resources provide the

necessary information to better prepare students for Big Data and related jobs.

Furthermore, we explored whether these courses are suited and well aligned to the online learning

environment.

After analyzing that information, we designed a “Big Data Fundamentals” course that has

the potential to better prepare BSCS students for Big Data jobs. The final course layout was

designed with the specific needs of the industry in mind, and to provide BSCS online graduates

with the necessary knowledge, skills, and resources to have successful careers in the Big Data

field. We also explored ways to ensure that this course allows students to always stay in sync

with the current trends in the Big Data field.

Project Goals and Objectives

The goal as mentioned above was to develop a course called “Big Data Fundamentals”. This

is an introductory course in the area of Big Data which will cover many concepts. This course is

intended for individuals who are completely new to the field of Big Data. It will help learners

understand the basics of the Big Data field, increase their knowledge of Big Data, and

understand why Big Data has become one of the fastest-growing and most sought-after skills in many

fields besides technology.


This course will help learners become comfortable with the terminology and the core concepts

behind big data problems, applications, and systems. It will help students understand how they

can apply and use these skills in their current or future career choices.

At the end of this course, learners will be able to:

• Describe the Big Data field with examples of real world big data problems

• Understand the sources of Big Data

• Explain the V’s of Big Data

• Understand how to get value out of Big Data

• Learn about Data Visualization tools

• Learn about programming models used for scalable big data analysis

• Understand aspects of Data management, storage and transfer technologies

• Become familiar with Data protection and distribution philosophies

Background and Environmental Scan

Data analysis has been performed for hundreds of years in the sense that individuals

looked at information, made connections, and arrived at conclusions to meet their needs. These

needs could be to improve their business transactions, manufacturing processes, or create a

service based on the information. The data was collected via feedback, surveys, observation

notes, etc. Businesses have relied on databases, used data collection, and used data analytics

technologies to spot patterns and predict market trends for a long time. Governments use data

collection and analysis for tax purposes, national census analysis, etc. Because there are challenges

related to the costs and the process of analyzing and storing such datasets, these data have been

produced in tightly controlled ways using sampling techniques that limit their scope and size.

However, the data industry has transformed drastically in the past decade with the advent of

big data. The “3 Vs” are the key characteristics that differentiate big data from the data we

knew a decade ago: volume, velocity (speed), and variety. Volume refers to the

sheer amount of data generated from various sources. A shopping transaction now includes all the

items that an individual bought, with details like size, color, etc. This is a massive volume

compared to the final sales receipt that used to be the only data generated by a single transaction.

Another area that has witnessed burgeoning growth is social data and mobile phone data

usage, especially among teenagers. Velocity refers to the speed at which data is generated

and analyzed in real time. People are all too familiar with the sports statistics that are generated

during each play of a basketball game and displayed in real time. Finally, variety refers to the

range of data types coming from different sources, such as financial systems and social networking sites

(Facebook, Twitter, etc.).

In the current era of big data and internet of things, for companies like Google and

Facebook “data is the new currency and with the amounts these companies have they could buy

anyone”. Big data is not limited to shopping, entertainment, and sports statistics. It has led, and can

lead, to unimaginable medical breakthroughs that can treat illnesses and help human beings have

a longer and healthier life. With the tools and products available, efforts are underway to treat

human illnesses at the genetic level. The unprecedented growth of companies like Google,

Facebook, and eBay is a testimony that the big data field is poised for continued growth. There is

a huge need for professionals with big data knowledge who can understand and apply their skills

to create innovative products, services, and tools.

Given all the above background information, it is clear that these companies are

looking to develop solutions to address the need, to educate their employees, and to

give back to the community in the form of open source projects. A number of companies have

developed various solutions using big data and related technologies. According to a study done

by McKinsey, “By the end of 2018, US will face a 50% to 60% gap between requisite demand

and supply of analytic talent” and big data is the fundamental skill needed for such individuals to

fill that critical demand.

To fully comprehend and appreciate these solutions and services, it is important to

understand a few technical terms (Hadoop, types of data) and historical background related to the

developments in the big data field. Data types refer to three different types of data: structured,

unstructured, and semi-structured data. In very simple terms, structured data is organized

information that is stored according to specific rules and follows a data model, like data stored in

relational databases, Excel spreadsheets, etc. Semi-structured data is organized data that does not

follow a rigid data model but carries its own tags or markers, such as a JSON or XML

document. Unstructured data, such as the free-form text of a Word document or a loose collection of facts and dates, is

information that is not stored according to a predefined data model or rules. Hadoop is an open-source

framework that combines a distributed file system with a MapReduce application development framework

and provides the ability to store and process massive amounts of data. One of the most widely used

programming models for processing big data is MapReduce, “a batch-oriented parallel computing

model”.
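
To make the MapReduce model described above more concrete, the following is a minimal single-machine sketch in Python. It is not Hadoop code. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample sentences are assumptions made for illustration; a real Hadoop job would distribute the same three steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = []
    for document in documents:
        # On a cluster, each document could be mapped on a different node.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical sample input, invented for illustration.
sample_documents = ["big data is big", "data is everywhere"]
counts = word_count(sample_documents)
```

Because each document is mapped independently and each word is reduced independently, both steps can in principle run in parallel, which is the property that lets the model scale to very large data sets.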

Internet companies like Google, Yahoo, and Facebook developed these technologies because they realized

that they had petabytes of structured, semi-structured, and unstructured data about their users and

the usage of their products. In order to extract value out of this large amount of data, they

decided to create open source projects that would encourage people to find solutions to this

problem. Google published academic papers on MapReduce, which define MapReduce as a


programming model and an associated implementation for processing and generating large data

sets. Yahoo created an open source product, Hadoop, based on information published in

Google’s research papers. Facebook created a product, Hive, which provides a SQL interface

over Hadoop. Hive interprets unstructured data as RDBMS tables, allows developers to write

SQL queries over the data, and translates the queries into MapReduce jobs to produce the desired

results. Many corporations are leading efforts to build big data products and services. Teradata,

IBM Big Data Analytics, MongoDB, DataStax Big Data, Cloudera, Amazon Web Services, and

Splunk are a few of the key players in this industry. Each company is focused on some aspect of

big data technology, but none has developed a complete platform for big data solutions.

Most of them have been working on one specific aspect of big data and developing it into a

robust product or service. For example, Splunk has spent the last few years doing research and

development in the area of machine data. Their product focuses on producing real time analytics

to identify patterns and trends in the machine data. It works with Hadoop and NoSQL data for

analysis and visualization of data that is unstructured.
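
The idea behind a SQL interface such as Hive, where a declarative query is translated into map and reduce steps, can be sketched on a small scale. The example below is an illustration under stated assumptions: Python's built-in sqlite3 module stands in for Hive (it does not run on Hadoop), and the page_views table, the sample records, and the views_per_page function are hypothetical. It shows the same question answered once as a SQL query and once as explicit map and reduce steps.

```python
import sqlite3
from collections import defaultdict

# Hypothetical page-view records, invented for illustration: (visitor, page).
records = [("ana", "home"), ("ben", "home"), ("ana", "cart"), ("ana", "home")]

# 1. Declarative view: load the records into a table and ask a SQL question.
#    sqlite3 is only a stand-in for Hive here.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE page_views (visitor TEXT, page TEXT)")
connection.executemany("INSERT INTO page_views VALUES (?, ?)", records)
sql_result = dict(
    connection.execute(
        "SELECT page, COUNT(*) FROM page_views GROUP BY page"
    ).fetchall()
)
connection.close()


# 2. The same question written as explicit map, shuffle, and reduce steps.
def views_per_page(rows):
    grouped = defaultdict(list)
    for _visitor, page in rows:
        # Map emits (page, 1); the shuffle groups the ones by page.
        grouped[page].append(1)
    # Reduce sums the ones for each page.
    return {page: sum(ones) for page, ones in grouped.items()}


mapreduce_result = views_per_page(records)
```

Both approaches produce the same counts per page. The SQL version states what is wanted, while the second version spells out how the work could be split up, which is the part a tool like Hive generates on the developer's behalf.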

Based on our initial scan of available courses, we saw many options available to address

the need of teaching Big Data but each with some shortcomings. Each course either focuses too

deeply on one topic or misses some key concepts. The courses we reviewed include courses

available on MOOCs like Coursera, Udacity, etc. We also reviewed courses available on

corporate learning websites like Hortonworks, Cloudera, and Simplilearn. Simplilearn has a large

array of technical courses like Hadoop, SAS, Apache Spark, and R. We did not find an

introductory course on Big Data. The second option is Cloudera, which provides very technical

options like the CCA Spark and Hadoop Developer certification, but each course is very expensive.

Hortonworks is another popular name in Big Data, but its courses are expensive and do not carry any

accreditation.

Most of the courses available in academia are based on a quarter or

semester system. After reviewing multiple courses, we identified these key concerns: courses

lacked relevant content, presented concepts that were too technical for beginners, did not provide information

about the latest technology trends, or were not meant for a target audience with varying

backgrounds and experience, like most CSUMB BSCS online students.

Stakeholders and Community

The key stakeholders who stand to benefit from this project are:

1. BSCS online students

2. BSCS online Faculty

The first target audience for this project is BSCS online students at CSUMB. Most of the

students enrolled in this program are very motivated and looking to learn and acquire skills in

Computer Science that can be very helpful in the real world. This perspective comes from the

fact that the majority of them are working full time and have decided to enroll in this program for

multiple reasons:

• Complete unfinished degree

• Change career path

• Grow in current career and BSCS is critical for that

• Extremely motivated

• Eager to learn and use the learning to be more effective in current job

• Learn about the latest technology


• Acquire skills to stay at the cutting edge of technology

The other key stakeholders for this project are the faculty who are teaching various courses in the

BSCS program. They are always looking for ways to improve the program and also include

courses that help enhance the learning experience and improve job prospects for their graduates.

At a personal level, it will also help the faculty understand what the students need and how

students learn best if they have a course designed by students who have been part of the target audience

themselves and understand the challenges of the online learning environment. Besides its focus on the

subject matter, this course will provide the layout for what online students feel is the optimal

model and blueprint of an online course.

Methodology

We started the process by collecting information from students about their interests. We also

gathered information from students to develop a baseline of their understanding and awareness of the field.

The focus was to understand how well these students understand the field

and, if they do, whether they are aware of the resources available, if any, to help them develop the necessary

knowledge and skills in the area.

Based on our personal experience as undergraduate students in the online program and

preliminary inquiry, we learned that individuals feel overwhelmed by the amount of information

available. Often, they need a simple and easy start before delving deeper. Also, sometimes the

information presented in courses is too technical for an introductory course. We collected input

from students and also examined data available from various resources to ensure that these

concerns are addressed.

Finally, we connected with industry experts to ensure that we have ongoing input during the

process. This allowed us to stay on track and ensure that all the information we collected and

used in designing the proposed course was in sync with the current trends of the industry. We

also obtained memberships and attended meetups and conferences related to the field. In some cases,

we were unable to get membership, but the publicly available material still provided valuable

information to guide us through the process.

We relied mostly on open source big data tools, such as Hadoop, High Performance Computing

Cluster (HPCC), Cassandra, and Elasticsearch, to learn and collect information for our

research project.

Experimental Project and Design

Although there are many courses available that provide a lot of information and are intended

to help learners understand the concepts, none of them is designed specifically for our

target audience, which has a unique set of requirements and backgrounds. Because the BSCS program

at CSUMB is on an eight-week schedule, it differs from a quarter or semester system.

The course that we developed is aligned to the eight-week schedule without compromising the

quality of learning and the intent of achieving the objectives.

At this point, the course is more theory- and concept-based. However, we have tried to provide

some hands-on experience that is applicable to the course. The goal is to provide enough

exposure to the learners such that they can learn about the field and be able to extend this

learning on their own. We have developed a course that provides the information, tools, and

resources that will help learners understand Big Data. It will introduce them to

resources available that they can use to become well versed in the field of Big Data.

Design Details

Since the goal is to ensure that students are able to fully understand the concepts covered in the

designated time of eight weeks, we paid specific attention to the material covered in

each module and to the length of the term, which is eight weeks for the online BSCS students.

We also aimed to include relevant readings, videos, and resources for hands-on activities to

ensure that all learning modalities are addressed.

At a high level, the course is divided into the following modules:

Module 1 – What is Big Data?

• Why do we need to learn about Big Data?

• Sources of Big Data

• Characteristics of Big Data

• Dimensions of Scalability

Module 2 – Introduction to Big Data Platform and Data Science

• The Big Data Platform

Module 3 – Big Data and Data Science

• Big Data and Data Science

• Skills for Data Scientists

• The Data Science Process

Module 4 – Introduction to Hadoop

• Introduction to Hadoop Framework


Module 5 – Introduction to Hadoop

• Understanding MapReduce

Module 6 – Big Data Advanced Topics: Networks

• Data management

Module 7 – Big Data Advanced Topics: Storage

• Storage

• Transfer technologies

Module 8 – Big Data Advanced Topics: Data Protection

• Data protection

• Distribution philosophies

Ethical and Legal Considerations

During the production of our project, we did not have to deal with any ethical or legal issues,

and we did not face any such issues when the project was implemented.

However, we utilized some resources and tools from open source technologies like:

• Hadoop

• Spark

• Databases

o MongoDB

o Cassandra

Other resources that we referred to and included in the course:


• Videos: YouTube videos, videos available on academic websites like Khan Academy

Informational Documents

• Images

• Links to textbooks, etc.

Open source resources have their specific licenses and usage restrictions. To ensure that we

complied with their usage guidelines, we reviewed those licenses and usage restrictions and

confirmed that we met all the requirements. Also, some of the videos, online content, and images have

copyright and permissions guidelines. Whenever we used any such resource, we provided

details of the product or service so that, to the best of our knowledge, there are no ethical or legal violations,

even though this project is for educational purposes only.

Final Deliverables

The final deliverable is a course in the format that is used by most CSUMB BSCS courses

hosted on iLearn. For each week, we designed a module with a topic and related videos,

readings, etc. The team worked together to design some form of assessment for each week. Our

original plan was to include formative assessments for each sub-topic within a module.

However, due to time restrictions, we were not able to include formative assessments for

each sub-topic within each module. Hence, we made sure that each module has a summative

assessment or check-for-understanding assessment with key elements that need to be evaluated

for that week.


Timeline and Budget

Timeline

BIG DATA FUNDAMENTALS TIMELINE

Week 1 • Research Big Data concepts and related technologies
       • Reach out to SMEs

Week 2 • Develop Outline/Modules
       • Each team member takes ownership of module

Week 3 • Develop modules FIRST DRAFT
       • Team meeting: check in/feedback

Week 4 • Deliver SECOND DRAFT
       • Team meeting: check in/feedback

Week 5 • Deliver FINAL DRAFT

Week 6 • Testing/Updates

Week 7 • Testing/Updates

Week 8 • Release final version of the course

In terms of time allocated, we were able to deliver more than we had planned, but the work exceeded

the time we had allocated as individuals and as a team. Our initial plan was to spend about

100-120 hours total. However, given the updates and changes we had to make based on the results of

usability testing, our team spent over 40 extra hours to achieve the desired results.

Project Budget

There was no expected cost since the team relied on using open resources and personal

expertise to research and design the modules.


Usability Testing and Evaluation

Usability testing and evaluation of our course material was done in two phases, internal and

external testing. Each of our team members performed basic internal review of their own

material, and I performed basic internal testing of functionality. For external testing, I was able

to recruit three of my colleagues from work to review the content and functionality of the site.

From initial testing and their feedback, we were able to make significant improvements in the

short term, and develop ideas for the long-term to make the course more valuable for future

students.

Phase One: In-process testing and evaluation

Phase one of ensuring usability of our content started with the initial development of the

course material. For example, my process started with a bullet-point list of what I wanted to

cover in my particular modules - networking and storage basics and fundamental philosophies of

data management. However, our intent was to develop an introductory course for less

technically-inclined people, so I had to make sure I kept the material as basic as possible while

still communicating the necessary information. We were also encouraged to avoid jargon as

much as possible. With this in mind, I iterated through my list multiple times to determine what

was necessary information and what was too technical for inclusion.

Once that was done, the next step was to write out in long-form the same information

contained in the bullet points. This was again iterative, with constant review of the content for

clarity and correctness.

Once the long-form content was created, I worked to convert the bullet-points into a

presentation format, with changes that followed from the development of the long-form document. I

also began reviewing available online content for addition to the course, primarily in the form of

YouTube videos. With usability in mind, I viewed them with an eye toward entertainment and

simplicity as well as content.

With the initial content creation completed for my modules, I evaluated the documentation I

had developed against the framework developed by my team members and found it wholly

inadequate. In software terms, the content failed initial testing, so I went back and refactored the

code. I rebuilt my slides and long-form document as three separate modules, broken up into

networks, storage and media technologies, and data management and protection. This module

layout led to the creation of three quizzes to assess students’ understanding of the material. My

three modules and their associated content were the last to be added to the site. With those three

modules complete, our site was ready for initial functionality testing.

Phase Two: Internal functionality testing

Phase two of testing consisted of basic functional tests of our course platform, which is a

web site hosted on Google Sites. Our initial release for the site consisted of a main page and 6

modules - “Introduction of Big Data”, “Intro to Big Data Platform”, “Introduction to Hadoop”,

“Advanced Topics: Big Data Networks”, “Advanced Topics: Storage Media and Technologies”,

and “Advanced Topics: Data Protection” - along with presentations, documents, and links to

external information and videos within each module. The site was developed within the

CSUMB.edu ecosystem and I performed initial testing and review while logged in to my

CSUMB.edu account. I verified that all material was accessible, that all links worked, and that

all videos played. I performed minimal review of content in this phase of testing.

Checking the site in “incognito” mode in my browser exposed the first issues to be

fixed. Since the site had been built within the CSUMB.edu environment, it was only open to

CSUMB.edu users. Once this was fixed we found a number of other items that had similar

restrictions, so the three of us went through all of our material and made sure that the

permissions were open to all users.

Phase Three: External Testing and Content Evaluation

For phase three of our testing I recruited three of my colleagues from work. Mark is a

65-year-old male with a master’s degree in computer engineering and over 40 years of

experience in designing systems, including a “Big Data” system, but not as much experience

with the intricacies of networking and storage technologies. Jason is a 45-year-old male with

significant experience as a computer user proficient with applications like Microsoft Office and

Adobe Photoshop but with limited experience in software development, systems design, or

networking and computer components. Cody is a 26-year-old male with an associate’s degree in

computer science; he has experience in software development and he builds and maintains his

own computers, but he has limited exposure to networking and in-depth storage technologies

such as RAID.

Unfortunately, due to issues of schedule and geography I was unable to meet with any

members of the focus group directly as they performed their evaluation, but I did provide them

with a brief explanation of our intent, as well as a description of the goals of the course

assignment. I asked them to evaluate the material for both function - does the site work, can they

access all of the materials - as well as presentation and content. All three of them were asked to

review the material in terms of its applicability and clarity for a less-technical audience.

Cody was able to complete most tasks, but found that there were still permissions issues

that prevented him from accessing some functions and material, such as the assessment for

Module 1. He pointed out that there are what appear to be links to forum discussions, but we

have no forum functionality implemented so the different-colored text simply looks broken.

Some assessment questions were not covered in the course material, one assessment had the

“Name” entry out of order, and some course material looked like it could be condensed and

optimized for better presentation of information. Overall he felt that the material presented was

clear and informative, and that the course was a good introduction to the topic.

Jason felt that the modules could be better constructed to follow the same format of slide

presentations with embedded video and “forum discussion” articles, rather than direct video

embedding and links to external sites. He preferred an independent presentation of the

information to that of a particular company, and also felt that some modules seemed to be a

little too advanced for a novice student. He also noted that the assessments were not consistent

in their design and function, and that the autoplay function of the available Google Slides was

far too fast. Overall, he did feel that much of the material had a good presentation, but that some

of it could use some work to appeal more to the layperson.

Mark was very thorough in his responses. He was very complimentary of the basic style

of the presentations, and felt that the combination of slides, articles, and videos was very

effective in presenting the information. He considered the variety of video and article styles

“refreshing” and appropriate to the material. He liked how the data was broken up into discrete

modules, and felt that the assessments matched up well with the material presented in each

module. He did have minor problems with navigation, both within the Google Slides

presentations and the site as a whole. His largest concern was with better understanding the

audience, timing, and goals of each module, and he felt that some of the modules may be too

technical for a more generalized audience. Overall, he found the presentation “effective and

engaging” and “a great opportunity for shared learning”, but he felt that the depth of the modules

should be reviewed against their intended audience.



Despite not being able to review the material and site design with them in person, I found

their feedback to be incredibly helpful.

Phase Four: Immediate changes

The most significant and immediate change we made was splitting up modules 2 and 3

into four separate modules. This allowed us to better spread out the information therein and

spend more time on the introduction and explanation of the material. Please note that the

comments from our testing group were before this change, so any discussion of modules 2 and 3

applies to modules 2 through 5, and any discussion of modules 4 through 6 applies to modules 6

through 8.

The assessments for modules 1 and 4 were updated immediately following the feedback

received. Module 1’s assessment had its permissions fixed to allow non-CSUMB accounts to

access it, and module 4’s assessment was changed from random presentation to fixed to prevent

the “Name” question from being mixed in with the other quiz questions.

Phase Five: Long-term changes

In the longer term, I would like to revisit the immediate changes with the focus group, as

well as expand on their evaluation of the content and functionality. I would also like to

expand the focus group to a wider range of individuals with more varied backgrounds and

experience levels. I feel that the material presented can be improved, especially in some of the

slide decks, and that the module assessments can be reevaluated both for content and for

consistency of presentation. I think each module needs to be evaluated against a planned 12-20

hour per week requirement, and the material adjusted accordingly. Finally, as technology

continues to change, the materials presented will have to be updated accordingly.



Final Implementation

The original plan was to create four modules, and that is how we built the first prototype. We

continued working on the same plan, but after completing the four modules, we had the peer

review. After several iterations of peer review and brainstorming sessions via email and team

meetings, our team decided to further divide the modules by topic. As the team was

contemplating and working on this strategy, we went ahead and did the focus group and user

testing. It was done by a team of users with a range of experience, from beginners in the technology

field to individuals with over twenty years of experience in software engineering. The strong

feedback and input from the focus group led us to organize the course into eight

modules. As a result, our final implementation consists of eight modules:

1. Introduction to Big Data

2. Big Data Platforms

3. Big Data and Data Science

4. Introduction to Hadoop

5. Introduction to MapReduce

6. Advanced Topics: Networks

7. Advanced Topics: Storage and Media Technologies

8. Advanced Topics: Data Protection

Currently, the entire course is hosted on Google Sites, and it can be transferred to any portal

easily. We chose Google Sites because it allowed ease of collaboration, cross-team

review, easy updates, and access to the revision history of each document. Screenshots of the completed

modules are provided below.



Course Website Screenshots


Homepage

Module 1 – What is Big Data?

Module 2 – Intro to Big Data Platform

Module 3 – Big Data and Data Science

Module 4 – Introduction to Hadoop

Module 5 – Introduction to MapReduce

Module 6 – Advanced Topics: Big Data Networks

Module 7 – Advanced Topics: Storage Media and Technologies

Module 8 – Advanced Topics: Data Protection

Website URL

The course is hosted on Google Sites, and the URL to access the course website is:

https://sites.google.com/a/csumb.edu/big-data-fundamentals/

Conclusion

The need for this project was based on the growing industry demand for big data skills

among individuals graduating with a Bachelor of Science in Computer Science (BSCS)

degree. We believe this course meets the critical need of providing students with a complete

understanding of the entire landscape of technologies in big data. Big data is a growing field, and

any BSCS program needs to provide such exposure and understanding to

learners. The courses currently available do not meet the needs of BSCS students for several

reasons: they are not designed for the online learning modality, they are based on the standard semester or

quarter system, and they do not take into account the unique needs of working students on an eight-week schedule.

Our team’s approach of designing the course around the unique eight-week schedule of the

BSCS online program turned out to be as successful as anticipated. In its current state, with eight

modules containing videos, slides, white papers, and other resources along with forum discussion, the course will provide

a thorough understanding of the concepts in the big data field, the technologies used, and the

skills needed to be successful in careers in the field.

As a team, we learned that each of us has a critical role to play and that, no matter where we

are located, we can overcome challenges if the goals and objectives are clearly set. We

made sure that the work breakdown structure was clearly defined, and we found that regular reminders and

collaborative tools like Google Docs and Google Drive are critical for the

success of any team, and especially for a team that functions in different time zones and different

locations. The project gave us an understanding of what collaboration truly is in a real-world

scenario where teams are distributed across the globe in time, space, culture, and skill

levels.

Although we anticipate that this course will need further review and changes, it sets the tone for

what students need to understand in the big data field and how to achieve that understanding. In the long term,

we hope that this course design, with its eight modules and resources, along with the extensive

usability testing by a diverse range of individuals, will provide the blueprint to extend and build a

“Big Data Specialization” track to meet the needs of the future BSCS and MSCS students.

References

Chaudhuri, S., Dayal, U., & Narasayya, V. (2011). An overview of business intelligence
technology. Communications of the ACM, 54(8), 88.

Dean, J., & Ghemawat, S. (2008). MapReduce. Communications of the ACM,
51(1), 107.

Domingos, P. (2012). A few useful things to know about machine learning. Communications of
the ACM, 55(10), 78.

Lourenço, J. R., Cabral, B., Carreiro, P., Vieira, M., & Bernardino, J. (2015). Choosing the right
NoSQL database for the job: A quality attribute evaluation. Journal of Big Data, 2(1).

Miller, H. J. (2010). The data avalanche is here. Shouldn’t we be digging? Journal of
Regional Science, 50(1), 181-201.

Pybus, J., Cote, M., & Blanke, T. (2015). Hacking the social life of Big Data. Big Data &
Society, 2(2).

Smale, S. (n.d.). The Mathematics of Learning: Dealing With Data. 2005 International
Conference on Neural Networks and Brain.

Teng, P., Li, H., & Zhang, X. (2015). Survey on Visualization Layout for Big Data. Intelligence
Science and Big Data Engineering. Big Data and Machine Learning Techniques, Lecture Notes
in Computer Science, 384-394.

Tsai, C., Lai, C., Chao, H., & Vasilakos, A. V. (2015). Big data analytics: A survey. Journal of
Big Data, 2(1).

Wu, X., Zhu, X., Wu, G., & Ding, W. (2014). Data mining with big data. IEEE Transactions on
Knowledge and Data Engineering, 26(1), 97-107.

Appendix A

Module Evaluation Form

Module/Module Title:

Date:

Below are a series of statements. Please respond by circling the number you feel most reflects
your opinion.

Scale: 5 = Strongly Agree, 4 = Agree, 3 = Neither Agree or Disagree, 2 = Disagree, 1 = Strongly Disagree

The module fulfilled the objectives                                     5 4 3 2 1
The module satisfied my own needs and expectations                      5 4 3 2 1
The content was presented at a level which could readily be understood  5 4 3 2 1
There was opportunity for group work                                    5 4 3 2 1
There was opportunity for individual participation                      5 4 3 2 1
The material presented had practical relevance                          5 4 3 2 1
The module content built on prior learning and experience               5 4 3 2 1
I was motivated to learn                                                5 4 3 2 1
Module videos, handouts & texts helped reinforce learning               5 4 3 2 1
There was a variety of learning material                                5 4 3 2 1



Additional Comments (Please feel free to continue comments overleaf)


Which aspects of the module worked well? __________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
How could the module be improved? _________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
_____________________________________________________________________________
Would you recommend this module to others? If not, please outline your reasons.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Any other comments? ____________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

Name/Signature: (optional) ________________________________

Thank you for taking the time to complete this form.


Your input is an integral part of improving this module.

Appendix B
Team Members

The entire team was involved in creating all documents and proposals. In addition, all members

worked to design the outline and develop an overview of each module. Once that was complete,

each team member took ownership of modules based on their individual interest and expertise.

Finally, all modules were put together and all team members reviewed the entire course and

provided feedback on individual modules, course flow, topic cohesiveness, etc. The division of

labor and roles were evaluated and assigned based on interest and background in subject matter.

The team members with their roles and responsibilities are as follows:

• Brian Brooks

o With a strong background in industrial big data, Brian worked on the Advanced Topics

modules. He provided support as a subject-matter expert (SME), since he has extensive experience

working with industrial data and related technologies.

• Devorah Akhamzadeh

o Devorah has experience in programming and an interest in learning new topics. She

focused on exploring and doing further research on specific topics that would be

useful to add to the course. She was available as a technical resource as needed.

• Seema Khan

o Seema has experience in managing course development and provided overall

program management expertise. She developed a few modules and worked with all

team members to ensure that course development stayed on track and the product

was ready on time.



Appendix C

Usability and Testing Report-Summary

Brian led the effort to conduct the testing and organized a focus group. The focus group

represented a range of experience in the technical field, from an associate's

degree holder to a master's degree holder. This is exactly the audience we are targeting for our course.

The feedback was detailed and very helpful.

Given the time frame we had, our team was pleased with and appreciative of the focus group's

effort to provide such exhaustive, thoughtful, and constructive feedback. Our team plans to

address the concerns and will try to make updates to the modules as time permits. However, we

feel that the course exceeded our expectations, growing from an introductory blueprint

into a comprehensive course that runs from introductory topics to advanced concepts. It might serve as a

course that can be divided into introductory and advanced courses, with Modules 1-3 being the

introductory course and Modules 4, 5, and 6 being the next course. Modules 4, 5, and 6 will allow

learners to develop an in-depth understanding of the field and will help them if they choose to

pursue big data as a career. Eventually, the team decided to divide the course into eight

modules.

Target Audience: Number of Individuals who tested the modules


● 1 Adult male, Mark, (MS in CE, experience in software development)
● 1 Adult male, Jason (Some computer experience, not a developer)
● 1 Adult male, Cody (CS, Associate’s degree)
TASKS:
● Navigate through the module (Note any issues through different topics, videos, links, etc.)
● Respond to the forum discussion (Note any difficulty in understanding)
● Complete the assessment (Note any problem responding to the questions)
● Overall observe and note any points where there is lack of clarity or difficulty

Question: Were they able to complete all of the tasks?

Summary of Response(s):
● Some errors encountered, but most feedback was positive; clear and informative.
● Permissions problems initially prevented focus group members from accessing the material.
● Some slides advance too quickly in “play” mode.
● Some “future” links (forum discussions) are non-functional and could be changed so they don’t
appear broken.
● Module 3 assessment link is broken.
● Module 4 assessment “Name” entry was in the middle of the quiz questions.

Notes:
● We corrected the permissions issues early on.
● Need to edit the publication settings for slide time advancement; 15-30 seconds would
probably work.
● Module 4 assessment was set to randomize questions. Removed the randomization.

Question: Did they get stuck along the way?

Summary of Response(s):
● Some information in the assessments not covered in the lessons/modules.
● Some presentations were too lengthy and could be refined and condensed.
● Best modules had a mix of slides, videos, and text.
● Some modules may be too advanced for the “basic” intent.
● The modules and assessments don’t always follow the same format.
● Could use a little more variety in video content.
● Some videos sent the user to the beginning of the slide deck.
● Some trouble with module navigation.

Notes:
● Need to review content and assessments to look for discrepancies.
● Work on optimizing the presentations.
● Might need to work on simplifying some content for laypersons.
● Need to review the video content for process bugs.
● Need to be clearer on module navigation; perhaps a brief tutorial?

Question: What will you do to improve your project in the short term before the festival?

Summary of Response(s):
● Our team can work together to try to refine our presentations so that they look more like one
coherent course rather than three independent products.
● Need to fix some of the minor issues and web page formatting.
● We need to review the timing for each module - is each module one week?

Question: In the long term?

Summary of Response(s):
● Long term maintenance of this course will require updates as technologies for dealing with Big
Data advance.
● We need to review and refine the depth of some of the modules. Some don’t quite meet our
“non-technical” goal.
● We could add voiceovers to the text pages in the slide decks.

Appendix D

Focus Group Results

Big Data: Usability Testing and Evaluation


Feedback: Cody

Cody is a 27 year old male with significant computer experience. He has an associate’s degree

in computer science, he does some software development at work, and builds and maintains his

own computers. He has limited experience with networking, RAID, and Big Data. This is his

feedback.

Module 1:
- Slides were clear and easy to read, video was descriptive
- Article while informative was quite lengthy for an "introduction to Big Data"
- Page mentions forum but no links are given to this forum.
- At time of this review module requires user to request access from Google account to take the
assessment and because of this I was unable to take the assessment.
- Image below: Looks like it should be a hyperlink to the forum? Doesn't work if it is and if it isn't
why do you change font/color for it?

Module 2:
- Video was informative, articles were thorough but not too lengthy.
- Slides had too little information spread across almost 100 slides. Information for most slides
could be bullet points on a larger more condensed slide of information.

Module 3:
- Good information about MapReduce and Hadoop however the videos did start to get repetitive
and repeat the same information.
- Assessment asks about Daemons but no information was given about Daemons in the module.

Module 4/5/6:
- All 3 modules contained useful information, and little to no errors that prevented me from
accessing any of the content (assessments, slides, videos, links, etc. all worked)
- Videos embedded into the slideshow kept the page neat and condensed.
- Assessment had good questions that pertained to the material. Module 4 assessment seemed to
have the "name" question stuffed in the middle of the assessment.

Big Data: Usability Testing and Evaluation


Feedback: Jason

Jason is a 45 year old male with significant experience as a computer user. He is comfortable

and proficient with applications like Microsoft Office and Adobe Photoshop, and does limited

development-type work in an application called Cimplicity HMI. He is not well versed with the

internal functions of networks or computer systems, but is able to do basic troubleshooting, and

is a user of some Big Data-like systems. This is his feedback.

Brian,

Here's a list of my feedback items:

1) Google Slides play button makes the slides move much too quickly for me to read. You have
to pause and click through manually. Not a huge issue, I know, but something that bugged me.

2) I think all of the modules should be the same as far as structure and layout. I liked to see and
I anticipated seeing the Google Slides on each Module, and was disappointed when some
Modules didn't contain them. I didn't appreciate having to click on a link to another site in order
to view the content or slides. **The only exception to this was "Forum Discussion" items. Having
to click on links to external sites for this kind of information is fine, in my opinion. Having said
that--I definitely appreciated how forum discussion items in Modules 4, 5 and 6 were linked to a
Google Docs document instead of links to websites from external companies. I'd rather see a
top-level summary of the industry rather than a single company's vision.

3) I'm not necessarily satisfied with only YouTube videos as the primary learning content. I could
simply do that on my own. I'm looking to be educated about something. The Modules that have
a mixture of slides and YouTube content made it much easier to learn. Perhaps I want to be
led by the hand when learning the information presented here, but isn't that the point?

4) Given that I might classify myself as a "layperson" when it comes to this content, I was
confused by some of the Modules. They didn't seem to be basic enough for my understanding. I
felt like some (particularly #2-3) of them made a lot of assumptions, while the others walked me
through the whole process of understanding.

Big Data: Usability Testing and Evaluation


Feedback: Mark

Mark is a 65 year old male with significant computer hardware and software development

experience. He has a master’s degree in computer engineering and has spent over 40 years

designing, testing, and maintaining data acquisition systems for jet engine and component

testing. He does not have significant exposure to the inner-workings of subsystems like RAID

or networks, but has been involved with the development of a Big Data management system.

This is his feedback.

Hi Brian:

Thanks for giving me the opportunity to provide review/comments on your team’s “Big Data”
project…..

Apologize in advance for the somewhat disjointed nature of my commentary…..

Format:
I liked the style of presentation, and the fact that all modules followed that same style……..
Each Module followed the same style:
- Title:
- Objectives
  o By the end of this module, you will understand……
  o Point 1
  o Point 2……
- Slides
- Embedded Videos
- Forum Discussion
  o Articles
  o Videos
  o Forum question(s), challenge(s)
- Assessment
