CST 499
Executive Summary
This project responds to the growing industry demand for big data skills among individuals
graduating with a Bachelor of Science in Computer Science (BSCS) degree. Big data is a growing
field, and every BSCS program should make a course on it available to its graduating students.
Currently, the courses available do not meet the needs of BSCS students for several reasons:
they are not designed for the online learning modality, they are based on a standard semester or
quarter system, and they do not take into account the unique needs of working students on an
eight-week schedule. As of our cohort, the team did not see any course available to the
students that meets this critical need. This introductory course, “Big Data Fundamentals”,
will meet the needs of future BSCS online students. It will provide a thorough understanding
of the concepts in the big data field, the technologies used and the skills needed to be successful
Table of Contents
Executive Summary
Table of Contents
Introduction
Project Goals and Objectives
Background and Environmental Scan
Stakeholders and Community
Methodology
Experimental Project and Design
    Design Details
Ethical and Legal Considerations
Informational Documents
Final Deliverables
Timeline and Budget
    Timeline
    Project Budget
Usability Testing and Evaluation
    Phase One: In-process testing and evaluation
    Phase Two: Internal functionality testing
    Phase Three: External Testing and Content Evaluation
    Phase Four: Immediate changes
    Phase 5: Long-term changes
Final Implementation
Course Website Screenshots
    Website URL
Conclusion
Appendix A
    Module Evaluation Form
Appendix B
    Team Members
Appendix C
    Usability and Testing Report-Summary
Appendix D
    Focus Group Results
BIG DATA FUNDAMENTALS – An Introductory 3
Introduction
Big data is a growing field, not hype or a buzzword used by individuals and corporations to
attract attention. Big data has revolutionized our lives in ways unimaginable, and it has also
become one of the fastest growing industries across the globe. It has become one of the most
sought-after skills in the high-tech industry. Corporations and academia are pursuing ways to
address this need by developing courses and publishing papers on the latest and most
cutting-edge technology related to the field.
However, before corporations and academia focus on the deep technical aspects, it is
important to educate individuals who are new to the field about the basic concepts and
technologies related to Big Data. Equally important is to educate individuals about the
resources available to learn and practice these skills, so that they do not become completely
overwhelmed and discouraged from taking up careers in the field. Given the ubiquitous need for
“Big Data” skills, it is critical that every student enrolled in an undergraduate program in a
technical field, whether IT or computer science, has such a course available through the program.
However, before we delve further into what is and is not available and the quality of the
available resources, we need to define “Big Data”. According to Gartner, Big Data is defined as
“innovative forms of information processing that enable enhanced insight, decision making, and
process automation.” Forbes magazine defines it as “a collection of data from traditional and
digital sources that represent a source for ongoing discovery and analysis”.
Big data is playing a key role in almost every field, including healthcare, sports, retail,
entertainment, policy making, business management, and social media. Organizations are using
analytics to harness data and use it to identify new opportunities, run operations more
effectively, reduce costs, and improve customer satisfaction. Alistair Croll, author of Lean
Analytics: Use Data to Build a Better Startup Faster, refers to Big Data as the new superpower
according to his quote,
Big Data has become that magical power that is not only at the forefront of all the industries
mentioned but it is also at the forefront of the top ten tech skills in demand. Looking at the graph
below from Google Trends, it is evident that interest in data has grown exponentially; interest in
Bentley University commissioned a study to find which skills are growing in demand. By
looking at millions of job listings posted on more than 40,000 online job sites, jobs analytics firm
Burning Glass determined which skills saw the biggest increases in demand when comparing
2011 to 2015. Based on the information from the study and recent job market reports, Forbes
According to the report, there will be a 3,977% increase in demand for individuals with
Big Data skills, the biggest increase compared to other technical skills including
programming languages, data analytics and visualization, Apache Hadoop, etc. To be successful,
individuals are expected to have a solid grounding in computing science, be comfortable with
numbers, have an in-depth understanding of particular analytic models and algorithms, and have
an overall idea of different big data visualization and analytics tools. According to an article
published by William Fry in Forbes magazine, the vast majority of companies are making use of
Given all this information, it is critical that every BSCS program provide a course that
covers all these concepts and is part of the learning path, so that students gain Big Data
knowledge and are well prepared for jobs in this industry. Every field needs to understand big
data and how it can help them do better in their domain. The possibilities are endless when it
comes to the impact and potential of big data in every possible domain: healthcare, financial,
Our research objective explored this idea of learning about Big Data technologies further.
We examined what options are available to students for Big Data learning and job skills
preparation and, where such options exist, whether these courses and resources provide the
necessary information to better prepare students for Big Data and related jobs.
Furthermore, we explored whether these courses are suited and well aligned to the online learning
environment.
After analyzing that information, we designed a “Big Data Fundamentals” course that has
the potential to better prepare BSCS students for Big Data jobs. The final course layout was
designed with the specific needs of the industry in mind, so that it provides BSCS online graduates
with the necessary knowledge, skills, and resources to have successful careers in the Big Data
field. We also explored ways to ensure that this course allows students to always stay in sync
The goal, as mentioned above, was to develop a course called “Big Data Fundamentals”. This
is an introductory course in the area of Big Data that covers many concepts. This course is
intended for individuals who are completely new to the field of Big Data. It will help learners
understand the basics of the Big Data field, increase their knowledge of Big Data, and
understand why Big Data has become one of the fastest-growing and most sought-after skills in many other
This course will help learners become comfortable with the terminology and the core concepts
behind big data problems, applications, and systems. It will help students understand how they
can apply and use these skills in their current or future career choices.
• Describe the Big Data field with examples of real world big data problems
• Learn about programming models used for scalable big data analysis
Data analysis has been performed for hundreds of years in the sense that individuals
looked at information, made connections, and arrived at conclusions to meet their needs. These
service based on the information. The data was collected via feedback, surveys, observation
notes, etc. Businesses have relied on databases, used data collection, and used data analytics
technologies to spot patterns and predict market trends for a long time. Governments use data
collection and analysis for tax purposes, national census analysis, etc. Since there are challenges
related to the costs and the process of analyzing and storing such datasets, these data have been
produced in tightly controlled ways using sampling techniques that limit their scope and size.
However, the data industry has transformed drastically in the past decade with the advent of
big data. The “3 Vs” are the key characteristics that differentiate big data from the data we
knew a decade ago: volume, velocity (speed), and variety. Volume refers to the
voluminous data that is generated from various sources. A shopping transaction now records all
the items that an individual bought, including details like size and color. This is a massive volume
compared to the final sales receipt data that used to be generated as the result of a single transaction.
Another area that has witnessed burgeoning growth is social data and mobile phone data
usage, especially among teenagers. Velocity refers to the speed at which data is generated
and analyzed in real time. People are all too familiar with sports statistics that are generated
during each play of a basketball game and displayed in real time. Finally, variety refers to the
range of data from various sources, like data from financial systems, social networking sites
In the current era of big data and the internet of things, for companies like Google and
Facebook “data is the new currency and with the amounts these companies have they could buy
anyone”. Big data is not limited to shopping, entertainment, and sports statistics. It has led,
and can still lead, to unimaginable medical breakthroughs that can treat illnesses and help human
beings have a longer and healthier life. With the tools and products available, efforts are underway to treat
human illnesses at the genetic level. The unprecedented growth of companies like Google,
Facebook, and eBay is a testimony that the big data field is poised for continued growth. There is
a huge need for professionals with big data knowledge who can understand and apply their skills
Given all the above background information, it is clear that all these companies are
looking to develop solutions to address the need, to educate their employees, and to
give back to the community in the form of open source projects. A number of companies have
developed various solutions using big data and related technologies. According to a study done
by McKinsey, “By the end of 2018, US will face a 50% to 60% gap between requisite demand
and supply of analytic talent” and big data is the fundamental skill needed for such individuals to
understand a few technical terms (Hadoop, types of data) and historical background related to the
developments in the big data field. Data types refer to three different types of data: structured,
unstructured, and semi-structured. In very simple terms, structured data is organized
information that is stored according to specific rules and follows a data model, like data stored in
relational databases, Excel spreadsheets, etc. Semi-structured data is organized data that may not
necessarily follow a specific data model or standard, like information written or stored in a Word
document. Unstructured data is information that is not stored based on a predefined data model
or rules, such as a loose collection of facts or random dates. Hadoop is an open-source framework
that provides the ability to store and process massive amounts of data. Currently the most effective
programming model to process big data is MapReduce, “a batch-oriented parallel computing
model”.
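To make the MapReduce idea more concrete, the following is a minimal single-machine sketch of the classic word-count example in plain Python. It does not use Hadoop or any cluster; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are our own illustrative assumptions and only mimic the three steps that a real framework would distribute across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values under their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each document (or block of a file) would be mapped on a
    # different machine; this sketch simply processes them one after another.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    sample = ["big data is big", "data is the new currency"]
    print(word_count(sample))
```

The point of the split is that the map and reduce steps work on independent pieces of data, which is what lets a framework run them in parallel as a batch job.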
Internet companies like Google and Facebook drove the development of big data technologies
because they realized that they had petabytes of structured, semi-structured, and unstructured
data about their users and the usage of their products. In order to extract value out of this large
amount of data, they decided to create open source projects that would encourage people to find solutions to this
programming model and an associated implementation for processing and generating large data
sets. Yahoo created an open source product, Hadoop, based on information published in
Google’s research papers. Facebook created a product, Hive, which provides a SQL interface
over Hadoop. Hive interprets unstructured data as RDBMS tables, allows developers to write
SQL queries over the data, and translates the queries into MapReduce jobs to produce the desired
results. Many corporations are leading efforts to build big data products and services. Teradata,
IBM Big Data Analytics, MongoDB, DataStax Big Data, Cloudera, Amazon Web Services and
Splunk are a few of the key players in this industry. Each company is focused on some aspect of
big data related technology, but none has developed a complete platform for big data solutions.
Most of them have been working on one specific aspect of big data and developing it into a
robust product or service. For example, Splunk has spent the last few years doing research and
development in the area of machine data. Their product focuses on producing real time analytics
to identify patterns and trends in the machine data. It works with Hadoop and NoSQL data for
Based on our initial scan of available courses, we saw many options available to address
the need of teaching Big Data, but each has some shortcomings. Each course either focuses too
deeply on one topic or misses some key concepts. The courses we reviewed include courses
available on MOOC platforms like Coursera and Udacity. We also reviewed courses available on
corporate learning websites like Hortonworks, Cloudera, and Simplilearn. Simplilearn has a large
array of technical courses like Hadoop, SAS, Apache Spark, and R. We did not find an
introductory course on Big Data. The second option is Cloudera, which provides very technical
options like the CCP Spark and Hadoop Developer certification, but each course is very expensive.
Hortonworks is another popular name in Big Data, but its courses are expensive and do not carry
any accreditation.
Most of the courses available in academia are based on a quarter or semester system.
After reviewing multiple courses, we identified these key concerns: courses lacked relevant
content, concepts were too technical for beginners, courses did not provide information
about the latest technology trends, or they were not meant for a target audience with a varying degree of
The key stakeholders who stand to benefit from this project are:
The first target audience for this project is BSCS online students at CSUMB. Most of the
students enrolled in this program are very motivated and are looking to learn and acquire skills in
Computer Science that can be very helpful in the real world. This perspective comes from the
fact that the majority of them work full time and have decided to enroll in this program for
multiple reasons:
• Extremely motivated
• Eager to learn and use the learning to be more effective in current job
The other key stakeholders for this project are the faculty who are teaching various courses in the
BSCS program. They are always looking for ways to improve the program and also include
courses that help enhance the learning experience and improve job prospects for their graduates.
At a personal level, it will also help the faculty understand what the students need and how
students learn best, since the course was designed by students who have been the target audience
themselves and understand the challenges of the online learning environment. Besides focusing on
the subject matter, this course will provide the layout for what online students feel is the optimal
Methodology
We started the process by collecting information from students about their interests. We also
gathered information from students to develop a baseline of their understanding and awareness
of the field. The focus was to understand how well these students understand the field
and, if they do, whether they are aware of resources available, if any, to help them develop the necessary
Based on our personal experience as undergraduate students in the online program and
preliminary inquiry, we learned that individuals feel overwhelmed by the amount of information
available. Often, they need a simple and easy start before delving deeper. Also, sometimes the
information presented in courses is too technical for an introductory course. We collected input
from students and also examined data available from various resources to ensure that these
Finally, we connected with industry experts to ensure that we have ongoing input during the
process. This allowed us to stay on track and ensure that all the information we collected and
used in designing the proposed course was in sync with the current trends of the industry. We
also obtained memberships and attended meetups and conferences related to the field. In some
cases, we were unable to get membership, but the publicly available information provided valuable
We relied mostly on open source big data tools, such as Hadoop, High Performance Computing
Cluster (HPCC), Cassandra, and Elasticsearch, to learn and collect information for our
research project.
Although there are many courses available that provide a lot of information and are intended
to help learners understand the concepts, none of them is designed specifically for these students,
the target audience, which has a unique set of requirements and background. Since BSCS course
The course that we developed is aligned to the eight-week schedule without compromising the
At this point the course is more theory and concepts based. However, we have tried to provide
some hands-on experience that is applicable to the course. The goal is to give learners enough
exposure that they can learn about the field and extend this learning on their own. We have
developed a course that provides the information, tools, and resources that will help learners
understand Big Data. It will introduce them to resources that they can use to become well
versed in the field of Big Data.
Design Details
Since the goal is to ensure that students can fully understand the concepts covered within the
designated eight weeks, we paid specific attention to the material covered in each module and
to the length of the trimester, which is eight weeks for the online BSCS students.
We also aimed to include relevant readings, videos, and resources for hands-on activities to
• Dimensions of Scalability
• Understanding MapReduce
• Data management
• Storage
• Transfer technologies
• Data protection
• Distribution philosophies
During the production of our project we did not have to deal with any ethical or legal issues.
Likewise, we did not face any such issues when the project was implemented.
However, we used some open source resources and technologies, such as:
• Hadoop
• Spark
• Databases
o MongoDB
o Cassandra
• Videos: YouTube videos, videos available on academic websites like Khan Academy
Informational Documents
• Images
Open source resources have their specific licenses and usage restrictions. To comply with
their usage guidelines, we reviewed those licenses and restrictions and made sure that we met
all the requirements. Also, some of the videos, online content, and images have
copyright and permissions guidelines. If and when we used any such resource, we provided
details of the product or service so that there are no ethical or legal violations to the best of our
The final deliverable is a course in the format that is used by most CSUMB BSCS courses
hosted on iLearn. For each week, we designed a module with a topic and related videos,
readings, etc. The team worked together to design some form of assessment for each week. Our
original plan was to include formative assessments for each sub-topic within a module.
However, due to time restrictions, we were not able to include formative assessments for
each sub-topic within each module. Hence, we made sure that each module has a summative
assessment or check-for-understanding assessment with key elements that need to be evaluated
Timeline
• Develop Outline/Modules
Week 2
• Each team member takes ownership of module
In terms of time, we were able to deliver more than we had planned, but doing so took more
time than we had allocated as individuals and as a team. Our initial plan was to spend about 100-
120 hours total. However, given the updates and changes we had to make based on the results of
usability testing, our team spent over 40 extra hours to achieve the desired results.
Project Budget
There was no expected cost since the team relied on using open resources and personal
Usability testing and evaluation of our course material was done in two phases, internal and
external testing. Each of our team members performed a basic internal review of their own
material, and I performed basic internal testing of functionality. For external testing, I was able
to recruit three of my colleagues from work to review the content and functionality of the site.
From initial testing and their feedback, we were able to make significant improvements in the
short term, and develop ideas for the long-term to make the course more valuable for future
students.
Phase one of ensuring usability of our content started with the initial development of the
course material. For example, my process started with a bullet-point list of what I wanted to
cover in my particular modules - networking and storage basics and fundamental philosophies of
data management. However, our intent was to develop an introductory course for less
technically-inclined people, so I had to make sure I kept the material as basic as possible while
still communicating the necessary information. We were also encouraged to avoid jargon as
much as possible. With this in mind, I iterated through my list multiple times to determine what
was necessary information and what was too technical for inclusion.
Once that was done, the next step was to write out in long-form the same information
contained in the bullet points. This was again iterative, with constant review of the content for
Once the long-form content was created, I worked to convert the bullet points into a
presentation format, with changes that followed from the development of the long-form document. I
also began reviewing available online content for addition to the course, primarily in the form of
YouTube videos. With usability in mind I viewed them with an eye toward entertainment and
With the initial content creation completed for my modules, I evaluated the documentation I
had developed against the framework developed by my team members and found it wholly
inadequate. In software terms, the content failed initial testing, so I went back and refactored
the code. I rebuilt my slides and long-form document as three separate modules, broken up into
networks, storage and media technologies, and data management and protection. This module
layout led to the creation of three quizzes to assess students’ understanding of the material. My
three modules and their associated content were the last to be added to the site. With those three
modules complete, our site was ready for initial functionality testing.
Phase two of testing consisted of basic functional tests of our course platform, which is a
web site hosted on Google Sites. Our initial release for the site consisted of a main page and 6
modules - “Introduction of Big Data”, “Intro to Big Data Platform”, “Introduction to Hadoop”,
“Advanced Topics: Big Data Networks”, “Advanced Topics: Storage Media and Technologies”,
and “Advanced Topics: Data Protection” - along with presentations, documents, and links to
external information and videos within each module. The site was developed within the
CSUMB.edu ecosystem and I performed initial testing and review while logged in to my
CSUMB.edu account. I verified that all material was accessible, that all links worked, and that
all videos played. I performed minimal review of content in this phase of testing.
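A check like this can also be scripted. The sketch below is only an illustration of the idea, not the procedure we followed: it requests each course link without any login, much like an incognito window, and reports links that an outside visitor could not open. The names `fetch_anonymously` and `find_inaccessible` are hypothetical, and treating a redirect to a different host as a sign of a login wall is an assumption that may not hold for every site.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import urlparse

# (HTTP status, final URL after redirects); a status of 0 means no response at all.
FetchResult = Tuple[int, str]


def fetch_anonymously(url: str, timeout: float = 10.0) -> FetchResult:
    """Request a URL with no cookies or login, roughly like an incognito window."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as error:
        return error.code, url  # the server answered with an error such as 403 or 404
    except OSError:
        return 0, url  # DNS failure, refused connection, or timeout


def find_inaccessible(
    urls: Iterable[str],
    fetch: Callable[[str], FetchResult] = fetch_anonymously,
) -> Dict[str, str]:
    """Return {url: reason} for every link an anonymous visitor cannot open."""
    problems: Dict[str, str] = {}
    for url in urls:
        status, final_url = fetch(url)
        final_host = urlparse(final_url).netloc
        if status == 0:
            problems[url] = "no response"
        elif status != 200:
            problems[url] = f"HTTP {status}"
        elif final_host != urlparse(url).netloc:
            # Assumption: a restricted page redirects to a sign-in host instead of failing.
            problems[url] = f"redirected to {final_host}"
    return problems
```

Passing the list of module, document, and quiz links to `find_inaccessible` returns an empty dictionary when everything opens anonymously. The `fetch` parameter exists so that the check can be tried with canned responses before it is pointed at a live site.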
Checking the site in “incognito” mode in my browser exposed the first issues to be
fixed. Since the site had been built within the CSUMB.edu environment, it was only open to
CSUMB.edu users. Once this was fixed we found a number of other items that had similar
restrictions, so the three of us went through all of our material and made sure that the
For phase three of our testing I recruited three of my colleagues from work. Mark is a
65-year-old male with a master’s degree in computer engineering and over 40 years of
experience in designing systems, including a “Big Data” system, but not as much experience
with the intricacies of networking and storage technologies. Jason is a 45-year-old male with
significant experience as a computer user proficient with applications like Microsoft Office and
Adobe Photoshop but with limited experience in software development, systems design, or
networking and computer components. Cody is a 26-year-old male with an associate’s degree in
computer science; he has experience in software development and he builds and maintains his
own computers, but he has limited exposure to networking and in-depth storage technologies
such as RAID.
Unfortunately, due to issues of schedule and geography I was unable to meet with any
members of the focus group directly as they performed their evaluation, but I did provide them
with a brief explanation of our intent, as well as a description of the goals of the course
assignment. I asked them to evaluate the material for both function - does the site work, can they
access all of the materials - as well as presentation and content. All three of them were asked to
review the material in terms of its applicability and clarity for a less-technical audience.
Cody was able to complete most tasks, but found that there were still permissions issues
that prevented him from accessing some functions and material, such as the assessment for
Module 1. He pointed out that there are what appear to be links to forum discussions, but we
have no forum functionality implemented so the different-colored text simply looks broken.
Some assessment questions were not covered in the course material, one assessment had the
“Name” entry out of order, and some course material looked like it could be condensed and
optimized for better presentation of information. Overall he felt that the material presented was
clear and informative, and that the course was a good introduction to the topic.
Jason felt that the modules could be better constructed to follow the same format of slide
presentations with embedded video and “forum discussion” articles, rather than direct video
information than that of a particular company, and also felt like some modules seemed to be a
little too advanced for a novice student. He also noted that the assessments were not consistent
in their design and function, and that the autoplay function of the available Google Slides
presentations was far too fast. Overall, he did feel that much of the material had a good presentation, but that some
Mark was very thorough in his responses. He was very complimentary of the basic style
of the presentations, and felt that the combination of slides, articles, and videos was very
effective in presenting the information. He considered the variety of video and article styles
“refreshing” and appropriate to the material. He liked how the data was broken up into discrete
modules, and felt that the assessments matched up well with the material presented in each
module. He did have minor problems with navigation, both within the Google Slides
presentations and the site as a whole. His largest concern was with better understanding the
audience, timing, and goals of each module, and he felt that some of the modules may be too
technical for a more generalized audience. Overall, he found the presentation “effective and
engaging” and “a great opportunity for shared learning”, but he felt that the depth of the modules
Despite not being able to review the material and site design with them in person, I found
The most significant and immediate change we made was splitting up modules 2 and 3
into four separate modules. This allowed us to better spread out the information therein and
spend more time on the introduction and explanation of the material. Please note that the
comments from our testing group were made before this change, so any discussion of modules 2 and 3
applies to modules 2 through 5, and any discussion of modules 4 through 6 applies to modules 6
through 8.
The assessments for modules 1 and 4 were updated immediately following the feedback
received. Module 1’s assessment had its permissions fixed to allow non-CSUMB accounts to
access it, and module 4’s assessment was changed from randomized to fixed question order to prevent
the “Name” question from being mixed in with the other quiz questions.
In the longer term, I would like to revisit the immediate changes with the focus group and
expand on their evaluation of the content and functionality. I would also like to
expand the focus group to a wider range of individuals with more varied backgrounds and
experience levels. I feel that the material presented can be improved, especially in some of the
slide decks, and that the module assessments can be reevaluated both for content and for
consistency of presentation. I think each module needs to be evaluated against a planned 12-20
hour per week requirement, and the material adjusted accordingly. Finally, as technology
Final Implementation
The original plan was to create four modules, and that is how we built the first prototype. We
continued working to the same plan, and after completing the four modules we held a peer
review. After several iterations of peer review and brainstorming sessions via email and team
meetings, our team decided to divide the modules further by topic. While the team was
working on this strategy, we went ahead with the focus group and user testing. The testing was
done by a group of users whose experience ranged from beginners in the technology field to
individuals with over twenty years of experience in software engineering. The strong
feedback and input from the focus group led us to organize the course into eight modules.
As a result, our final implementation consists of eight modules:
4. Introduction to Hadoop
5. Introduction to MapReduce
Currently, the entire course is hosted on Google Sites, from which it can easily be transferred to
any portal. We chose Google Sites because it allowed ease of collaboration, cross-team
review, easy updates, and access to the revision history of each document. Screenshots for the completed
Website URL
The course is hosted on Google Sites and the URL to access the course website is:
https://sites.google.com/a/csumb.edu/big-data-fundamentals/
Conclusion
The need for this project was based on the growing industry demand for big data skills
expected of individuals graduating with a Bachelor of Science in Computer Science (BSCS)
degree. We believe this course meets the critical need of giving students a complete
understanding of the entire landscape of technologies in Big Data. Big Data is a growing field, and
any available BSCS program needs to provide such exposure and understanding to
learners. The courses currently available do not meet the needs of BSCS students for several
reasons: they are not designed for the online learning modality, they are based on a standard
semester or quarter system, and they do not take into account the unique needs of the eight-week
schedule of working students.
Our team’s approach of designing the course around the unique eight-week schedule of the
BSCS online program turned out to be as successful as anticipated. In its current state, the
course's eight modules, with videos, slides, white papers, and other resources along with forum
discussion, will provide a thorough understanding of the concepts in the big data field, the
technologies used and the
As a team, we learned that each of us has a critical role to play and that, no matter where we
are located, we can overcome challenges if the goals and objectives are clearly set. As a team, we
made sure that the work breakdown structure was clearly defined and regular reminders and
collaborative tools like Google Docs, Google Drive or other similar tools are critical for the
success of any team much less a team that functions in different time zones and different
scenario where teams are distributed across the globe in time, space, culture, and different skill
levels.
Although we anticipate that this course will need further review and changes, it sets the tone for
what needs to be understood in the Big Data field and how to achieve that understanding. In the
long term, we hope that this course design, with its eight modules and resources, along with the
extensive usability testing by a diverse range of individuals, will provide the blueprint to extend
and build a “Big Data Specialization” track to meet the needs of future BSCS and MSCS students.
Appendix A
Module/Module Title:
Date:
Below are a series of statements. Please respond by circling the number you feel most reflects
your opinion.
Scale: 5 = Strongly Agree, 4 = Agree, 3 = Neither Agree nor Disagree, 2 = Disagree, 1 = Strongly Disagree
The module fulfilled the objectives 5 4 3 2 1
expectations
readily be understood
experience
learning
Appendix B
Team Members
The entire team was involved in creating all documents and proposals. In addition, all members
worked to design the outline and develop an overview of each module. Once that was complete,
each team member took ownership of a module based on their individual interest and expertise.
Finally, all modules were put together and all team members reviewed the entire course and
provided feedback on individual modules, course flow, topic cohesiveness, etc. The division of
labor and roles were evaluated and assigned based on interest and background in subject matter.
The team members with their roles and responsibilities are as follows:
• Brian Brooks
o With a strong background in industrial big data, Brian worked on advanced topics
• Devorah Akhamzadeh
o Devorah has experience in programming and an interest in learning new topics. She
focused on exploring and doing further research on specific topics that would be
useful to add to the course. She was available as a technical resource as needed.
• Seema Khan
program management expertise. She developed a few modules and worked with all
team members to ensure that course development stayed on track and the product
Appendix C
Brian led the effort to conduct the testing and organized a focus group. The focus group
membership was well represented in terms of experience in the technical field, from an associate
degree holder to a master's degree holder. This is exactly the audience we are targeting for our course.
Given the time frame we had, our team was pleased with and appreciative of the focus group's
effort to provide such exhaustive, thoughtful, and constructive feedback. Our team plans to
address the concerns and will try to make updates to the modules as time permits. However, we
feel comfortable that the course exceeded our expectations, growing from an introductory blueprint
into a comprehensive course that runs from introductory topics to advanced concepts. This might
serve as a course that can be divided into Introductory and Advanced courses, with Modules 1-3
being the introduction modules and Modules 4, 5, and 6 being the next course. Modules 4, 5, and 6
will allow learners to develop an in-depth understanding of the field and will help them if they
choose to follow Big Data as a career choice. Eventually, the team decided to divide the course
into eight modules.
Were they able to complete all of the tasks?
Feedback:
● Some errors encountered, but most feedback was positive; clear and informative.
● Permissions problems initially prevented focus group members from accessing the material.
● Some slides advance too quickly in “play” mode.
● Some “future” links (forum discussions) are non-functional and could be changed so they don’t appear broken.
● Module 3 assessment link is broken.
● Module 4 assessment “Name” entry was in the middle of the quiz questions.
Response:
● We corrected the permissions issues early on.
● Need to edit the publication settings for slide time advancement; 15-30 seconds would probably work.
● Module 4 assessment was set to randomize questions. Removed the randomization.

Did they get stuck along the way?
Feedback:
● Some information in the assessments not covered in the lessons/modules.
● Some presentations were too lengthy and could be refined and condensed.
● Best modules had a mix of slides, videos, and text.
● Some modules may be too advanced for the “basic” intent.
● The modules and assessments don’t always follow the same format.
● Could use a little more variety in video content.
● Some videos sent the user to the beginning of the slide deck.
● Some trouble with module navigation.
Response:
● Need to review content and assessments to look for discrepancies.
● Work on optimizing the presentations.
● Might need to work on simplifying some content for laypersons.
● Need to review the video content for process bugs.
● Need to be clearer on module navigation; perhaps a brief tutorial?

What will you do to improve your project in the short term before the festival?
● Our team can work together to try to refine our presentations so that they look more like one coherent course rather than three independent products.
● Need to fix some of the minor issues and web page formatting.
● We need to review the timing for each module - is each module one week?
Data advance.
● We need to review and refine the depth of some of the modules. Some don’t quite meet our “non-technical” goal.
● We could add voiceovers to the text pages in the slide decks.
Appendix D
Cody is a 27-year-old male with significant computer experience. He has an associate’s degree
in computer science, does some software development at work, and builds and maintains his
own computers. He has limited experience with networking, RAID, and Big Data. This is his
feedback.
Module 1:
- Slides were clear and easy to read, video was descriptive
- Article, while informative, was quite lengthy for an "introduction to Big Data"
- Page mentions forum but no links are given to this forum.
- At time of this review module requires user to request access from Google account to take the
assessment and because of this I was unable to take the assessment.
- Image below: Looks like it should be a hyperlink to the forum? Doesn't work if it is and if it isn't
why do you change font/color for it?
Module 2:
- Video was informative, articles were thorough but not too lengthy.
- Slides had too little information spread across almost 100 slides. Information for most slides
could be bullet points on a larger more condensed slide of information.
Module 3:
- Good information about MapReduce and Hadoop however the videos did start to get repetitive
and repeat the same information.
- Assessment asks about Daemons but no information was given about Daemons in the module.
Module 4/5/6:
- All 3 modules contained useful information, and little to no errors that prevented me from
accessing any of the content (assessments, slides, videos, links, etc all worked)
- Videos embedded into the slideshow kept the page neat and condensed.
- Assessment had good questions that pertained to the material. Module 4 assessment seemed to
have the "name" question stuffed in the middle of the assessment.
Jason is a 45-year-old male with significant experience as a computer user. He is comfortable
and proficient with applications like Microsoft Office and Adobe Photoshop, and does limited
development-type work in an application called Cimplicity HMI. He is not well versed with the
internal functions of networks or computer systems, but is able to do basic troubleshooting, and
Brian,
1) Google Slides play button makes the slides move much too quickly for me to read. You have
to pause and click through manually. Not a huge issue, I know, but something that bugged me.
2) I think all of the modules should be the same as far as structure and layout. I liked to see and
I anticipated seeing the Google Slides on each Module, and was disappointed when some
Modules didn't contain them. I didn't appreciate having to click on a link to another site in order
to view the content or slides. **The only exception to this was "Forum Discussion" items. Having
to click on links to external sites for this kind of information is fine, in my opinion. Having said
that--I definitely appreciated how forum discussion items in Modules 4, 5 and 6 were linked to a
Google Docs document instead of links to websites from external companies. I'd rather see a
top-level summary of the industry rather than a single company's vision.
3) I'm not necessarily satisfied with only YouTube videos as the primary learning content. I could
simply do that on my own. I'm looking to be educated about something. The Modules that have
a mixture of slides and YouTube content made it much easier to learn. Perhaps I want to be
led by the hand when learning the information presented here, but isn't that the point?
4) Given that I might classify myself as a "layperson" when it comes to this content, I was
confused by some of the Modules. They didn't seem to be basic enough for my understanding. I
felt like some (particularly #2-3) of them made a lot of assumptions, while the others walked me
through the whole process of understanding.
Mark is a 65-year-old male with significant computer hardware and software development
experience. He has a master’s degree in computer engineering and has spent over 40 years
designing, testing, and maintaining data acquisition systems for jet engine and component
testing. He does not have significant exposure to the inner workings of subsystems like RAID
or networks, but has been involved with the development of a Big Data management system.
Hi Brian:
Thanks for giving me the opportunity to provide review/comments on your team’s “Big Data”
project…..
Format:
I liked the style of presentation, and the fact that all modules followed that same style……..
Each Module followed the same style:
- Title:
- Objectives
o By the end of this module, you will understand……
o Point 1
o Point 2……
- Slides
- Embedded Videos
- Forum Discussion
o Articles
o Videos
o Forum question(s), challenge(s)
- Assessment