Drive better business decisions with an overview of how big data is organized, analyzed, and interpreted. Apply your insights
to real-world problems and questions.
Do you need to understand big data and how it will impact your business? This Specialization is for you. You will gain an
understanding of what insights big data can provide through hands-on experience with the tools and systems used by big
data scientists and engineers. Previous programming experience is not required! You will be guided through the basics of
using Hadoop with MapReduce, Spark, Pig and Hive. By following along with provided code, you will experience how one can
perform predictive modeling and leverage graph analytics to model problems. This specialization will prepare you to ask the
right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets.
In the final Capstone Project, developed in partnership with data software company Splunk, you’ll apply the skills you
learned to do basic analyses of big data.
* Describe the Big Data landscape, including examples of real-world big data problems and the three key sources of
Big Data: people, organizations, and sensors.
* Explain the V’s of Big Data (volume, velocity, variety, veracity, valence, and value) and why each impacts data collection,
monitoring, storage, analysis and reporting.
* Get value out of Big Data by using a 5-step process to structure your analysis.
* Identify which problems are and are not big data problems, and recast big data problems as data science questions.
* Provide an explanation of the architectural components and programming models used for scalable big data analysis.
* Summarize the features and value of core Hadoop stack components including the YARN resource and job management
system, the HDFS file system and the MapReduce programming model.
This course is for those new to data science. No prior programming experience is needed, although the ability to install
applications and utilize a virtual machine is necessary to complete the hands-on assignments.
Hardware Requirements:
(A) Quad Core Processor (VT-x or AMD-V support recommended), 64-bit; (B) 8 GB RAM; (C) 20 GB disk free. How to find your
hardware information: (Windows): Open System by clicking the Start button, right-clicking Computer, and then clicking
Properties; (Mac): Open Overview by clicking on the Apple menu and clicking “About This Mac.” Most computers with 8 GB
RAM purchased in the last 3 years will meet the minimum requirements. You will need a high-speed internet connection
because you will be downloading files up to 4 GB in size.
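As a rough alternative to the manual steps above, the quoted minimums can be checked with a short script. This is a minimal sketch, not part of the course materials: the function names and the 10% RAM tolerance are assumptions made for illustration, `os.cpu_count()` reports logical rather than physical cores, and the RAM lookup relies on `os.sysconf`, which is not available on Windows (the script reports "unknown" there). It does not check 64-bit or VT-x/AMD-V support.

```python
import os
import shutil

# Minimums quoted in the hardware requirements above.
MIN_CORES = 4          # quad core
MIN_RAM_GB = 8         # 8 GB RAM
MIN_FREE_DISK_GB = 20  # 20 GB disk free

GIB = 1024 ** 3


def meets_minimum(measured, minimum, tolerance=0.0):
    """Return True/False, or None when the value could not be measured.

    `tolerance` is a fraction of `minimum`. It is an assumption added here
    because operating systems usually report slightly less RAM than the
    nominal installed amount.
    """
    if measured is None:
        return None
    return measured >= minimum * (1.0 - tolerance)


def total_ram_gb():
    """Best-effort installed RAM in GiB; None where it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows, where os.sysconf does not exist
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / GIB


def hardware_report(path="."):
    """Collect core count, RAM, and free disk space on the drive holding `path`."""
    cores = os.cpu_count()  # logical cores, which may exceed physical cores
    ram_gb = total_ram_gb()
    free_gb = shutil.disk_usage(path).free / GIB
    return {
        "cores": (cores, meets_minimum(cores, MIN_CORES)),
        "ram_gb": (ram_gb, meets_minimum(ram_gb, MIN_RAM_GB, tolerance=0.1)),
        "free_disk_gb": (free_gb, meets_minimum(free_gb, MIN_FREE_DISK_GB)),
    }


if __name__ == "__main__":
    labels = {True: "OK", False: "below minimum", None: "unknown"}
    for name, (value, ok) in hardware_report().items():
        print(f"{name}: {value} -> {labels[ok]}")
```

Run it from the drive where you plan to store the virtual machine, since free disk space is measured for the current directory.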
Software Requirements:
This course relies on several open-source software tools, including Apache Hadoop. All required software can be downloaded
and installed free of charge. Software requirements include: Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+ or CentOS 6+;
VirtualBox 5+.
WEEK 1
Welcome
Welcome to the Big Data Specialization! We're excited for you to get to know us and we're looking forward to learning
about you!
Video · Welcome to the Big Data Specialization
Video · Tell us about yourself and learn about your classmates
Reading · By the end of this course you will be able to...
Other · Let's Discuss: Why are you taking this class?
Reading · Optional: Watch this fun video about the San Diego Supercomputer Center!
Video · What launched the Big Data era?
Video · Getting Started: Where Does Big Data Come From?
Video · Applications: What makes big data valuable
Video · Machine-Generated Data: It's Everywhere and There's a Lot!
Other · Let's Discuss: What application area interests you?
Video · Big Data Generated By People: The Unstructured Challenge
Video · Example: Saving lives with Big Data
Video · Big Data Generated By People: How Is It Being Used?
Video · Example: Using Big Data to Help Patients
Video · Organization-Generated Data: Structured but often siloed
Video · A Sentiment Analysis Success Story: Meltwater helping Danone
Video · Organization-Generated Data: Benefits Come From Combining With Other Data Types
Reading · Did you know?: 25 facts about big data
Reading · Slides: Machine-Generated Data: Advantages
Reading · Slides: The Key - Integrating Diverse Data
Reading · Slides: Big Data Generated By People: The Unstructured Challenge
Reading · Slides: Big Data Generated By People: How is it Being Used?
WEEK 2
Characteristics of Big Data and Dimensions of Scalability
You may have heard of the "Big Vs". We'll give examples and descriptions of the five most commonly discussed. We also want
to propose a sixth V, value, and we'll ask you to practice writing Big Data questions that target it.
Video · Getting Started: Characteristics Of Big Data
Reading · Slides: Getting Started - Characteristics of Big Data
Video · Characteristics of Big Data - Volume
Reading · Slides: Characteristics of Big Data - Volume
Reading · What does astronomical scale mean?
Reading · Slides: Characteristics of Big Data - Variety
Video · Characteristics of Big Data - Variety
Reading · Slides: Characteristics of Big Data - Velocity
Video · Characteristics of Big Data - Velocity
Reading · Slides: Characteristics of Big Data - Veracity
Video · Characteristics of Big Data - Veracity
Reading · Slides: Characteristics of Big Data - Value
Video · Characteristics of Big Data - Valence
Reading · Slides: Characteristics of Big Data - Valence
Video · Data Science: Getting Value out of Big Data
Reading · Slides: Getting Value Out of Big Data
Video · Building a Big Data Strategy
Reading · Slides: Building a Big Data Strategy
Video · How does big data science happen?: Five Components of Data Science
Reading · Slides: The Five P's of Data Science
Reading · Five P's of Data Science
Reading · Slides: Asking the Right Questions
Other · Let's Discuss: Thinking more deeply about the Ps
Reading · Slides: Steps in the Data Science Process
Video · Asking the Right Questions
Reading · Slides: Step 1 - Acquiring Data
Video · Steps in the Data Science Process
Reading · Slides: Step 2B-Preprocessing Data
Video · Step 1: Acquiring Data
Reading · Slides: Step 3-Data Analysis
Video · Step 2-B: Pre-Processing Data
Reading · Slides: Step 5-Turning Insights Into Action
Week 3
Foundations for Big Data Systems and Programming
Big Data requires new programming frameworks and systems. For this course, we don't require programming knowledge or
experience -- but we do want to give you a grounding in some of the key concepts.
Video · Getting Started: Why worry about foundations?
Reading · Slides: Getting Started - Why Worry About Foundations?
Video · What is a Distributed File System?
Reading · Slides: What is a Distributed File System?
Video · Scalable Computing over the Internet
Reading · Slides: Scalable Computing Over the Internet
Video · Programming Models for Big Data
Reading · Slides: Programming Models for Big Data
Video · Hadoop: Why, Where and Who?
Video · Cloud Service Models: An Exploration of Choices
Video · The Hadoop Ecosystem: Welcome to the zoo!
Video · Value From Hadoop and Pre-built Hadoop Images
Video · The Hadoop Distributed File System: A Storage System for Big Data
Reading · Slides for Getting Started With Hadoop
Video · YARN: A Resource Manager for Hadoop
Quiz · Intro to Hadoop
Video · MapReduce: Simple Programming for Big Results
Peer Review · Understand by Doing: MapReduce
Reading · MapReduce in the Pasta Sauce Example
Reading · Downloading and Installing the Cloudera VM Instructions (Mac)
Video · When to Reconsider Hadoop?
Reading · FAQ
Subtitles English
This course is for those new to data science. Completion of Intro to Big Data is recommended. No prior programming
experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the
hands-on assignments. Refer to the specialization technical requirements for complete hardware and software specifications.
Hardware Requirements:
(A) Quad Core Processor (VT-x or AMD-V support recommended), 64-bit; (B) 8 GB RAM; (C) 20 GB disk free. How to find your
hardware information: (Windows): Open System by clicking the Start button, right-clicking Computer, and then clicking
Properties; (Mac): Open Overview by clicking on the Apple menu and clicking “About This Mac.” Most computers with 8 GB
RAM purchased in the last 3 years will meet the minimum requirements. You will need a high-speed internet connection
because you will be downloading files up to 4 GB in size.
Software Requirements:
This course relies on several open-source software tools, including Apache Hadoop. All required software can be downloaded
and installed free of charge (except for data charges from your internet provider). Software requirements include:
Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+ or CentOS 6+; VirtualBox 5+.
Week 1
Introduction to Big Data Modeling and Management
Welcome to this course on big data modeling and management. Modeling and managing data is a central focus of all big
data projects. In these lessons we introduce you to the concepts behind big data modeling and management and set the
stage for the remainder of the course.
Video · Welcome to Big Data Modeling and Management
Video · Energy Data Management Challenges at ConEd
Video · Why is this a New Course in the Big Data Specialization?
Reading · Slides: Energy Data Management Challenges at ConEd
Other · Getting to know you: Tell us about yourself and why you are taking this course
Video · Gaming Industry Data Management: Q&A with Apmetrix CTO Mark Caldwell
Video · Summary of Introduction to Big Data (Part 1)
Video · Flight Data Management at FlightStats: A Lecture by CTO Chad Berkley
Reading · Slides: Flight Data Management at FlightStats
Video · Summary of Introduction to Big Data (Part 2)
Other · Let's discuss: What are the design criteria in the big data applications you have heard?
Video · Summary of Introduction to Big Data (Part 3)
Other · Let's discuss: What area of big data management interests you most?
Other · Let's discuss: Modeling data in your daily life
Video · Exploring the Array Data Model of an Image?
Video · Vector Space Model
Reading · Exploring Vector Data Models with Lucene
Reading · Slides: Vector Space Model
Video · Exploring the Lucene Search Engine's Vector Data Model
Video · Graph Data Model
Reading · Exploring Graph Data Models with Gephi
Reading · Slides: Graph Data Model
Video · Exploring Graph Data Models with Gephi
Video · Data Model vs. Data Format
Reading · Exploring Streaming Sensor Data
Reading · Slides: Data Model vs. Data Format
Video · Exploring Streaming Sensor Data
Other · Let's discuss: Analytical tasks to make Catch the Pink Flamingo better
Peer Review · Designing a Data Model for 'Catch the Pink Flamingo'
Other · Let's discuss: Using the data model for Catch the Pink Flamingo
English
* Retrieve data from example database and big data management systems
* Describe the connections between data management operations and the big data processing patterns needed to utilize
them in large-scale analytical applications
* Identify when a big data problem needs data integration
* Execute simple big data integration and processing on Hadoop and Spark platforms
This course is for those new to data science. Completion of Intro to Big Data is recommended. No prior programming
experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the
hands-on assignments. Refer to the specialization technical requirements for complete hardware and software specifications.
Hardware Requirements:
(A) Quad Core Processor (VT-x or AMD-V support recommended), 64-bit; (B) 8 GB RAM; (C) 20 GB disk free. How to find your
hardware information: (Windows): Open System by clicking the Start button, right-clicking Computer, and then clicking
Properties; (Mac): Open Overview by clicking on the Apple menu and clicking “About This Mac.” Most computers with 8 GB
RAM purchased in the last 3 years will meet the minimum requirements. You will need a high-speed internet connection
because you will be downloading files up to 4 GB in size.
Software Requirements:
This course relies on several open-source software tools, including Apache Hadoop. All required software can be downloaded
and installed free of charge (except for data charges from your internet provider). Software requirements include:
Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+ or CentOS 6+; VirtualBox 5+.
Week 1
Welcome to Big Data Integration and Processing
Welcome to the third course in the Big Data Specialization. This week you will be introduced to basic concepts in big data
integration and processing. You will be guided through installing the Cloudera VM, downloading the data sets to be used for
this course, and learning how to run the Jupyter server.
Video · Why is Big Data Processing Different?
Reading · Software Installation Frequently Asked Questions (FAQ)
Other · Getting to know you: Tell us about yourself and why you are taking this course.
Reading · Instructions for Starting Jupyter
Reading · Slides: Summary & Why Is Big Data Processing Different
Video · What is Data Retrieval? Part 1
Reading · Querying Relational Data with Postgres
Video · What is Data Retrieval? Part 2
Video · Querying Relational Data with Postgres
Video · Subqueries
Video · Querying JSON Data with MongoDB
Reading · Querying Documents in MongoDB
Video · Integration for Multichannel Customer Analytics
Video · Installing Splunk Enterprise on Linux
Other · Let's Discuss: Big Data Integration
Video · Exploring Splunk Queries
Reading · Slides: Information Integration
Reading · Optional: Instructions for Splunk Pivot Tutorial
Video · Optional: Creating Pivot Reports in Splunk
Video · Big Data Management and Processing Using Splunk and Datameer
Quiz · Hands-On With Splunk
Video · Why Splunk?
Video · Typical Analytical Operations in Big Data Pipelines
Other · Let's Discuss: Word Count
Other · Let's Discuss: Big Data Pipelines in Your World
Reading · Big Data Processing Pipelines Slides
Video · Spark Core: Programming In Spark using RDDs in Pipelines
Reading · Exploring SparkSQL and Spark DataFrames
Video · Spark Core: Transformations
Video · Exploring SparkSQL and Spark DataFrames
Video · Spark Core: Actions
Reading · Instructions for Configuring VirtualBox for Spark Streaming
Reading · Slides for Module 5 Lesson 1
Reading · Analyzing Sensor Data with Spark Streaming
Reading · Let's Analyze Soccer Tweets!
Reading · Analyzing Tweets About Countries
Subtitles English
Video · Welcome to Machine Learning With Big Data
Other · Getting to Know You: Tell us about yourself and why you are taking this course.
Video · Summary of Big Data Integration and Processing
Other · Discussion Forum for Course Content Issues
Video · Machine Learning Process
Reading · PDFs of Readings for Week 1 Hands-On
Video · Data Exploration through Summary Statistics
Video · Exploring Data with KNIME Plots
Data Preparation
Video · Data Quality
Reading · Slides: Data Preparation for Machine Learning
Video · Dimensionality Reduction
Quiz · Handling Missing Values in KNIME and Spark Quiz
Other · Domain Knowledge in Data Preparation
Reading · PDFs for Data Preparation Hands-On Readings
WEEK 3
Classification
Video · Building and Applying a Classification Model
Video · Classification using Decision Tree in KNIME
Reading · Slides: What is Classification?
Reading · Instructions for Changing the Number of Cloudera VM CPUs
Quiz · Classification
WEEK 4
Evaluation of Machine Learning Models
Subtitles English
Want to understand your data network structure and how it changes under different conditions? Curious to know how to
identify closely interacting clusters within a graph? Have you heard of the fast-growing area of graph analytics and want to
learn more? This course gives you a broad overview of the field of graph analytics so you can learn new ways to model,
store, retrieve and analyze graph-structured data.
After completing this course, you will be able to model a problem into a graph database and perform analytical tasks over
the graph in a scalable manner. Better yet, you will be able to apply these techniques to understand the significance of
your data sets for your own projects.
WEEK 1
Welcome to Graph Analytics
Meet your instructor, Amarnath Gupta, and learn about the course objectives.
Reading · What to learn in this module
Other · Let's Discuss: Where do you see path problems in your life?
Video · Welcome to Graph Analytics Techniques
Reading · About the Supplementary Resources
Reading · Downloading, Installing, and Running Neo4j - Supplementary Resources
Video · Hands-On: Downloading, Installing, and Running Neo4j
Reading · Getting Started With Neo4j - Supplementary Resources
Video · Hands-On: Getting Started With Neo4j
Reading · Adding to and Modifying a Graph - Supplementary Resources
Video · Hands-On: Modifying a Graph With Neo4j
Reading · Download datasets used in this Graph Analytics with Neo4j
Reading · Importing Data Into Neo4j - Supplementary Resources
Video · Hands-On: Importing Data Into Neo4j
Video · Community Analytics and Local Properties
Video · Optional Lecture 3: Power Law Graphs
Video · Optional Lecture 4: Measuring Graph Evolution
Reading · Basic Queries in Neo4j With Cypher - Supplementary Resources
Video · Hands-On: Basic Queries in Neo4j With Cypher - Part 1
Video · Hands-On: Basic Queries in Neo4j With Cypher - Part 2
Reading · Path Analytics in Neo4j With Cypher - Supplementary Resources
Video · Hands-On: Path Analytics in Neo4j Using Cypher - Part 1
Video · Hands-On: Path Analytics in Neo4j Using Cypher - Part 2
Reading · Connectivity Analytics in Neo4j with Cypher - Supplementary Resources
Quiz · Quiz: Graph Analytics With Neo4j
Reading · Assignment: Practicing Graph Analytics in Neo4j With Cypher
Quiz · Assessment Questions on 'Practicing Graph Analytics in Neo4j With Cypher'
Reading · FAQ
Reading · Download All Neo4j Supplementary Resources (PDFs)
WEEK 5
Computing Platforms for Graph Analytics
In the last two modules we learned about graph analytics and graph data management. This week we will study how
they come together. There are programming models and software frameworks created specifically for graph analytics. In
this module we'll give an introductory tour of these models and frameworks. You will learn to implement what you learned
in Week 2 and build on it using GraphX and Giraph.
Video · Introduction: Large Scale Graph Processing
Reading · Hands On: Building a Degree Histogram Reading
Video · A Parallel Programming Model for Graphs
Video · Hands On: Plot the Degree Histogram
Video · Pregel: The System That Changed Graph Processing
Reading · Hands On: Plot the Degree Histogram Reading
Video · Giraph and GraphX
Video · Hands On: Network Connectedness and Clustering Components
Video · Beyond Single Vertex Computation
Reading · Hands On: Network Connectedness and Clustering Components Reading
English
Reading · Informing business strategies based on client base
Reading · Practice with PySpark MLlib Clustering
Peer Review · Recommending Actions from Clustering Analysis
Other · Is there only “one way” to cluster a client base?
Reading · Understanding the Simulated Chat Data Generated by the Scripts
Peer Review · Graph Analytics With Chat Data Using Neo4j
Peer Review · Final Project Practice
Peer Review · Optional 3-minute video: Splunk opportunity
Video · Congratulations! Some Final Words...
Reading · Part 2: Help us connect your video to your LinkedIn profile