Apache Spark Training Course Curriculum

Module 1

Introduction to Big Data and Spark

Overview of Big Data and Spark

MapReduce limitations

Spark History

Spark Architecture

Spark and Hadoop Advantages

Benefits of Spark + Hadoop

Introduction to the Spark Ecosystem

Spark Installation


Module 2

Introduction to Scala

Scala foundation

Features of Scala

Set Up Spark and Scala on Ubuntu and Windows

Install IDEs for Scala

Run Scala Code in the Scala Shell

Understanding Data Types in Scala

Implementing Lazy Values

Control Structures

Looping Structures

Functions

Procedures

Collections

Arrays and Array Buffers

Maps, Tuples and Lists


Module 3

Object Oriented Programming in Scala

Implementing Classes

Implementing Getters & Setters

Object & Object Private Fields

Implementing Nested Classes

Using Auxiliary Constructors

Primary Constructor

Companion Object

Apply Method

Understanding Packages

Override Methods

Type Checking

Casting

Abstract Classes


Module 4

Functional Programming in Scala

Understanding Functional programming in Scala

Implementing Traits

Layered Traits

Rich Traits

Anonymous Functions

Higher Order Functions

Closures and Currying

Performing File Processing


Module 5

Foundation to Spark

Spark Shell and PySpark

Basic operations on Shell

Spark Java projects

Spark Context and Spark Properties

Persistence in Spark

Accessing HDFS Data from Spark

Implementing Server Log Analysis using Spark


Module 6

Working with Resilient Distributed Datasets (RDDs)

Understanding RDD

Loading data into RDD

Scala RDD, Paired RDD, Double RDD & General RDD Functions

Implementing HadoopRDD, Filtered RDD, Joined RDD

Transformations, Actions and Shared Variables

Spark Operations on YARN

Sequence File Processing

Partitioner and its role in Performance improvement


Module 7

Spark Ecosystem - Spark Streaming & Spark SQL

Introduction to Spark Streaming

Introduction to Spark SQL

Querying Files as Tables

Text file Format

JSON file Format

Parquet file Format

Hive and Spark SQL Architecture

Integrating Spark & Apache Hive

Spark SQL performance optimization

Implementing Data visualization in Spark
