PIG
What is Pig?
Why Pig?
1 sairavi.bigdata@gmail.com
99520 29030
BIGDATA
Use case
Suppose you have user data in one file, website data in another, and you need to find the top 5
most visited pages by users aged 18 - 25.
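The use case above can be sketched in Pig Latin roughly as follows. The file names (users.txt, pages.txt), the comma delimiter, the field names, and the output directory are assumptions made for illustration; the slides do not give them.

```pig
-- Assumed inputs: users.txt holds (name, age); pages.txt holds (user, url).
users  = LOAD 'users.txt' USING PigStorage(',') AS (name:chararray, age:int);
young  = FILTER users BY age >= 18 AND age <= 25;
pages  = LOAD 'pages.txt' USING PigStorage(',') AS (user:chararray, url:chararray);
-- Keep only the page visits made by users in the 18 to 25 range.
joined = JOIN young BY name, pages BY user;
-- Count visits per page, then keep the five most visited.
by_url = GROUP joined BY url;
visits = FOREACH by_url GENERATE group AS url, COUNT(joined) AS clicks;
sorted = ORDER visits BY clicks DESC;
top5   = LIMIT sorted 5;
STORE top5 INTO 'top5pages';
```

Each statement names an intermediate relation, so the script reads as a step-by-step data flow rather than one nested query.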
ETL
Processing large amounts of log data.
Clean bad data.
Research on raw data:
User audit logs.
Schema may be unknown or inconsistent.
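Atom – A single simple value, such as a string or a number.
Example: Sai
Tuple – An ordered set of fields, written in parentheses.
Example: (Sai,20)
Bag – A collection of tuples, written in curly braces.
Example: {(1,2),(3,4)}
Map – A set of key-value pairs, written in square brackets. The # separates each key from its value.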
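As a rough illustration of how these value kinds (a simple value, a tuple, a bag, and a map) can appear together in one schema, here is a minimal sketch. The file students.txt and every field name in it are hypothetical.

```pig
-- Hypothetical tab-separated input, one student per line, for example:
-- Sai    {(maths,90),(physics,85)}    [city#Chennai]
students = LOAD 'students.txt' AS (name:chararray, marks:bag{t:tuple(subject:chararray, score:int)}, contact:map[]);
-- A map value is looked up with #, the same symbol that separates key and value.
cities = FOREACH students GENERATE name, contact#'city' AS city;
-- FLATTEN turns each tuple inside the bag into its own output row.
scores = FOREACH students GENERATE name, FLATTEN(marks);
DUMP cities;
DUMP scores;
```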
Pig Program Structure
Script
Pig can run a script file that contains Pig commands. Example: pig pig1.pig
Grunt
Embedded
Local mode
Map/Reduce Mode
In this mode, whenever we execute Pig Latin statements to process data, MapReduce jobs are
launched in the back end to perform the required operations on data stored in HDFS.
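To make the ways of running Pig concrete, here is a small sketch of what a script such as pig1.pig might contain, with the launch commands for each mode shown as comments. The script body and the input path are assumptions; only the file name pig1.pig comes from the slides.

```pig
-- pig1.pig (hypothetical contents)
-- Local mode, using the local file system:       pig -x local pig1.pig
-- Map/Reduce mode, using a Hadoop cluster, HDFS: pig -x mapreduce pig1.pig
-- From inside the Grunt shell:                   exec pig1.pig
logs = LOAD 'input/logs.txt' AS (line:chararray);
few  = LIMIT logs 10;
DUMP few;
```

Map/Reduce mode is the default when no -x option is given, so plain pig pig1.pig behaves like the second command.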
Pig Architecture
Pig Latin Relational Operators
LOAD
STORE
DUMP
Filtering
FILTER
DISTINCT
FOREACH...GENERATE
STREAM
JOIN
Sorting
ORDER
LIMIT
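Several of the operators listed above can be combined in one short script. The sketch below assumes a hypothetical tab-separated audit log with a user, an action, and a response time in milliseconds; the file name, field names, and output directory are illustrative only.

```pig
-- LOAD reads the input and applies a schema; values that do not fit become null.
raw     = LOAD 'audit_logs.txt' AS (user:chararray, action:chararray, millis:int);
-- FILTER drops records with missing or unparseable fields.
clean   = FILTER raw BY user IS NOT NULL AND millis IS NOT NULL;
-- FOREACH...GENERATE projects columns; DISTINCT removes duplicate rows.
pairs   = FOREACH clean GENERATE user, action;
uniq    = DISTINCT pairs;
-- ORDER sorts and LIMIT keeps the first rows of the sorted relation.
slowest = ORDER clean BY millis DESC;
top10   = LIMIT slowest 10;
-- DUMP prints to the console; STORE writes to the file system.
DUMP top10;
STORE uniq INTO 'user_actions';
```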