
PRESENTED BY: Pradeep Devarasetty

Md Abdul Mujaheed
INTRODUCING APACHE NiFi
History

● NiFi (previously Niagara Files) was developed and used within the
National Security Agency (NSA), USA, for 8 years before being open-sourced.
● It was donated to the Apache Software Foundation (ASF) in November 2014
through the NSA's Technology Transfer Program.
What is NiFi?

● NiFi stands for Niagara Files
● Apache NiFi is an easy to use, powerful and reliable
framework to process and distribute data
● It is a platform for automating the movement of data
between disparate systems
● It is built on a component-based extension model
● In simpler terms, NiFi is a system for moving, filtering, and
enhancing data
● We can trace data in NiFi, just as we track a package delivered by
Flipkart, FedEx, etc.
What is NiFi? Contd...

● NiFi allows users to send, receive, route, transform, and sort data,
as needed, in an automated and configurable way
Why NiFi?

● NiFi was designed from the beginning to be field-ready: flexible,
extensible, and suitable for a wide range of devices.
● It allows us to interact with the dataflow directly in the
browser.
● It provides us with real time control which makes it easy to
manage the movement of data between any data source
to any destination.
● It features fine-grained data provenance tools.
● NiFi has several extensions for dealing with file-based
dataflows such as FTP, SFTP, HTTP, etc.
What is Apache NiFi used for?

● Reliable and secure transfer of data between systems
● Delivery of data from sources to analytic
platforms
● Enrichment and preparation of data:
-Conversion between formats
-Extraction/Parsing
-Routing decisions
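
The three enrichment steps above can be pictured with a small Python sketch. This is not NiFi code: the function names, the `order.country` attribute, and the sample CSV are all hypothetical and only illustrate converting a format, extracting a value, and making a routing decision.

```python
import csv
import io
import json


def csv_to_json(content):
    """Conversion between formats: CSV text in, JSON text out."""
    rows = list(csv.DictReader(io.StringIO(content)))
    return json.dumps(rows)


def extract_attributes(json_content):
    """Extraction/parsing: pull one field out of the content into an attribute."""
    first_row = json.loads(json_content)[0]
    return {"order.country": first_row["country"]}


def route(attributes):
    """Routing decision: choose a destination from an attribute value."""
    return "domestic" if attributes["order.country"] == "IN" else "international"


raw = "id,country,amount\n1,IN,250\n"
converted = csv_to_json(raw)
attributes = extract_attributes(converted)
destination = route(attributes)
print(converted, attributes, destination)
```

In NiFi itself each of these steps would be a configured processor rather than hand-written code.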
Advantages

● Data source and destination-agnostic
● Provides connection processors for many data
sources
● Runs on any device that runs Java
● Build in one place, copy to anywhere else
● Apache NiFi is well suited to data sources sitting out on the edge,
sources with poor connectivity, and data that must be prioritized
Terminology
● FlowFile
-Unit of data moving through the system
-Content + Attributes (key/value pairs).
● Processor
-Performs the work, can access FlowFiles.
● Connection
-Links between processors.
-Queues that can be dynamically prioritized.
● Process Group
-Set of processors and their connections.
-Receive data via input ports, send data via output ports.
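
As a rough mental model of these terms, the sketch below uses plain Python stand-ins. The `FlowFile` and `Connection` classes and `uppercase_processor` are hypothetical names made up for this illustration, not part of NiFi's actual Java API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy FlowFile: content bytes plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Toy connection: a queue linking one processor to the next."""

    def __init__(self):
        self._queue = deque()

    def put(self, flowfile):
        self._queue.append(flowfile)

    def get(self):
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


def uppercase_processor(source, destination):
    """Toy processor: takes one FlowFile, does its work, passes it on."""
    flowfile = source.get()
    if flowfile is None:
        return
    flowfile.content = flowfile.content.upper()
    flowfile.attributes["processed.by"] = "uppercase_processor"
    destination.put(flowfile)


incoming, outgoing = Connection(), Connection()
incoming.put(FlowFile(b"hello nifi", {"filename": "greeting.txt"}))
uppercase_processor(incoming, outgoing)
result = outgoing.get()
print(result.content, result.attributes)
```

A process group would then be a named bundle of such processors and connections, with input and output ports at its edges.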
NiFi - Provenance

● Tracks data at each point as it flows through the system
● Records, indexes, and makes events available for display
● Handles merging and splitting of data
● View attributes and content at a given point in time
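
The idea of recording an event at each point, including across splits, can be sketched as follows. This is a simplified illustration and not NiFi's implementation: `ProvenanceLog`, its methods, and the event labels `RECEIVE`, `FORK`, and `SEND` are used loosely here as assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ProvenanceEvent:
    event_type: str
    flowfile_id: int
    parent_ids: tuple
    attributes: dict  # snapshot of the attributes at this point
    timestamp: float = field(default_factory=time.time)


class ProvenanceLog:
    def __init__(self):
        self.events = []

    def record(self, event_type, flowfile_id, attributes, parent_ids=()):
        self.events.append(
            ProvenanceEvent(event_type, flowfile_id, tuple(parent_ids), dict(attributes))
        )

    def lineage(self, flowfile_id):
        """Events for a FlowFile and all of its ancestors, in recorded order."""
        wanted, stack = set(), [flowfile_id]
        while stack:
            current = stack.pop()
            if current in wanted:
                continue
            wanted.add(current)
            for event in self.events:
                if event.flowfile_id == current:
                    stack.extend(event.parent_ids)
        return [e for e in self.events if e.flowfile_id in wanted]


log = ProvenanceLog()
log.record("RECEIVE", 1, {"filename": "orders.csv"})
log.record("FORK", 2, {"filename": "orders-part1.csv"}, parent_ids=[1])
log.record("FORK", 3, {"filename": "orders-part2.csv"}, parent_ids=[1])
log.record("SEND", 2, {"filename": "orders-part1.csv"})
history = [(e.event_type, e.flowfile_id) for e in log.lineage(2)]
print(history)
```

Because every event stores a snapshot of the attributes, the state of a FlowFile at any recorded point can be inspected later.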
NiFi – Queue Prioritization
● Configure a prioritizer per connection
● Determine what is important for your data: time-based order,
arrival order, or importance of a data set
● Funnel many connections down to a single connection to
prioritize across data sets
● Develop your own prioritizer if needed
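
A per-connection prioritizer can be thought of as a pluggable ordering function on a queue. The sketch below is a hypothetical Python model: `PrioritizedConnection` and `by_priority_attribute` are invented names, and FlowFiles are represented as plain dictionaries of attributes.

```python
import heapq
from itertools import count


class PrioritizedConnection:
    """Toy connection whose queue order is decided by a prioritizer function."""

    def __init__(self, prioritizer):
        self._prioritizer = prioritizer
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier arrivals come out first

    def put(self, flowfile):
        entry = (self._prioritizer(flowfile), next(self._arrival), flowfile)
        heapq.heappush(self._heap, entry)

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def by_priority_attribute(flowfile):
    """Lower number means more important; missing values go last."""
    return int(flowfile.get("priority", 100))


# Several upstream sources funnel into one connection and are ordered together.
queue = PrioritizedConnection(by_priority_attribute)
queue.put({"name": "bulk-report", "priority": "9"})
queue.put({"name": "alert", "priority": "1"})
queue.put({"name": "metrics", "priority": "9"})
order = [queue.get()["name"] for _ in range(3)]
print(order)
```

Swapping in a different function, for example one keyed on a timestamp attribute, changes the ordering without touching the queue itself.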
NiFi – Back Pressure
● Configure back pressure per connection
● Based on number of FlowFiles or total size of FlowFiles
● The upstream processor is no longer scheduled to run until the
queue drops back below the threshold
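
The threshold check can be sketched in a few lines of Python. This is a toy model under stated assumptions: `BackPressureConnection`, `run_upstream`, and the threshold values are hypothetical, and the real scheduling logic lives inside NiFi's flow controller.

```python
from collections import deque


class BackPressureConnection:
    """Toy connection with a FlowFile-count threshold and a total-size threshold."""

    def __init__(self, max_count, max_bytes):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self._queue = deque()
        self._queued_bytes = 0

    def put(self, content):
        self._queue.append(content)
        self._queued_bytes += len(content)

    def get(self):
        content = self._queue.popleft()
        self._queued_bytes -= len(content)
        return content

    def is_full(self):
        return (len(self._queue) >= self.max_count
                or self._queued_bytes >= self.max_bytes)


def run_upstream(connection, pending):
    """Run the upstream processor only while the connection is below threshold."""
    produced = 0
    while pending and not connection.is_full():
        connection.put(pending.pop(0))
        produced += 1
    return produced


connection = BackPressureConnection(max_count=3, max_bytes=1024)
pending = [b"x" * 10 for _ in range(5)]
first_run = run_upstream(connection, pending)   # stops once 3 FlowFiles are queued
connection.get()                                # downstream consumes one FlowFile
second_run = run_upstream(connection, pending)  # room for exactly one more
print(first_run, second_run, len(pending))
```

Note that the check happens before each unit of work, so the queue can exceed the size threshold by the last item that was let through.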
NiFi - Architecture
NiFi - Explaining Architecture
● NiFi executes within a JVM on a host operating system.
The primary components are:
● Web Server
-The purpose of the web server is to host NiFi’s HTTP-based command and
control API.
● Flow Controller
-The flow controller is the brains of the operation. It provides threads for
extensions to run on, and manages the schedule of when extensions receive
resources to execute.
● Extensions
-NiFi has several extensions for dealing with file-based dataflows such as FTP,
SFTP, HTTP, etc
NiFi – Architecture contd..

● FlowFile Repository
-The FlowFile Repository is where NiFi keeps track of
the state of what it knows about a given FlowFile
● Content Repository
-The Content Repository is where the actual content
bytes of a given FlowFile live.
● Provenance Repository
-The Provenance Repository is where all provenance
event data is stored.
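
The split between the three repositories can be illustrated with a toy in-memory model. The class and method names below are hypothetical, as are the event labels; real NiFi repositories are persisted on disk.

```python
import uuid


class ContentRepository:
    """Holds the actual content bytes, addressed by a claim identifier."""

    def __init__(self):
        self._claims = {}

    def write(self, data):
        claim = uuid.uuid4().hex
        self._claims[claim] = data
        return claim

    def read(self, claim):
        return self._claims[claim]


class FlowFileRepository:
    """Holds the state of each FlowFile: attributes plus a pointer to its content."""

    def __init__(self):
        self._records = {}

    def save(self, flowfile_id, attributes, claim):
        self._records[flowfile_id] = {"attributes": dict(attributes), "claim": claim}

    def load(self, flowfile_id):
        return self._records[flowfile_id]


class ProvenanceRepository:
    """Holds the history of events."""

    def __init__(self):
        self.events = []

    def record(self, event_type, flowfile_id):
        self.events.append((event_type, flowfile_id))


content_repo = ContentRepository()
flowfile_repo = FlowFileRepository()
provenance_repo = ProvenanceRepository()

claim = content_repo.write(b"sensor reading: 21.5")
flowfile_repo.save("ff-1", {"filename": "reading.txt"}, claim)
provenance_repo.record("CREATE", "ff-1")

# Changing an attribute rewrites only the small FlowFile record, not the content.
record = flowfile_repo.load("ff-1")
flowfile_repo.save("ff-1", {**record["attributes"], "unit": "celsius"}, record["claim"])
provenance_repo.record("ATTRIBUTES_MODIFIED", "ff-1")

updated = flowfile_repo.load("ff-1")
print(updated["attributes"], content_repo.read(updated["claim"]))
```

Keeping content separate from FlowFile state means attribute-only changes never need to copy the content bytes.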
GETTING STARTED WITH
APACHE NiFi
Downloading

NiFi can be downloaded from the official Apache NiFi
website: https://nifi.apache.org/download.html
NiFi - Release
NiFi - Installing

● To run NiFi, the system must have JDK 1.8 or later installed
● Extract the NiFi tar file
● Open the terminal and navigate to the directory where NiFi is
installed
● To run NiFi in the foreground, run bin/nifi.sh run
● To run NiFi in the background, instead run bin/nifi.sh start
● To check the status and see if NiFi is currently running, execute
the command bin/nifi.sh status
● NiFi can be shut down by executing the command bin/nifi.sh
stop
NiFi – User Interface

● Drag and drop processors to build a flow
● Start, stop, and configure components in real time
● View errors and corresponding error messages
● View statistics and health of the data flow
● Create templates of common processors and connections
NiFi - Processor

● There are 165 processors in total in the latest release of NiFi,
nifi-1.0.0
● Of these, we have explored the following processors
– ConvertAvroToJSON
– ExecuteSQL
– EvaluateJsonPath
– EvaluateXPath
– ExtractText
– HandleHttpRequest
Processor contd..

– HandleHttpResponse
– InvokeHTTP
– LogAttribute
– PutSQL
– RouteOnAttribute
– ReplaceText
– SplitXml
– UpdateAttribute
Resources

● https://nifi.apache.org/docs.html
● http://hortonworks.com/apache/nifi/#section_1
● https://community.hortonworks.com/articles/7999/apache-nifi-part-1-introduction.html
● https://nifi.apache.org/developer-guide.html
● https://kisstechdocs.wordpress.com/2015/01/15/what-is-apache-nifi/
● http://www.ssglimited.com/what-is-apache-nifi/
● https://www.federallabs.org/index.php?tray=success_stories&tid=1FLtop55&cid=flcSS57
Thank You
