
Presented By Idrees Fazili

Mining Anomalies Using Traffic Feature Distributions

Presented by
Idrees Fazili
(I Semester, M.Sc. IT)
Enrollment No.: 100217

Friday, Nov 12, 2010

Department of Information Technology


Central University of Kashmir
Mining Anomalies Using Traffic Feature Distributions

Based on research work carried out by:

Anukool Lakhina, Dept. of Computer Science, Boston University
Mark Crovella, Dept. of Computer Science, Boston University
Christophe Diot, Intel Research, Cambridge, UK
RESEARCH PAPER FOCUS
KEY TERMS
INTRODUCTION
RELATED WORK
FEATURE DISTRIBUTIONS
DIAGNOSIS METHODOLOGY
DATA
DETECTION
CONCLUSION
QUESTIONS
REFERENCES

Perplexity is the beginning of knowledge


Research Paper Focus
Analysis of traffic feature distributions, using entropy as a summarization tool
It enables highly sensitive detection of a wide range of anomalies,
augmenting the detections made by volume-based methods
It enables automatic classification of anomalies via unsupervised learning
The claims are validated on data from two backbone networks

Key Terms
Anomaly Detection
A technique that seeks to identify attacks on a computer system by
looking for behavior that deviates from the norm.
Anomaly Classification
Single-Link & Network-Wide Traffic
Entropy
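
A note on the last term, as a sketch: the slides do not define entropy, so the formula below is the standard sample entropy of an empirical histogram, which fits how the term is used later in the deck. The symbols X, n_i, S and N are notation introduced here, not taken from the slides: a feature takes N distinct values, value i is observed n_i times, and S is the total number of observations.

```latex
H(X) = -\sum_{i=1}^{N} \frac{n_i}{S} \log_2 \frac{n_i}{S},
\qquad S = \sum_{i=1}^{N} n_i
```

H(X) is 0 when every observation has the same value (a fully concentrated distribution) and reaches its maximum, \log_2 N, when all N values are equally likely (a fully dispersed distribution).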

Introduction
Network operators are routinely confronted with a wide range of
unusual events, some of which, but not all, may be malicious
Operators need to detect these anomalies as they occur and classify them in
order to choose an appropriate response
The principal challenge in automatically detecting and classifying anomalies
is that anomalies can span a vast range of events
A general anomaly diagnosis system should therefore be able to detect a
range of anomalies with diverse structure, distinguish between
different types of anomalies, and group similar anomalies

Related Work
Anomalies have mostly been treated as deviations in overall traffic volume
Much of the work in anomaly detection and identification has been
restricted to point solutions for specific types of anomalies
Much of the work in anomaly detection has focused on single-link traffic
data
This work uses entropy as a summarization tool for feature distributions, with
a much broader objective: detecting and classifying general
anomalies, not just individual types of anomalies

Feature Distributions
Analysis of traffic feature distributions is a powerful tool for the detection and
classification of network anomalies, because many important kinds of
traffic anomalies cause changes in the distribution of the addresses or ports
observed in traffic.
The table lists a set of anomalies commonly encountered in backbone
network traffic

Feature Distributions Cont…
A traffic feature is a field in the header of a packet.
Four fields are used:
Source address (sometimes called source IP and denoted srcIP)
Destination address (or destination IP, denoted dstIP)
Source port (srcPort)
Destination port (dstPort)
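
To illustrate how an anomaly shows up in these four features, here is a minimal sketch that computes the entropy of each feature over one time bin of packets. The packet records, addresses and function names are hypothetical and chosen for illustration; they do not come from the paper or its data.

```python
import math
from collections import Counter

FEATURES = ("srcIP", "dstIP", "srcPort", "dstPort")

def sample_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # log2(total / n) is non-negative, so the sum never yields -0.0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def feature_entropies(packets):
    """Map each of the four header features to its entropy over one time bin."""
    return {f: sample_entropy([p[f] for p in packets]) for f in FEATURES}

# Hypothetical normal traffic: 8 clients talking to 2 web servers on port 80.
normal = [
    {"srcIP": f"10.0.0.{i % 8}", "dstIP": f"192.0.2.{i % 2}",
     "srcPort": 40000 + i, "dstPort": 80}
    for i in range(64)
]
# Hypothetical port scan: one source probes ports 1..64 on a single target.
scan = [
    {"srcIP": "10.0.0.99", "dstIP": "192.0.2.1",
     "srcPort": 50000, "dstPort": port}
    for port in range(1, 65)
]

h_normal = feature_entropies(normal)
h_mixed = feature_entropies(normal + scan)
```

In this toy bin the scan disperses the destination port distribution (its entropy rises from 0 to 4 bits) while concentrating the destination address distribution (its entropy falls), which is the kind of change in distribution that the slides describe.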

Diagnosis Methodology
The anomaly diagnosis methodology leverages these observations about entropy
to detect and classify anomalies.
To detect anomalies, the authors introduce
The multiway subspace method
They show how it can be used to detect anomalies across multiple traffic
features and across multiple Origin-Destination (or point-to-point) flows
To classify anomalies, the authors adopt
An unsupervised classification strategy
They show how to cluster structurally similar anomalies together
Together, the multiway subspace method and the clustering algorithms form
the foundation of the anomaly diagnosis methodology

Diagnosis Methodology: Multiway Subspace Method

Subspace method
Its goal is to identify typical variation in a set of correlated metrics, and to
detect unusual conditions based on deviation from that typical variation
Normal variation
The projection of the data onto the subspace that captures typical variation
Abnormal variation
Any significant deviation of the data from this subspace
The multiway subspace method is introduced because anomalies typically induce
changes in multiple traffic features
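
The idea of normal and abnormal variation can be sketched with principal component analysis. This is a minimal sketch, not the authors' implementation: the synthetic data, the choice of k = 1 normal component, and the median-based threshold are all assumptions made here for illustration.

```python
import numpy as np

def subspace_residual_energy(X, k):
    """Squared residual of each row of X outside the top-k principal subspace."""
    Xc = X - X.mean(axis=0)                      # center each metric (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                 # basis of the normal subspace
    residual = Xc - Xc @ P @ P.T                 # part not explained by normal variation
    return np.sum(residual ** 2, axis=1)

# Synthetic data: 200 time bins of 6 correlated metrics sharing one pattern.
rng = np.random.default_rng(0)
t = np.arange(200)
pattern = np.sin(2 * np.pi * t / 50)
X = np.outer(pattern, np.array([1.0, 0.8, 1.2, 0.5, 0.9, 1.1]))
X += 0.01 * rng.standard_normal(X.shape)
X[120, 3] += 1.0                                 # inject a deviation into one metric

spe = subspace_residual_energy(X, k=1)
threshold = 10 * np.median(spe)                  # ad hoc threshold (assumption)
anomalous_bins = np.flatnonzero(spe > threshold)
```

Rows that follow the shared pattern project almost entirely onto the normal subspace and leave a small residual; the injected bin does not, so its residual energy stands out.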

Diagnosis Methodology: Multiway Subspace Method Cont…

To detect an anomaly in an OD flow, we should isolate correlated changes
across all four of its traffic features
Multiple OD flows may collude to produce network-wide anomalies
In addition to analyzing multiple traffic features, the detection method must
also be able to extract anomalous changes across the ensemble of OD flows
An effective way of analyzing multiway data is to recast it into a simpler,
single-way representation
The idea behind the multiway subspace method is to "unfold" the multiway
matrix into a single, large matrix
Once the transformation is complete, the subspace method can be applied to
detect anomalies across different OD flows and different features
Figure 3: Multivariate, multi-way data to analyze
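
The unfolding step can be sketched as follows. The array layout (time bin, OD flow, feature), the names `H`, `M` and `unfold`, and the scaling of each feature block to unit energy (so that no single feature dominates) are assumptions of this sketch, not details given on the slides.

```python
import numpy as np

def unfold(H):
    """Unfold a (time, OD flow, feature) array into a (time, feature * OD flow) matrix."""
    t, p, f = H.shape
    blocks = []
    for j in range(f):
        block = H[:, :, j].astype(float)         # t x p entropy series of feature j
        norm = np.linalg.norm(block)             # Frobenius norm of the block
        blocks.append(block / norm if norm > 0 else block)
    return np.hstack(blocks)                     # feature blocks placed side by side

# Hypothetical sizes: 5 time bins, 3 OD flows, 4 features.
H = np.arange(5 * 3 * 4, dtype=float).reshape(5, 3, 4)
M = unfold(H)
```

Column `j * p + i` of `M` holds feature `j` of OD flow `i`, so a single-way method applied to the rows of `M` sees every feature of every flow at once.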

Diagnosis Methodology: Unsupervised Classification

A clustering approach is used because it is unsupervised, so it can
potentially adapt to new anomalies as they arise
Two types of clustering algorithms:
Partitional
Such algorithms exploit global structure to divide the data into a chosen number, k,
of clusters, with the goal of producing meaningful partitions
Hierarchical
Such algorithms use local neighborhood structure and work bottom-up,
merging existing clusters with neighboring clusters.
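
The two families can be contrasted with a small sketch: k-means as a partitional example and single-linkage agglomerative clustering as a hierarchical one. Both are simplified stand-ins written for illustration; the farthest-first initialization, the single-linkage rule and the 4-dimensional sample points are assumptions, not the configuration used in the paper.

```python
import math

def kmeans(points, k, iters=20):
    """Partitional: assign every point to the nearest of k centroids, then update."""
    centroids = [list(points[0])]                # farthest-first init (assumption)
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                          # keep the old centroid if empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def agglomerative(points, k):
    """Hierarchical: start from singletons, repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):                           # single linkage: nearest pair of members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]))
        merged = clusters.pop(b)                 # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels

# Hypothetical anomalies described by four per-feature values
# in the order (srcIP, srcPort, dstIP, dstPort).
points = [
    (0.0, 0.1, -0.6, 0.8), (0.1, 0.0, -0.5, 0.9), (0.0, 0.0, -0.6, 0.7),
    (0.8, 0.1, -0.7, 0.0), (0.9, 0.0, -0.6, 0.1), (0.7, 0.1, -0.8, 0.0),
]
partitional_labels = kmeans(points, 2)
hierarchical_labels = agglomerative(points, 2)
```

On this toy input both approaches recover the same two groups; they differ in how they get there, with k-means refining a global partition and the agglomerative method building clusters up from nearest neighbors.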

Data
The proposed anomaly detection and classification framework is validated using
sampled flow data collected from all access links of two backbone
networks, Abilene and Géant
Abilene is the Internet2 backbone network
It connects over 200 US universities and peers with research networks in
Europe and Asia
It consists of 11 Points of Presence (PoPs), spanning the continental US
Three weeks of sampled IP-level traffic flow data were collected from every PoP
in Abilene
Sampling is periodic, at a rate of 1 out of 100 packets
Abilene anonymizes destination and source IP addresses by masking out
their last 11 bits
Géant is the European research network
Twice as large as Abilene, with 22 PoPs located in major European capitals
Three weeks of sampled flow data were collected from Géant as well
Data from Géant is sampled periodically, at a rate of 1 out of every 1000 packets
Géant flow records are not anonymized
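
The two preprocessing steps mentioned above, masking the last 11 bits of an address and periodic sampling, can be sketched as below. The function names and sample addresses are hypothetical; the sketch assumes IPv4 addresses and shows only the arithmetic, not the tooling either network actually uses.

```python
import ipaddress

def mask_low_bits(addr, bits=11):
    """Zero the last `bits` bits of an IPv4 address."""
    value = int(ipaddress.IPv4Address(addr))
    masked = value & ~((1 << bits) - 1) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(masked))

def periodic_sample(packets, every):
    """Keep 1 out of every `every` packets, in arrival order."""
    return packets[::every]

# Two hosts that share their first 21 bits become indistinguishable.
a = mask_low_bits("10.1.255.200")
b = mask_low_bits("10.1.250.3")

# 1-in-100 sampling of a hypothetical stream of 1000 packets.
sampled = periodic_sample(list(range(1000)), 100)
```

Masking 11 bits leaves only the leading 21 bits of each address, so per-address distributions for Abilene are really distributions over /21 prefixes.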

Detection
The first step in anomaly diagnosis is detection
Considerations in using feature distributions for anomaly detection:
Does entropy allow detection of a larger set of anomalies than can be
detected via volume-based methods alone?
Are the additional anomalies detected by entropy fundamentally different
from those detected by volume-based methods?
How precise is entropy-based detection?
Compare the sets of anomalies detected by
Volume-based methods
Entropy-based methods
Manually inspect the detected anomalies to determine their type and to
determine the false alarm rate
Inject known anomalies into existing traffic traces to determine the
detection rate
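
The evaluation steps above reduce to simple set arithmetic, sketched here. All time-bin indices and labels are made up for illustration, and the false alarm rate is defined in this sketch as the fraction of detections judged to be false alarms, which is an assumption about the intended metric.

```python
def compare_detections(volume, entropy):
    """Split detections into those found by both methods or by only one."""
    return {
        "both": volume & entropy,
        "volume_only": volume - entropy,
        "entropy_only": entropy - volume,
    }

def false_alarm_rate(detected, labels):
    """Fraction of detections that manual inspection labeled as false alarms."""
    false_alarms = [d for d in detected if labels[d] == "false alarm"]
    return len(false_alarms) / len(detected)

def detection_rate(injected, detected):
    """Fraction of injected anomalies that the detector flagged."""
    return len(injected & detected) / len(injected)

# Hypothetical time bins flagged by each method.
volume_hits = {3, 17, 42}
entropy_hits = {17, 42, 58, 91, 130}
overlap = compare_detections(volume_hits, entropy_hits)

# Hypothetical manual labels for the entropy-based detections.
labels = {17: "anomaly", 42: "anomaly", 58: "anomaly",
          91: "false alarm", 130: "anomaly"}
far = false_alarm_rate(entropy_hits, labels)

# Hypothetical bins where known anomalies were injected.
injected_bins = {58, 130, 200, 250}
rate = detection_rate(injected_bins, entropy_hits)
```

The `entropy_only` set corresponds to the first question on the slide: detections that volume-based methods alone would have missed.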

Concluding Words
Network anomaly diagnosis is an ambitious goal, but the advent of
network-wide flow data brings this goal closer to feasibility. Treating
anomalies as events that alter traffic feature distributions yields
considerable diagnostic power: in detecting new anomalies, in understanding
the structure of anomalies, and in classifying anomalies
Ongoing work focuses on extending the feature-based diagnosis methodology
Online extensions to the clustering methods, methods to expose the raw
flow records involved in an anomaly, and additional information
that can aid in better classifying anomalies by their root cause

Questions

Thank You!
Whenever you learn something new, the whole world becomes that much richer
