
Journal of Computer Applications (JCA)

ISSN: 0974-1925, Volume V, Issue 3, 2012

A Heuristic Approach for Alert Aggregation in Intrusion Detection System
N. Anitha, S. Anitha, B. Anitha

Abstract - An Intrusion Detection System (IDS) is an important protection mechanism for wireless networks. It helps to identify suspicious attacks and raise alerts. In an IDS, alert aggregation is one of the mandatory subtasks: alerts are grouped into clusters and, based on the information provided by the cluster head, they are aggregated and sent to the reaction layer. We propose to introduce a new layer, the alert pre-processing layer, between the detection and alert aggregation layers. This layer filters false alerts by sending only the correct packets to the destination, thus preventing suspicious ones from proceeding further. The scheme is proposed to improve the detection rate and reduce the false alarm rate.

Index Terms – Intrusion detection, alert aggregation, genetic algorithm, backtracking.

I. INTRODUCTION

Due to the enormous and fast growth of computer networks, the variety of attacks has grown accordingly. An Intrusion Detection System (IDS) is a system that identifies different categories of attacks using different security mechanisms and safeguards the system's properties and configuration, including its data. An IDS continuously analyzes the traffic entering the network and differentiates between legitimate packets and attacks. Attack identification methods are classified into two general types: anomaly detection and misuse detection. An Intrusion Detection (ID) system collects and analyzes the required information from various components in a computer or network to identify possible loopholes that make the system insecure. An intrusion detection system is designed to classify the data it gathers as normal or abnormal. ID systems are continually being developed to counter the increasing number of attacks on significant sites and in different types of networks. Intrusion detection is the act of separating wanted from unwanted traffic on a network or in a device. Many IDS technologies exist for different network configurations, and their number is expected to grow further in the near future [1]. Currently, several IDS reliably detect various suspicious actions by evaluating, for example, TCP/IP connections or log files. Whenever an IDS finds a suspicious packet, it creates an alert, which includes the source from which the attack originated, the target to which it was sent, and the category of the attack. Even a single intrusive action by a single intruder often causes hundreds or thousands of alerts to be created, which can lead to incorrect actions by the network. IDS focus only on detecting the different types of attack, irrespective of the different ways in which an attack affects the system. A growing number of false alerts caused by a single attack can severely damage the entire network [2]. To overcome this, an IDS applies low-level abstraction techniques to minimize false alerts. The information in a single alert is incorrect with high probability, which makes it very difficult for a security expert to identify the groups of alerts that belong together.

Low-level components such as firewalls may also generate alerts. To avoid the overhead of the many alerts generated by a single attack, those alerts are clustered, and a meta-alert carrying information about each cluster is generated. The main goal is to minimize the number of alerts reported for a single attack instance without losing the important information needed to identify the attack type; in turn, a certain degree of false or redundant meta-alerts is accepted.

Based on the principles of evolution and natural selection, a genetic algorithm works on a model of the problem at hand, which can come from many different domains. The model represents candidate solutions as chromosome-like structures, on which processes such as selection, recombination and mutation take place. In computer security, genetic algorithms are used to find the best solution to a specific problem by trading off certain parameters.

The first step of a genetic algorithm is to randomly select a number of chromosomes that constitute the population. The problem is then solved using this chromosome representation: the positions of each chromosome are encoded as bits, characters or numbers, according to the attributes the problem requires. During evolution, each position of a chromosome, called a gene, can be randomly changed within a specified range. The population is the set of chromosomes present during an evolution stage. Each chromosome is selected according to its goodness as measured by an evaluation function. Natural reproduction and mutation are simulated during evolution by two basic operators, crossover and mutation. The survival of chromosomes and of their combinations is determined by their fitness.

In our view, an ideal IDS must know about the various types of attacks and attackers. In the existing system, a novel technique called Generative Data Stream Modeling is used for online alert aggregation, and meta-alerts are generated [4]. In this paper, we take an important step towards the generation of meta-alerts by introducing a new layer, the alert pre-processing layer, between the detection and alert processing layers.

Manuscript received 10/Sep/2012. Manuscript selected 4/Oct/2012.

N. Anitha, Assistant Professor, Department of Information Technology, Kongu Engineering College, Perundurai, Tamil Nadu, India. E-mail: anitha@kongu.ac.in
S. Anitha, Assistant Professor, Department of Information Technology, Kongu Engineering College, Perundurai, Tamil Nadu, India. E-mail: anitha4ciet@gmail.com
B. Anitha, Assistant Professor, Department of Information Technology, Kongu Engineering College, Perundurai, Tamil Nadu, India. E-mail: anitha_b@kongu.ac.in


Our approach has the following distinctive properties:
• It is a genetic algorithm approach using heuristic methods. Once a decision is raised for a suspicious alert, we generate offspring using false positive (FP) and false negative (FN) functions.
• It is a backtracking approach in which each observed false alert is prevented from proceeding further into the system.

The remainder of this paper is organized as follows. Section II reviews related work. Section III describes the proposed alert generation approach. Finally, Section IV presents the conclusion and future work.

II. REVIEW OF RELATED WORKS

Most existing IDS are optimized to detect attacks with high accuracy. However, they still have various disadvantages that have been outlined in a number of publications, and a lot of work has been done to analyze IDS in order to direct future research [3]. Among other drawbacks, IDS produce a large number of alerts, some of which are redundant and unnecessary. One alert aggregation approach bases its decisions at each point in time on a probabilistic model of the current situation. That system focuses on a so-called ID agent, whose layered architecture is outlined in Figure 1.

Figure 1. Outline of the Layered Architecture of an ID Agent

The sensor layer provides the interface to the network and to the host on which the agent resides. Sensors acquire raw data from both the network and the host, filter incoming data and extract the interesting and potentially valuable information needed to construct an appropriate event. At the detection layer, different detectors, e.g., classifiers trained with machine learning techniques such as support vector machines (SVM) or conventional rule-based systems such as Snort, assess these events and search for known attack signatures (misuse detection) and suspicious behavior (anomaly detection). In case of attack suspicion, they create alerts, which are then forwarded to the alert processing layer. Alerts may also be produced by firewalls (FW) or similar components. At the alert processing layer, the alert aggregation module has to combine alerts that are assumed to belong to a specific attack instance. Thus, so-called meta-alerts are generated. Meta-alerts are used or enhanced in various ways, e.g., for scenario detection or decentralized alert correlation. An important task of the reaction layer is reporting [4].

In other words, with the alert aggregation module, on which we focus in this paper, we want a minimal number of missing meta-alerts (false negatives), and in turn we accept some false meta-alerts (false positives) and redundant meta-alerts.

When a new component is created, an appropriate meta-alert that represents the information about the component in an abstract way is created as well. Every time a new alert is added to a component, the corresponding meta-alert is updated incrementally, too. That is, the meta-alert "evolves" with the component. Meta-alerts may be the basis for a whole set of further tasks:
• Sequences of meta-alerts may be investigated further in order to detect more complex attack scenarios.
• Meta-alerts may be exchanged with other ID agents in order to detect distributed attacks such as one-to-many attacks.
• Based on the information stored in the meta-alerts, reports may be generated to inform a human security expert about the ongoing attack situation.

Meta-alerts could be used at various points in time, from the initial creation until the deletion of the corresponding component. For instance, reports could be generated immediately after the creation of the component or, which could be preferable in some cases, a sequence of updated reports could be created at regular time intervals. Another example is the exchange of meta-alerts between ID agents: due to high communication costs, meta-alerts could be exchanged based on an evaluation of their interestingness [6].

Depending on the task for which meta-alerts are used, they may contain different attributes. Examples of such attributes are aggregated alert attributes (e.g., lists or intervals of source addresses or targeted service ports, or a time interval that marks the beginning and, if available, the end of the attack instance), attributes extracted from the probabilistic model (e.g., the distribution parameters or the number of alerts assigned to the component), an aggregated alert assessment provided by the detection layer (e.g., the attack type classification or the classification confidence), and information about the current attack situation (e.g., the number of recent attacks of the same or a similar type, or links to attacks originating from the same or a similar source).

The existing technique [7] detects attacks using a rule set built with the help of a Genetic Algorithm. It develops rules for R2L, U2R, Probe and DoS attacks; on average, the method achieves a low detection rate. Another existing technique combines fuzzy data mining procedures with a Genetic Algorithm to identify network anomalies and misuses. In most existing GA-based IDS, the attributes of the network audit data are not recognized accurately. Because features play a main role in intrusion detection, the author introduces fuzzy numerical functions. Another technique uses a Genetic Algorithm to find the best parameters of the fuzzy functions for selecting the relevant network features [5]. Network anomalies can also be identified by applying multiple-agent techniques and Genetic Programming: each agent examines one parameter of the network audit data, and Genetic Programming is used to find the set of agents that characterizes the network actions. An advantage of that technique is that it uses several small independent agents; its drawback is the communication between the agents.
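To make the meta-alert idea discussed earlier in this section concrete, the following minimal Python sketch keeps one meta-alert per component and updates it incrementally each time an alert is added. It is an illustration under stated assumptions only: the `Alert` and `MetaAlert` classes, their fields, and grouping alerts by attack type are hypothetical simplifications, not the probabilistic component assignment described above. The aggregated attributes (source addresses, targeted ports, time interval, number of assigned alerts) are taken from the examples listed in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    # Hypothetical alert fields produced by the detection layer.
    timestamp: float
    src_ip: str
    dst_port: int
    attack_type: str


@dataclass
class MetaAlert:
    # Aggregated attributes that summarize one component (attack instance).
    attack_type: str
    src_ips: set = field(default_factory=set)
    dst_ports: set = field(default_factory=set)
    start: float = float("inf")   # beginning of the attack instance
    end: float = float("-inf")    # latest alert seen so far
    count: int = 0                # number of alerts assigned to the component

    def update(self, alert: Alert) -> None:
        """Incrementally fold one new alert into the meta-alert."""
        self.src_ips.add(alert.src_ip)
        self.dst_ports.add(alert.dst_port)
        self.start = min(self.start, alert.timestamp)
        self.end = max(self.end, alert.timestamp)
        self.count += 1


def aggregate(alerts):
    """Assign alerts to components keyed by attack type (a crude, hypothetical
    stand-in for probabilistic assignment) and keep one evolving meta-alert each."""
    meta = {}
    for alert in alerts:
        if alert.attack_type not in meta:
            # A new component is created together with its meta-alert.
            meta[alert.attack_type] = MetaAlert(alert.attack_type)
        meta[alert.attack_type].update(alert)
    return meta
```

For example, two DoS alerts from different sources end up in a single meta-alert whose count is 2 and whose source list holds both addresses, so only one report needs to reach the reaction layer.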

Another technique [8] applies a Genetic Algorithm to anomaly detection. Random digits were produced using the Genetic Algorithm and a threshold value was derived; any connection value greater than this threshold was classified as a malicious attack. The main drawbacks of this approach are that establishing the threshold value is difficult and that it leads to a high false alarm rate when used to detect unknown or new attacks.

One publicly available IDS tool that uses GAs to detect intrusions is Genetic Algorithm as an Alternative Tool for Security Audit Trails Analysis (GASSATA). Among all possible sets of known attacks, GASSATA finds the subset of attacks that are most likely to have occurred in a set of audit data. Because there can be many possible attack types, finding the optimal subset is very expensive to compute, so a GA is used to search efficiently. The population to be evolved consists of bit vectors with one bit for each attack contained in the data set. Crossover and mutation converge the population to the most probable attacks.
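The GASSATA encoding, and the reason a GA is needed, can be illustrated with a small Python sketch. Each hypothesis is a bit vector with one bit per known attack. The attack-event matrix, the observed event counts and the fitness (one point per hypothesized attack, minus a quadratic penalty for every audit event type the hypothesis would over-explain) are assumptions made for this example and are not claimed to be GASSATA's exact formulation. The brute-force search enumerates all 2^n vectors, which is only feasible for a handful of attacks.

```python
from itertools import product

# Hypothetical attack-event matrix: EVENTS_PER_ATTACK[j][i] is the number of
# audit events of type i that attack j is assumed to generate.
EVENTS_PER_ATTACK = [
    [2, 0, 1],  # attack 0
    [0, 3, 0],  # attack 1
    [1, 1, 1],  # attack 2
]
# Hypothetical counts of each event type observed in the audit trail.
OBSERVED = [2, 0, 1]


def fitness(hypothesis):
    """Score a bit vector: +1 per assumed attack, minus 2 * overshoot**2,
    where overshoot counts event types the hypothesis would over-explain."""
    expected = [
        sum(EVENTS_PER_ATTACK[j][i] for j, bit in enumerate(hypothesis) if bit)
        for i in range(len(OBSERVED))
    ]
    overshoot = sum(1 for e, o in zip(expected, OBSERVED) if e > o)
    return sum(hypothesis) - 2 * overshoot ** 2


def brute_force_best(n_attacks=len(EVENTS_PER_ATTACK)):
    """Exhaustive search over all 2**n_attacks hypotheses (tiny n only)."""
    return max(product((0, 1), repeat=n_attacks), key=fitness)


if __name__ == "__main__":
    best = brute_force_best()
    print(best, fitness(best))  # (1, 0, 0) 1: only attack 0 fits the observations
```

With three attacks there are only 8 hypotheses to check; with 30 attacks there would already be more than a billion, which is why an evolving population of such bit vectors, recombined by crossover and mutation, replaces the exhaustive loop.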
This paper presents a Genetic Algorithm and a backtracking algorithm that recognize the attack type of connections. In creating a rule set, the two algorithms consider different features, such as duration, protocol type and hot. The rules created by the Genetic Algorithm and backtracking algorithms are applied in the Intrusion Detection System to classify different kinds of attacks. Our goal is to achieve a high detection rate and a low false alarm rate for Denial of Service (DoS), Remote to Local (R2L), User to Root (U2R) and Probe attacks. We mainly focus on combining the genetic algorithm with backtracking to minimize the number of alerts as well as to handle new types of attacks.

Figure 2. Proposed Layered Architecture of an ID Agent
III. A HYBRID APPROACH FOR ALERT GENERATION

In the alert pre-processing layer, novel approaches such as a Genetic Algorithm and Backtracking are used. A Genetic algorithm is essentially a type of search algorithm that is used to solve a wide variety of problems. The goal of a Genetic algorithm is to create optimal solutions to specific problems. Potential solutions are encoded as a sequence of bits, characters or numbers. This unit of encoding is called a gene, and the encoded sequence is known as a chromosome. The GA begins with a set of these chromosomes and an evaluation function that measures the fitness of each chromosome. It uses reproduction operators such as crossover and mutation to create new solutions, which are then evaluated.

Genetic algorithms are defined as a computational concept inspired by the mechanics of natural evolution, including survival of the fittest, reproduction and mutation. In the standard Genetic algorithm, an initial population of individuals is generated at random or heuristically. In every generation, the individuals in the current population are evaluated according to some predefined quality criterion referred to as the fitness. Fitness is determined by the fitness function, which takes a string and assigns a relative fitness value to it. Based on their fitness, strings are selected as parents using selection operators. To form a new generation, the selected strings are combined and reproduce through operators such as crossover and mutation. The Genetic algorithm halts when the required fitness value is met or when the variation of individuals from one generation to the next reaches a pre-specified level of stability.

First, an initial population of strings is created. Then individuals are selected iteratively according to their fitness: strings whose fitness values meet the selection criterion are combined to produce a new generation that may be able to solve the problem. Initially, the process selects individuals referred to as 'parents'. The fit individuals of the new generation then become parents in turn. If a solution is found, the loop terminates; otherwise, the loop starts again from the individuals selected from the new generation and continues until the termination criteria are met.
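The generational loop described above can be sketched in Python as follows. This is a minimal, generic illustration rather than the alert pre-processing GA itself: the bit-string encoding, the toy `onemax` fitness, roulette-wheel selection with a +1 weight offset, one-point crossover, keeping the best string (elitism), and approximating "a pre-specified level of stability" as "no improvement of the best fitness for `patience` generations" are all assumptions chosen for the example.

```python
import random


def onemax(bits):
    """Hypothetical toy fitness: the number of 1-bits in the string."""
    return sum(bits)


def roulette_select(population, fitnesses):
    # Fitness-proportionate selection; +1 keeps every weight positive.
    return random.choices(population, weights=[f + 1 for f in fitnesses], k=1)[0]


def crossover(a, b):
    point = random.randint(1, len(a) - 1)  # one-point crossover
    return a[:point] + b[point:]


def mutate(bits, rate):
    # Flip each gene independently with probability `rate`.
    return [1 - b if random.random() < rate else b for b in bits]


def genetic_algorithm(fitness, length=16, pop_size=30, max_generations=200,
                      target=None, patience=20, mutation_rate=0.02, seed=None):
    random.seed(seed)
    # Initial population generated at random.
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    stale = 0  # generations without improvement of the best fitness
    for _ in range(max_generations):
        fitnesses = [fitness(ind) for ind in population]
        children = [best]  # elitism: carry the best string into the next generation
        while len(children) < pop_size:
            a = roulette_select(population, fitnesses)
            b = roulette_select(population, fitnesses)
            children.append(mutate(crossover(a, b), mutation_rate))
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best, stale = candidate, 0
        else:
            stale += 1
        # Halt when the required fitness is met or the population has stabilised.
        if (target is not None and fitness(best) >= target) or stale >= patience:
            break
    return best
```

Calling `genetic_algorithm(onemax, length=16, target=16)` searches for the all-ones string; because the best string is always carried over, its fitness never decreases from one generation to the next.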


Figure 3. The Simple Structure of the Proposed Model (components: Rule set, Genetic Algorithm, Backtracked, Learning Phase, Testing Phase, Genetic Algorithm classifier, True alerts, Response, False alerts dropped)

The network traffic used for the GA is a pre-classified data set that differentiates normal network connections from anomalous ones. This pre-classified data set is created manually by analyzing the data captured by the network sniffer, a program that records network traffic without interfering with it. The data set includes the information necessary to generate rules: the source IP address, the destination IP address, the source port, the destination port, the protocol used, and finally a field indicating whether the specific connection is an intrusion or not. The data set includes both normal and anomalous network connections. A connection refers to an entry in the dataset. If the connection is an intrusion, it is indicated by the value true; if it is not an intrusion, it is indicated by the value false. As stated before, these network connections are created manually. This is the initial phase of developing the system using the GA. Once the GA is trained with the rules, more network connections can be added to the dataset, which means that administrators will have to update the dataset to add new connections or discard existing ones.

Once the initial data set is created, the next action is to create the rule set. Rules are generated for the rule set by analyzing the dataset. These rules take the following 'if-then' form:

if {condition} then {act}

The condition in this format refers to the attributes in the rule set that form a network connection in the dataset, as shown in Table 1, such as the source and destination IP addresses, the source and destination port numbers, the protocol used, and a field indicating the possibility of an intrusion. Note that the condition evaluates to 'true' or 'false'. The act field refers to the action taken once the condition is true, such as reporting an alert to the system administrator. For example, a rule in the rule set can be defined as follows:

if {the connection has the following information: source IP address: 150.165.13.1; destination IP address: 130.179.16.43; source port number: 25; destination port number: 80; protocol used: IP} then {detect whether the connection is an intrusion or not}

This rule detects an intrusion because the source IP address 150.165.13.1 is recognized by the IDS as, for example, a blacklisted address; hence any service requested from this address is rejected. Since the GA has to use such rules to detect intrusions, the rules in the rule set are codified into the GA format in the GA rule set, with each rule represented as a chromosome. This is carried out by extracting certain characteristics of the attributes in the rule set into the GA format. As stated before, the GA uses the rules in the GA rule set, encoded as chromosomes, to detect anomalous connections.

The first part of the GA acts as a search algorithm. In the initial stage, only the search algorithm is executed. This helps the rules acquire values that are later used in the fitness function when the complete GA is executed. Initially, the search algorithm matches the rules against any anomalous connections that occur on the network to detect an intrusion. Each rule carries a value for the intrusions it has detected and a value for the false alarms it produces. These values are initialized to zero, and the rules acquire them when the search algorithm is executed. Once the rules have acquired their values, the complete GA, which includes the fitness function and mutation, is executed.

The second part of the GA is the fitness function. The fitness function 'F' determines whether a rule is 'good', i.e., it detects intrusions, or 'bad', i.e., it does not detect intrusions. 'F' is calculated for each rule with the following equation. In the initial stage this equation will be used as the fitness function, but future work will test and improve it to make the GA more effective in selecting fit individuals.

$$F = \frac{a}{A} - \frac{b}{B}$$

In the fitness function, 'a' is the value the specific rule carries for the number of correctly detected intrusions, and 'b' is the value the specific rule carries for the number of false alarms. 'A' is calculated by adding the values of the correctly detected intrusions of all the rules. 'B' is the total number of normal connections in the dataset; a normal connection is not an intrusion and is indicated by the value false.

When an intrusion occurs, it is notified by the response mechanism, a popup window indicating the rule together with a message that an intrusion has occurred. If the response mechanism reports an intrusion, i.e., a rule pops up, but no intrusion has actually taken place, the alert is considered a false alarm. The network sniffer provides information about the connections on the network; hence, when an intrusion is detected, the network sniffer is executed to determine whether it is an intrusion or a false alarm.

Simple generational genetic algorithm procedure:
1. Choose the initial population of individuals
2. Evaluate the fitness of each individual in that population
3. Repeat on this generation until termination (time limit, sufficient fitness achieved, etc.):
   1. Select the best-fit individuals for reproduction
   2. Breed new individuals through crossover and mutation operations to give birth to offspring
   3. Evaluate the individual fitness of the new individuals
   4. Replace the least-fit population with the new individuals
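The rule format, the search phase and the fitness function F = a/A - b/B described above can be illustrated with a minimal Python sketch. The `Connection` and `Rule` classes, the use of `None` as a wildcard in a rule condition, and the sample connections below are hypothetical; only the connection attributes (source and destination IP, ports, protocol, intrusion label) and the formula come from the text. The sketch covers the search phase, in which each rule's counts a and b start at zero, and the fitness computation; it does not implement mutation.

```python
from dataclasses import dataclass
from typing import Optional

ATTRS = ("src_ip", "dst_ip", "src_port", "dst_port", "protocol")


@dataclass(frozen=True)
class Connection:
    # One entry of the pre-classified dataset; `intrusion` is the true/false label.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    intrusion: bool


@dataclass
class Rule:
    # Condition of "if {condition} then {act}"; None acts as a wildcard (assumption).
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    protocol: Optional[str] = None
    detected: int = 0      # 'a': correctly detected intrusions, initialised to zero
    false_alarms: int = 0  # 'b': false alarms produced, initialised to zero

    def matches(self, c: Connection) -> bool:
        return all(getattr(self, k) is None or getattr(self, k) == getattr(c, k)
                   for k in ATTRS)


def search_phase(rules, dataset):
    """First part of the GA: match each rule against the data to acquire a and b."""
    for rule in rules:
        for c in dataset:
            if rule.matches(c):
                if c.intrusion:
                    rule.detected += 1
                else:
                    rule.false_alarms += 1


def fitness(rule, rules, dataset):
    """F = a/A - b/B, with A summed over all rules and B = number of normal connections."""
    A = sum(r.detected for r in rules)
    B = sum(1 for c in dataset if not c.intrusion)
    a_term = rule.detected / A if A else 0.0
    b_term = rule.false_alarms / B if B else 0.0
    return a_term - b_term


if __name__ == "__main__":
    data = [
        Connection("150.165.13.1", "130.179.16.43", 25, 80, "IP", True),
        Connection("150.165.13.1", "130.179.16.44", 25, 80, "IP", True),
        Connection("10.0.0.2", "130.179.16.43", 1025, 80, "IP", False),
        Connection("10.0.0.3", "130.179.16.43", 1026, 22, "IP", False),
    ]
    rules = [Rule(src_ip="150.165.13.1"), Rule(dst_port=80)]
    search_phase(rules, data)
    for r in rules:
        print(fitness(r, rules, data))  # prints 0.5, then 0.0
```

In this toy data the blacklisted-source rule detects both intrusions without false alarms, while the broader port-80 rule detects the same intrusions but also fires on a normal connection, so its false-alarm term cancels its detection term.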


Genetic algorithm procedure for Alert Generation:
1. Choose the initial population of alerts
2. Evaluate the FP and FN of each alert in that population
3. Repeat on this generation until termination:
   1. Select the appropriate attack for both FP and FN
   2. Generate offspring for the best FP and FN attack
   3. Assign a weight to the best offspring
   4. Remove false alerts and send the packet to the destination [7]

The algorithm first generates the initial population and loads the network audit data. The initial population is then evolved for a number of generations. In every generation, the quality of the rules is calculated first, and then a number of best-fit rules are selected. The training procedure starts by randomly generating an initial population of rules (Step 1). Step 2 estimates the total number of records in the audit data. Step 3 computes the fitness of each rule and selects the best-fit rules into the new population. Step 4 performs rank selection of individuals. Steps 5-7 apply the crossover and mutation operators to every rule in the new population. Step 8 chooses the top best chromosomes for the new population. Finally, Step 9 decides whether to stop the training process or to go on to the next generation and continue the evolution process.

Algorithm for New Layer in Intrusion detection agent: Formation of Rule set with Genetic Algorithm
Input: Production number, Binary String Set, Range of Population, probability of Crossover and Mutation
Output: Selected Features set
Step 1) Random population initialization
Step 2) Number of training set records
Step 3) Estimate the fitness $F(a) = f(a) / f_{sum}$, where $f(a)$ is the fitness of individual $a$ and $f_{sum}$ is the total fitness of all individuals
Step 4) Rank selection $R_s(x) = s(x) / s_{sum}$, where $R_s(x)$ is the probability of selecting individual $x$, $s(x)$ is the rank of individual $x$ and $s_{sum}$ is the sum of the ranks of all individuals
Step 5) Form the new population of chromosomes
Step 6) Apply crossover to the chromosomes
Step 7) Apply the mutation operator to the chromosomes
Step 8) Choose the new population from the top 60% best chromosomes
Step 9) Continue up to the number of generations (go to Step 3)

A backtracking algorithm tries to build a solution to a computational problem incrementally. Whenever the algorithm needs to decide between two alternatives for the next component of the solution, it simply tries both options recursively.

Backtracking algorithm for false alert (exploring node P):
1. If P is a goal node, return success
2. If P is a leaf node, return failure
3. For each child C of P:
   3.1 Explore C
       3.1.1 If C was successful, return success
4. Return failure

IV. CONCLUSION AND FUTURE WORK

The Genetic Algorithm is a well-suited mechanism for intrusion detection compared with the enhanced C4.5 algorithm, and it can be used to obtain different classification rules for intrusion detection. The proposed Genetic Algorithm with backtracking is intended to enable the Intrusion Detection System to detect different types of attacks in different datasets with a high detection rate and a low false alarm rate. The backtracking algorithm serves to increase the efficiency of the intrusion detection system. In the future, we will implement this idea to detect various attacks, such as DoS, R2L, U2R and Probe, from the KDD Cup 99 dataset.

REFERENCES
[1] S. Axelsson, "Intrusion Detection Systems: A Survey and Taxonomy," Technical Report 99-15, Dept. of Computer Eng., Chalmers Univ. of Technology, 2000.
[2] T. Pietraszek, "Alert Classification to Reduce False Positives in Intrusion Detection," July 2006.
[3] A. Allen, "Intrusion Detection Systems: Perspective," Technical Report DPRO-95367, Gartner, Inc., 2003.
[4] A. Hofmann, "Online Intrusion Alert Aggregation with Generative Data Stream Modeling," IEEE Transactions on Dependable and Secure Computing, pp. 282-294.
[5] S. Selvakani K. and Rengan S. Rajesh, "Integrated Intrusion Detection System Using Soft Computing," International Journal of Network Security (IJNS), Vol. 10, No. 2, pp. 87-92, March 2010.
[6] A. Hofmann, I. Dedinski, B. Sick, and H. de Meer, "A Novelty-Driven Approach to Intrusion Alert Correlation Based on Distributed Hash Tables," Proc. 12th IEEE Symp. Computers and Comm. (ISCC '07), pp. 71-78, 2007.
[7] J. A. Chandula, "Machine Learning Techniques for Intrusion Detection System," International Journal of Computer Science and Information Security (IJCSIS), Vol. 10, No. 4, April 2012.
[8] H. S. Venter, "An Approach to Implement a Network Intrusion Detection System Using Genetic Algorithms," Proc. 2004 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries (SAICSIT '04), 2004.


BIOGRAPHY

N. Anitha received the B.E. degree in Information Technology from Shri Angalamman College of Engineering and Technology, Trichy, in 2004 and the M.Tech. degree in Advanced IT from Bharathidasan University in 2006. From 2006 to 2008 she worked as a Lecturer in the Department of IT at Shri Angalamman College of Engineering and Technology, Trichy. Currently she is working as an Assistant Professor in the Department of IT, Kongu Engineering College, Perundurai. She has conducted various workshops and published several papers in the area of security. Her research interests include intrusion detection techniques. She is a member of ISTE. E-mail: anitha@kongu.ac.in

S. Anitha received the B.E. degree in Electronics and Communication Engineering from Coimbatore Institute of Engineering and Information Technology, Coimbatore, in 2006 and the M.E. degree in Computer Science and Engineering from Kongu Engineering College in 2009. From 2009 to 2010 she worked as a Lecturer in the Department of IT, Velalar College of Engineering and Technology. Currently she is working as an Assistant Professor in the Department of IT, Kongu Engineering College, Perundurai. She has conducted various workshops and published several papers in the area of network security. She is a member of ISTE. E-mail: anitha4ciet@gmail.com

B. Anitha received the B.E. degree in Computer Science and Engineering from K.S.R College of Technology, Erode, in 2001 and the M.E. degree in Computer Science and Engineering from Kongu Engineering College in 2006. From 2007 to 2009 she worked as a Lecturer in the Department of Computer Science and Engineering at Bannari Amman College of Engineering and Technology, Sathyamangalam. Currently she is working as an Assistant Professor in the Department of IT, Kongu Engineering College, Perundurai. She has conducted various workshops and published several papers in the area of security techniques. Her research interests include intrusion detection techniques. She is a member of ISTE. E-mail: anitha_b@kongu.ac.in
