Anomaly Intrusion Detection for System Call using the Soundex Algorithm and Neural Networks


ByungRae Cha
Dept. of Computer Eng.,
Honam Univ., Korea
chabr@honam.ac.kr
Binod Vaidya
Dept. of Electronics & Computer Eng.,
Tribhuvan Univ., Nepal
bnvaidya@yahoo.com
Seungjo Han
Dept. of Information & Communication Eng.,
Chosun Univ., Korea
sjbhan@chosun.ac.kr
Abstract
To improve anomaly intrusion detection based on system calls, this study focuses on supervised-learning neural networks that use the Soundex algorithm, which is adapted here to perform feature selection and to convert variable-length data into fixed-length learning patterns. That is, variable-length sequential system call data are transformed into fixed-length behavior patterns with the Soundex algorithm, and the neural network is then trained with the backpropagation algorithm. The proposed method and the N-gram technique are applied to anomaly intrusion detection on the UNM Sendmail system call data to demonstrate the method's performance.
1. Introduction
An intrusion detection system protects a system beyond simple access control: it builds a database of intrusion patterns, monitors the use of the network or computer system in real time, and detects intrusions. Intrusion detection is roughly categorized into anomaly intrusion detection and misuse intrusion detection. In general, the latter has been more popular in the market, but it has the following problems: new and modified intrusion patterns cannot be detected, and analysing intrusion types and encoding misuse detection rules require excessive time and expense. To solve these problems, studies on anomaly intrusion detection, which detects intrusions by distinguishing normal from abnormal behavior, are in progress, but their results are still far from satisfactory.
Early detection systems judged intrusions by encoding the signatures of already known attacks. However, creating and extending rules by such passive methods is very difficult, and their effectiveness is low. To overcome these problems, techniques such as artificial intelligence, machine learning and data mining have been used to detect intrusions. Many detection systems based on supervised learning separate learning from detection. Therefore, a training phase is necessarily required before detection, and considerable expense is needed to achieve stable performance. Collecting and classifying a large amount of training data is very difficult, and the performance of the detection system depends on the quality of that data. Many algorithms currently used for intrusion detection also find it very difficult to process a large amount of data while learning incrementally, and to both detect intrusions and provide information on intrusion types [1,2,3,4].
Host-based anomaly intrusion detection is categorized into the enumerative type, the frequency-based type, the data mining type, and the finite state machine type. The enumerative type detects unknown patterns by tracing normal behaviors based on experience. The frequency-based type detects intrusions based on the frequency distribution of various events. The data mining type detects intrusions by finding features from common elements that occur in normal behavior data and describing them as a set of rules. The finite state machine type, one of the machine learning techniques, detects anomaly intrusions with a finite state machine that recognizes program traces [5].
This study applies the Soundex algorithm to solve the problem of variable-length system call data used for training in an intrusion detection system based on supervised-learning neural networks. By transforming variable-length data into a fixed-length pattern, the Soundex algorithm is expected to make a simple learning algorithm possible and to reduce the complexity of learning. To detect host-based intrusions, sessions are identified by process ID, and the host's behavior pattern, expressed as a sequence of system calls, is transformed by the Soundex algorithm from variable-length data into a fixed-length pattern. This study profiles normal behaviors using normal behavior patterns and detects abnormal behaviors through supervised learning of neural networks.
Proceedings of the 10th IEEE Symposium on Computers and Communications (ISCC 2005)
1530-1346/05 $20.00 2005 IEEE
2. Related Works
2.1 Soundex Algorithm
When customer service is handled by phone, as at airline companies, problems such as inarticulate pronunciation or incorrect searches for customers' names often occur. Even when such problems do not arise, if a large number of customers' names are stored in a database, a linear search that checks the names one by one requires excessive time and effort. The Soundex algorithm, developed by Margaret K. Odell and Robert C. Russell, addresses this problem. It has been used for U.S. Army personnel records and demographic research, and also in spell-checker engines and on ancestor-search websites.
The Soundex algorithm consists of the following four rules. Rule 1 keeps the first letter of the name and removes a, e, i, o, u, w and y from the remaining letters. Rule 2 assigns the following digits to the remaining letters: b, f, p, v: 1; c, g, j, k, q, s, x, z: 2; d, t: 3; l: 4; m, n: 5; r: 6. Rule 3 collapses adjacent letters that received the same digit into a single digit, excluding the first letter. Rule 4 arranges the result as one letter followed by three digits: if more than three digits remain, the extra ones are dropped, and if fewer than three remain, 0s are appended to complete the form [6].
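The four rules can be sketched in Python as worded above. This is an illustration under stated assumptions, not the reference implementation: it follows this section's rule order (vowels removed before coding), which can give different codes from the official National Archives procedure for some names, and letters that receive no digit under Rule 2 (such as h) are assumed to be skipped. The function name `soundex` is our own.

```python
# Rule 2 digit table, as listed in the text.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}


def soundex(name):
    """Four-rule Soundex as described in Section 2.1 (simplified variant)."""
    name = name.lower()
    first, rest = name[0], name[1:]
    # Rule 1: keep the first letter; drop a, e, i, o, u, w, y from the rest.
    rest = [ch for ch in rest if ch not in "aeiouwy"]
    # Rule 2: map letters to digits; letters without a code (e.g. h) are skipped
    # (an assumption, since the text does not assign them a digit).
    digits = [CODES[ch] for ch in rest if ch in CODES]
    # Rule 3: collapse runs of the same digit; the first letter is not part of this.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    # Rule 4: keep at most three digits and pad with 0 to exactly three.
    return first.upper() + "".join(collapsed[:3]).ljust(3, "0")


print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"), soundex("Lee"))
```

Names such as Robert and Rupert both map to R163: a variable-length string is reduced to a fixed-length key, which is the property this study borrows for system call sessions.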
2.2 N-gram Technique
A program-behavior-based intrusion detection system assumes that most intrusions occur because of program defects or bugs, and that the resulting behavior is abnormal compared with normal use of the program. Therefore, if a program's behavior is properly expressed, it can be used as a behavioral feature for detection.
A representative study on automatically collecting and defining the normal behavior of programs is the N-gram technique developed by Forrest's research team at the University of New Mexico. It is an example of applying concepts from immunology to detection [7,8]. The N-gram technique constructs a profile database from system call sequences of a fixed length N (N-grams) produced by programs. After the profile database is constructed, any sequence of that length among the system calls generated by a program that does not appear in the database is counted as anomalous behavior. If the proportion of such anomalous sequences is very high, the session is judged anomalous. Forrest's research team applied this technique to major UNIX daemon programs that run with root privileges, such as Sendmail, ftpd and inetd, and obtained a high detection rate. However, this technique requires a huge profile for every program [9,10]. The N-gram technique achieves a high detection rate with a simple algorithm, but its profile data and processing overhead are very large.
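A simplified sketch of the profile-and-mismatch idea described above, assuming each session is a list of integer system call numbers. The helper names and the mismatch threshold of 0.1 are hypothetical; the text only says the proportion of anomalous sequences must be "very high".

```python
def ngrams(calls, n):
    """All length-n windows of a system call sequence (empty if the sequence is shorter)."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def build_profile(normal_sessions, n):
    """Profile database: the set of distinct n-grams seen in normal sessions."""
    profile = set()
    for calls in normal_sessions:
        profile.update(ngrams(calls, n))
    return profile


def mismatch_rate(calls, profile, n):
    """Fraction of a session's n-grams that are absent from the profile."""
    windows = ngrams(calls, n)
    if not windows:
        return 0.0
    return sum(1 for w in windows if w not in profile) / len(windows)


def is_anomalous(calls, profile, n, threshold=0.1):
    """Judge a session anomalous when its mismatch rate exceeds a (hypothetical) threshold."""
    return mismatch_rate(calls, profile, n) > threshold


profile = build_profile([[1, 2, 3, 4, 1, 2, 3, 4]], n=3)
print(is_anomalous([1, 2, 3, 4], profile, 3), is_anomalous([1, 2, 9, 4], profile, 3))
```

Because the profile stores every distinct window of every normal trace, its size grows with N and with the variety of normal behavior, which is the overhead noted above.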
3 Design of Neural Network using Soundex Algorithm
To detect anomaly intrusions based on system calls, this study applies neural networks using the Soundex algorithm. Hereafter, the Soundex algorithm and neural networks are abbreviated as SDX-Alg and NN.
To detect intrusions using system calls, we first classified sessions by process ID. A session is a unit of behavior, and one session is transformed into one behavior pattern. From normal system call data of variable length, the SDX-Alg generates normal behavior patterns of fixed length, from which a normal behavior profile is constructed. Supervised learning of the neural network was then performed on these normal behavior patterns to detect anomaly intrusions.
3.1 Behavior Pattern Generation using a Soundex Algorithm
For host-based intrusion detection, the system call information of the host was used. This study used system call information to profile normal behavior with the SDX-Alg and to detect intrusions through neural network learning. Constructing a normal behavior profile requires a way to describe a behavior.
In the notation used in this study, < and > mark the beginning and end of each session, and - separates consecutive system calls. Part of the UNM Sendmail data sets [11], with sessions described as host behaviors, is presented in Figure 1. These behavior patterns are used to construct the normal behavior profile.
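The session notation can be sketched in Python, assuming each trace line holds a process ID and a system call number separated by whitespace (our assumption about the input layout). The function names and the toy trace below are illustrative only and are not taken from the data set.

```python
def sessions_by_pid(lines):
    """Group 'PID syscall' lines into per-PID sessions, keeping call order."""
    sessions = {}  # dicts preserve insertion order, so sessions appear in first-seen order
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pid, call = int(parts[0]), int(parts[1])
        sessions.setdefault(pid, []).append(call)
    return sessions


def behavior_string(calls):
    """Render a session with '<' and '>' as boundaries and '-' between system calls."""
    return "<" + "-".join(str(c) for c in calls) + ">"


toy_trace = ["107 4", "107 2", "144 66", "107 66", "144 4"]
for pid, calls in sessions_by_pid(toy_trace).items():
    print(pid, behavior_string(calls))
```

Interleaved lines from different processes are separated by PID, so each session keeps its own call order even when processes run concurrently.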
When sessions are divided by process ID, the session size varies. The number of kinds of system calls used in a session ranged from a minimum of 2 to a maximum of 40, and the session size ranged from a minimum of 7 to a maximum of 31,927 system calls. Such variable-length data are difficult to process and also difficult to use as learning patterns for neural network training.
Figure 1. Behavior pattern generation by PID
This study aimed to construct a fixed-length behavior pattern profile, while maintaining session information, by applying the SDX-Alg to variable-length sessions of system call data. To use the behavior data that make up a session for neural network learning, feature selection is first needed to generate patterns. This study selected three features. As shown in Table 1, pattern vectors are generated from the selected features: the session length, the kinds of system calls, and the enumeration of system calls.
Table 1. Composition of Pattern Vectors by Feature Selection
Feature selection         Type                 Detail
Session length            Integer              # (number) of system calls in the session
Kind of system call       Integer              Kinds of system calls in the session
System call enumeration   Sequential integer   Order of system call occurrence according to kinds
Of the selected features, the enumeration field consists of 40 items. Examining all the behavior patterns showed that there are 182 kinds of system calls, but at most 40 kinds are used in any one behavior pattern. The size of the enumeration field was therefore set to 40.
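One possible reading of Table 1, written as a Python sketch: the vector holds the session length, the number of distinct system calls, and a 40-slot enumeration, giving 2 + 40 = 42 items, which matches the 42-item patterns mentioned in Section 3.2. We assume, by analogy with Soundex Rules 3 and 4, that the enumeration lists distinct calls in order of first occurrence and is padded with 0; the paper does not spell out this encoding, and if 0 is a valid system call number a different padding value would be needed. The name `pattern_vector` is hypothetical.

```python
ENUM_SLOTS = 40  # enumeration field size given in the text


def pattern_vector(calls, slots=ENUM_SLOTS):
    """Map a variable-length session to a fixed-length vector of 2 + slots items.

    Assumed layout: [session length, # distinct calls,
                     distinct calls in order of first occurrence, zero-padded].
    """
    distinct = list(dict.fromkeys(calls))  # drop repeats, keep first-occurrence order
    if len(distinct) > slots:
        raise ValueError("session uses more distinct system calls than available slots")
    enumeration = distinct + [0] * (slots - len(distinct))
    return [len(calls), len(distinct)] + enumeration


v = pattern_vector([4, 2, 66, 66, 4, 138])
print(len(v), v[:6])
```

Whatever the session length, from 7 to 31,927 calls, the output always has the same size, so it can be fed directly to a network with a fixed number of input nodes.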
Table 2. Number of Patterns to be used in Learning
Pattern     # of       # of pattern   # of kinds of    Average of
class       sessions   vector         system calls     pattern vector
Normal      199        228,181        53               1,147
Intrusion   15         4,186          48               279
Trace       10         2,569          43               257
Total       224        234,936        -                1,049
Figure 2. Feature vector distribution of system calls
Table 2 shows the features of the learning patterns used in the simulation. The patterns are divided into training patterns, which are normal and are used for neural network learning, and trace patterns, which are used for evaluation after learning. The patterns were also classified as normal or abnormal (intrusion). The normal pattern class used 53 kinds of system calls, the abnormal class 48 kinds, and the test (trace) class 43 kinds. There were 199 normal sessions, 15 abnormal sessions and 10 test sessions. The 199 normal behavior patterns were used as training patterns for neural network learning.
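The last column of Table 2 is consistent with the average number of system calls per session, that is, the "# of pattern vector" count divided by the number of sessions (our reading; the text does not define the column):

$$
\frac{228{,}181}{199} \approx 1{,}146.6,\qquad
\frac{4{,}186}{15} \approx 279.1,\qquad
\frac{2{,}569}{10} = 256.9,\qquad
\frac{234{,}936}{224} \approx 1{,}048.8
$$

These round to the reported 1,147, 279, 257 and 1,049, and the Total row is the column sum (199 + 15 + 10 = 224 sessions; 228,181 + 4,186 + 2,569 = 234,936).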
This study divided the system call data into sessions and generated normal behavior, intrusion behavior and trace patterns using Perl. Figure 2 presents the normal, intrusion and trace patterns by kinds of system calls and session size. In Figure 2, (a) shows the distribution of system call sessions for the normal patterns, where the x-axis indicates the kinds of system calls in a session and the y-axis the number of system calls in the session; (b) shows the same distribution with outliers removed from the normal data; (c) shows the distribution of system call sessions for the intrusion patterns; and (d) shows the distribution for the trace patterns.
3.2 Learning of BPN
A neural network is a field of artificial intelligence that mathematically reproduces the mechanism of brain activity. Imitating the human brain, it learns intelligent abilities and constructs a computer knowledge base. Using this knowledge base, it makes inferences from the given data and predicts and explains the results.
Among the many algorithms used to learn the features of given data, back-propagation, which gradually minimizes the error, is the most common. It is the most widely used supervised learning technique and can be regarded as a non-linear extension of the least mean squares algorithm. Back-propagation works as follows: an input pattern is given to each node of the input layer, the signal is transformed at each node and passed to the hidden layer, and after further computation the result is produced at the output layer. The output value is then compared with the target value, and the weights are adjusted to reduce the error between them. Figure 3 shows a learning model using two-layer back-propagation neural network software.
Figure 3. Backpropagation model of 2 layers
Table 3. Equations of Backpropagation Algorithm
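$$
\begin{aligned}
\mathrm{net}^h_{pj} &= \sum_{i=1}^{N} w^h_{ji}\, x_{pi} + \theta^h_j && (1)\\
i_{pj} &= f^h_j\!\left(\mathrm{net}^h_{pj}\right) && (2)\\
\mathrm{net}^o_{pk} &= \sum_{j=1}^{L} w^o_{kj}\, i_{pj} + \theta^o_k && (3)\\
o_{pk} &= f^o_k\!\left(\mathrm{net}^o_{pk}\right) && (4)\\
\delta^o_{pk} &= \left(y_{pk} - o_{pk}\right) f^{o\prime}_k\!\left(\mathrm{net}^o_{pk}\right) && (5)\\
\delta^h_{pj} &= f^{h\prime}_j\!\left(\mathrm{net}^h_{pj}\right) \sum_{k} \delta^o_{pk}\, w^o_{kj} && (6)\\
w^o_{kj}(t+1) &= w^o_{kj}(t) + \eta\, \delta^o_{pk}\, i_{pj} && (7)\\
w^h_{ji}(t+1) &= w^h_{ji}(t) + \eta\, \delta^h_{pj}\, x_{pi} && (8)
\end{aligned}
$$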
Figure 4. Procedure of N-gram and NN using a SDX-Alg
In training the back-propagation neural network, Eq. (1) and Eq. (2) in Table 3 define the hidden layer, and Eq. (3) and Eq. (4) define the output layer of the two-layer network shown in Figure 3. Learning is then driven by teaching signals. As the name back-propagation indicates, the teaching signals use the errors at the output layer and propagate them backward; that is, the errors in the hidden layer are computed from the errors in the output layer. The error terms of the output layer and the hidden layer are calculated by Eq. (5) and (6). Learning proceeds by modifying the weights of the output layer and the hidden layer; the weight changes are calculated by Eq. (7) and (8), using the learning rate η and the error terms of Eq. (5) and (6) as teaching signals. Learning continues until the error between the output values and the targets reaches the required level, or until a stopping condition is satisfied.
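A minimal pure-Python sketch of Eq. (1) to (8), assuming sigmoid activations for both layers (so f'(net) = f(net)(1 - f(net))), online per-pattern updates, and biases θ updated like weights on a constant input of 1 (Table 3 lists no bias update). The class and function names are hypothetical, inputs are assumed to be scaled to roughly [0, 1], and the paper does not state how its training targets were set, so this illustrates back-propagation rather than the authors' implementation.

```python
import math
import random


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TwoLayerBPN:
    """Input -> hidden -> output network trained with Eq. (1)-(8)."""

    def __init__(self, n_in, n_hidden, n_out, eta=0.2, seed=0):
        rnd = random.Random(seed)
        self.eta = eta  # learning rate
        self.w_h = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.th_h = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w_o = [[rnd.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.th_o = [rnd.uniform(-0.5, 0.5) for _ in range(n_out)]

    def forward(self, x):
        # Eq. (1)-(2): hidden outputs i_pj
        i = [sigmoid(sum(w * xv for w, xv in zip(row, x)) + th)
             for row, th in zip(self.w_h, self.th_h)]
        # Eq. (3)-(4): network outputs o_pk
        o = [sigmoid(sum(w * iv for w, iv in zip(row, i)) + th)
             for row, th in zip(self.w_o, self.th_o)]
        return i, o

    def train_step(self, x, y):
        i, o = self.forward(x)
        # Eq. (5): output-layer error terms
        d_o = [(yk - ok) * ok * (1.0 - ok) for yk, ok in zip(y, o)]
        # Eq. (6): hidden-layer error terms, computed before any weight changes
        d_h = [ij * (1.0 - ij) * sum(d_o[k] * self.w_o[k][j] for k in range(len(d_o)))
               for j, ij in enumerate(i)]
        # Eq. (7): hidden -> output weights (plus output biases)
        for k, dk in enumerate(d_o):
            for j, ij in enumerate(i):
                self.w_o[k][j] += self.eta * dk * ij
            self.th_o[k] += self.eta * dk
        # Eq. (8): input -> hidden weights (plus hidden biases)
        for j, dj in enumerate(d_h):
            for m, xv in enumerate(x):
                self.w_h[j][m] += self.eta * dj * xv
            self.th_h[j] += self.eta * dj
        return 0.5 * sum((yk - ok) ** 2 for yk, ok in zip(y, o))


def train(net, patterns, targets, target_error=0.01, max_epochs=5000):
    """Online training until the mean pattern error drops below target_error."""
    err = float("inf")
    for epoch in range(1, max_epochs + 1):
        err = sum(net.train_step(x, y) for x, y in zip(patterns, targets)) / len(patterns)
        if err < target_error:
            return epoch, err
    return max_epochs, err
```

With the settings reported in Section 4.2, a call might look like `train(TwoLayerBPN(42, 12, n_out, eta=0.2), patterns, targets, target_error=0.01, max_epochs=5000)`, where `n_out`, `patterns` and `targets` depend on a target encoding the text does not specify.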
This study trained a back-propagation neural network for intrusion detection, using 42-item normal behavior patterns generated by the SDX-Alg. After training was completed, the trace data were used to evaluate detection performance, and the result was compared with that of the N-gram technique.
4 Simulation
To detect anomaly intrusions based on system calls, this simulation applied both neural networks using the Soundex algorithm and the N-gram technique; the procedure used to compare their performance is presented in Figure 4.
For host-based intrusion detection, we used for comparison the N-gram technique, which has a simple algorithm and a high detection rate. We constructed one normal behavior profile with the N-gram technique and another with the SDX-Alg and neural networks, and then compared the performance of the two models.
4.1 N-gram Technique
Normal behavior patterns were generated by applying the N-gram technique to the normal behavior data. While varying the N-gram window size N, we compared intrusion detection rates on the trace data.
First, profiles of the normal, intrusion and trace behaviors were constructed, with the results presented in Table 4. As the window size N increased, the number of normal patterns decreased, with a few exceptions, whereas the number of patterns with duplicates removed tended to increase. In addition, by comparing the intrusion and trace data against the normal behavior data, we counted the distinct patterns (redundancy removed) that do not occur in normal behavior.
Table 4. Number of the anomaly patterns according to window size of normal, intrusion and trace data
Window     # of normal    Redundancy-removed patterns
size (N)   patterns       Normal    Intrusion   Trace
3          809,997        440       21          18
4          227,584        570       176         177
5          227,385        693       55          38
6          227,186        811       66          46
7          226,987        910       72          55
8          226,788        996       78          64
9          226,640        1,076     84          73
10         326,179        1,153     90          80
As the window size N increased, the number of anomalous patterns found in the intrusion and trace data generally increased, but the number of redundancy-removed normal patterns, and hence the profile size, also increased. In particular, at N=4 the number of redundancy-removed patterns in the intrusion and trace data rose sharply and reached its largest value. With N varied from 3 to 10, the N yielding the largest number of anomaly patterns not found in the normal data was regarded as optimal, because a large number of redundancy-removed anomalous patterns means more information for distinguishing anomalies from normal data.
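The redundancy-removed counts of Table 4 and the selection rule above can be sketched as follows, assuming sessions are lists of integer system call numbers and reading "redundancy removed" as "distinct". Ties are broken toward the smaller N; the function names and toy data are hypothetical.

```python
def distinct_ngrams(sessions, n):
    """Distinct length-n windows over all sessions (redundancy removed)."""
    grams = set()
    for calls in sessions:
        grams.update(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))
    return grams


def anomaly_counts(normal, test, sizes):
    """Map N -> (# distinct normal patterns, # distinct test patterns absent from normal)."""
    table = {}
    for n in sizes:
        profile = distinct_ngrams(normal, n)
        table[n] = (len(profile), len(distinct_ngrams(test, n) - profile))
    return table


def best_window(table):
    """N with the most patterns absent from the normal profile; ties favour the smaller N."""
    return max(sorted(table), key=lambda n: table[n][1])


counts = anomaly_counts([[1, 2, 3, 1, 2, 3]], [[1, 2, 3, 2, 3, 1]], sizes=[2, 3])
print(counts, best_window(counts))
```

Python's `max` returns the first maximal item it meets, so iterating over the sorted window sizes makes the smallest N win a tie.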
With the window size N for the normal data varied, anomaly detection was performed on the trace data of ten sessions, and the results are presented in Table 5. As Table 4 shows, with N=4 various anomaly patterns were detected in the intrusion and trace data compared with the normal data. However, Table 5 shows that the session detection rate was the same for all N from 4 to 10.
Table 5. Number of Anomaly Detection According to Window Size of Normal Behaviors using N-gram
Window size N                        3      4      5      6      7      8      9      10
Detected sessions / whole sessions   8/10   9/10   9/10   9/10   9/10   9/10   9/10   9/10
Detection rate                       80%    90%    90%    90%    90%    90%    90%    90%
Figure 5. Learning when the hidden layer has 12 neurons
Figure 6. Output value of NN in intrusion data and trace data
As N increased from 3 to 10, the number of detected anomaly patterns increased. When the window size increased from 3 to 4, the detection rate rose from 80% to 90%. Of the ten trace sessions, two, PID 107 and PID 144, went undetected with N=3, whereas with N=4 only PID 144 went undetected. For the N-gram technique, the highest detection rate, 90%, was obtained with a window size of four or more. Increasing N further did not raise the detection rate; only the number of detected anomaly patterns increased. Therefore, by Occam's razor, the most effective window size is four.
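The Occam's razor argument amounts to choosing the smallest window that reaches the best detection rate. A short sketch using the counts from Table 5; the function name is hypothetical.

```python
# Sessions detected out of the 10 trace sessions, per window size N (Table 5).
DETECTED = {3: 8, 4: 9, 5: 9, 6: 9, 7: 9, 8: 9, 9: 9, 10: 9}


def occam_window(detected, total_sessions=10):
    """Smallest window size that reaches the best detection rate, and that rate."""
    best = max(detected.values())
    n = min(size for size, hits in detected.items() if hits == best)
    return n, best / total_sessions


print(occam_window(DETECTED))  # (4, 0.9)
```

Since larger windows only enlarge the profile without raising the detection rate, the simplest adequate model is N=4.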
4.2 Over-fitting and Under-fitting of Neural Network Learning
For neural network learning, the number of neurons in the hidden layer was varied from 10 to 40 to investigate how learning progressed and how the errors changed. To overcome over-fitting and under-fitting, which are weaknesses of neural network learning, the number of neurons in the hidden layer must be chosen carefully. Over-fitting means that the network learns even the noise in the training data, and under-fitting means that learning is not fully achieved. Figure 5 shows the case of twelve neurons in the hidden layer, where learning was achieved at epoch 428.
Figure 7. Detection Results by N-gram and the Proposed Method
The number of neurons in the hidden layer was set to 12, and the network was trained on the 199 normal behavior patterns. The SDX-Alg converted the system call data of each normal session into a 42-item learning pattern, and learning was performed with a target error of 0.01, a learning rate of 0.2 and at most 5,000 epochs. Figure 6 shows the detection results obtained by feeding the intrusion and trace data to the trained neural network.
Figure 7 shows the detection results of the N-gram technique and of the NN technique using the SDX-Alg. Figure 7 (a) shows the results of detection with N = 3, in which eight of the ten trace sessions were detected. Figure 7 (b) shows the results of detection by the neural network using the SDX-Alg in the distribution of feature vectors: nine of the ten trace sessions were detected, a detection rate of 90%.
4.3 Comparison of Anomaly Detection between Neural Networks using Soundex Algorithm and N-gram Technique
This study simulated anomaly detection on the UNM system call data sets of the Sendmail daemon, using neural networks with the SDX-Alg and the N-gram technique. The system call data were transformed by the SDX-Alg into 42-item learning patterns, on which NN learning was performed. For the N-gram technique, the window size was varied from 3 to 10 to detect anomalies, and the results were compared as shown in Table 6.
Table 6. Comparative Analysis of the Proposed Method and N-gram
                                    N-gram                                       BPN
Items                               N=3        N=4        N=6        N=10
Repetition          # of patterns   809,997    227,584    227,186    326,179    199
                    Data size       9.6 MB     3.47 MB    4.97 MB    12.08 MB   22 KB
Repetition removed  # of patterns   440        570        811        1,153      41
                    Data size       5 KB       7 KB       14.4 KB    34 KB      5 KB
MDL                 Error           0.2        0.1        0.1        0.1        0.1
                    Complexity      0.997      0.999      1.000      1.000      0.999
                    Total           1.197      1.099      1.100      1.100      1.099
Epoch #                             -          -          -          -          428
Detection rate                      80%        90%        90%        90%        90%
MDL (Minimum Description Length) [12] consists of the error loss L(D | H) and the complexity loss L(H); a smaller MDL value indicates a more effective model. The error loss is defined as 1 - anomaly detection rate, and the complexity loss as 1 - the occupancy rate of information in the data description space. Table 6 compares the proposed method and the N-gram technique by MDL.
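Reading the MDL rows of Table 6 with these definitions, the Total row is the sum of the two losses. A worked check for three of the columns, using the reported complexity values:

$$
\begin{aligned}
\mathrm{MDL} &= L(D \mid H) + L(H)\\
N=3:&\quad (1 - 0.8) + 0.997 = 1.197\\
N=4:&\quad (1 - 0.9) + 0.999 = 1.099\\
\text{BPN}:&\quad (1 - 0.9) + 0.999 = 1.099
\end{aligned}
$$

At the three-decimal precision reported, the totals for N=4 and the BPN coincide, so the two are distinguished mainly by the profile sizes in the upper rows of Table 6.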
Among the N-gram settings, N=4 was the most effective. When the proposed method was compared with the N-gram technique with N=4, their detection rates were identical, but in terms of model complexity the proposed method was more effective.
When the N-gram window size was 3, 4 and 10, the detection rates were 80%, 90% and 90%, respectively. As the N-gram window grew, the space needed to describe the patterns and the time needed to process them also increased. With the SDX-Alg and neural networks, the detection rate was 90%. Compared with the N-gram technique with a window size of 3, the proposed neural network method was superior in both detection rate and complexity. For window sizes from 4 to 10, the detection rates of the two methods were the same, but the neural network technique was superior in time and space complexity.
5 Conclusion
This study applied the Soundex algorithm to solve the problem of variable-length data in a detection system that uses supervised-learning neural networks, a machine learning technique. By transforming variable-length system call data into fixed-length patterns with the Soundex algorithm, the neural network learning algorithm could be kept simple, and the space and time complexity of learning for intrusion detection could be reduced. To detect host-based anomaly intrusions, we first divided the data into sessions and generated the hosts' behavior patterns by transforming the variable-length data into fixed-length patterns. We then detected anomalous behaviors by learning the normal behavior patterns with a back-propagation neural network trained by supervised learning. Removing the need to process variable-length data contributed to improved anomaly intrusion detection.
Compared with the N-gram technique with a window size of 3, the proposed method showed a higher detection rate; with window sizes from 4 to 10, the proposed method achieved the same detection rate as the N-gram technique, 90%. However, in terms of the time and space complexity of running the algorithm, intrusion detection with neural networks using the Soundex algorithm was superior.
Acknowledgment : This study was supported (in part)
by research funds from Chosun University, 2004.
References
[1] Leonid Portnoy, "Intrusion detection with unlabeled data using clustering," Undergraduate Thesis, Columbia University, 2000.
[2] Jack Marin, Daniel Ragsdale, and John Shurdu, "A Hybrid Approach to the Profile Creation and Intrusion Detection," Proceedings of DARPA Information Survivability Conference and Exposition, IEEE, 2001.
[3] Nong Ye and Xiangyang Li, "A Scalable Clustering Technique for Intrusion Signature Recognition," Proceedings of 2001 IEEE Workshop on Information Assurance and Security, 2001.
[4] Wenke Lee, Salvatore J. Stolfo, Philip K. Chan, Eleazar Eskin, Wei Fan, Matthew Miller, Shlomo Hershkop, and Junxin Zhang, "Real Time Data Mining-based Intrusion Detection," IEEE, 2001.
[5] Christina Warrender, Stephanie Forrest, and Barak Pearlmutter, "Detecting Intrusion Using System Calls: Alternative Data Models," 1998.
[6] http://www.archives.gov/research room/genealogy/census/soundex.html
[7] S. Forrest, S. Hofmeyr, A. Somayaji and T. Longstaff, "A sense of self for unix processes," In IEEE Symposium on Security and Privacy, pp. 120-128, 1996.
[8] Steven A. Hofmeyr, Stephanie Forrest, and Anil Somayaji, "Intrusion Detection using Sequences of System Calls," Journal of Computer Security, Vol. 6, pp. 151-180, August 18, 1998.
[9] A. K. Ghosh, A. Schwarzbard and M. Shatz, "Learning program behavior profiles for intrusion detection," Proceedings of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring, April 1999.
[10] A. K. Ghosh, J. Wanken and F. Charron, "Detecting anomalous and unknown intrusions against programs," Proceedings of the 1998 Annual Computer Security Applications Conference (ACSAC 98), 1998.
[11] http://cs.unm.edu/ immsec/data/synth-sm.html
[12] Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, pp. 429-433, 1995.