by,
GUIDED BY
Mr. Manju N
Lecturer in
Information Science and Engineering Department,
SJCE,
Mysore.
Network Traffic Classification
Table of Contents
1. Abstract
2. Introduction
3. Literature Survey
4. Problem Definition
5. Methodology
6. References
1. ABSTRACT
Network traffic classification has become an important part of cloud computing. With the
development of cloud computing and mobile computing, many users are gradually adopting a
variety of applications on mobile and cloud platforms, and a huge amount of data travels
through the network. This leads to an increasing demand for classification of the traffic
over the network. Network traffic classification is the basis for network Quality of Service
management, security and intrusion detection. It can be used to identify the different
applications and protocols that exist in a network. Actions such as monitoring, discovery,
control and optimization can be performed using classified network traffic. The overall goal
of network traffic classification is to improve network performance. Recently, many new
machine learning algorithms have been proposed for analyzing network traffic and for
building network traffic classifiers. This project deals with one approach to network
traffic analysis in cloud computing. In the cloud, the network traffic data are collected and
stored in a cloud database, and a machine learning system is built on them. In a cloud
computing scenario, the network traffic data are sent to a classification machine or a
clustering machine, depending on whether the data are labeled or unlabeled, which classifies
them into different applications. The approach we have considered is a semi-supervised
classification method. Our approach allows classifiers to be designed from training data that
consist of only a few labeled and many unlabeled traffic flows. We use the K-means
clustering algorithm for this purpose.
2. INTRODUCTION
Earlier, the port-based classification method was used. It was effective at the time, as
there were very few applications and each application used a single well-known port
number. As the number of applications increased, it became difficult to assign a port number
to each application. Hence dynamic port numbers came into use, which led to the downfall of
this method.
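A minimal sketch of the port-based idea, in Python, is given here for illustration. The table holds a few standard well-known ports; the function name `classify_by_port` and the example port numbers are our own assumptions and are not part of the project.

```python
# Illustrative table: a few standard well-known ports and their applications.
WELL_KNOWN_PORTS = {
    21: "FTP",
    22: "SSH",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    443: "HTTPS",
}

def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return the application of the first port found in the table, else 'UNKNOWN'."""
    # Check the destination port first, then the source port.
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "UNKNOWN"

print(classify_by_port(51514, 80))    # server listens on a well-known port
print(classify_by_port(51514, 6881))  # dynamic port: the lookup finds nothing
```

The second call shows why the method fails once applications pick ports dynamically: the table has no entry to match.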
Next came payload-based classification, in which the whole payload is analyzed to
determine whether it contains characteristic signatures of known applications, and traffic
is classified on that basis. This has several disadvantages. Analyzing the complete payload
causes a lot of overhead, so these techniques typically require increased processing and
storage capacity. Looking into the payload also breaches the privacy of the sender and hence
is not ethical.
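As a rough illustration of signature matching, the Python sketch below compares the first bytes of a payload against a small table of prefixes. The table and the name `classify_by_payload` are hypothetical; real signature sets are much larger and often use regular expressions over the whole payload.

```python
# Hypothetical signature table: byte prefixes that commonly open a session.
SIGNATURES = [
    (b"GET ", "HTTP"),
    (b"POST ", "HTTP"),
    (b"SSH-", "SSH"),
    (b"\x13BitTorrent protocol", "BitTorrent"),
]

def classify_by_payload(payload: bytes) -> str:
    """Return the application whose signature prefixes the payload, else 'UNKNOWN'."""
    for prefix, app in SIGNATURES:
        if payload.startswith(prefix):
            return app
    return "UNKNOWN"

print(classify_by_payload(b"GET /index.html HTTP/1.1\r\n"))
print(classify_by_payload(b"\x17\x03\x03\x00\x20"))  # opaque bytes: no signature matches
```

Every packet must be inspected byte by byte, which is the source of the overhead noted above, and payloads that are encrypted or unknown simply fall through to 'UNKNOWN'.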
The limitations of port-based and payload-based analysis have motivated the use of
transport-layer statistics for traffic classification. These classification techniques rely
on the fact that different applications typically have distinct behavior patterns when
communicating on a network. They use machine learning, which may be supervised,
unsupervised or semi-supervised. In our project we use a semi-supervised machine learning
approach.
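To make "transport-layer statistics" concrete, the following Python sketch computes a few per-flow features from packet timestamps and sizes. The feature names and the `(timestamp, size)` input format are assumptions chosen for illustration; the report does not list the features actually used.

```python
from statistics import mean

def flow_features(packets):
    """Compute simple statistics for one flow.

    packets: list of (timestamp_seconds, size_bytes) tuples in arrival order.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Gaps between consecutive packet arrivals.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_packet_size": mean(sizes),
        "duration": times[-1] - times[0],
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }

# Hypothetical three-packet flow.
flow = [(0.0, 100), (0.5, 300), (1.5, 200)]
features = flow_features(flow)
print(features)
```

None of these features needs the port number or the payload contents, which is what makes the statistical approach attractive.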
In this project, we propose a method in which different protocols are used to classify
different applications. Because we use a semi-supervised machine learning approach, the
training data consist of some labeled and some unlabeled samples. The model has to identify
the classes of the unlabeled samples based on what it learns from the labeled samples. For
this we use the simple K-means clustering algorithm. This algorithm forms clusters of
similar samples, each of which may contain both labeled and unlabeled samples. It then
predicts the application types of the unlabeled samples based on the cluster they belong to.
The model also evaluates the accuracy of the algorithm in predicting the application type.
The methodology used is given in Section 5 of this report.
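The cluster-then-label idea described above can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not the project's implementation: the initial centroids are simply the first k points, each cluster takes the majority label of its labeled members, and the two-feature flows and application names are invented.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain K-means; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

def label_clusters(assignment, labels):
    """Map each cluster to the majority label among its labeled members."""
    votes = {}
    for cluster, label in zip(assignment, labels):
        if label is not None:
            votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}

# Hypothetical flows: (mean packet size in bytes, mean inter-arrival time in ms).
flows = [(1200, 5), (90, 200), (1150, 8), (1300, 4), (110, 180), (80, 220)]
# Only the first two flows are labeled; None marks an unlabeled flow.
labels = ["FTP", "DNS", None, None, None, None]

centroids, assignment = kmeans(flows, k=2)
cluster_to_app = label_clusters(assignment, labels)
predicted = [cluster_to_app.get(c, "UNKNOWN") for c in assignment]
print(predicted)
```

A cluster that contains no labeled sample has no entry in the mapping, so its flows are reported as 'UNKNOWN' rather than being forced into a known application.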
3. LITERATURE SURVEY
The following are the papers that we studied to decide on the topic and to get ideas about
how the project could be done. They helped us to understand the problem definition well and
to come up with an appropriate approach. After studying these papers, we chose a machine
learning approach, namely semi-supervised learning using the K-means algorithm. We
considered the advantages and disadvantages of the various methods before arriving at this
approach.
Internet Traffic classification Methods By Indra Bhan Arya and Rachna Mishra
Machine learning is one of the promising approaches for traffic classification. Many
approaches have evolved to date, such as the unsupervised, supervised and semi-supervised
approaches. The method in which the training data are labeled beforehand is called
supervised learning. Labeled data means an input set for which the class of each instance
is known. The method in which the training data are unlabeled is called unsupervised
learning. An unlabeled data set is one for which the class of each instance is unknown and
is to be determined. Traffic classification uses features, a set of attributes of each
instance, to determine its class. The class is a special attribute of each instance that
records the outcome for that instance. Feature selection algorithms play an important role
for ML algorithms: they not only reduce the feature set but also improve computational
performance and classification accuracy.
Traffic Classification using Clustering algorithms By Jeffrey Erman, Martin Arlitt and
Anirban Mahanti
The authors concentrate on semi-supervised learning in their paper and on why it is better
than other approaches. The paper compares different clustering algorithms used in traffic
classification, namely K-Means, DBSCAN and AutoClass. The K-Means algorithm partitions the
objects in a data set into a fixed number K of disjoint subsets. For each cluster, the
partitioning algorithm maximizes the homogeneity within the cluster by minimizing the
square error. The DBSCAN algorithm is based on the concepts of density reachability and
density connectivity. Density-based algorithms regard clusters as dense areas of objects that
are separated by less dense areas. AutoClass is a probabilistic model-based clustering
algorithm. It allows for the automatic selection of the number of clusters and for soft
clustering of the data. Soft clusters allow data objects to be fractionally assigned to more
than one cluster.
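The square error that K-Means minimizes is the sum, over all clusters, of the squared distances from each object to the mean of its cluster. The short Python sketch below computes it for two alternative partitions of the same four points; the points and the name `square_error` are illustrative assumptions.

```python
def square_error(clusters):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for members in clusters:
        # Cluster mean, computed coordinate by coordinate.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        for p in members:
            total += sum((x - m) ** 2 for x, m in zip(p, centroid))
    return total

# Two ways of splitting the same four points into K = 2 clusters.
tight = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 1.0), (9.0, 3.0)]]
mixed = [[(1.0, 1.0), (9.0, 1.0)], [(1.0, 3.0), (9.0, 3.0)]]

print(square_error(tight))  # nearby points grouped together
print(square_error(mixed))  # distant points grouped together
```

The partition that groups nearby points has the smaller square error, which is the sense in which minimizing it maximizes homogeneity within each cluster.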
The results showed that the AutoClass algorithm produces the best overall accuracy.
However, the DBSCAN algorithm has great potential because it places the majority of the
connections in a small subset of the clusters. The overall accuracy of the K-Means algorithm
is only marginally lower than that of the AutoClass algorithm, but K-Means is more suitable
for problems that require faster model building time. This helped us to choose the K-means
algorithm.
This paper describes how to evaluate the performance of machine learning methods. All
traffic classification techniques use some metrics to evaluate their results, and the
techniques can be compared using a criterion known as predictive accuracy. The common
counts used are False Negatives (FN), False Positives (FP), True Negatives (TN) and True
Positives (TP). A good classifier minimizes FN and FP. Other evaluation metrics derived
from these counts are accuracy, recall and precision. The evaluation metric we use is
accuracy.
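For reference, the standard definitions of these metrics in terms of the four counts can be written as a short Python sketch. The counts in the example are invented for illustration and are not results from this project.

```python
def ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def evaluate(tp, fp, tn, fn):
    """Standard accuracy, precision and recall from the confusion counts."""
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),  # share of all flows classified correctly
        "precision": ratio(tp, tp + fp),  # share of predicted positives that are correct
        "recall": ratio(tp, tp + fn),     # share of actual positives that are found
    }

# Hypothetical counts for one application class.
scores = evaluate(tp=40, fp=10, tn=45, fn=5)
print(scores)
```

Lowering FP raises precision and lowering FN raises recall, which is why a good classifier keeps both counts small.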
Recent research has explored the feasibility of using machine learning methods to provide
accurate network traffic classification. Accurate real-time traffic classification is of
fundamental importance to network operations and management. It serves as the input for
[...]
traffic information by client we have constructed boosted classifier with high accuracy. This
system is used to classify applications such as FTP, Skype, TCP, etc. To construct the C5.0
classifier, a unique data set and training set have to be provided to the algorithm. We have
used semi-supervised K-means classification, for which a fully labeled training set is not
required; this is the advantage of using K-means classification.
4. PROBLEM DEFINITION
The problem that we are considering is network traffic classification for better
management of the cloud. Classifying the network traffic helps to improve the Quality of
Service, provides more security and helps in maintaining the database in the cloud.
Classifying the network traffic means identifying the different applications whose traffic
travels in the network. The approach we are considering for this is machine learning using
semi-supervised learning, and K-means suits the problem statement well. The goal of the
model is to learn the applications that travel in the network and to predict the
applications of unknown traffic.
5. METHODOLOGY
Our proposed system is based on a clustering and assignment technique that uses
semi-supervised machine learning to analyze and classify network traffic containing both
labeled and unlabeled flows. The following are the steps that we follow to build the model:
6. REFERENCES