
AN EFFICIENT ALGORITHM FOR CLASSIFICATION OF

DELUGE TIME SERIES DATA

A project report submitted in partial fulfillment of the requirements for the degree of

BACHELOR OF TECHNOLOGY

In

COMPUTER SCIENCE & ENGINEERING

By

VIJENDRA REDDY KURAPALLE


R111199

Under the guidance of

Mr. RAVIKUMAR PENUGONDA

Assistant Professor in the Department of Computer Science & Engineering

RGUKT-R.K.Valley,

Kadapa(Dt), Andhra Pradesh-516330, India

2016-17
RAJIV GANDHI UNIVERSITY OF KNOWLEDGE TECHNOLOGIES

(A.P. Government Act 18 of 2008)

RGUKT-R.K.Valley
Idupulapaya, Kadapa, Andhra Pradesh – 516330.

CERTIFICATE OF EXAMINATION

This is to certify that we have examined the thesis entitled “AN EFFICIENT
ALGORITHM FOR CLASSIFICATION OF DELUGE TIME SERIES DATA USING
LOCAL EXTREMES”, submitted by VIJENDRA REDDY.K (R111199), and hereby
accord our approval of it as a study carried out and presented in a manner required
for its acceptance in partial fulfillment for the award of Bachelor of Technology
degree for which it has been submitted. This approval does not necessarily endorse
or accept every statement made, opinion expressed or conclusions drawn, as
recorded in this thesis. It only signifies the acceptance of this thesis for the purpose
for which it has been submitted.

EXAMINER


CERTIFICATE OF PROJECT COMPLETION

This is to certify that the project entitled “AN EFFICIENT ALGORITHM
FOR CLASSIFICATION OF DELUGE TIME SERIES DATA USING LOCAL
EXTREMES” was submitted by Vijendra Reddy.K (R111199) under our guidance and
supervision in partial fulfillment of the requirements for the degree of Bachelor of Technology in
Computer Science and Engineering during the academic session January 2017 -
April 2017 at RGUKT-R.K.Valley.
To the best of our knowledge, the results embodied in this dissertation work
have not been submitted to any university or institute for the award of any degree
or diploma.

Project Guide
Mr. P. Ravi Kumar,
Assistant Professor in the Dept. of CSE,
RGUKT-RKvalley.

Head of the Dept.
Mr. T. Chandrasekhar,
Assistant Professor in the Dept. of CSE,
RGUKT-RKvalley.


DECLARATION

I, VIJENDRA REDDY.K (R111199), hereby declare that the project
report entitled “AN EFFICIENT ALGORITHM FOR CLASSIFICATION OF DELUGE
TIME SERIES DATA USING LOCAL EXTREMES”, done by me under the guidance of Mr.
Ravikumar P, is submitted in partial fulfillment of the requirements for the degree of Bachelor of Technology
in Computer Science and Engineering during the academic session January 2017 - April
2017 at RGUKT-R.K.Valley.

I also declare that this project is a result of my own effort and has not been copied
or imitated from any source. Citations from any websites are mentioned in the references.

The results embodied in this project report have not been submitted to any other
university or institute for the award of any degree or diploma.

Vijendra Reddy.K
R111199

ACKNOWLEDGEMENT

I would like to express my deep sense of gratitude and respect to all those people
behind the scenes who guided, inspired and helped me crown all my efforts with success.
I wish to express my gratitude to Mr. Ravi Kumar P for his valuable guidance
at all stages of the study, and for his advice, constructive suggestions, supportive attitude and continuous
encouragement, without which it would not have been possible to complete this project.
I would also like to extend my deepest gratitude and reverence to the Director of
RGUKT, Idupulapaya, Prof. G. Bhagavannarayana, and the HOD of Computer Science and
Engineering, Mr. T. Chandrasekhar, for their constant support and encouragement.
Last but not least, I express my gratitude to my parents for being a constant source of
encouragement and inspiration for me to keep my morale high.

Table of Contents

CHAPTER-1
1.1. INTRODUCTION: PATTERN RECOGNITION
1.1 TEMPLATE MATCHING
1.2 STATISTICAL CLASSIFICATION
1.3 SYNTACTICAL OR STRUCTURAL MATCHING
1.4 NEURAL NETWORK
1.4.1 Artificial Neural Network
1.4.2 Perceptron

CHAPTER-2
2.1. EXISTING TECHNIQUES FOR CLASSIFICATION
2.1.1. Nearest Neighbor Classification
2.1.2. Algorithm
2.1.3. Parameter selection
2.1.4. 1-NN
2.1.5. K-NN
2.1.6. Properties
2.2.1. Training Phase
2.2.2. Classification Phase
2.3. NAÏVE BAYES
2.3.1. Introduction
2.4. DECISION TREE
2.4.1. Introduction
2.4.2. Advantages
2.4.3. Disadvantages
2.5. K-MEANS CLUSTERING
2.5.1. Introduction
2.5.2. Procedure
2.5.3. Advantages
2.5.4. Disadvantages
2.6. HIERARCHICAL CLUSTERING
2.6.1. Algorithmic steps for Agglomerative Hierarchical clustering
2.6.2. Advantages
2.6.3. Disadvantages

CHAPTER-3
3.1. PROPOSED TECHNIQUE OF CLASSIFICATION
3.1.1. AN EFFICIENT ALGORITHM FOR CLASSIFICATION OF DELUGE TIME SERIES DATA using local extremes

CHAPTER-4
1.5 ALGORITHM
1.5.1 RESULTS

CHAPTER-6
6.1 CONCLUSION

4. FUTURE SCOPE
6. BIBLIOGRAPHY

LIST OF FIGURES

Figure 1: classification
Figure 2: k-means clustering
Figure 3: k-means algorithm
Figure 4: dendrogram
Figure 5: results 1 for lighting data
Figure 6: results for flowers data
Figure 7: results for lighting data
Figure 8: final results

ABSTRACT

Pattern recognition and classification of time series are important tasks in data mining.
For example, document classification is the process of assigning documents to
one or more classes; an electrocardiogram (ECG) is used to identify abnormal behavior
of the heart; and a speech recognition (SR) system is trained by an individual, who
reads data into the system so that it can later recognize his or her voice. When
performing classification on large data sets, we have to take care of the
computational time as well as the accuracy of the classification process. Classification of
time series can be carried out in different ways. This paper discusses distance-based
classification, which requires a distance or similarity measure in order to
classify time series data.

In this paper we have developed a new classification algorithm named “An efficient
algorithm for classification of deluge time series data using local extremes”, which yields
good performance in terms of time complexity and accuracy. The one-nearest-neighbor
(1NN) Euclidean classifier has often been found to perform better than other
methods for time series classification. The proposed algorithm is based on the
KNN classifier and has been tested on different datasets.

CHAPTER-1
1.1. INTRODUCTION: PATTERN RECOGNITION

Pattern recognition as a field of study developed significantly in the 1960s. It was very much an
interdisciplinary subject, covering developments in the areas of statistics, engineering, artificial
intelligence, computer science, psychology and physiology, among others. Some people entered
the field with a real problem to solve. The large number of applications, ranging from the
classical ones such as automatic character recognition and medical diagnosis to the more recent
ones, has attracted considerable research effort, with many methods developed and advances
made. Other researchers were motivated by the development of machines with “brain-like”
performance that in some way could emulate human performance. There were many over-optimistic
and unrealistic claims made, and to some extent there exist strong parallels with the
growth of research on knowledge-based systems in the 1970s and neural networks in the 1980s.
Nevertheless, within these areas significant progress has been made, particularly where the
domain overlaps with probability and statistics, and within recent years there have been many
exciting new developments, both in methodology and applications. These build on the solid
foundations of earlier research and take advantage of increased computational resources readily
available nowadays.

Automatic (machine) recognition, description, classification, and grouping of patterns are
important problems in a variety of engineering and scientific disciplines such as biology,
psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. A
pattern could be a fingerprint image, a handwritten cursive word, a human face, or a speech
signal. Given a pattern, its recognition/classification may consist of one of the following two
tasks:

1) Supervised classification (e.g., discriminant analysis), in which the input pattern is
identified as a member of a predefined class.
2) Unsupervised classification (e.g., clustering), in which the pattern is assigned to a
hitherto unknown class.
The recognition problem here is being posed as a classification or categorization task, where the
classes are either defined by the system designer (in supervised classification) or are learned
based on the similarity of patterns (in unsupervised classification). Applications of pattern
recognition include data mining (identifying a “pattern”,
e.g., correlation, or an outlier in millions of multidimensional patterns), document classification
(efficiently searching text documents), financial forecasting, organization and retrieval of
multimedia databases, and biometrics. The rapidly growing and available computing power,
while enabling faster processing of huge data sets, has also facilitated the use of elaborate and
diverse methods for data analysis and classification. At the same time, demands on automatic
pattern recognition systems are rising enormously due to the availability of large databases and
stringent performance requirements (speed, accuracy, and cost).

The design of a pattern recognition system essentially involves the following three aspects:
1) Data acquisition and preprocessing
2) Data representation
3) Decision making
The problem domain dictates the choice of sensor(s), preprocessing technique, representation
scheme, and the decision making model. It is generally agreed that a well-defined and
sufficiently constrained recognition problem (small intra-class variations and large interclass
variations) will lead to a compact pattern representation and a simple decision making strategy.
Learning from a set of examples (training set) is an important and desired attribute of most
pattern recognition systems.
The four best known approaches for pattern recognition are:

1) Template matching,
2) Statistical classification,
3) Syntactic or structural matching,
4) Neural networks.

1.1 TEMPLATE MATCHING

One of the simplest and earliest approaches to pattern recognition is based on template
matching (comparing a test sample against stored templates). Matching is a generic
operation in pattern recognition, which is used to determine the similarity between two entities
(points, curves, or shapes) of the same type. The template is the most important element of
recognition in this method. The test sample, for example a recording examined for indications of
disease, is compared with the template. The comparison is made with respect to a chosen metric,
which yields a similarity score. The sample usually has to be normalized first in order to
achieve the best similarity.
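
The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in this report: it assumes z-normalization as the normalizing change and Euclidean distance as the metric, and the function names and the two toy templates are hypothetical.

```python
import math

def normalize(sample):
    """Rescale a sequence to zero mean and unit variance (assumed normalization)."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in sample) / n)
    if std == 0:
        return [0.0] * n  # a constant sample carries no shape information
    return [(v - mean) / std for v in sample]

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_template(sample, templates):
    """Return the label of the template closest to the normalized sample."""
    norm_sample = normalize(sample)
    return min(
        templates,
        key=lambda label: euclidean(norm_sample, normalize(templates[label])),
    )

# Hypothetical templates, one per class.
templates = {
    "rising": [1, 2, 3, 4, 5],
    "falling": [5, 4, 3, 2, 1],
}

# Same shape as "rising", but with a different offset and scale.
print(match_template([10, 20, 30, 40, 50], templates))
```

Without the normalization step, the test sample above would be far from both templates simply because of its scale, which is why the sample is normalized before matching.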

1.2 STATISTICAL CLASSIFICATION

The primary goal of pattern recognition is supervised or unsupervised classification. Among the
various frameworks in which pattern recognition has been traditionally formulated, the statistical
approach has been most intensively studied and used in practice. In machine learning and
statistics, classification is the problem of identifying to which of a set of categories (sub-
populations) a new observation belongs, on the basis of a training set of data containing
observations (or instances) whose category membership is known. The individual observations
are analyzed into a set of quantifiable properties, known variously as explanatory variables,
features, etc. These properties may variously be categorical (e.g. "A", "B", "AB" or "O", for
blood type), ordinal (e.g. "large", "medium" or "small"), integer-valued (e.g. the number of
occurrences of a particular word in an email) or real-valued (e.g. a measurement of blood pressure).
Some algorithms work only in terms of discrete data and require that real-valued or integer-
valued data be discretized into groups (e.g. less than 5, between 5 and 10, or greater than 10). An
example would be assigning a given email into "spam" or "non-spam" classes or assigning a
diagnosis to a given patient as described by observed characteristics of the patient (gender, blood
pressure, presence or absence of certain symptoms, etc.).
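
As a small illustration of the discretization mentioned above, the following sketch maps a real-valued or integer-valued measurement to one of three groups. The cut points 5 and 10 come from the example in the text; the function name and the group labels are hypothetical, and assigning the boundary values to the middle group is an assumption.

```python
def discretize(value, low=5.0, high=10.0):
    """Map a numeric value to one of three groups using two cut points."""
    if value < low:
        return "low"      # less than 5
    if value <= high:
        return "middle"   # between 5 and 10 (boundaries included by assumption)
    return "high"         # greater than 10

measurements = [3, 5, 7.5, 10, 12.5]
print([discretize(m) for m in measurements])
```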
In the statistical approach, each pattern is represented in terms of d features or measurements and
is viewed as a point in a d-dimensional space. The goal is to choose those features that allow
pattern vectors belonging to different categories to occupy compact and disjoint regions in a d-
dimensional feature space. The effectiveness of the representation space (feature set) is
determined by how well patterns from different classes can be separated. Given a set of training
patterns from each class, the objective is to establish decision boundaries in the feature space
which separate patterns belonging to different classes. In the statistical decision theoretic
approach, the decision boundaries are determined by the probability distributions of the patterns
belonging to each class, which must either be specified or learned. One can also take a
discriminant analysis-based approach to classification: First a parametric form of the decision
boundary (e.g., linear or quadratic) is specified; then the “best” decision boundary of the
specified form is found based on the classification of training patterns. Such boundaries can be
constructed using, for example, a mean squared error criterion. The direct boundary construction
approaches are supported by Vapnik's philosophy: “If you possess a restricted amount of information for solving some
problem, try to solve the problem directly and never solve a more general problem as an
intermediate step. It is possible that the available information is sufficient for a direct solution
but is insufficient for solving a more general intermediate problem.” Statistical pattern
recognition draws from established concepts in statistical decision theory to discriminate among
data from different groups based upon quantitative features of the data. There are a wide variety
of statistical techniques that can be used within the description task for feature extraction,
ranging from simple descriptive statistics to complex transformations. Examples of statistical
feature extraction techniques include mean and standard deviation computations, frequency
count summarizations, Fourier transformations, wavelet transformations, and Hough
transformations. The quantitative features extracted from each object for statistical pattern
recognition are organized into a fixed length feature vector where the meaning associated with
each feature is determined by its position within the vector (i.e., the first feature describes a
particular characteristic of the data, the second feature describes another characteristic, and so
on). The collections of feature vectors generated by the description task are passed to the
classification task. Statistical techniques used as classifiers within the classification task include
those based on similarity (e.g., template matching, k-nearest neighbor), probability (e.g., Bayes
rule), boundaries (e.g., decision trees, neural networks), and clustering (e.g., k-means,
hierarchical).
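
To make the idea of a fixed-length feature vector concrete, here is a minimal sketch that reduces a series of any length to three statistical features. The choice of features (mean, standard deviation, and a count of values above the mean) and the function name are assumptions made for illustration only.

```python
import math

def extract_features(series):
    """Return a fixed-length vector: [mean, standard deviation, count above mean].

    The meaning of each feature is given by its position in the vector.
    """
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    above_mean = sum(1 for v in series if v > mean)  # a simple frequency count
    return [mean, std, above_mean]

print(extract_features([2, 4, 4, 4, 5, 5, 7, 9]))
```

Series of different lengths produce vectors of the same length, so they can be passed to any of the classifiers listed above.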

1.3 SYNTACTICAL OR STRUCTURAL MATCHING


The quantitative nature of statistical pattern recognition makes it difficult to discriminate among
groups based on the morphological (i.e., shape based or structural) sub patterns and their
interrelationships embedded within the data. This limitation provided the impetus for the
development of a structural approach to pattern recognition that is supported by psychological
evidence pertaining to the functioning of human perception and cognition. Object recognition in
humans has been demonstrated to involve mental representations of explicit, structure oriented
characteristics of objects, and human classification decisions have been shown to be made on the
basis of the degree of similarity between the extracted features and those of a prototype
developed for each group. For instance, Biederman proposed the recognition-by-components
theory to explain the process of pattern recognition in humans:
(1) The object is segmented into separate regions according to edges defined by differences in
surface characteristics (e.g., luminance, texture, and color),
(2) Each segmented region is approximated by a simple geometric shape, and
(3) The object is identified based upon the similarity in composition between the geometric
representation of the object and the central tendency of each group.

This theorized functioning of
human perception and cognition serves as the foundation for the structural approach to pattern
recognition. Structural pattern recognition, sometimes referred to as syntactic pattern recognition
due to its origins in formal language theory, relies on syntactic grammars to discriminate among
data from different groups based upon the morphological interrelationships (or interconnections)
present within the data.

Structural features, often referred to as primitives, represent the sub-patterns (or building blocks)
and the relationships among them which constitute the data. The semantics associated with each
feature are determined by the coding scheme (i.e., the selection of morphologies) used to identify
primitives in the data. Feature vectors generated by structural pattern recognition systems contain
a variable number of features (one for each primitive extracted from the data) in order to
accommodate the presence of superfluous structures which have no impact on classification.
Since the interrelationships among the extracted primitives must also be encoded, the feature
vector must either include additional features describing the relationships among primitives or
take an alternate form, such as a relational graph, that can be parsed by a syntactic grammar. The
emphasis on relationships within data makes a structural approach to pattern recognition most
sensible for data which contain an inherent, identifiable organization such as image data (which
is organized by location within a visual rendering) and time-series data (which is organized by
time); data composed of independent samples of quantitative measurements, such as the Fisher
iris data, lack ordering and require a statistical approach. Methodologies used to extract
structural features from image data such as morphological image processing techniques result in
primitives such as edges, curves, and regions; feature extraction techniques for time series data
include chain codes, piecewise linear regression, and curve fitting, which are used to generate
primitives that encode sequential, time-ordered relationships. The classification task arrives at an
identification using parsing: the extracted structural features are identified as being
representative of a particular group if they can be successfully parsed by a syntactic grammar.
When discriminating among more than two groups, a syntactic grammar is necessary for each
group and the classifier must be extended with an adjudication scheme so as to resolve multiple
successful parses. In many recognition problems involving complex patterns, it is more
appropriate to adopt a hierarchical perspective where a pattern is viewed as being composed of
simple sub-patterns which are themselves built from yet simpler sub-patterns. The
simplest/elementary sub-patterns to be recognized are called primitives and the given complex
pattern is represented in terms of the interrelationships between these primitives. In syntactic
pattern recognition, a formal analogy is drawn between the structure of patterns and the syntax of
a language. The patterns are viewed as sentences belonging to a language, primitives are viewed
as the alphabet of the language, and the sentences are generated according to a grammar. Thus, a
large collection of complex patterns can be described by a small number of primitives and
grammatical rules.
The grammar for each pattern class must be inferred from the available training samples.
Structural pattern recognition is intuitively appealing because, in addition to classification, this
approach also provides a description of how the given pattern is constructed from the primitives.
This paradigm has been used in situations where the patterns have a definite structure which can
be captured in terms of a set of rules, such as ECG waveforms, textured images, and shape
analysis of contours.
The implementation of a syntactic approach, however, leads to many difficulties which primarily
have to do with the segmentation of noisy patterns (to detect the primitives) and the inference of
the grammar from training data. The syntactic approach may yield a combinatorial explosion of
possibilities to be investigated, demanding large training sets and very large computational
efforts.
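
The syntactic approach for time series can be illustrated with a deliberately small sketch. It assumes three primitives ('u' for a rising step, 'd' for a falling step, 'f' for a flat step) and uses one regular expression per class as a stand-in for a grammar; regular expressions describe only a restricted family of grammars, and the class names and patterns below are hypothetical.

```python
import re

def to_primitives(series, tol=0.0):
    """Encode each step of the series as 'u' (up), 'd' (down) or 'f' (flat)."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        diff = cur - prev
        if diff > tol:
            symbols.append("u")
        elif diff < -tol:
            symbols.append("d")
        else:
            symbols.append("f")
    return "".join(symbols)

# One hypothetical "grammar" per class, written as a regular expression.
GRAMMARS = {
    "peak": re.compile(r"u+f*d+"),    # rises, optionally plateaus, then falls
    "valley": re.compile(r"d+f*u+"),  # falls, optionally plateaus, then rises
}

def classify(series):
    """Return every class whose grammar parses the whole primitive sentence."""
    sentence = to_primitives(series)
    return [label for label, grammar in GRAMMARS.items() if grammar.fullmatch(sentence)]

print(to_primitives([1, 3, 5, 5, 2]))
print(classify([1, 3, 5, 5, 2]))
```

The function returns a list because more than one grammar may parse the same sentence; in that case an adjudication scheme, as noted above, would be needed to pick a single class.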

1.4 NEURAL NETWORK

More recently, artificial neural network techniques have been receiving
significant attention. In spite of almost 50 years of research and development in this field, the
general problem of recognizing complex patterns with arbitrary orientation, location, and scale
remains unsolved. New and emerging applications, such as data mining, web searching, retrieval
of multimedia data, face recognition, and cursive handwriting recognition, require robust and
efficient pattern recognition techniques. The main characteristics of neural networks are that they
have the ability to learn complex nonlinear input-output relationships, use sequential training
procedures, and adapt themselves to the data. The most commonly used family of neural
networks for pattern classification tasks is the feed-forward network, which includes multilayer
perceptron and Radial-Basis Function (RBF) networks. Another popular network is the
Self-Organizing Map (SOM), or Kohonen network, which is mainly used for data clustering and
feature mapping. The learning process involves updating network architecture and connection
weights so that a network can efficiently perform a specific classification/clustering task. The
increasing popularity of neural network models to solve pattern recognition problems has been
primarily due to their seemingly low dependence on domain-specific knowledge and due to the
availability of efficient learning algorithms for practitioners to use. Artificial neural networks (ANNs) provide a new
suite of nonlinear algorithms for feature extraction (using hidden layers) and classification (e.g.,
multilayer perceptrons).
In addition, existing feature extraction and classification algorithms can also be mapped on
neural network architectures for efficient (hardware) implementation. An ANN is an information
processing paradigm that is inspired by the way biological nervous systems, such as the brain,
process information. The key element of this paradigm is the novel structure of the information
processing system. It is composed of a large number of highly interconnected processing
elements (neurons) working in unison to solve specific problems. An ANN is configured for a
specific application, such as pattern recognition or data classification, through a learning process.
Learning in biological systems involves adjustments to the synaptic connections that exist
between the neurons. In 2006, the artificial neural network method was used for ECG pattern
recognition. Four types of ECG patterns were chosen from the MIT-BIH database to be recognized,
including normal sinus rhythm, premature ventricular contraction, atrial premature beat and left
bundle branch block beat. ECG morphology and R-R interval features were used as the
characteristic representation of the original ECG signals to be fed into the neural network
models. Three types of artificial neural network models, SOM, BP and LVQ networks were
separately trained and tested for ECG pattern recognition and the experimental results of the
different models have been compared.

1.4.1 Artificial Neural Network

Artificial neural networks have provided an exciting alternative method for solving a variety of
problems in different fields of science and engineering. The artificial neural network was first
developed by looking at how the human brain works. The human brain has billions of neurons
which communicate with other neurons using electrochemical signals. Signals are received by
neurons through junctions called synapses. The inputs to a neuron are combined in some way
and, if the combined input is above a threshold, the neuron fires and an output is sent out to other neurons through
the axon. This principle is also used in artificial neural networks.
Algorithm:

If $x_1, x_2, \ldots, x_n$ are the inputs to the neuron with the corresponding weights
$w_1, w_2, \ldots, w_n$, the activation will be

$a = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$

The output, $o$, of the neuron is a function of this activation.
The output of the neural network depends on the input and the weights in the network. The
training of the neural network consists of making the network give the correct output for every
input. It starts by taking random weights for every link in the network.
When an input is given to the network, the output is observed. If the output is correct, then nothing
is done to the weights in the network.
If the output is wrong, the error is calculated and it is used to update all the weights in the
network.
This procedure is carried out for a large number of inputs, until the output is correct for
every input.

Learning in a neural network is nothing but making appropriate changes to the weights.
The weights are updated as

$W_j = W_{j-1} + \lambda (y - y') X_i$

where
$W_j$ = updated weight
$W_{j-1}$ = previous weight
$\lambda$ = learning rate
$y$ = given output
$y'$ = predicted output
$X_i$ = inputs
$(y - y')$ = error of prediction
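
The training procedure and the weight update rule above can be sketched for a single neuron as follows. Several details are assumptions made to keep the example small and reproducible: the weights start at zero rather than at random values, a bias term is updated with the same rule, the output function is a simple threshold, and the training data is the logical AND function. The function names are hypothetical.

```python
def predict(weights, bias, inputs):
    """Threshold output: 1 if the activation is positive, otherwise 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def train(samples, learning_rate=0.5, epochs=20):
    """Repeat the update W_j = W_{j-1} + lambda * (y - y') * X_i over the samples."""
    weights = [0.0] * len(samples[0][0])  # assumption: zeros instead of random weights
    bias = 0.0                            # assumption: bias trained like a weight
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # (y - y')
            # When the output is correct the error is 0 and nothing changes.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Toy training set: logical AND of two binary inputs.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(samples)
print([predict(weights, bias, inputs) for inputs, _ in samples])
```

Because AND is linearly separable, the updates stop changing the weights after a few passes over the data; for data that is not linearly separable this simple rule would keep adjusting the weights without settling.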

The perceptron is the best-known example of an artificial neural network.

1.4.2 Perceptron

The perceptron is the simplest neural network model. A neural network is a system that is inspired by
biological neural systems such as the brain. The human brain consists mainly of nerve cells called
‘neurons’, which are linked together with other neurons via axons. In general, axons are used to
transmit nerve impulses from one neuron to another whenever the neurons are stimulated. Each
neuron has thousands of connections with other neurons and constantly receives incoming
signals that reach the cell body. If the resulting sum of the signals exceeds a certain threshold, a
response is sent through the axon. This same signal-transforming phenomenon is followed by
the artificial neural network model, so the perceptron is considered an artificial neuron.
Linearly separable:
A linear discriminant function can be used to discriminate patterns belonging to two or more
classes. In the above figure, the plane or line which discriminates the two classes is represented by a linear
equation in the weights w^t and the bias ‘b’. The bias ‘b’ shifts the plane up or down, and
a change in the weights w^t changes the slope of the plane which discriminates the patterns
belonging to the two classes.

A perceptron consists of nodes, a summation processor and an activation function. The nodes are of two
types:
1. Input nodes, which represent the input attributes
2. Output node, which represents the model output
Each input node is connected via a weighted link to the output node. The perceptron also has
another input known as the bias.

A perceptron computes its output value 'o' by performing a weighted sum of its inputs,
subtracting a bias factor 'b' from the sum, and applying an activation function to the result. A perceptron takes
the weighted sum of its inputs and produces an output as shown below:

Fig 1: Perceptron
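
The output computation described above can be sketched as follows. The text allows any activation function; a step function is assumed here purely for illustration, and the function name perceptron_output and the example weights are hypothetical.

```python
def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs minus the bias, passed through a step activation."""
    activation = sum(w * x for w, x in zip(weights, inputs)) - bias
    return 1 if activation >= 0 else 0  # step activation (an assumption)


# Hypothetical weights and bias that make the perceptron behave like a logical AND.
for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, perceptron_output(pair, weights=[1.0, 1.0], bias=1.5))
```

With these values only the input (1, 1) produces a weighted sum large enough to exceed the bias.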

CHAPTER-2
2.1. EXISTING TECHNIQUES FOR CLASSIFICATION

2.1.1. Nearest Neighbor Classification

In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-
parametric method used for classification and regression. In both cases, the input consists of the
k closest training examples in the feature space. The output depends on whether k-NN is used for
classification or regression:
In k-NN classification, the output is a class membership. An object is classified by a majority
vote of its neighbors, with the object being assigned to the class most common among its k
nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply
assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average
of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. The k-NN algorithm is
among the simplest of all machine learning algorithms.
Both for classification and regression, it can be useful to assign weight to the contributions of the
neighbors, so that the nearer neighbors contribute more to the average than the more distant ones.
For example, a common weighting scheme consists in giving each neighbor a weight of 1/d,
where d is the distance to the neighbour.
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the
object property value (for k-NN regression) is known. This can be thought of as the training set
for the algorithm, though no explicit training step is required.
A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the
data. The algorithm is not to be confused with k-means, another popular
machine learning technique.

2.1.2. Algorithm

The training examples are vectors in a multidimensional feature space, each with a class label.
The training phase of the algorithm consists only of storing the feature vectors and class labels of
the training samples.
In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test
point) is classified by assigning the label which is most frequent among the k training samples
nearest to that query point.
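
The two phases described above can be sketched as follows, assuming Euclidean distance and small made-up data. The function name knn_classify and the sample points are illustrative assumptions.

```python
import math
from collections import Counter


def knn_classify(train_vectors, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# "Training" only stores the feature vectors and their class labels.
train_vectors = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (0.9, 1.1)]
train_labels = ["a", "a", "b", "b", "a"]

print(knn_classify(train_vectors, train_labels, (1.1, 1.0), k=3))
print(knn_classify(train_vectors, train_labels, (5.0, 5.1), k=3))
```

All distance computation happens at query time, which is why k-NN is described as a lazy learner.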
A commonly used distance metric for continuous variables is Euclidean distance. For discrete
variables, such as for text classification, another metric can be used, such as the overlap metric
(or Hamming distance). In the context of gene expression microarray data, for example, k-NN
has also been employed with correlation coefficients such as Pearson and Spearman. Often, the
classification accuracy of k-NN can be improved significantly if the distance metric is learned
with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood
components analysis.
A drawback of the basic "majority voting" classification occurs when the class distribution is
skewed. That is, examples of a more frequent class tend to dominate the prediction of the new
example, because they tend to be common among the k nearest neighbors due to their large
number. One way to overcome this problem is to weight the classification, taking into account
the distance from the test point to each of its k nearest neighbors. The class (or value, in
regression problems) of each of the k nearest points is multiplied by a weight proportional to the
inverse of the distance from that point to the test point. Another way to overcome skew is by
abstraction in data representation. For example, in a self-organizing map (SOM), each node is a
representative (a center) of a cluster of similar points, regardless of their density in the original
training data. K-NN can then be applied to the SOM.
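
A minimal sketch of the distance-weighted vote described above, assuming Euclidean distance and a weight of 1/d for each neighbor. The function name, the small constant eps that guards against division by zero, and the data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def weighted_knn_classify(train_vectors, train_labels, query, k=3, eps=1e-12):
    """Each of the k nearest neighbors votes with weight 1/d instead of weight 1."""
    neighbors = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Skewed data: one "rare" sample close to the query, three "common" samples farther away.
vectors = [(0.0, 0.0), (1.9, 0.0), (2.0, 0.0), (2.1, 0.0)]
labels = ["rare", "common", "common", "common"]

print(weighted_knn_classify(vectors, labels, (0.5, 0.0), k=3))
```

In this example a plain majority vote among the three nearest neighbors would favor the more frequent class, while the weighted vote favors the single close neighbor.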

2.1.3. Parameter selection

The best choice of k depends upon the data; generally, larger values of k reduce the effect of
noise on the classification, but make boundaries between classes less distinct. A good k can be
selected by various heuristic techniques (see hyperparameter optimization). The special case
where the class is predicted to be the class of the closest training sample (i.e. when k = 1) is
called the nearest neighbor algorithm.
The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or
irrelevant features, or if the feature scales are not consistent with their importance. Much
research effort has been put into selecting or scaling features to improve classification. A
particularly popular approach is the use of evolutionary algorithms to optimize
feature scaling. Another popular approach is to scale features by the mutual information of the
training data with the training classes.
In binary (two class) classification problems, it is helpful to choose k to be an odd number as this
avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via
the bootstrap method.

2.1.4. 1-NN

The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier that
assigns a point x to the class of its closest neighbour in the feature space.
As the size of training data set approaches infinity, the one nearest neighbour classifier
guarantees an error rate of no worse than twice the Bayes error rate (the minimum achievable
error rate given the distribution of the data).

2.1.5. K-NN

This approach has two phases:

1) Training phase:

• For every class, find the mean pattern, whose value in each dimension is the average of that
dimension over all training patterns of the class.

2) Classification phase:

• Calculate the distance between the test pattern and the mean pattern of each class.
• Sort the classes by their distance to the test pattern.
• Take the first k classes in the sorted order.
• Find the distance between the test pattern and every pattern of these first k classes.
• Assign the label of the class with the least dissimilarity.
• Repeat the same procedure for the remaining test patterns to get their class labels.
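
The two phases above can be sketched as follows. This is one possible reading, not the original implementation: "least dissimilarity" is interpreted here as the class of the single closest training pattern among the k candidate classes, Euclidean distance is assumed, and all names and data are illustrative.

```python
import math


def class_means(patterns, labels):
    """Training phase: one mean pattern per class, averaged dimension by dimension."""
    grouped = {}
    for pattern, label in zip(patterns, labels):
        grouped.setdefault(label, []).append(pattern)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(patterns, labels, means, test, k=2):
    """Classification phase: shortlist k classes by mean distance, then search only those."""
    candidates = sorted(means, key=lambda label: math.dist(means[label], test))[:k]
    best = min(
        (math.dist(p, test), label)
        for p, label in zip(patterns, labels)
        if label in candidates
    )
    return best[1]


patterns = [[1, 1, 1], [1, 2, 1], [5, 5, 5], [5, 6, 5], [9, 9, 9], [9, 8, 9]]
labels = ["A", "A", "B", "B", "C", "C"]
means = class_means(patterns, labels)
print(classify(patterns, labels, means, [5, 5, 6], k=2))
```

Shortlisting classes by their mean pattern reduces the number of pattern-to-pattern distance evaluations compared with searching the whole training set.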

2.1.6. Properties

The naive version of the algorithm is easy to implement by computing the distances from the test
example to all stored examples, but it is computationally intensive for large training sets. Using
an appropriate nearest neighbour search algorithm makes k-NN computationally tractable even
for large data sets. Many nearest neighbour search algorithms have been proposed over the years;
these generally seek to reduce the number of distance evaluations actually performed.
k-NN has some strong consistency results. As the amount of data approaches infinity, the two-
class k-NN algorithm is guaranteed to yield an error rate no worse than twice the Bayes error rate
(the minimum achievable error rate given the distribution of the data). Various improvements to
the k-NN speed are possible by using proximity graphs.

2.2. FUZZY CLASSIFICATION

Fuzzy classification is the process of grouping elements into a fuzzy set whose membership
function is defined by the truth value of a fuzzy propositional function.

2.2.1. Training Phase

The steps followed by Algorithm 1 are as follows:

• For each class k we create three arrays, each of size 1 × n, named
◦ min_ki, which holds the minimum value of the training data of class k in dimension i
◦ max_ki, which holds the maximum value of the training data of class k in dimension i
◦ mean_ki, which holds the mean value of the training data of class k in dimension i
• For every dimension of the time series we calculate the min, max and mean of each class from
the training data; these values are used to perform the classification on the test data.
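
A minimal sketch of this training phase, assuming each training pattern is a list of n values with a class label. The function name train_fuzzy and the example data are assumptions made for illustration.

```python
def train_fuzzy(patterns, labels):
    """Compute the per-dimension min, max and mean arrays for every class."""
    grouped = {}
    for pattern, label in zip(patterns, labels):
        grouped.setdefault(label, []).append(pattern)
    stats = {}
    for label, rows in grouped.items():
        columns = list(zip(*rows))  # one tuple of values per dimension i
        stats[label] = {
            "min": [min(c) for c in columns],
            "max": [max(c) for c in columns],
            "mean": [sum(c) / len(c) for c in columns],
        }
    return stats


stats = train_fuzzy([[1, 4], [3, 8], [10, 0]], labels=[1, 1, 2])
print(stats[1])
```

Each class is summarized by three arrays of size 1 × n, so the training data itself does not need to be kept for classification.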

2.2.2. Classification Phase

In the classification phase of the classifier, we calculate the score for every class of the test data.
We try to find the class to which the test pattern has the highest membership value. The test
pattern is classified as belonging to that class.
Procedure:
1) This subroutine takes the test data and the min, max and mean of
each class calculated above as input.
2) For every test pattern and every dimension i, we find the fuzzy membership value μ_ji of the test
pattern in every class. For a class j,

If TSi is less than meanji then

3) For each class j we add up the μ_ji values to get score_j.

4) Find the membership value of the test pattern for every class j:
$mem_j = score_j / \sum_{i=1}^{N} score_i$
5) We classify the test pattern as belonging to the class j with the maximum mem_j value.
6) We repeat steps 2 to 5 for every remaining test pattern.
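
Steps 4 and 5 can be sketched as follows, taking the per-class scores from step 3 as given. The function name classify_by_score and the example scores are assumptions, and the sketch assumes the scores do not sum to zero.

```python
def classify_by_score(scores):
    """Normalize the class scores into memberships and pick the largest one."""
    total = sum(scores.values())  # assumed to be non-zero
    memberships = {label: score / total for label, score in scores.items()}
    best = max(memberships, key=memberships.get)
    return best, memberships


best, memberships = classify_by_score({"A": 3.0, "B": 1.0})
print(best, memberships)
```

Because every score is divided by the same total, the class with the maximum membership is also the class with the maximum score; the normalization only makes the values comparable across test patterns.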

2.3. NAÏVE BAYES

In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based
on applying Bayes' theorem with strong (naive) independence assumptions between the features.
Naive Bayes has been studied extensively since the 1950s. It was introduced under a different
name into the text retrieval community in the early 1960s, and remains a popular (baseline)
method for text categorization, the problem of judging documents as belonging to one category
or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the
features. With appropriate pre-processing, it is competitive in this domain with more advanced
methods including support vector machines. It also finds application in automatic medical
diagnosis.
Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the
number of variables (features/predictors) in a learning problem. Maximum-likelihood training
can be done by evaluating a closed-form expression, which takes linear time, rather than by
expensive iterative approximation as used for many other types of classifiers.
In the statistics and computer science literature, Naive Bayes models are known under a variety
of names, including simple Bayes and independence Bayes. All these names reference the use
of Bayes' theorem in the classifier's decision rule, but naive Bayes is not (necessarily) a Bayesian
method.

2.3.1. Introduction

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to
problem instances, represented as vectors of feature values, where the class labels are drawn
from some finite set. It is not a single algorithm for training such classifiers, but a family of
algorithms based on a common principle: all naive Bayes classifiers assume that the value of a
particular feature is independent of the value of any other feature, given the class variable. For
example, a fruit may be considered to be an apple if it is red, round, and about 10 cm in diameter.
A naive Bayes classifier considers each of these features to contribute independently to the
probability that this fruit is an apple, regardless of any possible correlations between the color,
roundness, and diameter features.
For some types of probability models, naive Bayes classifiers can be trained very efficiently in a
supervised learning setting. In many practical applications, parameter estimation for naive Bayes
models uses the method of maximum likelihood; in other words, one can work with the naive
Bayes model without accepting Bayesian probability or using any Bayesian methods.
Bayesian classification is based on Bayes' theorem. The posterior probability that a
record belongs to a class is computed from the prior probability drawn from the training set and the
likelihood of the record under each class. The class with the highest posterior probability becomes the
class label for the record.
Bayes' theorem:
$P(Y = y \mid X) = \frac{P(X \mid Y = y)\, P(Y = y)}{P(X)}$

The naive Bayes method assumes that the attributes are independent of each other given
the class label. If a set of events is independent, the probability that all of them happen at the
same time equals the product of the probabilities of the individual events. Therefore the class
conditional probability P(X|Y=y) is estimated as the product of the conditional probabilities
P(X_1|Y=y), P(X_2|Y=y), ..., P(X_d|Y=y):
$P(X \mid Y = y) = \prod_{i=1}^{d} P(X_i \mid Y = y)$
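
The product rule above can be illustrated with a small categorical classifier. This is a sketch under stated assumptions: probabilities are plain frequency (maximum-likelihood) estimates without smoothing, and the function names and the toy fruit records are made up for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count class frequencies and, per class, the frequency of each attribute value."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts


def predict(record, class_counts, feature_counts):
    """Score each class by P(Y=y) * product of P(X_i | Y=y) and return the best one."""
    total = sum(class_counts.values())
    posteriors = {}
    for label, count in class_counts.items():
        prob = count / total  # prior P(Y=y)
        for i, value in enumerate(record):
            prob *= feature_counts[label][i][value] / count  # P(X_i | Y=y)
        posteriors[label] = prob  # proportional to P(Y=y | X)
    return max(posteriors, key=posteriors.get), posteriors


records = [("red", "round"), ("red", "round"), ("yellow", "long"), ("red", "long")]
labels = ["apple", "apple", "banana", "banana"]
model = train_naive_bayes(records, labels)
label, posteriors = predict(("red", "round"), *model)
print(label, posteriors)
```

The denominator P(X) is the same for every class, so it is left out; only the relative size of the scores matters for choosing the label.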

Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers
have worked quite well in many complex real-world situations. In 2004, an analysis of the
Bayesian classification problem showed that there are sound theoretical reasons for the
apparently implausible efficacy of naive Bayes classifiers. Still, a comprehensive comparison
with other classification algorithms in 2006 showed that Bayes classification is outperformed by
other approaches, such as boosted trees or random forests.
An advantage of naive Bayes is that it requires only a small amount of training data to estimate
the parameters necessary for classification.

2.4. DECISION TREE

2.4.1. Introduction

A decision tree is a graphical representation of possible solutions to a decision based on certain
conditions. It's called a decision tree because it starts with a single box (or root), which then
branches off into a number of solutions, just like a tree. Decision trees are helpful, not only
because they are graphics that help you 'see' what you are thinking, but also because making a
decision tree requires a systematic, documented thought process.
A decision tree is a flow-chart-like tree structure. An internal node denotes a test on an attribute, and a branch
represents an outcome of the test, e.g., Color=red. A leaf node represents a class label or class
label distribution. At each node, one attribute is chosen to split the training examples into distinct
classes as much as possible. A new case is classified by following a matching path to a leaf node.
Decision tree generation consists of two phases:
Tree construction
• At the start, all the training examples are at the root
• Partition the examples recursively based on selected attributes
Tree pruning
• Identify and remove branches that reflect noise or outliers
• We first grow the decision tree to a large depth. Then we start at the bottom and
remove leaves that give negative returns when compared from the top
Decision trees use multiple algorithms to decide how to split a node into two or more sub-nodes.
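
The report does not name a specific splitting criterion. As an illustration only, the sketch below uses Gini impurity, one commonly used criterion, to choose the attribute that best separates the classes at a node. The function names and the toy data are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_attribute(rows, labels):
    """Index of the attribute whose split gives the lowest weighted Gini impurity."""
    def split_impurity(index):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[index], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    return min(range(len(rows[0])), key=split_impurity)


rows = [("red", "small"), ("red", "large"), ("green", "small"), ("green", "large")]
labels = ["apple", "apple", "pear", "pear"]
print(best_attribute(rows, labels))
```

Here splitting on the first attribute (color) produces pure groups, while splitting on size leaves both groups mixed, so the first attribute is chosen.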

2.4.2. Advantages

1) Relatively fast compared to other classification models.
2) Obtains similar and sometimes better accuracy compared to other models.
3) Simple and easy to understand.
4) Can be converted into simple and easy to understand classification rules.
5) Able to handle both numerical and categorical data. Other techniques are usually specialized
in analyzing datasets that have only one type of variable.

2.4.3. Disadvantages

1) Overfitting: Overfitting is one of the main practical difficulties for decision tree models.
This problem is addressed by setting constraints on model parameters and by pruning.
2) Not well suited to continuous variables: When working with continuous numerical variables, a decision
tree loses information when it bins the variables into different categories.

2.5. K-MEANS CLUSTERING

2.5.1. Introduction

k-means clustering is a method of vector quantization, originally from signal processing, that is
popular for cluster analysis in data mining. k-means clustering aims to partition n observations
into k clusters in which each observation belongs to the cluster with the nearest mean, serving as
a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic
algorithms that are commonly employed and converge quickly to a local optimum. These are
usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions
via an iterative refinement approach employed by both algorithms. Additionally, they both use
cluster centers to model the data; however, k-means clustering tends to find clusters of
comparable spatial extent, while the expectation-maximization mechanism allows clusters to
have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine
learning technique for classification that is often confused with k-means because of the k in the
name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by k-means
to classify new data into the existing clusters. This is known as nearest centroid classifier or
Rocchio algorithm.

2.5.2. Procedure

1) Randomly select 'c' cluster centers.

2) Calculate the distance between each data point and the cluster centers.

3) Assign each data point to the cluster center whose distance from the data point is the minimum
over all the cluster centers.

4) Recalculate the new cluster centers using:

$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$

where v_i is the new center of the ith cluster, 'c_i' represents the number of data points in the ith cluster, and x_j ranges over the data points assigned to that cluster.

5) Recalculate the distance between each data point and the newly obtained cluster centers.

6) If no data point was reassigned then stop, otherwise repeat from step 3).
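
The procedure above can be sketched as follows, assuming Euclidean distance. To keep the example repeatable, the initial centers are passed in explicitly; in step 1 they would normally be chosen at random, for example with random.sample(points, c). The function name k_means, the iteration cap and the data are assumptions.

```python
import math


def k_means(points, centers, max_iterations=100):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    centers = [tuple(c) for c in centers]  # work on a copy of the initial centers
    assignment = None
    for _ in range(max_iterations):
        # Steps 2-3: assign each point to its nearest cluster center.
        new_assignment = [
            min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Step 6: stop when no data point was reassigned.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each center to the mean of the points assigned to it.
        for i in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centers, assignment = k_means(points, centers=[(1, 1), (8, 8)])
print(centers, assignment)
```

Different initial centers can lead to different final clusters, which is the local optimum behavior listed among the disadvantages.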

2.5.3. Advantages

1) Fast, robust and easier to understand.

2) Relatively efficient: O(tknd), where n is the number of objects, k is the number of clusters, d is the
number of dimensions of each object, and t is the number of iterations. Normally, k, t, d << n.

3) Gives the best results when the clusters in the data set are distinct or well separated from each other.

Fig I: Showing the result of k-means for 'N' = 60 and 'c' = 3


2.5.4. Disadvantages

1) The learning algorithm requires a priori specification of the number of cluster centers.

2) Exclusive assignment: if two clusters overlap heavily, k-means
will not be able to resolve that there are two clusters.

3) The learning algorithm is not invariant to non-linear transformations, i.e. with a different
representation of the data we get different results (data represented in Cartesian co-ordinates
and in polar co-ordinates will give different results).

4) Euclidean distance measures can unequally weight underlying factors.

5) The learning algorithm finds only a local optimum of the squared error function.

6) Randomly chosen initial cluster centers may not lead to a good result.

7) Applicable only when the mean is defined, i.e. fails for categorical data.

8) Unable to handle noisy data and outliers.

9) The algorithm fails for non-linear data sets.

Fig II: Showing the non-linear data set where k-means algorithm fails

2.6. HIERARCHICAL CLUSTERING

Hierarchical clustering algorithms are of two types:

i) Agglomerative hierarchical clustering algorithm or AGNES (agglomerative nesting) and

ii) Divisive hierarchical clustering algorithm or DIANA (divisive analysis).

These two algorithms are the exact reverse of each other, so we cover the agglomerative
hierarchical clustering algorithm in detail.

Agglomerative hierarchical clustering: This algorithm works by grouping the data one by one
on the basis of the nearest distance among all the pairwise distances between the data points.
After each merge the distances are recalculated, but which distance should be considered once
groups have been formed? Many methods are available for this. Some of them are:

1) Single linkage: nearest distance.

2) Complete linkage: farthest distance.

3) Average linkage: average distance.

4) Centroid distance.

5) Ward's method: the sum of squared Euclidean distances is minimized.

This way we go on grouping the data until one cluster is formed. On the basis of the dendrogram
we can then determine how many clusters should actually be present.

2.6.1. Algorithmic steps for Agglomerative Hierarchical clustering

Let X = {x1, x2, x3, ..., xn} be the set of data points.

1) Begin with the disjoint clustering having level L(0) = 0 and sequence number m = 0.

2) Find the least distance pair of clusters in the current clustering, say pair (r), (s), according to
d[(r),(s)] = min d[(i),(j)] where the minimum is over all pairs of clusters in the current
clustering.

3) Increment the sequence number: m = m + 1. Merge clusters (r) and (s) into a single cluster to
form the next clustering m. Set the level of this clustering to L(m) = d[(r),(s)].

4) Update the distance matrix, D, by deleting the rows and columns corresponding to clusters (r)
and (s) and adding a row and column corresponding to the newly formed cluster. The distance
between the new cluster, denoted (r,s), and an old cluster (k) is defined as d[(k), (r,s)] = min
(d[(k),(r)], d[(k),(s)]).

5) If all the data points are in one cluster then stop, else repeat from step 2).
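
The steps above can be sketched as follows. For brevity this sketch recomputes the cluster distances from the points at every iteration instead of updating a stored distance matrix D; taking the minimum over all point pairs gives the same result as the min rule in step 4 (single linkage). The function name single_linkage and the data are assumptions.

```python
import math


def single_linkage(points):
    """Merge the closest pair of clusters until one cluster remains; record each level."""
    clusters = [[i] for i in range(len(points))]  # disjoint clustering, level L(0) = 0
    merges = []  # (level L(m), members of the merged cluster) for m = 1, 2, ...

    def distance(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Step 2: find the least distant pair of clusters (r), (s).
        pairs = [
            (distance(clusters[r], clusters[s]), r, s)
            for r in range(len(clusters))
            for s in range(r + 1, len(clusters))
        ]
        level, r, s = min(pairs)
        # Step 3: merge (r) and (s) and record the level of this clustering.
        merged = clusters[r] + clusters[s]
        clusters = [c for i, c in enumerate(clusters) if i not in (r, s)] + [merged]
        merges.append((level, sorted(merged)))
    return merges


merges = single_linkage([(0.0,), (1.0,), (5.0,), (6.5,)])
for level, members in merges:
    print(level, members)
```

The recorded levels are the heights at which clusters join in the dendrogram, so a large jump between consecutive levels suggests a natural place to cut it.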

Divisive Hierarchical clustering - It is just the reverse of Agglomerative Hierarchical approach.

2.6.2. Advantages

1) No a priori information about the number of clusters is required.

2) Easy to implement and gives the best results in some cases.

2.6.3. Disadvantages

1) Algorithm can never undo what was done previously.

2) A time complexity of at least O(n^2 log n) is required, where 'n' is the number of data points.

3) Depending on the type of distance matrix chosen for merging, different algorithms can suffer from
one or more of the following:

i) Sensitivity to noise and outliers

ii) Breaking large clusters

iii) Difficulty handling different sized clusters and convex shapes

4) No objective function is directly minimized

5) Sometimes it is difficult to identify the correct number of clusters from the dendrogram.

Fig I: Showing the dendrogram formed from the data set of size 'N' = 60

CHAPTER-3

3.1. PROPOSED TECHNIQUE OF CLASSIFICATION

3.1.1. AN EFFICIENT ALGORITHM FOR CLASSIFICATION OF DELUGE TIME SERIES DATA USING LOCAL EXTREMES

So far we have discussed various classification techniques. We now try to derive an algorithm that
could be better than them in terms of accuracy and time complexity.
Combining nearest neighbor and fuzzy classification is intended to yield better results. Our proposed
algorithm applies nearest neighbor and fuzzy classification together to the local
extremes of particular intervals of the training patterns.

This approach has two phases:

1) Training phase:
• For every class, find the mean pattern, whose value in each dimension is the average of that
dimension over all training patterns of the class.
• For every class, find the maximum pattern, which holds the maximum of each
dimension.
• For every class, find the minimum pattern, which holds the minimum of each
dimension.
• Divide the mean, maximum and minimum patterns of each class into a specified number of
intervals.
• In every interval, find the local extreme value.
2) Classification phase:
• Divide the test pattern into the specified number of intervals.
• In every interval, find the local extreme value.
• If the local extreme value is greater than the mean extreme, then
calculate the membership of that extreme.
• Sum the memberships of all intervals for each class separately to get the membership of each
class.
• Assign the label of the class having the maximum membership.
• Repeat the same procedure for the remaining test patterns to get their class labels.
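
The two phases above can be sketched in Python as follows. This is an illustration rather than the original MATLAB implementation: the interval maximum is used as the local extreme, the membership of an extreme is taken as one minus its normalized distance from the mean extreme (towards the maximum extreme above the mean, towards the minimum extreme below it), and the handling of zero-width intervals, the function names and the data are all assumptions.

```python
def interval_maxima(pattern, width=5):
    """Local extreme (here: the maximum) of each consecutive interval of `width` values."""
    return [max(pattern[i:i + width]) for i in range(0, len(pattern), width)]


def membership(mean_ext, max_ext, min_ext, test_ext):
    """Sum over intervals of 1 - (distance from the mean extreme / available span)."""
    total = 0.0
    for mean, high, low, test in zip(mean_ext, max_ext, min_ext, test_ext):
        span = (high - mean) if test >= mean else (mean - low)
        if span == 0:
            # Degenerate interval (assumption): full membership only on an exact match.
            total += 1.0 if test == mean else 0.0
        else:
            total += 1.0 - abs(test - mean) / span
    return total


def classify(train, labels, test, width=5):
    """Return the class whose interval extremes give the test pattern the largest membership."""
    scores = {}
    for label in set(labels):
        rows = [p for p, lab in zip(train, labels) if lab == label]
        columns = list(zip(*rows))
        mean_ext = interval_maxima([sum(c) / len(c) for c in columns], width)
        max_ext = interval_maxima([max(c) for c in columns], width)
        min_ext = interval_maxima([min(c) for c in columns], width)
        scores[label] = membership(mean_ext, max_ext, min_ext, interval_maxima(test, width))
    return max(scores, key=scores.get)


train = [[1, 2, 1, 2], [2, 3, 2, 3], [8, 9, 8, 9], [9, 10, 9, 10]]
labels = ["low", "low", "high", "high"]
print(classify(train, labels, [2, 2.5, 1.5, 2.5], width=2))
```

Only one extreme per interval is compared for each class, so the cost per test pattern grows with the number of classes and intervals rather than with the number of training patterns.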
In this work we have developed a new classification algorithm that aims at good performance
in terms of time complexity and accuracy. The one nearest neighbor (1NN) Euclidean
classifier has often been found to perform better than other methods for time series
classification, which makes it a natural baseline for comparison. The algorithm is tested and examined on different data sets.

CHAPTER-4
1.5 ALGORITHM

An efficient algorithm for classification of deluge time series data using local extremes was
developed in MATLAB. The algorithm is given below.

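Classify_data.m

% Column 1 of each data set holds the class label; the file names are placeholders.
M = load('train_dataset');            % training data set
train_label = M(:, 1);
A = M(:, 2:end);                      % training patterns without labels
T = load('test_dataset');             % test data set
label_real = T(:, 1);                 % true labels of the test patterns
T(:, 1) = [];                         % delete the label column
classes = unique(train_label);
n = numel(classes);
weight = [];
for pattern = 1:size(T, 1)
    test = T(pattern, :);
    mem = zeros(1, n);
    for c = 1:n
        % Mean, maximum and minimum patterns are taken per class,
        % following the training phase described in Chapter 3.
        rows = A(train_label == classes(c), :);
        a = mean(rows, 1);
        maximum = max(rows, [], 1);
        minimum = min(rows, [], 1);
        Mx = []; Mx_mx = []; Mx_mn = []; Mx_tst = [];
        for s = 1:5:length(a)         % intervals of 5 points
            idx = s:min(s + 4, length(a));
            Mx(end + 1)     = max(a(idx));        % extreme of the mean pattern
            Mx_mx(end + 1)  = max(maximum(idx));  % extreme of the maximum pattern
            Mx_mn(end + 1)  = max(minimum(idx));  % extreme of the minimum pattern
            Mx_tst(end + 1) = max(test(idx));     % extreme of the test pattern
        end
        mem(c) = membership(Mx, Mx_mx, Mx_mn, Mx_tst);
    end
    weight = [weight; mem];
end
[~, I] = max(weight, [], 2);          % class with the largest membership per test pattern
label = classes(I);
count = sum(label == label_real);
accuracy = count / size(T, 1);

membership.m

function mem = membership(Mx, Mx_mx, Mx_mn, Mx_tst)
mem = 0;
for i = 1:length(Mx_tst)
    if Mx_tst(i) >= Mx(i)
        % test extreme at or above the mean extreme: scale by the distance to the maximum
        m1 = (Mx_tst(i) - Mx(i)) / (Mx_mx(i) - Mx(i));
        m = 1 - m1;
    else
        % test extreme below the mean extreme: scale by the distance to the minimum
        d = (Mx(i) - Mx_tst(i)) / (Mx(i) - Mx_mn(i));
        m = 1 - d;
    end
    mem = mem + m;
end
end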

1.5.1 RESULTS

Dataset      Size   Our approach   KNN, Euclidean distance (%)   KNN, Manhattan distance
Lighting7    70     52.94          57.54                         45.21
Lighting2    60     55.00          81.54                         80.21
ECG200       100    70.00          79.67                         71.08

CHAPTER-6
6.1 CONCLUSION

This report set out to develop an algorithm that yields good performance in terms of time
complexity and accuracy when classifying a deluge of data, compared to existing techniques.
Various pattern recognition approaches have been discussed. Among the
traditional approaches to pattern recognition, the statistical approach has been the most intensively
studied and used in practice. The design of a recognition system requires careful attention to the
following issues: definition of pattern classes, sensing environment, pattern representation,
feature extraction and selection, cluster analysis, classifier design and learning, selection of
training and test samples, and performance evaluation.

The results show that the algorithm is efficient in terms of time complexity and is almost equivalent to
KNN, which is more consistent in terms of performance. The algorithm is well suited to running on a Hadoop
cluster built from commodity hardware, to achieve good performance and accuracy at low cost,
provided the data set is as large as possible.

5. FUTURE SCOPE
So that organizations do not waste precious time, money and manpower on these issues, there is a
need to develop the expertise and processes to create small-scale prototypes quickly and test them to
demonstrate their correctness and their match with business goals. As traditional data mining techniques become
outdated, there is a need for new technologies, most likely big data frameworks.

Future work on this model may include feature extraction from ECG signals using nonlinear
techniques and feature classification using ANN methods.
