Uploaded by Vijendhar Reddy

efficient algorithm for classification of data

BACHELOR OF TECHNOLOGY

In

By

R111199

RGUKT-R.K.Valley,

2016-17

RAJIVGANDHI UNIVERSITY OF KNOWLEDGE TECHNOLOGIES

RGUKT-R.K.Valley

Idupulapaya, Kadapa, Andhra Pradesh – 516330.

CERTIFICATE OF EXAMINATION

This is to certify that we have examined the thesis entitled “AN EFFICIENT

ALGORITHM FOR CLASSIFICATION OF DELUGE TIME SERIES DATA USING

LOCAL EXTREMES”, submitted by VIJENDRA REDDY.K (R111199), and hereby

accord our approval of it as a study carried out and presented in a manner required

for its acceptance in partial fulfillment for the award of Bachelor of Technology

degree for which it has been submitted. This approval does not necessarily endorse

or accept every statement made, opinion expressed or conclusions drawn, as

recorded in this thesis. It only signifies the acceptance of this thesis for the purpose

for which it has been submitted.

EXAMINER

FOR CLASSIFICATION OF DELUGE TIME SERIES DATA USING LOCAL

EXTREMES" submitted by Vijendra Reddy.K(R111199) under our guidance and

supervision for the partial fulfillment for the degree of Bachelor of Technology in

Computer Science and Engineering during the academic session January 2017 -

April 2017 at RGUKT- R.K.Valley.

To the best of our knowledge, the results embodied in this dissertation work

have not been submitted to any university or institute for the award of any degree

or diploma.

Mr. P. Ravi Kumar,
Assistant Professor in the Dept. of CSE,
RGUKT-RKvalley.

Mr. T. Chandrasekhar,
Assistant Professor in the Dept. of CSE,
RGUKT-RKvalley.

DECLARATION

report entitled “AN EFFICIENT ALGORITHM FOR CLASSIFICATION OF DELUGE

TIME SERIES DATA USING LOCAL EXTREMES” done by me under the guidance of Mr.

Ravikumar P is submitted in partial fulfillment for the degree of Bachelor of Technology

in Computer Science and Engineering during the academic session January 2017 - April

2017 at RGUKT- R.K.Valley.

I also declare that this project is a result of my own effort and has not been copied

or imitated from any source. Citations from any websites are mentioned in the references.

The results embodied in this project report have not been submitted to any other

university or institute for the award of any degree or diploma.

Vijendra Reddy.K

R111199

ACKNOWLEDGEMENT

I would like to express my deep sense of gratitude and respect to all those people behind the screen who guided, inspired and helped me crown all my efforts with success.

I wish to express my gratitude to Mr. Ravi Kumar P for his valuable guidance at all stages of the study, and for his advice, constructive suggestions, supportive attitude and continuous encouragement, without which it would not have been possible to complete this project.

I would also like to extend my deepest gratitude and reverence to the Director of RGUKT, Idupulapaya, Prof. G. Bhagavannarayana, and the HOD of Computer Science and Engineering, Mr. T. Chandrasekhar, for their constant support and encouragement.

Last but not least, I express my gratitude to my parents, who have been a constant source of encouragement and inspiration and have kept my morale high.

Table of Contents

CHAPTER-1
1.1. INTRODUCTION: PATTERN RECOGNITION
1.1 TEMPLATE MATCHING
1.2 STATISTICAL CLASSIFICATION
1.3 SYNTACTICAL OR STRUCTURAL MATCHING
1.4 NEURAL NETWORK
1.4.1 Artificial Neural Network
1.4.2 Perceptron

CHAPTER-2
2.1. EXISTING TECHNIQUES FOR CLASSIFICATION
2.1.1. Nearest Neighbor Classification
2.1.2. Algorithm
2.1.3. Parameter selection
2.1.4. 1-nn
2.1.5. K-NN
2.1.6. Properties
2.2.1. Training Phase
2.2.2. Classification Phase
2.3. NAÏVE BAYES
2.3.1. Introduction
2.4. DECISION TREE
2.4.1. Introduction
2.4.2. Advantages
2.4.3. Disadvantages
2.5. K-MEANS CLUSTERING
2.5.1. Introduction
2.5.2. Procedure
2.5.3. Advantages
2.5.4. Disadvantages
2.6. HIERARCHICAL CLUSTERING
2.6.1. Algorithmic steps for Agglomerative Hierarchical clustering
2.6.2. Advantages
2.6.3. Disadvantages

CHAPTER-3
3.1. PROPOSED TECHNIQUE OF CLASSIFICATION
3.1.1. AN EFFICIENT ALGORITHM FOR CLASSIFICATION OF DELUGE TIME SERIES DATA USING LOCAL EXTREMES

CHAPTER-4
1.5 ALGORITHM
1.5.1 RESULTS

CHAPTER-6
6.1 CONCLUSION
4. FUTURE SCOPE
6. BIBLIOGRAPHY

LIST OF FIGURES

Figure 1: classification
Figure 4: dendrogram

ABSTRACT

Pattern recognition and classification of time series are important tasks in data mining. For example, document classification is the process of assigning documents to one or more classes; the electrocardiogram (ECG) is used to identify abnormal behavior of the heart; and a speech recognition (SR) system is trained by an individual who reads data into it, after which it can recognize that person's voice. When classification is performed on large data sets, the computational time must be considered along with the accuracy of the classification process. Classification of time series can be carried out in different ways. This paper discusses distance-based classification, which requires a distance or similarity measure in order to classify time series data.

In this paper we develop a new classification algorithm named “An efficient algorithm for classification of deluge time series data using local extremes”, which performs well in terms of both time complexity and accuracy. The one-nearest-neighbor (1NN) classifier with Euclidean distance has often been found to outperform other methods for time series classification. The proposed algorithm is based on the KNN classifier and has been tested on different datasets.

CHAPTER-1

1.1. INTRODUCTION: PATTERN RECOGNITION

Pattern recognition as a field of study developed significantly in the 1960s. It was very much an

interdisciplinary subject, covering developments in the areas of statistics, engineering, artificial

intelligence, computer science, psychology and physiology, among others. Some people entered

the field with a real problem to solve. The large numbers of applications, ranging from the

classical ones such as automatic character recognition and medical diagnosis to the more recent

ones, have attracted considerable research effort, with many methods developed and advances

made. Other researchers were motivated by the development of machines with “brain-like”

performance that in some way could emulate human performance. There were many

over-optimistic and unrealistic claims made, and to some extent there exist strong parallels with the

growth of research on knowledge-based systems in the 1970s and neural networks in the 1980s.

Nevertheless, within these areas significant progress has been made, particularly where the

domain overlaps with probability and statistics, and within recent years there have been many

exciting new developments, both in methodology and applications. These build on the solid

foundations of earlier research and take advantage of increased computational resources readily

available nowadays.

Pattern recognition addresses important problems in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. A pattern could be a fingerprint image, a handwritten cursive word, a human face, or a speech signal. Given a pattern, its recognition/classification may consist of one of the following two tasks:

1) Supervised classification: In which the input pattern is identified as a member of a predefined class.

2) Unsupervised classification (e.g., clustering): In which the pattern is assigned to a hitherto unknown class.

The recognition problem here is posed as a classification or categorization task, where the classes are either defined by the system designer (in supervised classification) or learned based on the similarity of patterns (in unsupervised classification). Applications of pattern recognition include data mining (identifying a “pattern”, e.g., a correlation or an outlier, in millions of multidimensional patterns), document classification (efficiently searching text documents), financial forecasting, organization and retrieval of multimedia databases, and biometrics. The rapidly growing availability of computing power, while enabling faster processing of huge data sets, has also facilitated the use of elaborate and diverse methods for data analysis and classification. At the same time, demands on automatic pattern recognition systems are rising enormously due to the availability of large databases and stringent performance requirements (speed, accuracy, and cost).

The design of a pattern recognition system essentially involves the following three aspects:

1) Data acquisition and preprocessing

2) Data representation

3) Decision making

The problem domain dictates the choice of sensor(s), preprocessing technique, representation

scheme, and the decision making model. It is generally agreed that a well-defined and

sufficiently constrained recognition problem (small intra-class variations and large interclass

variations) will lead to a compact pattern representation and a simple decision making strategy.

Learning from a set of examples (training set) is an important and desired attribute of most

pattern recognition systems.

The four best known approaches for pattern recognition are:

1) Template matching,

2) Statistical classification,

3) Syntactic or structural matching

4) Neural networks.

1.1 TEMPLATE MATCHING

One of the simplest and earliest approaches to pattern recognition is template matching, in which test samples are compared with stored templates. Matching is a generic operation in pattern recognition, used to determine the similarity between two entities (points, curves, or shapes) of the same type. The template is the most important element of recognition in this method. The test sample, for instance a signal in which indications of disease are to be recognized, is compared with the template. The comparison is made with respect to a chosen metric, from which the similarity is calculated. The sample usually has to be normalized in order to achieve the best similarity.
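As an illustration of the matching step described above, the following is a minimal sketch, not the method used in this thesis. It assumes that samples and templates are equal-length numeric sequences, that normalization means removing the mean and scaling to unit length, and that similarity is the correlation between the normalized sequences. The function names and the two example templates are hypothetical.

```python
import math

def normalize(sample):
    """Remove the mean and scale to unit length (assumed normalization)."""
    mean = sum(sample) / len(sample)
    centered = [v - mean for v in sample]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        return centered  # constant sequence: nothing to scale
    return [v / norm for v in centered]

def similarity(sample, template):
    """Correlation of the normalized sequences, a value in [-1, 1]."""
    a, b = normalize(sample), normalize(template)
    return sum(x * y for x, y in zip(a, b))

def match_template(sample, templates):
    """Return the label of the template most similar to the sample."""
    return max(templates, key=lambda label: similarity(sample, templates[label]))

# Hypothetical templates; the sample is a shifted and scaled "peak".
templates = {"peak": [0, 1, 3, 1, 0], "ramp": [0, 1, 2, 3, 4]}
best = match_template([10, 12, 16, 12, 10], templates)
```

Because of the normalization, the sample matches the "peak" template even though its offset and amplitude differ from the template's.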

1.2 STATISTICAL CLASSIFICATION

The primary goal of pattern recognition is supervised or unsupervised classification. Among the

various frameworks in which pattern recognition has been traditionally formulated, the statistical

approach has been most intensively studied and used in practice. In machine learning and

statistics, classification is the problem of identifying to which of a set of categories (sub-

populations) a new observation belongs, on the basis of a training set of data containing

observations (or instances) whose category membership is known. The individual observations

are analyzed into a set of quantifiable properties, known variously as explanatory variables,

features, etc. These properties may be categorical (e.g. "A", "B", "AB" or "O", for

blood type), ordinal (e.g. "large", "medium" or "small"), integer-valued (e.g. the number of

occurrences of a particular word in an email) or real-valued (e.g. a measurement of blood pressure).

Some algorithms work only in terms of discrete data and require that real-valued or integer-

valued data be discretized into groups (e.g. less than 5, between 5 and 10, or greater than 10). An

example would be assigning a given email into "spam" or "non-spam" classes or assigning a

diagnosis to a given patient as described by observed characteristics of the patient (gender, blood

pressure, presence or absence of certain symptoms, etc.).
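The discretization mentioned above can be sketched in a few lines. This is only an illustration: the three groups are those named in the text, while the function name and the choice to place the boundary values 5 and 10 in the middle group are assumptions.

```python
def discretize(value):
    """Map a real-valued measurement to one of three groups.

    Assumption: the boundary values 5 and 10 fall in the middle group.
    """
    if value < 5:
        return "less than 5"
    if value <= 10:
        return "between 5 and 10"
    return "greater than 10"

readings = [2.5, 7.0, 10.0, 12.3]
groups = [discretize(v) for v in readings]
```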

In the statistical approach, each pattern is represented in terms of d features or measurements and

is viewed as a point in a d-dimensional space. The goal is to choose those features that allow

pattern vectors belonging to different categories to occupy compact and disjoint regions in a d-

dimensional feature space. The effectiveness of the representation space (feature set) is

determined by how well patterns from different classes can be separated. Given a set of training

patterns from each class, the objective is to establish decision boundaries in the feature space

which separate patterns belonging to different classes. In the statistical decision theoretic

approach, the decision boundaries are determined by the probability distributions of the patterns

belonging to each class, which must either be specified or learned. One can also take a

discriminant analysis-based approach to classification: First a parametric form of the decision

boundary (e.g., linear or quadratic) is specified; then the “best” decision boundary of the

specified form is found based on the classification of training patterns. Such boundaries can be

constructed using, for example, a mean squared error criterion. The direct boundary construction

approaches are supported by Vapnik's philosophy: “If you possess a restricted amount of information for solving some

problem, try to solve the problem directly and never solve a more general problem as an

intermediate step. It is possible that the available information is sufficient for a direct solution

but is insufficient for solving a more general intermediate problem.” Statistical pattern

recognition draws from established concepts in statistical decision theory to discriminate among

data from different groups based upon quantitative features of the data. There are a wide variety

of statistical techniques that can be used within the description task for feature extraction,

ranging from simple descriptive statistics to complex transformations. Examples of statistical

feature extraction techniques include mean and standard deviation computations, frequency

count summarizations, Fourier transformations, wavelet transformations, and Hough

transformations. The quantitative features extracted from each object for statistical pattern

recognition are organized into a fixed length feature vector where the meaning associated with

each feature is determined by its position within the vector (i.e., the first feature describes a

particular characteristic of the data, the second feature describes another characteristic, and so

on). The collections of feature vectors generated by the description task are passed to the

classification task. Statistical techniques used as classifiers within the classification task include

those based on similarity (e.g., template matching, k-nearest neighbor), probability (e.g., Bayes'

rule), boundaries (e.g., decision trees, neural networks), and clustering (e.g., k-means,

hierarchical).
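To make the idea of a fixed-length feature vector concrete, here is a minimal sketch using only the Python standard library. It assumes a feature layout of [mean, standard deviation, magnitudes of the first two non-constant Fourier coefficients]; this layout, the function names and the example signal are illustrative choices, not taken from the thesis.

```python
import cmath
import statistics

def dft_magnitudes(signal, count):
    """Magnitudes of DFT coefficients 1..count, scaled by the signal length."""
    n = len(signal)
    mags = []
    for k in range(1, count + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(abs(coeff) / n)
    return mags

def feature_vector(signal, n_freq=2):
    """Fixed-length vector: position 0 is the mean, position 1 the population
    standard deviation, the remaining positions are Fourier magnitudes."""
    return ([statistics.mean(signal), statistics.pstdev(signal)]
            + dft_magnitudes(signal, n_freq))

# A sampled sine wave that completes two cycles over eight points.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = feature_vector(signal)
```

Every signal, whatever its length, is mapped to a vector of the same length, and the meaning of each entry is fixed by its position, as described above.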

1.3 SYNTACTICAL OR STRUCTURAL MATCHING

The quantitative nature of statistical pattern recognition makes it difficult to discriminate among

groups based on the morphological (i.e., shape based or structural) sub patterns and their

interrelationships embedded within the data. This limitation provided the impetus for the

development of a structural approach to pattern recognition that is supported by psychological

evidence pertaining to the functioning of human perception and cognition. Object recognition in

humans has been demonstrated to involve mental representations of explicit, structure oriented

characteristics of objects, and human classification decisions have been shown to be made on the

basis of the degree of similarity between the extracted features and those of a prototype

developed for each group. For instance, Biederman proposed the recognition by components

theory to explain the process of pattern recognition in humans:

(1) The object is segmented into separate regions according to edges defined by differences in

surface characteristics (e.g., luminance, texture, and color),

(2) Each segmented region is approximated by a simple geometric shape, and

(3) The object is identified based upon the similarity in composition between the geometric

representation of the object and the central tendency of each group.

This theorized functioning of

human perception and cognition serves as the foundation for the structural approach to pattern

recognition. Structural pattern recognition, sometimes referred to as syntactic pattern recognition

due to its origins in formal language theory, relies on syntactic grammars to discriminate among

data from different groups based upon the morphological interrelationships (or interconnections)

present within the data.

Structural features, often referred to as primitives, represent the sub-patterns (or building blocks)

and the relationships among them which constitute the data. The semantics associated with each

feature are determined by the coding scheme (i.e., the selection of morphologies) used to identify

primitives in the data. Feature vectors generated by structural pattern recognition systems contain

a variable number of features (one for each primitive extracted from the data) in order to

accommodate the presence of superfluous structures which have no impact on classification.

Since the interrelationships among the extracted primitives must also be encoded, the feature

vector must either include additional features describing the relationships among primitives or

take an alternate form, such as a relational graph, that can be parsed by a syntactic grammar. The

emphasis on relationships within data makes a structural approach to pattern recognition most

sensible for data which contain an inherent, identifiable organization such as image data (which

is organized by location within a visual rendering) and time-series data (which is organized by

time); data composed of independent samples of quantitative measurements, such as the Fisher

iris data, lack ordering and require a statistical approach. Methodologies used to extract

structural features from image data such as morphological image processing techniques result in

primitives such as edges, curves, and regions; feature extraction techniques for time series data

include chain codes, piecewise linear regression, and curve fitting which are used to generate

primitives that encode sequential, time ordered relationships. The classification task arrives at an

identification using parsing: the extracted structural features are identified as being

representative of a particular group if they can be successfully parsed by a syntactic grammar.

When discriminating among more than two groups, a syntactic grammar is necessary for each

group and the classifier must be extended with an adjudication scheme so as to resolve multiple

successful parses. In many recognition problems involving complex patterns, it is more

appropriate to adopt a hierarchical perspective where a pattern is viewed as being composed of

simple sub-patterns which are themselves built from yet simpler sub-patterns. The

simplest/elementary sub-patterns to be recognized are called primitives and the given complex

pattern is represented in terms of the interrelationships between these primitives. In syntactic

pattern recognition, a formal analogy is drawn between the structure of patterns and the syntax of

a language. The patterns are viewed as sentences belonging to a language, primitives are viewed

as the alphabet of the language, and the sentences are generated according to a grammar. Thus, a

large collection of complex patterns can be described by a small number of primitives and

grammatical rules.
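The language analogy can be illustrated with a small sketch for time series. It assumes three hypothetical primitives ("u" for a rising step, "d" for a falling step, "f" for a flat step) and uses regular expressions as a stand-in for very simple (regular) grammars, one per class. The primitive alphabet, the two grammars and the function names are illustrative assumptions, not the thesis's method.

```python
import re

def to_primitives(series, tolerance=0.0):
    """Encode consecutive differences as a sentence over the alphabet {u, d, f}."""
    symbols = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        if diff > tolerance:
            symbols.append("u")
        elif diff < -tolerance:
            symbols.append("d")
        else:
            symbols.append("f")
    return "".join(symbols)

# One (regular) grammar per class, written as a regular expression.
GRAMMARS = {
    "single peak": re.compile(r"f*u+d+f*"),
    "single valley": re.compile(r"f*d+u+f*"),
}

def classify(series):
    """Return every class whose grammar parses the whole sentence."""
    sentence = to_primitives(series)
    return [name for name, grammar in GRAMMARS.items()
            if grammar.fullmatch(sentence)]

labels = classify([1, 1, 2, 4, 3, 1, 1])  # sentence is "fuuddf"
```

A series is assigned to a class only if its sentence is accepted by that class's grammar; a series accepted by no grammar is left unclassified.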

The grammar for each pattern class must be inferred from the available training samples.

Structural pattern recognition is intuitively appealing because, in addition to classification, this

approach also provides a description of how the given pattern is constructed from the primitives.

This paradigm has been used in situations where the patterns have a definite structure which can

be captured in terms of a set of rules, such as ECG waveforms, textured images, and shape

analysis of contours.

The implementation of a syntactic approach, however, leads to many difficulties which primarily

have to do with the segmentation of noisy patterns (to detect the primitives) and the inference of

the grammar from training data. The syntactic approach may yield a combinatorial explosion of

possibilities to be investigated, demanding large training sets and very large computational

efforts.

1.4 NEURAL NETWORK

More recently, artificial neural network techniques have been receiving

significant attention. In spite of almost 50 years of research and development in this field, the

general problem of recognizing complex patterns with arbitrary orientation, location, and scale

remains unsolved. New and emerging applications, such as data mining, web searching, retrieval

of multimedia data, face recognition, and cursive handwriting recognition, require robust and

efficient pattern recognition techniques. The main characteristics of neural networks are that they

have the ability to learn complex nonlinear input-output relationships, use sequential training

procedures, and adapt themselves to the data. The most commonly used family of neural

networks for pattern classification tasks is the feed-forward network, which includes multilayer

perceptron and Radial-Basis Function (RBF) networks. Another popular network is the Self-

Organizing Map (SOM), or Kohonen network, which is mainly used for data clustering and

feature mapping. The learning process involves updating network architecture and connection

weights so that a network can efficiently perform a specific classification/clustering task. The

increasing popularity of neural network models to solve pattern recognition problems has been

primarily due to their seemingly low dependence on domain-specific knowledge and due to the

availability of efficient learning algorithms for practitioners to use. Artificial neural networks (ANNs) provide a new

suite of nonlinear algorithms for feature extraction (using hidden layers) and classification (e.g.,

multilayer perceptrons).

In addition, existing feature extraction and classification algorithms can also be mapped on

neural network architectures for efficient (hardware) implementation. An ANN is an information

processing paradigm that is inspired by the way biological nervous systems, such as the brain,

process information. The key element of this paradigm is the novel structure of the information

processing system. It is composed of a large number of highly interconnected processing

elements (neurons) working in unison to solve specific problems. An ANN is configured for a

specific application, such as pattern recognition or data classification, through a learning process.

Learning in biological systems involves adjustments to the synaptic connections that exist

between the neurons. In 2006, the artificial neural network method was used for ECG pattern

recognition. Four types of ECG patterns were chosen from the MIT-BIH database to be recognized,

including normal sinus rhythm, premature ventricular contraction, atrial premature beat and left

bundle branch block beat. ECG morphology and R-R interval features were used as the

characteristic representation of the original ECG signals to be fed into the neural network

models. Three types of artificial neural network models, SOM, BP and LVQ networks were

separately trained and tested for ECG pattern recognition and the experimental results of the

different models have been compared.

1.4.1 Artificial Neural Network

Artificial neural networks have provided an exciting alternative method for solving a variety of

problems in different fields of science and engineering. The artificial neural network was first

developed by looking at how the human brain works. The human brain has billions of neurons

which communicate with other neurons using electrochemical signals. Signals are received by

neurons through junctions called synapses. The inputs to a neuron are combined in some way

and if it is above a threshold, the neuron fires and an output is sent out to other neurons through

the axon. This principle is also used in artificial neural networks.

Algorithm:

Given the inputs $x_1, x_2, \ldots, x_n$ and the corresponding weights $w_1, w_2, \ldots, w_n$, the activation will be

$$a = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

The output, o, of the neuron is a function of this activation.
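The activation and output of a single neuron can be sketched as follows. The weighted sum follows the formula above; the choice of a step function with threshold 0 as the output function is an assumption made for illustration, since the text only states that the output is some function of the activation.

```python
def activation(inputs, weights):
    """Weighted sum a = w1*x1 + w2*x2 + ... + wn*xn."""
    return sum(w * x for w, x in zip(weights, inputs))

def step_output(a, threshold=0.0):
    """Assumed output function: the neuron fires (1) above the threshold."""
    return 1 if a > threshold else 0

a = activation([1.0, 0.5, -1.0], [0.2, 0.4, 0.1])  # 0.2 + 0.2 - 0.1
o = step_output(a)
```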

The output of the neural network depends on the input and on the weights in the network. Training the neural network consists of making the network give the correct output for every input. Training starts by assigning random weights to every link in the network.

When an input is given to the network, the output is observed. If the output is correct, nothing is done to the weights in the network. If the output is wrong, the error is calculated and used to update all the weights in the network.

This procedure is carried out for a large number of inputs, until the network gives the correct output for every input.

Learning in neural networks is nothing but making appropriate changes to the weights. The weights are updated as

$$W_j = W_{j-1} + \lambda \, (y - y') \, X_i$$

where

$W_j$ = updated weight

$W_{j-1}$ = previous weight

$\lambda$ = learning rate

$y$ = given (target) output

$y'$ = predicted output

$X_i$ = inputs

$(y - y')$ = error of prediction
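The training procedure and the update rule above can be sketched for a single neuron as follows. Several choices here are assumptions made for illustration: the weights start at zero rather than at random values so that the run is reproducible, a constant first input of 1 plays the role of the bias, the output is a step function, and the training data is the logical AND function. The function names are hypothetical.

```python
def predict(weights, x):
    """Step output of the weighted sum (assumed output function)."""
    a = sum(w * xi for w, xi in zip(weights, x))
    return 1 if a > 0 else 0

def train(samples, learning_rate=0.5, epochs=50):
    """Repeat the update new_w = old_w + rate * (y - y') * x until no errors remain."""
    weights = [0.0] * len(samples[0][0])  # assumption: zeros instead of random
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            error = y - predict(weights, x)  # (y - y'), zero when correct
            if error != 0:
                errors += 1
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
        if errors == 0:  # every input now gives the correct output
            break
    return weights

# Logical AND; the leading 1 in each input is the constant bias input.
samples = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
weights = train(samples)
predictions = [predict(weights, x) for x, _ in samples]
```

Weights are left unchanged when the prediction is correct, since the error term is then zero, which matches the procedure described in the text.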

1.4.2 Perceptron

The perceptron is the simplest neural network model. A neural network is a system inspired by biological neural systems such as the brain. The human brain consists mainly of nerve cells called ‘neurons’, which are linked to other neurons via axons. Axons transmit nerve impulses from one neuron to another whenever the neurons are stimulated. Each neuron has thousands of connections with other neurons and constantly receives incoming signals that reach the cell body. If the resulting sum of the signals exceeds a certain threshold, a response is sent through the axon. Artificial neural network models follow this same signal-transforming behavior, so a perceptron can be considered an artificial neuron.

Linearly separable:

A linear discriminant function can be used to discriminate between patterns belonging to two or more classes. In the above figure, the plane or line which discriminates the two classes is represented by the equation. The bias ‘b’ shifts the plane up or down, and a change in the weight vector w^t changes the slope of the plane that separates the patterns belonging to the two classes.

A perceptron consists of nodes, a summation processor and an activation function. There are two types of nodes:

1. Input nodes, which represent the input attributes

2. Output node, which represents the model output

Each input node is connected to the output node via a weighted link. The perceptron also has another input known as the bias.

A perceptron computes its output value 'o' by taking a weighted sum of its inputs, subtracting a bias factor 'b' from the sum, and passing the result through an activation function. The output is produced as shown below:

Fig 1: Perceptron
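As a minimal sketch of the computation described above, the following Python function takes the weighted sum of the inputs, subtracts the bias, and applies an activation function. The step activation, the function name `perceptron_output` and the AND-gate weights are assumptions made for illustration; the thesis does not fix a particular activation function.

```python
def perceptron_output(inputs, weights, bias):
    """Weighted sum of the inputs minus the bias, passed through a step activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # Assumed activation: output 1 when the sum exceeds the bias, otherwise 0.
    return 1 if weighted_sum - bias > 0 else 0


# Hypothetical weights and bias that make the perceptron behave like a logical AND.
and_outputs = [perceptron_output(x, [1.0, 1.0], 1.5)
               for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
```

With these assumed values only the input [1, 1] produces a weighted sum above the bias, so it is the only input mapped to 1.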

CHAPTER-2

2.1. EXISTING TECHNIQUES FOR CLASSIFICATION

In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-

parametric method used for classification and regression. In both cases, the input consists of the

k closest training examples in the feature space. The output depends on whether k-NN is used for

classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a majority

vote of its neighbors, with the object being assigned to the class most common among its k

nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply

assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average

of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only

approximated locally and all computation is deferred until classification. The k-NN algorithm is

among the simplest of all machine learning algorithms.

Both for classification and regression, it can be useful to assign weight to the contributions of the

neighbors, so that the nearer neighbors contribute more to the average than the more distant ones.

For example, a common weighting scheme consists in giving each neighbor a weight of 1/d,

where d is the distance to the neighbour.
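The 1/d weighting scheme can be sketched for k-NN regression as follows. The function name, the data layout (a list of `(feature_vector, value)` pairs) and the handling of a zero distance are assumptions made for this illustration.

```python
import math


def knn_weighted_regression(train, query, k):
    """Predict a value as the 1/d-weighted average of the k nearest neighbours.

    train is a list of (feature_vector, value) pairs.
    """
    nearest = sorted((math.dist(x, query), value) for x, value in train)[:k]
    # Assumption: an exact match (distance 0) simply returns that neighbour's value,
    # which avoids dividing by zero.
    for distance, value in nearest:
        if distance == 0:
            return value
    weights = [1.0 / distance for distance, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical one-dimensional training data.
train = [([0.0], 0.0), ([1.0], 10.0), ([3.0], 30.0)]
prediction = knn_weighted_regression(train, [2.0], k=2)
```

In this toy data the two nearest neighbours of the query are equally far away, so they receive equal weight and the prediction is their plain average.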

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the

object property value (for k-NN regression) is known. This can be thought of as the training set

for the algorithm, though no explicit training step is required.

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm is not to be confused with k-means, another popular machine learning technique.

2.1.2. Algorithm

The training examples are vectors in a multidimensional feature space, each with a class label.

The training phase of the algorithm consists only of storing the feature vectors and class labels of

the training samples.

In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test

point) is classified by assigning the label which is most frequent among the k training samples

nearest to that query point.
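The classification phase described above can be sketched as a majority vote over the k nearest stored samples. The function name, the data layout and the use of Euclidean distance via `math.dist` are illustrative assumptions; any other distance metric could be substituted.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Return the most frequent label among the k training samples nearest to query.

    train is a list of (feature_vector, label) pairs; "training" is just storing it.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set.
train = [([0.0, 0.0], "a"), ([0.0, 1.0], "a"), ([1.0, 0.0], "a"),
         ([5.0, 5.0], "b"), ([5.0, 6.0], "b"), ([6.0, 5.0], "b")]
label_near_origin = knn_classify(train, [0.5, 0.5], k=3)
label_near_five = knn_classify(train, [5.5, 5.5], k=3)
```

Every query requires a distance computation against all stored samples, which is why the naive version becomes expensive for large training sets.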

A commonly used distance metric for continuous variables is Euclidean distance. For discrete

variables, such as for text classification, another metric can be used, such as the overlap metric

(or Hamming distance). In the context of gene expression microarray data, for example, k-NN

has also been employed with correlation coefficients such as Pearson and Spearman. Often, the

classification accuracy of k-NN can be improved significantly if the distance metric is learned

with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood

components analysis.

A drawback of the basic "majority voting" classification occurs when the class distribution is

skewed. That is, examples of a more frequent class tend to dominate the prediction of the new

example, because they tend to be common among the k nearest neighbors due to their large

number. One way to overcome this problem is to weight the classification, taking into account

the distance from the test point to each of its k nearest neighbors. The class (or value, in

regression problems) of each of the k nearest points is multiplied by a weight proportional to the

inverse of the distance from that point to the test point. Another way to overcome skew is by

abstraction in data representation. For example, in a self-organizing map (SOM), each node is a

representative (a center) of a cluster of similar points, regardless of their density in the original

training data. K-NN can then be applied to the SOM.

The best choice of k depends upon the data; generally, larger values of k reduce the effect of

noise on the classification, but make boundaries between classes less distinct. A good k can be

selected by various heuristic techniques (see hyperparameter optimization). The special case

where the class is predicted to be the class of the closest training sample (i.e. when k = 1) is

called the nearest neighbor algorithm.

The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance. Much research effort has been put into selecting or scaling features to improve classification. A particularly popular approach is the use of evolutionary algorithms to optimize feature scaling. Another popular approach is to scale features by the mutual information of the training data with the training classes.

In binary (two class) classification problems, it is helpful to choose k to be an odd number as this avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via the bootstrap method.

2.1.4. 1-NN

The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier, which assigns a point x to the class of its closest neighbour in the feature space.

As the size of the training data set approaches infinity, the one nearest neighbour classifier guarantees an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).

2.1.5. K-NN

1) Training phase:

For every class, find the mean pattern: each dimension of the mean pattern is the average of the same dimension (interval) over all training patterns of that class.

2) Classification phase:

Calculate the distance between the test pattern and the mean of each class.

Sort the classes by their distance to the test pattern.

Take the first k classes in the sorted order.

Find the distance between the test pattern and every pattern of the first k classes.

Assign the label of the class with the least dissimilarity.

Repeat the same procedure for the remaining test patterns to get their class labels.
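A rough Python sketch of the two phases above is given here. It is a sketch under stated assumptions, not the thesis implementation: Euclidean distance is assumed, "least dissimilarity" is read as the smallest distance from the test pattern to any single pattern of a candidate class, and all function names are hypothetical.

```python
import math


def class_means(train):
    """Training phase: train maps a label to a list of equal-length patterns."""
    return {label: [sum(col) / len(col) for col in zip(*patterns)]
            for label, patterns in train.items()}


def classify_with_mean_prefilter(train, means, test_pattern, k):
    """Classification phase: shortlist the k classes with the nearest means,
    then search only the patterns of those classes."""
    ranked = sorted(means, key=lambda label: math.dist(means[label], test_pattern))
    best_label, best_distance = None, math.inf
    for label in ranked[:k]:
        for pattern in train[label]:
            distance = math.dist(pattern, test_pattern)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label


# Hypothetical three-class training set.
train = {"a": [[0, 0], [1, 1]], "b": [[5, 5], [6, 6]], "c": [[10, 10], [11, 11]]}
means = class_means(train)
label = classify_with_mean_prefilter(train, means, [4.5, 4.5], k=2)
```

Shortlisting by class mean reduces the number of pattern-level distance computations, since the patterns of classes outside the first k are never visited.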

2.1.6. Properties

The naive version of the algorithm is easy to implement by computing the distances from the test

example to all stored examples, but it is computationally intensive for large training sets. Using

an appropriate nearest neighbour search algorithm makes k-NN computationally tractable even

for large data sets. Many nearest neighbour search algorithms have been proposed over the years;

these generally seek to reduce the number of distance evaluations actually performed.

k-NN has some strong consistency results. As the amount of data approaches infinity, the two-

class k-NN algorithm is guaranteed to yield an error rate no worse than twice the Bayes error rate

(the minimum achievable error rate given the distribution of the data). Various improvements to

the k-NN speed are possible by using proximity graphs.

Fuzzy classification is the process of grouping elements into a fuzzy set whose membership function is defined by the truth value of a fuzzy propositional function.

• For each class k we create three arrays, each of size 1 × n, named as follows:

◦ min_ki, which holds the minimum value of the training data of class k in dimension i

◦ max_ki, which holds the maximum value of the training data of class k in dimension i

◦ mean_ki, which holds the mean value of the training data of class k in dimension i

• For every dimension of the time series we calculate the min, max and mean of each class from the training data; these are used to perform the classification on the test data.
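The three per-class arrays can be computed as in the following sketch. The function name `class_statistics` and the dictionary layout are assumptions made for illustration.

```python
def class_statistics(train):
    """For every class, compute the per-dimension min, max and mean arrays.

    train maps a class label to a list of equal-length training patterns.
    """
    stats = {}
    for label, patterns in train.items():
        columns = list(zip(*patterns))  # one tuple of values per dimension
        stats[label] = {
            "min": [min(col) for col in columns],
            "max": [max(col) for col in columns],
            "mean": [sum(col) / len(col) for col in columns],
        }
    return stats


# Hypothetical class with two training patterns of two dimensions each.
stats = class_statistics({"k1": [[1, 4], [3, 8]]})
```

Each array has one entry per dimension of the time series, so its size is 1 × n as stated above.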

In the classification phase of the classifier, we calculate the score for every class of the test data.

We try to find the class to which the test pattern has the highest membership value. The test

pattern is classified as belonging to that class.

Procedure:

1) This subroutine takes the test data, together with the min, max and mean of each class calculated above, as input.

2) For every test pattern and every dimension i, we find the fuzzy membership function of the test pattern to every class. For a class j,

If TSi is less than meanji then

4) Find the membership function of the test pattern for every class j:

$\mathrm{mem}_j = \mathrm{score}_j \,/\, \sum_{i=1}^{N} \mathrm{score}_i$

5) We classify the test pattern as belonging to the class j with the maximum mem_j value.

6) We repeat steps 2 to 5 for every remaining test example.
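The normalisation and the final decision can be sketched as below. The per-class scores are taken as given, because the way they are accumulated per dimension is not fully recoverable from the text; the function names and the sample scores are hypothetical.

```python
def normalized_memberships(scores):
    """mem_j = score_j divided by the sum of the scores of all N classes."""
    total = sum(scores.values())
    if total == 0:
        # Assumption: with no positive score there is nothing to normalise.
        raise ValueError("all class scores are zero")
    return {label: score / total for label, score in scores.items()}


def classify_by_membership(scores):
    """Return the class with the maximum normalised membership."""
    memberships = normalized_memberships(scores)
    return max(memberships, key=memberships.get)


# Hypothetical scores of one test pattern for three classes.
scores = {"a": 2.0, "b": 6.0, "c": 2.0}
memberships = normalized_memberships(scores)
label = classify_by_membership(scores)
```

Because every score is divided by the same total, the normalisation does not change which class wins; it only rescales the memberships so that they sum to one.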

In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based

on applying Bayes' theorem with strong (naive) independence assumptions between the features.

Naive Bayes has been studied extensively since the 1950s. It was introduced under a different

name into the text retrieval community in the early 1960s, and remains a popular (baseline)

method for text categorization, the problem of judging documents as belonging to one category

or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the

features. With appropriate pre-processing, it is competitive in this domain with more advanced

methods including support vector machines. It also finds application in automatic medical

diagnosis.

Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the

number of variables (features/predictors) in a learning problem. Maximum-likelihood training

can be done by evaluating a closed-form expression, which takes linear time, rather than by

expensive iterative approximation as used for many other types of classifiers.

In the statistics and computer science literature, Naive Bayes models are known under a variety

of names, including simple Bayes and independence Bayes. All these names reference the use

of Bayes' theorem in the classifier's decision rule, but naive Bayes is not (necessarily) a Bayesian

method.

2.3.1. Introduction

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to

problem instances, represented as vectors of feature values, where the class labels are drawn

from some finite set. It is not a single algorithm for training such classifiers, but a family of

algorithms based on a common principle: all naive Bayes classifiers assume that the value of a

particular feature is independent of the value of any other feature, given the class variable. For

example, a fruit may be considered to be an apple if it is red, round, and about 10 cm in diameter.

A naive Bayes classifier considers each of these features to contribute independently to the

probability that this fruit is an apple, regardless of any possible correlations between the color,

roundness, and diameter features.

For some types of probability models, naive Bayes classifiers can be trained very efficiently in a

supervised learning setting. In many practical applications, parameter estimation for naive Bayes

models uses the method of maximum likelihood; in other words, one can work with the naive

Bayes model without accepting Bayesian probability or using any Bayesian methods.

Bayesian classification is based on Bayes' theorem. It computes the posterior probability that a record belongs to a class, using the prior probability drawn from the training set, and so estimates the likelihood of the record belonging to each class. The class with the highest probability becomes the class label for the record.

Bayes' theorem:

$P(Y \mid X) = \dfrac{P(X \mid Y)\,P(Y)}{P(X)}$

The naive Bayes method assumes that the attributes are independent of each other given the class label. If a set of events is independent, the probability that all of them happen at the same time equals the product of the probabilities of the individual events. Therefore the class-conditional probability P(X|Y=y) is estimated as the product of all the conditional probabilities P(X1|Y=y), P(X2|Y=y), ..., P(Xd|Y=y).
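The product rule can be sketched for categorical attributes as follows. The probabilities are plain maximum-likelihood frequency counts without smoothing, which is an assumption of this sketch, and the fruit records and function names are hypothetical.

```python
from collections import Counter


def train_naive_bayes(records, labels):
    """Count classes and (attribute index, value, class) combinations."""
    class_counts = Counter(labels)
    feature_counts = Counter()
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def predict(record, class_counts, feature_counts):
    """Pick the class maximising P(Y=y) * product over i of P(X_i | Y=y)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(Y=y)
        for i, value in enumerate(record):
            score *= feature_counts[(i, value, label)] / count  # P(X_i | Y=y)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training records: (colour, shape).
records = [("red", "round"), ("red", "round"), ("yellow", "long"), ("yellow", "round")]
labels = ["apple", "apple", "banana", "banana"]
class_counts, feature_counts = train_naive_bayes(records, labels)
fruit = predict(("red", "round"), class_counts, feature_counts)
```

The denominator P(X) of Bayes' theorem is the same for every class, so it can be left out when only the most probable class is needed.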

Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers

have worked quite well in many complex real-world situations. In 2004, an analysis of the

Bayesian classification problem showed that there are sound theoretical reasons for the

apparently implausible efficacy of naive Bayes classifiers. Still, a comprehensive comparison

with other classification algorithms in 2006 showed that Bayes classification is outperformed by

other approaches, such as boosted trees or random forests.

An advantage of naive Bayes is that it requires only a small amount of training data to estimate the parameters necessary for classification.

2.4.1. Introduction

conditions. It's called a decision tree because it starts with a single box (or root), which then

branches off into a number of solutions, just like a tree. Decision trees are helpful, not only

because they are graphics that help you 'see' what you are thinking, but also because making a

decision tree requires a systematic, documented thought process.

A decision tree is a flow-chart-like tree structure. An internal node denotes a test on an attribute, and a branch represents an outcome of the test, e.g., Color=red. A leaf node represents a class label or class label distribution. At each node, one attribute is chosen to split the training examples into distinct classes as much as possible. A new case is classified by following a matching path to a leaf node.
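Classifying a new case by following a matching path can be sketched as below. The nested-dictionary representation of the tree, the attribute names and the class labels are all hypothetical and serve only to illustrate the traversal.

```python
def classify(tree, case):
    """Follow the branch matching the case at every internal node until a leaf is reached."""
    node = tree
    while isinstance(node, dict):             # internal node: a test on an attribute
        outcome = case[node["attribute"]]     # outcome of the test for this case
        node = node["branches"][outcome]      # follow the matching branch
    return node                               # leaf node: the class label


# Hypothetical tree: test Color at the root, then Shape for red cases.
tree = {
    "attribute": "Color",
    "branches": {
        "red": {"attribute": "Shape",
                "branches": {"round": "apple", "long": "pepper"}},
        "yellow": "banana",
    },
}
label = classify(tree, {"Color": "red", "Shape": "round"})
```

The number of tests performed for a case is at most the depth of the tree, which is one reason classification with a decision tree is fast.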

Decision tree generation consists of two phases:

Tree construction

• At the start, all the training examples are at the root

• Partition the examples recursively based on selected attributes

Tree pruning

• Identify and remove branches that reflect noise or outliers

• We first grow the decision tree to a large depth. Then we start at the bottom and remove leaves that give negative returns when compared from the top.

Decision trees use multiple algorithms to decide how to split a node into two or more sub-nodes.

2.4.2. Advantages

• Relatively fast compared to other classification models

• Obtains similar and sometimes better accuracy compared to other models

• Simple and easy to understand

• Can be converted into simple and easy to understand classification rules

• Able to handle both numerical and categorical data. Other techniques are usually specialized in analyzing datasets that have only one type of variable.

2.4.3. Disadvantages

Overfitting: Overfitting is one of the main practical difficulties for decision tree models. This problem is addressed by setting constraints on model parameters and by pruning.

Not fit for continuous variables: When working with continuous numerical variables, a decision tree loses information when it discretizes them into categories.

2.5.1. Introduction

k-means clustering is a method of vector quantization, originally from signal processing, that is

popular for cluster analysis in data mining. k-means clustering aims to partition n observations

into k clusters in which each observation belongs to the cluster with the nearest mean, serving as

a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic

algorithms that are commonly employed and converge quickly to a local optimum. These are

usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions

via an iterative refinement approach employed by both algorithms. Additionally, they both use

cluster centers to model the data; however, k-means clustering tends to find clusters of

comparable spatial extent, while the expectation-maximization mechanism allows clusters to

have different shapes.

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine

learning technique for classification that is often confused with k-means because of the k in the

name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by k-means

to classify new data into the existing clusters. This is known as nearest centroid classifier or

Rocchio algorithm.

2.5.2. Procedure

1) Select the initial cluster centers.

2) Calculate the distance between each data point and the cluster centers.

3) Assign each data point to the cluster center that is nearest to it among all the cluster centers.

4) Recalculate each cluster center as the mean of the data points assigned to it.

5) Recalculate the distance between each data point and the newly obtained cluster centers.

6) If no data point was reassigned then stop, otherwise repeat from step 3).
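The procedure can be sketched in Python as below. Two points are assumptions of this sketch where the listing is incomplete: the initial centers are supplied by the caller (for example, chosen at random), and each center is recomputed as the mean of the points assigned to it. The function name and the sample points are hypothetical.

```python
import math


def kmeans(points, initial_centers, max_iterations=100):
    """Alternate between assigning points to the nearest center and moving the centers."""
    centers = [list(c) for c in initial_centers]  # copy, so the caller's list is untouched
    assignment = None
    for _ in range(max_iterations):
        # Assign every data point to the cluster center nearest to it.
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Stop as soon as no data point was reassigned.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Assumed update: move each center to the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Hypothetical data: two well separated groups of three points.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, assignment = kmeans(points, [[0, 0], [10, 10]])
```

The result depends on the initial centers, which is why different starting points can end in different local optima.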

2.5.3. Advantages

object, and t is # iterations. Normally, k, t, d << n.

3) Gives the best result when the data sets are distinct or well separated from each other.

Fig I: Showing the result of k-means for 'N' = 60 and 'c' = 3

2.5.4. Disadvantages

1) The learning algorithm requires a priori specification of the number of cluster centers.

2) The use of exclusive assignment: if two clusters overlap heavily, k-means will not be able to resolve that there are two clusters.

3) The learning algorithm is not invariant to non-linear transformations, i.e. different representations of the data give different results (data represented in Cartesian co-ordinates and in polar co-ordinates will give different results).

5) The learning algorithm finds only a local optimum of the squared error function.

6) Randomly chosen initial cluster centers may not lead to a good result.

7) Applicable only when the mean is defined, i.e. it fails for categorical data.

Fig II: Showing the non-linear data set where k-means algorithm fails

These two algorithms are exact reverses of each other, so we cover only the agglomerative hierarchical clustering algorithm in detail.

Agglomerative hierarchical clustering: this algorithm works by grouping the data one by one on the basis of the nearest distance measure among all the pairwise distances between the data points. After each grouping the distances between the data points are recalculated, but which distance should be considered once groups have been formed? Many methods are available for this. Some of them are:

4) centroid distance.

In this way we go on grouping the data until one cluster is formed. Then, on the basis of the dendrogram, we can determine how many clusters should actually be present.

Let X = {x1, x2, x3, ..., xn} be the set of data points.

1) Begin with the disjoint clustering having level L(0) = 0 and sequence number m = 0.

2) Find the least distance pair of clusters in the current clustering, say pair (r), (s), according to

d[(r),(s)] = min d[(i),(j)] where the minimum is over all pairs of clusters in the current

clustering.

3) Increment the sequence number: m = m + 1. Merge clusters (r) and (s) into a single cluster to form the next clustering m. Set the level of this clustering to L(m) = d[(r),(s)].

4) Update the distance matrix, D, by deleting the rows and columns corresponding to clusters (r) and (s) and adding a row and column corresponding to the newly formed cluster. The distance between the new cluster, denoted (r,s), and an old cluster (k) is defined as d[(k),(r,s)] = min(d[(k),(r)], d[(k),(s)]).

5) If all the data points are in one cluster then stop, else repeat from step 2).
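The steps above can be sketched as follows. For brevity this sketch recomputes the cluster-to-cluster distance as the minimum over all pairs of member points instead of maintaining the matrix D; for the min rule in step 4 both give the same distances. Euclidean distance, the function name and the sample points are assumptions.

```python
import math


def single_linkage(points):
    """Merge the closest pair of clusters until one cluster remains.

    Returns a list of (merged point indices, level) pairs, one per merge.
    """
    clusters = [[i] for i in range(len(points))]  # disjoint clustering at level 0
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the least-distance pair of clusters in the current clustering.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                distance = min(math.dist(points[i], points[j])
                               for i in clusters[a] for j in clusters[b])
                if best is None or distance < best[0]:
                    best = (distance, a, b)
        distance, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), distance))  # level L(m) of this clustering
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges


# Hypothetical one-dimensional data points.
merges = single_linkage([[0.0], [1.0], [5.0]])
```

The recorded levels are what a dendrogram plots on its vertical axis; cutting the dendrogram at a chosen level gives the clusters present at that level.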

2.6.2. Advantages

2.6.3. Disadvantages

2) A time complexity of at least $O(n^2 \log n)$ is required, where 'n' is the number of data points.

3) Depending on the type of distance matrix chosen for merging, different algorithms can suffer from one or more of the following:

Fig I: Showing the dendrogram formed from the data set of size 'N' = 60

CHAPTER-3

AN EFFICIENT ALGORITHM FOR CLASSIFICATION OF DELUGE TIME SERIES DATA USING LOCAL EXTREMES

So far we have discussed various classification techniques. We now try to derive an algorithm that could be better than them in terms of accuracy and time complexity.

Combining nearest neighbor and fuzzy classification yields the best results. Our proposed algorithm applies nearest neighbor and fuzzy classification together to the local extremes of particular intervals of the training patterns.

1) Training phase:

For every class, find the mean pattern: each dimension of the mean pattern is the average of the same dimension (interval) over all training patterns of that class.

For every class, find the maximum pattern, which holds the maximum of every dimension.

For every class, find the minimum pattern, which holds the minimum of every dimension.

Divide the mean, maximum and minimum patterns of each class into the specified number of intervals.

In every interval, find the local extreme values.

2) Classification phase:

Divide the test pattern into the specified number of intervals.

In every interval, find the local extreme values.

If the local extreme value is greater than the mean extreme, calculate the membership of that extreme relative to the maximum extreme; otherwise calculate it relative to the minimum extreme.

Sum the memberships over all intervals, separately for each class, to get the membership of each class.

Assign the label of the class having the maximum membership.

Repeat the same procedure for the remaining test patterns to get their class labels.
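The two phases can be sketched in Python as below. This is a sketch under stated assumptions and not the MATLAB implementation: the local extreme of an interval is taken to be its maximum, the membership formulas follow the Membership listing given later in the thesis, the handling of a zero-width span is our own guard, and all function names and sample patterns are hypothetical.

```python
def interval_maxima(pattern, n_intervals):
    """Split a pattern into n_intervals contiguous chunks and return each chunk's maximum."""
    size = len(pattern) // n_intervals
    chunks = [pattern[i * size:(i + 1) * size] for i in range(n_intervals - 1)]
    chunks.append(pattern[(n_intervals - 1) * size:])  # last chunk takes any remainder
    return [max(chunk) for chunk in chunks]


def train(train_set, n_intervals):
    """For every class, keep the interval extremes of its mean, maximum and minimum patterns."""
    model = {}
    for label, patterns in train_set.items():
        columns = list(zip(*patterns))
        mean_p = [sum(col) / len(col) for col in columns]
        max_p = [max(col) for col in columns]
        min_p = [min(col) for col in columns]
        model[label] = tuple(interval_maxima(p, n_intervals) for p in (mean_p, max_p, min_p))
    return model


def membership(mx, mx_mx, mx_mn, mx_tst):
    """Sum over all intervals of 1 - (distance to the mean extreme / span to max or min)."""
    mem = 0.0
    for mean_e, max_e, min_e, tst_e in zip(mx, mx_mx, mx_mn, mx_tst):
        span = (max_e - mean_e) if tst_e > mean_e else (mean_e - min_e)
        if span == 0:
            # Assumed guard: a zero-width span counts only an exact match.
            mem += 1.0 if tst_e == mean_e else 0.0
        else:
            mem += 1.0 - abs(tst_e - mean_e) / span
    return mem


def classify(model, test_pattern, n_intervals):
    """Assign the label of the class with the maximum summed membership."""
    mx_tst = interval_maxima(test_pattern, n_intervals)
    scores = {label: membership(*extremes, mx_tst) for label, extremes in model.items()}
    return max(scores, key=scores.get)


# Hypothetical training patterns of length 4, split into 2 intervals.
train_set = {"low": [[1, 2, 1, 2], [2, 3, 2, 3]], "high": [[8, 9, 8, 9], [9, 10, 9, 10]]}
model = train(train_set, n_intervals=2)
label = classify(model, [1, 2.6, 2, 2.4], n_intervals=2)
```

Only one extreme per interval is compared for each class, so the cost of classifying a test pattern grows with the number of classes and intervals rather than with the number of training patterns.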

In this paper we have developed a new classification algorithm that yields good performance in terms of time complexity and accuracy. The one nearest neighbor (1NN) Euclidean classifier has often been found to outperform other methods for time series classification. The algorithm was tested and examined on different data sets.

CHAPTER-4

1.5 ALGORITHM

An efficient algorithm for classification of deluge time series data using local extremes was developed in MATLAB. The algorithm is given below.

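Classify_data.m

% Pseudocode: M holds the labelled training patterns, T the test patterns.
A = M with the label removed        % delete column 1 (the class label) of M
Test = T[pattern]                   % current test pattern
a = mean(class)                     % mean pattern of the class
Mx = max(interval)                  % local extreme of each interval of the mean pattern
maximum = max(A)                    % maximum pattern of the class
Mx_mx = max(interval)               % local extreme of each interval of the maximum pattern
minimum = min(A)                    % minimum pattern of the class
Mx_mn = max(interval)               % local extreme of each interval of the minimum pattern
Mx_tst = max(interval)              % local extreme of each interval of the test pattern
mem(class) = membership(Mx, Mx_mx, Mx_mn, Mx_tst)
weight = [weight; mem]              % one row of class memberships per test pattern
for w = 1:length(weight)
    [M, I] = max(weight(w, :))      % I is the class with the maximum membership
    Label = [Label I]
    Count += 1                      % presumably counted only when the predicted label is correct
end
Accuracy = Count/length(Test)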

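Membership.m

function mem = membership(Mx, Mx_mx, Mx_mn, Mx_tst)
% Sum, over all intervals, the membership of the test extremes Mx_tst
% given the mean (Mx), maximum (Mx_mx) and minimum (Mx_mn) extremes of a class.
mem = 0;
for i = 1:length(Mx_tst)
    if Mx_tst(i) > Mx(i)
        % test extreme above the mean extreme: scale by the distance to the maximum
        m1 = (Mx_tst(i) - Mx(i)) / (Mx_mx(i) - Mx(i));
        m = 1 - m1;
        mem = mem + m;
    else
        % otherwise: scale by the distance to the minimum
        d = (Mx(i) - Mx_tst(i)) / (Mx(i) - Mx_mn(i));
        m = 1 - d;
        mem = mem + m;
    end
end
end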

1.5.1 RESULTS

Dataset    Size    Our approach    KNN ED (%)    KNN Manhattan

CHAPTER -6

6.1 CONCLUSION

This report set out to develop an algorithm that yields good performance, in terms of time complexity and accuracy, when classifying a deluge of data, compared with existing techniques. In this paper various pattern recognition approaches have been discussed. Among the traditional approaches to pattern recognition, the statistical approach has been the most intensively studied and used in practice. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation.

The results show that the algorithm is efficient in terms of time complexity and is almost equivalent to KNN, which is more consistent in terms of performance. The algorithm is well suited to running on a Hadoop cluster built from commodity hardware, achieving good performance and accuracy at low cost, provided the data set is as large as possible.

5. FUTURE SCOPE

So that organizations do not waste precious time, money and manpower on these issues, there is a need to develop the expertise and process for creating small-scale prototypes quickly and testing them to demonstrate their correctness and their match with business goals. As data mining techniques become outdated, there is a need for new technologies such as big data frameworks.

Future work in this model may include the feature extraction of ECG signal using nonlinear

techniques and feature classification using ANN methods.
