
Time Series Analysis Using Wavelets and Entropy Analysis

Vasco Chikwasha
June 5, 2005
Contents

List of Figures
Abstract
1 Introduction
1.1 Stationarity in time series data analysis
2 Review of Data Analysis Methods
2.1 Fourier Analysis
2.2 Wavelets
2.2.1 Continuous Wavelet Transform
2.3 Entropy Measures in Time Series
2.3.1 Approximate Entropy
2.3.2 Sample Entropy Procedure for time series data
2.3.3 Multiscale Entropy Procedure
3 Analysis of Time Series
3.1 Multiscale Entropy Analysis
3.2 Wavelet Transform
4 Conclusion
5 Appendices
A Normalizing the first derivative of the Gaussian (normal) probability density
B Derivation of the Mexican Hat Wavelet
C Time Series and Entropy Measures
D Continuous Wavelet Transform
E Multiscale Entropy Code
Acknowledgements
List of Figures

1.1 Simulated stationary time series showing the first 200 of 1 690 values
2.1 Graph of x(t) and its Fourier transform
2.2 Three types of wavelets
2.3 Time series v[1], . . . , v[n] to show the Sample Entropy calculation
2.4 Coarse-graining procedure
3.1 RER time series for an obese subject
3.2 RER time series for a lean subject
3.3 RER time series for an obese subject aged 11
3.4 Entropy measures for lean and obese subjects aged 11
3.5 Time series and entropy measures for an obese subject aged 9 (BMI=33)
3.6 Time series and entropy measures for an obese subject aged 9 (BMI=30.7)
3.7 Continuous wavelet transform of an obese subject aged 9, BMI=33
C.1 Time series for an obese subject before and after training
C.2 Time series and entropy values for a lean subject aged 10
C.3 Time series and entropy values for an obese subject aged 12
C.4 Time series and entropy values for an obese subject aged 14
C.5 Time series and entropy values for an obese subject aged 9
C.6 Time series and entropy values for an obese subject aged 10
D.1 Continuous wavelet transform of an obese subject aged 14, BMI=45.1
D.2 Continuous wavelet transform of a lean subject aged 11, BMI=22.9
Abstract

Time series analysis has attracted attention for a long time because of its application in a wide range of fields, including science, economics, management, medicine and agriculture, among others. Time series analysis can be used to extract information hidden in data. The classical techniques for time series data analysis are the linear time series models, including the moving average (MA) models, the autoregressive (AR) models and their mixture, the autoregressive moving average (ARMA) models. In this essay we introduce the multiscale entropy (MSE) analysis of bivariate time series data: we describe the entropy analysis algorithm and its application to the data. The wavelet transform is also a useful technique for analysing time series signals because of its capacity to highlight details of a signal in both time and frequency. MSE reflects changing characteristics with increasing scale; however, it fails to show conclusively the difference between time series collected before subjects were given physical training and those collected after the training programme.

Key words: Multiscale entropy (MSE), Sample entropy (SampEn), Respiratory expiratory ratio (RER).
Chapter 1
Introduction
A time series is a collection of observations made sequentially in time, that is, a collection of random variables $\{X_t, t \in D\}$ with t often interpreted as time. Usually $D = \mathbb{N}$, $D = \mathbb{Z}$ or $D = \{1, 2, 3, \ldots, T\}$. If the observations are made continuously in time, the time series is said to be continuous, and when the observations are taken only at specific times, usually equally spaced, the time series is said to be discrete. In time series analysis, there are several objectives which one may wish to achieve. We can classify these objectives into the categories of description, explanation, forecasting, control and monitoring.
A modest objective of any time series analysis is to provide a concise description of the past values
of a series or a description of the underlying generating process of the time series. A plot of the data
will show the important features of the series such as trend, seasonality, outliers and discontinuities.
When time series observations are taken on two or more variables, it may be possible to use the
variation in one time series to explain variation in the other time series. This may lead to a
deeper understanding of the mechanism generating the given observations. A more ambitious task
is forecasting future values of a time series using historical data. This is common practice in
business with obvious relevance to sales forecasting and other economically oriented applications.
In industry, manufacturers are concerned with maintaining the quality of products. When a time
series that records the quality of a product is generated, the aim may be to control the process to
keep it within the required tolerance. A problem which arises naturally in medicine and elsewhere
is the monitoring of time series to detect changes in behaviour as they occur. Monitoring of time
series can be used to detect pathological states in individuals with respiratory problems.
Creating a time series plot is useful, both to describe the data and to help formulate a sensible
model. For example, plotting the data may suggest that it is sensible to transform the data. If a
trend is detected in the series and the variance appears to increase with the mean, then we need
to stabilize the variance by taking logarithmic transformations. If there is a trend in the series
and the size of the seasonal effect appears to increase with the mean, then it may be necessary to transform the data to make the seasonal effect constant from year to year. The seasonal effect is then additive. In particular, if the size of the seasonal effect is directly proportional to the mean, then the seasonal effect is multiplicative and a logarithmic transformation is appropriate to make the effect additive.

Data transformations are necessary to make non-stationary time series stationary. Mathematical transformations are also applied to signals to obtain further information that is not readily available in the raw signal.
We will show how entropy measures in time series can be used as diagnostic tools for pathological cases in physiological time series data. In this context, entropy refers to order, regularity or complexity and has roots in the works of Pincus, Kolmogorov, Shannon, and Grassberger and co-workers. The idea is that time series with repeating elements arise from more ordered systems and would be characterised by low entropy values. We will apply the sample entropy algorithm to calculate the entropy measures for a given time series at different scales. For the calculation of entropy at different scales, we will use the multiscale entropy algorithm. A comparison of the entropies of bivariate time series data will help in detecting anomalies in given subjects so that corrective measures can be taken where necessary. In this essay we will also show the utility of wavelets in decomposing a time series in both frequency and time at different scales.
In Chapter 2 we will discuss the mathematical background for calculating entropy and its physical meaning. We will start with a short description of the Fourier transform, followed by a short review of the continuous wavelet transform for time series data analysis. We will then present a short description of the approximate entropy (ApEn) and sample entropy (SampEn) algorithms, which have been used in the analysis of short and noisy physiologic time series [? ]. We conclude Chapter 2 by giving an account of the multiscale entropy (MSE) algorithm, which incorporates the SampEn statistic.
In Chapter 3 we will apply the MSE method to respiratory expiratory ratio (RER) data consisting of recordings from obese and lean subjects. RER is the ratio between the expiratory volume of carbon dioxide (CO2) and the expiratory volume of oxygen (O2). It normally (but not always) ranges between 0.7 and 1, and it indicates the proportion of glucose or lipid being used for energy production in the body. The data were collected by making the subjects ride on a device and breathe into a mask; the data are the breath-to-breath values of this ratio. We will investigate whether physical training has an effect on the respiratory expiratory ratio of obese subjects.
1.1 Stationarity in time series data analysis
The general model by Box and Jenkins (1976) requires an input signal that is stationary. The model includes the autoregressive (p) as well as the moving average (q) parameters, and explicitly includes differencing (d) in the formulation of the model. In the notation introduced by Box and Jenkins, these models are summarized as ARIMA(p, d, q). A time series is said to be strictly stationary if the joint distribution of $X(t_1), X(t_2), \ldots, X(t_n)$ is the same as the distribution of $X(t_1 + \tau), X(t_2 + \tau), \ldots, X(t_n + \tau)$ for all $t_1, t_2, \ldots, t_n, \tau$. Thus, shifting the time origin by an amount $\tau$ has no effect on the joint distribution, which must therefore depend only on the intervals between $t_1, t_2, \ldots, t_n$.

Strict stationarity implies that the distribution of $X(t)$ is the same for all t, so that, provided the first two moments are finite,
\[ \mu(t) = \mu \quad \text{and} \quad \sigma^2(t) = \sigma^2 \]
are both constants and do not depend on the value of t. Furthermore, if n = 2 the joint distribution of $X(t_1)$ and $X(t_2)$ depends only on $(t_2 - t_1)$, which is called the lag.
[Figure 1.1: (a) Simulated stationary time series showing the first 200 of the 1 690 values generated, and (b) the corresponding correlogram. The horizontal lines in (b) are at $\pm 2/\sqrt{N}$.]
Thus, the auto-covariance function $\gamma(t_1, t_2)$ also depends only on $(t_2 - t_1)$, which we can write as $\gamma(\tau)$. Thus,
\[ \gamma(\tau) = E\{[X(t) - \mu][X(t + \tau) - \mu]\} = \mathrm{Cov}[X(t), X(t + \tau)] \tag{1.1} \]
is called the auto-covariance coefficient at lag $\tau$.

The size of the auto-covariance coefficient depends on the units in which X(t) is measured. Thus, it is useful to standardize the auto-covariance function to produce a function called the autocorrelation function, which is given by
\[ \rho(\tau) = \frac{\gamma(\tau)}{\gamma(0)}, \tag{1.2} \]
which measures the correlation between $X(t)$ and $X(t + \tau)$. The autocorrelation function is an important guide to the properties of a time series: it measures the correlation between observations at different distances apart and is used to check for randomness in a data set. The autocorrelation function has the following properties:
\[ \rho(k) = \rho(-k), \qquad -1 \le \rho(k) \le 1. \]
A plot of $\rho(k)$ versus k, called a correlogram, is used to check if there is evidence for any serial dependence in an observed time series. For a stationary time series, the autocorrelation values should be near zero for higher lags. If the time series is not stationary, a significant number of values will be different from zero. Figure 1.1.a shows a stationary time series plot with a mean around zero. The corresponding correlogram is shown in Figure 1.1.b with the horizontal lines at $\pm 2/\sqrt{N}$. Values outside these lines are significantly different from zero, and if a large number of values fall outside these lines, the data are serially correlated and lack stationarity.
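As an illustration, the following C sketch (our own, not taken from any particular library) estimates the sample autocorrelation function $\rho(k)$ of a series and flags the lags that fall outside the $\pm 2/\sqrt{N}$ bounds; the white-noise test series is a placeholder for real data.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Sample autocorrelation rho(k) of x[0..n-1] for lags 0..maxlag,
   written into rho[].  rho(k) = gamma(k)/gamma(0). */
void autocorr(const double *x, int n, int maxlag, double *rho)
{
    double mean = 0.0, gamma0 = 0.0;
    int t, k;

    for (t = 0; t < n; t++) mean += x[t];
    mean /= n;

    for (t = 0; t < n; t++) gamma0 += (x[t] - mean) * (x[t] - mean);
    gamma0 /= n;

    for (k = 0; k <= maxlag; k++) {
        double gk = 0.0;
        for (t = 0; t < n - k; t++)
            gk += (x[t] - mean) * (x[t + k] - mean);
        rho[k] = (gk / n) / gamma0;
    }
}

int main(void)
{
    /* Synthetic example: 200 pseudo-random values (white noise). */
    enum { N = 200, MAXLAG = 20 };
    double x[N], rho[MAXLAG + 1], bound;
    int t, k;

    for (t = 0; t < N; t++)
        x[t] = (double)rand() / RAND_MAX - 0.5;

    autocorr(x, N, MAXLAG, rho);
    bound = 2.0 / sqrt((double)N);

    for (k = 1; k <= MAXLAG; k++)
        printf("lag %2d  rho = %+.3f  %s\n", k, rho[k],
               fabs(rho[k]) > bound ? "outside +/-2/sqrt(N)" : "");
    return 0;
}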
In practice, it is often useful to define stationarity in a less restrictive way. A process is weakly stationary, or second-order stationary, if its mean is constant and its auto-covariance function depends only on the lag, so that
\[ E[X(t)] = \mu, \qquad \mathrm{Cov}[X(t), X(t + \tau)] = \gamma(\tau). \tag{1.3} \]
Many time series techniques assume weak stationarity, so data are often subjected to filtering to render them approximately stationary prior to analysis. SampEn analysis does not require that any assumptions be made regarding the stationarity of the data. However, in the absence of approximate stationarity, care must be taken when interpreting the results of data analysis.
Chapter 2
Review of Data Analysis Methods
In this chapter we describe the wavelet technique and the entropy analysis technique for analysing time series data. Wavelet analysis does better than Fourier analysis when our interest lies in both time and frequency decomposition.
2.1 Fourier Analysis
The basis of Fourier series is that any function $x(t) \in L^2[0, T]$ can be decomposed into an infinite sum of cosine and sine functions,
\[ x(t) = \sum_{k=0}^{\infty} \left( a_k \cos\frac{2\pi k t}{T} + b_k \sin\frac{2\pi k t}{T} \right), \qquad t \in [0, T], \tag{2.1} \]
where
\[ a_k = \frac{1}{T} \int_0^T x(t) \cos\frac{2\pi k t}{T}\,dt, \tag{2.2} \]
\[ b_k = \frac{1}{T} \int_0^T x(t) \sin\frac{2\pi k t}{T}\,dt. \tag{2.3} \]
This is due to the fact that $\{1, \cos\frac{2\pi k t}{T}, \sin\frac{2\pi k t}{T}, k = 1, 2, 3, \ldots\}$ form an orthonormal basis for the space $L^2[0, T]$. The summation in (2.1) is up to infinity, but x(t) can be well approximated in the $L^2$ sense by a finite sum with K cosine and sine functions:
\[ X(t) = \sum_{k=0}^{K} \left( a_k \cos\frac{2\pi k t}{T} + b_k \sin\frac{2\pi k t}{T} \right). \tag{2.4} \]
This decomposition shows that x(t) can be approximated by a sum of sinusoidal shapes at frequencies $\omega_k = \frac{2\pi k}{T}$, $k = 0, 1, 2, \ldots$. In addition, the variability in x(t), as measured by $\int_0^T |x(t)|^2\,dt$, can be approximately partitioned into the sum of the variabilities of the sinusoidal shapes:
\[ \int_0^T |x(t)|^2\,dt = \int_0^T \left|\sum_{k=0}^{\infty} a_k \cos\frac{2\pi k t}{T} + b_k \sin\frac{2\pi k t}{T}\right|^2 dt \propto \sum_{k=0}^{\infty} \left(a_k^2 + b_k^2\right). \tag{2.5} \]
A standard technique of time series analysis is to treat the partition (2.1) as an analysis of variance (ANOVA) for identifying sinusoidal periodicities in a time series data set $\{x(t), 0 \le t \le T\}$. When x(t) has sharp discontinuities or a non-sinusoidal wave form, such as a rectangular form, we would require a very large number, K, of terms in its Fourier series in order to get an adequate approximation.
Transformations are applied to signals to obtain further information that is not readily available in the raw signal. Most time series signals are time-domain signals: when we plot such a signal, we obtain a time-amplitude representation of it. In many cases, the most distinguishing information is hidden in the frequency content of the signal. The frequency spectrum of a signal is basically the set of frequency (spectral) components of that signal.

The Fourier transform gives the frequency information that we do not readily see in the time-domain signal, that is, it tells us how much of each frequency exists in a signal. However, it gives no information on when in time these frequencies occur. The Fourier transform is therefore suitable for non-stationary signals only if we are interested in what spectral components exist in the signal, but not in when they occur. Recall that a stationary signal is a signal whose frequency content does not change in time, that is, the frequency components exist at all times throughout the duration of the signal.
Suppose we have a stationary signal given by
\[ x(t) = \cos(2\pi 10t) + \cos(2\pi 30t) + \cos(2\pi 75t) + \cos(2\pi 100t). \tag{2.6} \]
Figure 2.1 shows the graph of x(t). The frequencies of x(t) exist at all times within the entire duration of the signal. A Fourier transform of x(t) will give us the frequency components of the signal, as shown in Figure 2.1.b. If we are interested in when in time each of the frequencies occurs, wavelet analysis will do a better job.
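To make the frequency-domain picture concrete, the following C sketch samples the signal in (2.6) and computes the magnitude of its discrete Fourier transform directly from the definition (an FFT library would be used in practice); the sampling rate and all names are our own illustrative choices.

#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979323846

int main(void)
{
    /* Sample x(t) of equation (2.6) at fs = 1000 Hz for one second. */
    enum { N = 1000 };
    const double fs = 1000.0;
    static double x[N], mag[N / 2];
    int n, k;

    for (n = 0; n < N; n++) {
        double t = n / fs;
        x[n] = cos(2*PI*10*t) + cos(2*PI*30*t) + cos(2*PI*75*t) + cos(2*PI*100*t);
    }

    /* Naive O(N^2) DFT: X[k] = sum_n x[n] exp(-i 2 pi k n / N). */
    for (k = 0; k < N / 2; k++) {
        double re = 0.0, im = 0.0;
        for (n = 0; n < N; n++) {
            re += x[n] * cos(2*PI*k*n / N);
            im -= x[n] * sin(2*PI*k*n / N);
        }
        mag[k] = sqrt(re*re + im*im);
    }

    /* The spectrum should peak at the bins for 10, 30, 75 and 100 Hz;
       each pure cosine contributes a magnitude of about N/2. */
    for (k = 0; k < N / 2; k++)
        if (mag[k] > N / 4.0)
            printf("peak at %.0f Hz (|X| = %.1f)\n", k * fs / N, mag[k]);

    return 0;
}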
2.2 Wavelets
A wavelet is a small wave that grows and decays essentially in a limited time period. The contrasting notion is a big wave, for example, the sine function, which keeps oscillating up and down on a plot of sin(x) versus x for $x \in (-\infty, \infty)$.

There are many kinds of wavelets. One can choose between smooth wavelets, compactly supported wavelets, wavelets with simple mathematical expressions, wavelets with simple associated filters, etc.
[Figure 2.1: Graph of x(t) and its Fourier transform: (a) graph of x(t); (b) frequency resolution of x(t).]
Just as Fourier analysis is based upon the notion of representing a time series as a linear combi-
nation of sinusoids, the idea underlying wavelet analysis is to represent the time series as a linear
combination of wavelets. Like sines and cosines in Fourier analysis, wavelets are used as basis
functions in representing other functions. In Fourier analysis, each sinusoid is associated with a particular frequency f, so we can deduce what frequencies are important in a particular time series by studying the magnitudes of the coefficients of the various sinusoids in the linear combination. In contrast, each wavelet is associated with two independent variables, time t and scale s, because each wavelet is essentially non-zero only inside a particular interval of time, namely $[t - s, t + s]$. Within that interval, the wavelet spends roughly an equal amount of time above and below zero, so it appears to be a small wave centered at t and having a width of 2s. We can thus learn how a time series varies on particular scales across time if we re-express it using wavelets.
To quantify the notion of a wavelet, we consider a real-valued function $\psi(\cdot)$ defined over the real axis $(-\infty, \infty)$ and satisfying two basic properties:
1. The integral of $\psi(\cdot)$ is zero:
\[ \int_{-\infty}^{\infty} \psi(u)\,du = 0. \tag{2.7} \]
2. The square of $\psi(\cdot)$ integrates to unity:
\[ \int_{-\infty}^{\infty} \psi^2(u)\,du = 1. \tag{2.8} \]
If (2.7) and (2.8) hold, then for any $\epsilon$ satisfying $0 < \epsilon < 1$, there must be an interval $[-T, T]$ of finite length such that
\[ \int_{-T}^{T} \psi^2(u)\,du > 1 - \epsilon. \tag{2.9} \]
If $\epsilon$ is very close to zero, then $\psi(\cdot)$ can only deviate insignificantly from zero outside of $[-T, T]$: its nonzero activity is essentially limited to the finite interval $[-T, T]$.
[Figure 2.2: Three types of wavelets: the Haar wavelet (left), the wavelet related to the first derivative of the Gaussian probability density function (centre), and the Mexican hat wavelet (right).]
Since the length of the interval $[-T, T]$ is very small compared to the infinite length of the entire real axis $(-\infty, \infty)$, the non-zero activity of $\psi(\cdot)$ can be considered as limited to a relatively small interval of time. While equation (2.8) says $\psi(\cdot)$ has to make some excursions away from zero, equation (2.7) tells us that any excursion above zero has to be cancelled by excursions below zero. Hence (2.7) and (2.8) lead us to a wavelet (a small wave).
There are two main classes of wavelets. We have the continuous wavelet transform (CWT) and the
discrete wavelet transform (DWT).
2.2.1 Continuous Wavelet Transform
As we have already mentioned, a real-valued function $\psi(t)$ is called a wavelet if it satisfies (2.7) and (2.8). Three such wavelets are shown in Figure 2.2.
The first plot is the Haar wavelet, which is defined by
\[ \psi^{(H)}(u) \equiv \begin{cases} -\frac{1}{\sqrt{2}}, & -1 < u \le 0; \\ \frac{1}{\sqrt{2}}, & 0 < u \le 1; \\ 0, & \text{otherwise.} \end{cases} \tag{2.10} \]
The second wavelet is proportional to the first derivative of the Gaussian (normal) probability density function for a random variable with mean zero and variance $\sigma^2$. If we normalize the negative of the first derivative of the Gaussian probability density function,
\[ \phi_0(u) \equiv \frac{e^{-u^2/(2\sigma^2)}}{\sigma\sqrt{2\pi}}, \qquad -\infty < u < \infty, \tag{2.11} \]
with mean zero and variance $\sigma^2$, we obtain the second wavelet in Figure 2.2, given by
\[ \psi^{(\mathrm{fdG})}(u) \equiv \frac{\sqrt{2}\,u\,e^{-u^2/(2\sigma^2)}}{\sigma^{3/2}\,\pi^{1/4}} \tag{2.12} \]
[see Appendix A].
The third wavelet, $\psi^{(\mathrm{Mh})}(u)$, is proportional to the second derivative of the Gaussian probability density function. It is obtained by normalizing the negative of the second derivative of the Gaussian probability density function (see Appendix B). This is called the Mexican hat wavelet:
\[ \psi^{(\mathrm{Mh})}(u) \equiv \frac{2\left(1 - \frac{u^2}{\sigma^2}\right) e^{-u^2/(2\sigma^2)}}{\pi^{1/4}\sqrt{3\sigma}}. \tag{2.13} \]
In short, a wavelet is any function that integrates to zero and is square integrable [? ].
We can use the Haar wavelet to show how localized averages of a signal x(t) vary across time. To quantify this description, let
\[ A(s, t) \equiv \frac{1}{s} \int_{t - s/2}^{t + s/2} x(u)\,du. \tag{2.14} \]
$A(s, t)$ is the average value of $x(\cdot)$ over the interval $[t - \frac{s}{2},\, t + \frac{s}{2}]$, centered at t and having a scale (or width) of s.
In the physical sciences, average values of signals are of great interest. Examples include daily and monthly average temperatures, amongst many others. What is often of more interest than the averages themselves, however, is how the averages evolve over time. We can show this by looking at the difference between adjacent averages. Let us define
\[ D(s, t) \equiv A\!\left(s, t + \tfrac{s}{2}\right) - A\!\left(s, t - \tfrac{s}{2}\right) = \frac{1}{s}\int_{t}^{t+s} x(u)\,du - \frac{1}{s}\int_{t-s}^{t} x(u)\,du. \tag{2.15} \]
If we let x(t) be the amount of rainfall in South Africa at time t and if we let s be 30 days (one month), then the plot of D(s, t) versus t would tell us how the monthly averages before and after t differ as a function of time. By increasing the scale s up to a year, for example, a plot of D(s, t) would tell us how much the yearly average rainfall is changing from one year to the next. So, changes in averages over various scales are of more interest than the averages themselves. The two integrals in Equation (2.15) involve adjacent non-overlapping intervals, which we can combine into a single integral over the entire real axis. To connect this to the Haar wavelet, we can write
\[ D(s, t) = \int_{-\infty}^{\infty} \tilde{\psi}_{s,t}(u)\, x(u)\,du, \tag{2.16} \]
where
\[ \tilde{\psi}_{s,t}(u) \equiv \begin{cases} -\frac{1}{s}, & t - s < u \le t; \\ \frac{1}{s}, & t < u \le t + s; \\ 0, & \text{otherwise.} \end{cases} \]
If we specialize to the case s = 1 and t = 0,
\[ \tilde{\psi}_{1,0}(u) \equiv \begin{cases} -1, & -1 < u \le 0; \\ 1, & 0 < u \le 1; \\ 0, & \text{otherwise,} \end{cases} \]
and then compare $\tilde{\psi}_{1,0}$ to the Haar wavelet, we see that $\tilde{\psi}_{1,0}(u) = \sqrt{2}\,\psi^{(H)}(u)$. Thus, to within a constant of proportionality, the Haar wavelet tells us how unit scale averages differ before and after time zero.

We can adjust the Haar wavelet so that we can use it to tell us about changes in x(t) at other scales and times:
\[ \psi^{(H)}_{s,t}(u) \equiv \frac{1}{\sqrt{s}}\,\psi^{(H)}\!\left(\frac{u - t}{s}\right) = \begin{cases} -\frac{1}{\sqrt{2s}}, & t - s < u \le t; \\ \frac{1}{\sqrt{2s}}, & t < u \le t + s; \\ 0, & \text{otherwise.} \end{cases} \tag{2.17} \]
Conceptually, we form $\psi^{(H)}_{s,t}(u)$ from $\psi^{(H)}(u)$ by taking the latter and stretching it out so that its non-zero portion covers $[-s, s]$, and then relocating it so that it is centered at time t. It is easy to check that $\psi^{(H)}_{s,t}(u)$ obeys the defining properties for a wavelet in Equations (2.7) and (2.8). We can thus define the Haar continuous wavelet transform (CWT) of x(t) as
\[ W^{(H)}(s, t) \equiv \int_{-\infty}^{\infty} \psi^{(H)}_{s,t}(u)\,x(u)\,du, \qquad 0 < s < \infty,\ -\infty < t < \infty. \tag{2.18} \]
$W^{(H)}(s, t) \propto D(s, t)$, so that the (t, s)th value of the CWT can be interpreted as the difference between adjacent averages of scale s located before and after time t. The CWT is fully equivalent to the signal x(t), since we can recover x(t) from its CWT:
\[ x(t) = \frac{1}{C_{\psi}} \int_{0}^{\infty} \left[\int_{-\infty}^{\infty} W^{(H)}(s, u)\,\psi^{(H)}_{s,t}(u)\,du\right] \frac{ds}{s^2} \tag{2.19} \]
(as quoted by Percival and Walden (2000), see [? ]), where $C_{\psi}$ is a constant depending only on $\psi^{(H)}(u)$.
If $W^{(H)}(s, t)/s^2$ is large in magnitude, then $\psi^{(H)}_{s,t}(u)$ is an important contributor to re-expressing x(t) in terms of the various wavelets. On the other hand, if $W^{(H)}(s, t)/s^2$ is small, then $\psi^{(H)}_{s,t}(u)$ is an insignificant contributor.
Moreover,
\[ \int_{-\infty}^{\infty} x^2(t)\,dt = \frac{1}{C_{\psi}} \int_{0}^{\infty}\left[\int_{-\infty}^{\infty} [W^{(H)}(s, t)]^2\,dt\right] \frac{ds}{s^2}. \tag{2.20} \]
The left hand side of (2.20) is called the energy in the signal x(t) (it is, however, not energy in the physical sense unless x(t) has the proper units). $[W^{(H)}(s, t)]^2/s^2$ is proportional to an energy density function that decomposes the energy of x(t) across different scales and times. Again, if $[W^{(H)}(s, t)]^2/s^2$ is large, we can say there is an important contribution to the energy in x(t) at scale s and time t.
We can replace the Haar wavelet with any other wavelet and the results will still hold. Wavelet transformation has the advantage over Fourier transformation in its ability to transform a signal into its components in time and frequency with different scale parameters.
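As a rough illustration of equation (2.18), the following C sketch evaluates the Haar CWT of a sampled signal by differencing adjacent averages, which is equivalent to correlating the signal with $\psi^{(H)}_{s,t}$; the discretisation (unit sampling interval, integer scales) and all names are our own simplifying assumptions, not a standard routine.

#include <stdio.h>
#include <math.h>

/* Haar CWT W(s,t) of a sampled signal x[0..n-1] at integer scale s,
   evaluated at sample index t.  W(s,t) is (after - before)/sqrt(2s),
   i.e. sqrt(s/2) times the difference of adjacent averages D(s,t). */
double haar_cwt(const double *x, int n, int s, int t)
{
    double after = 0.0, before = 0.0;
    int u;

    if (t - s + 1 < 0 || t + s >= n)
        return 0.0;                /* wavelet support falls outside the record */

    for (u = t + 1; u <= t + s; u++) after  += x[u];
    for (u = t - s + 1; u <= t; u++) before += x[u];

    return (after - before) / sqrt(2.0 * s);
}

int main(void)
{
    /* Toy signal: a level shift at the middle of the record. */
    enum { N = 64 };
    double x[N];
    int t, s;

    for (t = 0; t < N; t++)
        x[t] = (t < N / 2) ? 0.0 : 1.0;

    /* The transform is largest for scale/time pairs that straddle the shift. */
    for (s = 2; s <= 8; s *= 2)
        for (t = s; t < N - s; t += 8)
            printf("s = %d  t = %2d  W = %+.3f\n", s, t, haar_cwt(x, N, s, t));

    return 0;
}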
2.3 Entropy Measures in Time Series
Quantifying the complexity of physiological time series in health and disease has attracted considerable interest. The premise is that disease/pathological states are associated with more regular behaviour in the time series and show reduced entropy in the data compared to a healthy state. Multiscale entropy (MSE) analysis is a method of measuring the complexity of a finite length time series. Traditional entropy-based algorithms quantify the regularity (orderliness) of a time series: entropy increases with the degree of disorder and is maximal for completely random systems [? ].
The entropy H(X) of a single discrete random variable X is a measure of its average uncertainty and is given by
\[ H(X) = \sum_{x_i \in \Omega} p(x_i)\,\ln\frac{1}{p(x_i)} = -\sum_{x_i \in \Omega} p(x_i)\,\ln p(x_i) = -E[\ln p(x_i)], \tag{2.21} \]
where X represents a random variable with set of values $\Omega$ and probability mass function $p(x_i) = P\{X = x_i\}$, $x_i \in \Omega$, and E represents the expectation operator.
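For example, a minimal C sketch of (2.21) for a distribution given as an array of probabilities (the arrays below are arbitrary illustrative choices) is:

#include <stdio.h>
#include <math.h>

/* Shannon entropy H = -sum_i p_i ln p_i (in nats) of a discrete
   probability mass function p[0..n-1].  Zero-probability outcomes
   contribute nothing, by the convention 0 ln 0 = 0. */
double shannon_entropy(const double *p, int n)
{
    double h = 0.0;
    int i;
    for (i = 0; i < n; i++)
        if (p[i] > 0.0)
            h -= p[i] * log(p[i]);
    return h;
}

int main(void)
{
    double uniform[4] = {0.25, 0.25, 0.25, 0.25};   /* maximal disorder: ln 4 */
    double skewed[4]  = {0.97, 0.01, 0.01, 0.01};   /* nearly deterministic   */

    printf("H(uniform) = %.4f nats (ln 4 = %.4f)\n",
           shannon_entropy(uniform, 4), log(4.0));
    printf("H(skewed)  = %.4f nats\n", shannon_entropy(skewed, 4));
    return 0;
}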
For a time series output from a stochastic process, that is, an indexed sequence of n random variables $\{X_i\} = \{X_1, \ldots, X_n\}$, with sets of values $\Omega_1, \Omega_2, \ldots, \Omega_n$ respectively and $X_i \in \Omega_i$, the joint entropy is defined as
\[ H_n = H(X_1, X_2, \ldots, X_n) = \sum_{x_1 \in \Omega_1} \cdots \sum_{x_n \in \Omega_n} p(x_1, \ldots, x_n)\,\ln\frac{1}{p(x_1, \ldots, x_n)} = -\sum_{x_1 \in \Omega_1} \cdots \sum_{x_n \in \Omega_n} p(x_1, \ldots, x_n)\,\ln p(x_1, \ldots, x_n), \tag{2.22} \]
where $p(x_1, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n\}$ is the joint probability for the n variables $X_1, \ldots, X_n$.
By applying the chain rule to (2.22), we can write the joint entropy as a sum of conditional entropies, each of which is a non-negative quantity:
\[ H_n = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1). \tag{2.23} \]
Proof.
\[ H(X_1, X_2) = H(X_1) + H(X_2 \mid X_1) \]
\[ H(X_1, X_2, X_3) = H(X_1) + H(X_2, X_3 \mid X_1) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_2, X_1) \]
\[ \vdots \]
\[ H(X_1, X_2, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_2, X_1) + \ldots + H(X_n \mid X_{n-1}, X_{n-2}, \ldots, X_1) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1). \tag{2.24} \]
We can see that the joint entropy is an increasing function of n. The rate at which the joint entropy grows with n, that is, the entropy rate H, is defined as
\[ H = \lim_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n). \tag{2.25} \]
The state of the system at a certain instant $t_i$ is partially determined by its history $t_1, t_2, \ldots, t_{i-1}$. However, each new state carries a certain amount of new information. The mean rate of creation of new information, known as the Kolmogorov-Sinai (KS) entropy, is a useful parameter to characterize the system dynamics.

Suppose that the phase space of a D-dimensional dynamical system is partitioned into hypercubes of content $\epsilon^D$ and that the state of the system is measured at intervals of time $\delta$. Let $p(k_1, \ldots, k_n)$ denote the joint probability that the state of the system is in hypercube $k_1$ at $t = \delta$, in hypercube $k_2$ at $t = 2\delta$, . . . , and in hypercube $k_n$ at $t = n\delta$. The Kolmogorov-Sinai (KS) entropy is defined as
\[ H_{KS} = -\lim_{\delta \to 0}\,\lim_{\epsilon \to 0}\,\lim_{n \to \infty} \frac{1}{n\delta} \sum_{k_1, \ldots, k_n} p(k_1, \ldots, k_n)\,\ln p(k_1, \ldots, k_n) = \lim_{\delta \to 0}\,\lim_{\epsilon \to 0}\,\lim_{n \to \infty} \frac{1}{n\delta}\, H_n. \tag{2.26} \]
For a stationary process [? ],
\[ \lim_{n \to \infty} \frac{H_n}{n} = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \ldots, X_1), \tag{2.27} \]
and by the chain rule,
\[ H_{KS} = \lim_{\delta \to 0}\,\lim_{\epsilon \to 0}\,\lim_{n \to \infty} (H_{n+1} - H_n). \tag{2.28} \]
$H_{n+1} - H_n$ is the information needed to predict in which hypercube the system will be at time $(n+1)\delta$, given the system states up to time $n\delta$.
2.3.1 Approximate Entropy
Numerically, only entropies of finite order n can be calculated. As soon as n becomes large with respect to the length of a given time series, the entropy $H_n$ is underestimated and decays towards zero. Therefore, Equation (2.28) is of limited use in estimating the entropy of finite length time series. Several formulas have been proposed in an attempt to estimate the KS-entropy with reasonable precision. Grassberger and Procaccia [? ] suggested characterizing chaotic signals by calculating the $K_2$ entropy, which is a lower bound of the KS-entropy.
Let $\{X_i\} = \{x_1, \ldots, x_i, \ldots, x_N\}$ be a time series of length N. If we consider the m-length vectors
\[ u_m(i) = \{x_i, x_{i+1}, \ldots, x_{i+m-1}\}, \qquad 1 \le i \le N - m + 1, \]
let $n^m_i(r)$ be the number of vectors $u_m(j)$ that are similar to the vector $u_m(i)$, that is, the number of vectors that satisfy $d[u_m(i), u_m(j)] \le r$, where d is the Euclidean distance. The probability that any vector $u_m(j)$ is similar to the vector $u_m(i)$ is given by
\[ C^m_i(r) = \frac{n^m_i(r)}{N - m + 1}. \tag{2.29} \]
The average of the $C^m_i(r)$, denoted $C^m(r)$, is given by
\[ C^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N - m + 1} C^m_i(r). \tag{2.30} \]
This represents the probability that any two vectors are within r of each other. Grassberger and Procaccia [? ] define the entropy $K_2$ as
\[ K_2 = -\lim_{N \to \infty}\,\lim_{m \to \infty}\,\lim_{r \to 0} \ln\!\left[\frac{C^{m+1}(r)}{C^m(r)}\right]. \tag{2.31} \]
Considering the distance between two vectors as the maximum absolute difference between their components, $d[u_m(i), u_m(j)] = \max\{|x(i+k) - x(j+k)| : 0 \le k \le m - 1\}$, Eckmann and Ruelle (ER) [? ] defined the function
\[ \Phi^m(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N - m + 1} \ln C^m_i(r) \tag{2.32} \]
and suggested approximating the entropy of the underlying process as
\[ H_{ER} = \lim_{N \to \infty}\,\lim_{m \to \infty}\,\lim_{r \to 0} [\Phi^m(r) - \Phi^{m+1}(r)]. \tag{2.33} \]
When N is large,
\[ \Phi^{m+1}(r) - \Phi^{m}(r) \approx \frac{1}{N - m} \sum_{i=1}^{N - m} \ln\!\left[\frac{C^{m+1}_i(r)}{C^{m}_i(r)}\right] \tag{2.34} \]
represents the average of the natural logarithm of the conditional probability that sequences that are similar to each other for m consecutive data points will remain similar to each other when one more point is known. In other words, if the data are ordered, then templates that are similar for m points are often similar for m + 1 points, the conditional probability approaches 1, and the logarithm and entropy approach zero.
Pincus et al [? ] found that equation (2.33) does not apply to experimental data, since the result is infinite for a process with superimposed noise of any magnitude. Because of these limits, this formula is not suited to the analysis of finite and noisy time series derived from experiments. For the analysis of short and noisy time series, Pincus [? ] introduced the approximate entropy (ApEn) family of measures defined by
\[ \mathrm{ApEn}(m, r) = \lim_{N \to \infty} [\Phi^m(r) - \Phi^{m+1}(r)], \tag{2.35} \]
which for finite data sets is approximated by the statistic
\[ \mathrm{ApEn}(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N - m + 1} \ln C^m_i(r) - \frac{1}{N - m} \sum_{i=1}^{N - m} \ln C^{m+1}_i(r). \tag{2.36} \]
ApEn is a regularity statistic and has been widely used in physiology and medicine [? ]. Lower ApEn values are assigned to more regular time series, while higher values are assigned to more irregular, less predictable time series.

Richman and Moorman [? ] noted that ApEn(m, r, N) is a biased statistic that suggests more similarity in a time series than is present. The ApEn algorithm requires that each template contribute a defined non-zero probability; this constraint is overcome by allowing each template to match itself. As a consequence of this practice, ApEn(m, r, N) is biased: the expected value of ApEn(m, r, N) is less than the parameter ApEn(m, r). For a discussion of the bias caused by including self-matches, the reader can refer to Pincus and Goldberger [? ].

Richman and Moorman [? ] proposed a modification of the ApEn algorithm, the Sample Entropy (SampEn). SampEn has the advantage of being less dependent on the length of the time series, and of showing relative consistency over a broader range of possible r, m and N values.
2.3.2 Sample Entropy Procedure for time series data
Sample entropy, which we denote SampEn(m, r, N), depends on three parameters. The first, m, determines the length of the vectors to be considered in the analysis. That is, given N data points $\{v[j] : 1 \le j \le N\}$, form the $N - m + 1$ vectors $u_m(i)$ for $1 \le i \le N - m + 1$, where the $u_m(i)$ are the m-length vectors given by
\[ u_m(i) = \{v_i, v_{i+1}, \ldots, v_{i+m-1}\}, \qquad 1 \le i \le N - m + 1. \tag{2.37} \]
Figure 2.3 shows a time series with matching points. We will consider two data points to match each other if the absolute difference between them is at most r, the tolerance for values considered to be arbitrarily similar. Costa et al [? ] considered the variation of r to be within 10% and 20% of the time series standard deviation. Consider the case m = 2, that is, a template sequence of two data points v[i], v[i + 1] joined together. The dotted horizontal lines enclosing the points v[1], v[2] and v[3] represent points within v[1] ± r, v[2] ± r and v[3] ± r respectively. Data points that match the point v[1] are enclosed within the same pair of horizontal lines.
[Figure 2.3: Time series v[1], . . . , v[n] illustrating the Sample Entropy calculation, with template points v[1], v[2], v[3] and matching points v[13], v[14], v[17], v[43], v[44], v[45] marked.]
If we consider a 2-component template sequence (v[1], v[2]) and a 3-component template sequence (v[1], v[2], v[3]) in Figure 2.3, there are two sequences, (v[13], v[14]) and (v[43], v[44]), that match the sequence (v[1], v[2]), and only one sequence, (v[43], v[44], v[45]), that matches the template sequence (v[1], v[2], v[3]). Thus, the number of sequences matching the 2-component template sequence is two, and the number matching the 3-component sequence is one.

We repeat the process for the next 2-component and 3-component template sequences, (v[2], v[3]) and (v[2], v[3], v[4]) respectively. The numbers of sequences that match the 2- and 3-component sequences are added to the previous counts. We repeat this process for all possible 2-component and 3-component template sequences, (v[3], v[4]), . . . , (v[N − 1], v[N]) and (v[3], v[4], v[5]), . . . , (v[N − 2], v[N − 1], v[N]), to determine the ratio of the total number of 2-component template matches to the total number of 3-component template matches.
Sample Entropy (SampEn) is the natural logarithm of this ratio. Thus, if we take the total number of 2-component template matches to be $\sum_{i=1}^{N-m} n^m_i = A$ and the total number of 3-component template matches to be $\sum_{i=1}^{N-m} n^{m+1}_i = B$, the sample entropy is given by
\[ \mathrm{SampEn}(m, r, N) = \ln\!\left(\frac{A}{B}\right). \tag{2.38} \]
The computationally intensive aspect of the algorithm is simply counting the numbers A and B, that is, counting the number of vectors that match for m and for m + 1 points. We have to consider two additional computational aspects. First, we do not compare any vector with itself, since this provides no new information. Second, although the vector $u_m(N - m + 1)$ exists, we do not use it for comparisons, since the vector $u_{m+1}(N - m + 1)$ is not defined. SampEn differs from ApEn in that for SampEn self-matches are not counted.
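The following C sketch implements this counting directly, using the maximum-norm distance and excluding self-matches as described above; it is a straightforward O(N²) illustration written for this essay, not the PhysioNet implementation, and the function names and test data are our own.

#include <stdio.h>
#include <math.h>

/* Sample entropy SampEn(m, r, N) of x[0..n-1].
   Counts template matches of length m (A) and length m+1 (B) using the
   maximum-norm distance, excluding self-matches, and returns ln(A/B). */
double sample_entropy(const double *x, int n, int m, double r)
{
    long A = 0, B = 0;     /* m-matches and (m+1)-matches */
    int i, j, k;

    for (i = 0; i < n - m; i++) {
        for (j = i + 1; j < n - m; j++) {   /* j > i: no self-matches */
            double dmax = 0.0;
            for (k = 0; k < m; k++) {
                double d = fabs(x[i + k] - x[j + k]);
                if (d > dmax) dmax = d;
            }
            if (dmax <= r) {
                A++;                        /* templates match for m points  */
                if (fabs(x[i + m] - x[j + m]) <= r)
                    B++;                    /* ... and still match for m + 1 */
            }
        }
    }
    if (A == 0 || B == 0)
        return -1.0;        /* undefined: no matches found */
    return log((double)A / (double)B);
}

int main(void)
{
    /* Tiny illustrative series; in practice m = 2 and r is usually
       0.1-0.2 times the standard deviation of the series. */
    double x[] = {0.97, 1.01, 0.99, 1.02, 0.98, 1.00, 1.03, 0.97,
                  1.01, 0.99, 1.02, 0.98, 1.00, 1.03, 0.97, 1.01};
    int n = sizeof x / sizeof x[0];

    printf("SampEn(2, 0.02, %d) = %.4f\n", n, sample_entropy(x, n, 2, 0.02));
    return 0;
}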
While SampEn is used with just one fixed value of m, we have an implementation of the algorithm that efficiently calculates SampEn(k, r, N) for all k from 1 up to m. The SampEn algorithm also calculates the entropy at one scale only, that is, for the entire range of the raw time series. Information about the characteristics of the time series at different scales can give important information about the phenomenon being investigated. Month-to-month averages of a time series may be more important than weekly or daily figures. So, analysing a time series at different scales can bring out some of the important features that are not obvious at lower scales. The multiscale entropy (MSE) algorithm enables us to capture some of the dynamics of a signal at higher scales.
2.3.3 Multiscale Entropy Procedure
Given a one-dimensional discrete time series $\{x_1, \ldots, x_N\}$, we construct consecutive coarse-grained time series $\{y^{(\tau)}\}$ corresponding to the scale factor $\tau$. First the time series is divided into non-overlapping windows of length $\tau$; the data points are then averaged inside each window (see Figure 2.4).
[Figure 2.4: The coarse-graining procedure. At scale 2 each pair of points is averaged, $y_j = (x_i + x_{i+1})/2$; at scale 3 each triple is averaged, $y_j = (x_i + x_{i+1} + x_{i+2})/3$.]
In general, each element of a coarse-grained time series is calculated according to the equation
\[ y^{(\tau)}_j = \frac{1}{\tau} \sum_{i=(j-1)\tau + 1}^{j\tau} x_i, \qquad 1 \le j \le \frac{N}{\tau}. \tag{2.39} \]
For scale one, the time series $y^{(1)}$ is simply the original time series. The length of each coarse-grained time series is equal to the length of the original time series divided by the scale factor $\tau$.
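A minimal C sketch of the coarse-graining step in (2.39), with our own function name and conventions (trailing points that do not fill a complete window are discarded), is:

#include <stdio.h>

/* Coarse-grain x[0..n-1] at scale tau: y[j] is the average of the j-th
   non-overlapping window of length tau.  Returns the length of y, n/tau. */
int coarse_grain(const double *x, int n, int tau, double *y)
{
    int j, i, ny = n / tau;

    for (j = 0; j < ny; j++) {
        double sum = 0.0;
        for (i = j * tau; i < (j + 1) * tau; i++)
            sum += x[i];
        y[j] = sum / tau;
    }
    return ny;
}

int main(void)
{
    double x[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    double y[9];
    int ny, j;

    ny = coarse_grain(x, 9, 3, y);           /* scale factor tau = 3 */
    for (j = 0; j < ny; j++)
        printf("y[%d] = %.2f\n", j, y[j]);   /* 2.00, 5.00, 8.00 */
    return 0;
}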
Finally, the entropy measure (SampEn) for each coarse-grained time series is calculated and plotted as a function of the scale factor. This procedure is called multiscale entropy (MSE) analysis. We use the MSE curves to compare the relative complexity of normalized time series (same variance at scale one) based on the following:
1. if for the majority of the scales the entropy values are higher for one time series than for another, we consider the former to be more complex than the latter;
2. a monotonic decrease of the entropy values indicates that the original signal contains information only at the smallest scales.
The entropy measures reflect both the variance of a time series and its correlation properties. To illustrate this, let us look at two special cases where these two effects can be seen.
Case 1
Let us consider two uncorrelated random variables X and Y, with sets of values $\{x_1, x_2, \ldots, x_N\}$ and $\{y_1, y_2, \ldots, y_M\}$ respectively. Assuming that all values are equally probable, $p(x_i) = \frac{1}{N}$, the entropy of the random variable X is given by
\[ H(X) = -\sum_{i=1}^{N} \frac{1}{N}\,\ln\frac{1}{N} = \ln N. \tag{2.40} \]
Similarly, $H(Y) = \ln M$. If $N > M$, then $H(X) > H(Y)$. Therefore, the larger the set of values of a random variable (and hence the larger its variance), the larger its entropy.
Case 2
Consider a periodic signal with variance |a| and a random signal with variance |b|, such that
|a| >> |b|. The entropy of a periodic signal is zero, since each data point occurs with probability
1. Therefore, the entropy of a periodic signal is never larger than the entropy of a random signal
regardless of the variance of the signals.
In the MSE method, r is set at a certain percentage (usually 20%) of the standard deviation of the original time series, and remains constant for all scales [? ]. We do not recalculate r for each coarse-grained time series. After the initial normalization, subsequent changes of the variance due to the coarse-graining procedure are related to the temporal structure of the original time series and should be accounted for by the entropy measure. The initial normalization, however, ensures that the MSE values assigned to two different time series are not a trivial consequence of possible differences between their variances, but result from different organizational structures.
Chapter 3
Analysis of Time Series
The behaviour of human physiologic systems is highly variable. Mathematical techniques have been developed that may be useful for quantifying and describing the complex and chaotic signals that are characteristic of physiologic systems. Much research in this area has been conducted in relation to the cardiac system, but the same methods also show promise in relation to the respiratory system.
In this chapter, we will apply the multiscale entropy algorithm to calculate the entropies of respiratory expiratory ratio (RER) data collected on lean and obese subjects. The subjects ride on a device and breathe into a mask in which the RER is measured. The subjects then undergo a physical training programme for a period of time, and at the end of the training period the RER measurements are taken again. We will investigate the similarity of the time series obtained from these measurements by looking at the entropy measures before and after the training programme for the same subject. We will also compare the time series for lean and obese subjects for any similarity or difference. The values of the parameters used in the MSE analysis are m = 2 and r = 0.2, where r is expressed as a fraction of the time series standard deviation; this corresponds to normalizing the time series. The code used for MSE was adopted from physionet.org and modified for our analysis.
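Putting the pieces together, a rough sketch of the analysis pipeline (coarse-grain each series for scale factors 1 to 20, then compute SampEn with m = 2 and r = 0.2 times the standard deviation of the original series) might look as follows. It reuses the illustrative sample_entropy and coarse_grain routines sketched in Chapter 2, assumed to be compiled together with this file; it is not the PhysioNet code itself.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Illustrative routines sketched in Chapter 2 (same translation unit
   or linked in separately). */
double sample_entropy(const double *x, int n, int m, double r);
int    coarse_grain(const double *x, int n, int tau, double *y);

/* MSE curve: SampEn of the coarse-grained series for tau = 1..max_scale. */
void mse_curve(const double *x, int n, int m, double rfrac,
               int max_scale, double *mse)
{
    static double y[100000];
    double mean = 0.0, sd = 0.0, r;
    int i, tau;

    for (i = 0; i < n; i++) mean += x[i];
    mean /= n;
    for (i = 0; i < n; i++) sd += (x[i] - mean) * (x[i] - mean);
    sd = sqrt(sd / (n - 1));
    r = rfrac * sd;                 /* r is fixed from the original series */

    for (tau = 1; tau <= max_scale; tau++) {
        int ny = coarse_grain(x, n, tau, y);
        mse[tau - 1] = sample_entropy(y, ny, m, r);
    }
}

int main(void)
{
    enum { N = 800, MAXSCALE = 20 };
    static double x[N];
    double mse[MAXSCALE];
    int i, tau;

    /* Placeholder input: white noise standing in for an RER recording. */
    for (i = 0; i < N; i++)
        x[i] = (double)rand() / RAND_MAX;

    mse_curve(x, N, 2, 0.2, MAXSCALE, mse);
    for (tau = 1; tau <= MAXSCALE; tau++)
        printf("%2d %.4f\n", tau, mse[tau - 1]);
    return 0;
}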
3.1 Multiscale Entropy Analysis
We start intuitively by looking at the time series of an obese¹ subject aged 14 with a Body Mass Index (BMI) of 45.1. Figure 3.1 shows the time series for the respiratory expiratory ratio (RER) before and after the training programme. We calculate the entropy measures of the coarse-grained time series for scales 1 up to 20; scale 1 corresponds to the original time series. In Figure 3.1.c we present the results of the analysis. We notice from the results that the entropy values for the time series before the training exercise are higher than the entropy values after the training exercise at all scales. Both time series exhibit an initial increase in entropy followed by a gradual decrease as the scale increases, so we observe a generally similar behaviour in entropy as we move across scales. For scale one, which is the only scale considered by traditional single-scale based methods, the entropy values are almost equal.
¹Obesity is measured by the Body Mass Index (BMI), which is given by BMI = weight (kg) / (height (m))². It ranges between 19 and 25 for a normal person. Values greater than 25 are considered obese.
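For example, a subject weighing 90 kg with a height of 1.60 m would have BMI = 90/1.60² ≈ 35, well above this threshold.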
[Figure 3.1: RER time series for an obese subject aged 14 and BMI=45.1: (a) before the training exercise, (b) after training, and (c) entropy measures as a function of scale factor.]
[Figure 3.2: RER time series for a lean subject aged 11, BMI=22.9: (a) before training, (b) after training, and (c) entropy measures as a function of scale factor.]
The time series for the data after the subject has been given training is more regular, that is, it has less complexity than the time series for the data before the training.

In Figure 3.2 we look at the time series for a lean individual aged 11 with a body mass index (BMI) of 22.9. The entropy measures behave in a characteristically similar way; however, the entropy measures before training are lower than the entropy measures after the training programme. At scale 1 the entropy measures are the same. The entropy measures for the two time series increase to a maximum at scale 4 and gradually decrease up to scale 14. From scale 14 to 17, the entropy for the time series after the training period holds a constant value and then starts decreasing again. For a lean subject, there seems to be more regularity in the RER before the training programme than after.
We will now look at the time series of an obese individual aged 11 and see if there is a similarity with the time series of a lean individual of the same age, both before and after the training programme. The time series graphs before and after the training programme are shown in Figure 3.3.a and Figure 3.3.b respectively, and the results of the entropy analysis are shown in Figure 3.3.c. The entropy for the time series after training is greater than the entropy for the time series generated by the RER values before the training for scales 1 and 2. After scale 2, the entropy values for the time series before the training programme become greater than the entropy values for the time series obtained after the training programme. This agrees with the results for the obese subject shown in Figure 3.1, that is, entropy values are higher before the subject is given training and lower after the training. We note also that the general characteristics of the entropy values are the same as the ones we discussed above.
[Figure 3.3: RER time series for an obese subject aged 11: (a) before the training programme, (b) after the training programme, and (c) entropy values.]
[Figure 3.4: Entropy measures for lean and obese subjects aged 11: (a) before the training exercise was conducted and (b) after the training exercise.]
The entropy values start by increasing to a maximum and then decrease with scale. The entropy values for both time series (Figure 3.3.c) hold a constant value at different points in scale: the entropy values for the time series before training assume a constant value between scales 12 and 14, while for the time series generated after the training programme the constant entropy values lie between scales 16 and 19.

We now compare the entropy measures of obese and lean subjects of the same age (11 years) on the same axes. In Figure 3.4.a we have the entropy measures for the lean and obese subjects before the training exercise is conducted. The entropy measure for the lean subject is less than the entropy measure for the obese subject at all scales. Again, the entropy values initially increase rapidly to a maximum at scale 3 for both subjects and decrease gradually after scale 3. The results for the entropy measures after the training period are shown in Figure 3.4.b. From these results we observe that the entropy values for the obese subject are initially higher than the entropy values for the lean subject for a short duration, that is, up to scale 3. The general behaviour of the entropy values is still the same for all time series.

Interesting results are shown for two obese subjects aged 9. Figure 3.5 and Figure 3.6 show the time series before and after the training programme, and the corresponding entropy results are shown for both cases.
[Figure 3.5: RER time series for an obese subject aged 9 (BMI=33): (a) before training, (b) after training, and (c) entropy measures.]
[Figure 3.6: RER time series for an obese subject aged 9 (BMI=30.7): (a) before training, (b) after training, and (c) entropy measures.]
In the two cases, we obtain a large difference between the entropy values of the coarse-grained time series before and after the training programme. The entropy measures are low (below 0.6) after the training period for both cases at all scales. However, both cases show a gradual decrease in entropy measures before and after the training period. The monotonic decrease in entropy at large scales reflects the degradation of the control mechanisms regulating the RER on large time scales. We have seen that entropy values are higher before training is given and lower after training for obese subjects (Figures 3.1.c, 3.3.c, and 3.5.c).

More time series comparisons for obese and lean subjects are shown in Appendix C. In Figure C.1, results for an obese subject aged thirteen are shown. The entropy measures fluctuate with scale: the values of entropy after the training programme are higher than the entropy values before the training programme up to scale 8, after which the entropy for the time series before training becomes higher. This fluctuation with scale seems to suggest that analysis at different scales can bring out some characteristics that are not obvious in the original time series. We also observe that the entropy for the time series before the training programme attains an equilibrium value (about 1.4) from scale 4 to scale 10; after scale 10 the entropy decreases until scale 17, and then increases rapidly from scale 17 until scale 20.

Interesting results are also obtained for an obese subject aged 12 in Figure C.3. The entropy values change with scale for the two time series, and we obtain equal entropy measures for the two time series at three different points in scale: at scales 7, 12 and 17 the entropy values are the same. The entropy measures start by increasing up to scale 4 and then gradually decrease until scale 19, after which there is an increase up to scale 20. We obtain a generally similar pattern for an obese subject aged 9, as shown in Figure C.5. In Figure C.2, we have a time series for a lean subject aged 10.
We observe that the entropy measures before the training exercise increase rapidly up to scale 3 and then start to decrease. Although the entropy values are initially higher for the time series before the training, after scale 6 they gradually become smaller than the entropy values for the time series after the training programme. The general behaviour of the entropy values is still the same as for the other time series. In Figure C.4, we have the results for an obese subject aged 14. Entropy values increase rapidly until scale 4 for both time series; after scale 4 the entropies decrease gradually. The entropy values for the time series after the training programme are higher than the entropy values for the time series before the programme at all scales. In Figure C.6 the entropy values initially increase slowly until scale 4, with the entropy values for the time series after the training greater than the entropy values for the time series before the training. After scale 4 the entropy values for the time series before training become greater than the values for the time series after the training programme.

We note that the entropy measures for each time series change with scale, showing an initial increase followed by a gradual decrease across all scales. All time series show generally the same pattern. The low entropy measures at higher scales seem to be due to a loss of complexity in the time series with coarse-graining. We see from the majority of graphs that for obese subjects, the entropy values are generally higher before the training exercise than after the training. However, in Figure C.4, the entropy values for an obese subject aged 14 with BMI=30.5 are lower before training and higher after training. In Figures C.3 and C.5, for obese subjects aged 12 and 9 respectively, the entropy values before and after training alternate between high and low. There are no conclusive results for the time series of these two subjects.

We note that entropy measures change with scale for each time series analysed. The entropy initially increases at lower scales (up to scale 4 in most cases), followed by a gradual decrease. The general characteristics of the observed multiscale entropy measures are similar across the observed time series. There is a need to consider other methods to analyse the data; wavelet analysis is a good method to consider.
3.2 Wavelet Transform
The wavelet transform can decompose a signal in both time and frequency. Continuous wavelet transforms of the time series from an obese subject and a lean subject are shown in Figure D.1 and Figure D.2 respectively in Appendix D. Figure D.1 is the wavelet transform of the time series in Figure 3.1, and Figure D.2 is the transform of the time series in Figure 3.2. Frequency is inversely proportional to scale: low scales correspond to high frequencies and high scales to low frequencies. The wavelet transform for the obese subject in Figure D.1 before training contains more spikes at high frequency than the transform for the time series after training. The sharp spikes at high frequencies correspond to noise in the original time series.

In Figure 3.7 we show the continuous wavelet transform of the time series of an obese subject aged 9 with BMI=33, before and after training. The transform before training shows many spikes at high frequency, while the transform for the time series after training has one dominant spike at high frequency. This corresponds to the time series in Figure 3.5, which showed high entropy values before training and low entropy values after training. The wavelet transform after training shows more regularity compared to the wavelet transform before training. This agrees with the results obtained using MSE, which showed low entropy values after training.
For the continuous wavelet transforms we have included only a few scales in order to have a clear picture of what is happening. We could include all the possible scales, but it would be difficult to interpret the resulting picture because of the characteristics of wavelet transforms: the wavelet transform has good time and poor frequency resolution at high frequencies (low scales), and good frequency and poor time resolution at low frequencies (high scales). There is also a need to carefully choose the type of wavelet to use for better time series decomposition.

[Figure 3.7: Continuous wavelet transform of an obese subject aged 9, BMI=33: (a) before training, (b) after training.]
Chapter 4
Conclusion
The analysis of complex physiologic time series is quite involved. Multiscale entropy analysis does not give conclusive results on the difference between RER data before and after a subject undergoes some physical training. It is not clear from the multiscale entropy results whether the training improves the utilization of glucose in energy production. However, it is worthwhile to note that analysing complex time series signals using multiscale entropy analysis can help us detect some hidden characteristics of the original time series. We have observed that the entropy measures of the signals decrease with scale. This implies a reduction in time series complexity with scale, that is, the rate of new information creation decreases as the scale increases. Although for the majority of obese cases the entropy before training was generally higher than after training, the results are not conclusive, because in some cases the entropy values were lower before training than after training.

A method of time series analysis that is worth pursuing further is wavelet analysis. The benefit of a wavelet transform of a time series signal lies in its capacity to highlight details of the signal with time-frequency resolution at different scales. However, the use of wavelet analysis needs further investigation into the type of wavelet to use for a given time series.
Appendix A
Normalizing the first derivative of the Gaussian (normal) probability density
Given the Gaussian distribution with mean zero and variance $\sigma^2$,
\[ \phi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)}. \tag{A.1} \]
Taking the first derivative,
\[ \phi_0'(x) = -\frac{x\, e^{-x^2/(2\sigma^2)}}{\sigma^3\sqrt{2\pi}}. \tag{A.2} \]
Squaring the first derivative and integrating,
\[ [\phi_0'(x)]^2 = \frac{x^2\, e^{-x^2/\sigma^2}}{2\pi\sigma^6}, \tag{A.3} \]
\[ \int_{-\infty}^{\infty} [\phi_0'(x)]^2\,dx = \frac{1}{2\pi\sigma^6} \int_{-\infty}^{\infty} x^2\, e^{-x^2/\sigma^2}\,dx = \frac{1}{2\pi\sigma^6}\cdot\frac{\sqrt{\pi}\,\sigma^3}{2} = \frac{1}{4\pi^{1/2}\sigma^3}. \tag{A.4} \]
The normalized negative of the first derivative of the Gaussian distribution is given by
\[ \psi(x) = \frac{-\phi_0'(x)}{\left(\int_{-\infty}^{\infty} [\phi_0'(x)]^2\,dx\right)^{1/2}} \tag{A.5} \]
\[ = \frac{x\, e^{-x^2/(2\sigma^2)}}{\sigma^3\sqrt{2\pi}}\cdot 2\,\pi^{1/4}\sigma^{3/2} = \frac{2\, x\, e^{-x^2/(2\sigma^2)}}{\sqrt{2}\,\pi^{1/4}\sigma^{3/2}} = \frac{\sqrt{2}\, x\, e^{-x^2/(2\sigma^2)}}{\sigma^{3/2}\,\pi^{1/4}}. \tag{A.6} \]
Appendix B
Derivation of the Mexican Hat Wavelet
Given the first derivative of the Gaussian probability density function (see Appendix A),
\[ \phi_0'(x) = -\frac{x\, e^{-x^2/(2\sigma^2)}}{\sigma^3\sqrt{2\pi}}, \tag{B.1} \]
the second derivative is given by
\[ \phi_0''(x) = \frac{x^2\, e^{-x^2/(2\sigma^2)}}{\sigma^5\sqrt{2\pi}} - \frac{e^{-x^2/(2\sigma^2)}}{\sigma^3\sqrt{2\pi}} = \frac{e^{-x^2/(2\sigma^2)}}{\sigma^3\sqrt{2\pi}}\left(\frac{x^2}{\sigma^2} - 1\right). \tag{B.2} \]
Squaring the second derivative and integrating,
\[ [\phi_0''(x)]^2 = \frac{x^4\, e^{-x^2/\sigma^2}}{2\pi\sigma^{10}} - \frac{2 x^2\, e^{-x^2/\sigma^2}}{2\pi\sigma^{8}} + \frac{e^{-x^2/\sigma^2}}{2\pi\sigma^{6}}, \tag{B.3} \]
\[ \int_{-\infty}^{\infty} [\phi_0''(x)]^2\,dx = \frac{1}{2\pi\sigma^{10}}\cdot\frac{3\sqrt{\pi}\,\sigma^5}{4} - \frac{2}{2\pi\sigma^{8}}\cdot\frac{\sqrt{\pi}\,\sigma^3}{2} + \frac{1}{2\pi\sigma^{6}}\cdot\sqrt{\pi}\,\sigma = \frac{3}{8\pi^{1/2}\sigma^5}. \tag{B.4} \]
Now,
\[ \left(\int_{-\infty}^{\infty} [\phi_0''(x)]^2\,dx\right)^{-1/2} = \left(\frac{8\pi^{1/2}\sigma^5}{3}\right)^{1/2}. \tag{B.5} \]
Thus, normalizing the negative of the second derivative of the Gaussian probability density,
\[ \psi^{(\mathrm{Mh})}(x) = \frac{-\phi_0''(x)}{\left(\int_{-\infty}^{\infty} [\phi_0''(x)]^2\,dx\right)^{1/2}} = \frac{e^{-x^2/(2\sigma^2)}\left(1 - \frac{x^2}{\sigma^2}\right)}{\sigma^3\sqrt{2\pi}}\left(\frac{8\pi^{1/2}\sigma^5}{3}\right)^{1/2}. \tag{B.6} \]
Simplifying gives
\[ \psi^{(\mathrm{Mh})}(x) = \frac{2\left(1 - \frac{x^2}{\sigma^2}\right) e^{-x^2/(2\sigma^2)}}{\pi^{1/4}\sqrt{3\sigma}}, \tag{B.7} \]
which is the Mexican hat wavelet.
Appendix C
Time Series and Entropy Measures
[Figure C.1: Time series for an obese subject aged 10: (a) before training, (b) after training, and (c) the corresponding entropy measures.]
[Figure C.2: Time series and entropy values for a lean subject aged 10: (a) before training, (b) after training, (c) entropy measures.]
[Figure C.3: Time series and entropy values for an obese subject aged 12: (a) before training, (b) after training, (c) entropy measures.]
[Figure C.4: Time series and entropy values for an obese subject aged 14: (a) before training, (b) after training, (c) entropy measures.]
[Figure C.5: Time series and entropy values for an obese subject aged 9: (a) before training, (b) after training, (c) entropy measures.]
[Figure C.6: Time series and entropy values for an obese subject aged 10: (a) before training, (b) after training, (c) entropy measures.]
Appendix D
Continuous Wavelet Transform
[Figure D.1: Continuous wavelet transform of an obese subject aged 14, BMI=45.1: (a) before training, (b) after training.]
[Figure D.2: Continuous wavelet transform of a lean subject aged 11, BMI=22.9: (a) before training, (b) after training.]
Appendix E
Multiscale Entropy Code
/* Excerpts from the multiscale entropy (MSE) program adopted from
   physionet.org (mse.c).  The functions CoarseGraining, SampleEntropy,
   ReadData and PrintResults, and the variables used below, are defined
   elsewhere in that program. */

/* Main analysis loop: perform the coarse-graining procedure for each
   scale factor j, then calculate SampEn for each scale and each r value. */
for (j = 1; j <= scale_max; j += scale_step) {
    CoarseGraining(j);
    c = 0;
    for (r = r_min; r <= (r_max * 1.0000000001); r += r_step) {
        SampleEntropy(l, r, sd, j);
        c++;
    }
}

/* Print results. */
PrintResults(nfile);

/* Input handling: process multiple data files listed in the file
   referenced by the stream fl. */
if (flag == 1) {
    /* Read the list of data files. */
    for (l = 0; fscanf(fl, "%s", file) == 1; l++) {
        nfile++;                                 /* count the number of data files */
        if ((pin = fopen(file, "r")) == NULL) {  /* open each data file */
            fprintf(stderr, "%s : Cannot open file %s\n", prog, file);
            exit(1);
        }
        /* Read the data from the open file. */
        ReadData();
    }
}
Acknowledgements
I am grateful to those who contributed to the shape and substance of this essay, directly or indirectly, through discussions. Thanks to my supervisor, Dr. Gareth Witten, who introduced me to this research domain, and to Lisa, a tutor at AIMS, for her patience and assistance in reading through the drafts of this essay and for her helpful suggestions. I thank my fellow students at AIMS and all the tutors. I beg forgiveness if I have failed to acknowledge anybody adequately.