
Machine Learning Project 2014-2015 (Oct)

Who has my phone?


1 Introduction

Every person walks in a unique manner. This uniqueness is what we want to quantify in this
project. Nowadays, all smartphones have a range of built-in sensors such as accelerometers,
gyroscopes, or GPS. We are going to build on the data generated by these sensors to solve the
task at hand: can you detect which person is walking around with a phone based on how the
phone behaves in their trouser pocket? For this to be possible, it is necessary to apply machine
learning techniques to detect a combination of characteristics that is unique to each person.

2 Problem and Approach

Your task is to classify 10 unlabeled walks based on a training set of 58 walks. In total, 30 persons
have participated in the data collection, but in this project you only have to consider three labels:
Wannes, Leander, and Other.
Data is collected using an Android app sampling the accelerometer, gyroscope, and magnetometer
at 50 Hz (the .json files). Acceleration is measured along three axes according to the phone
coordinate system, with the gravitational component removed. Since this would mean that the
data differ depending on how you put the phone in your pocket (e.g., screen in- or outside), this
is not an ideal representation. Therefore, the data is preprocessed to express acceleration
according to the geographical coordinate system (the .csv files):

X  Acceleration along the West-East axis
Y  Acceleration along the (magnetic) South-North axis
Z  Acceleration orthogonal to the earth's surface
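This rotation correction can be illustrated with a short sketch. It assumes the unit quaternion reported by Android's ROTATION_VECTOR sensor is available for each sample; the function name and data layout below are hypothetical, not part of the provided files:

```python
# Sketch: rotate one phone-frame acceleration sample into the earth frame
# using a unit quaternion (w, x, y, z), such as one derived from Android's
# ROTATION_VECTOR sensor. Hypothetical layout; adapt to your own data.

def rotate_to_earth_frame(acc, q):
    """Rotate acc = (ax, ay, az) by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    ax, ay, az = acc
    # Standard rotation-matrix expansion of q * v * conj(q), written out
    # to avoid depending on a quaternion library.
    rx = (1 - 2 * (y * y + z * z)) * ax + 2 * (x * y - w * z) * ay + 2 * (x * z + w * y) * az
    ry = 2 * (x * y + w * z) * ax + (1 - 2 * (x * x + z * z)) * ay + 2 * (y * z - w * x) * az
    rz = 2 * (x * z - w * y) * ax + 2 * (y * z + w * x) * ay + (1 - 2 * (x * x + y * y)) * az
    return (rx, ry, rz)
```

With the identity quaternion the vector is unchanged; a quaternion for a 90-degree yaw maps the phone's x axis onto the earth's y axis.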
[Figure residue omitted. The panels of Figure 1 compare raw accelerometer traces in the phone reference frame for different phone placements (top-up/top-down, screen inside/outside) with the same traces after rotation correction(*) in the earth reference frame.]

Figure 1: Transformation from phone coordinate system to geographical coordinate system.

(*) ROTATION_VECTOR, http://developer.android.com/reference/android/hardware/SensorManager.html

[Figure residue omitted. The panels of Figure 2 illustrate signal statistics (mean, stddev, peak mean, peak distance), the Fourier transformation FFT(z), wavelet decompositions (Haar, Daubechies, and biorthogonal families; DWT(z)/WPD(z) with max amplitude and energy per equally sized frequency bin), the likelihood of a signal given a hidden Markov model, gravity and the signal without gravity (derived from the signal), quaternions (as given), and the delta rotation of the phone.]

Figure 2: Example features you could consider.

In order to use raw accelerometer data for classification techniques like Naive Bayes, SVMs,
or decision trees, it is often necessary to first extract features from the data (see [1, 2, 3, 4] for
inspiration). Useful features are, for example, simple statistics like average, variance, or kurtosis,
average peak distance or height, peaks in the frequency domain, energy in a wavelet component,
HMM likelihood, etc.
A file is not guaranteed to be pure walking data. Putting the phone in a pocket, turning in the
hallway, or taking the phone out of a pocket are also part of the data. Therefore, it will be necessary
to devise a method to extract windows of clean walking data. To achieve this you can, for example,
train a separate classifier to recognize walking in general and use this classifier for pre-processing
the data.
Additionally, you also get a test data set where the walks are not labeled. Your task is to predict
the labels for these walks. Extra points will be awarded to the teams with the highest scores
(divided by the number of teams that achieve this score).
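As a sketch of what such feature extraction could look like — the function name and the particular feature set are illustrative, not prescribed by the assignment — the following computes the mean, standard deviation, and dominant frequency of one axis of a window, using a plain DFT rather than a library FFT to stay self-contained:

```python
# Sketch: simple window features (mean, standard deviation, dominant
# frequency via a plain DFT) for one acceleration axis.
# window: list of samples; fs: sampling rate in Hz (50 Hz in this project).
import math

def window_features(window, fs=50.0):
    n = len(window)
    mean = sum(window) / n
    var = sum((v - mean) ** 2 for v in window) / n
    # DFT magnitude of each positive, non-DC frequency bin
    mags = []
    for k in range(1, n // 2):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(window))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(window))
        mags.append((math.hypot(re, im), k))
    peak_mag, peak_k = max(mags)  # strongest bin -> dominant frequency
    return {"mean": mean, "std": math.sqrt(var), "dominant_hz": peak_k * fs / n}
```

For a walking signal, the dominant frequency of the vertical axis roughly corresponds to the step rate, which is one of the person-specific characteristics you are after.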
3 Tasks

3.1 Form Groups

By October 24th
You can work on this project in groups of two and divide the work. Form the groups as quickly
as possible and email the names of the members of your group to davide.nitti@cs.kuleuven.be.
Whether you work on this assignment alone or in a team, do not share your experimental results
or implementations with your fellow students.
3.2 Literature Study

Familiarize yourself with the concept of machine learning for accelerometer data and activity
recognition. Get a feeling for what has been done before in this context, and what the open
questions are. Divide the work between the two people in the team.
3.3 First Report

By November 7th

In a first phase of the project you explain in a brief report (about 2 pages) how you will address
these tasks. Describe what literature you have read and what you have learned from it. Give
examples of questions that you want to find answers for, and give an idea of how you intend
to find the answers (What data representation, features and machine learning techniques will
you use? How will you evaluate your classifier?). Please send your report in PDF format to
the aforementioned email address. The quality of this report influences your final score for the
project, so do your best to come up with a good plan. After the reports are handed in, you will get
feedback on your report, and, where necessary, we may give more concrete guidelines on how to
proceed.

3.4 Machine Learning

Use at least two different machine learning techniques to classify the data. Make sure that you
use a correct experimental setup to assess the accuracy of the classifier (e.g., cross-validation,
test/train split).
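A minimal sketch of such a cross-validation setup, assuming only that the classifier object exposes fit(X, y) and predict(X). The round-robin fold assignment is illustrative; for this project you would additionally want to keep all windows from the same walk in the same fold to avoid leakage:

```python
# Sketch: k-fold cross-validation accuracy for any classifier with
# fit(X, y) and predict(X). The test fold is always held out of training.
def cross_val_accuracy(model, X, y, k=5):
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    accs = []
    for fold in folds:
        held_out = set(fold)
        X_tr = [X[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        model.fit(X_tr, y_tr)
        preds = model.predict([X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        accs.append(correct / len(fold))
    return sum(accs) / k  # mean accuracy over the k held-out folds
```

The same loop works for both of your chosen learning techniques, so their accuracies are directly comparable.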

3.5 Analyze results

What is the accuracy you can achieve when applying cross-validation on the training data? Do
you understand why your parameters work? What is the computational cost of pre-processing,
learning and classifying? What is your prediction for the 10 unlabeled walks?

3.6 Final Report and Prototype

By December 12th

Report: Write a report with your findings.


Mail your final report, describing details of your approach and the results obtained:
- Clearly state the questions you address with your research.
- Describe how you try to answer them (what experiments were performed, how you measure success, etc.).
- Write out the conclusions you draw from your experiments together with a scientifically supported motivation for these conclusions. Such a motivation should include descriptions of the used techniques and performed experiments and a report and discussion of the results of these experiments. Be careful to report concrete numbers and custom formulas (avoid conclusions using weasel words).
- Report your prediction for the unlabeled walks.
- Report the total time you spent on the project, and how it was divided over the different tasks mentioned.
The total length of your report should be at most 10 pages.
Prototype: Include the code you used to perform the machine learning and experiments with
the final report. We will test unseen examples using your prototype to assess the accuracy. You
are free to use any mainstream programming language and any publicly available toolbox. Make
sure that:
- The code is intuitive to run and includes a short README file with the necessary steps.
- The default setting is to use the classifier you have found to be the best.
- Your code is self-contained and includes all dependencies.
- Your code is print-friendly (e.g., max 80 columns).
- It is easy to give as input a path to a new file and let your software print the classification (this will be used to test your software on an unseen walk).
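That last requirement could be met with an entry point along these lines; classify_walk is a hypothetical placeholder for your own pipeline of windowing, feature extraction, and the trained classifier:

```python
# Sketch of the required command-line behaviour: take a path to a new data
# file and print the predicted label. In a script, wire this up with
# sys.exit(main(sys.argv)) under an `if __name__ == "__main__":` guard.

def classify_walk(path):
    # Hypothetical placeholder: load the file at `path`, extract clean
    # walking windows, compute features, and apply your best classifier.
    return "Other"

def main(argv):
    if len(argv) != 2:
        print("usage: classify.py <walk-file>")
        return 1
    print(classify_walk(argv[1]))
    return 0
```

Keeping the prediction on a single line of standard output makes it easy for us to run your prototype on an unseen walk.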

3.7 Peer assessment

By December 12th, individually

Send by email a peer assessment of your partner's efforts. This should be done on a scale from
0-4, where 0 means "I did all the work", 2 means "my partner and I put in about the same effort",
and 4 means "my partner did all the work". Add a short motivation to clarify your score. This
information is used only by the professor and his assistants and is not communicated further.

3.8 Discussion

By December 19-20

There will be an oral discussion of your project report in week 13 where you will also get the
chance to demo your prototype.

Time
The time allocated for completing this project is 60 hours per person. This does not include
studying the Machine Learning course material, but it does include the literature study specific
to this project's topic.

Questions
Please direct any questions that you may have about the project to the Toledo forum.

Good luck!

References
[1] Mark V. Albert, Konrad Kording, Megan Herrmann, and Arun Jayaraman. "Fall Classification by Machine Learning Using Mobile Phones". In: PLoS ONE 7.5 (2012).

[2] Jonathan Lester, Tanzeem Choudhury, Nicky Kern, Gaetano Borriello, and Blake Hannaford. "A Hybrid Discriminative/Generative Approach for Modeling Human Activities". In: Proceedings of IJCAI. 2005.

[3] Stephen J. Preece, John Yannis Goulermas, Laurence P. J. Kenney, and David Howard. "A comparison of feature extraction methods for the classification of dynamic activities from accelerometer data". In: IEEE Transactions on Biomedical Engineering 56.3 (2009), pp. 871-879.

[4] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. "Activity recognition from accelerometer data". In: Proceedings of AAAI. 2005.
