CHAPTER 1
INTRODUCTION
Human activity detection is regarded as one of the most important issues in smart home
environments, as it has a wide range of applications including healthcare, daily fitness
recording, and alerting for anomalous situations. Simple human activities (walking,
running, sitting, etc.) are essential for understanding complex human behaviors, and
the detection of activities such as walking and sitting can provide useful information
about a person's health condition. Nowadays, a cell phone is no longer only a tool for
communication; smartphones and smartphone-based applications are used everywhere.
Most smartphones have a tri-axial accelerometer and gyroscope as well as orientation
sensors. These sensors report the acceleration and the rate of rotation along and around
three physical axes (X, Y, and Z). With these embedded sensors, we can identify different
kinds of activities without attaching additional wearable sensors to the human body. In this
project, we present a smartphone-based human activity detection system. The data are
collected by the embedded sensors, and several machine learning algorithms (J48, SVM,
Naive Bayes, and Multilayer Perceptron) are employed to perform the classification. The
system is implemented on the Android smartphone platform.
Activity detection is the process of recognizing the physical activities or actions
performed by users from a set of observations recorded during those activities in their
surrounding environment. In recent times, Human Activity Detection (HAR) has come to
serve multiple challenging applications, driven by the growth of pervasive and wearable
computing. HAR has turned out to be one of the most important technologies changing
people's daily routines and hence contributing to a wide range of applications: for
example, rehabilitation assistive technology and the healthcare domain, especially elder
care support, diabetes management, cognitive disorder care, and fitness tracking.
Moreover, research in activity detection has advanced so quickly that it has started to
serve applications that go beyond activity detection itself.
The classical workflow of a human activity detection system begins with data acquisition:
the sensor readings generated on the smartphone during each activity are recorded and
sent to a remote system. This data is then processed to remove noise and to obtain
cleaner, transformed data suitable for further processing. The data is then explored to
understand its nature and the type of processing that can be applied in the next stages.
The design of the later stages can vary greatly with the application and its domain, but
typically the data is engineered to make it more useful by extracting informative features
from it. Furthermore, the data is segmented on the basis of the evaluation that needs
to be performed. Finally, a predictive model is created to identify the activities performed
by the user and is evaluated for its performance. The detected activity can then be
repurposed for various applications, such as detecting a change in the user's activity or
providing assistance related to the detected activity.
Human activity detection has been widely studied in the literature. Various methods and
prototypes have been developed using different kinds of information and algorithms. The
data used for detection includes video data and sensor data (acceleration, orientation,
angular speed, etc.). The sensors can be on-board smartphone sensors or sensors
installed in wearable devices. In addition, different methods and algorithms can be used
for activity detection; in particular, probability-based algorithms for building activity
models have been studied. The hidden Markov model (HMM) and the conditional random
field (CRF) are among the most popular modelling techniques.
In our system, features are computed along the three axes of each sensor, and a total of
60 features are extracted for each instance. The activity types are labelled manually in
each file.
Data acquisition involves recording the sensor readings generated during each activity
and sending them to the remote system. Here, the signals are obtained from the embedded
smartphone sensors, such as the tri-axial linear accelerometer, gyroscope, and orientation
sensors, to infer the different human activities. To collect the data using these sensors,
the user puts a smartphone in his/her pocket or hand and performs some activities.
Preliminary data collection and analysis show no difference when the smartphone is
placed in different orientations in one pocket or in different pockets, which makes the
smartphone-based detection system simpler and more practical. Therefore, the subject
does not need to worry about how to place the smartphone in his/her pockets during the
data collection. An Android application runs during the experiments, and the sensor data
are saved as CSV files. In order to increase the classification accuracy, data recorded at
the beginning and end of each activity are trimmed from the data files. Because of the way
the Android APIs report sensor events, we record the sensor data once for every 20
sensor-value changes.
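As a rough illustration, the trimming step can be performed after loading each CSV file. The following sketch assumes the files hold a timestamp (in seconds) plus X, Y, Z columns; the file name and the two-second trim length are placeholders, not values from our experiments.

import pandas as pd

# Load one recorded activity file (column names are assumptions).
df = pd.read_csv("walking_session1.csv", names=["timestamp", "x", "y", "z"])

# Trim the unreliable readings at the beginning and end of the activity.
TRIM_SECONDS = 2.0  # placeholder trim length
t0, t1 = df["timestamp"].min(), df["timestamp"].max()
mask = (df["timestamp"] >= t0 + TRIM_SECONDS) & (df["timestamp"] <= t1 - TRIM_SECONDS)
trimmed = df[mask].reset_index(drop=True)
print(f"kept {len(trimmed)} of {len(df)} samples")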
Data preprocessing involves the removal of noise and the normalization of data to obtain
cleaner, transformed data suitable for further processing. It is basically a technique used
to convert the raw data into an understandable format. Whenever data is collected from
different sources, it arrives in a raw format that is not suitable for analysis, as it is often
incomplete, inconsistent, and likely to contain many errors. Data preprocessing helps to
resolve such issues.
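A minimal preprocessing sketch is given below, operating on the trimmed data frame from the acquisition step; the five-sample median window and the z-score normalization are illustrative choices, not fixed parts of the system.

import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps, smooth noise, and normalize one sensor recording."""
    df = df.interpolate()  # fill occasional missing readings
    for axis in ["x", "y", "z"]:
        # a small rolling median suppresses spiky sensor noise
        df[axis] = df[axis].rolling(window=5, center=True, min_periods=1).median()
        # z-score normalization puts all axes on a comparable scale
        df[axis] = (df[axis] - df[axis].mean()) / df[axis].std()
    return df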
1.5.3 SEGMENTATION
After data preprocessing comes segmentation, where the data is split into segments
based on the window size. Here, the window size refers to the fixed time interval into
which the sensor recording of each activity is divided.
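The sketch below shows non-overlapping, fixed-size windowing of a single axis; the 50 Hz sampling rate and two-second window in the usage note are assumptions for illustration only.

import numpy as np

def segment(signal: np.ndarray, window_size: int) -> np.ndarray:
    """Split a 1-D signal into fixed-size, non-overlapping windows;
    samples that do not fill the final window are dropped."""
    n_windows = len(signal) // window_size
    return signal[:n_windows * window_size].reshape(n_windows, window_size)

# e.g. a 2-second window at a 50 Hz sampling rate:
# windows = segment(x_axis_values, window_size=100)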
In feature extraction, because of the large number of raw readings, the data array is
reduced before classifier construction. This step involves selecting the most significant
features for classification. Here, we monitor the changes in the linear accelerometer,
gyroscope, and orientation sensor values. The values, recorded after each fixed time
interval, are summarized by computing the relationships within the data: the maximum,
minimum, standard deviation, and mean. These computed values are then stored. A fixed
window size without overlapping is used in the feature extraction.
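The four statistics can be computed per window as follows; the sketch assumes each window is an array with one column per axis, matching the segmentation step above.

import numpy as np

def extract_features(window: np.ndarray) -> np.ndarray:
    """Compute max, min, mean, and standard deviation for each
    axis (column) of one window of sensor readings."""
    return np.concatenate([window.max(axis=0), window.min(axis=0),
                           window.mean(axis=0), window.std(axis=0)])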
Here, different classifier algorithms (Naïve Bayes, Multilayer Perceptron, and SVM) are
used to predict the different activities performed by the user.
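A small sketch of how these classifiers can be compared with scikit-learn is shown below; synthetic data stands in for the real feature matrix, and the classifier settings are library defaults rather than tuned values.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in for the real (features, activity label) data.
X, y = make_classification(n_samples=300, n_features=12, n_classes=3,
                           n_informative=6, random_state=0)

classifiers = {"Naive Bayes": GaussianNB(),
               "Multilayer Perceptron": MLPClassifier(max_iter=1000),
               "SVM": SVC()}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")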
CHAPTER 2
LITERATURE SURVEY
2.1 HUMAN ACTIVITY DETECTION
A literature survey is an important step in the software development process.
Before developing the tool, it is necessary to determine the time factor, the economy,
and company strength. Once these things are satisfied, the next step is to determine
which operating system and language can be used for developing the tool. Once
programmers start building the tool, they need a lot of external support, which can be
obtained from senior programmers, from books, or from websites. Before building the
system, the above considerations were taken into account for developing the proposed
system. We analyzed the following human activity detection literature:
HAR has delivered strong results in critical healthcare applications (Oliver &
Flores-Mangas, 2007; Bachlin et al., 2009; Tessendorf et al., 2011). HAR has also
provided great results in areas of lesser severity, such as the entertainment and
sports category (Kunze et al., 2006; Minnen et al., 2006; Ladha et al., 2013) and the
industrial and operations sector (Maurtua et al., 2007; Stiefmeier et al., 2008). HAR
was further explored to cover everyday human activities such as transportation routines
(Krumm & Horvitz, 2006), brushing teeth (Lester et al., 2006), and medication intake
(Wan, 1999; De Oliveira et al., 2010). One of the most recent and popular usages of
human activity recognition is in gaming consoles such as Microsoft Kinect, where body
gestures and movements are recognized to provide an upgraded gaming experience
(Shotton et al., 2013).
The use of multiple sensors to achieve the same goal makes the application bulky,
leading to slower processing of the data, and also increases its cost.
In most applications, the positioning of the device is critical to the success of the
application; that is, most applications are built for a specific device position. If the
device is kept in the hand, the values generated will differ from the values generated
when the device is kept in a pocket.
Firstly, the data generated from the different sensors is taken as input to the
system for analysis.
After the data is provided by the user, the code comes into action, and the
machine learning algorithms and tools are applied.
The first step is the preparation of the data for analysis, which also involves
cleaning the data and discarding records with null or missing values.
After the data is prepared for analysis, it is segmented according to the window
size. Here, the window size is the fixed time interval into which each recording
is divided.
After the data is segmented, it is sent for feature extraction, where we find the
relationships between the data values (MAX, MIN, MEAN, standard deviation).
Finally, with the help of the different classifier algorithms, the system is able
to detect the human activity; a sketch of this overall flow is given below.
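The sketch below ties these steps together. It assumes the preprocess, segment, and extract_features helpers sketched in Chapter 1 and an already-trained classifier clf; it is an outline of the flow under those assumptions, not the final implementation.

import numpy as np

def detect_activities(raw_df, window_size, clf):
    """Run the full flow: clean, segment, extract features, classify."""
    clean = preprocess(raw_df)  # cleaning and null handling
    # one (n_windows, window_size) block per axis, stacked into
    # an array of shape (n_windows, window_size, 3)
    windows = np.stack([segment(clean[a].to_numpy(), window_size)
                        for a in ["x", "y", "z"]], axis=-1)
    features = np.stack([extract_features(w) for w in windows])
    return clf.predict(features)  # detected activity per window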
2.4.1 Python
Python and its libraries such as NumPy, SciPy, Scikit-Learn, and Matplotlib are widely
used in data science and data analysis. They are also extensively used for creating
scalable machine learning algorithms. Python implements popular machine learning
techniques such as classification, regression, recommendation, and clustering.
Python offers ready-made frameworks for performing data mining tasks on large
volumes of data effectively in less time. It includes several implementations of
algorithms such as linear regression, logistic regression, Naïve Bayes, k-means,
k-nearest neighbors, and Random Forest.
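Each of the techniques named above is available as a ready-made scikit-learn estimator, as the short listing below illustrates; the constructor arguments shown are illustrative defaults.

from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# One import and one constructor call per technique.
models = [LinearRegression(), LogisticRegression(max_iter=1000),
          GaussianNB(), KMeans(n_clusters=3),
          KNeighborsClassifier(), RandomForestClassifier()]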
Data science, machine learning, and artificial intelligence are some of the top
trending topics in the tech world today. Data mining and Bayesian analysis are also
trending, and this adds to the demand for machine learning. This chapter serves as
an entry point into the world of machine learning with Python.
Machine learning algorithms are used in various applications, such as:
Vision processing
Language processing
Pattern recognition
Games
Data mining
Expert systems
Robotics
A machine learning project typically proceeds through the following steps:
Defining a Problem
Preparing Data
Evaluating Algorithms
Improving Results
Presenting Results
The best way to get started using Python for machine learning is to work through a
project end-to-end, covering the key steps such as loading data, summarizing data,
evaluating algorithms, and making predictions. This gives you a replicable method
that can be used dataset after dataset. You can also add further data and improve
the results.
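For illustration, the sketch below walks through those key steps on scikit-learn's built-in iris dataset; the choice of dataset and of k-nearest neighbors as the algorithm is arbitrary.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Load and summarize the data.
X, y = load_iris(return_X_y=True)
print(X.shape, set(y))

# Evaluate an algorithm with cross-validation on the training split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = KNeighborsClassifier()
print("CV accuracy:", cross_val_score(model, X_tr, y_tr, cv=5).mean())

# Fit and make predictions on held-out data.
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))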
NumPy is the fundamental package for scientific computing with Python. It contains,
among other things, a powerful N-dimensional array object, sophisticated broadcasting
functions, and useful linear algebra, Fourier transform, and random number capabilities.
scikit-learn provides the algorithms used for data analysis and data mining tasks.
One of the best but also more challenging ways to get your insights across is to
visualize them: that way, you can more easily identify patterns, grasp difficult
concepts, or draw attention to key elements. When you are using Python for data
science, you will most probably have already used Matplotlib, a 2D plotting library
that allows you to create publication-quality figures. Another complementary
package that builds on this data visualization library is Seaborn, which provides a
high-level interface for drawing statistical graphics.
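As a small illustration, the sketch below plots a synthetic accelerometer-like trace with Matplotlib using Seaborn's styling; in the real project the values would come from the recorded CSV files.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()  # seaborn styling on top of matplotlib
t = np.linspace(0, 10, 500)
signal = np.sin(2 * np.pi * 1.5 * t) + 0.2 * np.random.randn(500)  # fake data
plt.plot(t, signal)
plt.xlabel("time (s)")
plt.ylabel("acceleration (m/s^2)")
plt.title("Accelerometer X axis (synthetic)")
plt.show()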
Install Anaconda
To install Python and other scientific computing and machine learning packages
simultaneously, we should install Anaconda distribution. It is a Python
implementation for Linux, Windows and OSX, and comprises various machine
learning packages like numpy, scikit-learn, and matplotlib. It also includes Jupyter
Notebook, an interactive Python environment. We can install Python 2.7 or any 3.x
version as per our requirement.
To download the free Anaconda Python distribution from Continuum Analytics, you
can do the following −
Visit the official site of Continuum Analytics and its download page. Note that the
installation process may take 15-20 minutes as the installer contains Python,
associated packages, a code editor, and some other files. Depending on your
operating system, choose the installation process as explained here –
For Windows − Select the Anaconda for Windows section and look in the column
with Python 2.7 or 3.x. You can find that there are two versions of the installer, one
for 32-bit Windows, and one for 64-bit Windows. Choose the relevant one.
For Mac OS − Scroll to the Anaconda for OS X section. Look in the column with
Python 2.7 or 3.x. Note that here there is only one version of the installer: the 64-bit
version.
For Linux OS − Select the Anaconda for Linux section. Look in the column with
Python 2.7 or 3.x.
Note that you have to ensure that Anaconda’s Python distribution installs into a
single directory, and does not affect other Python installations, if any, on your
system.
To work with graphs and plots, we will need the Python library packages matplotlib
and seaborn.
If you are using Anaconda Python, your system already has numpy, matplotlib,
pandas, seaborn, etc. installed. We start the Anaconda Navigator to access either
Jupyter Notebook or the Spyder IDE for Python.
For example, when provided with a dataset about houses and asked to predict their
prices, that is a regression task, because price is a continuous output.
For example, when provided with a dataset about houses, a classification algorithm
can try to predict whether each house will "sell for more or less than the
recommended retail price." Here, the houses are classified according to whether their
prices fall into one of two discrete categories: above or below the said price.
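The toy sketch below contrasts the two tasks on made-up house data: a linear regression predicts a continuous price, while a logistic regression predicts the above/below category. All numbers are illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

area = np.array([[50], [80], [120], [160], [200]])  # made-up house sizes
price = np.array([150, 230, 340, 450, 560])         # continuous target
above_rrp = np.array([0, 0, 1, 1, 1])               # two discrete categories

reg = LinearRegression().fit(area, price)
clf = LogisticRegression().fit(area, above_rrp)
print(reg.predict([[100]]))  # a continuous price estimate (regression)
print(clf.predict([[100]]))  # 0 or 1 (classification)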
A Naive Bayes classifier assumes that the presence of a particular feature in a class
is unrelated to the presence of any other feature. For example, a fruit may be
considered an apple if it is red, round, and about 3 inches in diameter. Even if these
features depend on each other, all of these properties independently contribute to
the probability that this fruit is an apple, and that is why it is known as 'Naive'.
The Naive Bayes model is easy to build and particularly useful for very large data
sets. Along with simplicity, Naive Bayes is known to outperform even highly
sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from
P(c), P(x), and P(x|c). Look at the equation below:
P(c|x) = P(x|c) × P(c) / P(x)
P(c|x) is the posterior probability of the class (c, target) given the predictor (x,
attributes).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, i.e., the probability of the predictor given the class.
P(x) is the prior probability of the predictor.
Let's understand it using an example. Consider a training data set of weather
conditions and the corresponding target variable 'Play' (suggesting the possibility of
playing). We need to classify whether players will play or not based on the weather
condition. Let's follow the steps below to perform it.
Step 1: Convert the data set into a frequency table.
Step 2: Create a Likelihood table by finding the probabilities; for example, the
probability of Overcast is 0.29 and the probability of playing is 0.64.
Step 3: Now use the Naive Bayes equation to calculate the posterior probability for
each class. The class with the highest posterior probability is the outcome of the
prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
Here we have P(Sunny|Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14
= 0.64.
Now, P(Yes|Sunny) = 0.33 × 0.64 / 0.36 = 0.60, which is the higher probability.
Naive Bayes uses a similar method to predict the probabilities of different classes
based on various attributes. This algorithm is mostly used in text classification and
in problems having multiple classes.
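The hand calculation above can be checked in a few lines of Python:

# Reproduce the worked example: P(Yes | Sunny).
p_sunny_given_yes = 3 / 9  # P(Sunny | Yes)
p_sunny = 5 / 14           # P(Sunny)
p_yes = 9 / 14             # P(Yes)
print(round(p_sunny_given_yes * p_yes / p_sunny, 2))  # 0.6 -> predict "play"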
Pros:
It is easy and fast to predict the class of a test data set. It also performs well in
multi-class prediction.
When the assumption of independence holds, a Naive Bayes classifier performs
better compared to other models such as logistic regression, and it needs less
training data.
Cons:
If a categorical variable has a category in the test data set that was not
observed in the training data set, the model will assign it a probability of 0
(zero) and will be unable to make a prediction. This is often known as "Zero
Frequency". To solve this, we can use a smoothing technique; one of the
simplest smoothing techniques is called Laplace estimation.
On the other side, Naive Bayes is also known to be a bad estimator, so the
probability outputs from predict_proba are not to be taken too seriously.
The Naive Bayes classifier is based on Bayes' theorem, which says P(c_j|d) =
P(d|c_j)P(c_j)/P(d). P(c_j|d) is the probability of instance d being in class c_j,
P(d|c_j) is the probability of generating instance d given class c_j, P(c_j) is the
probability of occurrence of class c_j, and P(d) is the probability of instance d
occurring. The advantages of the Naive Bayes classifier are that it is fast to train
and classify; however, it assumes independence of the features.
Multilayer Perceptron
A multilayer perceptron (MLP) is a feedforward artificial neural network, which
usually contains three or more layers. Node i, which is connected to node j in the
following layer, has an associated weight w_ij. An MLP is trained through
backpropagation, and the weights are updated according to the amount of error
between the actual output and the desired output.
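In the standard gradient-descent form of backpropagation, each weight is adjusted in
proportion to its contribution to the error: w_ij = w_ij - η ∂E/∂w_ij, where E is the
error between the actual and the desired output and η is the learning rate.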
SVM
As a supervised machine learning algorithm, the support vector machine (SVM)
constructs a hyperplane to separate two classes, chosen to maximize the margin
between the nearest data points of each class. The hyperplane can be stated as
w^T x + w_0 = 0. The two classes are labeled as -1 and 1. Let {x_i} be the training
set, and let y_i represent the output of the SVM for a point x_i, so y_i ∈ {-1, 1}.
If w^T x_i + w_0 > 0, then y_i = 1; if w^T x_i + w_0 < 0, then y_i = -1. To
determine the hyperplane, we solve the problem that maximizes the distance between
the hyperplane and the closest points of each class:
arg max_{w, w_0} min { ||x - x_i|| : x ∈ R^d, w^T x + w_0 = 0, i = 1, ..., N }
Our problem has more than two classes, so before applying SVM we have to extend the
standard SVM to a multiclass SVM. There are two strategies for dealing with the
multiclass situation: one-against-one and one-against-all.
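In practice, scikit-learn's SVC handles the multiclass case with the one-against-one strategy internally (its LinearSVC uses one-against-all), so a multiclass SVM needs no extra code, as the sketch below shows on synthetic stand-in data.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Stand-in data with four classes, like our multi-activity problem.
X, y = make_classification(n_samples=200, n_features=12, n_classes=4,
                           n_informative=6, random_state=0)
clf = SVC(kernel="linear").fit(X, y)  # one-against-one used internally
print(clf.predict(X[:5]))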
A Support Vector Machine (SVM) is a supervised machine learning algorithm which can
be used for both classification and regression challenges. However, it is mostly
used in classification problems. In this algorithm, we plot each data item as a point
in n-dimensional space (where n is the number of features) with the value of each
feature being the value of a particular coordinate. Then, we perform classification
by finding the hyperplane that best differentiates the two classes.
Having seen how a hyperplane segregates two classes, the burning question is: "How
can we identify the right hyperplane?"
Identify the right hyperplane (first scenario): here, we have three hyperplanes
(A, B, and C), and we need to identify the right hyperplane to classify stars and
circles. The thumb rule to remember is: "Select the hyperplane which segregates
the two classes better." In this scenario, hyperplane B performs the job
excellently.
Identify the right hyperplane (second scenario): here, we again have three
hyperplanes (A, B, and C), and all of them segregate the classes well. Now, how
can we identify the right hyperplane?
Here, maximizing the distance between the nearest data point (of either class) and
the hyperplane will help us decide the right hyperplane. This distance is called the
margin. In this scenario, the margin for hyperplane C is high compared to both A and
B; hence, we name C as the right hyperplane. Another compelling reason for selecting
the hyperplane with the higher margin is robustness: if we select a hyperplane having
a low margin, then there is a high chance of misclassification.
In the decision tree implementation, every parameter can be modified to create an apt
classification model. The first parameters to be provided are the target and the
selected independent variables. Another important parameter is the splitting
criterion, which can take either the gini impurity or the information gain value to
split the data into partitions. As the current study deals with a classification
problem, the method must be specified accordingly. There are several controlling
parameters which may be used to enhance or restrain the tree growth; the best
approach is to perform a trial-and-error procedure to choose suitable parameters.
The resultant decision tree can be inspected in several ways, such as a graph of the
decision tree plot, a summary, cross-validation results, etc.
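A minimal scikit-learn version of this is sketched below; criterion switches between gini impurity ("gini") and information gain ("entropy"), max_depth restrains growth, and the iris data is only a stand-in for the real features.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy",  # or "gini"
                              max_depth=3, random_state=0).fit(X, y)
plot_tree(tree)  # inspect the resulting tree as a plot
plt.show()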
Artificial neural networks are complex structures that can be mapped easily to a
number of varied datasets. The size parameter plays the significant role of
specifying the number of units in a hidden layer, and it takes a single number or a
series of numbers to run the model with. The best way to choose the number of hidden
units suited to the data is to run multiple experiments and identify the right number.
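A simple version of that experiment with scikit-learn's MLPClassifier is sketched below; the candidate sizes and the iris stand-in data are illustrative only.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
for size in (5, 10, 20, 50):  # candidate hidden-layer sizes
    clf = MLPClassifier(hidden_layer_sizes=(size,), max_iter=2000,
                        random_state=0)
    print(size, "hidden units:",
          round(cross_val_score(clf, X, y, cv=5).mean(), 3))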
CHAPTER 3
REQUIREMENT ANALYSIS
3.1 PURPOSE
1) A framework for human activity detection, built by applying machine learning
algorithms.
3.1.1 Scope
The system should be capable of taking data input from the user.
The system should be able to train the model using the training sets and
assign labels to the parameters.
The system must validate the trained model using a test set as well as labeled data.
The system should be able to classify the data based on the labels.
The system should be able to accurately classify the activity being performed
by the user.
Reliability
Maintainability
Performance
Portability
Scalability
Flexibility
CHAPTER 4
DESIGN
4.1 GENERAL STRUCTURE
A system architecture or systems architecture is the conceptual model that defines
the structure, behavior, and more views of a system.
The architecture diagram shows the details of the system developed in our proposed
work. The activity is sensed first, and the sensed data is then filtered for the
removal of noise and normalization. After that, in feature extraction, we monitor the
changes in the linear accelerometer, gyroscope, and orientation sensor values. The
values, recorded after each fixed time interval, are summarized by computing the
relationships within the data: the maximum, minimum, standard deviation, and mean.
These computed values are then stored; a fixed window size without overlapping is
used in the feature extraction. Then, based on the classification algorithms, the
data is first trained in the training phase; from the learned model, the activity
prediction is made, and the predicted activity is recognized in the testing phase.
Figure (program flow): Data Segmentation based on Window Size → Segmented Data →
Feature Extraction (MAX, MIN, MEAN and Standard Deviation) → Features → Classifier
Algorithms
The diagram above shows the flow of the program. The data is first collected from the
linear accelerometer; the raw data is then filtered and segmented, and the segmented
data is sent for feature extraction, where the relations between the data values are
found. After that, classifier algorithms such as Naïve Bayes, Multilayer Perceptron,
and SVM are applied, and the corresponding activity of the human being is detected.
CHAPTER 5
CONCLUSION
Phase 1 was a learning phase in which we, as students, got to know a lot of things.
It was a very good exposure to many new concepts and served as a learning experience
in the domains of statistics and machine learning. The main groundwork of the major
project was laid in this phase, and a lot of learning and understanding has gone into
it. Phase 2 will be a closer approach to the project, where the actual implementation
takes place. Human activity detection is a crucial topic to deal with, and making a
system suitable for any environment will be a challenging task. The system should be
efficient enough to help predict the outcome of the daily human activities performed
by users, in order to understand the most complex human behaviour.
At the end of the entire term, we will have finished understanding, learning, and
implementing a fully working system.
REFERENCES
1. Z. Zhou, X. Chen, Y. Chung, Z. He, T. X. Han, and J. M. Keller, "Activity Analysis,
Summarization, and Visualization for Indoor Human Activity Monitoring," IEEE
Transactions on Circuits and Systems for Video Technology, 18(11), 1489-1498, 2008.
3. J. Cho, J. T. Kim, and T. Kim, "Smart Phone-based Human Activity Classification and
Energy Expenditure Generation in Building Environments," 7th International
Symposium on Sustainable Healthy Buildings (SHB'2012), 2012.