
10EE35001 1

Ninth Semester Project Work Autumn


Report

Action Recognition using Pose Estimation for YouTube Videos
Ashwani Gupta
(10EE35001)

Supervisor: Prof. Rajiv Ranjan Sahay


Project Field: Image Processing
Project Type: Development & Computer Software


Project Work for 7th-8th Semester


Bag of Visual Words for Person Detection
This project aims to develop a Bag of Visual Words system for visual
classification that can identify any given person in a crowd from a database
of a given number of images, based on facial features. The system uses the
Viola-Jones face detector to detect the face of the subject in a crowd, and
Affine Scale Invariant Feature Transform (ASIFT) descriptors are then applied
to extract features of the subject's face. Interest points are first extracted
by difference of Gaussians (DoG) and are then processed with k-means
clustering, from which a visual vocabulary is generated. The generated visual
vocabulary is saved as a bag of visual words, which is used to find the
nearest match for a test image. Test images are fed to the system, and a match
for a given person is found based on the least sum-of-squared-error distance,
subject to a given threshold.
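The vocabulary and matching steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the random vectors stand in for ASIFT descriptors of detected faces, and the function names, the vocabulary size `k=4`, and the threshold `0.5` are all hypothetical.

```python
import numpy as np

def kmeans(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centres (the visual words)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[start].copy()
    for _ in range(n_iter):
        # Squared distance from every descriptor to every centre.
        d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres

def bovw_histogram(descriptors, vocabulary):
    """Normalised count of how often each visual word is the nearest one."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(vocabulary))
    return counts / counts.sum()

def best_match(test_hist, gallery_hists, threshold):
    """Index of the gallery entry with the least sum of squared error,
    or None when even the best entry exceeds the threshold."""
    sse = ((gallery_hists - test_hist) ** 2).sum(axis=1)
    idx = int(sse.argmin())
    return idx if sse[idx] <= threshold else None

# Synthetic stand-ins for face descriptors of two people (assumption).
rng = np.random.default_rng(1)

def fake_descriptors(offset, n=60, dim=8):
    return rng.normal(loc=offset, scale=0.3, size=(n, dim))

person_a = fake_descriptors(0.0)
person_b = fake_descriptors(3.0)

vocabulary = kmeans(np.vstack([person_a, person_b]), k=4)
gallery = np.stack([bovw_histogram(person_a, vocabulary),
                    bovw_histogram(person_b, vocabulary)])

# A new set of descriptors drawn like person B's should match entry 1.
query = bovw_histogram(fake_descriptors(3.0), vocabulary)
match = best_match(query, gallery, threshold=0.5)
```

With real data, the synthetic arrays would be replaced by descriptors extracted from the face regions, and the threshold would need tuning on held-out images.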
Classification Concept:
Bag of Visual Words
Descriptors Used for Image Learning:
SIFT
ASIFT
Algorithms Used:
Viola-Jones Face Detector
AdaBoost
K-means Clustering
K-NN (Nearest Neighbour) Search
SVM


Project Work for 9th Semester


Action Recognition using Pose Estimation for YouTube videos

This project aims to develop a methodology to identify and recognize the different actions performed in
various videos on YouTube that either display an expression in each action performed or show a sport such as
gymnastics, which consists of performances of exercises with strict postures and moves.
Bharata Natyam dance consists of a series of postures called Karanas. Karanas are the 108 key
transitions in the classical Indian dance described in the Natya Shastra. Karana is a Sanskrit verbal noun
meaning "doing".
Gymnastics is a very complex competition involving the performance of exercises requiring physical
strength, flexibility, power, agility, coordination, grace, balance and control. It typically involves the
women's events of uneven bars, balance beam, floor exercise, and vault.

Fig. 1. A variant of Vrscikakuttitam karana

Fig. 2. Gymnast doing a stag leap on floor exercise

Objective
Our prime objective is to estimate the pose of the key subject from 2D image frames of a video and
to recognize the action performed using machine learning techniques.


2D articulated human pose estimation for Bharata Natyam


Our aim is to detect and estimate 2D human pose in video, i.e. recover a distribution over the spatial
configuration of body parts in every frame of a shot.

Fig. 3. (a) Input image. (b) Soft labelling of pixels to body parts or background. Red indicates torso, green
upper arms, blue lower arms and head. Brighter pixels are more likely to belong to a part. (c)
Stickman representation of pose, obtained by fitting straight line segments to the segmentations in
(b). For enhanced visibility, the lower arms are in yellow and the head is in purple.

The exact image regions covered by the parts have to be found. For estimating 2D pose in individual
video frames, we used the image parsing technique of Ramanan.
Image parsing: A person is represented as a pictorial structure composed of body parts tied
together in a tree-structured conditional random field. Parts, li, are oriented patches of fixed size,
and their position is parameterized by location and orientation. The posterior of a configuration of
parts L = {li} given an image I can be written as a log-linear model.
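For reference, the log-linear posterior of a pictorial structure is commonly written in the form below. The symbols are assumptions taken from that standard formulation and are not defined in this report: E is the set of edges of the tree, \Psi is the pairwise potential tying two connected parts together, and \Phi is the unary potential scoring how well the image supports part l_i (written li in the text).

```latex
P(L \mid I) \;\propto\; \exp\!\Big( \sum_{(i,j) \in E} \Psi(l_i, l_j) \;+\; \sum_{i} \Phi(l_i) \Big)
```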

Fig. 4. Single-frame models. Each node represents a body part (h: head, t: torso, left/right
upper/lower arms lua, rua, lla, rla). (a) The kinematic tree includes edges between every two body
parts which are physically connected in the human body. (b) The repulsive model extends the
kinematic tree with edges between opposite-sided arm parts.


Results of 2D Image Pose Estimation on Bharata Natyam Karanas

Fig. 5. Estimated pose using the 2D articulated full-body detector for Bharata Natyam dancers in different
poses.


Pose Estimation through skin detection for Gymnastics


2D pose estimation can only be done for poses in an upright position, i.e. the head should be above the
torso, whereas a gymnast performs a series of acts containing a number of gestures that
cannot be detected using 2D pose estimation.
In gymnastics, since a large portion of the body is exposed, we can easily detect the
orientations of the hands and legs using skin detection.
Non-parametric histogram-based models were trained using manually annotated skin and non-skin
pixels.

Fig. 6. (a) Original image (b) Skin likelihood image (c) Detected skin

Algorithm:
Step 1: Load an RGB image and convert it to doubles.
Step 2: Compute the Skin likelihood for each pixel.
Step 3: Threshold the likelihood to detect skin.
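The three steps above can be sketched as follows. This is a minimal illustration, not the project's code. It assumes the skin likelihood is the ratio of two normalised RGB histograms, one for skin and one for non-skin pixels. The bin count, the threshold of 1.0, the function names, and the synthetic training pixels (standing in for the manually annotated ones) are all hypothetical, and the image is built in memory rather than loaded from disk.

```python
import numpy as np

BINS = 16  # assumed number of histogram bins per colour channel

def colour_histogram(pixels, bins=BINS):
    """Normalised 3D RGB histogram from an (N, 3) array of values in [0, 1]."""
    idx = np.minimum((pixels * bins).astype(int), bins - 1)
    hist = np.zeros((bins, bins, bins))
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return hist / hist.sum()

def skin_likelihood(image, skin_hist, nonskin_hist, eps=1e-6):
    """Per-pixel ratio P(colour | skin) / P(colour | non-skin)."""
    bins = skin_hist.shape[0]
    idx = np.minimum((image * bins).astype(int), bins - 1)
    r, g, b = idx[..., 0], idx[..., 1], idx[..., 2]
    return (skin_hist[r, g, b] + eps) / (nonskin_hist[r, g, b] + eps)

def detect_skin(image_uint8, skin_hist, nonskin_hist, threshold=1.0):
    # Step 1: convert the RGB image to doubles in [0, 1].
    image = image_uint8.astype(np.float64) / 255.0
    # Step 2: compute the skin likelihood for each pixel.
    likelihood = skin_likelihood(image, skin_hist, nonskin_hist)
    # Step 3: threshold the likelihood to obtain a binary skin mask.
    return likelihood > threshold

# Synthetic training pixels standing in for annotated data (assumption).
rng = np.random.default_rng(0)
skin_train = np.clip(rng.normal([0.8, 0.6, 0.5], 0.03, size=(5000, 3)), 0.0, 1.0)
nonskin_train = rng.uniform(0.0, 1.0, size=(20000, 3))
skin_hist = colour_histogram(skin_train)
nonskin_hist = colour_histogram(nonskin_train)

# Tiny test image: left half skin-coloured, right half blue.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :2] = (204, 153, 128)
image[:, 2:] = (20, 40, 200)
mask = detect_skin(image, skin_hist, nonskin_hist)
```

The small `eps` keeps the ratio finite for colours that never occur in the non-skin training set.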

These binary images can be used to detect the action in any given image with HOG (Histogram of
Oriented Gradients). The technique counts occurrences of gradient orientation in localized portions
of an image. This method is similar to edge orientation histograms, scale-invariant feature
transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of
uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.
The final step in action recognition using Histogram of Oriented Gradients descriptors is to feed the
descriptors into a recognition system based on supervised learning. The Support Vector
Machine classifier is a binary classifier which looks for an optimal hyperplane as a decision function.
Once trained on images containing some particular action, the SVM classifier can make decisions
regarding the activity performed, such as a straddle or a bridge, in additional test images.
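The HOG-plus-SVM pipeline described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation. The HOG variant uses hard orientation binning with 8x8 cells and overlapping 2x2 blocks. The linear SVM is trained by batch subgradient descent on the hinge loss rather than with a library solver. The bar images are synthetic stand-ins for binary skin masks of two actions, and all function names and parameter values are hypothetical.

```python
import numpy as np

def hog(image, cell=8, n_bins=9, eps=1e-6):
    """Simplified HOG: per-cell orientation histograms weighted by gradient
    magnitude, L2-normalised over overlapping 2x2 blocks of cells."""
    gy, gx = np.gradient(image.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    cells = np.zeros((rows, cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            cells[r, c] = np.bincount(bins[sl].ravel(),
                                      weights=magnitude[sl].ravel(),
                                      minlength=n_bins)
    blocks = []
    for r in range(rows - 1):          # blocks overlap by one cell
        for c in range(cols - 1):
            block = cells[r:r + 2, c:c + 2].ravel()
            blocks.append(block / np.sqrt((block ** 2).sum() + eps))
    return np.concatenate(blocks)

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss.
    Labels y must be -1 or +1. Returns the hyperplane (w, b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1   # samples violating the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def make_bar(pos, horizontal, size=32, width=4):
    """Binary image with one full-length bar; a toy stand-in for a skin mask."""
    img = np.zeros((size, size))
    if horizontal:
        img[pos:pos + width, :] = 1.0
    else:
        img[:, pos:pos + width] = 1.0
    return img

# Class +1: horizontal bars, class -1: vertical bars (hypothetical actions).
positions = range(2, 26, 2)
X_train = np.array([hog(make_bar(p, h)) for h in (True, False) for p in positions])
y_train = np.array([1.0] * len(positions) + [-1.0] * len(positions))
w, b = train_linear_svm(X_train, y_train)

# Bars at positions not seen during training.
test_images = [make_bar(7, True), make_bar(15, False)]
predictions = [int(np.sign(hog(img) @ w + b)) for img in test_images]
```

Since the SVM is binary, distinguishing more than two actions would need one classifier per action or a similar multi-class scheme.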


The following results were obtained from skin detection on the gymnastics dataset; they
correctly identify the shape of the gymnast.

Fig. 7. Results of Skin detection on gymnasts in various events
