
M.Sc Project Proposal
Application of Data Mining Techniques
on Health Facilities and Classification
of Maternal Health Delivery in
Nigeria

OLANREWAJU AHMED BABJIDE


MATRIC NO: 106286

SUPERVISOR
DR OJO, Adebola K.
INTRODUCTION
• Between 2009 and 2014, a rigorous, geo-referenced baseline inventory of health facilities was created across Nigeria. It was the country's first nationwide health facility inventory.
• There is a need to draw useful insights from the health care facility data available in Nigeria and to uncover the hidden patterns in it.
LITERATURE REVIEW
Gupta, S., Kumar, D., & Sharma, A. (2011)
Source: Performance Analysis Of Various Data Mining Classification Techniques On Healthcare Data. International Journal of Computer Science and Information Technology, 3(4), 155–169. https://doi.org/10.5121/ijcsit.2011.3413
Contribution: The authors applied different classification techniques to several datasets.
• SVM gave the most promising results on the PIMA Indian Diabetes and StatLog Heart Disease datasets, with accuracy rates of 96.74% and 99.25% respectively.
• C4.5 decision tree did best on the BUPA Liver-disorders dataset, with an accuracy rate of 79.71%.
• On the Wisconsin Breast Cancer dataset, Bayes Net, SVM, kNN and RBF-NN all showed similar high accuracy. The highest accuracy rate achieved was 97.28%.
Drawbacks / Future Work: More experiments can be done on healthcare datasets using different parameters and techniques.

Lavrač, N., Bohanec, M., Pur, A., Cestnik, B., Debeljak, M., & Kobler, A. (2007)
Source: Data mining and visualization for decision support and modeling of public health-care resources. Journal of Biomedical Informatics, 40(4), 438–447. https://doi.org/10.1016/j.jbi.2006.10.003
Contribution: The results are applicable to health-care planning and can support decision making by local and regional health-care authorities. The practical results are directly useful for planning the regional health-care system. Beyond these, the paper's main methodological contribution is its visualization methods, which can facilitate knowledge management and decision-making processes.
Drawbacks / Future Work: There is a need to focus on developing decision support tools for modeling health-care providers using data mining.
LITERATURE REVIEW (cont’d)
Wang, Y., Kung, L. A., Wang, W. Y. C., & Cegielski, C. G. (2018)
Source: An integrated big data analytics-enabled transformation model: Application to health care. Information and Management, 55(1), 64–79. https://doi.org/10.1016/j.im.2017.04.001
Contribution: The method was implemented with 3 different classification algorithms (NaiveBayes, Multilayer Perceptron (MLP) and SVM), and the results were compared with SPPN on the authors' own dataset.
Drawbacks: There is a need to optimize the network architecture, gather more training samples, and use temporal information in the sequential data.

Farahani, R. Z., Hekmatfar, M., Fahimnia, B., & Kazemzadeh, N. (2014)
Source: Hierarchical facility location problem: Models, classifications, techniques, and applications. Computers and Industrial Engineering, 68(1), 104–117. https://doi.org/10.1016/j.cie.2013.12.005
Contribution: This work shows that less than 39% have utilized exact solution methods, while the remainder have used approximations, heuristics and meta-heuristics. Across 41 years of hierarchical facility location problem (HFLP) modeling efforts, about 30% of all exact methods were proposed in the most recent 12 years. Over the same period, the use of approximations, heuristics and meta-heuristics has been twice as much during …
Drawbacks: One emerging area of the literature is the development of hierarchical facility location models with maximum covering objectives for emergency services. These models should be resilient to unexpected severe disruptions such as terrorist attacks, tsunamis and floods. Such assumptions as possible future facility relocations …
LITERATURE REVIEW (cont’d)
P.C., A., C.D., N., & J.V., T. (2001)
Source: A comparison of a Bayesian vs. a frequentist method for profiling hospital performance. Journal of Evaluation in Clinical Practice, 7(1), 35–45.
Contribution: The study compared 4 different Bayesian methods for classifying hospitals as outcomes outliers. It used 30-day hospital-level mortality rates for a cohort of acute myocardial infarction patients as a test case.
Drawbacks: Each of the 4 methods uses different criteria to classify a hospital as a performance outlier. The authors' findings suggest a need for research into which methods best discriminate between hospitals and are most meaningful to clinicians, managers, and the general public.
PROBLEM STATEMENT
• There is a dire need to appraise the effectiveness of the health facilities scattered across the nation. There is also a need for reliable, accurate information about the maternal health delivery services these facilities provide.
AIMS AND OBJECTIVES
The aim of this project is to investigate the effectiveness of health facilities in Nigeria and the status of their maternal health delivery services.

The objectives of this project are:


1. To apply an unsupervised machine learning approach that clusters the health facilities according to their maternal health delivery service provision.
2. To compare several clustering algorithms:
   • K-Means Clustering
   • Mean-Shift Clustering
   • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
   • Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
   • Agglomerative Hierarchical Clustering
3. To evaluate the models and select the best one to apply.
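Objectives 2 and 3 could be prototyped with scikit-learn. The sketch fits the five listed algorithms and compares them by mean silhouette value. It rests on several assumptions:
• The proposal does not name the facility features, so synthetic blobs from `make_blobs` stand in for the real feature matrix.
• The hyperparameters are illustrative placeholders, not tuned values. These are `n_clusters=3`, DBSCAN's `eps=0.5` and `min_samples=5`, and the Mean-Shift bandwidth quantile.
• `compare_clusterers` is a hypothetical helper name.

```python
from sklearn.cluster import (AgglomerativeClustering, DBSCAN, KMeans,
                             MeanShift, estimate_bandwidth)
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def compare_clusterers(X, n_clusters=3):
    """Fit each candidate algorithm on X; return {name: silhouette score or None}.

    Hyperparameters are illustrative assumptions, not tuned for real facility data.
    """
    X = StandardScaler().fit_transform(X)  # put features on a common scale
    models = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
        "Mean-Shift": MeanShift(
            bandwidth=estimate_bandwidth(X, quantile=0.2, random_state=0)),
        "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
        "GMM (EM)": GaussianMixture(n_components=n_clusters, random_state=0),
        "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
    }
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        mask = labels != -1  # DBSCAN labels noise points as -1; exclude them
        if len(set(labels[mask])) < 2:
            scores[name] = None  # silhouette is undefined for fewer than 2 clusters
        else:
            scores[name] = float(silhouette_score(X[mask], labels[mask]))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for a facility feature matrix (assumption).
    X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
    for name, score in compare_clusterers(X).items():
        print(f"{name:15s} {score}")
```

The sketch scores DBSCAN only on the points it does not mark as noise, which can flatter DBSCAN. A fairer comparison would also report each model's noise fraction.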
PROPOSED METHODOLOGY
• DATA GATHERING
• DATA CLEANING (PRE-PROCESSING)
• TRAIN – TEST SPLIT
• APPLYING THE CLUSTERING ALGORITHMS
• EVALUATE THE MODELS
• COMPARE THE PERFORMANCE OF THE CLUSTERING ALGORITHMS
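The data cleaning and train–test split steps might look roughly like the pandas/scikit-learn sketch here. It makes a few assumptions:
• `num_beds` and `num_skilled_attendants` are hypothetical column names standing in for whatever facility attributes the merged dataset actually contains.
• Median imputation is one assumed way of handling missing values among several.
• `preprocess` is an illustrative helper name.

The split happens before the imputer and scaler are fitted, so statistics from the test rows do not leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df, feature_cols, test_size=0.2, seed=0):
    """Report missingness, split, then impute and standardise (fitted on train only)."""
    # Missing-data evaluation: fraction of missing values in each feature column.
    missing_report = df[feature_cols].isna().mean()

    # Split first, so the test rows cannot influence the fitted transforms.
    train_df, test_df = train_test_split(
        df[feature_cols], test_size=test_size, random_state=seed
    )

    # Median imputation (an assumed strategy), then zero-mean / unit-variance scaling.
    imputer = SimpleImputer(strategy="median")
    scaler = StandardScaler()
    X_train = scaler.fit_transform(imputer.fit_transform(train_df))
    X_test = scaler.transform(imputer.transform(test_df))
    return missing_report, X_train, X_test


if __name__ == "__main__":
    # Hypothetical facility records with some missing values.
    facilities = pd.DataFrame({
        "num_beds": [10, np.nan, 4, 25, 8, 12, 3, np.nan, 6, 15],
        "num_skilled_attendants": [2, 1, np.nan, 6, 2, 3, 1, 0, 2, 4],
    })
    report, X_train, X_test = preprocess(
        facilities, ["num_beds", "num_skilled_attendants"])
    print(report)
    print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```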
PROPOSED METHODOLOGY
• DATA COLLECTION
   - Merge datasets from various sources
• PRE-PROCESSING
   - Missing data evaluation
   - Standardize data
   - Feature engineering
   - Feature extraction
   - Feature transformation
• EXPLORATORY DATA ANALYSIS AND VISUALIZATION
• TRAIN / TEST SPLIT
• MODEL APPLICATION
   - K-Means Clustering
   - Mean-Shift Clustering
   - Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
   - Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
   - Agglomerative Hierarchical Clustering
• EVALUATE MODELS
   - Contingency tables
   - Sum of squared-error criterion
   - Silhouette value
   - Class-based precision and recall
   - Pairwise precision and recall
• MODEL COMPARISON
• APPLY MODEL ON REAL WORLD DATA
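Three of the listed evaluation measures are simple enough to sketch in plain Python.
• The sum-of-squared-error criterion is J_e = Σ_i Σ_{x ∈ D_i} ‖x − m_i‖², where m_i is the mean of the points in cluster D_i.
• A contingency table counts how the points of each reference class are distributed over the clusters.
• Pairwise precision and recall look at all pairs of facilities. Precision is the fraction of same-cluster pairs that also share a reference class. Recall is the fraction of same-class pairs that the clustering puts together.

The last two measures assume that some reference labelling exists (for example, a known facility type), which the proposal does not specify. The function names are illustrative and do not come from any library.

```python
from collections import Counter
from itertools import combinations


def contingency_table(classes, clusters):
    """Count how many points of each reference class fall into each cluster."""
    return Counter(zip(classes, clusters))


def pairwise_precision_recall(classes, clusters):
    """Pair-counting precision and recall of a clustering against reference classes."""
    same_cluster = same_class = both = 0
    for i, j in combinations(range(len(classes)), 2):
        in_cluster = clusters[i] == clusters[j]
        in_class = classes[i] == classes[j]
        if in_cluster:
            same_cluster += 1
        if in_class:
            same_class += 1
        if in_cluster and in_class:
            both += 1
    precision = both / same_cluster if same_cluster else 0.0
    recall = both / same_class if same_class else 0.0
    return precision, recall


def sse(points, clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    groups = {}
    for p, c in zip(points, clusters):
        groups.setdefault(c, []).append(p)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(sum((m[d] - centroid[d]) ** 2 for d in range(dim))
                     for m in members)
    return total


if __name__ == "__main__":
    classes = ["A", "A", "A", "B", "B"]   # hypothetical reference labels
    clusters = [0, 0, 1, 1, 1]            # hypothetical cluster assignments
    print(contingency_table(classes, clusters))
    print(pairwise_precision_recall(classes, clusters))  # (0.5, 0.5)
    print(sse([(0, 0), (2, 0), (10, 10), (10, 12)], [0, 0, 1, 1]))  # 4.0
```

Pair counting is O(n²) in the number of facilities. For a nationwide inventory, computing these measures from the contingency table counts would scale better.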
EXPECTED RESULT
• By the end of this research, a clustering model will have been deployed that clearly groups the health facilities according to their maternal health delivery services.
• It will help reduce the cost, human effort and other resources that have been wasted on grouping the facilities when budgeting for upgrades.
• It will also help ensure that the already limited facilities are used judiciously.
REFERENCES
• Wang, Y., & Hajli, N. (2017). Exploring the path to big
data analytics success in healthcare. Journal of
Business Research, 70, 287–299.
• Wang, Y., Kung, L. A., Wang, W. Y. C., & Cegielski, C. G.
(2018). An integrated big data analytics-enabled
transformation model: Application to health care.
Information and Management, 55(1), 64–79.
• Gupta, S., Kumar, D., & Sharma, A. (2011). Performance analysis of various data mining classification techniques on healthcare data. International Journal of Computer Science and Information Technology, 3(4), 155–169.
REFERENCES (cont'd)
• Farahani, R. Z., Hekmatfar, M., Fahimnia, B., & Kazemzadeh, N.
(2014). Hierarchical facility location problem: Models,
classifications, techniques, and applications. Computers and
Industrial Engineering, 68(1), 104–117.
• Lavrač, N., Bohanec, M., Pur, A., Cestnik, B., Debeljak, M., &
Kobler, A. (2007). Data mining and visualization for decision
support and modeling of public health-care resources. Journal of
Biomedical Informatics, 40(4), 438–447.
• Lyman, J. A., Scully, K., & Harrison, J. H. (2008). The development of health care data warehouses to support data mining. Clinics in Laboratory Medicine, 28(1), 55–71.
• Pramanik, M. I., Lau, R. Y. K., Demirkan, H., & Azad, M. A. K.
(2017). Smart health: Big data enabled health paradigm within
smart cities. Expert Systems with Applications, 87, 370–383.
