ISSN No:-2456-2165
Abstract:- Breast cancer is now the second leading cause of cancer deaths in women, after lung cancer. It has become very common and is spreading quickly. Although a number of medical facilities are available to diagnose and treat breast cancer, diagnosis is still a long process, and the time it takes increases the probability of the cancer spreading further. For this reason, scientists decided to develop computer-based systems for the diagnosis of breast cancer, so that the time taken, and eventually the rate of spreading, can be reduced. This paper provides an overview of surveyed work on breast cancer diagnosis using various machine learning algorithms. It summarizes the work done on such systems and should help researchers to improve them.

Keywords:- Machine Learning Approach, Neural Nets, Breast Cancer

I. INTRODUCTION

Breast cancer has become a serious problem in the world today, badly affecting the health of women all around the globe. Over the last decades, this cancer has been spreading more quickly than the attempts made to control it. Death rates due to breast cancer have also increased, and the main cause is a late or time-consuming diagnosis process. According to a survey in the US on the increasing rates of breast cancer, about 252,710 new cases of breast cancer are reported each year and about 40,610 women die [1]. According to another report provided by WHO, around 1.5 million women are affected by invasive breast cancer each year [2].

When the survey was carried out in Asia, it was observed that Pakistan reports the most cases of breast cancer, about 90,000 each year [3].

Because breast cancer is not easily detectable in its early stages, its diagnostic process becomes very long and takes a lot of time. This is also one reason why it has become so invasive. Medical diagnosis includes mammography and biopsy. If abnormalities are found in the mammogram, the patient has to go through a biopsy, which is painful, time-consuming and costly.

An automated system would be very beneficial in this scenario. Analysis using an automated system can help radiologists or other experts to improve diagnostic accuracy. When an automated system is provided with a proper dataset, it can perform so well that a patient need not go through painful biopsies and other tests. This paper gives an overview of some major works that apply machine learning to the diagnosis of breast cancer. It covers the different works done with ANN and various classification algorithms, which have helped in developing automated systems to predict breast cancer. It should help future researchers to find important information for any research in this field.

II. DATA MINING TECHNIQUES USED FOR BREAST CANCER ANALYSIS

Data mining and Knowledge Discovery of Data (KDD) extract novel, understandable and useful information, knowledge or patterns from huge amounts of available data [4]. Because the data collected from different sources comes in very large volumes and with a great deal of noise, it is very difficult to analyze this data and obtain results with satisfactory accuracy. Here, data mining techniques can be used: they are capable of analyzing large data sets, finding hidden relationships between the features of the data and extracting satisfactory results. For example, in health centers doctors observe the various symptoms of a disorder in a patient, analyze the information obtained, try to find relations in it, and then predict the disease accordingly. The problem with this manual method is that it is slow and expensive, and the patient has to go through a series of tests and biopsies.

Data mining involves the following steps:

Selection of data
Preprocessing: It involves removing columns or rows with missing data.
Transformation: It includes normalizing the data and selecting, among all the features chosen, only the important ones on the basis of which a prediction can be made.
Predictive Tasks: These involve using various classification and association rules of data mining to predict results on the basis of the features.
Interpretation/Evaluation: includes statistical
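
The selection, preprocessing, transformation and predictive steps listed above can be sketched as follows. This is a minimal illustration on made-up values, not the method of any surveyed work. The feature selection rule (largest gap between class means) and the nearest-centroid classifier are assumptions chosen only to keep the example short, and all function names are hypothetical.

```python
# Toy records: (feature values, label); None marks a missing value.
# Labels: 1 = malignant, 0 = benign. All values are made up for illustration.
RAW = [
    ([5.0, 1.0, 3.0], 0),
    ([4.0, 1.0, None], 0),  # has a missing value, dropped in preprocessing
    ([3.0, 2.0, 2.0], 0),
    ([8.0, 1.0, 9.0], 1),
    ([9.0, 2.0, 10.0], 1),
    ([10.0, 1.0, 8.0], 1),
]


def preprocess(records):
    """Preprocessing: remove rows that contain missing data."""
    return [(x, y) for x, y in records if None not in x]


def normalize(records):
    """Transformation, part 1: min-max scale each feature to [0, 1]."""
    cols = list(zip(*(x for x, _ in records)))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    scaled = []
    for x, y in records:
        row = [(v - lo) / (hi - lo) if hi > lo else 0.0
               for v, lo, hi in zip(x, lows, highs)]
        scaled.append((row, y))
    return scaled


def class_means(records):
    """Mean feature vector for each class label."""
    means = {}
    for label in {y for _, y in records}:
        rows = [x for x, y in records if y == label]
        means[label] = [sum(c) / len(rows) for c in zip(*rows)]
    return means


def select_features(records, k):
    """Transformation, part 2 (assumed rule): keep the k features
    whose class means differ the most."""
    m = class_means(records)
    gaps = [abs(a - b) for a, b in zip(m[0], m[1])]
    keep = sorted(sorted(range(len(gaps)), key=lambda i: -gaps[i])[:k])
    return [([x[i] for i in keep], y) for x, y in records], keep


def predict(centroids, x):
    """Predictive task (assumed classifier): label of the nearest centroid."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


clean = preprocess(RAW)
scaled = normalize(clean)
selected, kept = select_features(scaled, k=2)
centroids = class_means(selected)
predictions = [predict(centroids, x) for x, _ in selected]
print("kept feature indices:", kept)
print("predictions:", predictions)
```

In this sketch the predictions are made on the same records used to compute the centroids; a real study would hold out separate test data before evaluating the results.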
A neural network is a complex algorithm inspired by the structure of the biological human brain. It is developed so that it can mimic the decision-making ability of the human brain. Although the human brain is far more complex than an artificial neural network, the network is still able to extract patterns and meaningful information from large, complex data that would be very difficult to recognize and detect in other ways. A neural network develops the ability to detect patterns in the input data and to predict an output free of noise. The structure of a neural network consists of 3 layers, which are as follows:

The Input Layer: This layer takes the data items from the outside environment and feeds them to the next layer, called the hidden layer.
The Hidden Layer: This layer consists of certain complex functions according to which the input data is processed and predictions/results/decisions are made. These functions are not visible to users but are embedded in the algorithm used.
The Output Layer: The decisions made in the hidden layers are collected here and the final decisions are made, which eventually provide the final predictions.

Figure 1 shows more clearly how an artificial neural network works.

Fig 1:- Neural Net Representation

Neural networks can provide results with very high accuracy even if the data is very noisy. They have the disadvantage, however, that the accuracy of the predictions is valid only for the specific period of time in which the data was collected. A lot of work on the analysis of breast cancer related data and the prediction of different types of tumors using ANN algorithms has been done, with satisfactory accuracy. Some of the major works are given in another Section of this paper.

“Chandra Prasetyo Utomo, Aan Kardiana, Rika Yuliwulandari” [15] have presented a paper in which they implemented an ANN with extreme learning for the diagnosis of breast cancer. In their work they used the Wisconsin dataset. They then compared the results of the Extreme Learning technique with the Back Propagation algorithm and found that the Extreme Learning Machine Neural Network (ELM NN) is a better classifier than the BP ANN. Although the specificity of the BP ANN was better than that of the ELM NN, the results showed that the accuracy and sensitivity of the ELM NN were much improved.
“Mihir Borkar, Prof. Khushali Deulkar, and Abhinav Garg” [16], in their study, developed an artificial neural network that was able to predict whether a patient has breast cancer or not, that is, whether the patient has a malignant or a benign tumor. The attributes in the dataset were taken from cells of the patients, and a total of 699 patients were the sources of data. 599 instances were used to train the network, and the final network achieved an accuracy of 96.49%.
“Larrisa Westerdijk along with prof. Dr. Sandjai Bhulai” [17], in their research, used five machine learning models to predict benign and malignant breast cancers. The 5 models were logistic regression, random forest, support vector machine, neural networks and ensemble models. The performance and accuracy of all these models were measured. The accuracies obtained were 97.35%, 97.35%, 98.23%, 97.35% and 98.23%.
“Xin Yao and Yong Liu” [18] have proposed a work in which they used two neural network based approaches to diagnose breast cancer: an Evolutionary approach and an Ensemble approach. It was found that the Evolutionary approach can be used to design compact neural networks automatically by evolving network architectures and weights, and that the Ensemble approach could tackle large problems.
“Harsh Vaizirani, Rahul Kala” [19] et al. used a Modular Neural Network for the diagnosis of breast cancer. They used two different neural network models, BPNN and RBFN, over four modules to solve the problem. Modules 1 and 3 were trained and tested using BPNN, and modules 2 and 4 using RBFN. The training and testing accuracies for BPNN were 89.50% & 96.44% (for module 1) and 91.50% & 94.67% (for module 3), respectively. The training and testing accuracies for RBFN were 94.75% & 96.44% (for module 2) and 97.50% & 97.63% (for module 4), respectively.
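
To make the terms used in this section concrete, the following is a minimal sketch, not taken from any of the surveyed papers, of a three-layer network (input, hidden, output) together with the accuracy, sensitivity and specificity measures that the surveyed works report. The weights, the toy samples and all function names are assumptions chosen for illustration. A real network would learn its weights from data, for example by back propagation.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per unit, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]


# Hand-picked (not learned) parameters: 2 inputs -> 2 hidden units -> 1 output.
PARAMS = {
    "W1": [[4.0, 4.0], [-4.0, -4.0]],
    "b1": [-4.0, 4.0],
    "W2": [[6.0, -6.0]],
    "b2": [0.0],
}


def forward(x, params=PARAMS):
    """Input layer -> hidden layer -> output layer; returns a score in (0, 1)."""
    hidden = layer(x, params["W1"], params["b1"])
    output = layer(hidden, params["W2"], params["b2"])
    return output[0]


def classify(x, threshold=0.5):
    """1 = malignant, 0 = benign."""
    return 1 if forward(x) > threshold else 0


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity
    (true negative rate) from true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up, already normalized samples: (two features, true label).
SAMPLES = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.2, 0.3], 1),  # malignant case that this toy network misses
    ([0.1, 0.2], 0),
    ([0.3, 0.1], 0),
    ([0.7, 0.9], 0),  # benign case that this toy network flags
]

labels = [y for _, y in SAMPLES]
preds = [classify(x) for x, _ in SAMPLES]
metrics = evaluate(labels, preds)
print(preds)
print(metrics)
```

Sensitivity and specificity are reported separately because a classifier can trade one for the other, as in the comparison of ELM NN and BP ANN in [15].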