
2019 XXII Symposium on Image, Signal Processing and Artificial Vision (STSIVA)

Automatic Cell Image Segmentation Using Genetic Algorithms

Margarita Gamarra
Electronic Engineering department
Politécnico de la Costa Atlántica
Barranquilla, Colombia
mgamarraa@pca.edu.co

Andrés Mitre-Ortiz
Human-Centered Computing Lab
Center for Research in Mathematics
Zacatecas, México
andres.mitre@cimat.mx

Hugo Escalante
Instituto Nacional de Astrofísica, Óptica y Electrónica
Puebla, México
hugojair@inaoep.mx

Abstract—Cell image segmentation is a fundamental stage of the cell identification process, but it is not an easy task. Several methods for cell segmentation have been proposed. However, the parameters of the available algorithms depend on the cell type and are ultimately chosen by an expert. Whereas this approach can result in good performance, it is not necessarily optimal and may inherit the expert's biases. In this paper we propose an automated machine learning technique based on genetic algorithms for selecting the parameters of the cell image segmentation process. The use of optimized parameters improved the performance of the cell segmentation algorithm.

Keywords—cell segmentation, genetic algorithms, automated machine learning, auto-tuning.

I. INTRODUCTION

Recent advances in microscopy and improvements in image processing algorithms have enabled the development of computer-assisted analytical approaches for the identification of cells [1]. Applications in this field include the identification of cellular phenotypes, the detection and treatment of diseases, the identification of virus entry into cells, and virus classification. These applications could help complement the opinion of medical experts.

Despite the great advances in cell imaging, image processing, and pattern recognition, each type of cell image is a new challenge and requires its own configuration, and the performance of the techniques can vary from one type to another. In addition, the analyst tunes the parameters of each algorithm manually, and even the selection of the algorithms themselves is a search task carried out by the human expert. In the end, a model with adjusted parameters and acceptable performance is obtained by trial and error and the expert's judgment.

Nevertheless, if the choice of the model and its parameters is carried out automatically, guided by exact performance indicators and aiming at a minimum classification error, better results can be achieved.

Faced with these challenges in cell image segmentation, this research implements an Automated Machine Learning (AutoML) strategy for the parameter selection process, in order to improve the segmentation of cells in fluorescence microscope images.

II. RELATED WORK

The literature on cell image segmentation is broad, and it is a topic of growing research, because the biological sciences have found great support in computing and digital information processing tools for their research [2]. Although the work is vast, there is no single best method for cell segmentation, and the choice of a suitable algorithm depends on the required computational efficiency, the performance, and the type of images being segmented [3].

In [4] the authors explain how Machine Learning (ML) methods work and the considerations for applying them successfully in cell biology. They summarize how microscopy images can be converted into a data representation suitable for machine learning, and then present several next-generation ML algorithms, highlighting recent image-based applications, including detection.

Similarly, the review in [5] focuses on ML applications for the analysis of microscopy images in experiments with typical segmentation and cell tracking tasks. This review also offers a brief historical perspective on ML and presents several example applications at various stages of image processing, including the use of supervised learning methods to improve cell segmentation and the application of active learning for tracking.

Although different approaches for parameter optimization have been developed and successfully applied in several
application fields [6], there is still much to explore regarding their implementation in cell segmentation. The main contribution of our research is the adaptation of a parameter optimization method based on Genetic Algorithms (GA) to the context of cell image segmentation.

III. PROPOSED APPROACH

In this study we are interested in the auto-tuning of parameters, more specifically in a setting where the parameter space of a cell segmentation algorithm is explored automatically by a genetic algorithm. The cell segmentation method is the marker-controlled watershed algorithm [7]. The auto-tuning process compares the segmentation results with the ground truth, obtaining the four basic cardinalities of the confusion matrix: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Based on these values, the following performance indicators can be obtained [8]:

a) Sensitivity or recall: TPR = TP / (TP + FN).
b) Specificity: TNR = TN / (TN + FP).
c) Precision: PPV = TP / (TP + FP).
d) Negative predictive value: NPV = TN / (TN + FN).
e) Accuracy: (TP + TN) / (TP + FN + TN + FP).
f) F-index: 2(PPV * TPR) / (PPV + TPR).
g) Dice: 2TP / (2TP + FP + FN).
h) Jaccard: TP / (TP + FP + FN).

These metrics offer information about the segmentation process. Precision and recall are meaningful when considered jointly. In general, under-segmentation produces high precision and low recall, while over-segmentation does the opposite [9]. A good balance between precision and recall is reflected in higher F-index values.

The auto-tuning process is repeated to find the set of parameters that produces the most accurate results as measured by a comparison metric, which in our case is the F-index. We selected a Genetic Algorithm (GA) as our optimization method because it has shown suitable performance in similar applications [6] and it allows the use of different configurations with the aim of obtaining the optimal output. In addition, it is a derivative-free optimization technique that can be implemented in parallel.

For our experiments, we first selected the parameters of the segmentation algorithm based on the expert's criterion and some visual results. Second, we separated the samples into training and test sets and executed the GA with the fitness function to obtain the optimal set of parameters.

A. Cell segmentation algorithm

The method used to distinguish between background and cells is the marker-controlled watershed (MC-Watershed) [10][11]. The goal of this block is to recognize as many cells as possible. A flowchart of this process is shown in Fig. 1. The MC-Watershed consists of four stages: preprocessing, marking the foreground objects, computing the background markers, and computing the Watershed transform [7].

Fig. 1. Flowchart of the MC-Watershed algorithm.

The preprocessing stage adapts the input image Iinput and adjusts its intensity to increase the contrast between the cells and the background. The algorithm also clears the image borders, because the cells in these regions are not completely visible and would produce wrong information for subsequent processes such as characterization. The resulting image is Iadj.

Morphological operations such as "opening-by-reconstruction" and "closing-by-reconstruction" are performed to "clean up" the image. The output of the preprocessing block is Io, which is the input used to obtain the foreground and background markers. This stage requires a morphological structuring element. The "disk" structure is suitable because it fits the cell shape. The radius r of the disk affects the performance of the segmentation, so it is an input to the optimization algorithm.

The next block calculates the regional maxima of Io to obtain the foreground markers. Regional maxima are connected components of pixels with a constant intensity value whose external boundary pixels all have a lower value. It is necessary to clean the edges and remove the elements that have fewer than p pixels; p is the second input to the optimization algorithm. The resulting image of this block is IF.

A threshold based on Otsu's method is used to obtain the background markers. The background markers should not be too close to the edges of the cells, so the background is thinned by computing the Watershed transform of the Distance Transform (DT) of the internal marker and looking for the watershed ridge lines. The DT has four options, "cityblock", "chessboard", "Euclidean", and "Quasi-Euclidean", which influence the performance; therefore, DT is another parameter. The resulting image is IB.
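The four DT options listed above differ only in how the distance between two pixels is measured. The sketch below is a brute-force Python illustration under our own assumptions, not the MATLAB routine used in this work: for each pixel of a small binary mask it computes the distance to the nearest nonzero pixel (the convention of MATLAB's `bwdist`), using the quasi-Euclidean definition given in the `bwdist` documentation. The names `METRICS` and `distance_transform` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


# Pixel-to-pixel distances for a row offset dr and a column offset dc.
def cityblock(dr: int, dc: int) -> float:
    return float(abs(dr) + abs(dc))


def chessboard(dr: int, dc: int) -> float:
    return float(max(abs(dr), abs(dc)))


def euclidean(dr: int, dc: int) -> float:
    return math.hypot(dr, dc)


def quasi_euclidean(dr: int, dc: int) -> float:
    # Straight steps cost 1 and diagonal steps cost sqrt(2), as in bwdist.
    a, b = abs(dc), abs(dr)
    k = math.sqrt(2) - 1
    return a + k * b if a > b else k * a + b


METRICS: Dict[str, Callable[[int, int], float]] = {
    "cityblock": cityblock,
    "chessboard": chessboard,
    "euclidean": euclidean,
    "quasi-euclidean": quasi_euclidean,
}


def distance_transform(mask: Sequence[Sequence[int]], metric: str) -> List[List[float]]:
    """Distance from every pixel to the nearest nonzero pixel of `mask`.

    Brute force, O(N^2) in the number of pixels: for illustration only.
    """
    dist = METRICS[metric]
    ones = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not ones:
        raise ValueError("mask has no nonzero pixels")
    return [[min(dist(r - r0, c - c0) for r0, c0 in ones) for c in range(len(row))]
            for r, row in enumerate(mask)]
```

For example, with a single nonzero pixel at the top-left corner of `[[1, 0, 0], [0, 0, 0]]`, the pixel at row 1, column 2 is at distance 3 (cityblock), 2 (chessboard), about 2.236 (Euclidean) and about 2.414 (quasi-Euclidean), which shows why the choice of DT can move the background ridge lines.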
The last stage computes the Watershed transform. This block needs the inputs IF, IB, and Iadj. The gradient magnitude image is modified so that its only regional minima occur at foreground and background marker pixels. The Watershed transform is then computed on this modified gradient.

The output of this first step is a segmented image with the cells identified against the background. The resulting image is a label matrix Ilabel.

B. Fitness function

The fitness function is based on the cell segmentation algorithm. The input vector contains the three parameters r, p, and DT, which correspond to the radius of the disk, the minimum number of pixels below which elements are removed, and the option for the Distance Transform, respectively. The operations described in the flowchart in Fig. 1 are executed, the output image Ilabel is compared with the ground truth, and the F-index is obtained. This process is repeated for each of the training images (10 images). The average F-index is then calculated; it is the output of the fitness function and the value optimized by the GA.

C. Genetic algorithm

The GA maps each parameter of our cell segmentation algorithm to a gene of an individual [12]. The individual is the input vector to the fitness function, which performs the segmentation and obtains the F-index for that individual. The initial population is created randomly and evolved using crossover and mutation. Crossover is a one-point crossover between pairs of individuals, applied with probability C. Mutation occurs in each gene with an independent probability M.

The new population is evaluated based on the F-index, and the results are fed back to the GA to build another generation. The process continues until a predetermined number of generations (10 iterations) is reached. In our experiments, the probabilities C and M were empirically set to 0.5 and 0.3, respectively, to maximize performance. The GA was run with the Global Optimization Toolbox of MATLAB R2018a, with the following settings chosen after several tests:

• Limits for the parameters (r, p, DT): lower bound (4, 30, 1), upper bound (7, 50, 3).
• Population size: 10.
• Creation function: uniform.
• Selection function: roulette.

Fig. 2 shows the auto-tuning process of the GA. The process takes the initial parameters (r, p, and DT) as constraints. The parameters are then fed into the MC-Watershed algorithm for the cell segmentation process. Afterwards, Ilabel (the segmented image) and Ibinarized (the ground truth) are used for the binary classification (foreground or background class).

In the Binary classification block, we calculate the specificity, accuracy, sensitivity, precision, Jaccard index, and Dice, as well as the F-index. The metric to optimize is the F-index. Based on the objective function value and the configuration, the GA looks for other individuals in the population using mutation, crossover, and other operations. This process is repeated until the GA finds the parameters that produce the most accurate results in terms of the F-index.
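The loop described above can be sketched as a generic integer-coded GA. This is a minimal Python illustration under stated assumptions, not MATLAB's `ga` implementation. From the text it takes the bounds for (r, p, DT), the population size (10), the number of generations (10), C = 0.5, M = 0.3, uniform creation, roulette selection and one-point crossover. The per-gene mutation rule (redrawing the gene uniformly within its bounds) is our assumption, and `toy_fitness` is a hypothetical stand-in for the average F-index over the training images, whose optimum is arbitrary.

```python
import random
from typing import Callable, List, Sequence, Tuple

Individual = List[int]  # genes: [r, p, DT code]
LOWER = (4, 30, 1)      # lower bounds for (r, p, DT), from the text
UPPER = (7, 50, 3)      # upper bounds for (r, p, DT), from the text


def create(rng: random.Random) -> Individual:
    """Uniform creation: each gene is an integer drawn uniformly within its bounds."""
    return [rng.randint(lo, hi) for lo, hi in zip(LOWER, UPPER)]


def roulette(pop: Sequence[Individual], fits: Sequence[float], rng: random.Random) -> Individual:
    """Roulette selection: pick an individual with probability proportional to fitness."""
    total = sum(fits)
    if total <= 0:
        return list(rng.choice(pop))
    pick, acc = rng.uniform(0, total), 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= pick:
            return list(ind)
    return list(pop[-1])  # guard against floating-point round-off


def crossover(a: Individual, b: Individual, prob_c: float,
              rng: random.Random) -> Tuple[Individual, Individual]:
    """One-point crossover, applied to a pair with probability prob_c."""
    if rng.random() < prob_c:
        cut = rng.randint(1, len(a) - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return list(a), list(b)


def mutate(ind: Individual, prob_m: float, rng: random.Random) -> Individual:
    """Each gene is independently redrawn within its bounds with probability prob_m (assumption)."""
    return [rng.randint(lo, hi) if rng.random() < prob_m else g
            for g, lo, hi in zip(ind, LOWER, UPPER)]


def run_ga(fitness: Callable[[Individual], float], pop_size: int = 10, generations: int = 10,
           prob_c: float = 0.5, prob_m: float = 0.3, seed: int = 0) -> Tuple[Individual, float]:
    rng = random.Random(seed)
    pop = [create(rng) for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    best_fit = max(fits)
    best = list(pop[fits.index(best_fit)])
    for _ in range(generations):
        children: List[Individual] = []
        while len(children) < pop_size:
            a, b = roulette(pop, fits, rng), roulette(pop, fits, rng)
            for child in crossover(a, b, prob_c, rng):
                children.append(mutate(child, prob_m, rng))
        pop = children[:pop_size]
        fits = [fitness(ind) for ind in pop]
        if max(fits) > best_fit:  # keep the best individual seen so far
            best_fit = max(fits)
            best = list(pop[fits.index(best_fit)])
    return best, best_fit


def toy_fitness(ind: Individual) -> float:
    """Hypothetical stand-in for the mean F-index; values lie in (0, 1]."""
    r, p, dt = ind
    return 1.0 / (1.0 + (r - 6) ** 2 + ((p - 40) / 10) ** 2 + (dt - 2) ** 2)
```

Calling `run_ga(toy_fitness)` returns the best (r, p, DT) vector found and its score. In the setting of the paper, `fitness` would run the MC-Watershed on each training image and return the average F-index, which is why each evaluation is expensive and the population and generation counts are kept small.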

Fig. 2. Auto-tuning process of the Genetic Algorithm.
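The binary classification block of Fig. 2 and the fitness function of Section III.B can be sketched as follows. This minimal Python sketch assumes that Ilabel has already been binarized (any nonzero label counts as cell) and that both masks are equal-sized nested lists; the names `confusion`, `indicators` and `average_f_index` are hypothetical. It counts the four cardinalities pixel by pixel, evaluates the indicators a) to h) as defined in Section III, and averages the F-index over (segmented, ground truth) pairs.

```python
from typing import Dict, NamedTuple, Sequence, Tuple

Mask = Sequence[Sequence[int]]


class Confusion(NamedTuple):
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(pred: Mask, truth: Mask) -> Confusion:
    """Pixel-wise counts; any nonzero pixel is foreground (cell)."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = tn = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return Confusion(tp, fp, tn, fn)


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0  # convention: 0 when the denominator is 0


def indicators(c: Confusion) -> Dict[str, float]:
    tpr = _ratio(c.tp, c.tp + c.fn)  # a) sensitivity / recall
    ppv = _ratio(c.tp, c.tp + c.fp)  # c) precision
    return {
        "sensitivity": tpr,
        "specificity": _ratio(c.tn, c.tn + c.fp),                    # b)
        "precision": ppv,
        "npv": _ratio(c.tn, c.tn + c.fn),                            # d)
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fn + c.tn + c.fp),  # e)
        "f_index": _ratio(2 * ppv * tpr, ppv + tpr),                 # f)
        "dice": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),            # g)
        "jaccard": _ratio(c.tp, c.tp + c.fp + c.fn),                 # h)
    }


def average_f_index(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Fitness value: mean F-index over (segmented, ground truth) mask pairs."""
    scores = [indicators(confusion(pred, truth))["f_index"] for pred, truth in pairs]
    return sum(scores) / len(scores)
```

For `pred = [[1, 1, 1], [0, 1, 0]]` and `truth = [[1, 0, 0], [0, 1, 1]]` the counts are TP = 2, FP = 2, TN = 1, FN = 1, giving an F-index of 4/7 and a Jaccard index of 0.4. For a single confusion matrix, f) and g) are algebraically identical, since 2(PPV * TPR) / (PPV + TPR) simplifies to 2TP / (2TP + FP + FN).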


IV. RESULTS

This section evaluates the auto-tuning algorithm using GA with the goal of maximizing the F-index. The experiments were executed with the SNP HEp-2 cell dataset (SNPHEp-2) [13], which was obtained between January and February 2012 at the Sullivan Nicolaides Pathology laboratory, Australia. This dataset contains images of five cell classes (centromere, coarse speckled, fine speckled, homogeneous, and nucleolar) and consists of 1,884 cell images extracted from 40 specimen images. The DAPI image channel was used to obtain the cell image masks automatically. To validate this proposal, 40 cell images from the homogeneous class were randomly chosen.

The execution time of the optimization with GA was 3 hours on an Intel Core i7-6500U CPU @ 2.5 GHz with 8 GB of RAM and a 64-bit OS. The algorithms were run in MATLAB R2018a.

The parameter values obtained with the GA are shown in Table I and compared with those selected by the human expert:

TABLE I. PARAMETERS OBTAINED BY GA.

Parameter    Default 1    Default 2    GA
r            4            5            5
p            35           30           33
DT           cityblock    cityblock    cityblock

A panel with two images (000002_p2.tif and 00005_p1.tif) is presented in Fig. 3 to show the improvement in the segmentation output generated by the tuned versus the default input parameters. The figure shows that the segmentation with GA parameters achieves superior results in the identification and correct separation of clustered cells. However, some objects were over-segmented.

Panels: Ground Truth; Segmentation with default 2 parameters; Segmentation with GA parameters.
Fig. 3. Segmented images using tuned parameters versus default parameters.

The performance of the GA was measured with a binary-class evaluation using the final parameters. The images from the dataset were used for the evaluation. The following metrics were measured as functions of the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values: specificity, accuracy, sensitivity, precision, Dice, and Jaccard index.

The average values of the metrics for the complete set of images are presented in Table II. The results show that the auto-tuning algorithm slightly improved the F-index and the Jaccard indicator compared with the results generated by the default input parameters selected by the human expert.

TABLE II. PERFORMANCE INDICATORS.

Indicator      Default 1    Default 2    GA
Precision      0.86804      0.8975       0.87921
Sensitivity    0.85691      0.83771      0.85372
Specificity    0.97384      0.98078      0.97624
NPV            0.97027      0.96672      0.96970
Accuracy       0.95338      0.95594      0.95492
F-Index        0.85863      0.86303      0.86309
Jaccard        0.75598      0.76276      0.76289
Dice           0.85863      0.86541      0.86550

A good balance between precision and sensitivity generates high F-index values. This is the case of our proposal with GA, where some over-segmentation causes higher sensitivity and lower precision than with the Default 2 values.

Because these metrics are based on the comparison of the segmented image with the ground truth, the results are very similar (comparing the values of Default 2 with GA). Nevertheless, visual inspection shows that the automatic GA tuning improves the separation of clustered cells, as Fig. 3 shows.

V. CONCLUSIONS

Cell image segmentation algorithms are sensitive to their input parameters, and the selection made by human experts is not always the optimal option. The proposed auto-tuning algorithm using GA improved the indicators for cell segmentation and produced accurate visual results. Although it can be computationally time-consuming (3 hours in our experiments), this search method makes it easier for the programmer to select the multiple parameters of the algorithm.

As future work, we intend to apply AutoML to the full model selection problem, which includes not only parameter tuning but also the selection of the method itself.

REFERENCES

[1] F. Xing and L. Yang, “Robust Nucleus/Cell Detection and Segmentation in Digital Pathology and Microscopy Images: A Comprehensive Review,” IEEE Rev. Biomed. Eng., vol. 9, pp. 234–263, 2016.
[2] E. Zurek and M. Gamarra, “Cell Identification Using Image Analysis: A Literature Survey,” 2017.
[3] B. T. Grys, D. S. Lo, N. Sahin, O. Z. Kraus, Q. Morris, C. Boone, and B. J. Andrews, “Machine learning and computer vision approaches for phenotypic profiling,” J. Cell Biol., vol. 216, no. 1, pp. 65–71, Jan. 2017.
[4] C. Sommer and D. W. Gerlich, “Machine learning in cell biology - teaching computers to recognize phenotypes,” J. Cell Sci., vol. 126, no. Pt 24, pp. 5529–5539, Dec. 2013.
[5] A. Kan, “Machine learning applications in cell image analysis,” Immunol. Cell Biol., vol. 95, no. 6, pp. 525–530, Jul. 2017.
[6] G. Teodoro, T. M. Kurç, L. F. R. Taveira, A. C. M. A. Melo, Y. Gao, J. Kong, and J. H. Saltz, “Algorithm sensitivity analysis and parameter tuning for tissue image segmentation pipelines,” Bioinformatics, vol. 33, no. 7, p. btw749, Jan. 2017.
[7] R. C. Gonzalez, Digital Image Processing. Pearson Education, 2009.
[8] A. A. Taha and A. Hanbury, “Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool,” BMC Med. Imaging, vol. 15, p. 29, Aug. 2015.
[9] S. Tonti, S. Di Cataldo, A. Bottino, and E. Ficarra, “An automated approach to the segmentation of HEp-2 cells for the indirect immunofluorescence ANA test,” Comput. Med. Imaging Graph., vol. 40, pp. 62–69, Mar. 2015.
[10] X. Han, Y. Fu, and H. Zhang, “A fast two-step marker-controlled watershed image segmentation method,” in 2012 IEEE International Conference on Mechatronics and Automation, 2012, pp. 1375–1380.
[11] C. F. Koyuncu, S. Arslan, I. Durmaz, R. Cetin-Atalay, and C. Gunduz-Demir, “Smart Markers for Watershed-Based Cell Segmentation,” PLoS One, vol. 7, no. 11, p. e48664, Nov. 2012.
[12] A. R. Conn, N. I. M. Gould, and P. L. Toint, “A Globally Convergent Augmented Lagrangian Algorithm for Optimization with General Constraints and Simple Bounds,” SIAM Journal on Numerical Analysis, vol. 28, pp. 545–572.
[13] A. Wiliem, Y. Wong, C. Sanderson, P. Hobson, S. Chen, and B. C. Lovell, “Classification of Human Epithelial Type 2 Cell Indirect Immunofluorescence Images via Codebook Based Descriptors.”
