
review articles

DOI:10.1145/2960404
Cell-Graphs: Image-Driven Modeling of Structure-Function Relationship

Cell-graph construction methods are best served when physics-driven and data-driven paradigms are joined.

BY BÜLENT YENER

key insights

• Structural and spatial patterns of cell organizations in a tissue are not random but associated with the underlying functional state. Cell-graphs combine techniques from image analysis, graph theory, data mining, and machine learning to identify such patterns to predict underlying functional state. Thus, understanding structure-function relationships can be used to predict malfunctioning when the patterns start changing.
• Advances in tissue staining and image processing permit capturing multichannel, multiscale information which in turn can be used by the state of the art machine learning algorithms to model structure-function relationships.

THE STRUCTURE-FUNCTION RELATIONSHIP is fundamental to our understanding of biological systems at all levels, and drives most, if not all, techniques for detecting, diagnosing, and treating a disease. The predominant means of collecting structure/function data in biomedicine is reductionist and has thus led to a proliferation of complex data (for example, gene expression arrays, digital images) that captures only a fraction of the structure/function relationship. Gene sequence and expression data illustrates the structure and activities of individual genes but does not explain how these genes collaborate to control cellular and tissue-scale functions. As a result, despite the abundance of molecular details known about wound healing, for example, it is virtually impossible to accurately predict the final functional state of a healing wound.³⁶ This illustrates a need to build models that represent the structural organization at the organ, tissue, cellular, and molecular levels. Furthermore, such models must capture relationships between these scales and relate them to the underlying functional state.

Data-driven network/graph analysis is primed to decipher cellular interactions in the intricate relationship between protein-protein interactions, genetic changes, metabolic pathways, and chemical secretions, which comprise cellular events. When extended to the organ level, the key challenge would be to link the local and global structural properties of tissues to the overall morphology and function of a tissue. Only a systems-level understanding of the various cellular processes encompassing multiple biological levels will take into account the multidimensional complexity of these processes. If the principles governing biological organization on a morphological, spectral, local, and global scale can be deduced, the correlation between structural and molecular signaling within the tissue can be understood and applied to inform and accelerate studies of organ development and tissue regeneration.

74 COMMUNICATIONS OF THE ACM | JANUARY 2017 | VOL. 60 | NO. 1


IMAGE BY ANNA JURKOVSKA

The cell-graph technique¹¹,¹²,²⁰ aims to learn the structure-function relationship by modeling the structural organization of a tissue/organ sample using graph theory. Its main hypothesis is that cells in a tissue/organ organize to perform a specific function. For example, the spatial distribution and interaction of cells in a salivary gland tissue differ from those in a brain tissue since they perform very different functions. Thus, if one can understand tissue organization then one can successfully predict the corresponding function. The cell-graph technique deploys image processing, feature extraction and selection, and machine learning algorithms to establish a quantitative relationship between structure and function.

As more sophisticated staining techniques that provide information about different biological scales are deployed (as will be discussed), image-driven modeling with cell-graphs provides a multiscale approach to modeling complex biological systems that is complementary to physics-based continuous models and methods (for example, finite element method, finite difference method). While quite successful in various engineering applications, these methods operate under computational scales that capture the macroscale behavior of a system by smaller (micro) scale constitutive relations. For example, the Car-Parrinello Molecular Dynamics (CPMD) model employs a “microscopic” model to formulate the “constitutive relations” based on a force field between the nuclei, with a “macroscopic” model that uses mechanics for the dynamics of the nuclei.⁷ However, the scales of complex biological systems, including molecular, cellular, tissue, and organ levels, differ from the computational ones. Furthermore, physics-based techniques are parametric and do not leverage the massive amounts of data available due
to advances in data acquisition, such as high-throughput medical imaging techniques, which has been recognized as an important research direction (see https://datascience.nih.gov/bd2k).

In this article, I will illustrate various cell-graph construction methods for different applications, explain the graph features used in tissue classification, and suggest how to combine physics-driven and data-driven paradigms toward multiscale modeling for better prediction. The discussion starts from a simple graph model and progresses toward more sophisticated ones as a function of staining and imaging techniques. This review includes both static data and time-evolving dynamic data.

Image-driven native tissue modeling. Several different approaches are used to extract features at the cellular and tissue levels to distinguish and classify distinct (mal)functional states such as tumor types in cancer. The features are used to quantify the information carried in the sample, and to distinguish the diseased structures from the healthy and damaged ones.

The first approach makes use of morphology to quantify the size and shape of a cell or its nucleus.⁴⁰ At the cellular level, such features are used to classify a nucleus as belonging to a healthy or diseased cell. At the tissue level, the statistics of these features over the tissue are exploited in the classification of a tissue as diseased or not.

The second approach employs intensity, or the distribution of the color values of pixels, to define features.⁴³ Such features may include the mean, standard deviation, skewness, and kurtosis of the red, green, and blue components, as well as the difference between the red and blue components, and the proportion of the blue component in RGB color space. Given that this approach derives features directly from the intensity values, these features are more sensitive to the noise that arises from stain artifacts and image acquisition conditions.

The third approach exploits textural descriptors as its features and considers the spatial dependency of the intensity values to quantify the smoothness, regularity, or coarseness of the image. The two most popular models to compute these textural descriptors are those that use run-length matrices¹⁸ and co-occurrence matrices.²² Fractals that describe the similarity levels of different structures found in a tissue image over a range of scales have been proposed in Einstein¹⁵ and Esgiar.¹⁶ We note that none of the approaches mentioned here can model the structure-function relationship in tissue.

Prior work using graph theory to model a tissue is based on drawing a Voronoi graph of cells from a tissue image.²⁶,⁴² In these studies the graph-based features are defined on the Delaunay triangulation graph or its corresponding minimum spanning tree. There are several limitations of Voronoi graphs that cell-graphs successfully remedy. First, the edge function in Voronoi graphs is fixed and dictated by the location of vertices. The Delaunay triangulation permits the existence of edges solely between adjacent vertices. Thus, only relationships between closely located nuclei are represented. This restriction makes it impossible to generate and test different biological hypotheses for cell-to-cell interactions. Second, Voronoi graphs are restricted to planar graphs that are very limited in their structure and do not allow crossing of edges. There is no evidence to justify such a limitation in tissue structural organization. This constraint also presents difficulties with 3D images. Third, a Voronoi graph always has a single connected component (that is, the tissue is represented by a connected graph), which may not be a valid assumption for sparse tissues (those with fewer cells). Finally, the graph features are limited and mainly computed on minimum spanning trees over Voronoi graphs.

(Various modifications have been proposed to adapt Delaunay triangulations to specific biological systems by changing the triangulation technique, resulting in different neighborhood graphs.²,²⁵,³⁷ However, the feature sets constructed from these neighborhood graphs are limited and mainly based on spanning tree properties.)

The cell-graph approach generalizes graph-based approaches by allowing an arbitrary edge function between a pair of nodes based on a biological hypothesis on their pairwise relationship. In a cell-graph, cells or cell clusters of a sample tissue are the vertices. An edge is defined between a pair of cells or cell clusters based on an assumption that has a biological foundation (or hypothesis). For example, if we believe that cells that are spatially close to each other are more likely to interact (for example, signal) with each other than more distant cells, then a link can be made between them with a probability that decays exponentially with increasing Euclidean distance between them. Thus, links of a cell-graph aim to capture the biological interactions in the underlying tissue. The cell-graphs provide a precise mathematical representation of cellular organization and the extracellular matrix (ECM) that surrounds cells. If the images carry multichannel information, by applying more sophisticated staining techniques (for example, multispectral fluorescence imaging) it is possible to build cell-graphs that have different types of nodes, corresponding to different types of cells that coexist (for example, epithelial vs. fibroblast) and other ECM entities (for example, basal membrane underlying epithelial cell layers and blood vessels). With 3D images and 3D cell-graphs, such representation becomes more accurate and powerful. Cell-graphs bring the well-established principles of graph theory and provide a rich set of features defined precisely by these principles to be used as quantitative descriptor features. These features could be defined and computed locally from a single node’s point of view (for example, number of its neighbors), or globally for the entire tissue sample (for example, the shortest or longest distance in the cell-graph between any two nodes). Cell-graphs can use cell-level attributes such as convexity, size, physical contact, shape, and so on to define similarity metrics for establishing links between a pair of nodes.

As an introductory example for application of the cell-graphs, consider automated diagnosis of cancer from digital images (that is, digital pathology). The “gold standard” for cancer diagnosis remains the expert (qualitative) opinion of pathologists specially trained to recognize indicative morphological signatures of different tumors in histopathology slides. This process is not only time consuming but also subject to inter-observer variability. The cell-graph method can successfully assist diagnostics by automating this process. For example, consider the problem of predicting cancer for three morphologically distinct tissues (brain, breast, and bone) from histopathology images. Figure 1 shows the cell-graphs of three different human tissue samples in two different functional states: healthy and cancerous.

Figure 1. Different tissue types and states as well as their representations as cell-graphs.
[Panels: columns Healthy and Cancer; rows Bone, Brain, Breast.]
Cell-graphs are constructed from H&E stained human tissue samples and provide precise metrics to capture the spatial organization of tissues. Using these graph metrics, machine learning algorithms can predict the functional state of underlying tissue samples. As shown in the figure, cell-graphs and corresponding functional states of different tissue types are quite different.
Courtesy: doi:10.1371/journal.pone.0032227

Methodology
The cell-graph methodology is image-driven and utilizes different technologies ranging from fluorescence microscopy to confocal microscopy. While most of the work we report is based on hematoxylin and eosin (H&E) stained image analysis, the cell-graph technique benefits from more sophisticated staining techniques discussed later in this article.

Formally, let G=(V, E) denote a cell-graph with V and E being the set of nodes and edges of the graph, respectively. The overall methodology is shown in Figure 2. It starts with image analysis and ends with checking the accuracy of the machine learning algorithms.

Figure 2. Methodology for image-driven tissue modeling.
[Flow: Tissue images → Step 1: Cell-graph generation (node identification, edge establishing) → Cell-graphs → Step 2: Feature extraction (local features, global features) → Features → Step 3: Classification (machine learning) → Predictions.]
Image segmentation and identification of objects of interest is followed by cell-graph construction and feature extraction. Selected features are used to train machine learning algorithms. The testing results are cross-validated, and hypotheses are verified by further experiments.

Identification of nodes for a cell-graph. Nodes of a cell-graph are associated with individual cells, thus the first step is to distinguish the cells from their background based on the color information of the pixels. Standard imaging techniques can be used for this part of the process. We note that cell-graphs do not require precise cell segmentation and morphology since determining cell locations is enough to identify the node set. Cell segmentation is an active area of research and outside the scope of this work; we refer readers to the many survey papers on this topic.²¹,²⁹,⁴⁴

One earlier approach used for node identification is to have two control parameters: the size of the grid, and the threshold value.¹¹ The grid size determines the down-sampling rate, that is, the resolution of the resultant image. Consequently, a node can represent a single cell, a part of a cell, or a bunch of cells, depending on the grid size. The finer the grid size, the closer a node is to a single cell. For each grid entry, the average values of pixels located in this grid entry are computed and compared against a threshold value to determine the nodes of the cell-graph. Thresholding eliminates the noise that arises from the staining artifacts and misassignment of black pixels in the color quantization step.

There are more sophisticated approaches to cell segmentation and node identification depending on the type of the tissue and how much segmentation accuracy is required. For example, cells can be identified by using eigenvalues of the Hessian matrix of the image¹⁷,²³ with two parameters of interest: R_B = λ₁/λ₂ and S = ||∇²f||, where S is the Frobenius norm of the Hessian and is used to differentiate objects of interest from the background, whereas R_B is a measure to differentiate between blob-like structures and ridge-like structures. S will be low in background pixels as the eigenvalues for pixels lacking contrast will be small. In high-contrast regions, however, at least one of the eigenvalues will be high and S will be large.⁴ For further details, we refer the reader to several excellent survey papers on this topic as cited above.

Establishing edges in a cell-graph. After determining the vertex set V, an edge (u,v) between a pair of nodes u and v can be defined by making use of the biological insight and knowledge of the interaction of the cells in a specific tissue type. For example, it may be more likely that physically adjacent cells signal each other than the ones far away. Such distance-based interaction among the elements is well understood in physical systems based on energy minimization. In the absence of any other similarity measure between a pair of cells, one can adopt a simple Euclidean distance measure for defining an edge between them. Therefore, we translate the pairwise spatial relation between every two nodes to the possible existence of links in a cell-graph.

An edge (u,v) can be established probabilistically, deterministically, or by a combination of these two methods. For example, in probabilistic cell-graphs¹¹ the probability of creating a link between any two nodes may decay exponentially with the Euclidean distance between them, employing a probability function P(u,v) = α·exp(−d(u,v)/(βL)), where d(u,v) is the Euclidean distance, and L is the largest Euclidean distance between two nodes of the grid. The model parameters α and β must be chosen between 0 and 1. These parameters affect the number of the links and the connectivity of the graphs. Selecting smaller values of these parameters results in a smaller number of links. Different probability functions such as the power law P(u,v) = d(u,v)^(−α) can also be used based on a different hypothesis. Intuitively, the closer two cells are, the more likely that they share a relationship. This probability quantifies the possibility for one of these nodes to be grown from the other, thus aiming to model the prevalence of the disease state in a tissue.

An edge (u,v) can be established deterministically if the distance d(u,v) is less than a threshold (for example, two cells are physically touching each other). This edge function captures cell interaction through adhesion. When the dataset is large, cross-validation techniques can be used to identify the optimal threshold that might signify cell-cell communication. In cases when the dataset is limited in size, heuristics such as five times the average radius of a nucleus (for example, 20 microns) can be used.

Note that the presence of a link between nodes does not specify what kind of relationship exists between the nodes (cells); it simply indicates that a relationship of some sort is hypothesized to exist, and that it is dependent on the distance between cells. Surprisingly, the distance measure alone is sufficient to reveal important, diagnostic structural differences in human tissues (see sidebars 1–3 in the online appendix).

Feature extraction. After constructing the cell-graphs, the next step is to define and extract graph features to train machine learning algorithms for classification of tissue functional states. We consider two types of features to be used by classification algorithms: local features at the individual cell level, and global features at the tissue level. Table 1 (in an online appendix accompanying this article in the ACM Digital Library dl.acm.org) summarizes the graph features to capture information from different scales.⁶ By computing the distribution of local features, one can obtain some of the global features. However, some other global features, such as the ratio of the size of the giant connected component over the size of the entire graph, can only be computed over the entire graph.

Figure 3. Geometric interpretation of changes in cell-graph features.
[Panels (a) and (b): tissue samples; panels (c) and (d): the corresponding cell-graphs with numbered nodes. Labels recovered from the figure: Clustering, Compactness, Spatial Distribution.]
The clustering coefficient (CC) represents how well the neighbors of a node are connected to each other. In the corresponding graph (c) of the tissue sample in (a), node 1 has 3 neighbors but only one pair of them is connected to each other. Thus the CC of node 1 is 1/3. This metric captures the cliquishness property of the underlying tissue organization. The sample in (a) has three regions with clumps of cells where the CC of each node is higher than the rest of the cells (inside white dotted lines).
The diameter of a graph is the longest shortest path distance. For example, the shortest path in (d) between nodes 5 and 8 has the maximum length in this graph. An increased diameter implies that nodes are separated from each other due to sparsity (fewer edges are established due to the edge threshold). The tissue sample in (b) has two regions, mesenchymal and epithelial (marked by “m” and “e”, respectively), where the mesenchymal region has a denser distribution of the cells.
Graph metrics confirm and quantify the structural organization of tissue samples.
Courtesy: doi:10.1371/journal.pone.0032906.g006

The spectrum of a graph, which is the set of graph eigenvalues computed from the adjacency matrix or from its Laplacian, also provides global features such as the spectral radius and eigen exponents. The eigenvalues of the Laplacian relate to the graph invariants better than the eigenvalues of the adjacency matrix.⁸ For example, the number of eigenvalues with a value of 0 gives the number of connected components in the graph. Moreover, as the eigenvalues of the normalized Laplacian lie in the range [0,2], it is easier to compare the spectra of graphs with different sizes.

Feature selection and machine learning. Feature selection helps to overcome the curse of dimensionality and may increase classification accuracy. Note that the importance of graph features varies from one type of tissue to another. For example, the most important features for bone tissue classification (using f-scores for feature selection) are the number of nodes (fs=1.685), giant connected ratio (fs=1.094), number of central points (fs=1.607), and clustering coefficient (fs=1.069) (the next f-score value corresponds to percentage of end points, which is much smaller).⁴ Interestingly, while the number of nodes (that is, cell density) is an important feature for brain tissue analysis, it is not so for breast tissue.⁵ Some features such as clustering coefficient and number of central points are important for bone tissue, as well as for breast and brain tissue. Other influential features include average effective eccentricity, number of links, and average path length. While there is a tremendous amount of work on feature selection (for an online repository see http://featureselection.asu.edu/), it is based on heuristics and different methods yield different subsets.

The selected graph features are input to a classifier such as Artificial Neural Networks, Bayesian Networks, or Support Vector Machines (SVM) for learning and predicting the functional state associated with the structure. The main challenge from the learning perspective is the unbalanced class representation. For example, while there is abundant data labeled as the “cancer” class for almost all the tissue types we studied, much less data was available labeled as the “healthy” class. Standard techniques such as undersampling and oversampling of data are applied to cope with this problem, and in addition each dataset is normalized and centered. In SVM the test data points are classified by determining the side of the hyperplane they lie on in the kernel-mapped space. The radial basis function (RBF) kernel, also referred to as the Gaussian kernel (that is, K(xi, xj) = exp(−||xi − xj||² / (2σ²))), is commonly used as a kernel to map the data into an infinite dimensional Hilbert space. While there are some parameters in SVM that can be fine-tuned to increase learning accuracy, the default settings used in Matlab, LibSVM, and other packages have been shown to be sufficient. In order to extend SVM for the classification of three classes, one can employ the one-against-one approach²⁴ where three two-class SVM classifiers are established, one for each pair of classes in the training dataset. Each sample in the test data is assigned to a class by these classifiers and the class with the majority vote is chosen as the final result. If there is equal voting for the three classes, the class that has the largest margin from the separating hyperplane is chosen. The Bayesian classifier maximizes the posterior probability, which is a function of the likelihood and prior probability with the assumption that the data points are drawn from Gaussian distributions. The KNN (K-nearest neighbor) classifier assigns each data point to the class that is most common among its K nearest neighbors, determined by a Euclidean distance-based difference. In this study, we test three values K=10, 11, and 12, and choose the value that achieves the highest grading accuracy. Both Bayesian and KNN classifiers can readily handle multiclass classification problems.

We note that the classification accuracy of different machine learning algorithms may vary on the same feature set. For example, for histopathological grading of follicular lymphoma (FL) images into one of three grades, a comparison of three classifiers (SVM, Bayesian, and KNN) shows different accuracy results³⁴ (see Table 3 in the online appendix).

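To make the two edge functions from the methodology concrete, here is a minimal Python sketch that builds edges from nucleus coordinates. Several things are assumptions for illustration: the function names, the default parameter values, the use of the largest pairwise distance among the given points as L, and the exponential form α·exp(−d/(βL)) (a Waxman-style reading of the distance-decay rule with parameters α and β). The deterministic variant simply links nodes closer than a threshold.

```python
import math
import random
from itertools import combinations


def link_probability(d, L, alpha=0.5, beta=0.5):
    """Exponentially decaying link probability: alpha * exp(-d / (beta * L))."""
    return alpha * math.exp(-d / (beta * L))


def deterministic_edges(points, threshold):
    """Link every pair of nodes whose Euclidean distance is below threshold."""
    return [(u, v) for u, v in combinations(range(len(points)), 2)
            if math.dist(points[u], points[v]) < threshold]


def probabilistic_edges(points, alpha=0.5, beta=0.5, seed=0):
    """Link each pair with the distance-dependent probability above.

    Assumes at least two distinct points so that L > 0; the seed only
    makes the random draw reproducible.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(len(points)), 2))
    L = max(math.dist(points[u], points[v]) for u, v in pairs)
    edges = []
    for u, v in pairs:
        d = math.dist(points[u], points[v])
        if rng.random() < link_probability(d, L, alpha, beta):
            edges.append((u, v))
    return edges
```

Swapping `link_probability` for another function of distance (for instance a power law) changes the biological hypothesis being tested without touching the rest of the construction.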

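A few of the local and global graph features named in this article (clustering coefficient, number of connected components, giant connected ratio, diameter) can be computed with a short, dependency-free Python sketch. The function names and the edge-list input format are assumptions made for illustration; the diameter here is measured in hops and, for a disconnected graph, taken as the longest shortest path within any single component.

```python
from collections import deque


def build_adjacency(n, edges):
    """Undirected adjacency sets for nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clustering_coefficient(adj, v):
    """Fraction of neighbor pairs of v that are themselves linked."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for i in range(k) for j in range(i + 1, k)
                 if nbrs[j] in adj[nbrs[i]])
    return linked / (k * (k - 1) / 2)


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def connected_components(adj):
    seen, components = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            components.append(comp)
    return components


def giant_connected_ratio(adj):
    """Size of the largest component divided by the number of nodes."""
    return max(len(c) for c in connected_components(adj)) / len(adj)


def diameter(adj):
    """Longest shortest path (in hops) within any connected component."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)
```

For a node with three neighbors of which only one pair is linked, `clustering_coefficient` returns 1/3, matching the worked example in the Figure 3 caption.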
Finally, the cell-graph features can be used to design specialized kernels as an alternative approach to RBF or polynomial kernels for graph classification problems. Such kernel computation is based on feature vectors constructed from different global topological attributes, as well as global label features. The main idea²⁴ is that graphs from the same class should have similar topological and label attributes. A detailed comparison on real benchmark datasets shows that our topological and label feature-based approach delivers better or competitive classification accuracy, and is also substantially faster than other graph kernels.²⁷ It is the most effective method for large unlabeled graphs.

Feature interpretation. There are three types of cell-graph features: cliquishness metrics (for example, clustering coefficient), compactness metrics (for example, number of central points), and distance metrics (for example, diameter).

It is best to explain the meanings of graph features within an application domain. Recently, the cell-graph technique was used to quantify changes in the cellular dynamics of submandibular gland (SMG) morphogenesis as a function of ROCK1-mediated signaling in this process.⁶ The laboratory analysis verified that the average diameter of the SMG increased and that the thickness decreased following inhibitor treatment, which is consistent with the overall decrease in cellular contractility (see Figure 3). Additionally, the total number of cells also decreased with inhibitor treatment. This implies the overall compactness of the explant decreases both at the tissue and at the cellular level with ROCK inhibitor treatment. The values for certain cell-graph features captured this observation: the clustering coefficient in the control tissues was greater than in the ROCK inhibitor-treated tissues. The clustering coefficient gives a measure of compactness of a tissue. That is, cells in the ROCK inhibitor-treated tissues were further apart from each other and thus had fewer edges, or links, per unit area, measurable as a decreased clustering coefficient. The average path length, which measures the average shortest path between two cells, increases with ROCK inhibition, and the number of connected components, which is the number of cell-linked cell clusters, decreases. Less compact tissue should have a smaller number of linked cells, an increased inter-cellular distance (that is, longer average path length) and, hence, a lower number of connected components. Cell-graph features were thus able to predict known ROCK inhibitor-induced global tissue changes.⁶

Enhanced Cell-Graph Models
This section explores how to go beyond simple cell-graphs without self-loops, multiple edges, and attributes to more complex ones.

Hierarchical cell-graphs. So far the cell-graphs discussed are used to model diffusive structural organizations such as the ones found in brain tissue samples. Other tissue types such as breast or prostate exhibit more complex structural organizations and require enhancing the cell-graph approach further⁵ (see comparison results in Sidebar 2 in the online appendix). For example, to model the lobular structure of breast tissue, a 2-phase cell-graph construction is proposed⁵ (see Figure 4). After cell segmentation, first, connected subgraphs are built to capture the local structure of lobular/glandular architecture so that each connected component represents a lobular/glandular structure. A biologically meaningful hypothesis for this step is that within a glandular structure there is a high interaction among the cells as a function of physical contact. Second, the interactions among the connected components are modeled with the hypothesis that the likelihood of interglandular interaction through ECM may decrease as a function of the spatial distance between them. These two phases may admit different edge functions. For example, intra-glandular edges can be assigned deterministically while interglandular edges would be defined probabilistically.⁵ The hierarchical cell-graphs enable us to model and test different biological hypotheses on the interaction of cells and glands by changing the edge function.

Figure 4. Hierarchical cell-graphs for breast tissue modeling.
Each connected component of a hierarchical cell-graph corresponds to a lobular structure, which is modeled by using the simple cell-graph approach.²³
Courtesy: Conf Proc IEEE Eng Med Biol Soc. 2007;2007:5311-4.

ECM-aware cell-graphs. In a tissue sample, there is more than one type of cell of interest, including blood cells, cancerous cells, normal cells, stem cells repairing a damaged tissue (for example, bone fracture), and so on. These cells are not only distinguishable from each other by their color and size, but also carry valuable information about the underlying functional state. ECM-aware cell-graphs take advantage of the heterogeneity of tissue samples to encode more information. After image segmentation, each cell is assigned a color code based on the ECM composition of its surroundings. For each color code, a dedicated cell-graph is constructed and graph features are extracted. As a result, multiple cell-graphs coexist for modeling the same tissue and their combined feature set can be used for tissue classification. Figure 5 shows a tissue sample from fractured bone and corresponding cell-graphs constructed from segmentation, which illustrates that ECM-aware cell-graphs result in high accuracy in bone tissue classification problems⁴ (see Sidebar 3 online).

Figure 5. ECM aware cell-graphs.
Fractured bone tissue example where fracture cells exist in the middle of the original image and stem cells repairing fracture are colored in blue.
Courtesy: doi: 10.1007/s10618-009-0153-2

Similarly, Figure 6 captures the spatial organization of epithelial and mesenchymal cells that coexist during the branching morphogenesis of submandibular gland.⁶

Figure 6. Stitched images of submandibular gland were segmented using the active contour method to define epithelial (white) vs. mesenchymal tissue (black) in control (a) and ROCK inhibitor-treated explants (d).
[Panels (a) through (h).]
These masks were used to identify the epithelial nuclei (b, e) and mesenchymal nuclei (c, f). Using each nucleus as a vertex, cell-graphs were constructed for control and ROCK inhibitor-treated tissues, respectively (g, h), where zoomed regions of cell graphs corresponding to regions of the original images (shown as red boxes in a and d) are shown in detail. Epithelial tissue is represented by the blue graph and the mesenchymal tissue is represented by the red graph. We discarded the sublingual tissues and only used the submandibular gland.
Courtesy: doi: 10.1371/journal.pone.0032906.g002

Figure 7. Non-invasive breast tissue sample.
[Panels (a), (b), (c).]
Non-invasive breast tissue sample with two stains is shown in (a). Corresponding cell-graph based on 3D distance is shown in (b) and the cell-graph that considers expression levels is shown in (c). Table 4 in appendix shows the improvement in

Cell-graphs with multiple staining. With immunohistochemical staining becoming more widely used in digital pathology practices, richer sets of biological information also become available to construct cell-graphs that are more realistic and relevant. These staining techniques indicate the expression level of various proteins and provide important information for learning the underlying functional state. For example, for the automated grading of breast cancer in 3D tissue sections, both the distribution and expression level of a lateral cell-membrane protein, integrin α3, have been used to hypothesize the interaction between cell pairs.³⁵ The cell-graph edges have been established based on the integrin α3 densities between the nuclei pairs (see Figure 7). As the cancer progresses, reduced expression of
classification by capturing the underlying biology more effectively.
integrin α3 is observed corresponding
Courtesy: doi: 10.1109/ISBI.2013.6556431.
to the loss of interaction between the
cells which is an important feature for

JA N UA RY 2 0 1 7 | VO L. 6 0 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 81
review articles

grading. Table 4 in the online appendix illustrates that grading accuracy increases by capturing this information.

Recently, a new technique called hyperplexed immunofluorescence technology has allowed unlimited stains on a single specimen.19 Molecular stains are quantified in single cells and subcellular compartments, yielding unparalleled insights into the biology of intact tissues. Each stain reflects a different biological property (cell types, signaling processes, and so on). As an extension of ECM-aware cell-graphs, one can establish links based on spatial proximities of pairs of cells using the expression of each marker, thus building one cell-graph for each marker. We expect the feature set of each cell-graph to capture a different biological property. Figure 8 shows the cell-graphs for three different stains. Statistical analysis shows that feature sets obtained from these cell-graphs come from different probability distributions (see Table 5 in the online appendix).

Figure 8. Cell-graphs overlaid on images of (a) E-Cadherin (b) Pan-Keratin (c) Keratin 15. The cell-graphs reveal distinct spatial patterns showing orthogonal information can be obtained.

3D tissue analysis with cell-graphs. 3D confocal imaging techniques have been a powerful tool for cell biologists and engineers, providing 3D spatial information regarding the location of specific structures within cells and tissues. Some of the work discussed previously used 3D modeling.6,35 To capture the third dimension, one needs to stitch z-stack sequences with some overlap. This process requires defining a depth parameter k and then ensuring some overlap between stack i-1, stack i, and stack i+1. For example, in Oztan et al.35 z-stack sequences 8 slices deep with 25% overlap with the preceding and following z-stack sequences are constructed. It is straightforward to extend 2D Euclidean distance to 3D. Consider two vertices $u = (x_u, y_u, z_u)$ and $v = (x_v, y_v, z_v)$; then the distance between them in 3D would be

$$\sqrt{(x_u - x_v)^2 + (y_u - y_v)^2 + (z_u - z_v)^2}$$

Similarly, different distance metrics can be adapted such as the $L_p$ distance:

$$L_p = \left( \sum_{i=1}^{3} |x_i - y_i|^p \right)^{1/p}$$

The features defined on 2D cell-graphs can be used in the 3D case with additional computational demand.

3D cell-graphs have been constructed to model a 3D cellular environment and quantify type I collagen remodeling and fibrillogenesis with respect to mesenchymal stem cell organization over time.3 In that work, an initial result on how to integrate a physics-based mechanical model30 with the cell-graph approach is also shown. The results are verified on multiple experiments and provide the first quantitative support to the hypothesis that continuity between extracellular and intracellular environments is required for stem cell fate determination (see Figure 9 in the online appendix).

Time Series of Tissue Evolution
Up to this point, the discussion on cell-graphs was confined to static histology samples. Here, we are interested in modeling the evolution of time-dependent cell and tissue growth.

Spatiotemporal cell differentiation. Recently, in vitro (3D hydrogel models) evolutions of 11 different cell-lines from different tissues that develop solid tumors (epithelial, connective, and neural) were studied in order to determine which structural properties (captured by graph metrics) dominate the differentiation.28 The cell-lines include MCF10A: Precancerous Human Breast Epithelial; AU565: Human Breast Cancer HER2+/ER-; MCF7: Human Breast Cancer HER2-/ER+; MDA-MB231: Human Breast Cancer HER2+/ER2+; hDFB: Human Dermal Fibroblasts; NHA: Normal Human Astrocytes; U118MG: Human Glioblastoma; NHOst: Normal Human Osteoblast; MG63: Human Osteosarcoma; RWPE-1: Non-tumorigenic Human Prostate; DU145: Human Prostate Carcinoma.28 Although there are limitations to in vitro studies, the cell lines used in this study represent a range of tissue types, allowing one to directly compare the structural profiles of various functional states through analysis of cell-graph metrics as follows.

The structural features of underlying tissue samples are calculated on each cell-graph using $G_i = (V_i(t), E_i(t))$, where $V_i(t)$ and $E_i(t)$ represent the lists of vertices and edges at time point t and i represents the index for the cell line. As a result, time series of tissue evolution can be represented by a third-order tensor with the modes features × time × cell-line, whose dimensions are I, J, and K, respectively.30 An entry $T_{ijk}$ in this data cube corresponds to the value of metric i at time point j for cell-line k, where i = 1,..., 20; j = 1,..., 6; and k = 1,..., 11.

Two common models in multi-way data analysis are Tucker3 and Parallel Factor Analysis (PARAFAC).1 A Tucker3 model with orthogonality constraints on component matrices is a generalization of SVD from matrices to high-order datasets and is also called Higher-Order Singular Value Decomposition (HOSVD)10 or multilinear SVD. Using a Tucker3, a 3-way tensor $T \in \mathbb{R}^{I \times J \times K}$ is modeled as follows:

$$T_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} G_{pqr} A_{ip} B_{jq} C_{kr} + E_{ijk}$$

where P, Q, and R indicate the number of components extracted from the first, second, and third mode (P ≤ I, Q ≤ J, and R ≤ K), respectively. $A \in \mathbb{R}^{I \times P}$, $B \in \mathbb{R}^{J \times Q}$, and $C \in \mathbb{R}^{K \times R}$ are the component matrices. $G \in \mathbb{R}^{P \times Q \times R}$ is the core tensor and $E \in \mathbb{R}^{I \times J \times K}$ represents the error term.5

Three-mode tensor analysis in feature mode for outliers identified six features: average degree, clustering coefficient, number of central points, number of connected components, standard deviation of edge lengths, and number of isolated points. These features capture the compactness, clustering, and spatial uniformity of the 3D architectural changes for each cell type throughout the time course.28 Importantly, four of these metrics are also the discriminative features for our histopathology data from the previous studies reviewed earlier.

Dynamic cell-graphs. Spatiotemporal development of tissues/organs requires modeling of cell-to-cell interactions over time and has proven to be difficult. For example, while the branching processes in developing organs (lungs, pancreas, kidneys, salivary, and mammary glands) have been studied in detail, we are still far from comprehending the integrated process.9 Computational modeling of morphogenesis starts with mathematical models for understanding the fundamental properties of cell clusters.14,41 These theories were followed by continuum, physics-based models, which considered a tissue to be composed of cells and ECM and described the stress forces between these two structures.32,33 Such models found a wide area of applications including modeling of epithelial morphogenesis in 3D breast culture acini,38 as well as lung31 and kidney branching morphogenesis.39 However, these models are data agnostic and focus on optimization of the model parameters for the best outcome.

Advanced imaging techniques provide a vast amount of image data that motivates data-driven modeling approaches. Data-driven techniques based on cell-tracking (identifying the same cell over different time points) are computationally challenging at the organ level. The cell-graphs provide a scalable alternative by tracking the graph properties instead of individual cells. For example, dynamic cell-graphs have been constructed to model the growth and cleft formation in SMG branching morphogenesis.45 The model takes as input the initial gland morphology and nuclei locations from an initial image, along with basic biological control parameters such as epidermal growth factor (EGF) concentration. The EGF concentration levels determine the mitosis and cleft deepening rates. At each iteration of the algorithm, the cells are divided into two populations based on their distance from the gland boundary, namely internal (I) and periphery (P). Subsets I0 and P0 of I and P, respectively, are chosen to undergo a proliferation attempt. Cells in P0 that successfully undergo mitosis create new cells (or vertices) V0 that are added to V.

Cells with identical topology and growth are permitted only at the gland boundary, where a hypothesized "nutrient medium" provided by the mesenchyme is accessible. In the dynamic cell-graph model, this similarity is enforced via the local structural (graph) properties of cell-graphs that maintain consistency in the topology of the SMG throughout the development stages. When first created, potential daughter vertices are placed outside the initial gland boundary in a region within 20° of the surface normal from the parent vertex, at a minimum distance of one cell diameter but less than the specified maximum edge length. A number K of candidate daughter vertices satisfying these spatial and angular constraints are chosen, and the daughter vertex with the closest local cell-graph features to the parent vertex is selected as the optimal daughter vertex. These local structural features assess the spatial uniformity (clustering coefficient), connectedness (degree, closeness centrality, betweenness centrality), and compactness (edge length statistics) of the cell-graph. New edges, E0, are also constructed based on the distances from the new cells to existing cells in G. Bud outgrowth is modeled by the annexation of new nodes into the gland boundary.33 However, the main difficulty with this approach is to derive the smooth shape formation from the cell-to-cell interactions, as discussed in Dhulekar et al.13

Conclusion
This article explored various cell-graph constructions to model the information encoded in the image of a complex structure like the human brain or bone tissue. The main assumption of the cell-graph approach is that cells in a tissue organize in a certain way to perform a specific function; thus, understanding structure would predict the function or malfunction.

Graph theory implementation provides a rich and rigorous set of features that is almost an order of magnitude larger than that of prior methods, which were limited to a few features. Furthermore, it facilitates new kernels and data mining algorithms such as subgraph mining. These features can then be used to train machine learning algorithms for predicting the functional class label, given the feature set for test data. We note that identifying graph metrics that help to predict long-term functionality by linking engineered tissue structure to function is an important step toward optimizing biomaterials for the purposes of regenerative medicine.

Any interdisciplinary work requires strong collaboration between biomedical experts and computational scientists, given that interpretability of the results is crucial. It is particularly important to understand and relate the computational feature space back to the original problem domain to advance the knowledge there. Some of the cell-graph features are not intuitive yet are still useful for classification and prediction. (Interpretability poses a particular challenge for the convoluted features obtained by deep learning algorithms, even when they remain effective.)

Finally, we note that a coupling of continuum physics-based models with discrete data-driven models such as the cell-graphs may provide more accurate prediction as the complexity of the underlying problem increases (for example, organ morphogenesis). In Metzger et al.31 an initial attempt at such model combining is reported by replacing the "springs"32 with weighted cell-graph edges where weights are calculated directly from images of collagen fibers. However, much work needs to be done in this direction since, as reported,45 the cell-graphs are agnostic to the physical laws that govern the underlying structural organization; they are not sufficient to predict complex shape formation and need to be coupled with techniques such as the level set method.

Additional background information, literature, and figures appear in an online appendix available with this article in the ACM Digital Library (http://dl.acm.org/citation.cfm?doid=2960404&picked=formats).

References
1. Acar, E. and Yener, B. Unsupervised multiway data analysis: A literature survey. IEEE Transactions on Knowledge and Data Engineering 21, 1 (2009), 6–20.
2. Albert, R., Schindewolf, T., Baumann, I. and Harms, H. Three-dimensional image processing for morphometric analysis of epithelium sections. Cytometry 13 (1992), 759–765.
3. Bilgin, C.C. et al. Quantification of three-dimensional cell-mediated collagen remodeling using graph theory. PLoS One 5, 9 (2010), e12783.
4. Bilgin, C.C., Bullough, P., Plopper, G.E. and Yener, B. ECM-aware cell-graph mining for bone tissue modeling and classification. Data Min. Knowl. Discov. 20, 3 (May 2010), 416–438.
5. Bilgin, C., Demir, C., Nagi, C. and Yener, B. Cell-graph mining for breast tissue modeling and analysis. In Proc. of IEEE EMBC (2007).
6. Bilgin, C.C., Ray, S., Baydil, B., Daley, W.P., Larsen, M. and Yener, B. Multiscale feature analysis of salivary gland branching morphogenesis. PLoS One 7, 3 (2012).
7. Car, R. and Parrinello, M. Unified approach for molecular dynamics and density-functional theory. Physical Review Letters 55, 22 (1985), 2471–2474.
8. Chung, F.R.K. Spectral graph theory. Conference Board of the Mathematical Sciences, American Mathematical Society 92 (1997). Providence, RI.
9. Davies, J. Branching Morphogenesis. Springer-Verlag, 2004.
10. De Lathauwer, L., De Moor, B. and Vandewalle, J. A multilinear singular value decomposition. SIAM J. Matrix Analysis and Apps 21, 4 (2000), 1253–1278.
11. Demir, C., Gultekin, S.H. and Yener, B. Augmented cell-graphs for automated cancer diagnosis. Bioinformatics 21, Suppl 2 (2005), ii7–ii12.
12. Demir, C., Gultekin, S.H. and Yener, B. Learning the topological properties of brain tumors. IEEE/ACM Trans. Computational Biology and Bioinformatics 2, 3 (2005), 262–270.
13. Dhulekar, N., Oztan, B. and Yener, B. Model coupling for predicting a developmental patterning process. In Proc. of SPIE 2016.
14. Eden, M. A two-dimensional growth process. In 4th Berkeley Symposium on Mathematical Statistics and Probability (1961), 223–239.
15. Einstein, A.J., Wu, H.S., Sanchez, M. and Gil, J. Fractal characterization of chromatin appearance for diagnosis in breast cytology. Journal of Pathology 185, 4 (1998), 366–381.
16. Esgiar, A.N., Naguib, R.N.G., Sharif, B.S., Bennett, M.K. and Murray, A. Fractal analysis in the detection of colonic cancer images. IEEE Trans. Information Technology in Biomedicine 6, 1 (2002), 54–58.
17. Frangi, A.F., Niessen, W.J., Vincken, K.L. and Viergever, M.A. Multiscale vessel enhancement filtering. Lecture Notes in Computer Science (1998), 130–137.
18. Galloway, M.M. Texture analysis using gray level run lengths. Computer Graphics and Image Processing 4 (1975), 172–179.
19. Gerdes et al. Highly multiplexed single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue. PNAS 110, 29 (2013), 11982–11987.
20. Gunduz, C., Yener, B. and Gultekin, S.H. The cell graphs of cancer. Bioinformatics 20 (2004), i145–i151.
21. Gurcan, M.N., Boucheron, L., Can, A., Madabhushi, A., Rajpoot, N. and Yener, B. Histopathological Image Analysis: A Review. 2009.
22. Haralick, R.M. Statistical and structural approaches to texture. In Proc. of IEEE 67, 5 (1979), 786–804.
23. Hladuvka, J., Konig, A. and Groller, E. Exploiting eigenvalues of the Hessian matrix for volume decimation. In Proceeding of the 9th International Conference in Central Europe on Computer Graphics, Visualization, and Computer Vision (2001), 124–129.
24. Hsu, C. and Lin, C. A comparison of methods for multiclass support vector machines. IEEE Trans. on Neural Networks 13, 2 (2002), 415–425.
25. Jaromczyk, J.W. and Toussaint, G.T. Relative neighborhood graphs and their relatives. In Proc. IEEE 80 (1992), 1502–1517.
26. Keenan, S.J., Diamond, J., McCluggage, W.G., Bharucha, H., Thompson, D., Bartels, B.H. and Hamilton, P.W. An automated machine vision system for the histological grading of cervical intraepithelial neoplasia. J. Pathol. 192, 3 (2000), 351–362.
27. Li, G., Semerci, M., Yener, B. and Zaki, M.J. Effective graph classification based on topological and label attributes. ASA Data Science J. Statistical Analysis and Data Mining 5, 4 (2012), 265–283.
28. McKeen-Polizzotti, L. et al. Quantitative metric profiles capture three-dimensional temporospatial architecture to discriminate cellular functional states. BMC Medical Imaging 11.1 (2011), 1.
29. Meijering, E. Cell segmentation: 50 years down the road. IEEE Signal Processing Magazine 29, 5 (Sept. 2012), 140–145.
30. Meineke, F., Potten, C. and Loeffler, M. Cell migration and organization in the intestinal crypt using a lattice-free model. Cell Proliferation 34 (2001), 253–266.
31. Metzger, R., Klein, O., Martin, G. and Krasnow, M. The branching programme of mouse lung development. Nature 453 (2008), 745–750.
32. Murray, J.D. and Oster, G.F. Generation of biological pattern and form. Math Med Biol. 1 (1984), 51–75.
33. Oster, G.F., Murray, J.D. and Harris, A.K. Mechanical aspects of mesenchymal morphogenesis. J. Embryol Exp Morphol 78 (1983), 83–125.
34. Oztan, B., Kong, H., Gürcan, M.N. and Yener, B. Follicular lymphoma grading using cell-graphs and multi-scale feature analysis. SPIE Medical Imaging. International Society for Optics and Photonics, 831516–831516.
35. Oztan, B., Shubert, K.R., Bjornsson, C.S., Plopper, G.E. and Yener, B. Biologically-driven cell-graphs for breast tissue grading. In Proceedings of IEEE 10th International Symposium on Biomedical Imaging (Apr. 2013), 137–140.
36. Plopper, G., Larsen, M. and Yener, B. Image-enhanced Systems Biology: A Multiscale, Multidimensional Approach to Modeling and Controlling Stem Cell Function in Computational Biology of Embryonic Stem Cells. Ming Zhan, ed. Bentham Science Publishers, 2012, 71–87.
37. Raymond, E., Raphael, M., Grimaud, M., Vincent, L., Binet, J.L. and Meyer, F. Germinal center analysis with the tools of mathematical morphology on graphs. Cytometry 14 (1993), 848–861.
38. Rejniak, K.A. An immersed boundary framework for modeling the growth of individual cells: An application to early tumour development. J. Theor Biol 247, 1 (2007), 186–204.
39. Srivathsan, A., Menshykau, D., Michos, O. and Iber, D. Dynamic image-based modelling of kidney branching morphogenesis. Computational Methods in Systems Biology, Lecture Notes in Computer Science 8130 (2013), 106–119. Springer, Berlin Heidelberg.
40. Street, W.N., Wolberg, W.H. and Mangasarian, O.L. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology. San Jose, CA, 1905:861–870.
41. Turing, A.M. The chemical basis of morphogenesis. Philos Trans R Soc Lond B Biol Sci 237, 641 (1952), 37–72.
42. Weyn, B. et al. Computer-assisted differential diagnosis of malignant mesothelioma based on syntactic structure analysis. Cytometry 35 (1999), 23–29.
43. Wiltgen, M., Gerger, A. and Smolle, J. Tissue counter analysis of healthy common nevi and malignant melanoma. Int J Med Inform. 69, 1 (2003), 17–28.
44. Xing, F. and Yang, L. Robust nucleus/cell detection and segmentation in digital pathology and microscopy images: A comprehensive review. IEEE Rev Biomed Eng. (Jan. 6, 2016).
45. Yener, B., Dhulekar, N., Ray, S., Yuan, D., Oztan, B., Baskaran, A. and Larsen, M. Prediction of growth factor dependent cleft formation during branching morphogenesis using a dynamic graph-based growth model. IEEE/ACM Trans. Computational Biology and Bioinformatics.

Bülent Yener (yener@cs.rpi.edu) is a professor in the Department of Computer Science and in the Department of Electrical, Computer and Systems Engineering at Rensselaer Polytechnic Institute, Troy, NY. He is the founding director of Data Science Research Center at RPI as well as co-director of Pervasive Computing and Networking Center.

Copyright held by owner/author.

Watch the author discuss his work in this exclusive Communications video. http://cacm.acm.org/videos/cell-graphs
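
The distance-based edge rules discussed under hierarchical cell-graphs and 3D tissue analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction used in the cited studies: the function names, the `max_edge_length` threshold, and the exponential decay `exp(-d / scale)` used for the probabilistic edges are all hypothetical choices made for this example. The article itself only states that the likelihood of interaction may decrease with spatial distance.

```python
import math
import random
from itertools import combinations


def lp_distance(u, v, p=2.0):
    """L_p distance between two 3D points; p=2 is the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def deterministic_edges(nuclei, max_edge_length, p=2.0):
    """Link every pair of nuclei whose distance is at most max_edge_length."""
    return {
        (i, j)
        for i, j in combinations(range(len(nuclei)), 2)
        if lp_distance(nuclei[i], nuclei[j], p) <= max_edge_length
    }


def probabilistic_edges(nuclei, scale, rng, p=2.0):
    """Link a pair with a probability that decays with distance.

    The decay exp(-d / scale) is an assumption of this sketch.
    """
    edges = set()
    for i, j in combinations(range(len(nuclei)), 2):
        d = lp_distance(nuclei[i], nuclei[j], p)
        if rng.random() < math.exp(-d / scale):
            edges.add((i, j))
    return edges


if __name__ == "__main__":
    # Hypothetical nucleus centroids (x, y, z) from a segmented z-stack.
    nuclei = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
    print(deterministic_edges(nuclei, max_edge_length=1.5))
    print(probabilistic_edges(nuclei, scale=2.0, rng=random.Random(0)))
```

Swapping the edge function while keeping the vertex set fixed is what allows different interaction hypotheses to be compared on the same tissue image.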

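
To make the Tucker3 model from the discussion of spatiotemporal cell differentiation more concrete, the sketch below evaluates the model's triple sum for given factors. It is an illustration only: the function name `tucker3_reconstruct` and the nested-list tensor layout are assumptions of this example, the error term E is omitted, and estimating G, A, B, and C from data (for example by HOSVD) is not shown.

```python
def tucker3_reconstruct(G, A, B, C):
    """Evaluate T[i][j][k] = sum over p, q, r of G[p][q][r] * A[i][p] * B[j][q] * C[k][r].

    G is a P x Q x R core tensor; A (I x P), B (J x Q), and C (K x R)
    are the component matrices, all given as nested lists.
    """
    P, Q, R = len(G), len(G[0]), len(G[0][0])
    I, J, K = len(A), len(B), len(C)
    T = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                T[i][j][k] = sum(
                    G[p][q][r] * A[i][p] * B[j][q] * C[k][r]
                    for p in range(P)
                    for q in range(Q)
                    for r in range(R)
                )
    return T


if __name__ == "__main__":
    # Toy example: one component per mode (P = Q = R = 1).
    G = [[[2.0]]]
    A = [[1.0], [2.0]]   # 2 features
    B = [[1.0], [3.0]]   # 2 time points
    C = [[1.0]]          # 1 cell line
    print(tucker3_reconstruct(G, A, B, C))
```

In the setting described in the article, the first mode would index the graph metrics, the second the time points, and the third the cell lines.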
