Supervised Learning
Supervised learning starts with the goal of predicting a known
output or target. In machine learning competitions, where individual participants are judged on their performance on common data sets, recurring supervised learning problems include
handwriting recognition (such as recognizing handwritten digits), classifying images of objects (eg, is this a cat or a dog?),
and document classification (eg, is this a clinical trial about
heart failure or a financial report?). Notably, these are all tasks in which every example is characterized by a rich set of (sufficiently) informative features.
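To make this concrete, here is a minimal supervised learning sketch in Python with scikit-learn, fitting a small decision tree of the kind shown in Figure 1B. The data, feature count, and outcome rule are synthetic stand-ins invented for illustration, not values from any study:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a clinical feature matrix: rows are patients,
# columns are candidate predictors; y is the known outcome to predict.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # invented toy outcome rule

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The tree learns feature/threshold splits that separate the two classes.
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```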
Unsupervised Learning
In contrast, in unsupervised learning, there are no outputs
to predict. Instead, we are trying to find naturally occurring patterns or groupings within the data.
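A minimal unsupervised counterpart, again on synthetic data: a clustering algorithm such as k-means groups patients by feature similarity without ever seeing an outcome column (k-means is used here purely for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two synthetic "pheno-groups": feature vectors drawn around different centers.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
               rng.normal(3.0, 1.0, size=(50, 4))])

# No outcome column: patients are grouped purely by feature similarity.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:5], labels[-5:])
```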
[Figure 1 graphic: A, patient-by-feature matrix with an outcome column (MI vs no MI); B, decision tree with splits such as "Is the LDL >160?" and "Do they have rheumatoid arthritis?"; C, neural network with input nodes (features), hidden nodes (transformed features), and an output node; D, scatter plot of Feature A vs Feature B for k-nearest neighbor classification. See caption below.]
Figure 1. Machine learning overview. A, Matrix representation of the supervised and unsupervised learning problem. We are interested in
developing a model for predicting myocardial infarction (MI). For training data, we have patients, each characterized by an outcome (positive or negative training examples), denoted by the circle in the right-hand column, and by values of predictive features, as well, denoted
by blue to red coloring of squares. We seek to build a model to predict outcome by using some combination of features. Multiple types
of functions can be used for mapping features to outcome (B through D). Machine learning algorithms are used to find optimal values of
free parameters in the model to minimize training error as judged by the difference between predicted values from our model and actual
values. In the unsupervised learning problem, we are ignoring the outcome column and grouping together patients based on similarities
in the values of their features. B, Decision trees map features to outcome. At each node or branch point, training examples are partitioned
based on the value of a particular feature. Additional branches are introduced with the goal of completely separating positive and negative training examples. C, Neural networks predict outcome based on transformed representations of features. A hidden layer of nodes
integrates the value of multiple input nodes (raw features) to derive transformed features. The output node then uses values of these
transformed features in a model to predict outcome. D, The k-nearest neighbor algorithm assigns class based on the values of the most
similar training examples. The distance between patients is computed based on comparing multidimensional vectors of feature values. In
this case, where there are only 2 features, if we consider the outcome class of the 3 nearest neighbors, the unknown data instance would
be assigned a no MI class. LDL indicates low-density lipoprotein; and MI, myocardial infarction.
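Panel D's rule is simple enough to sketch directly. Below is a minimal k-nearest neighbor example in Python with scikit-learn, using 2 features and 3 neighbors as in the figure; all coordinates are invented for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-feature training set standing in for Figure 1D (values invented).
X_train = np.array([[1.0, 1.2], [0.9, 0.8], [1.1, 1.0],   # "MI" examples
                    [3.0, 3.1], [2.9, 2.8], [3.2, 3.0]])  # "no MI" examples
y_train = np.array(["MI", "MI", "MI", "no MI", "no MI", "no MI"])

# Classification is a majority vote among the 3 most similar training examples.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[2.7, 2.9]]))  # nearest neighbors are "no MI" examples
```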
It is arguably the scarcity of clinical data sets of comparable size and feature richness that has limited the contribution of machine learning to complex tasks of classification and prediction in clinical medicine.
The resulting image-based prognostic model predicted survival above and beyond all established clinical and molecular factors (Figure 2D).
The C-Path experience was instructive for several reasons.
Perhaps the most important lesson was that novel learned
features were essential to improved performance; one could
not simply dress up established features in a new algorithmic packaging and expect superior classification. Moreover,
many of the predictive features learned by C-Path were
entirely novel despite decades of examination of breast cancer
slides by pathologists. Thus, one of the main contributions of
machine learning is to take an unbiased approach to identify
unexpected informative variables. The second lesson to be
learned is that the final algorithm used for classification, a regularized form of logistic regression called lasso,22 was actually quite simple but still generated excellent results. Simple
algorithms can perform just as well as more complex ones in 2
circumstances: when the underlying relationship between features and output is simple (eg, additive) or when the number
of training examples is low, and thus more complex models
are likely to overfit and generalize poorly. If one truly needs
the benefits of more complex models, such as those capturing
high-dimensional interactions, one should focus on amassing
sufficient and diverse training data to have any hope of building an effective classifier. Finally, the C-Path authors found
that the success of their model crucially depended on being
able to first differentiate epithelium and stroma. Because it is
unlikely that a machine would arrive at the need for this step
on its own, this highlights the need for domain-specific human
expertise to guide the learning process.
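For readers who want to see what the "simple" final algorithm looks like in practice, here is a generic lasso-style (L1-regularized) logistic regression sketch on synthetic data. This is not the C-Path pipeline itself, only an illustration of how an L1 penalty drives uninformative coefficients to exactly zero:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))           # 20 candidate features per example
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # only 2 features actually matter

# L1-regularized ("lasso"-style) logistic regression; smaller C = stronger penalty.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))
```

With a sufficiently strong penalty, the fitted model keeps only the few informative features, which is exactly why sparse linear models generalize well when training examples are scarce.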
Although analysis of pathology samples plays a limited
role in clinical cardiology, one can imagine extrapolating
this approach of data-driven feature extraction to other
information-rich data types, such as cardiac MRI images or
electrograms.
Figure 2. Overview of the C-Path image-processing pipeline and prognostic model building procedure. A, Basic image processing and
feature construction. B, Building an epithelial-stromal classifier. The classifier takes as input
a set of breast cancer microscopic images that
have undergone basic image processing and
feature construction and that have had a subset
of superpixels hand-labeled by a pathologist as
epithelium (red) or stroma (green). The superpixel
labels and feature measurements are used as
input to a supervised learning algorithm to build
an epithelial-stromal classifier. The classifier is
then applied to new images to classify superpixels as epithelium or stroma. C, Constructing
higher-level contextual/relational features. After
application of the epithelial-stromal classifier, all
image objects are subclassified and colored on
the basis of their tissue region and basic cellular
morphological properties (left). After the classification of each image object, a rich feature
set is constructed. D, Learning an image-based
model to predict survival. Processed images from
patients alive at 5 years after surgery and from
patients deceased at 5 years after surgery were
used to construct an image-based prognostic
model. After construction of the model, it was
applied to a test set of breast cancer images (not
used in model building) to classify patients as
high or low risk of death by 5 years. Reprinted
from Beck et al2 with permission of the publisher.
Copyright 2011, American Association for the
Advancement of Science.
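Stripped of the imaging machinery, the epithelial-stromal classification step in panel B is ordinary supervised learning on hand-labeled superpixel feature vectors. A minimal sketch follows; the features, labels, and choice of a random forest are invented stand-ins rather than the classifier Beck et al actually used:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
# Invented superpixel features (e.g., mean intensity, texture, shape measures).
features = rng.normal(size=(500, 8))
# Pathologist-assigned labels for the hand-labeled subset: 0 = stroma, 1 = epithelium.
labels = (features[:, 0] + features[:, 3] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, labels)
# The trained classifier is then applied to superpixels from new images.
new_superpixels = rng.normal(size=(10, 8))
print(clf.predict(new_superpixels))
```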
A second instructive example involved prognostic features identified through applying unsupervised learning to completely unrelated cancers. The authors had previously developed an algorithm known as attractor metagenes28 that identified clusters of genes that shared similarity across multiple tumor samples. Many of these clusters happened to correspond to biological processes essential for cancer progression, such as chromosomal instability and mesenchymal transition. The authors incorporated the presence or absence of these features, along with other clinical variables, into various predictive models for breast cancer outcomes. Because different learning algorithms have complementary strengths, the predictions of the individual submodels were ultimately blended into a single ensemble model (Figure 3).
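The attractor metagene iteration itself can be sketched compactly. The published algorithm weights genes by mutual information with the current metagene; the simplified version below substitutes Pearson correlation, so it illustrates the fixed-point idea rather than reproducing the authors' exact method. All parameter values and data are invented:

```python
import numpy as np

def attractor_metagene(expr, seed_gene, alpha=5.0, n_iter=100, tol=1e-6):
    """Simplified attractor iteration: expr is a genes x samples array."""
    metagene = expr[seed_gene].astype(float)
    for _ in range(n_iter):
        # Score each gene by its correlation with the current metagene.
        corr = np.array([np.corrcoef(g, metagene)[0, 1] for g in expr])
        weights = np.clip(corr, 0.0, None) ** alpha  # emphasize top-ranked genes
        new_metagene = weights @ expr / weights.sum()
        if np.max(np.abs(new_metagene - metagene)) < tol:
            break  # converged to an "attractor"
        metagene = new_metagene
    return metagene, corr

# Invented expression matrix: 100 genes x 30 tumor samples.
expr = np.random.default_rng(4).normal(size=(100, 30))
metagene, scores = attractor_metagene(expr, seed_gene=0)
print(np.argsort(scores)[::-1][:5])  # genes most associated with the attractor
```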
Discussion
Based on these examples, it is obvious that machine learning, both supervised and unsupervised, can be applied to clinical data sets for the purpose of developing robust risk models and redefining patient classes.
Figure 3. Schematic of model development for breast cancer risk prediction. Shown are block diagrams that describe the development
stages for the final ensemble prognostic model. Building a prognostic model involves deriving relevant features, training submodels, making predictions, and combining the predictions from each submodel. The model derived the attractor metagenes from gene expression data, combined them with clinical information through Cox regression, gradient boosting machine, and k-nearest neighbor techniques, and eventually blended each submodel's prediction. AIC indicates Akaike information criterion; GBM, gradient boosting machine;
and KNN, k-nearest neighbors. Reprinted from Cheng et al27 with permission of the publisher. Copyright 2013, American Association
for the Advancement of Science.
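The blending step in the final stage can be illustrated generically: fit several submodels and average their predicted risks. The sketch below uses off-the-shelf classifiers on synthetic data; the study itself blended survival submodels (Cox regression, GBM, KNN) with a tuned combination rather than a simple average:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
X_train, y_train, X_test = X[:300], y[:300], X[300:]

# Three submodels with complementary strengths.
submodels = [LogisticRegression(max_iter=1000),
             GradientBoostingClassifier(random_state=0),
             KNeighborsClassifier(n_neighbors=10)]
preds = [m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in submodels]

blended = np.mean(preds, axis=0)  # simple average of predicted risks
print(blended[:5])
```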
[Figure 4 graphic: B, Bayesian information criterion (approximately 51 000 to 52 000) plotted against the number of clusters (2 to 6); C, Kaplan-Meier curves of survival free of CV hospitalization or death for pheno-groups 1 to 3 (122, 133, and 142 patients at risk at time 0), log-rank P<0.0001.]
Figure 4. Application of unsupervised learning to HFpEF. A, Phenotype heat map of HFpEF. Columns represent individual study participants; rows represent individual features. B, Bayesian information criterion analysis for the identification of the optimal number of
phenotypic clusters (pheno-groups). C, Survival free of cardiovascular (CV) hospitalization or death: Kaplan-Meier curves for the combined outcome of heart failure hospitalization, cardiovascular hospitalization, or death, stratified by phenotypic cluster. HFpEF indicates heart failure with preserved ejection fraction.
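The model-selection idea behind panel B can be sketched with a Gaussian mixture model, whose Bayesian information criterion is compared across candidate cluster numbers. The data and the choice of mixture model are illustrative assumptions, not the phenomapping procedure itself:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# Three synthetic pheno-groups in a 5-feature phenotype space.
X = np.vstack([rng.normal(c, 1.0, size=(80, 5)) for c in (0, 4, 8)])

# Fit mixtures with 1 to 6 components; the lowest BIC suggests the
# best-supported number of clusters.
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(k, round(gmm.bic(X)))
```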
This is unsurprising, because problems across a broad range of fields, from finance to astronomy
to biology,13 can be readily reduced to the task of predicting
outcome from diverse features or finding recurring patterns
within multidimensional data sets. Medicine should not be
an exception. However, given the limited clinical footprint of
machine learning, some obstacles must be standing in the way
of translation.
Some of these obstacles may relate to pragmatic issues
relevant to the medical industry, including reimbursement and
liability. For example, our health system is reluctant to completely entrust a machine with a task that a human can do at
higher accuracy, even if there are substantial cost savings.
To be in a position to extract novel features, we must somehow find the appetite to collect large amounts of unbiased data
on many thousands of individuals without knowing that such
an effort will actually be useful. And it will not be enough to
collect such data on the training cohort alone. As the random survival forest (RSF) experience demonstrated, it is essential that the same informative
features in any promising model be collected on multiple independent cohorts for them to serve as test sets. Unfortunately,
such biologically informative features are likely to be costly
to acquire (unlike the tens of thousands of digital snapshots of
cats used as training data in image-processing applications39).
The final lesson is a technical one, related to the interplay
of unsupervised and supervised forms of learning. Deep learning, with stacked layers of increasingly higher order representations of objects, has taken the machine learning world
by storm.40 Deep learning uses unsupervised learning to first
find robust features, which can then be refined and ultimately
used as predictors in a final supervised model. Our work35 and
that involving attractor metagenes27 both suggest that such
techniques might be useful for patient data. In a deep learning
representation of human disease, lower layers could represent
clinical measurements (such as ECG data or protein biomarkers), intermediate layers could represent aberrant pathways
(which may simultaneously impact many biomarkers), and
top layers could represent disease subclasses (which arise
from the variable contributions of 1 or more aberrant pathways).
Ideally, such subclasses would do more than stratify by risk
and would actually reflect the dominant disease mechanism(s).
This raises a question about the underlying pathophysiologic
basis of complex disease in any given individual: is it sparsely
encoded in a limited set of aberrant pathways, which could
be recovered by an unsupervised learning process (albeit with
the right features collected and a large enough sample size),
or is it a diffuse, multifactorial process with hundreds of small
determinants combining in a highly variable way in different
individuals? In the latter case, the concept of precision medicine is unlikely to be of much utility. However, in the former
situation, unsupervised and perhaps deep learning might actually realize the elusive goal of reclassifying patients according
to more homogeneous subgroups, with shared pathophysiology and the potential for shared response to therapy.
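As a closing technical sketch of this layered idea, the toy program below pretrains an autoencoder on plentiful unlabeled "measurements" and then fine-tunes a subclass classifier on a small labeled set. It is written in PyTorch, and all dimensions, data, and the two-stage recipe are illustrative assumptions rather than a recommended disease model:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 100 clinical measurements -> 16 latent "pathway" units -> 3 subclasses.
encoder = nn.Sequential(nn.Linear(100, 16), nn.ReLU())
decoder = nn.Linear(16, 100)
head = nn.Linear(16, 3)

x_unlabeled = torch.randn(512, 100)   # plentiful unlabeled measurements
x_labeled = torch.randn(64, 100)      # scarce labeled examples
y_labeled = torch.randint(0, 3, (64,))

# Stage 1: unsupervised pretraining; learn features that reconstruct the input.
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x_unlabeled)), x_unlabeled)
    loss.backward()
    opt.step()

# Stage 2: supervised fine-tuning of a disease-subclass classifier on top.
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(encoder(x_labeled)), y_labeled)
    loss.backward()
    opt.step()
```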
Acknowledgments
I thank my clinical and scientific colleagues at UCSF and Dr Sanjiv
Shah for helpful discussions.
Sources of Funding
This work is funded by National Institutes of Health/National Heart,
Lung, and Blood Institute grants K08 HL093861, DP2 HL123228,
and U01 HL107440.
Disclosures
None.
References
1. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning.
New York, NY: Springer Science & Business Media; 2009.
2. Abu-Mostafa YS, Magdon-Ismail M, Lin HT. Learning From Data.
AMLbook.com; 2012.
23. Koren Y. The BellKor solution to the Netflix Grand Prize. Netflix Prize
Documentation. 2009.
24. Töscher A, Jahrer M, Bell RM. The BigChaos solution to the Netflix Grand Prize. Netflix Prize Documentation. 2009.
25. Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y,
Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G,
Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu
J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D,
Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman
J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D,
Hughes TR, Roth FP. A critical assessment of Mus musculus gene function
prediction using integrated genomic evidence. Genome Biol. 2008;9(suppl
1):S2. doi: 10.1186/gb-2008-9-s1-s2.
26. Margolin AA, Bilal E, Huang E, Norman TC, Ottestad L, Mecham BH,
Sauerwine B, Kellen MR, Mangravite LM, Furia MD, Vollan HK, Rueda
OM, Guinney J, Deflaux NA, Hoff B, Schildwachter X, Russnes HG,
Park D, Vang VO, Pirtle T, Youseff L, Citro C, Curtis C, Kristensen VN,
Hellerstein J, Friend SH, Stolovitzky G, Aparicio S, Caldas C, Børresen-Dale AL. Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci Transl Med. 2013;5:181re1.
doi: 10.1126/scitranslmed.3006112.
27. Cheng WY, Ou Yang TH, Anastassiou D. Development of a prognostic
model for breast cancer survival in an open challenge environment. Sci
Transl Med. 2013;5:181ra50. doi: 10.1126/scitranslmed.3005974.
28. Cheng WY, Ou Yang TH, Anastassiou D. Biomolecular events in cancer
revealed by attractor metagenes. PLoS Comput Biol. 2013;9:e1002920.
doi: 10.1371/journal.pcbi.1002920.
29. Udelson JE. Heart failure with preserved ejection fraction. Circulation.
2011;124:e540-e543. doi: 10.1161/CIRCULATIONAHA.111.071696.
30. Lee DD, Seung HS. Learning the parts of objects by non-negative matrix
factorization. Nature. 1999;401:788-791. doi: 10.1038/44565.
31. Kaufman L, Rousseeuw P. Clustering by means of medoids. Reports of the
Faculty of Mathematics and Informatics. 1987, Issue 87, Part 3. Delft, The
Netherlands: Delft University of Technology, Faculty of Mathematics and
Informatics.
32. Olshausen BA, Field DJ. Sparse coding with an overcomplete basis set: a
strategy employed by V1? Vision Res. 1997;37:3311-3325.
33. Lee H, Battle A, Raina R, Ng A. Efficient sparse coding algorithms.
Advances in Neural Information Processing Systems 19. 2006.
34. Yogatama D, Faruqui M, Dyer C, Smith NA. Learning word representations with hierarchical sparse coding. CoRR. 2014.
35. Shah SJ, Katz DH, Selvaraj S, Burke MA, Yancy CW, Gheorghiade M,
Bonow RO, Huang CC, Deo RC. Phenomapping for novel classification of
heart failure with preserved ejection fraction. Circulation. 2015;131:269-279. doi: 10.1161/CIRCULATIONAHA.114.010637.
36. Pitt B, Pfeffer MA, Assmann SF, Boineau R, Anand IS, Claggett B,
Clausell N, Desai AS, Diaz R, Fleg JL, Gordeev I, Harty B, Heitner JF,
Kenwood CT, Lewis EF, O'Meara E, Probstfield JL, Shaburishvili T,
Shah SJ, Solomon SD, Sweitzer NK, Yang S, McKinlay SM; TOPCAT
Investigators. Spironolactone for heart failure with preserved ejection fraction. N Engl J Med. 2014;370:1383-1392. doi: 10.1056/
NEJMoa1313731.
37. Ausiello D, Shaw S. Quantitative human phenotyping: the next frontier in
medicine. Trans Am Clin Climatol Assoc. 2014;125:219-226; discussion 226.
38. James ML, Gambhir SS. A molecular imaging primer: modalities, imaging agents, and applications. Physiol Rev. 2012;92:897-965. doi: 10.1152/
physrev.00049.2010.
39. Le QV. Building high-level features using large scale unsupervised learning. ICASSP 2013 - 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP). 2013:8595-8598.
40. Bengio Y. Learning deep architectures for AI. Found Trends Mach Learn. 2009;2:1-127.