Gil Alterovitz1,2, Andrew H. Xia2,3, Jeremy Warner4
1MIT PRIMES, Cambridge, MA; 2Harvard Medical School, Boston, MA; 3The Rivers School, Weston, MA; 4Vanderbilt University, Nashville, TN
Abstract
The current system for classifying cancer patients' stages was introduced more than one hundred years ago, and many parts of the system are outdated. Because the current system emphasizes invasive surgical procedures that could have undesirable outcomes, there has been a movement to develop a new taxonomy using molecular signatures to avoid surgical testing. This project explores the issues of the current classification system and potential ways to classify cancer patients' stages more effectively. Computerization has made a vast amount of cancer data available online. However, a significant portion of the data is incomplete; some crucial information is missing, and therefore we explored the possibility of recovering missing cancer data. Using various methods, we have shown that cancer stages cannot be simply extrapolated with incomplete data. Furthermore, a new approach of using RNA sequencing data is studied. RNA sequencing can potentially become a cost-efficient way to determine a cancer patient's stage. We have obtained promising results using RNA sequencing data in breast cancer staging.
Results
With clinical data, there was evidence that the clinical T, N, and M staging given in TCGA may not yield the correct overall TNM cancer stage. Also, the staging data is not random, as methods 3-6 show no Kappa relationship. With data missing, part 2 of the project becomes more necessary. A correlation has been shown between cancer staging and patients' RNA sequencing data. For example, the most significant gene shown, retinoblastoma binding protein 8 (RBBP8), has been proven to affect breast cancer development. Other genes may potentially have a cause-effect relationship, pending further research.
Methods
There were two parts to this project. The first part involved looking at clinical cancer data from The Cancer Genome Atlas (TCGA) and analyzing it with data tree functions. The second part of the project involved looking at RNA sequencing data, comparing it to the clinical data of TCGA, and looking for correlation.
In the first part of the project, the patients' T, N, and M cancer stages, as recorded in TCGA, were entered into a data tree with output TNM stage per AJCC standard staging. These calculated stages were compared against the overall TNM stage as recorded in TCGA.
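The data tree step can be sketched roughly as follows. This is a minimal illustration, not the authors' function: the rules are a simplified reading of AJCC 7th edition breast cancer stage grouping that collapses substages into I-IV and ignores Tis, T0, and N1mi, so the AJCC manual remains the authority. The names `tnm_stage` and `find_conflicts`, the record layout, and the sample patients are all hypothetical.

```python
from typing import Optional


def tnm_stage(t: Optional[int], n: Optional[int], m: Optional[int]) -> Optional[str]:
    """Collapse T, N, M categories into an overall stage I-IV.

    Simplified illustration only (assumption): substages, Tis, T0 and N1mi
    are ignored. None stands for an unknown category (TX, NX, MX).
    """
    if m == 1:
        return "IV"  # distant metastasis determines the stage by itself
    if t is None or n is None or m is None:
        return None  # cannot be calculated from incomplete data
    if n >= 2 or t == 4:
        return "III"
    if t == 3:
        return "III" if n == 1 else "II"
    if t == 2 or n == 1:
        return "II"
    return "I"  # T1 N0 M0


def find_conflicts(records):
    """Return the IDs whose recorded stage disagrees with the calculated one."""
    conflicts = []
    for rec in records:
        calculated = tnm_stage(rec["t"], rec["n"], rec["m"])
        recorded = rec["recorded"]
        if calculated is not None and recorded is not None and calculated != recorded:
            conflicts.append(rec["id"])
    return conflicts


# Hypothetical records, not TCGA data.
patients = [
    {"id": "P1", "t": 1, "n": 0, "m": 0, "recorded": "I"},
    {"id": "P2", "t": 2, "n": 1, "m": 0, "recorded": "III"},
    {"id": "P3", "t": None, "n": 0, "m": 0, "recorded": "II"},
]
print(find_conflicts(patients))
```

Records with an unknown T, N, or M category are skipped rather than counted as conflicts, which keeps missing data separate from genuine disagreement.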
Then, five different methods of imputed stage generation were evaluated against the calculated stages: 1) equal assignment of stages (25% I/II/III/IV); 2) assignment to the most common national stage; 3) assignment to the most common TCGA stage; 4) assignment by national distribution of stages; 5) assignment by TCGA distribution of stages.
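The five imputation baselines, scored with Cohen's kappa against the calculated stages, might look like the sketch below. It assumes kappa is the agreement measure, and the poster does not give the national stage distribution, so `national_weights` holds placeholder numbers rather than real statistics. All function names are hypothetical.

```python
import random
from collections import Counter

STAGES = ["I", "II", "III", "IV"]


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum((count_a[s] / n) * (count_b[s] / n) for s in set(a) | set(b))
    if expected == 1.0:
        return 1.0  # both labelings are the same single constant
    return (observed - expected) / (1.0 - expected)


def impute_from_distribution(n, weights, rng):
    """Draw n stages at random, proportionally to the given weights."""
    return rng.choices(STAGES, weights=[weights[s] for s in STAGES], k=n)


def run_baselines(calculated, national_weights, seed=0):
    """Score each of the five imputation methods against the calculated stages."""
    rng = random.Random(seed)
    n = len(calculated)
    counts = Counter(calculated)
    tcga_weights = {s: counts[s] for s in STAGES}
    top_national = max(national_weights, key=national_weights.get)
    top_tcga = counts.most_common(1)[0][0]
    imputed = {
        "1 equal assignment": impute_from_distribution(n, {s: 1 for s in STAGES}, rng),
        "2 most common national": [top_national] * n,
        "3 most common TCGA": [top_tcga] * n,
        "4 national distribution": impute_from_distribution(n, national_weights, rng),
        "5 TCGA distribution": impute_from_distribution(n, tcga_weights, rng),
    }
    return {name: cohen_kappa(calculated, stages) for name, stages in imputed.items()}


# Placeholder weights (assumption), NOT real national statistics.
national_weights = {"I": 0.4, "II": 0.3, "III": 0.2, "IV": 0.1}
calculated = ["I", "II", "II", "III", "IV", "II"]  # hypothetical calculated stages
for name, kappa in run_baselines(calculated, national_weights).items():
    print(name, round(kappa, 3))
```

Assigning every patient the same stage gives a kappa of exactly zero by construction. The random methods fluctuate around zero, so a kappa near zero indicates that imputed stages agree with calculated stages no better than chance.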
In the second part of the project, patients with RNA sequencing data were linked to their clinical data from the first part of the project. With clinical stages already determined from the first part, the patients' RNA sequencing data was analyzed in order to find any correlation between certain genes and cancer staging. A t-test was conducted on the several types of RNA sequencing data (normalized raw counts, scaled-estimate raw counts, and raw gene counts).
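A per-gene t-test of this kind could be sketched as follows. The poster does not say which t-test variant or which stage grouping was used, so this example assumes Welch's unequal-variance statistic and an early (I/II) versus late (III/IV) split. The gene names, patient IDs, and expression values are hypothetical, and only the t statistic is computed, not a p-value.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def rank_genes(expression, stage_of, early=("I", "II"), late=("III", "IV")):
    """Rank genes by |t| between early-stage and late-stage patients.

    expression: {gene: {patient_id: value}}; stage_of: {patient_id: stage}.
    Patients without a known stage are left out of both groups.
    """
    scores = {}
    for gene, values in expression.items():
        early_vals = [v for pid, v in values.items() if stage_of.get(pid) in early]
        late_vals = [v for pid, v in values.items() if stage_of.get(pid) in late]
        if len(early_vals) >= 2 and len(late_vals) >= 2:  # variance needs 2+ values
            scores[gene] = welch_t(early_vals, late_vals)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical stages and expression values, not TCGA data.
stage_of = {"P1": "I", "P2": "II", "P3": "I", "P4": "III", "P5": "IV", "P6": "III"}
expression = {
    "GENE_A": {"P1": 1.0, "P2": 2.0, "P3": 3.0, "P4": 4.0, "P5": 5.0, "P6": 6.0},
    "GENE_B": {"P1": 1.0, "P2": 2.0, "P3": 3.0, "P4": 1.0, "P5": 2.0, "P6": 3.0},
}
for gene, t in rank_genes(expression, stage_of):
    print(gene, round(t, 3))
```

With thousands of genes tested at once, any ranking like this would also need a multiple-testing correction before individual genes are called significant.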
Conclusion
- By creating a data tree function to analyze cancer patients' staging information and comparing it to the TCGA staging information, many conflicts were discovered, suggesting that the data may not be completely accurate.
- The method of using RNA sequencing data to analyze cancer patients' staging information has proven to be effective.
- Further research in this area may reveal stage-specific patterns of gene expression, which could allow for less invasive cancer staging.
Acknowledgements
Thank you very much to Andrew Xia's MIT PRIMES mentors, Slava Gerovitch and Pavel Etingof.