
Integrated Gene Expression Probabilistic Models for Cancer Staging

Gil Alterovitz1,2, Andrew H. Xia2,3, Jeremy Warner4
1MIT PRIMES, Cambridge, MA; 2Harvard Medical School, Boston, MA; 3The Rivers School, Weston, MA; 4Vanderbilt University, Nashville, TN

Abstract
The current system for classifying cancer patients' stages was introduced more than one hundred years ago, and many parts of the system are outdated. Because the current system emphasizes invasive surgical procedures that could have undesirable outcomes, there has been a movement to develop a new taxonomy using molecular signatures to avoid surgical testing. This project explores the issues of the current classification system and potential ways to classify cancer patients' stages more effectively. Computerization has made a vast amount of cancer data available online. However, a significant portion of the data is incomplete; some crucial information is missing, and we therefore explored the possibility of recovering missing cancer data. Using various methods, we have shown that cancer stages cannot be simply extrapolated from incomplete data. Furthermore, a new approach using RNA sequencing data is studied. RNA sequencing can potentially become a cost-efficient way to determine a cancer patient's stage. We have obtained promising results using RNA sequencing data in breast cancer staging.

Results
With clinical data, there was evidence that the clinical T, N, and M staging given in TCGA may not yield the correct overall TNM cancer stage. Also, the staging data is not random, as methods 3-6 show no Kappa relationship. With missing data, part 2 of the project becomes more necessary. A correlation was observed between cancer staging and patients' RNA sequencing data. For example, the most significant gene identified, retinoblastoma binding protein 8 (RBBP8), has been shown to affect breast cancer development. Other genes may potentially have a cause-effect relationship, pending further research.
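The Kappa comparison mentioned above can be sketched as follows. This is a minimal illustration, assuming the statistic used was Cohen's kappa; the function name `cohens_kappa` and the eight stage labels are hypothetical and are not TCGA data. It shows why assigning every patient the most common stage gives a kappa of zero even when half the labels match.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equal-length sequences of stage labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of patients whose two labels match.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement, computed from the marginal label frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_e = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both sequences are the same constant;
        # returning 1.0 is a convention chosen for this sketch only.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical calculated stages for eight patients (made up for illustration).
calculated = ["I", "II", "II", "III", "II", "IV", "I", "II"]

# Imputation by the most common stage gives every patient the same label.
most_common = Counter(calculated).most_common(1)[0][0]
imputed = [most_common] * len(calculated)

kappa_constant = cohens_kappa(calculated, imputed)    # agreement no better than chance
kappa_perfect = cohens_kappa(calculated, calculated)  # identical labels
```

Here the constant assignment matches four of eight patients, but chance agreement is also one half, so the kappa is zero. Raw agreement alone would overstate how well such an imputation recovers the stages.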

Methods
There were two parts to this project. The first part involved analyzing clinical cancer data from The Cancer Genome Atlas (TCGA) with data tree functions. The second part involved comparing RNA sequencing data with the TCGA clinical data and looking for correlations.

In the first part of the project, the patients' T, N, and M cancer stages, as recorded in TCGA, were entered into a data tree that outputs the TNM stage per AJCC standard staging. These calculated stages were compared against the overall TNM stage as recorded in TCGA. Then, five different methods of imputed stage generation were evaluated against the calculated stages: 1) equal assignment of stages (25% each to I/II/III/IV); 2) assignment to the most common national stage; 3) assignment to the most common TCGA stage; 4) assignment by national distribution of stages; 5) assignment by TCGA distribution of stages.

In the second part of the project, patients with RNA sequencing data were linked to their clinical data from the first part. With clinical stages already determined, the patients' RNA sequencing data was analyzed to find any correlation between certain genes and cancer staging. A t-test was conducted on the several types of RNA sequencing data (raw counts normalized, raw counts scaled estimate, raw counts of genes).
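The gene-level t-test described above might look like the following sketch. Several details are assumptions, since the poster does not specify them: the split into early (I/II) and late (III/IV) stages, the use of Welch's unequal-variance form of the test, and the helper names `welch_t` and `rank_genes`. The gene names and expression values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two values")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)


def rank_genes(expression, stages, early=("I", "II")):
    """Rank genes by |t| between early-stage and late-stage patients.

    The early/late grouping is an assumption made for this sketch.
    """
    scores = {}
    for gene, values in expression.items():
        early_vals = [v for v, s in zip(values, stages) if s in early]
        late_vals = [v for v, s in zip(values, stages) if s not in early]
        scores[gene] = welch_t(early_vals, late_vals)
    # Largest absolute t statistic first.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical stages and normalized counts for six patients (made-up values).
stages = ["I", "II", "I", "III", "IV", "III"]
expression = {
    "GENE_A": [10.0, 12.0, 11.0, 30.0, 32.0, 31.0],  # differs strongly by stage
    "GENE_B": [5.0, 7.0, 6.0, 6.0, 5.0, 7.0],        # same mean in both groups
}

ranking = rank_genes(expression, stages)
```

The sketch ranks genes by the size of the t statistic and does not compute p-values. With many genes tested at once, a multiple-testing correction would normally be applied before calling any single gene significant.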


Conclusion
- Creating a data tree function to analyze cancer patients' staging information and comparing its output to the TCGA staging information revealed many conflicts, suggesting that the data may not be completely accurate.
- Using RNA sequencing data to analyze cancer patients' staging information gave promising results.
- Further research in this area may reveal stage-specific patterns of gene expression, which could allow for less invasive cancer staging.

Acknowledgements
Thank you very much to Andrew Xia's MIT PRIMES mentors, Slava Gerovitch, and Pavel Etingof.