The aim of these analysis notes is to give a basic introduction to PAM50 intrinsic subtypes and the risk of recurrence score [1] using a published breast cancer dataset. The notes are generated with an R code bundle available on GitHub; the bundle also shows how the ProjectTemplate framework can be used for a data analysis project.
In these notes, we use the Mainz cohort [2] of primary breast cancer patients, a population-based cohort of patients with lymph node-negative disease who received no adjuvant therapy.
Gene-expression and clinicopathological data were retrieved from the Gene Expression Omnibus (GEO) at NCBI (accession GSE11121) using the R/Bioconductor package GEOquery. Additionally, the oestrogen receptor (ER) status was retrieved from the R/Bioconductor data package breastCancerMAINZ.
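The retrieval step could look roughly like the sketch below. It assumes that the Bioconductor packages GEOquery, Biobase and breastCancerMAINZ are installed and that GEO is reachable over the network. The helper name `load_mainz_data`, the layout of the returned list and the 0/1 coding of ER status are illustrative assumptions, not taken from the code bundle.

```r
# Hypothetical helper (not from the original code bundle): fetch the Mainz
# series from GEO and the ER status from the breastCancerMAINZ data package.
load_mainz_data <- function(accession = "GSE11121") {
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet objects
  gse <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  eset <- gse[[1]]

  # The data package provides an ExpressionSet named `mainz`
  env <- new.env()
  utils::data("mainz", package = "breastCancerMAINZ", envir = env)

  list(
    expression = Biobase::exprs(eset),          # probes x samples matrix
    clinical   = Biobase::pData(eset),          # GEO sample annotation
    er_status  = Biobase::pData(env$mainz)$er   # assumed coding: 0 = ER-, 1 = ER+
  )
}
```

Calling `load_mainz_data()` downloads the series matrix on first use, so it is best run once and the result cached.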
Tumour size (cm) | Nodal status | ER status | Tumour grade | Intrinsic subtype |
---|---|---|---|---|
Min.: 0.1 | LN-: 200 | ER-: 38 | G1: 29 | LA: 54 |
1st Qu.: 1.5 | LN+: 0 | ER+: 162 | G2: 136 | LB: 42 |
Median: 2.0 | | | G3: 35 | H2: 32 |
Mean: 2.1 | | | | BL: 35 |
3rd Qu.: 2.4 | | | | NBL: 37 |
Max.: 6.0 | | | | |
t-distributed stochastic neighbour embedding (t-SNE) plots.
Since the Mainz dataset is a cohort of patients who did not receive systemic therapy after surgery, the associations with outcome that we observe are purely prognostic [3]. A biomarker can, of course, be both prognostic and predictive of therapy response; HER2 status is an example.
The association between the PAM50 proliferation index and outcome is illustrated using exploratory plots of excess distant metastases. A smoother with an exploratory confidence band is superimposed on the scatterplot, and the contributions from individual patients are shown as circles. The shape of the smoother indicates the functional form of the association between the index and the risk of distant metastasis. Mathematically, the excess distant metastases are the martingale residuals from a null Cox model, that is, observed minus expected events for each patient. Corresponding plots for PAM50 intrinsic subtypes are added for comparison.
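The construction described above can be sketched in a few lines with the survival package. This is a minimal illustration on simulated data, not the Mainz data. The variable names (`prolif`, `obs_time`, `event`, `excess`) and the choice of `lowess()` as smoother are assumptions made for the example.

```r
library(survival)

set.seed(1)
n <- 200

# Simulated stand-ins (assumptions, not the Mainz data)
prolif <- rnorm(n)                                  # proliferation index
time   <- rexp(n, rate = 0.05 * exp(0.5 * prolif))  # time to distant metastasis
cens   <- rexp(n, rate = 0.03)                      # censoring time
obs_time <- pmin(time, cens)
event    <- as.integer(time <= cens)

# Null Cox model: no covariates, so expected events depend on follow-up only
fit0 <- coxph(Surv(obs_time, event) ~ 1)

# Martingale residuals = observed minus expected events per patient
excess <- residuals(fit0, type = "martingale")

# Smooth the residuals against the index to explore the functional form
sm <- lowess(prolif, excess)

if (interactive()) {
  plot(prolif, excess,
       xlab = "Proliferation index", ylab = "Excess distant metastases")
  lines(sm, lwd = 2)
  abline(h = 0, lty = 2)
}
```

A smoother that rises with the index suggests a higher risk at higher index values; a roughly straight line suggests that a linear term may be adequate in a subsequent Cox model.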
See, for example, the section “Prognostic Signatures Within Intrinsic Subtypes” of the review by Ades et al. [4]
One should judge a candidate biomarker by its ability to improve prognostic/predictive accuracy beyond known prognostic/predictive factors [5].
Initial exploratory plots:
Formal statistical inference (compare with, for example, Dowsett et al. [6]):
Population | Comparison | Chisq | Df | P(>\|Chi\|) |
---|---|---|---|---|
ER+ | NPI+RORS vs NPI | 7.659 | 1 | 0.006 |
ER+ | NPI+RORS vs RORS | 1.748 | 1 | 0.186 |
ER+/not H2 | NPI+RORS vs NPI | 6.090 | 1 | 0.014 |
ER+/not H2 | NPI+RORS vs RORS | 4.781 | 1 | 0.029 |
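Comparisons of this kind can be obtained as likelihood ratio tests between nested Cox models. The sketch below uses simulated data and assumes that NPI denotes the Nottingham Prognostic Index (0.2 × tumour size in cm + nodal score + grade, with nodal score 1 for node-negative disease) and that RORS denotes the subtype-based risk of recurrence score. The helper `lrt` and all variable names are hypothetical, and the exact model specification used for the table above is not known from the text.

```r
library(survival)

set.seed(2)
n <- 160

# Simulated stand-ins (assumptions, not the Mainz data)
size_cm <- round(runif(n, 0.5, 5), 1)
grade   <- sample(1:3, n, replace = TRUE)
rors    <- rnorm(n, mean = 50, sd = 15)

# Nottingham Prognostic Index; nodal score is 1 for all node-negative patients
npi <- 0.2 * size_cm + 1 + grade

haz  <- 0.02 * exp(0.4 * (npi - mean(npi)) + 0.03 * (rors - 50))
time <- rexp(n, rate = haz)
cens <- rexp(n, rate = 0.02)
d <- data.frame(obs_time = pmin(time, cens),
                event    = as.integer(time <= cens),
                npi      = npi,
                rors     = rors)

fit_npi  <- coxph(Surv(obs_time, event) ~ npi,        data = d)
fit_rors <- coxph(Surv(obs_time, event) ~ rors,       data = d)
fit_both <- coxph(Surv(obs_time, event) ~ npi + rors, data = d)

# Likelihood ratio test of a reduced model nested in a full model
lrt <- function(reduced, full) {
  chisq <- 2 * (full$loglik[2] - reduced$loglik[2])
  df    <- length(coef(full)) - length(coef(reduced))
  c(Chisq = chisq, Df = df, P = pchisq(chisq, df, lower.tail = FALSE))
}

res <- rbind("NPI+RORS vs NPI"  = lrt(fit_npi,  fit_both),
             "NPI+RORS vs RORS" = lrt(fit_rors, fit_both))
print(round(res, 3))
```

The first row asks whether RORS adds prognostic information beyond NPI, the second whether NPI adds information beyond RORS. Calling `anova(fit_npi, fit_both)` on the two fits should give an equivalent test.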
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] Rtsne_0.13 survplot_0.0.7 survival_2.42-6
## [4] ggplot2_3.0.0 Heatplus_2.26.0 genefilter_1.62.0
## [7] hgu133a.db_3.2.3 org.Hs.eg.db_3.6.0 AnnotationDbi_1.42.1
## [10] IRanges_2.14.10 S4Vectors_0.18.3 GEOquery_2.48.0
## [13] Biobase_2.40.0 BiocGenerics_0.26.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-6 bit64_0.9-7 RColorBrewer_1.1-2
## [4] rprojroot_1.3-2 tools_3.5.1 backports_1.1.2
## [7] R6_2.2.2 rpart_4.1-13 KernSmooth_2.23-15
## [10] Hmisc_4.1-1 DBI_1.0.0 lazyeval_0.2.1
## [13] colorspace_1.3-2 nnet_7.3-12 withr_2.1.2
## [16] tidyselect_0.2.4 gridExtra_2.3 bit_1.1-14
## [19] compiler_3.5.1 htmlTable_1.12 xml2_1.2.0
## [22] ProjectTemplate_0.8.2 labeling_0.3 caTools_1.17.1.1
## [25] scales_0.5.0 checkmate_1.8.5 readr_1.1.1
## [28] stringr_1.3.1 digest_0.6.15 foreign_0.8-71
## [31] rmarkdown_1.10 base64enc_0.1-3 pkgconfig_2.0.1
## [34] htmltools_0.3.6 limma_3.36.2 highr_0.7
## [37] htmlwidgets_1.2 rlang_0.2.1 rstudioapi_0.7
## [40] RSQLite_2.1.1 bindr_0.1.1 gtools_3.8.1
## [43] acepack_1.4.1 dplyr_0.7.6 RCurl_1.95-4.11
## [46] magrittr_1.5 Formula_1.2-3 Matrix_1.2-14
## [49] Rcpp_0.12.17 munsell_0.5.0 stringi_1.2.4
## [52] yaml_2.1.19 gplots_3.0.1 plyr_1.8.4
## [55] grid_3.5.1 blob_1.1.1 gdata_2.18.0
## [58] crayon_1.3.4 lattice_0.20-35 cowplot_0.9.3
## [61] splines_3.5.1 annotate_1.58.0 hms_0.4.2
## [64] knitr_1.20 pillar_1.3.0 XML_3.98-1.12
## [67] glue_1.3.0 evaluate_0.11 latticeExtra_0.6-28
## [70] data.table_1.11.4 gtable_0.2.0 purrr_0.2.5
## [73] tidyr_0.8.1 assertthat_0.2.0 xtable_1.8-2
## [76] tibble_1.4.2 memoise_1.1.0 bindrcpp_0.2.2
## [79] cluster_2.0.7-1
Copyright © 2015-2018 by John Lövrot. This work is licensed under a Creative Commons Attribution 4.0 International License.
The source code is available at github.com/lovrot/misc-examples-pam50.
Version 0.11.0
1. Parker JS, Mullins M, Cheang MC, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160-1167. doi:10.1200/JCO.2008.18.1370
2. Schmidt M, Böhm D, von Törne C, et al. The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 2008;68(13):5405-5413. doi:10.1158/0008-5472.CAN-07-5206
3. Ballman KV. Biomarker: Predictive or Prognostic? J Clin Oncol. 2015;33(33):3968-3971. doi:10.1200/JCO.2015.63.3651
4. Ades F, Zardavas D, Bozovic-Spasojevic I, et al. Luminal B breast cancer: molecular characterization, clinical management, and future perspectives. J Clin Oncol. 2014;32(25):2794-2803. doi:10.1200/JCO.2013.54.1870
5. Kattan MW. Judging new markers by their ability to improve predictive accuracy. J Natl Cancer Inst. 2003;95(9):634-635. doi:10.1093/jnci/95.9.634
6. Dowsett M, Sestak I, Lopez-Knowles E, et al. Comparison of PAM50 risk of recurrence score with oncotype DX and IHC4 for predicting risk of distant recurrence after endocrine therapy. J Clin Oncol. 2013;31(22):2783-2790. doi:10.1200/JCO.2012.46.1558