Introduction

The aim of these analysis notes is to give a basic introduction to PAM50 intrinsic subtypes and the risk of recurrence score [1] using a published breast cancer dataset. These notes are generated using an R code bundle available on GitHub; the bundle also aims to show how the ProjectTemplate framework can be used for a data analysis project.

The Mainz dataset

In these notes, we use the Mainz cohort [2] of primary breast cancer patients. It is a population-based cohort of patients with lymph node-negative disease who received no adjuvant therapy.

Gene-expression and clinical-pathological data were retrieved from the Gene Expression Omnibus (GEO) at NCBI (accession GSE11121) using the R/Bioconductor package GEOquery. Additionally, the oestrogen receptor status was retrieved from the R/Bioconductor data package breastCancerMAINZ.

Initial data explorations

Clinical-pathological characteristics

Table. Clinical-pathological characteristics of the Mainz cohort. ER: Oestrogen receptor; LA: Luminal A; LB: Luminal B; H2: HER2-enriched; BL: Basal-like; NBL: Normal breast-like.

Tumour size (cm)   Nodal status   ER status   Tumour grade   Intrinsic subtype
Min.   : 0.1       LN-: 200       ER-:  38    G1:  29        LA : 54
1st Qu.: 1.5       LN+:   0       ER+: 162    G2: 136        LB : 42
Median : 2.0                                  G3:  35        H2 : 32
Mean   : 2.1                                                 BL : 35
3rd Qu.: 2.4                                                 NBL: 37
Max.   : 6.0

t-SNE plots

t-distributed stochastic neighbour embedding (t-SNE) plots.

Figure. t-SNE plots of median-centered data and using one minus Spearman's correlation dissimilarity metric. (A) After non-specific filtering keeping top 20% most varying genes, and (B) based on the fifty PAM50 genes. t-SNE: t-distributed stochastic neighbour embedding.
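
The preprocessing named in the caption (non-specific filtering, median-centering, one minus Spearman's correlation) can be sketched in base R as follows. This is a minimal illustration on simulated data, not the code of the bundle: the matrix `expr`, the use of variance as the measure of variability, and the perplexity value are all assumptions made for the example.

```r
set.seed(42)

# Simulated stand-in for a log-expression matrix:
# 500 genes (rows) x 60 samples (columns). Not the Mainz data.
expr <- matrix(rnorm(500 * 60), nrow = 500,
               dimnames = list(paste0("gene", 1:500), paste0("sample", 1:60)))

# Non-specific filtering: keep the 20% of genes with the largest variance
# (assumption: variance is used as the measure of variability).
gene_var <- apply(expr, 1, var)
keep <- gene_var >= quantile(gene_var, probs = 0.80)
expr_top <- expr[keep, , drop = FALSE]

# Median-center each gene across samples.
expr_centered <- sweep(expr_top, 1, apply(expr_top, 1, median), FUN = "-")

# One minus Spearman's correlation between samples
# (cor() correlates the columns of its argument).
dissim <- as.dist(1 - cor(expr_centered, method = "spearman"))

# Optional: embed with Rtsne if the package is installed.
# The perplexity is an arbitrary choice for this toy example.
if (requireNamespace("Rtsne", quietly = TRUE)) {
  embedding <- Rtsne::Rtsne(as.matrix(dissim), is_distance = TRUE,
                            perplexity = 10)$Y
}
```

For panel (B), the same centering and dissimilarity steps would be applied to the rows corresponding to the PAM50 genes instead of the variance-filtered rows.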

Illustrative cluster heatmap

Figure. Semi-unsupervised clustering of the Mainz patients based on the PAM50 genes. Average-linkage hierarchical clustering using a one-minus-Spearman rank correlation dissimilarity metric, after gene-centering using the median. Yellow: higher than median gene expression; black: median; blue: lower than median. PAM50PROLIF: PAM50 proliferation index.
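
The clustering recipe in the caption can be sketched with base R alone. The example below uses a simulated matrix with two artificial sample groups; the object names are hypothetical and the data are not the PAM50 expression values.

```r
set.seed(7)

# Simulated stand-in: 50 genes x 40 samples, where the first 20 samples
# are shifted upwards in the first 25 genes (purely illustrative).
expr <- matrix(rnorm(50 * 40), nrow = 50)
expr[1:25, 1:20] <- expr[1:25, 1:20] + 3

# Gene-center using the median across samples
# (the vector of row medians is recycled down the columns).
centered <- expr - apply(expr, 1, median)

# One-minus-Spearman dissimilarity between samples, then average linkage.
sample_dissim <- as.dist(1 - cor(centered, method = "spearman"))
sample_tree <- hclust(sample_dissim, method = "average")

# The same recipe on the transposed matrix clusters the genes.
gene_dissim <- as.dist(1 - cor(t(centered), method = "spearman"))
gene_tree <- hclust(gene_dissim, method = "average")

# Cut the sample dendrogram into two groups and compare with the simulation.
groups <- cutree(sample_tree, k = 2)
table(groups, truth = rep(c("shifted", "unshifted"), each = 20))
```

The two dendrograms give the row and column orderings that a cluster heatmap displays; the colour scale then encodes the median-centered values.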

Associations with outcome

Since the Mainz dataset is a cohort of patients not receiving systemic therapy after surgery, the associations with outcome we observe are purely prognostic [3]. A biomarker can of course also be both prognostic and therapy predictive. An example is HER2 status.

Illustration of excess distant metastases plots

The association between the PAM50 proliferation index and outcome is illustrated using exploratory plots of excess distant metastases. A smoother with exploratory confidence band is superimposed in the scatterplot and the contributions from individual patients are shown with circles. The shape of the smoother indicates the form of an association between the index and risk of distant metastasis. Mathematically, the excess distant metastases are martingale residuals in a null Cox model. Corresponding plots for PAM50 intrinsic subtypes are added for comparison.
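
As a minimal sketch of the idea, the martingale residuals of a null Cox model can be computed with the survival package and smoothed against a continuous index. The data below are simulated, and the names `prolif`, `time` and `event` are hypothetical stand-ins rather than variables from the bundle.

```r
library(survival)
set.seed(1)

# Simulated cohort: a continuous index that increases the hazard,
# with random censoring between 5 and 15 years.
n <- 200
prolif <- rnorm(n)
t_event <- rexp(n, rate = 0.05 * exp(0.7 * prolif))
t_cens <- runif(n, 5, 15)
time <- pmin(t_event, t_cens)
event <- as.integer(t_event <= t_cens)

# Null Cox model (no covariates). Its martingale residuals are
# observed events minus expected events for each patient.
null_fit <- coxph(Surv(time, event) ~ 1)
excess <- residuals(null_fit, type = "martingale")

# Scatterplot smoother of the residuals against the index.
smooth <- lowess(prolif, excess)

plot(prolif, excess, xlab = "Index", ylab = "Excess events")
lines(smooth, lwd = 2)
```

Each residual is at most 1 (an event with little accumulated hazard), and the residuals sum to zero over the cohort, so the smoother shows where events are over- or under-represented relative to the average.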

Figure. Excess distant metastases in relation to PAM50 (A) intrinsic subtype and (B) proliferation index.

Figure. Corresponding Kaplan-Meier curves. The PAM50 proliferation index is categorised into quarters.
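
Categorising a continuous index into quarters and fitting Kaplan-Meier curves per quarter can be sketched as below. The data are simulated and the variable names are hypothetical; administrative censoring at 10 years is an assumption of the example only.

```r
library(survival)
set.seed(11)

# Simulated cohort with administrative censoring at 10 years.
n <- 200
prolif <- rnorm(n)
time <- rexp(n, rate = 0.05 * exp(0.5 * prolif))
event <- as.integer(time <= 10)
time <- pmin(time, 10)

# Quarters of the index: cut at the quartiles, keeping the minimum
# in the lowest group.
quarter <- cut(prolif, breaks = quantile(prolif, probs = 0:4 / 4),
               include.lowest = TRUE, labels = paste0("Q", 1:4))

# One Kaplan-Meier curve per quarter.
km <- survfit(Surv(time, event) ~ quarter)
plot(km, col = 1:4, xlab = "Years", ylab = "Event-free proportion")
legend("bottomleft", legend = levels(quarter), col = 1:4, lty = 1)
```

Quarters give equally sized groups, which keeps the curves comparably precise, at the cost of cut-points that are specific to the cohort at hand.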

Proliferation, intrinsic subtypes and outcome

See, for example, the section "Prognostic Signatures Within Intrinsic Subtypes" of the review by Ades et al. [4]

Figure. Proliferation, intrinsic subtypes and outcome.

Added value of ROR score in ER+ patients

One should judge a candidate biomarker by its ability to improve prognostic/predictive accuracy beyond known prognosticators/predictors [5].

Initial exploratory plots:

Figure. Excess distant metastases in relation to (upper row) Nottingham prognostic index and (lower row) PAM50 risk of recurrence score in (left column) ER+ patients and (right column) ER+/not H2 patients. NPI: Nottingham prognostic index (based on tumour size, lymph node status and histological grade); RORS: Risk of recurrence score (subtype alone).
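
For orientation, the NPI in its commonly published form is 0.2 times the tumour size in cm, plus a nodal score (1 to 3), plus the histological grade (1 to 3). The sketch below assumes that form; these notes do not state which variant the bundle uses, and the function name `npi` is hypothetical. In a node-negative cohort such as this one, the nodal score is 1 for every patient.

```r
# Nottingham prognostic index in its commonly published form
# (assumption: the notes do not spell out the formula they use).
# size_cm:    tumour size in cm
# grade:      histological grade, 1 to 3
# node_score: 1 = node-negative, 2 = 1-3 positive nodes, 3 = 4 or more
npi <- function(size_cm, grade, node_score = 1) {
  stopifnot(all(size_cm > 0),
            all(grade %in% 1:3),
            all(node_score %in% 1:3))
  0.2 * size_cm + node_score + grade
}

# Worked example: a 2.0 cm, grade 2, node-negative tumour.
npi(size_cm = 2.0, grade = 2)   # 0.4 + 1 + 2 = 3.4
```

Because the nodal term is constant here, the NPI in this cohort varies only through tumour size and grade.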

Formal statistical inference (compare with, for example, Dowsett et al. [6]):

Figure. Concordance indices to assess the added value of ROR-S beyond standard clinical-pathological prognosticators as represented by the Nottingham prognostic index (NPI) in (A) ER+ patients and (B) ER+/not H2 patients.
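
One way to quantify added value with concordance indices is to compare a Cox model containing the established index with a model that also contains the candidate score. The sketch below does this on simulated data with the survival package; the variables `npi` and `rors` are simulated stand-ins, and the approach shown is an assumption about how such a comparison can be set up, not necessarily the method used for the figure.

```r
library(survival)
set.seed(3)

# Simulated cohort: two correlated prognostic scores that both
# carry information about the hazard.
n <- 300
npi <- rnorm(n)
rors <- 0.5 * npi + rnorm(n)
t_event <- rexp(n, rate = 0.03 * exp(0.5 * npi + 0.5 * rors))
t_cens <- runif(n, 5, 15)
time <- pmin(t_event, t_cens)
event <- as.integer(t_event <= t_cens)

# Model with the established index alone, and with the candidate added.
fit_npi <- coxph(Surv(time, event) ~ npi)
fit_both <- coxph(Surv(time, event) ~ npi + rors)

# Concordance index of each model's linear predictor.
c_npi <- concordance(fit_npi)$concordance
c_both <- concordance(fit_both)$concordance
c(NPI = c_npi, NPI_RORS = c_both, gain = c_both - c_npi)
```

These are apparent (in-sample) concordance indices; an honest assessment of the gain would also account for its sampling uncertainty.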

Table. Likelihood ratio tests to assess the added value of ROR-S beyond NPI in the sub-populations ER+ patients and ER+/not H2 patients. Corresponding tests to assess the added value of NPI beyond ROR-S are also included for comparison.

Population   Comparison          Chisq   Df   P(>|Chi|)
ER+          NPI+RORS vs NPI     7.659   1    0.006
ER+          NPI+RORS vs RORS    1.748   1    0.186
ER+/not H2   NPI+RORS vs NPI     6.090   1    0.014
ER+/not H2   NPI+RORS vs RORS    4.781   1    0.029
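
The P-values in the table follow from the chi-square statistics with one degree of freedom, which can be checked directly. The second half of the sketch shows how such a statistic arises from two nested Cox models; the helper `lr_test` and the simulated data are hypothetical and only illustrate the calculation.

```r
library(survival)

# Part 1: recompute the tabulated P-values from the reported statistics.
lrt <- data.frame(
  population = c("ER+", "ER+", "ER+/not H2", "ER+/not H2"),
  comparison = c("NPI+RORS vs NPI", "NPI+RORS vs RORS",
                 "NPI+RORS vs NPI", "NPI+RORS vs RORS"),
  chisq = c(7.659, 1.748, 6.090, 4.781),
  df = 1
)
lrt$p <- pchisq(lrt$chisq, df = lrt$df, lower.tail = FALSE)
round(lrt$p, 3)   # 0.006 0.186 0.014 0.029

# Part 2: likelihood ratio test for two nested Cox models
# (hypothetical helper; 'small' must be nested in 'big').
lr_test <- function(small, big) {
  stat <- 2 * (big$loglik[2] - small$loglik[2])   # fitted log partial likelihoods
  df <- length(coef(big)) - length(coef(small))
  c(Chisq = stat, Df = df, P = pchisq(stat, df = df, lower.tail = FALSE))
}

# Simulated data, for illustration only.
set.seed(5)
n <- 300
npi <- rnorm(n)
rors <- 0.5 * npi + rnorm(n)
t_event <- rexp(n, rate = 0.03 * exp(0.5 * npi + 0.5 * rors))
t_cens <- runif(n, 5, 15)
time <- pmin(t_event, t_cens)
event <- as.integer(t_event <= t_cens)

fit_npi <- coxph(Surv(time, event) ~ npi)
fit_both <- coxph(Surv(time, event) ~ npi + rors)
res <- lr_test(fit_npi, fit_both)
res
```

Read this way, each row of the table asks whether dropping one term from the two-variable model worsens the fit by more than chance would explain.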

R session information

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.5
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] Rtsne_0.13           survplot_0.0.7       survival_2.42-6     
##  [4] ggplot2_3.0.0        Heatplus_2.26.0      genefilter_1.62.0   
##  [7] hgu133a.db_3.2.3     org.Hs.eg.db_3.6.0   AnnotationDbi_1.42.1
## [10] IRanges_2.14.10      S4Vectors_0.18.3     GEOquery_2.48.0     
## [13] Biobase_2.40.0       BiocGenerics_0.26.0 
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6          bit64_0.9-7           RColorBrewer_1.1-2   
##  [4] rprojroot_1.3-2       tools_3.5.1           backports_1.1.2      
##  [7] R6_2.2.2              rpart_4.1-13          KernSmooth_2.23-15   
## [10] Hmisc_4.1-1           DBI_1.0.0             lazyeval_0.2.1       
## [13] colorspace_1.3-2      nnet_7.3-12           withr_2.1.2          
## [16] tidyselect_0.2.4      gridExtra_2.3         bit_1.1-14           
## [19] compiler_3.5.1        htmlTable_1.12        xml2_1.2.0           
## [22] ProjectTemplate_0.8.2 labeling_0.3          caTools_1.17.1.1     
## [25] scales_0.5.0          checkmate_1.8.5       readr_1.1.1          
## [28] stringr_1.3.1         digest_0.6.15         foreign_0.8-71       
## [31] rmarkdown_1.10        base64enc_0.1-3       pkgconfig_2.0.1      
## [34] htmltools_0.3.6       limma_3.36.2          highr_0.7            
## [37] htmlwidgets_1.2       rlang_0.2.1           rstudioapi_0.7       
## [40] RSQLite_2.1.1         bindr_0.1.1           gtools_3.8.1         
## [43] acepack_1.4.1         dplyr_0.7.6           RCurl_1.95-4.11      
## [46] magrittr_1.5          Formula_1.2-3         Matrix_1.2-14        
## [49] Rcpp_0.12.17          munsell_0.5.0         stringi_1.2.4        
## [52] yaml_2.1.19           gplots_3.0.1          plyr_1.8.4           
## [55] grid_3.5.1            blob_1.1.1            gdata_2.18.0         
## [58] crayon_1.3.4          lattice_0.20-35       cowplot_0.9.3        
## [61] splines_3.5.1         annotate_1.58.0       hms_0.4.2            
## [64] knitr_1.20            pillar_1.3.0          XML_3.98-1.12        
## [67] glue_1.3.0            evaluate_0.11         latticeExtra_0.6-28  
## [70] data.table_1.11.4     gtable_0.2.0          purrr_0.2.5          
## [73] tidyr_0.8.1           assertthat_0.2.0      xtable_1.8-2         
## [76] tibble_1.4.2          memoise_1.1.0         bindrcpp_0.2.2       
## [79] cluster_2.0.7-1

Copyright © 2015-2018 by John Lövrot. This work is licensed under a Creative Commons Attribution 4.0 International License.
The source code is available at github.com/lovrot/misc-examples-pam50.
Version 0.11.0


References

1. Parker JS, Mullins M, Cheang MC, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160-1167. doi:10.1200/JCO.2008.18.1370

2. Schmidt M, Bohm D, Torne C von, et al. The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 2008;68(13):5405-5413. doi:10.1158/0008-5472.CAN-07-5206

3. Ballman KV. Biomarker: Predictive or Prognostic? J Clin Oncol. 2015;33(33):3968-3971. doi:10.1200/JCO.2015.63.3651

4. Ades F, Zardavas D, Bozovic-Spasojevic I, et al. Luminal B breast cancer: molecular characterization, clinical management, and future perspectives. J Clin Oncol. 2014;32(25):2794-2803. doi:10.1200/JCO.2013.54.1870

5. Kattan MW. Judging new markers by their ability to improve predictive accuracy. J Natl Cancer Inst. 2003;95(9):634-635. doi:10.1093/jnci/95.9.634

6. Dowsett M, Sestak I, Lopez-Knowles E, et al. Comparison of PAM50 risk of recurrence score with oncotype DX and IHC4 for predicting risk of distant recurrence after endocrine therapy. J Clin Oncol. 2013;31(22):2783-2790. doi:10.1200/JCO.2012.46.1558