Aim

The aim of this report is to reproduce the statistical analyses of the nested case-control study GSE48091, which formed part of the original research article by Cunha, Bocci, Lövrot, et al. [1,2].

Since not all clinical-pathological data are public, in particular disease stage (tumour size and lymph node status), the analyses are not identical to those in the article.

Background

To be added.

Data analysis

Co-expression analysis

Endothelial genes co-expressed with ACVRL1.

Table (cf. Suppl. Tab. 1 in the original article). Endothelial genes co-expressed with ACVRL1 in the nested case-control study.
Gene name   Correlation coefficient   95% CI lower limit   95% CI upper limit
TIE1        0.76                      0.73                 0.80
PECAM1      0.69                      0.64                 0.73
CD34        0.67                      0.63                 0.71
ESAM        0.75                      0.71                 0.78
CDH5        0.77                      0.74                 0.80
VWF         0.80                      0.77                 0.83
FLI1        0.63                      0.58                 0.67
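As an illustration of how a correlation coefficient and its 95% confidence interval can be obtained, here is a minimal Python sketch (the report itself was produced in R). It assumes a Pearson correlation with a Fisher z-transformation interval; the report does not state which estimator or interval method was used, and the data below are synthetic stand-ins, not GSE48091. The names `pearson_r` and `fisher_ci` are hypothetical helpers.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Synthetic stand-in data (assumption, not the study data):
# two positively correlated expression-like variables.
rng = random.Random(1)
acvrl1 = [rng.gauss(0, 1) for _ in range(200)]
other_gene = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in acvrl1]

r = pearson_r(acvrl1, other_gene)
ci_low, ci_high = fisher_ci(r, len(acvrl1))
print(f"r = {r:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The interval is symmetric on the z scale but not on the correlation scale, which is why the limits in a table like the one above need not be equidistant from the estimate.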

Statistical inference

Univariate and multivariable conditional logistic regression models are used to compare patients developing metastatic disease with patients free from disseminating disease in the nested case-control study, where the controls are randomly matched to cases by age, adjuvant systemic therapy and calendar period at diagnosis.
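To make the conditional logistic model concrete, the following Python sketch fits a single-covariate model to matched sets by maximising the conditional likelihood, in which each set contributes exp(beta * x_case) divided by the sum of exp(beta * x_j) over all set members. It is a toy illustration under stated assumptions (one case per set, one covariate, a damped Newton-Raphson update, made-up values); the report's own models were fitted in R, and `fit_clogit_1d` is a hypothetical helper name.

```python
import math


def fit_clogit_1d(sets, max_iter=100, tol=1e-10):
    """Maximise the conditional log-likelihood for one covariate.

    `sets` is a list of (x_case, [x_control, ...]) tuples, one case per set.
    Returns the estimated log odds ratio per unit of x.
    """
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log-likelihood
        info = 0.0   # minus the second derivative
        for x_case, x_controls in sets:
            xs = [x_case] + list(x_controls)
            shift = max(beta * x for x in xs)  # numerical stability
            w = [math.exp(beta * x - shift) for x in xs]
            total = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / total
            second = sum(wi * x * x for wi, x in zip(w, xs)) / total
            score += x_case - mean
            info += second - mean * mean
        if info <= 0.0:
            break
        # Newton step, capped at +/-1 to avoid overshooting.
        step = max(-1.0, min(1.0, score / info))
        beta += step
        if abs(step) < tol:
            break
    return beta


# Made-up matched sets (assumption): covariate value of the case,
# followed by the values of its matched controls.
toy_sets = [
    (1.2, [0.1, -0.3]),
    (0.4, [0.9, -1.0]),
    (-0.2, [-0.8, 0.3]),
    (1.5, [0.2, 0.7]),
    (0.0, [0.5, -0.6]),
    (0.8, [-0.1, -0.4]),
]

beta_hat = fit_clogit_1d(toy_sets)
print(f"log ratio = {beta_hat:.3f}, ratio = {math.exp(beta_hat):.3f}")
```

Because the likelihood conditions on each matched set, the matching factors (age, therapy, calendar period) drop out and need no coefficients of their own.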

Since the clinical-pathological data on tumour size, lymph node status and HER2 status are not public, the analyses are not identical to the ones in the article. HER2-enriched intrinsic subtype as determined by PAM50 [3] (versus rest) is used as a surrogate for clinical HER2 status.

Moreover, within-therapy-group associations for proliferation, as represented by the PAM50-PROLIF index [4], are added, because proliferation is a strong prognostic factor in ER-positive breast cancer (enriched in the endocrine therapy groups) and a (potential) predictive factor for response to chemotherapy, with opposite directions of association with outcome. See also the illustrative figure in the Supportive information below.

Table (cf. Tab. 1 in the original article). Univariate and multivariable conditional logistic regression models of the nested case-control study. Per standard-deviation hazard ratios (HR) for continuous variables. ET, endocrine therapy; CT, chemotherapy; H2, HER2-enriched intrinsic subtype; PAM50PROLIF: PAM50 proliferation index. Statistical significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1.
coef                             HR univariate models  P univariate models  HR multivariable model A  P multivariable model A  HR multivariable model B  P multivariable model B
scale(ACVRL1)                    1.92                  ***                  4.09                      ***
scale(endothelial_metagene)      1.14                                       0.44                      ***
scale(ACVRL1_endothelial_index)  2.17                  ***                                                                     2.16                      ***
h2statH2                         2.48                  ***                  1.84                      *                        1.87                      **
trtgrpET:scale(PAM50PROLIF)      2.00                  ***                  2.19                      ***                      1.77                      **
trtgrpCT+ET:scale(PAM50PROLIF)   1.53                  *                    1.74                      **                       1.40                      .
trtgrpCT:scale(PAM50PROLIF)      0.53                  ***                  0.83                                               0.57                      **

Endothelial metagene expression is the average expression of the prototypical endothelial cell markers PECAM1, CDH5, and CD34.
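The averaging step, together with the per-standard-deviation scaling implied by the `scale(...)` terms in the table, can be sketched as follows. This is a Python illustration rather than the report's R code; the expression values are invented, and `metagene` and `scale` are hypothetical helper names (the latter mimicking centring and division by the sample standard deviation).

```python
from statistics import fmean, stdev


def metagene(expr, genes):
    """Per-sample average expression over the listed genes."""
    n_samples = len(expr[genes[0]])
    return [fmean(expr[g][i] for g in genes) for i in range(n_samples)]


def scale(values):
    """Centre and divide by the sample standard deviation."""
    mu = fmean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


# Invented log-expression values for four samples (assumption).
expr = {
    "PECAM1": [8.1, 9.0, 7.5, 8.8],
    "CDH5": [7.9, 9.4, 7.2, 8.5],
    "CD34": [8.3, 8.9, 7.8, 9.1],
}

endothelial_metagene = metagene(expr, ["PECAM1", "CDH5", "CD34"])
endothelial_metagene_scaled = scale(endothelial_metagene)
print(endothelial_metagene)
```

After scaling, a one-unit change in the covariate corresponds to one standard deviation, which is what makes the ratios for different continuous variables comparable.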

Supportive information

Relating ACVRL1 to prototypical endothelial gene-expression makes the association with risk of disseminating disease coherent across treatment groups.

Figure (cf. Suppl. Fig. 3A and 3B in the original article). (A) Exploratory plots of excess distant metastases versus the ACVRL1:endothelial metagene index, stratified by adjuvant systemic therapy. ET, endocrine therapy; CT, chemotherapy. (B) Box plots show similar manifestation of the ACVRL1:endothelial metagene across all molecular subtypes of breast cancer. LA, luminal A; LB, luminal B; H2, HER2-enriched; BL, basal-like; NBL, normal breast-like.


The shape of the smoother in an excess distant metastases plot indicates the form of an association between the gene expression variable and risk of metastatic disease. Mathematically, the excess distant metastases are martingale residuals in a null conditional logistic regression model.
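One way to read the "excess" quantity: under a null model with no covariates, every member of a matched set is equally likely to be the case, so the expected number of events per member is the number of cases in the set divided by the set size. The residual is then the observed case indicator minus that expectation. The Python sketch below assumes this Breslow-type form of the null-model martingale residual; the report does not spell out the computation, and `excess_events` is a hypothetical name.

```python
from collections import defaultdict


def excess_events(set_ids, is_case):
    """Observed minus expected events per subject under the null model."""
    set_size = defaultdict(int)
    n_cases = defaultdict(int)
    for s, c in zip(set_ids, is_case):
        set_size[s] += 1
        n_cases[s] += c
    return [c - n_cases[s] / set_size[s] for s, c in zip(set_ids, is_case)]


# Toy example (assumption): one 1:2 matched set and one 1:1 matched set.
set_ids = ["a", "a", "a", "b", "b"]
is_case = [1, 0, 0, 1, 0]
residuals = excess_events(set_ids, is_case)
print(residuals)
```

Plotting these residuals against a gene expression variable and adding a smoother gives a curve whose shape suggests the functional form of the association, since the residuals sum to zero within every set.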

Figure. Illustration of the association with outcome for proliferation status of the tumour as exemplified by the PAM50-PROLIF index.


Addendum

Addendum/Correction [2], GSE81954

The correlation between gene expression data for original and reextracted RNA is excellent for key breast cancer genes, for example, ESR1 (r = 0.95) and ERBB2 (r = 0.96).

Gene name Correlation coefficient
ESR1 0.952
ERBB2 0.956

Bridging the primary comparison, case–control set differences (n = 40) for ACVRL1 and the ACVRL1:endothelial metagene index that we reported are consistent between the two extractions (Fig. 1). A case–control set difference is the value for the case minus the (average) value of the matched control(s).
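The definition of a case-control set difference translates directly into code. The Python sketch below (illustrative only; the values and the helper name `case_control_set_differences` are invented, and one case per set is assumed) computes the case value minus the mean of its matched controls for each set.

```python
from collections import defaultdict
from statistics import fmean


def case_control_set_differences(records):
    """Case value minus mean control value, per matched set.

    `records` is an iterable of (set_id, is_case, value) tuples,
    assuming exactly one case and at least one control per set.
    """
    case_value = {}
    control_values = defaultdict(list)
    for set_id, is_case, value in records:
        if is_case:
            case_value[set_id] = value
        else:
            control_values[set_id].append(value)
    return {s: case_value[s] - fmean(control_values[s]) for s in case_value}


# Invented values (assumption): set s1 has two controls, set s2 has one.
records = [
    ("s1", True, 2.0),
    ("s1", False, 1.0),
    ("s1", False, 1.5),
    ("s2", True, 0.5),
    ("s2", False, 0.9),
]

differences = case_control_set_differences(records)
print(differences)
```

Computing the differences separately for each RNA extraction and plotting one against the other is one way to check that the two extractions agree.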

Figure. Case-control set differences.


Additional information

Additional illustrative predictive models.

Training of illustrative models

The full study is split into a training set (2/3) and a test set (1/3), with the QC substudy as part of the test set.

Table. Number of case-control sets in each partition of the full study.
              Training set   Test set   Sum
QC substudy   0              40         40
(rest)        126            24         150
Sum           126            64         190
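A split of this kind can be sketched as below. The Python code draws a training set of two thirds of all case-control sets (rounded down) from the non-QC sets only, so that every QC set lands in the test set. With 190 sets and 40 QC sets this happens to give the same counts as the table, but the actual sampling rule, seed and set identifiers used in the report are not stated, so everything here is an assumption, including the helper name `split_case_control_sets`.

```python
import random


def split_case_control_sets(set_ids, qc_ids, train_fraction=2 / 3, seed=1):
    """Split matched sets into training and test, keeping QC sets in test."""
    rng = random.Random(seed)
    qc = set(qc_ids)
    n_train = int(len(set_ids) * train_fraction)  # rounds down
    candidates = [s for s in set_ids if s not in qc]
    train = set(rng.sample(candidates, n_train))
    test = [s for s in set_ids if s not in train]
    return sorted(train), test


# Hypothetical set identifiers: 190 sets, the first 40 being the QC substudy.
set_ids = list(range(190))
qc_ids = set_ids[:40]

train, test = split_case_control_sets(set_ids, qc_ids)
print(len(train), len(test))
```

Splitting at the level of whole case-control sets, rather than individual patients, keeps each case together with its matched controls.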

A panel of models is trained and tuned on the training set using repeated cross-validation, a resampling technique. For details, see reproduce-cunha15canres/addendum-train-illustrative-models.html
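For readers unfamiliar with repeated cross-validation, the resampling scheme can be sketched in a few lines of Python. The number of folds and repeats below are illustrative choices, not the settings used for the report, and `repeated_kfold` is a hypothetical helper; the linked page remains the authoritative description.

```python
import random


def repeated_kfold(n_items, n_folds=5, n_repeats=3, seed=1):
    """Yield (repeat, fold, fit_indices, held_out_indices) tuples.

    Each repeat reshuffles the items and partitions them into n_folds
    held-out groups; a model is fitted on the rest and scored on the group.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        for fold in range(n_folds):
            held_out = order[fold::n_folds]
            held = set(held_out)
            fit_on = [i for i in order if i not in held]
            yield repeat, fold, fit_on, held_out


# 126 items, matching the number of training case-control sets.
splits = list(repeated_kfold(n_items=126))
print(len(splits))
```

Averaging a performance measure over all folds and repeats gives a more stable estimate for comparing tuning parameter values than a single split would.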

Performance in training and test sets

Selected model: Logistic Regression with Elastic Net Regularisation

Figure. Performance in the training and test sets as assessed by area under the ROC curve.
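The area under the ROC curve has a simple interpretation: it is the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The Python sketch below computes it that way on invented scores; it is an illustration of the measure, not the code behind the figure, and `auc` is a hypothetical helper name.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented predicted scores (assumption).
case_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]

print(auc(case_scores, control_scores))
```

A value of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation.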


Figure. ROC curves for the test set.


Table. Illustrative statistical inference of the test set. HR: hazard ratio; ET, endocrine therapy; CT, chemotherapy; PAM50PROLIF: PAM50 proliferation index. Statistical significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1.
coef                             HR univariate models  P univariate models  HR multivariable model  P multivariable model
model_pred_class_case            14.08                 ***                  16.76                   ***
h2statH2                         6.35                  ***                  4.25                    *
trtgrpET:scale(PAM50PROLIF)      4.68                  **                   5.73                    **
trtgrpCT+ET:scale(PAM50PROLIF)   1.31                                       1.25
trtgrpCT:scale(PAM50PROLIF)      0.69                                       1.15

Performance in QC substudy

Figure. ROC curves for the QC substudy.


Figure. Case-control set differences.


R session information

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.4
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] pROC_1.9.1              randomForest_4.6-12    
##  [3] glmnet_2.0-5            foreach_1.4.3          
##  [5] Matrix_1.2-9            caret_6.0-76           
##  [7] lattice_0.20-35         survival_2.41-3        
##  [9] HuRSTA2a520709.db_1.0.0 org.Hs.eg.db_3.4.1     
## [11] AnnotationDbi_1.38.0    IRanges_2.10.0         
## [13] S4Vectors_0.14.0        GEOquery_2.42.0        
## [15] Biobase_2.36.0          BiocGenerics_0.22.0    
## [17] dplyr_0.5.0             purrr_0.2.2            
## [19] readr_1.1.0             tidyr_0.6.1            
## [21] tibble_1.3.0            ggplot2_2.2.1          
## [23] tidyverse_1.1.1        
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.2.1          jsonlite_1.4        splines_3.4.0      
##  [4] gtools_3.5.0        modelr_0.1.0        assertthat_0.2.0   
##  [7] highr_0.6           cellranger_1.1.0    yaml_2.1.14        
## [10] RSQLite_1.1-2       backports_1.0.5     quantreg_5.33      
## [13] digest_0.6.12       minqa_1.2.4         rvest_0.3.2        
## [16] colorspace_1.3-2    cowplot_0.7.0       htmltools_0.3.5    
## [19] plyr_1.8.4          psych_1.7.3.21      XML_3.98-1.6       
## [22] broom_0.4.2         SparseM_1.77        haven_1.0.0        
## [25] genefilter_1.58.0   xtable_1.8-2        scales_0.4.1       
## [28] MatrixModels_0.4-1  lme4_1.1-13         annotate_1.54.0    
## [31] mgcv_1.8-17         car_2.1-4           nnet_7.3-12        
## [34] lazyeval_0.2.0      pbkrtest_0.4-7      mnormt_1.5-5       
## [37] magrittr_1.5        readxl_1.0.0        memoise_1.1.0      
## [40] evaluate_0.10       nlme_3.1-131        MASS_7.3-47        
## [43] forcats_0.2.0       xml2_1.1.1          foreign_0.8-68     
## [46] tools_3.4.0         hms_0.3             ProjectTemplate_0.7
## [49] stringr_1.2.0       munsell_0.4.3       compiler_3.4.0     
## [52] nloptr_1.0.4        grid_3.4.0          RCurl_1.95-4.8     
## [55] iterators_1.0.8     bitops_1.0-6        labeling_0.3       
## [58] rmarkdown_1.5       gtable_0.2.0        ModelMetrics_1.1.0 
## [61] codetools_0.2-15    DBI_0.6-1           reshape2_1.4.2     
## [64] R6_2.2.0            lubridate_1.6.0     knitr_1.15.1       
## [67] rprojroot_1.2       stringi_1.1.5       Rcpp_0.12.10

© 2017 John Lövrot.
This work is licensed under a Creative Commons Attribution 4.0 International License.
The source code is available at github.com/lovrot/reproduce-cunha15canres.
Version 0.0.0.9005

References

1. Cunha SI, Bocci M, Lövrot J, et al. Endothelial ALK1 is a therapeutic target to block metastatic dissemination of breast cancer. Cancer Res. 2015;75(12):2445-2456. doi:10.1158/0008-5472.CAN-14-3706.

2. Correction: Endothelial ALK1 is a therapeutic target to block metastatic dissemination of breast cancer. Cancer Res. 2016;76(20):6131-6132. doi:10.1158/0008-5472.CAN-16-2220.

3. Parker JS, Mullins M, Cheang MC, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160-1167. doi:10.1200/JCO.2008.18.1370.

4. Nielsen TO, Parker JS, Leung S, et al. A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor-positive breast cancer. Clin Cancer Res. 2010;16(21):5222-5232. doi:10.1158/1078-0432.CCR-10-1282.