The aim of this report is to reproduce the statistical analyses of the nested case-control study GSE48091 as part of the original research article by Cunha, Bocci, Lövrot, et al. [1,2]
Since not all clinical-pathological data is public, in particular disease stage (tumour size and lymph node status), the analyses are not identical to the ones in the article.
To be added.
Endothelial genes co-expressed with ACVRL1.
| Gene name | Correlation coefficient | 95% CI lower limit | 95% CI upper limit |
|---|---|---|---|
| TIE1 | 0.76 | 0.73 | 0.80 |
| PECAM1 | 0.69 | 0.64 | 0.73 |
| CD34 | 0.67 | 0.63 | 0.71 |
| ESAM | 0.75 | 0.71 | 0.78 |
| CDH5 | 0.77 | 0.74 | 0.80 |
| VWF | 0.80 | 0.77 | 0.83 |
| FLI1 | 0.63 | 0.58 | 0.67 |
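As an illustration of how a correlation coefficient and its 95% confidence interval of the kind tabulated above can be obtained, the following is a minimal sketch using the Pearson correlation and the Fisher z-transformation. It is an assumption that the report's intervals were computed this way; the function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation via the Fisher z-transformation.

    ASSUMPTION: approximate normality of atanh(r) with standard error
    1 / sqrt(n - 3); n is the number of samples behind r.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Hypothetical example: r = 0.5 estimated from 103 samples.
lower, upper = fisher_ci(0.5, 103)
print(round(lower, 2), round(upper, 2))
```

The interval is computed on the transformed scale and mapped back with `tanh`, so it is not symmetric around the estimate.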
Univariate and multivariable conditional logistic regression models are used to compare patients developing metastatic disease with patients free from disseminating disease in the nested case-control study, where the controls are randomly matched to cases by age, adjuvant systemic therapy and calendar period at diagnosis.
Since the clinical-pathological data on tumour size, lymph node status and HER2 status are not public, the analyses are not identical to the ones in the article. HER2-enriched intrinsic subtype as determined by PAM50 [3] (versus rest) is used as a surrogate for clinical HER2 status.
Moreover, within-therapy-group associations for proliferation, as represented by the PAM50-PROLIF index [4], are added, since proliferation is a strong prognostic factor in ER-positive breast cancer (enriched in the endocrine therapy groups) and a (potential) predictive factor for response to chemotherapy, with opposite direction of association with outcome. See also the illustrative figure in the Supportive information below.
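To make the modelling approach concrete, here is a minimal sketch of the conditional log-likelihood that a conditional logistic regression maximises for matched sets with one case each. It is not the code used for this report; the function name and the toy data are hypothetical, and the exponentiated coefficient of a standardised covariate corresponds to the kind of per-standard-deviation effect reported in the table below.

```python
import math


def conditional_log_likelihood(beta, matched_sets):
    """Conditional log-likelihood for matched sets with exactly one case.

    matched_sets: list of (case_x, [control_x, ...]) where each x is a list
    of covariate values with the same length as beta.
    """
    total = 0.0
    for case_x, controls_x in matched_sets:
        eta_case = sum(b * x for b, x in zip(beta, case_x))
        etas = [eta_case] + [
            sum(b * x for b, x in zip(beta, cx)) for cx in controls_x
        ]
        # Log-sum-exp over all members of the set, stabilised by the maximum.
        m = max(etas)
        log_denominator = m + math.log(sum(math.exp(e - m) for e in etas))
        total += eta_case - log_denominator
    return total


# Hypothetical toy data: one covariate, two matched sets.
toy_sets = [
    ([1.2], [[0.3], [-0.5]]),  # case with two matched controls
    ([0.1], [[0.4]]),          # case with one matched control
]
print(conditional_log_likelihood([0.0], toy_sets))
print(conditional_log_likelihood([0.5], toy_sets))
```

Each set contributes the log-probability that the observed case, rather than one of its controls, is the member that developed metastatic disease; the matching factors cancel out of this ratio, which is why they need no coefficients of their own.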
| coef | HR univariate models | P univariate models | HR multivariable model A | P multivariable model A | HR multivariable model B | P multivariable model B |
|---|---|---|---|---|---|---|
| scale(ACVRL1) | 1.92 | *** | 4.09 | *** | | |
| scale(endothelial_metagene) | 1.14 | | 0.44 | *** | | |
| scale(ACVRL1_endothelial_index) | 2.17 | *** | | | 2.16 | *** |
| h2statH2 | 2.48 | *** | 1.84 | * | 1.87 | ** |
| trtgrpET:scale(PAM50PROLIF) | 2.00 | *** | 2.19 | *** | 1.77 | ** |
| trtgrpCT+ET:scale(PAM50PROLIF) | 1.53 | * | 1.74 | ** | 1.40 | . |
| trtgrpCT:scale(PAM50PROLIF) | 0.53 | *** | 0.83 | | 0.57 | ** |
Endothelial metagene expression is the average expression of the prototypical endothelial cell markers PECAM1, CDH5, and CD34.
Relating ACVRL1 to prototypical endothelial gene-expression makes the association with risk of disseminating disease coherent across treatment groups.
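The two derived variables can be sketched as follows. The metagene is the average of the three markers as stated above; the exact definition of the index is an assumption here, namely the difference between ACVRL1 and the metagene on a log2 expression scale (a log-ratio). The function names and example values are hypothetical.

```python
from statistics import mean

# Prototypical endothelial cell markers named in the report.
ENDOTHELIAL_MARKERS = ("PECAM1", "CDH5", "CD34")


def endothelial_metagene(sample):
    """Average expression of the endothelial markers for one sample."""
    return mean(sample[gene] for gene in ENDOTHELIAL_MARKERS)


def acvrl1_endothelial_index(sample):
    """ACVRL1 relative to the endothelial metagene.

    ASSUMPTION: expression values are on a log2 scale, so "relative to"
    is taken as a difference.
    """
    return sample["ACVRL1"] - endothelial_metagene(sample)


# Hypothetical log2 expression values for a single tumour sample.
sample = {"ACVRL1": 8.0, "PECAM1": 9.0, "CDH5": 10.0, "CD34": 8.0}
print(endothelial_metagene(sample))      # 9.0
print(acvrl1_endothelial_index(sample))  # -1.0
```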
Figure (cf. Suppl. Fig. 3A and 3B in the original article). (A) Exploratory plots of excess distant metastases versus the ACVRL1:endothelial metagene index, stratified by adjuvant systemic therapy. ET, endocrine therapy; CT, chemotherapy. (B) Box plots show similar manifestation of the ACVRL1:endothelial metagene across all molecular subtypes of breast cancer. LA, luminal A; LB, luminal B; H2, HER2-enriched; BL, basal-like; NBL, normal breast-like.
The shape of the smoother in an excess distant metastases plot indicates the form of an association between the gene expression variable and risk of metastatic disease. Mathematically, the excess distant metastases are martingale residuals in a null conditional logistic regression model.
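A minimal sketch of these residuals, assuming the null model is fitted as a stratified Cox model in which all members of a matched set share one time point and the baseline hazard is estimated with the Breslow estimator: every member of a set then has the same expected number of events, so the residual is the case indicator minus the fraction of cases in the set. The function name and the toy records are hypothetical.

```python
from collections import defaultdict


def null_martingale_residuals(records):
    """Observed minus expected events under a null matched-set model.

    records: list of (set_id, is_case) tuples.
    Returns the residuals in the same order as the input.
    """
    set_size = defaultdict(int)
    cases_in_set = defaultdict(int)
    for set_id, is_case in records:
        set_size[set_id] += 1
        cases_in_set[set_id] += int(is_case)
    return [
        int(is_case) - cases_in_set[set_id] / set_size[set_id]
        for set_id, is_case in records
    ]


# Hypothetical example: set "a" has one case and two controls,
# set "b" has one case and one control.
records = [("a", True), ("a", False), ("a", False), ("b", True), ("b", False)]
print(null_martingale_residuals(records))
```

The residuals sum to zero within each matched set. Plotting them against a gene expression variable with a smoother gives the kind of exploratory plot described above.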
Figure. Illustration of the association with outcome for proliferation status of the tumour as exemplified by the PAM50-PROLIF index.
Addendum/Correction [2], GSE81954
The correlation between gene expression data for original and reextracted RNA is excellent for key breast cancer genes, for example, ESR1 (r = 0.95) and ERBB2 (r = 0.96).
| Gene name | Correlation coefficient |
|---|---|
| ESR1 | 0.952 |
| ERBB2 | 0.956 |
Bridging the primary comparison, case–control set differences (n = 40) for ACVRL1 and the ACVRL1:endothelial metagene index that we reported are consistent between the two extractions (Fig. 1). A case–control set difference is the value for the case minus the (average) value of the matched control(s).
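The definition of a case-control set difference translates directly into code. The sketch below assumes one case per matched set; the function name and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def set_differences(records):
    """Case value minus the average value of the matched controls.

    records: iterable of (set_id, is_case, value) tuples, with one case
    per set (an assumption of this sketch).
    Returns a dict mapping set_id to the difference; sets without any
    control are skipped.
    """
    case_value = {}
    control_values = defaultdict(list)
    for set_id, is_case, value in records:
        if is_case:
            case_value[set_id] = value
        else:
            control_values[set_id].append(value)
    return {
        set_id: case_value[set_id] - mean(control_values[set_id])
        for set_id in case_value
        if control_values[set_id]
    }


# Hypothetical expression values for two matched sets.
records = [
    ("s1", True, 2.0), ("s1", False, 1.0), ("s1", False, 2.0),
    ("s2", True, 1.0), ("s2", False, 1.5),
]
print(set_differences(records))  # {'s1': 0.5, 's2': -0.5}
```

Computing the differences separately for the original and the re-extracted RNA and plotting one against the other is one way to check the consistency described above.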
Figure. Case-control set differences.
Additional illustrative predictive models.
The full study is split into a training set (2/3) and a test set (1/3), with the QC substudy as part of the test set.
| | Training set | Test set | Sum |
|---|---|---|---|
| QC substudy | 0 | 40 | 40 |
| (rest) | 126 | 24 | 150 |
| Sum | 126 | 64 | 190 |
A panel of models is trained and tuned on the training set using repeated cross-validation, a resampling technique. For details, see reproduce-cunha15canres/addendum-train-illustrative-models.html
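For readers unfamiliar with the resampling scheme, the following is a minimal sketch of how repeated k-fold cross-validation partitions the training samples. The number of folds, the number of repeats and the seed are hypothetical and not taken from the report, and the function name is made up for illustration.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=3, seed=1):
    """Yield (repeat, fold, train_indices, held_out_indices).

    Each repeat reshuffles the samples and splits them into n_folds
    disjoint held-out parts; the remaining samples form the training part.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            held_out = order[fold::n_folds]
            held_out_set = set(held_out)
            train = [i for i in order if i not in held_out_set]
            yield repeat, fold, train, held_out


# 126 is the size of the training set in the table above.
for repeat, fold, train, held_out in repeated_kfold(126):
    print(repeat, fold, len(train), len(held_out))
```

Each candidate model and tuning value is fitted on the training part and scored on the held-out part, and the scores are averaged over all folds and repeats. In a matched design it may be preferable to keep the members of a matched set in the same fold, which this sketch does not do.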
Selected model: Logistic Regression with Elastic Net Regularisation
Figure. Performance in the training and test sets as assessed by area under the ROC curve.
Figure. ROC curves for the test set.
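The area under the ROC curve can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it from that definition; it is an illustration rather than the routine used for the figures, and the function name and example data are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise case-control comparisons.

    scores: predicted scores, higher meaning more case-like.
    labels: 1 for cases, 0 for controls.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predictions for two cases and two controls.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to a prediction no better than chance and 1.0 to perfect separation of cases and controls.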
| coef | HR univariate models | P univariate models | HR multivariable model | P multivariable model |
|---|---|---|---|---|
| model_pred_class_case | 14.08 | *** | 16.76 | *** |
| h2statH2 | 6.35 | *** | 4.25 | * |
| trtgrpET:scale(PAM50PROLIF) | 4.68 | ** | 5.73 | ** |
| trtgrpCT+ET:scale(PAM50PROLIF) | 1.31 | | 1.25 | |
| trtgrpCT:scale(PAM50PROLIF) | 0.69 | | 1.15 | |
Figure. ROC curves for the QC substudy.
Figure. Case-control set differences.
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] pROC_1.9.1 randomForest_4.6-12
## [3] glmnet_2.0-5 foreach_1.4.3
## [5] Matrix_1.2-9 caret_6.0-76
## [7] lattice_0.20-35 survival_2.41-3
## [9] HuRSTA2a520709.db_1.0.0 org.Hs.eg.db_3.4.1
## [11] AnnotationDbi_1.38.0 IRanges_2.10.0
## [13] S4Vectors_0.14.0 GEOquery_2.42.0
## [15] Biobase_2.36.0 BiocGenerics_0.22.0
## [17] dplyr_0.5.0 purrr_0.2.2
## [19] readr_1.1.0 tidyr_0.6.1
## [21] tibble_1.3.0 ggplot2_2.2.1
## [23] tidyverse_1.1.1
##
## loaded via a namespace (and not attached):
## [1] httr_1.2.1 jsonlite_1.4 splines_3.4.0
## [4] gtools_3.5.0 modelr_0.1.0 assertthat_0.2.0
## [7] highr_0.6 cellranger_1.1.0 yaml_2.1.14
## [10] RSQLite_1.1-2 backports_1.0.5 quantreg_5.33
## [13] digest_0.6.12 minqa_1.2.4 rvest_0.3.2
## [16] colorspace_1.3-2 cowplot_0.7.0 htmltools_0.3.5
## [19] plyr_1.8.4 psych_1.7.3.21 XML_3.98-1.6
## [22] broom_0.4.2 SparseM_1.77 haven_1.0.0
## [25] genefilter_1.58.0 xtable_1.8-2 scales_0.4.1
## [28] MatrixModels_0.4-1 lme4_1.1-13 annotate_1.54.0
## [31] mgcv_1.8-17 car_2.1-4 nnet_7.3-12
## [34] lazyeval_0.2.0 pbkrtest_0.4-7 mnormt_1.5-5
## [37] magrittr_1.5 readxl_1.0.0 memoise_1.1.0
## [40] evaluate_0.10 nlme_3.1-131 MASS_7.3-47
## [43] forcats_0.2.0 xml2_1.1.1 foreign_0.8-68
## [46] tools_3.4.0 hms_0.3 ProjectTemplate_0.7
## [49] stringr_1.2.0 munsell_0.4.3 compiler_3.4.0
## [52] nloptr_1.0.4 grid_3.4.0 RCurl_1.95-4.8
## [55] iterators_1.0.8 bitops_1.0-6 labeling_0.3
## [58] rmarkdown_1.5 gtable_0.2.0 ModelMetrics_1.1.0
## [61] codetools_0.2-15 DBI_0.6-1 reshape2_1.4.2
## [64] R6_2.2.0 lubridate_1.6.0 knitr_1.15.1
## [67] rprojroot_1.2 stringi_1.1.5 Rcpp_0.12.10
© 2017 John Lövrot.
This work is licensed under a Creative Commons Attribution 4.0 International License.
The source code is available at github.com/lovrot/reproduce-cunha15canres.
Version 0.0.0.9005
1. Cunha SI, Bocci M, Lövrot J, et al. Endothelial ALK1 is a therapeutic target to block metastatic dissemination of breast cancer. Cancer Res. 2015;75(12):2445-2456. doi:10.1158/0008-5472.CAN-14-3706.
2. Correction: Endothelial ALK1 is a therapeutic target to block metastatic dissemination of breast cancer. Cancer Res. 2016;76(20):6131-6132. doi:10.1158/0008-5472.CAN-16-2220.
3. Parker JS, Mullins M, Cheang MC, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27(8):1160-1167. doi:10.1200/JCO.2008.18.1370.
4. Nielsen TO, Parker JS, Leung S, et al. A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor-positive breast cancer. Clin Cancer Res. 2010;16(21):5222-5232. doi:10.1158/1078-0432.CCR-10-1282.