CalMoE predicts disease-specific survival for seven cancer types using patient age, pathologic tumor stage, histopathology imaging, and gene expression data. This page documents how it works, what it can do, and - equally - what it can't.
What is CalMoE?
CalMoE is a calibrated survival prediction model. Unlike most AI survival tools, which produce rankings without validating whether their probability estimates are reliable, CalMoE passes formal calibration testing: when it says 70%, we have evidence that approximately 70% of similar patients actually survive.
How it works
CalMoE is a Mixture-of-Experts fusion model. Each modality - age & stage, histopathology, RNA - is encoded by a specialist expert; a gating network learns which expert to weight for each patient. The output is a full survival curve S(t), not a single risk score.
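The fusion step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not CalMoE's actual architecture: the discrete-time hazard parameterization, the gate logits, and the function names `softmax`, `fuse_hazards`, and `survival_curve` are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_hazards(expert_hazards, gate_logits):
    """Gate-weighted average of per-expert discrete-time hazards.

    expert_hazards: one list per expert, each holding a hazard in (0, 1)
                    for time bins k = 0..K-1 (assumed parameterization).
    gate_logits:    one raw gating score per expert for this patient.
    """
    weights = softmax(gate_logits)
    n_bins = len(expert_hazards[0])
    return [
        sum(w * h[k] for w, h in zip(weights, expert_hazards))
        for k in range(n_bins)
    ]

def survival_curve(hazards):
    """S(t_k) = prod_{j <= k} (1 - h_j); non-increasing by construction."""
    curve, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        curve.append(s)
    return curve

# Toy patient: three experts (age & stage, histopathology, RNA), four time bins.
experts = [
    [0.05, 0.05, 0.05, 0.05],
    [0.10, 0.08, 0.06, 0.04],
    [0.02, 0.04, 0.06, 0.08],
]
curve = survival_curve(fuse_hazards(experts, gate_logits=[0.0, 1.0, -1.0]))
```

Because the gate weights sum to one, the fused hazards stay in (0, 1), and the resulting curve is a full S(t) over the time bins rather than a single risk score.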
Plain-language definition of calibration: if our model tells 100 patients they each have a 70% chance of 5-year survival, approximately 70 of them will actually survive 5 years. Most AI survival models are never tested for this property.
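The definition above can be turned into a simple check: group patients by predicted probability and compare the mean prediction in each group with the fraction who actually survived. The sketch below is illustrative only. It ignores censoring, which a real survival calibration test must handle, and `calibration_table` is a hypothetical helper, not part of CalMoE.

```python
def calibration_table(predicted, survived, n_bins=5):
    """Compare mean predicted survival with observed survival per bin.

    predicted: predicted probability of surviving to a fixed horizon.
    survived:  1 if the patient survived to that horizon, else 0
               (censoring is ignored in this simplified sketch).
    Returns a list of (mean_predicted, observed_fraction, n_patients).
    """
    pairs = sorted(zip(predicted, survived))
    size = len(pairs)
    rows = []
    for b in range(n_bins):
        # Equal-sized bins over patients sorted by predicted probability.
        chunk = pairs[b * size // n_bins:(b + 1) * size // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows

# The example from the text: 100 patients told 70%, of whom 70 survive.
rows = calibration_table([0.7] * 100, [1] * 70 + [0] * 30, n_bins=1)
```

For a well-calibrated model the first two entries of every row are close to each other; here both are approximately 0.7.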
Performance
Held-out concordance index (C-index) across 5-fold site-stratified cross-validation on TCGA cohorts. Higher is better.
Cohort                     CalMoE   MMP (2024)    SurvPath (2024)  MCAT (2021)
BRCA                       0.783    0.753         0.709            0.648
KIRC                       0.864    0.748         0.738            0.670
COADREAD                   0.756    0.636         0.539            0.578
LUAD                       0.728    0.643         0.612            0.615
BLCA                       0.702    0.628         0.619            0.619
STAD                       0.658    0.580         0.556            0.528
HNSC                       0.673    -             0.600            0.531
Mean (6 shared)            0.749    0.665         0.629            0.610
Calibration (BH-adjusted)  27 / 30  not reported  not reported     not reported
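For readers unfamiliar with the C-index values reported above, here is a minimal sketch of how a concordance index can be computed. It assumes Harrell's definition, which this page does not explicitly confirm, and `concordance_index` is an illustrative helper rather than the evaluation code used for the table.

```python
def concordance_index(times, events, risks):
    """Harrell's C: the share of comparable pairs that are ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    and times[i] < times[j]. It is concordant when the model gave i the
    higher risk; ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: the third patient is censored at t = 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])   # 1.0
inverted = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])  # 0.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Note that the C-index only measures ordering; it says nothing about whether the predicted probabilities themselves are calibrated.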
Training data
The Cancer Genome Atlas (TCGA) cohorts, seven cancer types (BRCA, BLCA, LUAD, KIRC, STAD, COADREAD, HNSC)
Disease-Specific Survival (DSS) endpoint
5-fold site-stratified cross-validation (no patient or acquisition site appears in both train and test)
UNI2-h feature extractor for whole-slide histopathology
50 hallmark gene pathway signatures for genomics
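The site-stratified split listed above can be illustrated with a grouped fold assignment. This is a sketch under assumptions: the greedy size balancing, the `seed` parameter, and the helper name `site_grouped_folds` are hypothetical, and the patient and site identifiers are made up.

```python
import random
from collections import defaultdict

def site_grouped_folds(patient_sites, n_folds=5, seed=0):
    """Assign whole sites to folds so no site is split across folds.

    patient_sites: dict mapping patient id -> acquisition site id.
    Returns n_folds lists of patient ids; each list is one held-out set.
    """
    by_site = defaultdict(list)
    for pid, site in patient_sites.items():
        by_site[site].append(pid)

    sites = sorted(by_site)
    random.Random(seed).shuffle(sites)  # random tie-breaking, reproducible
    # Stable sort: largest sites first, shuffle order kept among equal sizes.
    sites.sort(key=lambda s: len(by_site[s]), reverse=True)

    folds = [[] for _ in range(n_folds)]
    for site in sites:
        smallest = min(folds, key=len)  # greedy balancing by patient count
        smallest.extend(by_site[site])
    return folds

# Toy cohort: 40 patients spread over 7 hypothetical sites.
patient_sites = {f"P{i}": f"site{i % 7}" for i in range(40)}
folds = site_grouped_folds(patient_sites)
```

Since each patient belongs to one site and each site to one fold, training on four folds and testing on the fifth keeps both patients and sites disjoint between train and test.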
Limitations
Trained on TCGA data (US academic centers, primarily White / European patients). May not generalize to other populations.
The web calculator matches your inputs to pre-computed predictions from 2,874 TCGA patients. Each patient's curve was generated by the full multimodal model (WSI + RNA + clinical). Your estimate is a weighted average of similar patients - not a direct model inference on your data.
Predictions are population-level estimates, not individual guarantees.
The model does not account for treatment effects, so it cannot produce treatment-specific predictions.
Not validated prospectively.
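To make the web calculator limitation concrete, the sketch below shows one way a weighted average over similar patients could work. The similarity rule here (exact stage match plus a Gaussian kernel on age), the `bandwidth` value, and the name `lookup_estimate` are assumptions for illustration; this page does not specify the calculator's actual matching scheme.

```python
import math

def lookup_estimate(query_age, query_stage, reference, bandwidth=5.0):
    """Kernel-weighted average of pre-computed curves from similar patients.

    reference: list of (age, stage, curve) tuples, where curve is a list of
               survival probabilities on a shared time grid.
    """
    total_weight = 0.0
    averaged = None
    for age, stage, curve in reference:
        if stage != query_stage:
            continue  # assumed rule: only same-stage patients contribute
        weight = math.exp(-0.5 * ((age - query_age) / bandwidth) ** 2)
        if averaged is None:
            averaged = [0.0] * len(curve)
        for k, s in enumerate(curve):
            averaged[k] += weight * s
        total_weight += weight
    if averaged is None or total_weight == 0.0:
        raise ValueError("no usable reference patients for this query")
    return [a / total_weight for a in averaged]

# Hypothetical reference set with three patients and a two-point time grid.
reference = [
    (60, "II", [0.9, 0.8]),
    (70, "II", [0.7, 0.6]),
    (65, "III", [0.5, 0.2]),
]
estimate = lookup_estimate(65, "II", reference)
```

In this toy case the two stage II patients are equally distant in age, so the estimate is the plain average of their curves. The key point is that the output is assembled from other patients' curves; no model is run on the user's own slides or RNA data.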
Calibration methodology
1-calibration (Hosmer-Lemeshow) evaluated at the median event time per cohort
Benjamini-Hochberg FDR correction across 30 fold-by-cohort tests
Platt scaling fit on training folds and applied to held-out validation
CiPOT conformal post-hoc adjustment for prediction-interval coverage
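The Benjamini-Hochberg step in the list above can be sketched as follows. The function `bh_adjust` and the example p-values are illustrative only, and the 0.05 threshold and the convention that a test "passes" when its adjusted p-value stays above that threshold (no evidence of miscalibration) are assumptions, not details confirmed on this page.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values standing in for per-fold, per-cohort calibration tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)

alpha = 0.05  # assumed significance threshold
n_passing = sum(1 for p in adjusted if p > alpha)
```

Applied to the 30 fold-by-cohort tests, a count such as `n_passing` is what a summary like "27 / 30" would report under these assumptions.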