Methods

CalMoE predicts disease-specific survival for seven cancer types using patient age, pathologic tumor stage, histopathology imaging, and gene expression data. This page documents how it works, what it can do, and - equally - what it can't.

What is CalMoE?

CalMoE is a calibrated survival prediction model. Most AI survival tools are evaluated only on how well they rank patients by risk, without checking whether their probability estimates are reliable. CalMoE was formally tested for calibration and passed 27 of 30 fold-by-cohort tests after Benjamini-Hochberg correction: when it says 70%, we have evidence that approximately 70% of similar patients actually survive.

How it works

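Figure: Inputs → MoE Fusion → Calibrated survival curve. Age + stage (clinical features), histopathology (WSI, UNI2-h features), and gene expression (RNA, 50 hallmarks) feed a Mixture-of-Experts gated fusion, followed by Platt scaling and CiPOT, producing a calibrated survival curve S(t) plotted from 0 to 10 years.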

CalMoE is a Mixture-of-Experts fusion model. Each modality - age & stage, histopathology, RNA - is encoded by a specialist expert; a gating network learns which expert to weight for each patient. The output is a full survival curve S(t), not a single risk score.
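The fusion step can be illustrated with a minimal NumPy sketch. This page does not specify CalMoE's expert architectures, gate inputs, or time discretization, so everything below is an assumption: each hypothetical `LinearExpert` maps one modality to discrete-time hazard logits over 10 yearly bins, a linear softmax gate produces per-patient expert weights, and the fused logits are turned into a survival curve S(t) by a cumulative product. The input sizes (2 clinical features, a 64-dimensional stand-in for the WSI embedding, 50 hallmark scores) are illustrative only, and the weights are random rather than trained.

```python
import numpy as np

N_BINS = 10  # hypothetical: yearly discrete-time bins covering 0 to 10 years


def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class LinearExpert:
    """Hypothetical expert: maps one modality's features to per-bin hazard logits."""

    def __init__(self, in_dim, n_bins, rng):
        self.W = rng.normal(scale=0.1, size=(in_dim, n_bins))
        self.b = np.zeros(n_bins)

    def __call__(self, x):
        return x @ self.W + self.b  # shape (batch, n_bins)


def moe_survival(inputs, experts, gate_W):
    """Fuse expert hazard logits with a softmax gate; return S(t) per bin and gate weights.

    inputs: list of (batch, d_k) arrays, one per modality, aligned with `experts`.
    gate_W: (sum of d_k, n_experts) weights of a hypothetical linear gating network.
    """
    logits = np.stack([expert(x) for expert, x in zip(experts, inputs)], axis=1)  # (B, K, T)
    weights = softmax(np.concatenate(inputs, axis=1) @ gate_W)  # (B, K), rows sum to 1
    fused = np.einsum("bk,bkt->bt", weights, logits)  # per-patient weighted logits
    hazard = 1.0 / (1.0 + np.exp(-fused))  # discrete-time hazard in each bin
    survival = np.cumprod(1.0 - hazard, axis=1)  # S(t_j) = prod over i <= j of (1 - h_i)
    return survival, weights


rng = np.random.default_rng(0)
dims = [2, 64, 50]  # age + stage, WSI embedding (hypothetical size), 50 hallmark scores
experts = [LinearExpert(d, N_BINS, rng) for d in dims]
gate_W = rng.normal(scale=0.1, size=(sum(dims), len(dims)))
inputs = [rng.normal(size=(4, d)) for d in dims]  # 4 toy patients
S, w = moe_survival(inputs, experts, gate_W)
```

Because S(t) is built as a cumulative product of (1 - hazard) terms, every output curve is non-increasing by construction, which is one reason discrete-time hazard heads are a common choice for producing full survival curves.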

Plain-language definition of calibration: if our model tells 100 patients they each have a 70% chance of 5-year survival, approximately 70 of them will actually survive 5 years. Most AI survival models are never tested for this property.

Performance

Held-out concordance index (C-index) across 5-fold site-stratified cross-validation on TCGA cohorts. Higher is better.

Cohort                     CalMoE   MMP (2024)     SurvPath (2024)   MCAT (2021)
BRCA                       0.783    0.753          0.709             0.648
KIRC                       0.864    0.748          0.738             0.670
COADREAD                   0.756    0.636          0.539             0.578
LUAD                       0.728    0.643          0.612             0.615
BLCA                       0.702    0.628          0.619             0.619
STAD                       0.658    0.580          0.556             0.528
HNSC                       0.673    -              0.600             0.531
Mean (6 shared)            0.749    0.665          0.629             0.610
Calibration (BH-adjusted)  27 / 30  not reported   not reported      not reported
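For readers unfamiliar with the metric, the sketch below computes Harrell's C-index: among pairs of patients where the one with the shorter follow-up had an observed event, it is the fraction in which the model assigned that patient the higher risk. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The page does not say which C-index variant or risk score (for example, negative predicted survival at a fixed time) was used for the table, so treat this as an illustration of the concept, not as the evaluation code behind these numbers.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times: observed follow-up times; events: 1 if the event (e.g. cancer death)
    was observed, 0 if censored; risks: higher value = predicted worse prognosis.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # patient i had the event while j was still at risk
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


c = harrell_c_index(times=[2, 4, 6, 8], events=[1, 1, 0, 1], risks=[0.9, 0.7, 0.8, 0.1])
# 4 of the 5 comparable pairs are ordered correctly, so c is 0.8
```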

Training data

  • The Cancer Genome Atlas (TCGA) cohorts, seven cancer types (BRCA, BLCA, LUAD, KIRC, STAD, COADREAD, HNSC)
  • Disease-Specific Survival (DSS) endpoint
  • 5-fold site-stratified cross-validation (no patient or acquisition site appears in both train and test)
  • UNI2-h feature extractor for whole-slide histopathology
  • 50 hallmark gene pathway signatures for genomics
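A split in which no acquisition site appears in both training and test data can be sketched by assigning whole sites, rather than individual patients, to folds. The function `site_level_folds`, its greedy size-balancing heuristic, and the dictionary input format are hypothetical; the page does not describe how sites were allocated to the 5 folds.

```python
import random
from collections import Counter


def site_level_folds(patient_sites, n_folds=5, seed=0):
    """Split patients into folds so that no acquisition site spans train and test.

    patient_sites: dict mapping patient_id -> site_id (hypothetical input format).
    Returns a list of (train_ids, test_ids) pairs, one per fold.
    """
    counts = Counter(patient_sites.values())  # patients per site
    if len(counts) < n_folds:
        raise ValueError("need at least as many sites as folds")
    sites = sorted(counts)
    random.Random(seed).shuffle(sites)  # random tie-breaking between equal-sized sites
    # Greedy balancing: place the largest remaining site into the currently smallest fold.
    sites.sort(key=lambda s: counts[s], reverse=True)  # stable, so ties keep shuffled order
    fold_sizes = [0] * n_folds
    fold_of_site = {}
    for site in sites:
        k = fold_sizes.index(min(fold_sizes))
        fold_of_site[site] = k
        fold_sizes[k] += counts[site]
    folds = []
    for k in range(n_folds):
        test = sorted(p for p, s in patient_sites.items() if fold_of_site[s] == k)
        train = sorted(p for p, s in patient_sites.items() if fold_of_site[s] != k)
        folds.append((train, test))
    return folds


patient_sites = {f"P{i:03d}": f"S{i % 7}" for i in range(70)}  # toy data: 70 patients, 7 sites
folds = site_level_folds(patient_sites)
```

Because every patient belongs to exactly one site, assigning sites to folds also guarantees that no patient appears in both the training and test sets of any fold.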

Limitations

Trained on TCGA data (US academic centers, primarily White / European patients). May not generalize to other populations.
The web calculator matches your inputs to pre-computed predictions for 2,874 TCGA patients. Each patient's curve was generated by the full multimodal model (WSI + RNA + clinical). Your estimate is a weighted average of the curves of similar patients, not a direct model inference on your data.
Predictions are population-level estimates, not individual guarantees.
The model does not account for treatment effects, so it cannot make treatment-specific predictions.
Not validated prospectively.
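The calculator's matching step can be illustrated with a kernel-weighted average. The page states only that the estimate is a weighted average of similar patients' pre-computed curves; the exact-stage filter, the Gaussian kernel on age, the 5-year bandwidth, and the name `lookup_estimate` are all assumptions made for this sketch.

```python
import numpy as np


def lookup_estimate(age, stage, ref_ages, ref_stages, ref_curves, bandwidth=5.0):
    """Hypothetical kernel-weighted average of pre-computed reference survival curves.

    ref_curves: (n_patients, n_times) array of S(t) values from the full model.
    Only reference patients with the same stage are used; they are weighted by a
    Gaussian kernel on age difference (the bandwidth in years is an assumption).
    """
    ref_ages = np.asarray(ref_ages, dtype=float)
    ref_stages = np.asarray(ref_stages)
    ref_curves = np.asarray(ref_curves, dtype=float)
    mask = ref_stages == stage
    if not mask.any():
        raise ValueError(f"no reference patients with stage {stage!r}")
    log_w = -0.5 * ((ref_ages[mask] - age) / bandwidth) ** 2
    w = np.exp(log_w - log_w.max())  # subtract the max so weights never all underflow to 0
    w /= w.sum()
    return w @ ref_curves[mask]  # convex combination of the matched curves


est = lookup_estimate(
    62, "II",
    ref_ages=[60, 64, 70],
    ref_stages=["II", "II", "III"],
    ref_curves=[[1.0, 0.9, 0.8], [1.0, 0.8, 0.6], [1.0, 0.5, 0.2]],
)
# ages 60 and 64 get equal weight and the stage III patient is excluded: est ≈ [1.0, 0.85, 0.7]
```

A weighted average with non-negative weights that sum to 1 is a convex combination, so the result is still a valid, non-increasing survival curve whenever the reference curves are. It also cannot fall outside the range spanned by the matched reference curves at any time point, which is part of why the estimate should not be read as a direct model inference.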

Calibration methodology

  • 1-calibration (Hosmer-Lemeshow) evaluated at the median event time per cohort
  • Benjamini-Hochberg FDR correction across 30 fold-by-cohort tests
  • Platt scaling fit on training folds and applied to held-out validation
  • CiPOT conformal post-hoc adjustment for prediction-interval coverage
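The two testing steps above can be sketched in plain Python: a Hosmer-Lemeshow-style statistic comparing predicted and observed survival within bins of predicted S(t*), a closed-form chi-square tail probability, and Benjamini-Hochberg adjustment across tests. This is a simplified illustration, not the authors' implementation: it assumes no censoring before the evaluation time t* (the page does not say how censored patients were handled), assumes 10 bins with the classic HL choice of n_bins - 2 = 8 degrees of freedom, and omits Platt scaling and CiPOT.

```python
import math


def hl_statistic(pred_surv, survived, n_bins=10):
    """Hosmer-Lemeshow-style statistic at one time point t*.

    pred_surv: predicted S(t*) per patient; survived: 1 if alive at t*, else 0.
    Simplification: assumes every patient's status at t* is known (no censoring
    before t*); a censoring-aware version would estimate observed survival per bin
    with Kaplan-Meier instead of raw counts.
    """
    order = sorted(range(len(pred_surv)), key=lambda i: pred_surv[i])
    n_total = len(order)
    stat = 0.0
    for b in range(n_bins):
        idx = order[b * n_total // n_bins:(b + 1) * n_total // n_bins]
        if not idx:
            continue
        n = len(idx)
        expected = sum(pred_surv[i] for i in idx)  # predicted number of survivors
        observed = sum(survived[i] for i in idx)  # actual number of survivors
        p_bar = expected / n
        var = n * p_bar * (1.0 - p_bar)
        if var > 0:
            stat += (observed - expected) ** 2 / var
    return stat


def chi2_sf_even(x, dof):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):  # sum of (x/2)^i / i! for i < dof/2
        term *= half / i
        total += term
    return math.exp(-half) * total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this sketch, each fold-by-cohort p-value would come from something like `chi2_sf_even(hl_statistic(pred_surv, survived), 8)`, and a test would count as passing when its BH-adjusted p-value stays above the chosen significance level (0.05 is an assumed example), meaning there is no significant evidence of miscalibration.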