Evaluation Summary

Evidence & Validation

Aranga has undergone retrospective evaluation on multiple international datasets. This page summarizes publicly shareable information; detailed performance metrics and methodology are available under NDA.

Evaluation Status

Retrospective Only

Evaluation Scope

Task | Datasets | Metric Types | Status
--- | --- | --- | ---
In-hospital mortality risk | MIMIC-IV, VitalDB | AUROC, AUPRC | Evaluated
ICU transfer/admission risk | MIMIC-IV | AUROC, AUPRC | Evaluated
Shock phenotype classification | MIMIC-IV | Accuracy, F1 | Evaluated
Calibration | All | Brier, ECE | Evaluated

Note: Specific metric values are available in the Technical Appendix (NDA required).

Evaluation Datasets

Dataset | Origin | Description | Role in evaluation
--- | --- | --- | ---
MIMIC-IV | United States | Large critical care dataset | Primary evaluation
VitalDB | Korea | Perioperative waveforms | Signal validation
Multi-center | International | Cross-demographic cohorts | Details under NDA

Technical Appendix Available Under NDA

The Technical Appendix provides detailed performance metrics, including AUROC/AUPRC by prediction horizon, lead-time distribution analysis, confusion matrices, calibration curves, subgroup performance, and comparator analysis against early warning score (EWS) baselines.

Methods at a Glance

Key methodological principles applied in our evaluation approach.

Retrospective Evaluation

All reported metrics are from retrospective analysis of historical data. No prospective clinical trials have been conducted.

Patient-Level Grouping

Cross-validation folds are constructed at the patient level to prevent data leakage between training and evaluation sets.
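
As an illustration of patient-level grouping, the sketch below assigns every sample from one patient to the same fold, so a patient never appears on both sides of a split. The helper names (`patient_level_folds`, `train_test_split_for_fold`) and the round-robin assignment are assumptions made for this example, not a description of Aranga's actual pipeline.

```python
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds=5):
    """Group sample indices into folds so that all samples from one
    patient land in the same fold (hypothetical helper)."""
    # Sort unique patients so the assignment is deterministic.
    unique_patients = sorted(set(patient_ids))
    # Round-robin assignment of patients (not samples) to folds.
    fold_of_patient = {pid: i % n_folds for i, pid in enumerate(unique_patients)}

    folds = defaultdict(list)
    for sample_idx, pid in enumerate(patient_ids):
        folds[fold_of_patient[pid]].append(sample_idx)
    return [folds[k] for k in range(n_folds)]


def train_test_split_for_fold(folds, test_fold):
    """Use one fold for testing and all remaining folds for training."""
    test_idx = list(folds[test_fold])
    train_idx = [i for k, fold in enumerate(folds) if k != test_fold for i in fold]
    return train_idx, test_idx


# Toy data: six samples (e.g., ICU stays) belonging to three patients.
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p2"]
folds = patient_level_folds(patient_ids, n_folds=3)
train_idx, test_idx = train_test_split_for_fold(folds, test_fold=0)

train_patients = {patient_ids[i] for i in train_idx}
test_patients = {patient_ids[i] for i in test_idx}
# No patient contributes samples to both the training and the test side.
assert train_patients.isdisjoint(test_patients)
```

Splitting by sample index instead would let two stays from the same patient fall on opposite sides of a split, which is the leakage this principle is meant to rule out.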

Time-Censoring

Evaluation respects temporal order; predictions are made using only data available at the prediction time.
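
A minimal sketch of time-censoring follows, assuming observations arrive as `(timestamp, signal_name, value)` tuples. The function `features_at`, the signal names, and the "latest value per signal" feature are hypothetical choices for illustration only.

```python
from datetime import datetime, timedelta


def features_at(observations, prediction_time):
    """Return the latest value per signal, using only observations
    timestamped at or before prediction_time (hypothetical helper)."""
    latest = {}
    # Sort by timestamp only, so later readings overwrite earlier ones.
    for ts, name, value in sorted(observations, key=lambda obs: obs[0]):
        if ts > prediction_time:
            break  # everything from here on lies in the future
        latest[name] = value
    return latest


t0 = datetime(2024, 1, 1, 8, 0)
observations = [
    (t0, "heart_rate", 88.0),
    (t0 + timedelta(hours=1), "lactate", 1.9),
    (t0 + timedelta(hours=2), "heart_rate", 112.0),
    (t0 + timedelta(hours=3), "lactate", 4.2),  # after the prediction time
]

prediction_time = t0 + timedelta(hours=2, minutes=30)
features = features_at(observations, prediction_time)
print(features)
```

The lactate reading taken at hour 3 is excluded because it was not yet available when the prediction was made.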

Prediction Horizons

Performance is evaluated at multiple prediction horizons (e.g., 1h, 4h, 12h, 24h) to characterize lead-time behavior.
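
One way to set up evaluation at several horizons is to derive a separate binary label per horizon for each prediction time. The sketch below assumes times are expressed in hours since admission and that a sample is positive when the event occurs after the prediction time and no later than the horizon; this windowing rule and the helper name `label_within_horizon` are assumptions for illustration.

```python
HORIZONS_H = (1, 4, 12, 24)  # example horizons, in hours


def label_within_horizon(prediction_time_h, event_time_h, horizon_h):
    """Return 1 if the event occurs after the prediction time and within
    horizon_h hours of it, else 0. event_time_h is None when the patient
    never experiences the event (hypothetical helper)."""
    if event_time_h is None:
        return 0
    delta = event_time_h - prediction_time_h
    return 1 if 0 < delta <= horizon_h else 0


# Prediction made 10 h after admission; event occurs at 16.5 h (6.5 h later).
labels = {h: label_within_horizon(10.0, 16.5, h) for h in HORIZONS_H}
print(labels)

# A patient without the event is negative at every horizon.
no_event_labels = {h: label_within_horizon(10.0, None, h) for h in HORIZONS_H}
```

Each horizon then yields its own label vector, and discrimination metrics can be computed per horizon.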

Calibration Assessment

Probability outputs are assessed for calibration using Brier score and Expected Calibration Error (ECE).
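
For reference, both calibration metrics can be computed in a few lines. The sketch below uses equal-width probability bins for ECE, which is one common convention; the number of bins and the binning scheme are assumptions here, since the page does not state which variant is used.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE with equal-width bins: the sample-weighted average gap between
    mean predicted probability and observed event rate in each bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed_rate = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed_rate - mean_prob)
    return ece


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.3, 0.7, 0.9]
brier = brier_score(y_true, y_prob)
ece = expected_calibration_error(y_true, y_prob)
print(round(brier, 4), round(ece, 4))
```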

Threshold Selection

Operating-point thresholds are selected on validation folds, never on test folds, so that reported test performance is not optimistically biased by tuning on the test data.
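
The separation between threshold selection and final evaluation can be sketched as follows. The selection criterion used here (maximizing Youden's J, i.e. sensitivity + specificity - 1) is an assumption chosen for illustration; the page does not say which criterion Aranga uses, and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity when predicting positive at p >= threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def select_threshold(y_val, p_val):
    """Choose the threshold maximizing Youden's J on validation data only."""
    best_t, best_j = None, -1.0
    for t in sorted(set(p_val)):
        sens, spec = sensitivity_specificity(y_val, p_val, t)
        j = sens + spec - 1
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_t, best_j = t, j
    return best_t


# Validation fold: used to choose the operating point.
y_val = [0, 0, 1, 0, 1, 1]
p_val = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8]
threshold = select_threshold(y_val, p_val)

# Test fold: the threshold is frozen and only applied, never re-tuned.
y_test = [0, 1, 0, 1]
p_test = [0.3, 0.5, 0.45, 0.9]
test_sens, test_spec = sensitivity_specificity(y_test, p_test, threshold)
print(threshold, test_sens, test_spec)
```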

Definitions

Key terms used in our evaluation reporting.

AUROC: Area Under the Receiver Operating Characteristic curve. Measures discrimination ability across all thresholds.

AUPRC: Area Under the Precision-Recall Curve. Often more informative than AUROC when the outcome is rare (imbalanced classes).

Brier Score: Mean squared error of probability predictions. Lower is better. Reflects both calibration and discrimination.

ECE: Expected Calibration Error. Measures how well predicted probabilities match observed frequencies.

Lead Time: Duration between risk signal and clinical event, measured from first alert to event onset.

Phenotype Classification: Categorization of shock etiology (cardiogenic, distributive, hypovolemic, obstructive).
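
To make the two discrimination metrics defined above concrete, the sketch below computes AUROC as the fraction of positive/negative pairs ranked correctly, and approximates AUPRC with average precision (the mean of the precision values at the rank of each positive). Average precision is one standard estimator of the area under the precision-recall curve; whether Aranga's reported AUPRC uses this estimator is not stated, so treat that choice as an assumption. The average-precision function also assumes there are no tied scores.

```python
def auroc(y_true, y_score):
    """Probability that a randomly chosen positive scores higher than a
    randomly chosen negative; ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of precision@rank over the ranks at which positives occur
    (assumes untied scores and at least one positive)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


y_true = [1, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
roc_auc = auroc(y_true, y_score)
avg_prec = average_precision(y_true, y_score)
print(round(roc_auc, 4), round(avg_prec, 4))
```

The pairwise AUROC loop is quadratic in the number of samples, which is fine for a small illustration but not for large cohorts.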

Limitations

This section explicitly states the limitations of current evidence. Partners should consider these factors when evaluating Aranga for their specific use case.

Retrospective Only

All reported performance metrics are from retrospective evaluation. Prospective clinical validation has not been completed.

Dataset Shift

Performance may differ in populations, clinical workflows, or device configurations not represented in evaluation datasets.

Site-Specific Calibration

Deployment at new sites may require calibration to local data characteristics, device types, and clinical workflows.

Subpopulation Variation

Performance on specific subpopulations (age, comorbidities, ethnicity) should be verified for the target deployment context.

Data Quality Assumptions

Evaluation assumes data quality consistent with source datasets. Real-time data may have different noise and missingness patterns.

No Regulatory Clearance

Aranga has not received regulatory clearance or approval from the FDA, the TGA, or any other regulatory body.

Prospective Validation Partnership

We are seeking clinical partners for prospective validation studies. If your institution is interested in evaluating Aranga in a prospective setting, we welcome the discussion.
