# DeepHealth

## Evaluation
This repo includes two event-driven evaluation entrypoints:

- `evaluate_next_event.py`: next-event prediction using short-window CIF
- `evaluate_horizon.py`: horizon-capture evaluation using CIF at multiple horizons
### IMPORTANT metric disclaimers

- The AUC reported by `evaluate_horizon.py` is "time-dependent" only because the label depends on the chosen horizon τ. Without explicit follow-up end times / censoring, this is not a classical risk-set AUC with IPCW. Use it for model comparison and diagnostics, not strict statistical interpretation.
- The Brier score reported by `evaluate_horizon.py` is an unadjusted diagnostic/proxy metric (no censoring adjustment). Use it to detect probability-mass compression / numerical stability issues; do not claim calibrated absolute risk.
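
For intuition about the disclaimers above, here is a minimal, self-contained sketch of how such unadjusted horizon metrics can be computed. The function name and the synthetic data are illustrative assumptions, not this repo's code:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def unadjusted_horizon_metrics(event_times, cif_at_tau, tau):
    """Diagnostic AUC / Brier at horizon tau, with no censoring adjustment."""
    # The label is "time-dependent" only through the chosen horizon:
    # y = 1 if the event occurred within tau, else 0.
    y = (np.asarray(event_times) <= tau).astype(int)
    p = np.asarray(cif_at_tau)
    auc = roc_auc_score(y, p)        # plain AUC, not a risk-set AUC with IPCW
    brier = np.mean((p - y) ** 2)    # unadjusted proxy; diagnostics only
    return auc, brier

# Toy data: exponential event times driven by a per-subject hazard.
rng = np.random.default_rng(0)
hazard = rng.uniform(0.1, 2.0, size=1000)
times = rng.exponential(1.0 / hazard)
for tau in (0.25, 1.0):
    cif = 1.0 - np.exp(-hazard * tau)  # model CIF evaluated at this horizon
    auc, brier = unadjusted_horizon_metrics(times, cif, tau)
    print(f"tau={tau}: AUC={auc:.3f}, Brier={brier:.3f}")
```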
### Example

```bash
# Next-event (no --horizons)
python evaluate_next_event.py \
  --run_dir runs/your_run \
  --tau_short 0.25 \
  --age_bins 40 45 50 55 60 65 70 inf \
  --device cuda \
  --batch_size 256 \
  --seed 0

# Horizon-capture
python evaluate_horizon.py \
  --run_dir runs/your_run \
  --horizons 0.25 0.5 1.0 2.0 5.0 10.0 \
  --age_bins 40 45 50 55 60 65 70 inf \
  --device cuda \
  --batch_size 256 \
  --seed 0
```
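
The `--age_bins` edges above (ending in `inf`) suggest that metrics are stratified by age bucket. A minimal sketch of one plausible bucketing using `np.digitize`; the helper and its bin semantics are illustrative assumptions, not the repo's actual implementation:

```python
import numpy as np

# Hypothetical reading of the --age_bins edges: subjects fall into the
# half-open bins (-inf, 40), [40, 45), ..., [65, 70), [70, inf).
edges = [40, 45, 50, 55, 60, 65, 70, np.inf]

def age_bin_labels(ages, edges):
    """Map each age to the index of its half-open bin [edge_i, edge_{i+1})."""
    return np.digitize(np.asarray(ages), bins=edges[:-1], right=False)

ages = [38, 44, 52, 71, 90]
print(age_bin_labels(ages, edges))  # -> [0 1 3 7 7]
```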