
DeepHealth

Evaluation

This repo includes two event-driven evaluation entrypoints:

  • evaluate_next_event.py: next-event prediction using short-window CIF
  • evaluate_horizon.py: horizon-capture evaluation using CIF at multiple horizons
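Both entrypoints score predictions from a cumulative incidence function (CIF). As a point of reference, a minimal sketch of computing a competing-risks CIF from per-step cause-specific hazards is shown below; this is an illustrative discrete-time construction and is not taken from the repo's code (the function name and array layout are assumptions):

```python
import numpy as np

def cif_from_hazards(hazards):
    """Hypothetical helper: hazards is a (T, K) array of per-step
    cause-specific hazard probabilities for K competing causes.
    Returns the (T, K) cumulative incidence CIF_k(t)."""
    total = hazards.sum(axis=1)                # overall event hazard per step
    surv = np.cumprod(1.0 - total)             # P(no event through step t)
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    # CIF_k(t) = sum_{s <= t} P(event-free before s) * h_k(s)
    return np.cumsum(surv_prev[:, None] * hazards, axis=0)
```

The per-cause CIFs sum (with overall survival) to 1 at every step, which is a useful sanity check on model outputs.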

IMPORTANT metric disclaimers

  • AUC reported by evaluate_horizon.py is “time-dependent” only because the label depends on the chosen horizon \tau. Without explicit follow-up end times / censoring, this is not a classical risk-set AUC with IPCW. Use it for model comparison and diagnostics, not strict statistical interpretation.

  • Brier score reported by evaluate_horizon.py is an unadjusted diagnostic/proxy metric (no censoring adjustment). Use it to detect probability-mass compression / numerical stability issues; do not claim calibrated absolute risk.
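To make the disclaimers above concrete, here is a minimal sketch of the unadjusted proxy metrics: the horizon label is simply "event occurred within τ", the Brier score is a plain mean squared error against the CIF at τ, and the AUC is an ordinary rank-based (Mann-Whitney) statistic with no IPCW or risk-set weighting. Function names are illustrative, not the repo's API:

```python
import numpy as np

def horizon_labels(event_times, tau):
    # Label = 1 if the event occurs within horizon tau.
    # NOTE: no censoring adjustment -- subjects not followed to tau
    # are implicitly treated as negatives (proxy metric only).
    return (np.asarray(event_times) <= tau).astype(float)

def unadjusted_brier(labels, cif_at_tau):
    # Diagnostic Brier score: MSE between predicted CIF at tau and labels.
    return float(np.mean((np.asarray(cif_at_tau) - np.asarray(labels)) ** 2))

def rank_auc(labels, scores):
    # Plain rank-based AUC; ties count 0.5. Not a risk-set AUC with IPCW.
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))
```

Because neither function accounts for censoring, treat the numbers as relative diagnostics between runs, not as calibrated absolute risk.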

Examples

# Next-event (no --horizons)
python evaluate_next_event.py \
	--run_dir runs/your_run \
	--tau_short 0.25 \
	--age_bins 40 45 50 55 60 65 70 inf \
	--device cuda \
	--batch_size 256 \
	--seed 0

# Horizon-capture
python evaluate_horizon.py \
	--run_dir runs/your_run \
	--horizons 0.25 0.5 1.0 2.0 5.0 10.0 \
	--age_bins 40 45 50 55 60 65 70 inf \
	--device cuda \
	--batch_size 256 \
	--seed 0