Add evaluation scripts for next-event prediction and horizon-capture evaluation with detailed metric disclaimers

2026-01-17 13:49:39 +08:00
parent 07916ee529
commit bfab601a77
4 changed files with 1069 additions and 0 deletions


@@ -1,2 +1,40 @@
# DeepHealth
## Evaluation
This repo includes two event-driven evaluation entrypoints:
- `evaluate_next_event.py`: next-event prediction using the short-window cumulative incidence function (CIF); see the sketch after this list
- `evaluate_horizon.py`: horizon-capture evaluation using the CIF at multiple horizons
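
As a rough illustration, the sketch below shows how per-event-type short-window CIF scores can drive next-event prediction. It assumes the model exposes one CIF value per candidate event type over `tau_short`; `predict_next_event`, `top_k_hit_rate`, and the array shapes are illustrative, not the script's actual API.

```python
import numpy as np

def predict_next_event(cif_short: np.ndarray) -> np.ndarray:
    """Pick the most likely next event type per sample.

    cif_short: shape (n_samples, n_event_types); entry [i, j] is event
    type j's cumulative incidence over the short window tau_short.
    """
    return cif_short.argmax(axis=1)

def top_k_hit_rate(cif_short: np.ndarray, observed_next: np.ndarray, k: int = 5) -> float:
    """Fraction of samples whose observed next event type is among the
    k types with the largest short-window CIF."""
    top_k = np.argsort(-cif_short, axis=1)[:, :k]
    return float((top_k == observed_next[:, None]).any(axis=1).mean())
```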
### IMPORTANT metric disclaimers
- **AUC** reported by `evaluate_horizon.py` is “time-dependent” only in the sense that the label depends on the chosen horizon $\tau$.
  Without explicit follow-up end times or censoring indicators, this is **not** a classical risk-set AUC with IPCW.
  Use it for **model comparison and diagnostics**, not strict statistical interpretation.
- **Brier score** reported by `evaluate_horizon.py` is an unadjusted diagnostic/proxy metric (no censoring adjustment).
  Use it to detect probability-mass compression or numerical-stability issues; do not read it as calibrated absolute risk. A sketch of both computations follows this list.
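
To make the caveats concrete, here is a minimal sketch of the kind of computation behind both metrics, assuming fully observed event times measured from the prediction point. The function name and inputs are hypothetical, not the script's actual interface.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

def horizon_metrics(event_times: np.ndarray, cif_at_tau: np.ndarray, tau: float):
    """Diagnostic AUC and unadjusted Brier score at horizon tau.

    event_times: time from the prediction point to the observed event.
    cif_at_tau:  model CIF evaluated at tau, i.e. predicted P(event <= tau).
    Every subject is treated as fully observed, so anyone censored
    before tau would be mislabeled as event-free -- exactly the bias
    the disclaimers above warn about.
    """
    y = (event_times <= tau).astype(int)      # label depends on the chosen horizon
    auc = roc_auc_score(y, cif_at_tau)        # "time-dependent" only via the label
    brier = brier_score_loss(y, cif_at_tau)   # no IPCW / censoring adjustment
    return auc, brier
```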
### Examples
```bash
# Next-event (no --horizons)
python evaluate_next_event.py \
--run_dir runs/your_run \
--tau_short 0.25 \
--age_bins 40 45 50 55 60 65 70 inf \
--device cuda \
--batch_size 256 \
--seed 0
# Horizon-capture
python evaluate_horizon.py \
--run_dir runs/your_run \
--horizons 0.25 0.5 1.0 2.0 5.0 10.0 \
--age_bins 40 45 50 55 60 65 70 inf \
--device cuda \
--batch_size 256 \
--seed 0
```
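
The `--age_bins` flag appears to take ascending bin edges ending in `inf`. One plausible reading, sketched below with `np.digitize`, treats the edges as left-closed boundaries with an implicit `<40` bin; this convention is an assumption, not confirmed by the scripts.

```python
import numpy as np

# Edges as passed on the command line; `inf` closes the last bin.
edges = np.array([40, 45, 50, 55, 60, 65, 70, np.inf])
ages = np.array([38.0, 52.5, 71.0])

# Left-closed bins: index 0 is the implicit "<40" bin, 1 is [40, 45),
# ..., 7 is [70, inf).
bin_idx = np.digitize(ages, edges)
print(bin_idx)  # [0 3 7]
```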