Add evaluation scripts for next-event prediction and horizon-capture evaluation with detailed metric disclaimers

2026-01-17 13:49:39 +08:00
parent 07916ee529
commit bfab601a77
4 changed files with 1069 additions and 0 deletions


@@ -1,2 +1,40 @@
# DeepHealth
## Evaluation
This repo includes two event-driven evaluation entrypoints:
- `evaluate_next_event.py`: next-event prediction using the short-window cumulative incidence function (CIF); see the sketch after this list
- `evaluate_horizon.py`: horizon-capture evaluation using the CIF at multiple horizons
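
As a rough illustration, the sketch below shows how per-event-type short-window CIF scores can drive next-event prediction. It assumes the model exposes one CIF value per candidate event type over `tau_short`; `predict_next_event`, `top_k_hit_rate`, and the array shapes are illustrative, not the script's actual API.

```python
import numpy as np

def predict_next_event(cif_short: np.ndarray) -> np.ndarray:
    """Pick the most likely next event type per sample.

    cif_short: shape (n_samples, n_event_types); entry [i, j] is event
    type j's cumulative incidence over the short window tau_short.
    """
    return cif_short.argmax(axis=1)

def top_k_hit_rate(cif_short: np.ndarray, observed_next: np.ndarray, k: int = 5) -> float:
    """Fraction of samples whose observed next event type is among the
    k types with the largest short-window CIF."""
    top_k = np.argsort(-cif_short, axis=1)[:, :k]
    return float((top_k == observed_next[:, None]).any(axis=1).mean())
```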
### IMPORTANT metric disclaimers
- **AUC** reported by `evaluate_horizon.py` is “time-dependent” only in the sense that the label depends on the chosen horizon $\tau$.
  Without explicit follow-up end times or censoring indicators, this is **not** a classical risk-set AUC with IPCW.
  Use it for **model comparison and diagnostics**, not strict statistical interpretation.
- **Brier score** reported by `evaluate_horizon.py` is an unadjusted diagnostic/proxy metric (no censoring adjustment).
  Use it to detect probability-mass compression or numerical-stability issues; do not read it as calibrated absolute risk. A sketch of both computations follows this list.
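
To make the caveats concrete, here is a minimal sketch of the kind of computation behind both metrics, assuming fully observed event times measured from the prediction point. The function name and inputs are hypothetical, not the script's actual interface.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

def horizon_metrics(event_times: np.ndarray, cif_at_tau: np.ndarray, tau: float):
    """Diagnostic AUC and unadjusted Brier score at horizon tau.

    event_times: time from the prediction point to the observed event.
    cif_at_tau:  model CIF evaluated at tau, i.e. predicted P(event <= tau).
    Every subject is treated as fully observed, so anyone censored
    before tau would be mislabeled as event-free -- exactly the bias
    the disclaimers above warn about.
    """
    y = (event_times <= tau).astype(int)      # label depends on the chosen horizon
    auc = roc_auc_score(y, cif_at_tau)        # "time-dependent" only via the label
    brier = brier_score_loss(y, cif_at_tau)   # no IPCW / censoring adjustment
    return auc, brier
```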
### Examples
```bash
# Next-event (no --horizons)
python evaluate_next_event.py \
--run_dir runs/your_run \
--tau_short 0.25 \
--age_bins 40 45 50 55 60 65 70 inf \
--device cuda \
--batch_size 256 \
--seed 0
# Horizon-capture
python evaluate_horizon.py \
--run_dir runs/your_run \
--horizons 0.25 0.5 1.0 2.0 5.0 10.0 \
--age_bins 40 45 50 55 60 65 70 inf \
--device cuda \
--batch_size 256 \
--seed 0
```
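
The `--age_bins` flag appears to take ascending bin edges ending in `inf`. One plausible reading, sketched below with `np.digitize`, treats the edges as left-closed boundaries with an implicit `<40` bin; this convention is an assumption, not confirmed by the scripts.

```python
import numpy as np

# Edges as passed on the command line; `inf` closes the last bin.
edges = np.array([40, 45, 50, 55, 60, 65, 70, np.inf])
ages = np.array([38.0, 52.5, 71.0])

# Left-closed bins: index 0 is the implicit "<40" bin, 1 is [40, 45),
# ..., 7 is [70, inf).
bin_idx = np.digitize(ages, edges)
print(bin_idx)  # [0 3 7]
```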