# DeepHealth

## Evaluation

This repo includes two event-driven evaluation entrypoints:

- `evaluate_next_event.py`: next-event prediction using the cumulative incidence function (CIF) over a short window
- `evaluate_horizon.py`: horizon-capture evaluation using the CIF at multiple horizons
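
For readers unfamiliar with the CIF, the sketch below shows what "CIF at a horizon" means in the simplest competing-risks setting. It assumes constant cause-specific hazards, which is an illustrative simplification and not necessarily how this repo's model parameterizes risk; the function name `cif_constant_hazards` and the hazard values are hypothetical.

```python
import math
from typing import List, Sequence


def cif_constant_hazards(hazards: Sequence[float], tau: float) -> List[float]:
    """CIF of each event type at horizon tau under constant cause-specific hazards.

    With hazards h_k and total hazard H = sum(h_k), the probability that
    event type k occurs first and within tau is (h_k / H) * (1 - exp(-H * tau)).
    """
    total = sum(hazards)
    if total == 0.0:
        # No risk at all: nothing can happen within any horizon.
        return [0.0 for _ in hazards]
    any_event_by_tau = 1.0 - math.exp(-total * tau)
    return [h / total * any_event_by_tau for h in hazards]


# Hypothetical rates for three event types, in the same time unit as tau.
example_hazards = [0.02, 0.05, 0.01]
for tau in (0.25, 1.0, 5.0):
    print(tau, cif_constant_hazards(example_hazards, tau))
```

Each per-event CIF grows with the horizon, and the values across event types sum to the probability that any event happens by that horizon.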

### IMPORTANT metric disclaimers

- **AUC** reported by `evaluate_horizon.py` is “time-dependent” only because the label depends on the chosen horizon $\tau$.
  Without explicit follow-up end times or censoring information, this is **not** a classical risk-set AUC with inverse probability of censoring weighting (IPCW).
  Use it for **model comparison and diagnostics**, not strict statistical interpretation.

- **Brier score** reported by `evaluate_horizon.py` is an unadjusted diagnostic/proxy metric (no censoring adjustment).
  Use it to detect probability-mass compression or numerical stability issues; do not use it to claim calibrated absolute risk.
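
To make the two disclaimers concrete, here is a minimal sketch of how horizon-dependent labels and unadjusted metrics could be computed. It is not the code in `evaluate_horizon.py`. It assumes the label is "event observed within $\tau$" and that every subject without an event by $\tau$ counts as a negative. The function names and toy numbers are hypothetical.

```python
import math
from typing import List, Sequence


def horizon_labels(event_times: Sequence[float], tau: float) -> List[int]:
    """1 if the event is observed within tau, else 0 (no censoring handling)."""
    return [1 if t <= tau else 0 for t in event_times]


def unadjusted_brier(cif_at_tau: Sequence[float], labels: Sequence[int]) -> float:
    """Plain mean squared error between predicted CIF and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(cif_at_tau, labels)) / len(labels)


def unadjusted_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: math.inf marks a subject with no observed event.
event_times = [0.2, 0.8, 3.0, math.inf]
predicted_cif = {
    0.5: [0.60, 0.30, 0.10, 0.05],
    2.0: [0.80, 0.70, 0.30, 0.10],
}
for tau, preds in predicted_cif.items():
    labels = horizon_labels(event_times, tau)
    print(tau, labels, unadjusted_auc(preds, labels), unadjusted_brier(preds, labels))
```

The second subject is a negative at $\tau = 0.5$ and a positive at $\tau = 2.0$, which is the only sense in which these metrics are time-dependent. A subject lost to follow-up before $\tau$ would be indistinguishable from a true negative here, which is why neither number supports strict statistical claims.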

### Example

```bash
# Next-event (no --horizons)
python evaluate_next_event.py \
  --run_dir runs/your_run \
  --tau_short 0.25 \
  --age_bins 40 45 50 55 60 65 70 inf \
  --device cuda \
  --batch_size 256 \
  --seed 0

# Horizon-capture
python evaluate_horizon.py \
  --run_dir runs/your_run \
  --horizons 0.25 0.5 1.0 2.0 5.0 10.0 \
  --age_bins 40 45 50 55 60 65 70 inf \
  --device cuda \
  --batch_size 256 \
  --seed 0
```