Add evaluation scripts for next-event prediction and horizon-capture evaluation with detailed metric disclaimers
38
README.md
@@ -1,2 +1,40 @@
# DeepHealth

## Evaluation

This repo includes two event-driven evaluation entrypoints:

- `evaluate_next_event.py`: next-event prediction using the short-window cumulative incidence function (CIF)
- `evaluate_horizon.py`: horizon-capture evaluation using the CIF at multiple horizons

### IMPORTANT metric disclaimers

- **AUC** reported by `evaluate_horizon.py` is “time-dependent” only because the label depends on the chosen horizon $\tau$.
  Without explicit follow-up end times / censoring, this is **not** a classical risk-set AUC with IPCW.
  Use it for **model comparison and diagnostics**, not strict statistical interpretation.

- **Brier score** reported by `evaluate_horizon.py` is an unadjusted diagnostic/proxy metric (no censoring adjustment).
  Use it to detect probability-mass compression / numerical stability issues; do not claim calibrated absolute risk.
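
To make the disclaimers concrete, here is a minimal sketch of what an unadjusted horizon AUC and Brier score look like. It is not the code in `evaluate_horizon.py`. The function names, the use of `inf` for "no event observed", and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def horizon_labels(time_to_event, tau):
    """Label 1 if the event occurs within tau, else 0.

    Assumption: subjects without an observed event carry time = inf and are
    counted as negatives, even if their follow-up ended before tau. This is
    exactly why the resulting metrics are not censoring-adjusted.
    """
    return (np.asarray(time_to_event, dtype=float) <= tau).astype(int)

def rank_auc(y_true, y_score):
    """Plain AUC via the Mann-Whitney U statistic (average ranks for ties)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined with a single class
    order = np.argsort(y_score, kind="mergesort")
    sorted_scores = y_score[order]
    ranks = np.empty(len(y_score), dtype=float)
    i = 0
    while i < len(sorted_scores):
        j = i
        while j + 1 < len(sorted_scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))

def unadjusted_brier(y_true, y_prob):
    """Mean squared error between predicted CIF and the 0/1 horizon label."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

# Toy data (hypothetical): years until the event, inf = no event observed.
time_to_event = [0.1, 0.4, 3.0, float("inf"), 8.0]
cif_at_tau = [0.9, 0.6, 0.7, 0.1, 0.2]  # hypothetical predicted CIF at tau = 0.5

labels = horizon_labels(time_to_event, tau=0.5)
print(labels.tolist(), rank_auc(labels, cif_at_tau), unadjusted_brier(labels, cif_at_tau))
```

Changing `tau` changes only the labels, which is the sense in which this AUC is "time-dependent". Nothing here reweights subjects by their probability of remaining under observation, as an IPCW estimator would.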

### Example

```bash
# Next-event (no --horizons)
python evaluate_next_event.py \
  --run_dir runs/your_run \
  --tau_short 0.25 \
  --age_bins 40 45 50 55 60 65 70 inf \
  --device cuda \
  --batch_size 256 \
  --seed 0

# Horizon-capture
python evaluate_horizon.py \
  --run_dir runs/your_run \
  --horizons 0.25 0.5 1.0 2.0 5.0 10.0 \
  --age_bins 40 45 50 55 60 65 70 inf \
  --device cuda \
  --batch_size 256 \
  --seed 0
```
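
The `--age_bins 40 45 50 55 60 65 70 inf` argument reads naturally as a list of bin edges with an open-ended last bin. The sketch below shows one plausible interpretation: half-open intervals `[lo, hi)`. The helper names and the binning convention are assumptions, not taken from the scripts, so check the actual implementation before relying on boundary behaviour.

```python
import numpy as np

def parse_age_bins(tokens):
    """Convert CLI tokens such as ['40', '45', 'inf'] to float edges."""
    edges = [float(t) for t in tokens]  # float('inf') parses the open upper edge
    if any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("age bin edges must be strictly increasing")
    return edges

def assign_age_bin(ages, edges):
    """Return the index i with edges[i] <= age < edges[i + 1], or -1 if outside."""
    ages = np.asarray(ages, dtype=float)
    idx = np.digitize(ages, edges) - 1  # right=False gives half-open [lo, hi) bins
    idx[(idx < 0) | (idx >= len(edges) - 1)] = -1
    return idx

edges = parse_age_bins("40 45 50 55 60 65 70 inf".split())
print(assign_age_bin([39, 40, 44.9, 45, 70, 95], edges).tolist())
```

Under this convention, ages below 40 fall outside every bin, and everyone aged 70 or older lands in the final bin.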