# DeepHealth

## Evaluation

This repo includes two event-driven evaluation entrypoints:

- `evaluate_next_event.py`: next-event prediction using the cumulative incidence function (CIF) over a short window
- `evaluate_horizon.py`: horizon-capture evaluation using the CIF at multiple horizons

### IMPORTANT metric disclaimers

- The **AUC** reported by `evaluate_horizon.py` is “time-dependent” only in the sense that the label depends on the chosen horizon $\tau$.
	Without explicit follow-up end times or censoring information, this is **not** a classical risk-set AUC with inverse probability of censoring weighting (IPCW).
	Use it for **model comparison and diagnostics**, not for strict statistical interpretation.

- The **Brier score** reported by `evaluate_horizon.py` is an unadjusted diagnostic/proxy metric (no censoring adjustment).
	Use it to detect probability-mass compression or numerical-stability issues; do not use it to claim calibrated absolute risk.
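
To make the two disclaimers concrete, here is a minimal sketch of what "unadjusted" means for a single horizon $\tau$. It is not the code in `evaluate_horizon.py`; the function names (`horizon_labels`, `unadjusted_auc`, `unadjusted_brier`) and the convention that a missing event is encoded as `math.inf` are assumptions made for illustration. The label is simply "event observed within $\tau$", and no censoring weights enter either metric.

```python
import math

def horizon_labels(times_to_event, tau):
    """Label 1 if the event occurs within tau, else 0.

    Assumption: subjects with no observed event carry math.inf.
    Censored subjects are therefore treated as negatives, which is
    exactly why the resulting metrics are unadjusted.
    """
    return [1 if t <= tau else 0 for t in times_to_event]

def unadjusted_brier(cif_at_tau, labels):
    """Mean squared error between predicted CIF(tau) and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(cif_at_tau, labels)) / len(labels)

def unadjusted_auc(cif_at_tau, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(cif_at_tau, labels) if y == 1]
    neg = [p for p, y in zip(cif_at_tau, labels) if y == 0]
    if not pos or not neg:
        return math.nan  # AUC is undefined without both classes
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy data (made up for illustration): time to event and predicted CIF(tau).
times = [0.1, 0.4, math.inf, 3.0]
cif = [0.8, 0.6, 0.1, 0.7]
labels = horizon_labels(times, tau=0.5)
print(labels, unadjusted_auc(cif, labels), unadjusted_brier(cif, labels))
```

Changing `tau` changes the labels and hence both numbers, which is the only sense in which these metrics are time-dependent.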

### Example

```bash
# Next-event (no --horizons)
python evaluate_next_event.py \
	--run_dir runs/your_run \
	--tau_short 0.25 \
	--age_bins 40 45 50 55 60 65 70 inf \
	--device cuda \
	--batch_size 256 \
	--seed 0

# Horizon-capture
python evaluate_horizon.py \
	--run_dir runs/your_run \
	--horizons 0.25 0.5 1.0 2.0 5.0 10.0 \
	--age_bins 40 45 50 55 60 65 70 inf \
	--device cuda \
	--batch_size 256 \
	--seed 0
```
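
The `--age_bins` values above read naturally as bin edges, with `inf` as an open upper bound. The sketch below shows one plausible interpretation, assuming left-closed, right-open bins `[40, 45), [45, 50), ..., [70, inf)`. The helper names (`parse_edges`, `age_bin_index`) are hypothetical, and the scripts may treat the edges differently, so check their argument parsing before relying on this.

```python
from bisect import bisect_right

def parse_edges(tokens):
    """Convert CLI-style tokens to floats; float("inf") parses the 'inf' token."""
    return [float(tok) for tok in tokens]

def age_bin_index(age, edges):
    """Return i such that edges[i] <= age < edges[i + 1], or None if outside.

    Assumption: bins are left-closed and right-open.
    """
    if age < edges[0] or age >= edges[-1]:
        return None
    return bisect_right(edges, age) - 1

edges = parse_edges("40 45 50 55 60 65 70 inf".split())
n_bins = len(edges) - 1  # eight edges define seven bins

for age in (39.0, 40.0, 44.9, 45.0, 70.0, 95.0):
    print(age, age_bin_index(age, edges))
```

Under this reading, ages below 40 fall outside every bin, and everyone aged 70 or older lands in the last one.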

Commit history:

- 2026-01-07 21:27:38 +08:00: Initial commit
- 2026-01-17 13:49:39 +08:00: Add evaluation scripts for next-event prediction and horizon-capture evaluation with detailed metric disclaimers
