Jiarui Li cb7575a229 feat: Update model and training parameters
In `models.py`:
- Make the temporal attention mask strictly causal (`<` instead of `<=`).
- Let the first token attend to itself: a strictly causal mask leaves it with no valid attention targets, so the softmax over an all-masked row produces NaNs.
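The two `models.py` changes interact: once the mask is strictly causal (`j < i`), row 0 of the mask is entirely disallowed, and a softmax over an all-`-inf` row is NaN. A minimal NumPy sketch of the fix (function names are illustrative, not the repo's actual API):

```python
import numpy as np

def strictly_causal_mask(T: int) -> np.ndarray:
    # True where attention is allowed: position i may attend to j only if j < i.
    mask = np.tril(np.ones((T, T), dtype=bool), k=-1)
    # Strict causality leaves row 0 with no allowed positions; a softmax over an
    # all-masked row yields NaN, so let the first token attend to itself.
    mask[0, 0] = True
    return mask

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    w = np.exp(scores)  # exp(-inf) -> 0 for masked positions
    return w / w.sum(axis=-1, keepdims=True)

mask = strictly_causal_mask(4)
attn = masked_softmax(np.random.randn(4, 4), mask)
assert not np.isnan(attn).any()  # every row has at least one allowed position
```

Without the `mask[0, 0] = True` line, the first row of `attn` would be all NaN, which then propagates through training.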

In `train.py`:
- Update hyperparameters:
  - `block_length`: 24 -> 48
  - `n_embd`: 256 -> 120
  - `n_layer`: 8 -> 12
  - `n_head`: 8 -> 12
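The updated hyperparameters, gathered into a config sketch (the dataclass name and layout are assumptions, not necessarily how `train.py` stores them). Note that the new embedding width stays divisible by the head count, as multi-head attention requires:

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:  # hypothetical container mirroring the updated values
    block_length: int = 48  # was 24
    n_embd: int = 120       # was 256
    n_layer: int = 12       # was 8
    n_head: int = 12        # was 8

cfg = TrainConfig()
# 120 / 12 = 10 dims per head; n_embd must divide evenly by n_head.
assert cfg.n_embd % cfg.n_head == 0
```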
2025-10-16 18:50:15 +08:00

DeepHealth
