feat: Make temporal attention strictly causal and update training hyperparameters

In `models.py`:
- Change temporal attention mask to be strictly causal (`<` instead of `<=`).
- Add self-attention for the first token in a sequence: under the strict mask its softmax row would be fully masked and produce NaNs (see the sketch below).
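
A minimal sketch of the intended masking, assuming a PyTorch-style attention where each token carries an event time `t`; the `temporal_causal_mask` helper, tensor shapes, and `attend` function are illustrative, not the actual `models.py` API:

```python
import torch
import torch.nn.functional as F

def temporal_causal_mask(t: torch.Tensor) -> torch.Tensor:
    # t: (B, T) per-token event times.
    # allowed[b, j, i] is True where query j may attend to key i,
    # i.e. only strictly earlier tokens: t[i] < t[j] (was t[i] <= t[j]).
    allowed = t.unsqueeze(1) < t.unsqueeze(2)  # (B, T, T)
    # Under the strict mask the first token has no valid key, so its
    # softmax row would be all -inf and come out NaN; let it self-attend.
    allowed[:, 0, 0] = True
    return allowed

def attend(q, k, v, t):
    # q, k, v: (B, T, D); single-head attention for brevity.
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~temporal_causal_mask(t), float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

With `<=` the first token could still attend to same-time tokens; under the strict `<` it attends to nothing, which is why the explicit `allowed[:, 0, 0] = True` is needed.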

In `train.py`:
- Update hyperparameters:
  - `block_length`: 24 -> 48
  - `n_embd`: 256 -> 120
  - `n_layer`: 8 -> 12
  - `n_head`: 8 -> 12
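
The new width keeps `n_embd` divisible by `n_head` (120 / 12 = 10 dims per head).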
commit cb7575a229
parent e2495f43b0
Date: 2025-10-16 18:50:15 +08:00
2 changed files with 13 additions and 6 deletions

train.py

@@ -15,12 +15,12 @@ class TrainConfig:
     # Data parameters
     train_data_path = 'ukb_real_train.bin'
     val_data_path = 'ukb_real_val.bin'
-    block_length = 24 # Sequence length
+    block_length = 48 # Sequence length
     # Model parameters
-    n_embd = 256
-    n_layer = 8
-    n_head = 8
+    n_embd = 120
+    n_layer = 12
+    n_head = 12
     pdrop = 0.1
     token_pdrop = 0.1