feat: Make temporal attention strictly causal and update training hyperparameters

In `models.py`:
- Change temporal attention mask to be strictly causal (`<` instead of `<=`).
- Add self-attention for the first token in a sequence: under the strict mask its softmax row would be fully masked and produce NaNs (see the sketch below).
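
A minimal sketch of the intended masking, assuming a PyTorch-style attention where each token carries an event time `t`; the `temporal_causal_mask` helper, tensor shapes, and `attend` function are illustrative, not the actual `models.py` API:

```python
import torch
import torch.nn.functional as F

def temporal_causal_mask(t: torch.Tensor) -> torch.Tensor:
    # t: (B, T) per-token event times.
    # allowed[b, j, i] is True where query j may attend to key i,
    # i.e. only strictly earlier tokens: t[i] < t[j] (was t[i] <= t[j]).
    allowed = t.unsqueeze(1) < t.unsqueeze(2)  # (B, T, T)
    # Under the strict mask the first token has no valid key, so its
    # softmax row would be all -inf and come out NaN; let it self-attend.
    allowed[:, 0, 0] = True
    return allowed

def attend(q, k, v, t):
    # q, k, v: (B, T, D); single-head attention for brevity.
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~temporal_causal_mask(t), float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

With `<=` the first token could still attend to same-time tokens; under the strict `<` it attends to nothing, which is why the explicit `allowed[:, 0, 0] = True` is needed.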

In `train.py`:
- Update hyperparameters:
  - `block_length`: 24 -> 48
  - `n_embd`: 256 -> 120
  - `n_layer`: 8 -> 12
  - `n_head`: 8 -> 12
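
The new width keeps `n_embd` divisible by `n_head` (120 / 12 = 10 dims per head).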
commit cb7575a229
parent e2495f43b0
Date: 2025-10-16 18:50:15 +08:00
2 changed files with 13 additions and 6 deletions

train.py

@@ -15,12 +15,12 @@ class TrainConfig:
     # Data parameters
     train_data_path = 'ukb_real_train.bin'
     val_data_path = 'ukb_real_val.bin'
-    block_length = 24 # Sequence length
+    block_length = 48 # Sequence length
     # Model parameters
-    n_embd = 256
-    n_layer = 8
-    n_head = 8
+    n_embd = 120
+    n_layer = 12
+    n_head = 12
     pdrop = 0.1
     token_pdrop = 0.1