feat: Add load_model function and update training script

Added a `load_model` function to `utils.py` that loads a trained model from a configuration file and a state dictionary file. `train_iter.py` was also modified: in `TrainConfig`, `pdrop` and `token_pdrop` were changed from 0.1 to 0.0.
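
For orientation, a loader of this kind might look roughly like the sketch below. The `utils.py` implementation is not shown in this view, so everything here is an assumption: PyTorch as the framework, a JSON config file, the `TinyModel` stand-in class, and the `config_path` / `state_path` / `device` parameter names are all hypothetical.

```python
import json

import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the repository's real model class."""

    def __init__(self, n_embd: int, pdrop: float = 0.0):
        super().__init__()
        self.drop = nn.Dropout(pdrop)
        self.proj = nn.Linear(n_embd, n_embd)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.drop(x))


def load_model(config_path: str, state_path: str, device: str = "cpu") -> nn.Module:
    """Rebuild a model from a config file, then restore its trained weights."""
    # Assumption: the config is JSON holding the constructor arguments.
    with open(config_path) as f:
        cfg = json.load(f)
    model = TinyModel(**cfg)

    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    state = torch.load(state_path, map_location=device)
    model.load_state_dict(state)

    model.to(device)
    model.eval()  # switch dropout layers to inference behavior
    return model
```

Calling `model.eval()` before returning means dropout is inactive at inference time regardless of the `pdrop` value stored in the config.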
2025-10-18 11:07:59 +08:00
parent f7356b183c
commit a631ac6d59
2 changed files with 48 additions and 2 deletions
--- a/train_iter.py
+++ b/train_iter.py
@@ -23,8 +23,8 @@ class TrainConfig:
    n_embd = 120
    n_layer = 12
    n_head = 12
-    pdrop = 0.1
-    token_pdrop = 0.1
+    pdrop = 0.0
+    token_pdrop = 0.0

    # Training parameters
    max_iter = 200000