cb7575a229b8c7d6ba8501cb4dca0d372e19da52
In `models.py`:
- Change the temporal attention mask to be strictly causal (`<` instead of `<=`).
- Add self-attention for the first token in a sequence to prevent NaNs.

In `train.py`:
- Update hyperparameters:
  - `block_length`: 24 -> 48
  - `n_embd`: 256 -> 120
  - `n_layer`: 8 -> 12
  - `n_head`: 8 -> 12
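A minimal sketch of the `models.py` mask change, assuming a standard boolean-mask attention setup; the function name and shapes here are illustrative, not the repo's actual code. It shows why the first-token fix is needed: with a strictly causal mask, row 0 has no allowed positions, so every logit becomes `-inf` and the softmax produces NaNs.

```python
import torch

def strictly_causal_mask(T: int) -> torch.Tensor:
    """Boolean attention mask: True where attention is allowed."""
    # Strictly causal: position t attends only to positions < t,
    # i.e. the diagonal is excluded (unlike the usual <= mask).
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool), diagonal=-1)
    # Row 0 is now all False; masking every logit to -inf would make
    # its softmax return NaNs. Let the first token attend to itself
    # so that row stays well-defined.
    mask[0, 0] = True
    return mask

# Applying the mask to raw attention scores:
T = 4
scores = torch.randn(T, T)
scores = scores.masked_fill(~strictly_causal_mask(T), float("-inf"))
attn = scores.softmax(dim=-1)  # no NaNs, thanks to mask[0, 0]
```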
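The `train.py` hyperparameter changes, shown as a hypothetical config dict (the key names come from the commit; the dict structure is assumed). One consistency check worth noting: `n_embd` must remain divisible by `n_head`, which the new values satisfy (120 / 12 = 10 dims per head).

```python
# Hypothetical training config reflecting the updated values;
# previous values shown in comments.
config = {
    "block_length": 48,  # was 24
    "n_embd": 120,       # was 256
    "n_layer": 12,       # was 8
    "n_head": 12,        # was 8
}

# Per-head dimension must be an integer.
assert config["n_embd"] % config["n_head"] == 0  # 120 = 12 heads x 10 dims
```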