feat: Add multi-GPU training and improve config/ignore

Add train_multigpu.py for distributed data-parallel training across multiple GPUs. Update train.py to save the training configuration to a JSON file. Generalize .gitignore to ignore all *.pt checkpoint files instead of only best_model_checkpoint.pt. Delete the obsolete train_dpp.py.
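
For illustration only, saving the training configuration to JSON could look roughly like the sketch below. This is not the code from train.py: the TrainConfig fields, the save_config and load_config helpers, and the file layout are all hypothetical, since this commit view does not show the actual implementation.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class TrainConfig:
    # Hypothetical fields, chosen for illustration; not taken from train.py.
    batch_size: int = 64
    learning_rate: float = 3e-4
    max_epochs: int = 10


def save_config(config: TrainConfig, path: Path) -> None:
    """Write the config as indented JSON next to the run's other outputs."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(asdict(config), f, indent=2, sort_keys=True)


def load_config(path: Path) -> TrainConfig:
    """Rebuild the config from a JSON file written by save_config."""
    with path.open("r", encoding="utf-8") as f:
        return TrainConfig(**json.load(f))
```

Writing the file with sorted keys and indentation keeps it diffable between runs, and reading it back through the same dataclass catches unknown keys early.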
2025-10-17 14:09:34 +08:00
parent 053f86f4da
commit d760c45baf
4 changed files with 282 additions and 401 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -5,7 +5,7 @@
 __pycache__/

 # Model checkpoints
-best_model_checkpoint.pt
+*.pt

 # Large data files
 ukb_delphi.txt