forked from NVIDIA/Megatron-LM
Pull requests: deepspeedai/Megatron-DeepSpeed
Simplify SP - Opportunity to improve SP scalability
#301
opened Nov 28, 2023 by
RezaYazdaniAminabadi
updated Nov 28, 2023
optimize the generation of attention mask
#331
opened Jan 13, 2024 by
imh966
updated Jan 13, 2024
support transfer llama hf weight to megatron weight
#246
opened Sep 12, 2023 by
uygnef
updated Jan 23, 2024
collect grad_norm for non pipeline path
#370
opened Mar 21, 2024 by
inkcherry
updated Mar 21, 2024
Fix ConstantGradScaler and loss-scale argument not match
#376
opened Apr 12, 2024 by
BeingGod
updated Apr 12, 2024
convert mds checkpoint to Hf Llama model
#394
opened May 31, 2024 by
vksastry
updated May 31, 2024
ds-sequence-parallel(ulysses) for rope.
#392
opened May 30, 2024 by
inkcherry
updated Jun 3, 2024
fix NAN loss of rope long context training
#399
opened Jun 5, 2024 by
inkcherry
updated Jun 13, 2024
fix --use-cpu-initialization error when expert is not tensor-parallel
#413
opened Jul 3, 2024 by
taozhiwei
updated Jul 13, 2024
add HFTokenizer option for preprocess_data
#388
opened May 17, 2024 by
Jianhong-Zhang
updated Jul 25, 2024
support split qkv linear and sp overlap comm
#415
opened Jul 5, 2024 by
inkcherry
updated Dec 6, 2024
[Bug]Add sequence_parallel in layernorm init to enable 3D parallelism with DeepSpeed for non CUDA device.
#468
opened Feb 28, 2025 by
ys950902
updated Feb 28, 2025
[BUG]Fix the error issue for q/k/v stride is not match for non FPDT scenarios.
#469
opened Mar 17, 2025 by
ys950902
updated Mar 17, 2025