Pull requests: deepspeedai/Megatron-DeepSpeed

Pull requests list

Add support for DS comms
#50 opened Jun 13, 2022 by Quentin-Anthony updated Jul 12, 2023
attempt at pipelining
#78 opened Aug 18, 2022 by siddharth9820 updated Jul 12, 2023
add vit training with TP/PP
#146 opened Jun 9, 2023 by etoilestar updated Jul 12, 2023
Simplify SP - Opportunity to improve SP scalability
#301 opened Nov 28, 2023 by RezaYazdaniAminabadi updated Nov 28, 2023
Enable torch.compile
#322 opened Dec 28, 2023 by tohtana Draft updated Dec 28, 2023
optimize the generation of attention mask
#331 opened Jan 13, 2024 by imh966 updated Jan 13, 2024
support transfer llama hf weight to megatron weight
#246 opened Sep 12, 2023 by uygnef updated Jan 23, 2024
collect grad_norm for non pipeline path
#370 opened Mar 21, 2024 by inkcherry updated Mar 21, 2024
Support Llama2Tokenizer
#375 opened Apr 11, 2024 by jinyouzhi updated Apr 11, 2024
Fix ConstantGradScaler and loss-scale argument not match
#376 opened Apr 12, 2024 by BeingGod updated Apr 12, 2024
convert mds checkpoint to Hf Llama model
#394 opened May 31, 2024 by vksastry updated May 31, 2024
ds-sequence-parallel(ulysses) for rope.
#392 opened May 30, 2024 by inkcherry updated Jun 3, 2024
Add layer norm weight plus 1
#378 opened Apr 18, 2024 by Yejing-Lai updated Jun 7, 2024
fix NAN loss of rope long context training
#399 opened Jun 5, 2024 by inkcherry updated Jun 13, 2024
fix --use-cpu-initialization error when expert is not tensor-parallel
#413 opened Jul 3, 2024 by taozhiwei updated Jul 13, 2024
add HFTokenizer option for preprocess_data
#388 opened May 17, 2024 by Jianhong-Zhang updated Jul 25, 2024
support split qkv linear and sp overlap comm
#415 opened Jul 5, 2024 by inkcherry updated Dec 6, 2024
FastPersist rebase
#467 opened Feb 25, 2025 by tjruwase updated Feb 25, 2025
[BUG]Fix the error issue for q/k/v stride is not match for non FPDT scenarios.
#469 opened Mar 17, 2025 by ys950902 updated Mar 17, 2025