deepspeedai / Megatron-DeepSpeed Public

forked from NVIDIA/Megatron-LM

Fork 352
Star 2k

Pull requests: deepspeedai/Megatron-DeepSpeed

20 Open 257 Closed

Pull requests list

[BUG]Fix the error issue for q/k/v stride is not match for non FPDT scenarios.

#469 opened Mar 17, 2025 by ys950902 • Review required

[Bug]Add sequence_parallel in layernorm init to enable 3D parallelism with DeepSpeed for non CUDA device.

#468 opened Feb 28, 2025 by ys950902 • Review required

FastPersist rebase

#467 opened Feb 25, 2025 by tjruwase

support split qkv linear and sp overlap comm

#415 opened Jul 5, 2024 by inkcherry • Review required

fix --use-cpu-initialization error when expert is not tensor-parallel

#413 opened Jul 3, 2024 by taozhiwei • Review required

fix NAN loss of rope long context training

#399 opened Jun 5, 2024 by inkcherry • Review required

convert mds checkpoint to Hf Llama model

#394 opened May 31, 2024 by vksastry • Review required

ds-sequence-parallel(ulysses) for rope.

#392 opened May 30, 2024 by inkcherry • Review required

add HFTokenizer option for preprocess_data

#388 opened May 17, 2024 by Jianhong-Zhang • Review required

Add layer norm weight plus 1

#378 opened Apr 18, 2024 by Yejing-Lai • Review required

Fix ConstantGradScaler and loss-scale argument not match

#376 opened Apr 12, 2024 by BeingGod • Review required

Support Llama2Tokenizer

#375 opened Apr 11, 2024 by jinyouzhi • Review required

collect grad_norm for non pipeline path

#370 opened Mar 21, 2024 by inkcherry • Review required

optimize the generation of attention mask

#331 opened Jan 13, 2024 by imh966 • Review required

Enable torch.compile

#322 opened Dec 28, 2023 by tohtana • Draft

Simplify SP - Opportunity to improve SP scalability

#301 opened Nov 28, 2023 by RezaYazdaniAminabadi • Review required

support transfer llama hf weight to megatron weight

#246 opened Sep 12, 2023 by uygnef • Review required

add vit training with TP/PP

#146 opened Jun 9, 2023 by etoilestar

attempt at pipelining

#78 opened Aug 18, 2022 by siddharth9820

Add support for DS comms

#50 opened Jun 13, 2022 by Quentin-Anthony
