NVIDIA / Megatron-LM Public

Fork 2.7k
Star 12k


Pull requests: NVIDIA/Megatron-LM

Labels 11 Milestones 0

177 Open 280 Closed

Pull requests list

Updating the logic for reducing the load_balancing_loss during logging, such that the correct value is logged while using CUDA Graphs

#1507 opened Mar 27, 2025 by arjun-choudhry

Fix typo on distrib_optimizer.py

#1505 opened Mar 26, 2025 by wplf

fix for group_limited_topk: K_r is moe_router_topk instead of moe_router_num_groups

#1502 opened Mar 25, 2025 by ladyrick

fix: MultiLatentAttention cp_comm_type

#1499 opened Mar 24, 2025 by RandMist

[Bug Fix] fix p2p communication order error and stuck problems when pp 2 and vpp 2 with remove pad

#1495 opened Mar 22, 2025 by ETOgaosion

Fix llama_mistral loader by using args.true_vocab_size

#1491 opened Mar 20, 2025 by zhuzilin

vscode/cursor devcontainer

#1483 opened Mar 14, 2025 by yzhang123

Build dataset for all GPUs with tp_rank=0 and pp_rank=0 or -1 in multi-machine training.

#1480 opened Mar 14, 2025 by wan-nan

Set hashlib.md5 usedforsecurity=False, #1471

#1472 opened Mar 12, 2025 by jsta

Enabling variable_seq_lengths when encoder has Different TP Size

#1470 opened Mar 12, 2025 by xiaojunjie

fix(moe): the missing argument 'router_dtype' of _DeepepManager.__init__

#1463 opened Mar 11, 2025 by AsakusaRinne

Draft: Youngeun/a2a hiding

#1460 opened Mar 10, 2025 by lhb8125

[ENHANCEMENT] add z-loss (improved version)

#1442 opened Feb 28, 2025 by wdevazelhes

Replace deprecated numpy.product with numpy.prod to ensure compatibility with NumPy >=2.0

#1440 opened Feb 27, 2025 by mustious

fix seq_aux_loss for DeepSeek-V3

#1439 opened Feb 27, 2025 by yzlnew

fix a bug in load balancing loss aggregation when recompute is turned on

#1433 opened Feb 26, 2025 by lyuwen

a proof of concept for Distributed Muon

#1428 opened Feb 24, 2025 by toothacher17

fix: return float instead of tensor from get_rotary_seq_len

#1419 opened Feb 20, 2025 by jasonchiu-codeium

Fix document regarding GQA (--group-query-attention) argument

#1401 opened Feb 12, 2025 by eagle705

Fix issue in converting Mixtral 8x7B checkpoints from HF to MCore and update doc

#1397 opened Feb 11, 2025 by yeahdongcn

Enabling Alternative Path ABC implementations

#1393 opened Feb 10, 2025 by ashvinnihalani

Fix typo in GPTModel forward function comments

#1391 opened Feb 9, 2025 by Zzhiter

support bf16 dtype for optimizer states using precision-aware optimizer in TransformerEngine

#1390 opened Feb 8, 2025 by XiaobingSuper • Draft

add qkv_bias

#1388 opened Feb 7, 2025 by Chandler-Bing

Update LICENSE

#1382 opened Feb 6, 2025 by maximevtush
