Pull requests: NVIDIA/Megatron-LM
fix for group_limited_topk: K_r is moe_router_topk instead of moe_router_num_groups
#1502 opened Mar 25, 2025 by ladyrick
[Bug Fix] fix p2p communication order error and stuck problems when pp 2 and vpp 2 with remove pad
#1495 opened Mar 22, 2025 by ETOgaosion
Build dataset for all GPUs with tp_rank=0 and pp_rank=0 or -1 in multi-machine training.
#1480 opened Mar 14, 2025 by wan-nan
Enabling variable_seq_lengths when encoder has Different TP Size
#1470 opened Mar 12, 2025 by xiaojunjie
fix(moe): the missing argument 'router_dtype' of _DeepepManager.__init__
#1463 opened Mar 11, 2025 by AsakusaRinne
Replace deprecated numpy.product with numpy.prod to ensure compatibility with NumPy >=2.0
#1440 opened Feb 27, 2025 by mustious
fix a bug in load balancing loss aggregation when recompute is turned on
#1433 opened Feb 26, 2025 by lyuwen
fix: return float instead of tensor from get_rotary_seq_len
#1419 opened Feb 20, 2025 by jasonchiu-codeium
Fix issue in converting Mixtral 8x7B checkpoints from HF to MCore and update doc
#1397 opened Feb 11, 2025 by yeahdongcn
support bf16 dtype for optimizer states using precision-aware optimizer in TransformerEngine
#1390 opened Feb 8, 2025 by XiaobingSuper • Draft
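PR #1440 describes a one-name swap. `numpy.product` was a deprecated alias of `numpy.prod`, and the PR title states that the old name breaks compatibility with NumPy >=2.0. The sketch below is minimal and assumes the affected call site computes an element count from a shape. The `shape` tuple and `numel` name are hypothetical and are not taken from the Megatron-LM code:

```python
import numpy as np

# Hypothetical tensor shape, for illustration only (not from Megatron-LM).
shape = (4, 8, 16)

# Before: np.product(shape), a deprecated alias that NumPy >=2.0 no longer provides.
# After: np.prod is the supported name and computes the same product of elements.
numel = int(np.prod(shape))
print(numel)
```

Because `np.prod` already existed alongside the alias in NumPy 1.x, the replacement should behave the same on older NumPy releases as well.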