Insights: NVIDIA/Megatron-LM
Overview
- 0 Merged pull requests
- 5 Open pull requests
- 4 Closed issues
- 7 New issues
5 Pull requests opened by 5 people
- Fix llama_mistral loader by using args.true_vocab_size (#1491, opened Mar 20, 2025)
- [Bug Fix] Fix p2p communication order error and hangs when pp=2 and vpp=2 with remove pad (#1495, opened Mar 22, 2025)
- fix: MultiLatentAttention cp_comm_type (#1499, opened Mar 24, 2025)
- Fix for group_limited_topk: K_r is moe_router_topk, not moe_router_num_groups (#1502, opened Mar 25, 2025; see the routing sketch after this list)
- Fix typo in distrib_optimizer.py (#1505, opened Mar 26, 2025)
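PR #1502 and conversation #1441 (below) both concern group-limited routing, where each token first selects a small number of expert groups and then picks its final top-k experts only within those groups, with the group score currently computed as a hardcoded top-2 sum. A minimal PyTorch sketch of that technique follows, assuming a [num_tokens, num_experts] score tensor; the function and argument names are illustrative, not Megatron-LM's actual API, and it does not attempt to reproduce the exact K_r fix proposed in #1502.

```python
import torch

def group_limited_topk(
    scores: torch.Tensor,   # [num_tokens, num_experts] router scores
    topk: int,              # experts kept per token (cf. moe_router_topk)
    num_groups: int,        # number of expert groups (cf. moe_router_num_groups)
    group_topk: int,        # groups each token may route into
) -> tuple[torch.Tensor, torch.Tensor]:
    num_tokens, num_experts = scores.shape
    experts_per_group = num_experts // num_groups

    # Score each group by the sum of its top-2 expert scores;
    # #1441 asks to make this hardcoded 2 configurable.
    group_scores = (
        scores.view(num_tokens, num_groups, experts_per_group)
        .topk(2, dim=-1)
        .values.sum(dim=-1)
    )  # [num_tokens, num_groups]

    # Keep only the best `group_topk` groups per token.
    group_idx = group_scores.topk(group_topk, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(-1, group_idx, 1.0)

    # Mask out experts whose group was not selected, then take the
    # final top-k over the surviving experts only.
    expert_mask = (
        group_mask.unsqueeze(-1)
        .expand(num_tokens, num_groups, experts_per_group)
        .reshape(num_tokens, num_experts)
    )
    masked_scores = scores.masked_fill(expert_mask == 0, float("-inf"))
    probs, indices = masked_scores.topk(topk, dim=-1)
    return probs, indices

# Example: 4 tokens, 16 experts in 4 groups, routing within the best 2 groups.
probs, idx = group_limited_topk(torch.rand(4, 16), topk=4, num_groups=4, group_topk=2)
```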
4 Issues closed by 4 people
- [BUG] kv_channels is set incorrectly for MLA (#1501, closed Mar 25, 2025)
- [BUG] Incorrect `seq_aux_loss` implementation for DeepSeek-V3 (#1438, closed Mar 24, 2025)
- [BUG] "ValueError: optimizer got an empty parameter list" under pipeline parallelism (#1166, closed Mar 20, 2025)
7 Issues opened by 7 people
- [QUESTION] How to set the routed scaling factor (#1504, opened Mar 26, 2025; see the sketch after this list)
- [BUG] Wrong attention gradient in Transformer Engine (#1503, opened Mar 26, 2025)
- [BUG] T5 model does not work when TP size is 1 (#1500, opened Mar 24, 2025)
- [ENHANCEMENT] Global batch load balancing for MoE models (#1498, opened Mar 23, 2025)
- [BUG] Cannot load `_extra_state` with TorchDistLoadShardedStrategy (#1497, opened Mar 23, 2025)
- [QUESTION] fp8 cannot be used with pp>1 (#1496, opened Mar 22, 2025)
- [BUG] Load balancing loss discrepancy with/without CUDA Graphs (#1494, opened Mar 21, 2025)
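The "routed scaling factor" asked about in #1504 most likely refers to the DeepSeek-style `routed_scaling_factor`, a constant that rescales the combined routed-expert output before it is added back to the hidden state. Below is a minimal sketch under that assumption; the names are hypothetical, not Megatron-LM's actual configuration flags.

```python
import torch

def combine_moe_outputs(
    routed_output: torch.Tensor,   # probability-weighted sum of routed-expert outputs
    shared_output: torch.Tensor,   # shared-expert output (zeros if the model has none)
    routed_scaling_factor: float,  # hypothetical knob, taken from the model config
) -> torch.Tensor:
    # Rescale only the routed branch; the shared branch is added as-is.
    return shared_output + routed_scaling_factor * routed_output
```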
15 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [ENHANCEMENT] Multi-token Prediction (MTP) support (#1404, commented on Mar 21, 2025; 0 new comments)
- [QUESTION] Plans to implement zero-bubble pipeline or dual pipeline and MoE comm-comp overlapping (#1399, commented on Mar 21, 2025; 0 new comments)
- [BUG] Checkpoint state dict remapping is not applied for MLA layers (#1417, commented on Mar 21, 2025; 0 new comments)
- [BUG] Token routing probability all-gather precision in token_dispatcher causes differing results between EP ranks (#1421, commented on Mar 21, 2025; 0 new comments)
- [QUESTION] Converting LLaMA2-7B to the Megatron format fails: the converted model only repeats meaningless numbers (#1365, commented on Mar 23, 2025; 0 new comments)
- [BUG] Failure when converting the llama2-7b model from HF format to Megatron format (#1348, commented on Mar 23, 2025; 0 new comments)
- [QUESTION] Checkpointing/loading memory overhead (#1380, commented on Mar 24, 2025; 0 new comments)
- [BUG] Can't load a saved fp8 checkpoint when resuming training (#1350, commented on Mar 24, 2025; 0 new comments)
- [BUG] Using fp16 uses more memory than using fp32 (#1349, commented on Mar 24, 2025; 0 new comments)
- [QUESTION] Performance impact of using item() in `total_num_tokens += num_tokens.item()` in megatron/core/pipeline_parallel/schedules.py (#1403, commented on Mar 25, 2025; 0 new comments; see the sketch after this list)
- [QUESTION] Does MLA in Megatron-Core support PackedSeqParams? (#1398, commented on Mar 25, 2025; 0 new comments)
- [ENHANCEMENT] Replace the hardcoded top-2-sum group selection strategy with configurable top-k (#1441, commented on Mar 25, 2025; 0 new comments)
- [BUG] Loss error when using MLA (#1445, commented on Mar 26, 2025; 0 new comments)
- Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining (#1262, commented on Mar 25, 2025; 0 new comments)
- Add Mamba TRTLLM support (#1320, commented on Mar 25, 2025; 0 new comments)
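On the item() question (#1403 above): `Tensor.item()` copies a scalar from device to host and blocks until the GPU work producing that tensor has finished, so calling it once per microbatch can serialize work that would otherwise overlap. Here is a minimal sketch of the pattern and a deferred alternative, with an illustrative loop rather than the actual schedule code.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
per_microbatch = [torch.tensor(1024, device=device) for _ in range(8)]

# Pattern questioned in #1403: one blocking device->host copy per microbatch.
total_num_tokens = 0
for num_tokens in per_microbatch:
    total_num_tokens += num_tokens.item()

# Deferred alternative: accumulate on-device and synchronize once at the end.
total = torch.zeros((), dtype=torch.long, device=device)
for num_tokens in per_microbatch:
    total += num_tokens          # stays on the device, no host sync
total_num_tokens = total.item()  # single device->host copy
```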