Insights: NVIDIA/Megatron-LM
Overview
- 0 Merged pull requests
- 5 Open pull requests
- 4 Closed issues
- 7 New issues
5 Pull requests opened by 5 people
- Fix llama_mistral loader by using args.true_vocab_size (#1491, opened Mar 20, 2025)
- [Bug Fix] Fix p2p communication order error and hangs when pp=2 and vpp=2 with remove pad (#1495, opened Mar 22, 2025)
- fix: MultiLatentAttention cp_comm_type (#1499, opened Mar 24, 2025)
- Fix for group_limited_topk: K_r is moe_router_topk, not moe_router_num_groups (#1502, opened Mar 25, 2025; see the routing sketch after this list)
- Fix typo in distrib_optimizer.py (#1505, opened Mar 26, 2025)
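PR #1502 and conversation #1441 (below) both concern group-limited routing, where each token first selects a small number of expert groups and then picks its final top-k experts only within those groups, with the group score currently computed as a hardcoded top-2 sum. A minimal PyTorch sketch of that technique follows, assuming a [num_tokens, num_experts] score tensor; the function and argument names are illustrative, not Megatron-LM's actual API, and it does not attempt to reproduce the exact K_r fix proposed in #1502.

```python
import torch

def group_limited_topk(
    scores: torch.Tensor,   # [num_tokens, num_experts] router scores
    topk: int,              # experts kept per token (cf. moe_router_topk)
    num_groups: int,        # number of expert groups (cf. moe_router_num_groups)
    group_topk: int,        # groups each token may route into
) -> tuple[torch.Tensor, torch.Tensor]:
    num_tokens, num_experts = scores.shape
    experts_per_group = num_experts // num_groups

    # Score each group by the sum of its top-2 expert scores;
    # #1441 asks to make this hardcoded 2 configurable.
    group_scores = (
        scores.view(num_tokens, num_groups, experts_per_group)
        .topk(2, dim=-1)
        .values.sum(dim=-1)
    )  # [num_tokens, num_groups]

    # Keep only the best `group_topk` groups per token.
    group_idx = group_scores.topk(group_topk, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(-1, group_idx, 1.0)

    # Mask out experts whose group was not selected, then take the
    # final top-k over the surviving experts only.
    expert_mask = (
        group_mask.unsqueeze(-1)
        .expand(num_tokens, num_groups, experts_per_group)
        .reshape(num_tokens, num_experts)
    )
    masked_scores = scores.masked_fill(expert_mask == 0, float("-inf"))
    probs, indices = masked_scores.topk(topk, dim=-1)
    return probs, indices

# Example: 4 tokens, 16 experts in 4 groups, routing within the best 2 groups.
probs, idx = group_limited_topk(torch.rand(4, 16), topk=4, num_groups=4, group_topk=2)
```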
4 Issues closed by 4 people
- [BUG] kv_channels is set incorrectly for MLA (#1501, closed Mar 25, 2025)
- [BUG] Incorrect `seq_aux_loss` implementation for DeepSeek-V3 (#1438, closed Mar 24, 2025)
- [BUG] "ValueError: optimizer got an empty parameter list" under pipeline parallelism (#1166, closed Mar 20, 2025)
7 Issues opened by 7 people
- [QUESTION] How to set the routed scaling factor (#1504, opened Mar 26, 2025; see the sketch after this list)
- [BUG] Wrong attention gradient in Transformer Engine (#1503, opened Mar 26, 2025)
- [BUG] T5 model does not work when TP size is 1 (#1500, opened Mar 24, 2025)
- [ENHANCEMENT] Global batch load balancing for MoE models (#1498, opened Mar 23, 2025)
- [BUG] Cannot load `_extra_state` with TorchDistLoadShardedStrategy (#1497, opened Mar 23, 2025)
- [QUESTION] fp8 cannot be used with pp>1 (#1496, opened Mar 22, 2025)
- [BUG] Load balancing loss discrepancy with/without CUDA Graphs (#1494, opened Mar 21, 2025)
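The "routed scaling factor" asked about in #1504 most likely refers to the DeepSeek-style `routed_scaling_factor`, a constant that rescales the combined routed-expert output before it is added back to the hidden state. Below is a minimal sketch under that assumption; the names are hypothetical, not Megatron-LM's actual configuration flags.

```python
import torch

def combine_moe_outputs(
    routed_output: torch.Tensor,   # probability-weighted sum of routed-expert outputs
    shared_output: torch.Tensor,   # shared-expert output (zeros if the model has none)
    routed_scaling_factor: float,  # hypothetical knob, taken from the model config
) -> torch.Tensor:
    # Rescale only the routed branch; the shared branch is added as-is.
    return shared_output + routed_scaling_factor * routed_output
```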
15 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [ENHANCEMENT] Multi-token Prediction (MTP) support (#1404, commented on Mar 21, 2025; 0 new comments)
- [QUESTION] Plans to implement zero-bubble pipeline or dual pipeline and MoE comm-comp overlapping (#1399, commented on Mar 21, 2025; 0 new comments)
- [BUG] Checkpoint state dict remapping is not applied for MLA layers (#1417, commented on Mar 21, 2025; 0 new comments)
- [BUG] Token routing probability all-gather precision in token_dispatcher causes differing results between EP ranks (#1421, commented on Mar 21, 2025; 0 new comments)
- [QUESTION] Converting LLaMA2-7B to the Megatron format fails: the converted model only repeats meaningless numbers (#1365, commented on Mar 23, 2025; 0 new comments)
- [BUG] Failure when converting the llama2-7b model from HF format to Megatron format (#1348, commented on Mar 23, 2025; 0 new comments)
- [QUESTION] Checkpointing/loading memory overhead (#1380, commented on Mar 24, 2025; 0 new comments)
- [BUG] Can't load a saved fp8 checkpoint when resuming training (#1350, commented on Mar 24, 2025; 0 new comments)
- [BUG] Using fp16 uses more memory than using fp32 (#1349, commented on Mar 24, 2025; 0 new comments)
- [QUESTION] Performance impact of using item() in `total_num_tokens += num_tokens.item()` in megatron/core/pipeline_parallel/schedules.py (#1403, commented on Mar 25, 2025; 0 new comments; see the sketch after this list)
- [QUESTION] Does MLA in Megatron-Core support PackedSeqParams? (#1398, commented on Mar 25, 2025; 0 new comments)
- [ENHANCEMENT] Replace the hardcoded top-2-sum group selection strategy with configurable top-k (#1441, commented on Mar 25, 2025; 0 new comments)
- [BUG] Loss error when using MLA (#1445, commented on Mar 26, 2025; 0 new comments)
- Enabling LR scaling for a specific layer (ex. down-projection...) during pretraining (#1262, commented on Mar 25, 2025; 0 new comments)
- Add Mamba TRTLLM support (#1320, commented on Mar 25, 2025; 0 new comments)
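On the item() question (#1403 above): `Tensor.item()` copies a scalar from device to host and blocks until the GPU work producing that tensor has finished, so calling it once per microbatch can serialize work that would otherwise overlap. Here is a minimal sketch of the pattern and a deferred alternative, with an illustrative loop rather than the actual schedule code.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
per_microbatch = [torch.tensor(1024, device=device) for _ in range(8)]

# Pattern questioned in #1403: one blocking device->host copy per microbatch.
total_num_tokens = 0
for num_tokens in per_microbatch:
    total_num_tokens += num_tokens.item()

# Deferred alternative: accumulate on-device and synchronize once at the end.
total = torch.zeros((), dtype=torch.long, device=device)
for num_tokens in per_microbatch:
    total += num_tokens          # stays on the device, no host sync
total_num_tokens = total.item()  # single device->host copy
```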