Add NGen3 #36901
base: main
Conversation
@ArthurZucker and @Rocketknight1
class Block(nn.Module):
    def __init__(self, config: NGEN3Config):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.mlp = MoEMLP(config) if config.use_moe else MLP(config)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection for attention
        residual = x
        x = self.ln1(x)
        x = self.attn(x)
        x = residual + x
        # Residual connection for feedforward
        residual = x
        x = self.ln2(x)
        x = self.mlp(x)
        return residual + x
can you try to align variable names / layer names with LlamaDecoderLayer here?
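For reference, a rough sketch of what the aligned naming could look like (input_layernorm / self_attn / post_attention_layernorm / mlp, and hidden_states instead of x, mirroring LlamaDecoderLayer). It reuses the CausalSelfAttention / MLP / MoEMLP classes and config fields from this PR; the NGen3DecoderLayer class name is only illustrative, not the PR's actual code:

import torch
import torch.nn as nn

class NGen3DecoderLayer(nn.Module):
    # Sketch only: same logic as Block above, names aligned with LlamaDecoderLayer.
    def __init__(self, config: NGEN3Config):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(config.n_embd)
        self.self_attn = CausalSelfAttention(config)
        self.post_attention_layernorm = nn.LayerNorm(config.n_embd)
        self.mlp = MoEMLP(config) if config.use_moe else MLP(config)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Attention block with residual connection
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        hidden_states = self.self_attn(hidden_states)
        hidden_states = residual + hidden_states
        # Feed-forward block with residual connection
        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        hidden_states = self.mlp(hidden_states)
        return residual + hidden_states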
class CausalSelfAttention(nn.Module):
this should be completely equivalent to a variation of LlamaAttention with fused qkv. Nothing very new, and the mask should not be saved there
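For illustration, a minimal sketch of the kind of fused-qkv attention the comment describes, with causal masking applied at call time via torch's scaled_dot_product_attention rather than stored on the module. Only n_embd / n_head come from the config in the snippet above; the NGen3Attention name and bias-free projections are assumptions, not the PR's actual implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class NGen3Attention(nn.Module):
    # Sketch only: fused qkv projection; no mask buffer is registered,
    # causal masking is handled by scaled_dot_product_attention(is_causal=True).
    def __init__(self, config: NGEN3Config):
        super().__init__()
        self.num_heads = config.n_head
        self.head_dim = config.n_embd // config.n_head
        self.qkv_proj = nn.Linear(config.n_embd, 3 * config.n_embd, bias=False)
        self.o_proj = nn.Linear(config.n_embd, config.n_embd, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = hidden_states.shape
        # Single projection produces q, k, v in one matmul, then split
        qkv = self.qkv_proj(hidden_states)
        q, k, v = qkv.chunk(3, dim=-1)
        # Reshape to (bsz, num_heads, seq_len, head_dim)
        q = q.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # Causal mask is built internally here; nothing is saved on the module
        attn_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn_out = attn_out.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(attn_out)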
@Thishyaketh we need a proper description of the PR with:
This PR fixes issues in uploading NGen 3 to Transformers.