
Add NGen3 #36901

Open · Thishyaketh wants to merge 2 commits into main
Conversation

Thishyaketh

This PR fixes the issues with uploading NGen 3 to Transformers.

Thishyaketh marked this pull request as ready for review on March 22, 2025, 08:09
Thishyaketh (Author)

@ArthurZucker and @Rocketknight1
Hope it works this time!

Comment on lines +78 to +96
class Block(nn.Module):
    def __init__(self, config: NGEN3Config):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.mlp = MoEMLP(config) if config.use_moe else MLP(config)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection for attention
        residual = x
        x = self.ln1(x)
        x = self.attn(x)
        x = residual + x
        # Residual connection for feedforward
        residual = x
        x = self.ln2(x)
        x = self.mlp(x)
        return residual + x
Collaborator

Can you try to align the variable names / layer names with LlamaDecoderLayer here?
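A minimal sketch of what aligning with LlamaDecoderLayer could look like, assuming the NGEN3Config, CausalSelfAttention, MLP and MoEMLP classes from this PR; the class name NGEN3DecoderLayer and the attribute names (input_layernorm, self_attn, post_attention_layernorm, mlp) follow the Llama naming convention and are purely illustrative:

    import torch
    import torch.nn as nn

    class NGEN3DecoderLayer(nn.Module):
        """Same computation as Block above, renamed to mirror LlamaDecoderLayer."""

        def __init__(self, config: NGEN3Config):
            super().__init__()
            self.input_layernorm = nn.LayerNorm(config.n_embd)
            self.self_attn = CausalSelfAttention(config)
            self.post_attention_layernorm = nn.LayerNorm(config.n_embd)
            self.mlp = MoEMLP(config) if config.use_moe else MLP(config)

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Attention sub-layer with residual connection
            residual = hidden_states
            hidden_states = self.input_layernorm(hidden_states)
            hidden_states = self.self_attn(hidden_states)
            hidden_states = residual + hidden_states
            # Feed-forward sub-layer with residual connection
            residual = hidden_states
            hidden_states = self.post_attention_layernorm(hidden_states)
            hidden_states = self.mlp(hidden_states)
            return residual + hidden_states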

        x = self.mlp(x)
        return residual + x

class CausalSelfAttention(nn.Module):
Collaborator

This should be completely equivalent to a variation of LlamaAttention with fused QKV. Nothing very new here, and the mask should not be saved there.
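For reference, a minimal sketch of causal self-attention with a fused QKV projection, reading "the mask should not be saved" as: build the causal mask at call time rather than registering it as a buffer on the module. The class name and the config fields n_embd / n_head are assumptions based on the snippet above, not the PR's actual implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FusedQKVCausalSelfAttention(nn.Module):
        """Illustrative fused-QKV causal attention; not the PR's implementation."""

        def __init__(self, config):
            super().__init__()
            assert config.n_embd % config.n_head == 0
            self.n_head = config.n_head
            self.head_dim = config.n_embd // config.n_head
            # One linear produces q, k and v in a single matmul (the "fused qkv")
            self.qkv_proj = nn.Linear(config.n_embd, 3 * config.n_embd)
            self.out_proj = nn.Linear(config.n_embd, config.n_embd)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            bsz, seq_len, n_embd = x.shape
            qkv = self.qkv_proj(x)  # (bsz, seq_len, 3 * n_embd)
            q, k, v = qkv.split(n_embd, dim=-1)
            # Reshape to (bsz, n_head, seq_len, head_dim)
            q = q.view(bsz, seq_len, self.n_head, self.head_dim).transpose(1, 2)
            k = k.view(bsz, seq_len, self.n_head, self.head_dim).transpose(1, 2)
            v = v.view(bsz, seq_len, self.n_head, self.head_dim).transpose(1, 2)
            # Causal masking is handled here (is_causal=True) instead of a stored buffer
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            y = y.transpose(1, 2).contiguous().view(bsz, seq_len, n_embd)
            return self.out_proj(y)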

@ArthurZucker (Collaborator)

@Thishyaketh we need a proper description of the PR, with:

  • the original checkpoint release link
  • the original checkpoint paper implementing this

and something like this to introduce the model.
Make sure you use transformers-cli add-new-model-like, and try to explain what is different in your model compared to Llama 🤗
