
Qwen 2.5 fails to load: Key weight not found in Linear #214

Open
DePasqualeOrg opened this issue Feb 14, 2025 · 7 comments

Comments

@DePasqualeOrg (Contributor) commented Feb 14, 2025

I've verified in mlx-swift-examples that the change from 0.21.2 to 0.21.3 causes Qwen 2.5 0.5B and 1.5B to fail to load with the error "Key weight not found in Linear".

@deet commented Feb 16, 2025

I have observed this with a larger Qwen model as well.

@rudrankriyam (Contributor) commented Feb 16, 2025

Ah, I have been banging my head on this for the whole day. So the problem is not on my end, phew.

Edit: I think it is related to the error check added in ml-explore/mlx-swift#174?

@davidkoski (Collaborator) commented

The error means that the parameters being loaded are missing an expected value: weight for a Linear layer. It is triggered because the loading code specifies that it wants .all validation.

In particular, this means you might think you have loaded the model, but one of the Linear layers actually still has random values for its weight parameter -- to me that seems like a bug in the saved parameters, assuming the values are indeed missing.
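For context, a minimal sketch of what that load path looks like, assuming the mlx-swift update(parameters:verify:) API (an illustration, not the exact loader code):

    import MLX
    import MLXNN

    // `weights` is the flat [String: MLXArray] dictionary read from the
    // safetensors files. With verify: [.all], update(parameters:) throws
    // if any module parameter -- e.g. a Linear's weight -- has no
    // matching key, which surfaces as "Key weight not found in Linear".
    func applyWeights(to model: Module, weights: [String: MLXArray]) throws {
        try model.update(
            parameters: ModuleParameters.unflattened(weights),
            verify: [.all])
    }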

@davidkoski (Collaborator) commented

From Qwen2VL.swift:

    fileprivate class LanguageModel: Module, KVCacheDimensionProvider {
...
        @ModuleInfo(key: "lm_head") var lmHead: Linear?

        public init(_ args: Qwen2VLConfiguration.TextConfiguration) {
            self.model = Qwen2Model(args)

            if !args.tieWordEmbeddings {
                _lmHead.wrappedValue = Linear(args.hiddenSize, args.vocabularySize, bias: false)
            }

and indeed tieWordEmbeddings is true and there is no lm_head in the weights, so if this were to be loaded:

        public func callAsFunction(
            _ inputs: MLXArray?, cache: [KVCache]? = nil, inputEmbedding: MLXArray? = nil
        ) -> LMOutput {
            var out = model(inputs, cache: cache, inputEmbedding: inputEmbedding)
            if let lmHead {
                out = lmHead(out)
            } else {
                out = model.embedTokens.asLinear(out)
            }
            return LMOutput(logits: out)
        }

it would be applying random weights.

The check now works, but I think this means the model weights are incorrect.
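For readers unfamiliar with weight tying, a small illustrative sketch of what Embedding.asLinear does (the shapes here are made up for the example):

    import MLX
    import MLXNN

    // With tied embeddings, the embedding matrix doubles as the output
    // projection: asLinear(x) computes x @ weight.T, so the logits get
    // the vocabulary dimension without a separate lm_head tensor.
    let embedding = Embedding(embeddingCount: 32768, dimensions: 1024)
    let hidden = MLXArray.zeros([1, 8, 1024])  // [batch, tokens, hiddenSize]
    let logits = embedding.asLinear(hidden)    // [1, 8, 32768]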

@davidkoski (Collaborator) commented

Ah, just a sec, I am mixing up my models. The one that fails here is the LLM version, not the VLM. Same issue I described -- there is no lm_head in the safetensors.

If we go back to the Python version, the Swift VLM code doesn't match -- it doesn't check if not args.tie_word_embeddings at initialization. Instead it was handled like this:

        if configuration.tieWordEmbeddings {
            out = model.embedTokens.asLinear(out)
        } else {
            out = lmHead(out)
        }

That will work (evaluate correctly), but it isn't correct in terms of parameter validation. It does half-way match the Python code though :-)
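A sketch of the corresponding fix, mirroring the Python check (type names here are approximations of the ones in mlx-swift-examples):

    // Make lm_head optional and only create it when untied, so no
    // parameter exists that .all validation could report as unloaded.
    @ModuleInfo(key: "lm_head") var lmHead: Linear?

    public init(_ args: Qwen2Configuration) {  // assumed config type name
        self.model = Qwen2ModelInner(args)     // assumed inner model name
        if !args.tieWordEmbeddings {
            _lmHead.wrappedValue = Linear(args.hiddenSize, args.vocabularySize, bias: false)
        }
    }

    public func callAsFunction(_ inputs: MLXArray, cache: [KVCache]?) -> MLXArray {
        let out = model(inputs, cache: cache)
        // Branch on the optional rather than on the configuration flag.
        if let lmHead {
            return lmHead(out)
        }
        return model.embedTokens.asLinear(out)
    }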

davidkoski transferred this issue from ml-explore/mlx-swift Feb 27, 2025
@davidkoski (Collaborator) commented

Moving to mlx-swift-examples -- the bug is actually there, just detected in mlx-swift :-)

@davidkoski (Collaborator) commented

@adrgrondin provided some model ids in #210:

mlx-community/Qwen1.5-0.5B-Chat-4bit
mlx-community/Qwen2.5-7B-Instruct-4bit
mlx-community/Qwen2.5-1.5B-Instruct-4bit
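A quick repro sketch with one of those ids, using the loadContainer API from mlx-swift-examples (names may differ slightly by checkout); on mlx-swift 0.21.3 this throws the error above:

    import MLXLLM
    import MLXLMCommon

    // Loading a tied-embeddings Qwen checkpoint fails with
    // "Key weight not found in Linear" before the fix.
    let container = try await LLMModelFactory.shared.loadContainer(
        configuration: ModelConfiguration(id: "mlx-community/Qwen2.5-1.5B-Instruct-4bit"))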

davidkoski added a commit that referenced this issue Feb 27, 2025 (#215)

- Qwen2 (LLM) had slightly incorrect logic in its initialization regarding lm_head
- it was initialized even when not used, which causes parameter loading to fail with the current 0.21.3 mlx-swift