Qwen 2.5 fails to load: Key weight not found in Linear #214
Comments
I have observed this with a larger Qwen model as well.
Ah, I have been banging my head on this for the whole day. So the problem is not on my end, phew. Edit: I think it is related to the error added in this PR: ml-explore/mlx-swift#174.
The error means that the parameters being loaded are missing an expected value -- it specifies that it wants `weight`. In particular, this means that you think you have loaded the model, but actually one of the Linear layers still has random values for its `weight`.
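In other words, the loader now verifies that every parameter a module declares is actually present in the checkpoint. A toy Python sketch of that kind of strict check (hypothetical names, not the mlx-swift API):

```python
# Hypothetical sketch (not the mlx-swift API) of the strict check:
# every parameter key a module declares must exist in the loaded weights.

def validate_weights(expected_keys, loaded_keys):
    """Raise if a declared parameter is absent from the checkpoint."""
    for key in expected_keys:
        if key not in loaded_keys:
            raise KeyError(f"Key {key.split('.')[-1]} not found in Linear")

# A model that declares an lm_head expects "lm_head.weight" ...
expected = ["model.embed_tokens.weight", "lm_head.weight"]
# ... but a checkpoint with tied word embeddings ships without it.
loaded = ["model.embed_tokens.weight"]

try:
    validate_weights(expected, loaded)
except KeyError as e:
    print(e)  # the failure mode reported in this issue
```

Before 0.21.3 this mismatch went unnoticed, so the layer silently kept its random initialization.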
Qwen2VL.swift:

```swift
fileprivate class LanguageModel: Module, KVCacheDimensionProvider {
    ...
    @ModuleInfo(key: "lm_head") var lmHead: Linear?

    public init(_ args: Qwen2VLConfiguration.TextConfiguration) {
        self.model = Qwen2Model(args)
        if !args.tieWordEmbeddings {
            _lmHead.wrappedValue = Linear(args.hiddenSize, args.vocabularySize, bias: false)
        }
    }
```

and indeed

```swift
public func callAsFunction(
    _ inputs: MLXArray?, cache: [KVCache]? = nil, inputEmbedding: MLXArray? = nil
) -> LMOutput {
    var out = model(inputs, cache: cache, inputEmbedding: inputEmbedding)
    if let lmHead {
        out = lmHead(out)
    } else {
        out = model.embedTokens.asLinear(out)
    }
    return LMOutput(logits: out)
}
```

would be applying random weights. The check now works, but I think this means the model weights are incorrect.
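For context, with tied word embeddings the final projection reuses the embedding matrix, which is what `model.embedTokens.asLinear(out)` does, so the checkpoint legitimately contains no `lm_head.weight`. A toy pure-Python sketch of that tied projection:

```python
# Toy illustration (not the mlx API) of what embedTokens.asLinear computes:
# with tied embeddings, logits come from the embedding matrix transposed,
# so the checkpoint never contains a separate lm_head.weight.

def as_linear(hidden_state, embedding):
    # logits[v] = dot(hidden_state, embedding[v]) for each vocab row v
    return [sum(h * w for h, w in zip(hidden_state, row)) for row in embedding]

embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # vocab=3, hidden=2
hidden_state = [2.0, 3.0]
print(as_linear(hidden_state, embedding))  # [2.0, 3.0, 5.0]
```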
Ah, just a sec, I am mixing my models. This one fails: … and that is the LLM version, not the VLM. Same issue I described -- there is no … If we go back to the python version: … the swift VLM code doesn't match -- it doesn't look at `tieWordEmbeddings`:

```swift
if configuration.tieWordEmbeddings {
    out = model.embedTokens.asLinear(out)
} else {
    out = lmHead(out)
}
```

That will work (evaluate correctly), but isn't done correctly in terms of parameter validation. It does half-way match the python code though :-)
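The parameter-validation point is that the set of *declared* parameters must depend on the configuration, not just the forward pass. A toy sketch of why guarding the declaration satisfies the strict loader (hypothetical names, not the real mlx module system):

```python
# Toy sketch (hypothetical, not the real mlx module system): declare
# lm_head only when embeddings are untied, so the declared parameter
# set matches what the checkpoint actually contains.

class ToyLanguageModel:
    def __init__(self, tie_word_embeddings: bool):
        # every model declares the embedding weight
        self.params = {"model.embed_tokens.weight"}
        if not tie_word_embeddings:
            # only untied models declare (and must load) lm_head.weight
            self.params.add("lm_head.weight")

checkpoint = {"model.embed_tokens.weight"}  # a tied checkpoint on disk

tied = ToyLanguageModel(tie_word_embeddings=True)
print(tied.params - checkpoint)    # set() -> validation passes

untied = ToyLanguageModel(tie_word_embeddings=False)
print(untied.params - checkpoint)  # {'lm_head.weight'} -> would fail to load
```

Only branching in the forward pass, as above, leaves an unused `lm_head` declared, which is exactly what the new check rejects.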
Moving to mlx-swift-examples -- the bug is actually there, just detected in mlx-swift :-)
@adrgrondin provided some ids in #210: mlx-community/Qwen1.5-0.5B-Chat-4bit
- Qwen2 (LLM) had slightly incorrect logic in the initialization regarding lm_head -- it was initialized even if not used, but this causes parameter loading to fail with the current 0.21.3 mlx-swift (#215)
I've verified in mlx-swift-examples that the change from 0.21.2 to 0.21.3 causes Qwen 2.5 0.5B and 1.5B to fail to load with the error "Key weight not found in Linear".