Expose all models #2
I can't get Phi 3.5 MoE to work. When I try to run it, I see this error:

Given the size of the model and the limited amount of RAM I have available, I'm not sure I'd be able to run it even if this error weren't thrown. So I'm just moving on to the next model.
I can't get Qwen 1.5 or Qwen 2.5 1.5B to run. They both fail with the same error:

As with the above, I'm not going to spend time trying to fix these errors. We have plenty of models to pick from, and we can fix them later or wait for an upstream update that fixes them.
Yeah, I've got the same error for OpenELM:
This pull request exposes a simple actor for each supported model. The actual implementation of the models is `LLM`, which is just a convenience wrapper around MLX.

Each model conforms to a new protocol called `ModelProtocol`. This allows us to add extra functions for every model in just a single place: ModelProtocol.swift. The first example of this is `request(_:maxTokenCount:)`.
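As a rough sketch of the shape (not the PR's exact code), with a hypothetical `generate(prompt:maxTokens:)` surface on `LLM` and the locking discussed below omitted:

```swift
// Stand-in for the PR's `LLM` convenience wrapper around MLX. The
// `generate(prompt:maxTokens:)` surface is an assumption, not the real API.
final class LLM {
    func generate(prompt: String, maxTokens: Int) async throws -> String {
        // ... call into MLX here ...
        return ""
    }
}

// Each supported model is a small actor conforming to this protocol.
protocol ModelProtocol: Actor {
    var llm: LLM { get }
}

extension ModelProtocol {
    // Written once in ModelProtocol.swift; every conforming model gets it.
    // Serialization is omitted here; see the AsyncLock sketch below.
    func request(_ prompt: String, maxTokenCount: Int) async throws -> String {
        try await llm.generate(prompt: prompt, maxTokens: maxTokenCount)
    }
}

// Exposing a new model then reduces to a tiny actor. `Phi3` is just an
// illustrative name, not one of the PR's actors.
actor Phi3: ModelProtocol {
    let llm = LLM()
}
```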
Because of the reentrancy problem with Swift actors, `ModelProtocol.llm` is wrapped inside an `AsyncLock`, which is taken from Apple's swift-build (AsyncLock.swift).

According to MLX's documentation, the AI models themselves are not thread-safe, which means calls into them need to be serialized. However, because Swift actors are reentrant, calling `try await llm.request(_:maxTokenCount:)` could immediately suspend and allow another call on the same actor to begin. This may not be a problem for the library today, but I think it may be in the future, especially when we add support for `KVCache`. I think it's better to ensure that every call to `ModelProtocol.someFunc` is transactional, which is what we get by wrapping the implementation of every method on `ModelProtocol` inside an `AsyncLock`.
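To make the hand-off concrete, here is a minimal sketch of the serialization idea, in the spirit of swift-build's lock rather than a copy of it: waiters park as continuations and the lock is handed off in FIFO order, so a suspension inside one request can't let a second request start.

```swift
// A sketch of the pattern, not swift-build's actual implementation.
actor AsyncLock {
    private var isLocked = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    func withLock<T>(_ body: () async throws -> T) async rethrows -> T {
        if isLocked {
            // Park until the current holder hands the lock to us.
            await withCheckedContinuation { waiters.append($0) }
        }
        isLocked = true
        defer {
            if waiters.isEmpty {
                isLocked = false
            } else {
                // Hand off directly to the next waiter; `isLocked` stays true
                // so a newcomer can't jump the queue between release and
                // resumption.
                waiters.removeFirst().resume()
            }
        }
        return try await body()
    }
}
```

Assuming the protocol exposes such a lock, the shared `request(_:maxTokenCount:)` implementation can run the whole generation inside `withLock`, so even if the surrounding actor suspends at an `await`, no second generation can begin until the first completes. The direct FIFO hand-off is the important design choice: releasing the lock outright and letting waiters re-race would allow starvation and reordering.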