
Expose all models #2

Merged
10 commits merged into main from support-all-models on Feb 23, 2025
Conversation

@atdrendel (Contributor) commented Feb 23, 2025

This pull request exposes a simple actor for each supported model. The actual implementation behind each model is LLM, which is just a convenience wrapper around MLX's model-loading and generation APIs.

Each model conforms to a new protocol called ModelProtocol. This allows us to add extra functions for each model in just a single place: ModelProtocol.swift. The first example of this is request(_:maxTokenCount:).
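As a rough sketch of that shape (the names and signatures here are illustrative, not necessarily the PR's exact declarations), the protocol-plus-extension approach might look like:

```swift
import Foundation

// Placeholder standing in for the MLX-backed wrapper described above;
// the real LLM type wraps MLX's loading and generation APIs.
final class LLM {
    func generate(prompt: String, maxTokenCount: Int) async throws -> String {
        // ... MLX generation elided in this sketch ...
        return ""
    }
}

// Each supported model is an actor conforming to this protocol.
protocol ModelProtocol: Actor {
    var llm: LLM { get }
}

extension ModelProtocol {
    // Shared behavior is added once here, in a single place,
    // rather than re-implemented on every model.
    func request(_ prompt: String, maxTokenCount: Int) async throws -> String {
        try await llm.generate(prompt: prompt, maxTokenCount: maxTokenCount)
    }
}
```

Because the extension provides the default implementation, every conforming model picks up request(_:maxTokenCount:) for free.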

Because of the reentrancy problem with Swift actors, ModelProtocol.llm is wrapped inside of ActorLock, which is adapted from Apple's swift-build AsyncLock.swift.

According to MLX's documentation, the AI models themselves are not thread-safe, which means calls to them need to be serialized. However, because Swift actors are reentrant, calling try await llm.request(_:maxTokenCount:) could immediately suspend and allow another call into the same actor to proceed. This may not be a problem with the library today, but I think it may be in the future, especially when we add support for KVCache. I think it's better to ensure that every call to ModelProtocol.someFunc is transactional, which is what we are doing by wrapping the implementation of every method on ModelProtocol inside of an AsyncLock.
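A minimal non-reentrant lock along these lines (a sketch of the idea, not the exact AsyncLock from swift-build) queues waiters so that the whole critical section, including any suspension points inside it, runs to completion before the next caller starts:

```swift
// Serializes async work: unlike a plain (reentrant) actor method,
// the critical section passed to withLock spans its own awaits.
actor AsyncLock {
    private var isBusy = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    func withLock<T: Sendable>(
        _ body: @Sendable () async throws -> T
    ) async rethrows -> T {
        // If someone holds the lock, suspend until we are resumed.
        if isBusy {
            await withCheckedContinuation { waiters.append($0) }
        }
        isBusy = true
        defer {
            // Hand the lock to the next waiter, or release it.
            if waiters.isEmpty {
                isBusy = false
            } else {
                waiters.removeFirst().resume()
            }
        }
        return try await body()
    }
}
```

A ModelProtocol method would then run its body as, e.g., try await lock.withLock { ... }, which is what makes each call transactional even though the enclosing actor is reentrant.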

@atdrendel atdrendel requested a review from myobie February 23, 2025 15:27
@atdrendel (Contributor, Author)

I can't get Phi 3.5 MoE to work. When I try to run it, I see this error:

Caught error: keyNotFound(base: "SuScaledRotaryEmbedding", key: "_freqs")

Given the size of the model and the limited amount of RAM I have available, I'm not sure I'd even be able to run it if this error weren't thrown. So, I'm just continuing on to the next model.

@atdrendel (Contributor, Author)

I can't get Qwen 1.5 or Qwen 2.5 1.5B to run. They both fail with the same error:

Caught error: keyNotFound(base: "Linear", key: "weight")

As with the above, I'm not going to spend any time on trying to fix these errors. We have plenty of models to pick from, and we can fix them later or wait for an upstream update that fixes them.

@atdrendel (Contributor, Author)

Yeah, I've got the same error for OpenELM:

Caught error: keyNotFound(base: "Linear", key: "weight")

@atdrendel atdrendel marked this pull request as ready for review February 23, 2025 22:20
@atdrendel atdrendel merged commit 477fc27 into main Feb 23, 2025
@atdrendel atdrendel deleted the support-all-models branch February 23, 2025 22:32