Expose all models #2
I can't get Phi 3.5 MoE to work. When I try to run it, I see this error:

Given the size of the model and the limited amount of RAM I have available, I'm not sure I'd be able to run it even if this error weren't thrown. So I'm just moving on to the next model.
I can't get Qwen 1.5 or Qwen 2.5 1.5B to run. They both fail with the same error:

As with the above, I'm not going to spend time trying to fix these errors. We have plenty of models to pick from, and we can fix them later or wait for an upstream update that fixes them.
Yeah, I've got the same error for OpenELM:
This pull request exposes a simple actor for each supported model. The actual implementation of the models is `LLM`, which is just a convenience wrapper around MLX.

Each model conforms to a new protocol called `ModelProtocol`. This allows us to add extra functions for every model in just a single place: ModelProtocol.swift. The first example of this is `request(_:maxTokenCount:)`.
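As a rough sketch of the shape (not the PR's exact code), with a hypothetical `generate(prompt:maxTokens:)` surface on `LLM` and the locking discussed below omitted:

```swift
// Stand-in for the PR's `LLM` convenience wrapper around MLX. The
// `generate(prompt:maxTokens:)` surface is an assumption, not the real API.
final class LLM {
    func generate(prompt: String, maxTokens: Int) async throws -> String {
        // ... call into MLX here ...
        return ""
    }
}

// Each supported model is a small actor conforming to this protocol.
protocol ModelProtocol: Actor {
    var llm: LLM { get }
}

extension ModelProtocol {
    // Written once in ModelProtocol.swift; every conforming model gets it.
    // Serialization is omitted here; see the AsyncLock sketch below.
    func request(_ prompt: String, maxTokenCount: Int) async throws -> String {
        try await llm.generate(prompt: prompt, maxTokens: maxTokenCount)
    }
}

// Exposing a new model then reduces to a tiny actor. `Phi3` is just an
// illustrative name, not one of the PR's actors.
actor Phi3: ModelProtocol {
    let llm = LLM()
}
```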
Because of the reentrancy problem with Swift actors, `ModelProtocol.llm` is wrapped inside an `AsyncLock`, which is taken from Apple's swift-build (AsyncLock.swift).

According to MLX's documentation, the AI models themselves are not thread-safe, which means calls into them need to be serialized. However, because Swift actors are reentrant, calling `try await llm.request(_:maxTokenCount:)` could immediately suspend and allow another call on the same actor to begin. This may not be a problem for the library today, but I think it may be in the future, especially when we add support for `KVCache`. I think it's better to ensure that every call to `ModelProtocol.someFunc` is transactional, which is what we get by wrapping the implementation of every method on `ModelProtocol` inside an `AsyncLock`.
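To make the hand-off concrete, here is a minimal sketch of the serialization idea, in the spirit of swift-build's lock rather than a copy of it: waiters park as continuations and the lock is handed off in FIFO order, so a suspension inside one request can't let a second request start.

```swift
// A sketch of the pattern, not swift-build's actual implementation.
actor AsyncLock {
    private var isLocked = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    func withLock<T>(_ body: () async throws -> T) async rethrows -> T {
        if isLocked {
            // Park until the current holder hands the lock to us.
            await withCheckedContinuation { waiters.append($0) }
        }
        isLocked = true
        defer {
            if waiters.isEmpty {
                isLocked = false
            } else {
                // Hand off directly to the next waiter; `isLocked` stays true
                // so a newcomer can't jump the queue between release and
                // resumption.
                waiters.removeFirst().resume()
            }
        }
        return try await body()
    }
}
```

Assuming the protocol exposes such a lock, the shared `request(_:maxTokenCount:)` implementation can run the whole generation inside `withLock`, so even if the surrounding actor suspends at an `await`, no second generation can begin until the first completes. The direct FIFO hand-off is the important design choice: releasing the lock outright and letting waiters re-race would allow starvation and reordering.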