Building the mask currently blocks inference. It does not need to: the only operation that must block is applying the mask to the logits. Everything else can be done during the forward pass. I therefore suggest we modify the implementation of logits processors so that the next-token mask is computed and moved to the device asynchronously. Currently blocked by dottxt-ai/outlines-core#178.
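To make the proposal concrete, here is a minimal sketch of the overlap, using only the standard library. It is not the Outlines implementation: `build_mask`, `forward_pass`, `apply_mask`, and `constrained_step` are hypothetical stand-ins, and plain lists take the place of tensors.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def build_mask(allowed_token_ids, vocab_size):
    """Hypothetical mask builder: 0.0 for allowed tokens, -inf otherwise."""
    mask = [-math.inf] * vocab_size
    for token_id in allowed_token_ids:
        mask[token_id] = 0.0
    return mask


def forward_pass(vocab_size):
    """Stand-in for the model's forward pass; returns dummy logits."""
    return [float(i) for i in range(vocab_size)]


def apply_mask(logits, mask):
    """The only step that has to wait for both the logits and the mask."""
    return [logit + m for logit, m in zip(logits, mask)]


def constrained_step(executor, allowed_token_ids, vocab_size):
    # Start building the mask before the forward pass begins.
    mask_future = executor.submit(build_mask, allowed_token_ids, vocab_size)
    # The forward pass runs while the mask is being built.
    logits = forward_pass(vocab_size)
    # Block only here, when the mask is actually needed.
    return apply_mask(logits, mask_future.result())


with ThreadPoolExecutor(max_workers=1) as executor:
    masked = constrained_step(executor, allowed_token_ids=[1, 3], vocab_size=5)
```

In this toy version both steps are pure Python, so the thread mostly illustrates where the single synchronization point sits. The real gain would presumably come when the forward pass runs on an accelerator and the mask is built on the CPU, with the host-to-device copy also issued without blocking.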
We could also attempt to perform compilation during the first forward pass, which typically takes longer because of prompt processing. However, since computing the index can be slow, we still want to leave users the option to pass a Guide directly.
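A rough sketch of how both paths could coexist is given below, again with the standard library only. The names `compile_index`, `prompt_forward_pass`, and `first_step` are hypothetical, and the dictionary returned by `compile_index` is merely a placeholder for a compiled index or Guide.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def compile_index(pattern):
    """Hypothetical, potentially slow compilation step."""
    return {"pattern": pattern, "compiled": True}


def prompt_forward_pass(prompt):
    """Stand-in for the first (prompt-processing) forward pass."""
    return [float(len(prompt))]


def first_step(executor, prompt, pattern=None, guide=None):
    if guide is not None:
        # The user supplied a prebuilt guide: nothing to compile.
        guide_future = Future()
        guide_future.set_result(guide)
    else:
        # Compile in the background while the prompt is processed.
        guide_future = executor.submit(compile_index, pattern)
    logits = prompt_forward_pass(prompt)
    # Wait for compilation only once the guide is needed.
    return logits, guide_future.result()


with ThreadPoolExecutor(max_workers=1) as executor:
    _, compiled_guide = first_step(executor, "Hello", pattern="[0-9]+")
    _, passed_guide = first_step(executor, "Hello", guide={"prebuilt": True})
```

Wrapping the user-supplied guide in an already-resolved future keeps a single code path after the forward pass, whichever option the user chose.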