Make LogitsProcessors compute the next mask asynchronously #1434

Open
rlouf opened this issue Feb 21, 2025 · 1 comment
Labels
enhancement structured generation Linked to structured generation

Comments

@rlouf
Member

rlouf commented Feb 21, 2025

Building the mask currently blocks inference, but it does not need to: the only operation that is necessarily blocking is applying the mask to the logits. Everything else can be done during the forward pass. I therefore suggest we modify the implementation of logits processors so that the next-token mask is computed and moved to the device asynchronously. Currently blocked by dottxt-ai/outlines-core#178.
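
To make the idea concrete, here is a minimal sketch of what "compute the mask in the background, block only to apply it" could look like. It is not the outlines implementation: `build_mask`, `AsyncMaskProcessor`, and `prefetch` are hypothetical names, plain Python lists stand in for tensors, and a single worker thread stands in for whatever mechanism ends up being used.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def build_mask(allowed_token_ids, vocab_size):
    """Hypothetical stand-in for the expensive mask construction."""
    allowed = set(allowed_token_ids)
    return [i in allowed for i in range(vocab_size)]


class AsyncMaskProcessor:
    """Sketch: build the next mask in the background, block only to apply it."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def prefetch(self, allowed_token_ids):
        # Called right after a token is sampled, before the next forward pass.
        self._pending = self._executor.submit(
            build_mask, allowed_token_ids, self.vocab_size
        )

    def __call__(self, logits):
        # The only blocking step: wait for the mask (ideally already done)
        # and apply it to the logits.
        mask = self._pending.result()
        return [x if keep else -math.inf for x, keep in zip(logits, mask)]


processor = AsyncMaskProcessor(vocab_size=4)
processor.prefetch([1, 3])       # would overlap with the model's forward pass
logits = [0.5, 1.0, -0.2, 2.0]   # pretend these came from the forward pass
masked = processor(logits)       # tokens 0 and 2 are set to -inf
```

A thread only overlaps usefully with the forward pass if the mask-building code releases the GIL, which is an assumption here. With PyTorch, the device transfer could similarly be issued as a non-blocking copy (`tensor.to(device, non_blocking=True)` from pinned memory) so that it also runs during the forward pass.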

@rlouf
Member Author

rlouf commented Feb 21, 2025

We could also attempt to perform compilation during the first forward pass, which typically takes longer because of prompt processing. However, since computing the index can be slow, we still want to leave users the option to pass a Guide directly.
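
A rough sketch of that second idea, under the same caveats: `compile_index`, `forward_pass`, and `first_step` are hypothetical stand-ins rather than outlines APIs, and a plain dict plays the role of the Guide. Compilation is submitted to a worker thread before the first forward pass starts, and a user-supplied guide bypasses compilation entirely.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_index(pattern):
    """Hypothetical stand-in for the slow index compilation."""
    return {"pattern": pattern, "compiled": True}


def forward_pass(prompt):
    """Hypothetical stand-in for the first (prompt-processing) forward pass."""
    return [float(len(prompt))]


def first_step(prompt, pattern=None, guide=None):
    # A guide passed in by the user skips compilation entirely.
    if guide is not None:
        return forward_pass(prompt), guide
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start compiling, then run the forward pass while it is in flight.
        pending = pool.submit(compile_index, pattern)
        logits = forward_pass(prompt)
        # Block only once the logits exist and the guide is actually needed.
        return logits, pending.result()


logits, guide = first_step("Hello", pattern=r"[0-9]+")
_, reused = first_step("Hello again", guide=guide)  # no recompilation
```

If compilation takes longer than prompt processing, `pending.result()` still blocks for the remainder, which is why keeping the option to pass a prebuilt Guide matters.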

@rlouf rlouf added enhancement structured generation Linked to structured generation labels Feb 21, 2025