Make `LogitsProcessor`s compute the next mask asynchronously #1434

rlouf · 2025-02-21T10:19:22Z

Building the mask currently blocks inference, but it does not need to: the only operation that necessarily blocks is applying the mask to the logits. Everything else can be done during the forward pass. I thus suggest we modify the implementation of logits processors so that the next-token mask is computed and moved to the device asynchronously. Currently blocked by dottxt-ai/outlines-core#178.
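To make the idea concrete, here is a minimal sketch of the overlap using only the standard library. All names (`compute_mask`, `forward_pass`, `apply_mask`, `generate_step`) are hypothetical stand-ins and not the outlines or outlines-core API; the logits are fake values chosen only so the example runs.

```python
import math
from concurrent.futures import ThreadPoolExecutor

VOCAB_SIZE = 8  # toy vocabulary size, assumption for illustration


def compute_mask(allowed_token_ids, vocab_size=VOCAB_SIZE):
    """Hypothetical stand-in for building the next-token mask (CPU work)."""
    allowed = set(allowed_token_ids)
    return [tok in allowed for tok in range(vocab_size)]


def forward_pass(token_ids, vocab_size=VOCAB_SIZE):
    """Hypothetical stand-in for the model's forward pass; returns fake logits."""
    return [float((sum(token_ids) + tok) % vocab_size) for tok in range(vocab_size)]


def apply_mask(logits, mask):
    """The only step that has to wait for both the logits and the mask."""
    return [logit if keep else -math.inf for logit, keep in zip(logits, mask)]


def generate_step(executor, token_ids, allowed_token_ids):
    # Start building the mask in the background before the forward pass.
    mask_future = executor.submit(compute_mask, allowed_token_ids)
    # The forward pass runs while the mask is being built.
    logits = forward_pass(token_ids)
    # Block only here, right before the mask is applied.
    mask = mask_future.result()
    return apply_mask(logits, mask)


with ThreadPoolExecutor(max_workers=1) as executor:
    masked = generate_step(executor, token_ids=[1, 2, 3], allowed_token_ids=[0, 5])
```

With plain Python threads the two steps only truly overlap when at least one of them releases the GIL (for example a forward pass that spends its time in device kernels), so this is a sketch of the control flow rather than a benchmark.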

rlouf · 2025-02-21T10:20:45Z

We could also attempt to perform the index compilation during the first forward pass, which typically takes longer because of prompt processing. However, since computing the index can be slow, we still want to leave users the option to pass a `Guide` directly.
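A rough sketch of what this could look like, again with the standard library only. `compile_index`, `prefill` and `LazyGuideProcessor` are hypothetical names invented for illustration, and the dictionary returned by `compile_index` merely stands in for a real compiled index or `Guide`.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_index(pattern):
    """Hypothetical stand-in for the (potentially slow) index compilation."""
    return {"pattern": pattern, "compiled": True}


def prefill(prompt):
    """Hypothetical stand-in for the first forward pass over the prompt."""
    return len(prompt.split())


class LazyGuideProcessor:
    """Takes either a pattern to compile in the background or a prebuilt guide."""

    def __init__(self, executor, pattern=None, guide=None):
        if (pattern is None) == (guide is None):
            raise ValueError("pass exactly one of `pattern` or `guide`")
        self._guide = guide
        self._future = None
        if guide is None:
            # Start compiling right away so it overlaps with prompt processing.
            self._future = executor.submit(compile_index, pattern)

    def guide(self):
        if self._guide is None:
            # Blocks only if the compilation has not finished yet.
            self._guide = self._future.result()
        return self._guide


with ThreadPoolExecutor(max_workers=1) as executor:
    lazy = LazyGuideProcessor(executor, pattern="[0-9]+")
    n_prompt_tokens = prefill("What is the answer ?")  # overlaps with compilation
    compiled_guide = lazy.guide()

    # Users who already paid the compilation cost can pass the guide directly.
    prebuilt = LazyGuideProcessor(executor, guide={"pattern": "yes|no", "compiled": True})
    prebuilt_guide = prebuilt.guide()
```

Accepting a prebuilt guide keeps the slow step out of the request path entirely for users who compile ahead of time, while everyone else gets the compilation hidden behind the prefill.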

rlouf added the `enhancement` and `structured generation` labels on Feb 21, 2025
