vLLM CUDA OOM for loglikelihood, but not for generate_until #2698
Labels: asking questions
Hello,
I persistently get a CUDA OOM error when trying to evaluate the DeepSeek-R1-Distill-Llama-8B model (locally downloaded) on the HAE-RAE benchmark.
My command is as follows:
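(The exact command is not reproduced above; the sketch below shows the general shape of an lm_eval invocation with the vLLM backend. The model path, task name, and flag values are assumptions for illustration, not the original command.)

```shell
# Hypothetical reconstruction: lm_eval with the vLLM backend on HAE-RAE.
# Model path, task name, and flag values are illustrative assumptions.
lm_eval --model vllm \
  --model_args pretrained=/path/to/DeepSeek-R1-Distill-Llama-8B,gpu_memory_utilization=0.9 \
  --tasks haerae
```

To work around the OOM, `max_num_seqs` has to be lowered, e.g. by appending `max_num_seqs=4` to `--model_args`.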
I have to reduce `max_num_seqs` to 4 to avoid the OOM error. However, when I evaluate the model on the KMMLU benchmark, there is no CUDA OOM issue.
My command was:
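(Again, the original command is not shown; an illustrative sketch with the same caveats as above, where only the task name changes:)

```shell
# Hypothetical reconstruction: same setup, but on KMMLU (a generate_until task).
lm_eval --model vllm \
  --model_args pretrained=/path/to/DeepSeek-R1-Distill-Llama-8B,gpu_memory_utilization=0.9 \
  --tasks kmmlu
```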
The only difference is that the HAE-RAE benchmark uses `loglikelihood` while KMMLU uses `generate_until`. Can anyone explain why this is happening?
Thank you in advance.