Different models on same tasks gives same results when cache is active #2715

salvatore-cipolla · 2025-02-19T21:47:43Z

Running the same task with the same config on two different models produces identical metrics when the cache is enabled. Without the cache I get the correct results (different metrics for each model). Is the cache working properly?

To reproduce the problem:

lm_eval --model hf \
    --model_args pretrained=Qwen/Qwen2.5-3B \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 1 \
    --limit 10 \
    --output_path output/test \
    --use_cache cache/cache.db

lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.2-3B \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 1 \
    --limit 10 \
    --output_path output/test \
    --use_cache cache/cache.db

I'm using the latest version of the library installed directly from this repo.


baberabb · 2025-02-19T21:54:29Z

will have a look! probably a hashing error
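To illustrate the kind of hashing error suspected here, below is a minimal sketch of how a response cache keyed only on the request would return the first model's results for a second model. This is not lm_eval's actual caching code: `request_key`, `model_scoped_key`, `cached_call`, and the scores are all hypothetical names and placeholder values made up for the example.

```python
import hashlib
import json


def request_key(attr, args):
    """Hypothetical cache key built from the request alone (no model identity)."""
    payload = json.dumps([attr, list(args)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def model_scoped_key(model_id, attr, args):
    """Hypothetical cache key that also includes the model identifier."""
    payload = json.dumps([model_id, attr, list(args)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_call(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    if key not in cache:
        cache[key] = compute()
    return cache[key]


# One request, as both models would see it for the same task and document.
attr = "loglikelihood"
args = ("A man is sitting on a roof.", " He starts pulling up shingles.")

# Placeholder scores standing in for two different models' outputs.
score_model_a = -12.3
score_model_b = -45.6

# Key without model identity: the second model hits the first model's entry.
shared_cache = {}
shared_a = cached_call(shared_cache, request_key(attr, args), lambda: score_model_a)
shared_b = cached_call(shared_cache, request_key(attr, args), lambda: score_model_b)

# Key with model identity: each model gets its own entry.
scoped_cache = {}
scoped_a = cached_call(
    scoped_cache, model_scoped_key("model-a", attr, args), lambda: score_model_a
)
scoped_b = cached_call(
    scoped_cache, model_scoped_key("model-b", attr, args), lambda: score_model_b
)

print(shared_a == shared_b)  # the collision: both models report the same score
print(scoped_a == scoped_b)  # no collision once the model is part of the key
```

If the real cache behaves like the first variant, both runs in the reproduction above would read the same entries from `cache/cache.db`, which would match the identical metrics reported.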

salvatore-cipolla changed the title ~~Different models on same tasks gives same results~~ Different models on same tasks gives same results when cache is active Feb 19, 2025

baberabb added the bug label Feb 19, 2025
