Eval support for DeepSeek-R1 like reasoning models #2682

Nithanaroy · 2025-02-09T00:45:33Z

DeepSeek-R1-like models natively use CoT to work out a strategy before responding with the answer. Current reasoning benchmarks such as GPQA, MuSR, and BBH assume that only the final answer is returned. What's a good way to add support for evaluating R1-like models, whose responses follow a <think>...</think> ... solution ... format?
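To make the request concrete, here is a minimal sketch of one possible approach: drop the reasoning block before the benchmark's existing answer extraction runs. The function name `strip_reasoning` is hypothetical and not part of any existing harness API, and the handling of a missing opening tag or a truncated thought is an assumption about how such models may be served.

```python
def strip_reasoning(response: str) -> str:
    """Return only the text that follows the model's reasoning block.

    Assumes the reasoning is delimited by <think>...</think>, as in the
    response format described in this issue.
    """
    close_tag = "</think>"
    open_tag = "<think>"

    if close_tag in response:
        # Keep what follows the last closing tag. This also covers the
        # (assumed) case where the opening tag is missing from the output.
        return response.rsplit(close_tag, 1)[1].strip()

    if open_tag in response:
        # Generation stopped mid-thought: no final answer was produced,
        # so keep only the text before the opening tag (usually nothing).
        return response.split(open_tag, 1)[0].strip()

    # No reasoning block at all: leave the response as-is.
    return response.strip()


raw = "<think>Option B looks wrong, so try A.</think>\nThe answer is (A)"
print(strip_reasoning(raw))
```

The cleaned string could then be passed to whatever answer matching a task already uses. The generation length limit would likely also need to be raised so that the reasoning is not cut off before the answer appears.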

SzymonOzog mentioned this issue Feb 21, 2025

[Model] Deepseek GGUF support vllm-project/vllm#13167

Open
