DeepSeek-R1-like models natively use chain-of-thought (CoT) to work out a strategy first and then respond with the answer. The current reasoning benchmarks, such as GPQA, MuSR and BBH, assume that only the final answer is returned. What's a good way to add support for evaluating R1-like models, which use a `<think>...</think> ... solution ...` response format?
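One possible approach is to strip the reasoning trace before the benchmark's usual answer extraction runs, so that only the text after `</think>` is scored. Below is a minimal sketch that assumes the model wraps its reasoning in `<think>...</think>`. `strip_reasoning` is a hypothetical helper and not part of any evaluation harness API. It also covers two cases that seem likely in practice. In the first, the opening `<think>` is inserted by the prompt template and never appears in the output. In the second, generation stops mid-trace and the closing tag is missing.

```python
import re

# Matches a complete <think>...</think> block; DOTALL lets "." span newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(response: str) -> str:
    """Hypothetical helper: return only the answer part of an R1-style response."""
    # Case 1: remove every complete <think>...</think> block.
    text = _THINK_BLOCK.sub("", response)
    # Case 2 (assumption): the opening tag was part of the prompt, so the
    # output contains only a closing tag. Keep what follows the last one.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    # Case 3: truncated trace with an unclosed <think>. Everything after it
    # is reasoning, so no answer is recovered.
    if "<think>" in text:
        text = text.split("<think>", 1)[0]
    return text.strip()


print(strip_reasoning("<think>Compare the options...</think>\nThe answer is (B)."))
```

The stripped string can then go to the benchmark's existing answer extractor (for example, a multiple-choice letter regex) unchanged. Scoring only the post-reasoning text avoids false matches on answer letters the model mentions while it is still thinking. Because the trace can be long, the generation token limit probably also needs to be raised so the answer is not cut off.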