About performance when serving CoT/reasoning models #63

Hygge02 · 2025-03-07T06:10:49Z

Hi! Thanks for your great work!

Have you tested LServe's performance when serving CoT/reasoning models such as the DeepSeek-Distill series? The LServe paper says that LServe aims to accelerate both the prefilling and decoding stages, but it does not seem to report results on reasoning benchmarks such as AIME 2024 or MATH-500.

Thanks for your reply!
