
[Bug]: AutoGen can't work with vLLM v0.5.1 #3120

Closed
tonyaw opened this issue Jul 12, 2024 · 17 comments
Labels
0.2 Issues which are related to the pre 0.4 codebase models Pertains to using alternate, non-GPT, models (e.g., local models, llama, etc.) needs-triage

Comments

@tonyaw

tonyaw commented Jul 12, 2024

Describe the bug

  1. Starting with vLLM v0.5.0, it supports a new feature, "OpenAI tools support named functions":
    https://github.com/vllm-project/vllm/releases/tag/v0.5.0

  2. Since then, every message returned by vLLM includes an empty "tool_calls" list if the user prompt doesn't call a tool:

#####openai client.chat.completions.create RESPONSE START#####
ChatCompletion(id='cmpl-e858096512a1428890c6fb28f20386e9', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='e2e4', role='assistant', function_call=None, **tool_calls=[]**), stop_reason=128009)], created=1720773990, model='XXX', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=5, prompt_tokens=186, total_tokens=191))
#####RESPONSE END#####
  3. After the AutoGen agent receives this message, it always adds an empty tool_calls list to its next message:
#####openai client.chat.completions.create PROMPT START#####
[
    {
        "content": "You are an AI-powered chess board agent.\nYou translate the user's natural language input into legal UCI moves.\nThe regex format of UCI is \"[a-h][1-8][a-h][1-8][qrnb]?\".\nONLY UCI move is allowed to use!\nFollowing are some examples:\n1. \"Ng8e7\" shall be translated to \"g8e7\".\n2. \"Ng8-f6\" shall be translated to \"g8f6\".\nYou should only reply with a UCI move string extracted from the user's input.",
        "role": "system"
    },
    {
        "content": "e2e4",
        "tool_calls": [],
        "role": "user"
    }
]
#####PROMPT END#####
  4. This causes vLLM to return 400 and breaks the conversation:
BadRequestError: Error code: 400 - {'object': 'error', 'message': '[{\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'system\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'system\'"}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'assistant\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'assistant\'"}}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'tool\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'tool\'"}}, {\'type\': \'missing\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_call_id\'), \'msg\': \'Field required\', \'input\': {\'content\': \'e2e4\', \'tool_calls\': [], \'role\': \'user\'}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'missing\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'name\'), \'msg\': \'Field required\', \'input\': {\'content\': \'e2e4\', \'tool_calls\': [], \'role\': \'user\'}}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'function\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'function\'"}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}]', 'type': 'BadRequestError', 'param': None, 'code': 400}

Steps to reproduce

See description.

Model Used

Llama 3 70B.
It appears to be a communication issue between vLLM and AutoGen, not related to the LLM itself.

Expected Behavior

AutoGen should work with vLLM v0.5.0 and later versions without problems.

Screenshots and logs

No response

Additional Information

No response

@tonyaw tonyaw added the bug label Jul 12, 2024
@marklysze marklysze added the models Pertains to using alternate, non-GPT, models (e.g., local models, llama, etc.) label Jul 12, 2024
@marklysze
Contributor

Hey @tonyaw, this is a bit tricky in my opinion. I feel that it should return None if there are no tool calls to be made rather than an empty list, []. The finish_reason being stop indicates that it is not suggesting tool calls in this response.

What do you think?

@tonyaw
Author

tonyaw commented Jul 15, 2024

@marklysze , I'm OK with both None and "[]" as long as it is aligned between the agent framework (AutoGen) and the LLM inference framework (vLLM). :-)
Since both follow the OpenAI API schema, may I ask if there is a detailed requirement from the API schema perspective?

@tonyaw
Author

tonyaw commented Jul 15, 2024

I also opened the same ticket with vLLM. Let's align with the vLLM team on an agreement. :-)

@sanjay920

We integrated tools into vLLM with function-calling models. Might be relevant: https://docs.rubra.ai/inference/vllm

@tonyaw
Author

tonyaw commented Jul 16, 2024

@sanjay920,
Thanks for your info!

  1. Is it possible to contribute your code change to the vLLM git repo? :-)
  2. Have you tried your vLLM with AutoGen?

@hopefulPanda88

Is there any possible way to bypass this? This really gives me a headache...

@marklysze
Contributor

I can suggest a couple of approaches:

  • A vLLM client class that can handle the empty tool_calls
  • A change to the existing AutoGen codebase that ignores an empty tool_calls list (see the sketch below)

If someone wants to work on a PR, that would help.
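
For illustration, a minimal sketch of the second approach, as a hypothetical helper rather than existing AutoGen code, assuming the outgoing messages are plain dicts right before the OpenAI client call:

    def strip_empty_tool_calls(messages):
        """Remove tool_calls keys whose value is None or [] so that
        strict OpenAI-compatible servers such as vLLM accept the request."""
        cleaned = []
        for message in messages:
            message = dict(message)  # copy; leave the stored chat history untouched
            if not message.get("tool_calls"):
                message.pop("tool_calls", None)
            cleaned.append(message)
        return cleaned

    # Hypothetical usage right before the request is sent:
    # response = client.chat.completions.create(model=model, messages=strip_empty_tool_calls(messages))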

@marklysze marklysze changed the title [Bug]: autogen can't work with vllm v0.5.1 [Bug]: AutoGen can't work with vLLM v0.5.1 Jul 27, 2024
@rysweet rysweet added 0.2 Issues which are related to the pre 0.4 codebase needs-triage labels Oct 2, 2024
@fniedtner fniedtner removed the bug label Oct 24, 2024
@priyankar4u2002

What is the latest on this?

@tonyaw
Author

tonyaw commented Jan 15, 2025

Any update on this issue? Is there any workaround now?

@ekzhu
Collaborator

ekzhu commented Jan 15, 2025

@tonyaw can you check if the vllm integration issue is fixed for v0.4 autogen-agentchat?
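
For reference, a minimal sketch of pointing the v0.4 OpenAI client at a vLLM endpoint; the model name, base_url, api_key, and model_info values below are placeholders to adapt to your deployment:

    from autogen_agentchat.agents import AssistantAgent
    from autogen_ext.models.openai import OpenAIChatCompletionClient

    # Placeholder values: point base_url at the vLLM OpenAI-compatible server.
    model_client = OpenAIChatCompletionClient(
        model="meta-llama/Meta-Llama-3-70B-Instruct",
        base_url="http://localhost:8000/v1",
        api_key="not-needed",
        model_info={"vision": False, "function_calling": True, "json_output": False, "family": "unknown"},
    )
    assistant = AssistantAgent("assistant", model_client=model_client)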

@tonyaw
Author

tonyaw commented Jan 15, 2025

@ekzhu, Thanks!
What is the difference between autogen and autogen-agentchat?
I found that the interfaces aren't identical.
Will autogen-agentchat replace autogen in the future?

@tonyaw
Author

tonyaw commented Jan 15, 2025

I found the difference:
https://microsoft.github.io/autogen/dev//user-guide/agentchat-user-guide/migration-guide.html
But I still have questions:

  1. How do I set max_consecutive_auto_reply now?

@ekzhu
Collaborator

ekzhu commented Jan 17, 2025

@tonyaw , there is no max_consecutive_auto_reply for an agent. Each agent generates only one response each time it is called. If you put an agent in a team, it is up to the team to decide when to call which agent.
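
As a rough substitute for max_consecutive_auto_reply, the turn limit can be set on the team; a minimal sketch, assuming an assistant agent already exists:

    from autogen_agentchat.teams import RoundRobinGroupChat

    # Cap the conversation at the team level instead of on the agent.
    team = RoundRobinGroupChat([assistant], max_turns=2)
    result = await team.run(task="Translate Ng8-f6 to UCI.")  # inside an async function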

@ekzhu
Collaborator

ekzhu commented Jan 17, 2025

Will autogen-agentchat replace autogen in the future?

Yes.

@tonyaw
Author

tonyaw commented Jan 17, 2025

@ekzhu ,

  1. Could you please provide a non-async example, or show how to call an async function multiple times?
    Some code like the following:
    def communicate_with_assistant(self, *args, **kwargs):
        # Check if there's an existing event loop
        try:
            # If called from an async context
            loop = asyncio.get_event_loop()
            if loop.is_running():
                # Create a task and wait for it
                future = asyncio.ensure_future(self.async_communicate_with_assistant(*args, **kwargs))
                return asyncio.run_coroutine_threadsafe(future, loop).result()
        # except RuntimeError:
        except Exception as e:
            self.logger.exception("get_event_loop failure:")
            # If there's no event loop, create a new one
            pass

        # If called from a sync context
        return asyncio.run(self.async_communicate_with_assistant(*args, **kwargs))


    async def async_communicate_with_assistant(self, user_prompt, check_func=None, item_key=None, **kwargs):
        """

        Arguments:
        - `self`:
        - `user_prompt`:
        """
        self.logger.info(f"user_prompt={user_prompt}")

        # Run the team and stream messages to the console.
        stream = self.agent_team.run_stream(task=user_prompt)
        answer_message = ""
        async for chunk in stream:
            self.logger.info(f"chunk={chunk}")
            if type(chunk) is TaskResult:
                answer_message = chunk.messages[1].content
                self.logger.info(f"Got answer\n{answer_message}")

If communicate_with_assistant is called a second time, I get the following error:

  File "/root_fs/home/agent_autogen_agentchat.py", line 138, in async_communicate_with_assistant
    async for message in stream:
  File "/usr/local/lib/python3.12/site-packages/autogen_agentchat/teams/_group_chat/_base_group_chat.py", line 417, in run_stream
    message = await message_future
              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/queues.py", line 155, in get
    getter = self._get_loop().create_future()
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/mixins.py", line 20, in _get_loop
    raise RuntimeError(f'{self!r} is bound to a different event loop')
RuntimeError: <Queue at 0x7f7cb75f8890 maxsize=0 tasks=4> is bound to a different event loop

The fix is to recreate RoundRobinGroupChat each time:

    async def async_communicate_with_assistant(self, user_prompt, check_func=None, item_key=None, **kwargs):
        """

        Arguments:
        - `self`:
        - `user_prompt`:
        """
        self.logger.info(f"user_prompt={user_prompt}")

        # <<<<< Add following line:
        self.agent_team = RoundRobinGroupChat([self.assistant], max_turns=1)
        
        # Run the team and stream messages to the console.
        stream = self.agent_team.run_stream(task=user_prompt)

I wonder if this is the right usage. If not, could you please provide the right one?

  2. With autogen_agentchat, how do I enable caching, like cache_seed in autogen?

@ekzhu
Collaborator

ekzhu commented Jan 18, 2025

If communicate_with_assistant is called a second time, I get the following error:

I think that if you are calling from an async context, you cannot use a new event loop -- I am not 100% sure. But a function should be either sync or async, not both.
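
A minimal sketch of keeping that boundary clean, assuming the synchronous wrapper is only ever called from code with no running event loop:

    import asyncio

    class AssistantRunner:
        async def async_communicate_with_assistant(self, user_prompt):
            # ... run self.agent_team.run_stream(task=user_prompt) and collect the answer ...
            ...

        def communicate_with_assistant(self, user_prompt):
            # Safe only when no event loop is already running in this thread.
            # From async code, await async_communicate_with_assistant directly instead.
            return asyncio.run(self.async_communicate_with_assistant(user_prompt))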

@ekzhu
Collaborator

ekzhu commented Jan 18, 2025

With autogen_agentchat, how do I enable caching, like cache_seed in autogen?

It will be available in the next release next week (#4924 ).
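
Once that release is out, the idea is to wrap the model client in a caching layer; a sketch based on the linked PR, where the module and class names (ChatCompletionCache, DiskCacheStore) are assumptions that may differ in the final release:

    from diskcache import Cache
    from autogen_ext.cache_store.diskcache import DiskCacheStore
    from autogen_ext.models.cache import ChatCompletionCache

    # Assumed API: wrap an existing model client (e.g. the OpenAIChatCompletionClient
    # from earlier) so identical requests are answered from an on-disk cache.
    cache_store = DiskCacheStore(Cache("./model_cache"))
    cached_client = ChatCompletionCache(model_client, cache_store)
    # Then pass cached_client to AssistantAgent(..., model_client=cached_client).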
