
[Bug]: AutoGen can't work with vLLM v0.5.1 #3120

Closed
tonyaw opened this issue Jul 12, 2024 · 17 comments
Labels
0.2 Issues which are related to the pre 0.4 codebase models Pertains to using alternate, non-GPT, models (e.g., local models, llama, etc.) needs-triage

Comments

@tonyaw

tonyaw commented Jul 12, 2024

Describe the bug

  1. Starting with vLLM v0.5.0, it supports a new feature, "OpenAI tools support named functions":
    https://github.com/vllm-project/vllm/releases/tag/v0.5.0

  2. Since then, every message returned by vLLM includes an empty "tool_calls" list if the user prompt doesn't call a tool:

#####openai client.chat.completions.create RESPONSE START#####
ChatCompletion(id='cmpl-e858096512a1428890c6fb28f20386e9', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='e2e4', role='assistant', function_call=None, **tool_calls=[]**), stop_reason=128009)], created=1720773990, model='XXX', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=5, prompt_tokens=186, total_tokens=191))
#####RESPONSE END#####
  3. After the AutoGen agent receives this message, it always adds an empty tool_calls list to its next message:
#####openai client.chat.completions.create PROMPT START#####
[
    {
        "content": "You are an AI-powered chess board agent.\nYou translate the user's natural language input into legal UCI moves.\nThe regex format of UCI is \"[a-h][1-8][a-h][1-8][qrnb]?\".\nONLY UCI move is allowed to use!\nFollowing are some examples:\n1. \"Ng8e7\" shall be translated to \"g8e7\".\n2. \"Ng8-f6\" shall be translated to \"g8f6\".\nYou should only reply with a UCI move string extracted from the user's input.",
        "role": "system"
    },
    {
        "content": "e2e4",
        "tool_calls": [],
        "role": "user"
    }
]
#####PROMPT END#####
  4. This causes vLLM to return 400 and breaks the conversation:
BadRequestError: Error code: 400 - {'object': 'error', 'message': '[{\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'system\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'system\'"}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'assistant\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'assistant\'"}}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'tool\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'tool\'"}}, {\'type\': \'missing\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_call_id\'), \'msg\': \'Field required\', \'input\': {\'content\': \'e2e4\', \'tool_calls\': [], \'role\': \'user\'}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'missing\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'name\'), \'msg\': \'Field required\', \'input\': {\'content\': \'e2e4\', \'tool_calls\': [], \'role\': \'user\'}}, {\'type\': \'literal_error\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'role\'), \'msg\': "Input should be \'function\'", \'input\': \'user\', \'ctx\': {\'expected\': "\'function\'"}}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}, {\'type\': \'extra_forbidden\', \'loc\': (\'body\', \'messages\', 1, \'typed-dict\', \'tool_calls\'), \'msg\': \'Extra inputs are not permitted\', \'input\': []}]', 'type': 'BadRequestError', 'param': None, 'code': 400}

Steps to reproduce

See description.

Model Used

Llama 3 70B.
It appears to be a communication issue between vLLM and AutoGen, not related to the LLM itself.

Expected Behavior

AutoGen should work with vLLM v0.5.0 and later versions without problems.

Screenshots and logs

No response

Additional Information

No response

@tonyaw tonyaw added the bug label Jul 12, 2024
@marklysze marklysze added the models Pertains to using alternate, non-GPT, models (e.g., local models, llama, etc.) label Jul 12, 2024
@marklysze
Contributor

Hey @tonyaw, this is a bit tricky in my opinion. I feel that it should return None if there are no tool calls to be made rather than an empty list, []. The finish_reason being stop indicates that it is not suggesting tool calls in this response.

What do you think?

@tonyaw
Author

tonyaw commented Jul 15, 2024

@marklysze , I'm OK with both None and "[]" as long as it is aligned between the agent framework (AutoGen) and the LLM inference framework (vLLM). :-)
Since both follow the OpenAI API schema, may I ask if there is a detailed requirement from the API schema perspective?

@tonyaw
Author

tonyaw commented Jul 15, 2024

I also opened the same ticket with vLLM. Let's align with the vLLM team on an agreement. :-)

@sanjay920

We integrated tools into vLLM with function-calling models. Might be relevant: https://docs.rubra.ai/inference/vllm

@tonyaw
Author

tonyaw commented Jul 16, 2024

@sanjay920,
Thanks for your info!

  1. Is it possible to contribute your code change to the vLLM git repo? :-)
  2. Have you tried your vLLM with AutoGen?

@hopefulPanda88

Is there any possible way to bypass this? This really gives me a headache...

@marklysze
Contributor

I can suggest a couple of approaches:

  • A vLLM client class that can handle the empty tool_calls
  • A change to the existing AutoGen codebase that ignores an empty tool_calls list (see the sketch below)

If someone wants to work on a PR, that would help.
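
For illustration, a minimal sketch of the second approach, as a hypothetical helper rather than existing AutoGen code, assuming the outgoing messages are plain dicts right before the OpenAI client call:

    def strip_empty_tool_calls(messages):
        """Remove tool_calls keys whose value is None or [] so that
        strict OpenAI-compatible servers such as vLLM accept the request."""
        cleaned = []
        for message in messages:
            message = dict(message)  # copy; leave the stored chat history untouched
            if not message.get("tool_calls"):
                message.pop("tool_calls", None)
            cleaned.append(message)
        return cleaned

    # Hypothetical usage right before the request is sent:
    # response = client.chat.completions.create(model=model, messages=strip_empty_tool_calls(messages))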

@marklysze marklysze changed the title [Bug]: autogen can't work with vllm v0.5.1 [Bug]: AutoGen can't work with vLLM v0.5.1 Jul 27, 2024
@rysweet rysweet added 0.2 Issues which are related to the pre 0.4 codebase needs-triage labels Oct 2, 2024
@fniedtner fniedtner removed the bug label Oct 24, 2024
@priyankar4u2002

What is the latest on this?

@tonyaw
Author

tonyaw commented Jan 15, 2025

Any update on this issue? Is there any workaround now?

@ekzhu
Collaborator

ekzhu commented Jan 15, 2025

@tonyaw can you check if the vllm integration issue is fixed for v0.4 autogen-agentchat?
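
For reference, a minimal sketch of pointing the v0.4 OpenAI client at a vLLM endpoint; the model name, base_url, api_key, and model_info values below are placeholders to adapt to your deployment:

    from autogen_agentchat.agents import AssistantAgent
    from autogen_ext.models.openai import OpenAIChatCompletionClient

    # Placeholder values: point base_url at the vLLM OpenAI-compatible server.
    model_client = OpenAIChatCompletionClient(
        model="meta-llama/Meta-Llama-3-70B-Instruct",
        base_url="http://localhost:8000/v1",
        api_key="not-needed",
        model_info={"vision": False, "function_calling": True, "json_output": False, "family": "unknown"},
    )
    assistant = AssistantAgent("assistant", model_client=model_client)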

@tonyaw
Author

tonyaw commented Jan 15, 2025

@ekzhu, Thanks!
What is the difference between autogen and autogen-agentchat?
I found that the interfaces aren't identical.
Will autogen-agentchat replace autogen in the future?

@tonyaw
Author

tonyaw commented Jan 15, 2025

I found the difference:
https://microsoft.github.io/autogen/dev//user-guide/agentchat-user-guide/migration-guide.html
But I still have questions:

  1. How do I set max_consecutive_auto_reply now?

@ekzhu
Collaborator

ekzhu commented Jan 17, 2025

@tonyaw , there is no max_consecutive_auto_reply for an agent. Each agent generates only one response each time it is called. If you put an agent in a team, it is up to the team to decide when to call which agent.
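
As a rough substitute for max_consecutive_auto_reply, the turn limit can be set on the team; a minimal sketch, assuming an assistant agent already exists:

    from autogen_agentchat.teams import RoundRobinGroupChat

    # Cap the conversation at the team level instead of on the agent.
    team = RoundRobinGroupChat([assistant], max_turns=2)
    result = await team.run(task="Translate Ng8-f6 to UCI.")  # inside an async function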

@ekzhu
Collaborator

ekzhu commented Jan 17, 2025

Will autogen-agentchat replace autogen in the future?

Yes.

@tonyaw
Author

tonyaw commented Jan 17, 2025

@ekzhu ,

  1. Could you please provide a non-async example, or show how to call an async function multiple times?
    Some code like the following:
    def communicate_with_assistant(self, *args, **kwargs):
        # Check if there's an existing event loop
        try:
            # If called from an async context
            loop = asyncio.get_event_loop()
            if loop.is_running():
                # Create a task and wait for it
                future = asyncio.ensure_future(self.async_communicate_with_assistant(*args, **kwargs))
                return asyncio.run_coroutine_threadsafe(future, loop).result()
        # except RuntimeError:
        except Exception as e:
            self.logger.exception("get_event_loop failure:")
            # If there's no event loop, create a new one
            pass

        # If called from a sync context
        return asyncio.run(self.async_communicate_with_assistant(*args, **kwargs))


    async def async_communicate_with_assistant(self, user_prompt, check_func=None, item_key=None, **kwargs):
        """

        Arguments:
        - `self`:
        - `user_prompt`:
        """
        self.logger.info(f"user_prompt={user_prompt}")

        # Run the team and stream messages to the console.
        stream = self.agent_team.run_stream(task=user_prompt)
        answer_message = ""
        async for chunk in stream:
            self.logger.info(f"chunk={chunk}")
            if type(chunk) is TaskResult:
                answer_message = chunk.messages[1].content
                self.logger.info(f"Got answer\n{answer_message}")

If communicate_with_assistant is called a second time, I get the following error:

  File "/root_fs/home/agent_autogen_agentchat.py", line 138, in async_communicate_with_assistant
    async for message in stream:
  File "/usr/local/lib/python3.12/site-packages/autogen_agentchat/teams/_group_chat/_base_group_chat.py", line 417, in run_stream
    message = await message_future
              ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/queues.py", line 155, in get
    getter = self._get_loop().create_future()
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/asyncio/mixins.py", line 20, in _get_loop
    raise RuntimeError(f'{self!r} is bound to a different event loop')
RuntimeError: <Queue at 0x7f7cb75f8890 maxsize=0 tasks=4> is bound to a different event loop

The fix is to recreate RoundRobinGroupChat each time:

    async def async_communicate_with_assistant(self, user_prompt, check_func=None, item_key=None, **kwargs):
        """

        Arguments:
        - `self`:
        - `user_prompt`:
        """
        self.logger.info(f"user_prompt={user_prompt}")

        # <<<<< Add following line:
        self.agent_team = RoundRobinGroupChat([self.assistant], max_turns=1)
        
        # Run the team and stream messages to the console.
        stream = self.agent_team.run_stream(task=user_prompt)

I wonder if this is the right usage. If not, could you please provide the right one?

  2. With autogen_agentchat, how do I enable caching, like cache_seed in autogen?

@ekzhu
Collaborator

ekzhu commented Jan 18, 2025

If communicate_with_assistant is called a second time, I get the following error:

I think that if you are calling from an async context, you cannot use a new event loop -- I am not 100% sure. But a function should be either sync or async, not both.
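
A minimal sketch of keeping that boundary clean, assuming the synchronous wrapper is only ever called from code with no running event loop:

    import asyncio

    class AssistantRunner:
        async def async_communicate_with_assistant(self, user_prompt):
            # ... run self.agent_team.run_stream(task=user_prompt) and collect the answer ...
            ...

        def communicate_with_assistant(self, user_prompt):
            # Safe only when no event loop is already running in this thread.
            # From async code, await async_communicate_with_assistant directly instead.
            return asyncio.run(self.async_communicate_with_assistant(user_prompt))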

@ekzhu
Collaborator

ekzhu commented Jan 18, 2025

With autogen_agentchat, how do I enable caching, like cache_seed in autogen?

It will be available in the next release next week (#4924 ).
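
Once that release is out, the idea is to wrap the model client in a caching layer; a sketch based on the linked PR, where the module and class names (ChatCompletionCache, DiskCacheStore) are assumptions that may differ in the final release:

    from diskcache import Cache
    from autogen_ext.cache_store.diskcache import DiskCacheStore
    from autogen_ext.models.cache import ChatCompletionCache

    # Assumed API: wrap an existing model client (e.g. the OpenAIChatCompletionClient
    # from earlier) so identical requests are answered from an on-disk cache.
    cache_store = DiskCacheStore(Cache("./model_cache"))
    cached_client = ChatCompletionCache(model_client, cache_store)
    # Then pass cached_client to AssistantAgent(..., model_client=cached_client).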
