[Bug]: MessageTokenLimiter ignored for output of Tools #2469

Open
daanraman opened this issue Apr 21, 2024 · 5 comments
Labels: 0.2 (Issues which are related to the pre 0.4 codebase), needs-triage

Comments

@daanraman

Describe the bug

MessageTokenLimiter works as expected for output that is not generated by Tools.
However, when Tools are used, the limiter appears to be ignored and the full tool output is sent to the model.

Steps to reproduce

  • Create a custom tool that generates a large amount of data (in my case, the output of a swagger.json file)
  • Interact with the tool using a ConversableAgent that has a MessageTokenLimiter configured (see the setup sketch after this list)
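
For reproduction, here is a minimal sketch of that setup (the agent name, API key, and token limits are illustrative placeholders, not the exact values from my run):

    import autogen
    from autogen.agentchat.contrib.capabilities import transform_messages, transforms

    llm_config = {"config_list": [{"model": "gpt-3.5-turbo", "api_key": "sk-..."}]}

    # Agent that calls the tool; in my case the tool returns a large swagger.json
    assistant = autogen.ConversableAgent(name="assistant", llm_config=llm_config)

    # Cap the tokens of messages sent to the LLM via the TransformMessages capability
    limiter = transform_messages.TransformMessages(
        transforms=[transforms.MessageTokenLimiter(max_tokens=1000, max_tokens_per_message=500)]
    )
    limiter.add_to_agent(assistant)

    # Expectation: when the registered tool returns a very large payload, the
    # truncated version (not the full output) is what reaches the model.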

Model Used

gpt-3.5-turbo

Expected Behavior

The truncated output should be sent to the GPT model, not the entire output, which triggers a rate limit error.

Screenshots and logs

In the screenshots, I show that even though the print statement mentions that the tokens were limited, the non-truncated output appears to be sent to the GPT model.

Output shows that the output of my tool is correctly being truncated:
[screenshot]

This seems to be ignored when calling the LLM though, showing a rate limit error:
[screenshot]

Additional Information

No response

@sonichi
Contributor

sonichi commented Apr 21, 2024

Thanks. If you make a PR, please add @gagb and @WaelKarkoub as reviewers.

@WaelKarkoub
Contributor

Hi @daanraman, thanks for your feedback. I believe the accuracy of tool outputs is crucial, and truncating them might omit valuable information. This issue might reflect a design decision rather than a bug, but there are possible solutions. I'm working on a PR that applies LLMLingua for text compression, which might be a better fit for managing tool outputs, though I'm not certain of its effectiveness. We could also consider a new transform specifically for tool outputs that truncates differently, e.g. truncates from the middle, to preserve more context. Let me know what you think.
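
To make the middle-truncation idea concrete, here is a rough sketch of such a transform (not existing AutoGen code; the class name, the max_chars parameter, and the assumption that tool results arrive as role == "tool" messages with string content are all illustrative). It follows the apply_transform / get_logs shape that custom transforms use:

    import copy
    from typing import Dict, List, Tuple

    class MiddleTruncateToolOutputs:
        """Sketch: keep the head and tail of long tool outputs, drop the middle."""

        def __init__(self, max_chars: int = 2000):
            self._max_chars = max_chars

        def apply_transform(self, messages: List[Dict]) -> List[Dict]:
            messages = copy.deepcopy(messages)
            for message in messages:
                content = message.get("content")
                if (
                    message.get("role") == "tool"
                    and isinstance(content, str)
                    and len(content) > self._max_chars
                ):
                    half = self._max_chars // 2
                    message["content"] = content[:half] + "\n...[truncated]...\n" + content[-half:]
            return messages

        def get_logs(self, pre_transform_messages: List[Dict], post_transform_messages: List[Dict]) -> Tuple[str, bool]:
            changed = pre_transform_messages != post_transform_messages
            return "Middle-truncated long tool outputs.", changed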

@daanraman
Author

Hi @sonichi / @WaelKarkoub - thanks for the quick feedback, appreciated.

I understand the reasoning behind not truncating tool output. If that's a design choice, though, I think it's confusing that the output suggests the tokens were truncated when that doesn't seem to be the case.

The output of the tool itself should fit into the context window of the LLM I am using. However, the main reason I was trying to truncate the Tool output is to avoid filling up the history sent to the LLM with the large Tool output, which later steps do not need in order to understand the context of the conversation (they should base themselves on the answer produced by the Agent that used the tool).

So my questions then are: 1) is the Tool output of previous steps included in the history window at later steps of the conversation, or is it excluded? And 2) if it is in fact included (and thus fills up the context window with Tool output), is using Nested Chats a way to "hide" the Tool output of previous steps?

@WaelKarkoub
Contributor

@daanraman I see where the confusion lies now: the logs indicate truncation of message content without accounting for tool outputs. I'll open a PR to clarify the logging for this transform. Out of curiosity, are you applying the MessageTokenLimiter across all your agents?

  1. I looked through the code base, and tool calls and tool responses are appended to the context window:

         # Only these keys are copied into the message that gets appended to the
         # chat history, and that includes tool_calls and tool_responses.
         oai_message = {
             k: message[k]
             for k in ("content", "function_call", "tool_calls", "tool_responses", "tool_call_id", "name", "context")
             if k in message and message[k] is not None
         }

  2. You can still run into the same issue if the nested chats generate large responses to each other.

If possible, consider creating a custom transform to extract essential information from tool outputs. This approach could add value to AutoGen. Would you be interested in collaborating on this?
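
If such a tool-output transform gets written (for example, the middle-truncation sketch above), wiring it in would presumably mirror the existing capability; the names below are carried over from the earlier sketches rather than from existing code:

    # Hypothetical wiring: reuse the TransformMessages capability with the custom transform
    context_handling = transform_messages.TransformMessages(
        transforms=[MiddleTruncateToolOutputs(max_chars=2000)]
    )
    context_handling.add_to_agent(assistant)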

@daanraman
Author

@WaelKarkoub thanks for the time & feedback. Correct, I was applying the MessageTokenLimiter to all agents.
My approach for now is indeed to manually change my custom tools (which interact with API endpoints) so that they return more concise information and avoid overrunning the context window.

I'm still very new to autogen (I moved away from crew.ai yesterday) but am liking it very much so far: great documentation, good examples, and in general I feel the agents behave better by default (better system prompts and history management). The group chat features are great too.

I will certainly consider contributing once I am a bit more familiar with both the framework & the codebase!

@rysweet added the 0.2 (Issues which are related to the pre 0.4 codebase) and needs-triage labels on Oct 2, 2024