Describe the bug
My understanding is that LLaVA hosted behind an OpenAI-compatible proxy like LiteLLM, as well as GPT-4V hosted in Azure or OpenAI, are both valid options for the MultimodalConversableAgent. My agent workflow works correctly when I point the vision agent at GPT-4V, but I get errors if I switch the llm_config to the locally hosted LLaVA config.
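For context, the only thing that changes between the working and failing runs is the agent's llm_config. A minimal sketch of the two configs (the port, key, and model names here are placeholders/assumptions, not my exact values; adjust the base_url to wherever litellm is listening):

from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

# Works: GPT-4V via OpenAI
gpt4v_config = {"config_list": [{"model": "gpt-4-vision-preview", "api_key": "sk-..."}]}

# Fails: LLaVA behind the LiteLLM proxy in front of Ollama (assuming the proxy listens on localhost:8000)
llava_config = {"config_list": [{"model": "llava", "api_key": "NotRequired", "base_url": "http://localhost:8000"}]}

image_agent = MultimodalConversableAgent(
    name="image-explainer",
    llm_config=llava_config,  # swapping this to gpt4v_config makes the same workflow succeed
)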
When I switch to LLaVA (hosted via LiteLLM with 'litellm --model ollama_chat/llava --run_gunicorn'), I get:
Traceback (most recent call last):
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/proxy/proxy_server.py", line 3671, in chat_completion
responses = await asyncio.gather(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/utils.py", line 3465, in wrapper_async
raise e
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/utils.py", line 3297, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/main.py", line 340, in acompletion
raise exception_type(
^^^^^^^^^^^^^^^
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/utils.py", line 8665, in exception_type
raise e
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/utils.py", line 8633, in exception_type
raise APIConnectionError(
litellm.exceptions.APIConnectionError: {"error":"json: cannot unmarshal array into Go struct field Message.messages.content of type string"}
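From what I can tell, that unmarshal error is Ollama's Go server rejecting the OpenAI-style vision payload: the OpenAI format sends message content as an array of parts, while Ollama's /api/chat expects content to be a plain string with images passed separately, so the array can't be unmarshalled into the string field. Roughly (illustrative shapes only, not the exact payloads LiteLLM constructs):

# OpenAI-style vision message: content is a list of text and image_url parts
openai_style = {
    "role": "user",
    "content": [
        {"type": "text", "text": "These are the frames of a video..."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
    ],
}

# Ollama /api/chat message: content is a string, images are bare base64 strings
ollama_chat_style = {
    "role": "user",
    "content": "These are the frames of a video...",
    "images": ["<base64-encoded jpeg>"],
}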
If I start the Ollama model without '_chat', i.e. 'litellm --model ollama/llava --run_gunicorn', I get:
Traceback (most recent call last):
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/proxy/proxy_server.py", line 3671, in chat_completion
responses = await asyncio.gather(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/utils.py", line 3465, in wrapper_async
raise e
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/utils.py", line 3297, in wrapper_async
result = await original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/main.py", line 340, in acompletion
raise exception_type(
^^^^^^^^^^^^^^^
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/utils.py", line 8665, in exception_type
raise e
File "/Users/darinshapiro/Source/AutoGenDocPOC1/.venv/lib/python3.12/site-packages/litellm/utils.py", line 8633, in exception_type
raise APIConnectionError(
litellm.exceptions.APIConnectionError: {"error":"illegal base64 data at input byte 4"}
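The non-'_chat' route fails differently: Ollama expects each entry in its images field to be bare base64, and "illegal base64 data at input byte 4" looks like something that isn't bare base64 (e.g. a data:image/...;base64 URI, where the colon at byte 4 would be an illegal character, or a plain file path) is being forwarded. For comparison, this is the kind of encoding Ollama accepts (just to illustrate; LiteLLM should be doing this translation itself):

import base64

with open("frames/frame0.jpg", "rb") as f:
    frame_b64 = base64.b64encode(f.read()).decode("utf-8")

ollama_payload = {
    "model": "llava",
    "prompt": "Describe this frame.",
    "images": [frame_b64],  # bare base64, no "data:image/jpeg;base64," prefix
}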
One thing to note is that I'm including a list of frames via the prompt:
prompt = """
context: camera location = "front yard", time = "10:00 AM", date = "March 15, 2022"
These are the frames of a video. Generate a compelling description that the SecurityAnalysisAgent can evaluate.
<img frames/frame0.jpg>
<img frames/frame1.jpg>
<img frames/frame2.jpg>
<img frames/frame3.jpg>
<img frames/frame4.jpg>
<img frames/frame5.jpg>
<img frames/frame6.jpg>
<img frames/frame7.jpg>
<img frames/frame8.jpg>
<img frames/frame9.jpg>
"""
It seems that LiteLLM isn't handling the list of images correctly. Is the inclusion of multiple images in one message part of the OpenAI spec, or is the MultimodalConversableAgent not writing the message content correctly?
Steps to reproduce
Point an agent at GPT-4V with a series of frames from a video and ask for a description of the video. The agent returns a valid description.
Change the llm_config of that agent to point to a locally hosted LLaVA vision model, using Ollama and LiteLLM as the proxy for Ollama. Errors are returned.
Model Used
GPT-4V & LLaVA 1.6
Expected Behavior
I was expecting to be able to treat the GPT-4V and LLaVA llm_configs as interchangeable, differing only in response quality, performance, and cost.
Screenshots and logs
No response
Additional Information
Latest AutoGen version, both macOS and Windows, Python 3.12.