-
To address the issues you're facing with token counting and Pydantic serialization in your workflow using Arize Phoenix and VertexAI Gemini, here are some steps and insights:
By following these steps, you should be able to resolve the token counting issue and handle the serialization error effectively. If you need further customization, consider exploring the attributes of the …

To continue talking to Dosu, mention @dosu.
-
Thank you for reporting @hristogg! I have filed an issue here.
-
Hi, I am using Arize Phoenix to trace a LlamaIndex workflow that uses Google's LLMs: VertexAI Gemini, Gemini through the API, as well as their text embedding models.
I have two issues. The first is that the token count stays empty and I cannot figure out how to add token counts to the tracing. The second is that when I created a mock-up workflow to experiment with, without my complex logic, I ran into another error around Pydantic serialization, which is curious because I don't get this error in my core workflow :)
If someone can take a look and help me figure out how to count tokens properly and why this serialization is failing, it would be greatly appreciated.
Here is how I do the instrumentation:
from phoenix.otel import register
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

tracer_provider = register(
    project_name="test",  # Default is 'default'
    endpoint="https://app.phoenix.arize.com/v1/traces",
)
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
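(As an aside on the token counts: one client-side option is LlamaIndex's TokenCountingHandler callback, which counts tokens locally with a tokenizer you supply. A minimal sketch follows, using tiktoken's cl100k_base only as a rough stand-in for Gemini's tokenizer; I am not certain whether Phoenix surfaces these counts on the traced spans.)

import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Count tokens client-side; cl100k_base is only an approximation for Gemini models.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.get_encoding("cl100k_base").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# After a run, the handler exposes aggregate counts, e.g.:
# token_counter.prompt_llm_token_count, token_counter.completion_llm_token_count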
And here is my test workflow logic:
from llama_index.core import Settings
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.workflow import Context, StartEvent, StopEvent, Workflow, step
from llama_index.llms.vertex import Vertex

# `credentials` comes from my existing Google auth setup
llm = Vertex(
    model="gemini-pro",
    temperature=0,
    max_tokens=3000,
    # safety_settings=safety_config,
    credentials=credentials,
)
Settings.llm = llm


class TestWorkflow(Workflow):
    @step
    async def answer_q(self, ctx: Context, ev: StartEvent) -> StopEvent:
        question = ev.question
        qa_prompt_str = (
            "Give an answer to the question below in the language it is asked.\n"
            "---------------------\n"
            "{question}\n"
        )
        chat_text_qa_msgs = [
            ChatMessage(
                role=MessageRole.SYSTEM,
                content=(
                    "Always answer the question, even if the context isn't helpful."
                ),
            ),
            ChatMessage(role=MessageRole.USER, content=qa_prompt_str),
        ]
        formatted_prompt = ChatPromptTemplate(chat_text_qa_msgs)
        question_to_pass = formatted_prompt.format_messages(question=question)
        print(question_to_pass)
        answer = await llm.achat(question_to_pass)
        return StopEvent(result=answer)
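For reference, a minimal driver for running the workflow looks roughly like this (the question text is just a placeholder):

import asyncio

async def main():
    w = TestWorkflow(timeout=60)
    answer = await w.run(question="What is the capital of France?")
    print(answer)

asyncio.run(main())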
Here is the error as well:
ERROR:openinference.instrumentation.llama_index._handler:Error serializing to JSON: PydanticSerializationError: Unable to serialize unknown type: <class 'google.cloud.aiplatform_v1beta1.types.prediction_service.GenerateContentResponse'>
Traceback (most recent call last):
  File "C:\Users\hgospodinov\venv\container_work\Lib\site-packages\openinference\instrumentation\llama_index\_handler.py", line 253, in process_output
    self[OUTPUT_VALUE] = result.model_dump_json(exclude_unset=True)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hgospodinov\venv\container_work\Lib\site-packages\pydantic\main.py", line 441, in model_dump_json
    return self.__pydantic_serializer__.to_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.PydanticSerializationError: Error serializing to JSON: PydanticSerializationError: Unable to serialize unknown type: <class 'google.cloud.aiplatform_v1beta1.types.prediction_service.GenerateContentResponse'>
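A possible workaround sketch for the serialization error, assuming only the reply text is needed downstream (it avoids handing the raw Vertex response to the step's output, though it may not address whatever the instrumentor does with the LLM span itself), is to unwrap the ChatResponse before returning it from the step:

        # Inside answer_q, return plain, JSON-friendly data instead of the raw
        # ChatResponse, whose .raw field holds the Vertex GenerateContentResponse
        # that the OpenInference handler fails to serialize.
        answer = await llm.achat(question_to_pass)
        return StopEvent(result=answer.message.content)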