-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(llmobs): llmobs-specific context manager #12236
Conversation
|
e870bb1
to
fa43b5e
Compare
Datadog ReportBranch report: ✅ 0 Failed, 130 Passed, 1109 Skipped, 3m 25.9s Total duration (24m 56.08s time saved) |
BenchmarksBenchmark execution time: 2025-02-06 22:09:54 Comparing candidate commit b61be52 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics. |
2a2bbb8
to
561d659
Compare
@erikayasuda @mabdinur can I get one last review from both of you? This PR is just a repro of #10767 to manually rebase branches to 3.x-staging. |
561d659
to
b61be52
Compare
/merge |
View all feedbacks in Devflow UI.
The median merge time in
|
[MLOB-1342] This is a manual copy of #10767 to 3.x-staging due to merge conflicts between 3.x-staging and main. ## Summary Public facing changes: - Any LLMObs method (`export_span(), annotate()`) that allow an optional span argument will now default to finding the current active LLMObs span rather than the current active APM span. - ~Adds multithreading (futures.multithreading) support for LLMObs. Previously multithreaded apps would result in broken traces.~ UPDATE: will be moving multithreading support to a separate PR. Private changes: - LLMObs has its own context provider which keeps track of the active LLM-type span (generated by both LLMObs._start_span() and LLM integrations) - HTTPPropagation now adds LLMObs parent ID as a field on the request headers directly, rather than through the context object. - Adds private helper method `LLMObs._instance.current_span()`, returns the current active LLMObs-generated (integration, SDK) span. - Adds private helper method `LLMObs._instance._current_trace_context()`, returns current LLMObs context (which can represent both a span or a distributed span) - Adds a new field to the LLMObs span event struct, `_dd` which is a str/str dictionary containing the span/trace IDs of the APM span to correlate with. Currently these are the same span/trace IDs as the LLMObs span/trace ID, but this unlocks future steps of using independent span/trace IDs. ## Previous behavior LLMObs spans are based on APM spans, except LLMObs spans' parenting involves only other LLMObs spans. So with a potential trace structure containing a mixture of APM-specific and LLMObs spans, like: ``` Span A (LLMObs span) --> Span B (Apm-specific span) --> Span C (LLMObs span) ``` LLMObs only cares about the LLMObs spans, where span C's parent is the root span, even though in APM it would be span B. Combined with distributed tracing and multithreading, this makes it not so easy to determine that "correct" (read LLMObs) parenting tree for traces submitted to LLM Observability. ### Problems with previous approach Previously we worked around this by traversing the span's local parent tree and finding the next LLM-type span on both span start and finish for non-distributed cases, and for distributed cases we would attach the parent ID on the span context's meta field to be propagated in distributed request headers. However attaching things to the span context meta was not suitable long-term due to a couple factors: 1. Context objects are not thread-safe: in a multithreading case with n>1 child threads creating their own spans, the parent ID stored in the context object could be overwritten during thread execution, therefore incorrectly propagating parent IDs. 2. Context objects store trace-specific information, and are not designed for our use case where we skip spans here and there in the trace. This also leads to edge cases that were handled with [ugly workaround code](https://github.com/DataDog/dd-trace-py/blob/3bffd02a071db91a6fdc56ae12b496dc84ff8abf/ddtrace/llmobs/_llmobs.py#L312-L317): <details> <summary><i>Example ugly workaround</i></summary> <b>Any meta fields set on the context object gets propagated as span tags on all subsequent spans in the trace on span start time, except for the spans in the first service of a trace which get propagated at span finish time. Fixing this resulted in overriding these span tags on span start and more checks on span finish.</b> </details> ## Current approach Instead of being dependent on a Context object that doesn't quite fit our use case and trying to make it fit our use case, we simply keep track of our own active LLMObs span/context: - `LLMObsContextProvider` handles keeping track of the current active LLMObs span via `active()` and `activate()` - Instead of traversing a span's local ancestor tree to solve for a span's llmobs parent ID, we just use `LLMObsContextProvider._activate_llmobs_span()` and set the llmobs parent ID as a tag at span start time. (called by `LLMObs._start_span()` and `BaseLLMIntegration.trace(submit_to_llmobs=True)` and the bedrock integration). - `LLMObs.inject_distributed_headers` now uses the LLMObsContextProvider to inject the active llmobs span's ID into span context and request headers - `LLMObs.activate_distributed_headers()` now uses the LLMObsContextProvider to activate the extracted llmobs context to continue the trace in a distributed case. - `trace_utils.activate_distributed_headers()` now includes automatic llmobs context activation if llmobs is enabled. I've used signal handling to avoid importing LLMObs entirely for non-LLMObs users (same for `HTTPPropagator.inject()`). By keeping track of our own active LLMObs spans, spans submitted to LLM Observability have an independent set of span and parent IDs, even if the span and trace IDs are shared with APM spans for now. This is the first step to decoupling from tracer internals. ## Next steps We can go further by generating LLMObs-specific span/trace IDs which are separate from APM. This will solve some edge cases with traces involving mixed APM/LLMObs spans. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) [MLOB-1342]: https://datadoghq.atlassian.net/browse/MLOB-1342?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[MLOB-1342] This is a manual copy of #10767 to 3.x-staging due to merge conflicts between 3.x-staging and main. ## Summary Public facing changes: - Any LLMObs method (`export_span(), annotate()`) that allow an optional span argument will now default to finding the current active LLMObs span rather than the current active APM span. - ~Adds multithreading (futures.multithreading) support for LLMObs. Previously multithreaded apps would result in broken traces.~ UPDATE: will be moving multithreading support to a separate PR. Private changes: - LLMObs has its own context provider which keeps track of the active LLM-type span (generated by both LLMObs._start_span() and LLM integrations) - HTTPPropagation now adds LLMObs parent ID as a field on the request headers directly, rather than through the context object. - Adds private helper method `LLMObs._instance.current_span()`, returns the current active LLMObs-generated (integration, SDK) span. - Adds private helper method `LLMObs._instance._current_trace_context()`, returns current LLMObs context (which can represent both a span or a distributed span) - Adds a new field to the LLMObs span event struct, `_dd` which is a str/str dictionary containing the span/trace IDs of the APM span to correlate with. Currently these are the same span/trace IDs as the LLMObs span/trace ID, but this unlocks future steps of using independent span/trace IDs. ## Previous behavior LLMObs spans are based on APM spans, except LLMObs spans' parenting involves only other LLMObs spans. So with a potential trace structure containing a mixture of APM-specific and LLMObs spans, like: ``` Span A (LLMObs span) --> Span B (Apm-specific span) --> Span C (LLMObs span) ``` LLMObs only cares about the LLMObs spans, where span C's parent is the root span, even though in APM it would be span B. Combined with distributed tracing and multithreading, this makes it not so easy to determine that "correct" (read LLMObs) parenting tree for traces submitted to LLM Observability. ### Problems with previous approach Previously we worked around this by traversing the span's local parent tree and finding the next LLM-type span on both span start and finish for non-distributed cases, and for distributed cases we would attach the parent ID on the span context's meta field to be propagated in distributed request headers. However attaching things to the span context meta was not suitable long-term due to a couple factors: 1. Context objects are not thread-safe: in a multithreading case with n>1 child threads creating their own spans, the parent ID stored in the context object could be overwritten during thread execution, therefore incorrectly propagating parent IDs. 2. Context objects store trace-specific information, and are not designed for our use case where we skip spans here and there in the trace. This also leads to edge cases that were handled with [ugly workaround code](https://github.com/DataDog/dd-trace-py/blob/3bffd02a071db91a6fdc56ae12b496dc84ff8abf/ddtrace/llmobs/_llmobs.py#L312-L317): <details> <summary><i>Example ugly workaround</i></summary> <b>Any meta fields set on the context object gets propagated as span tags on all subsequent spans in the trace on span start time, except for the spans in the first service of a trace which get propagated at span finish time. Fixing this resulted in overriding these span tags on span start and more checks on span finish.</b> </details> ## Current approach Instead of being dependent on a Context object that doesn't quite fit our use case and trying to make it fit our use case, we simply keep track of our own active LLMObs span/context: - `LLMObsContextProvider` handles keeping track of the current active LLMObs span via `active()` and `activate()` - Instead of traversing a span's local ancestor tree to solve for a span's llmobs parent ID, we just use `LLMObsContextProvider._activate_llmobs_span()` and set the llmobs parent ID as a tag at span start time. (called by `LLMObs._start_span()` and `BaseLLMIntegration.trace(submit_to_llmobs=True)` and the bedrock integration). - `LLMObs.inject_distributed_headers` now uses the LLMObsContextProvider to inject the active llmobs span's ID into span context and request headers - `LLMObs.activate_distributed_headers()` now uses the LLMObsContextProvider to activate the extracted llmobs context to continue the trace in a distributed case. - `trace_utils.activate_distributed_headers()` now includes automatic llmobs context activation if llmobs is enabled. I've used signal handling to avoid importing LLMObs entirely for non-LLMObs users (same for `HTTPPropagator.inject()`). By keeping track of our own active LLMObs spans, spans submitted to LLM Observability have an independent set of span and parent IDs, even if the span and trace IDs are shared with APM spans for now. This is the first step to decoupling from tracer internals. ## Next steps We can go further by generating LLMObs-specific span/trace IDs which are separate from APM. This will solve some edge cases with traces involving mixed APM/LLMObs spans. ## Checklist - [x] PR author has checked that all the criteria below are met - The PR description includes an overview of the change - The PR description articulates the motivation for the change - The change includes tests OR the PR description describes a testing strategy - The PR description notes risks associated with the change, if any - Newly-added code is easy to change - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) - The change includes or references documentation updates if necessary - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Reviewer has checked that all the criteria below are met - Title is accurate - All changes are related to the pull request's stated goal - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes - Testing strategy adequately addresses listed risks - Newly-added code is easy to change - Release note makes sense to a user of the library - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) [MLOB-1342]: https://datadoghq.atlassian.net/browse/MLOB-1342?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
MLOB-1342 This is a manual copy of #10767 to 3.x-staging due to merge conflicts between 3.x-staging and main.
Summary
Public facing changes:
export_span(), annotate()
) that allow an optional span argument will now default to finding the current active LLMObs span rather than the current active APM span.Adds multithreading (futures.multithreading) support for LLMObs. Previously multithreaded apps would result in broken traces.UPDATE: will be moving multithreading support to a separate PR.Private changes:
LLMObs._instance.current_span()
, returns the current active LLMObs-generated (integration, SDK) span.LLMObs._instance._current_trace_context()
, returns current LLMObs context (which can represent both a span or a distributed span)_dd
which is a str/str dictionary containing the span/trace IDs of the APM span to correlate with. Currently these are the same span/trace IDs as the LLMObs span/trace ID, but this unlocks future steps of using independent span/trace IDs.Previous behavior
LLMObs spans are based on APM spans, except LLMObs spans' parenting involves only other LLMObs spans. So with a potential trace structure containing a mixture of APM-specific and LLMObs spans, like:
LLMObs only cares about the LLMObs spans, where span C's parent is the root span, even though in APM it would be span B. Combined with distributed tracing and multithreading, this makes it not so easy to determine that "correct" (read LLMObs) parenting tree for traces submitted to LLM Observability.
Problems with previous approach
Previously we worked around this by traversing the span's local parent tree and finding the next LLM-type span on both span start and finish for non-distributed cases, and for distributed cases we would attach the parent ID on the span context's meta field to be propagated in distributed request headers. However attaching things to the span context meta was not suitable long-term due to a couple factors:
Example ugly workaround
Any meta fields set on the context object gets propagated as span tags on all subsequent spans in the trace on span start time, except for the spans in the first service of a trace which get propagated at span finish time. Fixing this resulted in overriding these span tags on span start and more checks on span finish.Current approach
Instead of being dependent on a Context object that doesn't quite fit our use case and trying to make it fit our use case, we simply keep track of our own active LLMObs span/context:
LLMObsContextProvider
handles keeping track of the current active LLMObs span viaactive()
andactivate()
LLMObsContextProvider._activate_llmobs_span()
and set the llmobs parent ID as a tag at span start time.(called by
LLMObs._start_span()
andBaseLLMIntegration.trace(submit_to_llmobs=True)
and the bedrock integration).LLMObs.inject_distributed_headers
now uses the LLMObsContextProvider to inject the active llmobs span's ID into span context and request headersLLMObs.activate_distributed_headers()
now uses the LLMObsContextProvider to activate the extracted llmobs context to continue the trace in a distributed case.trace_utils.activate_distributed_headers()
now includes automatic llmobs context activation if llmobs is enabled. I've used signal handling to avoid importing LLMObs entirely for non-LLMObs users (same forHTTPPropagator.inject()
).By keeping track of our own active LLMObs spans, spans submitted to LLM Observability have an independent set of span and parent IDs, even if the span and trace IDs are shared with APM spans for now. This is the first step to decoupling from tracer internals.
Next steps
We can go further by generating LLMObs-specific span/trace IDs which are separate from APM. This will solve some edge cases with traces involving mixed APM/LLMObs spans.
Checklist
Reviewer Checklist