Commit 714c52f

Yun-Kim authored and RamyElkest committed
feat(llmobs): llmobs-specific context manager (#12236)
[MLOB-1342]

This is a manual copy of #10767 to 3.x-staging due to merge conflicts between 3.x-staging and main.

## Summary

Public-facing changes:
- Any LLMObs method (`export_span()`, `annotate()`) that accepts an optional span argument now defaults to the current active LLMObs span rather than the current active APM span.
- ~Adds multithreading (futures.multithreading) support for LLMObs. Previously multithreaded apps would result in broken traces.~ UPDATE: multithreading support will move to a separate PR.

Private changes:
- LLMObs has its own context provider, which keeps track of the active LLM-type span (generated by both `LLMObs._start_span()` and LLM integrations).
- HTTPPropagation now adds the LLMObs parent ID as a field on the request headers directly, rather than through the context object.
- Adds private helper method `LLMObs._instance.current_span()`, which returns the current active LLMObs-generated (integration, SDK) span.
- Adds private helper method `LLMObs._instance._current_trace_context()`, which returns the current LLMObs context (which can represent either a span or a distributed span).
- Adds a new field to the LLMObs span event struct, `_dd`, a str/str dictionary containing the span/trace IDs of the APM span to correlate with. Currently these are the same span/trace IDs as the LLMObs span/trace ID, but this unlocks future steps of using independent span/trace IDs.

## Previous behavior

LLMObs spans are based on APM spans, except that LLMObs span parenting involves only other LLMObs spans. So with a potential trace structure containing a mixture of APM-specific and LLMObs spans, like:

```
Span A (LLMObs span) --> Span B (APM-specific span) --> Span C (LLMObs span)
```

LLMObs only cares about the LLMObs spans, so span C's LLMObs parent is the root span (span A), even though its APM parent is span B. Combined with distributed tracing and multithreading, this makes it hard to determine the "correct" (read: LLMObs) parenting tree for traces submitted to LLM Observability.

### Problems with previous approach

Previously we worked around this in two ways. For non-distributed cases, we traversed the span's local parent tree to find the next LLM-type span on both span start and finish. For distributed cases, we attached the parent ID to the span context's meta field so that it was propagated in distributed request headers. However, attaching things to the span context meta was not suitable long-term for two reasons:

1. Context objects are not thread-safe: in a multithreading case with n>1 child threads creating their own spans, the parent ID stored in the context object could be overwritten during thread execution, therefore incorrectly propagating parent IDs.
2. Context objects store trace-specific information and are not designed for our use case, where we skip spans here and there in the trace. This also leads to edge cases that were handled with [ugly workaround code](https://github.com/DataDog/dd-trace-py/blob/3bffd02a071db91a6fdc56ae12b496dc84ff8abf/ddtrace/llmobs/_llmobs.py#L312-L317):

<details> <summary><i>Example ugly workaround</i></summary> <b>Any meta fields set on the context object get propagated as span tags on all subsequent spans in the trace at span start time, except for the spans in the first service of a trace, which get them at span finish time. Fixing this resulted in overriding these span tags on span start and more checks on span finish.</b> </details>

## Current approach

Instead of depending on a Context object that doesn't quite fit our use case and trying to make it fit, we simply keep track of our own active LLMObs span/context:

- `LLMObsContextProvider` keeps track of the current active LLMObs span via `active()` and `activate()`.
- Instead of traversing a span's local ancestor tree to solve for a span's LLMObs parent ID, we use `LLMObs._instance._activate_llmobs_span()` and set the LLMObs parent ID as a tag at span start time (called by `LLMObs._start_span()`, `BaseLLMIntegration.trace(submit_to_llmobs=True)`, and the bedrock integration).
- `LLMObs.inject_distributed_headers` now uses the LLMObsContextProvider to inject the active LLMObs span's ID into span context and request headers.
- `LLMObs.activate_distributed_headers()` now uses the LLMObsContextProvider to activate the extracted LLMObs context to continue the trace in a distributed case.
- `trace_utils.activate_distributed_headers()` now includes automatic LLMObs context activation if LLMObs is enabled. I've used signal handling (`core.dispatch`) to avoid importing LLMObs entirely for non-LLMObs users (same for `HTTPPropagator.inject()`).

By keeping track of our own active LLMObs spans, spans submitted to LLM Observability have an independent set of span and parent IDs, even if the span and trace IDs are shared with APM spans for now. This is the first step to decoupling from tracer internals.

## Next steps

We can go further by generating LLMObs-specific span/trace IDs that are separate from APM. This will solve some edge cases with traces involving mixed APM/LLMObs spans.
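As an illustration of the parenting rule from "Previous behavior", here is a minimal sketch of the ancestor traversal on the Span A → Span B → Span C example. `FakeSpan` and `llmobs_parent_id` are hypothetical names invented for this sketch; they are not ddtrace classes or helpers, and the real traversal code may differ.

```python
from dataclasses import dataclass
from typing import Optional

ROOT_PARENT_ID = "undefined"  # same sentinel value this commit adds to _constants.py


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a tracer span; not ddtrace's Span class."""

    span_id: int
    is_llmobs: bool
    parent: Optional["FakeSpan"] = None


def llmobs_parent_id(span: FakeSpan) -> str:
    """Walk up the APM parent chain and return the nearest LLMObs ancestor's ID."""
    ancestor = span.parent
    while ancestor is not None:
        if ancestor.is_llmobs:
            return str(ancestor.span_id)
        ancestor = ancestor.parent
    # No LLMObs ancestor: the span is a root from the LLMObs point of view.
    return ROOT_PARENT_ID


span_a = FakeSpan(span_id=1, is_llmobs=True)
span_b = FakeSpan(span_id=2, is_llmobs=False, parent=span_a)
span_c = FakeSpan(span_id=3, is_llmobs=True, parent=span_b)

apm_parent_of_c = span_c.parent.span_id    # span B in the APM view
llmobs_parent_of_c = llmobs_parent_id(span_c)  # span A in the LLMObs view
```

In the APM view span C hangs off span B, while the LLMObs view skips B and attaches C to A. Tracking the active LLMObs span directly avoids repeating this walk for every span.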
## Checklist

- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

[MLOB-1342]: https://datadoghq.atlassian.net/browse/MLOB-1342?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
1 parent 6dfe8b3 commit 714c52f

14 files changed, +303 -271 lines changed

ddtrace/contrib/internal/trace_utils.py

+1

@@ -596,6 +596,7 @@ def activate_distributed_headers(tracer, int_config=None, request_headers=None,
         # We have parsed a trace id from headers, and we do not already
         # have a context with the same trace id active
         tracer.context_provider.activate(context)
+        core.dispatch("http.activate_distributed_headers", (request_headers, context))
 
         dispatch("distributed_context.activated", (context,))
 
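The added `core.dispatch(...)` call lets the tracer announce header activation without importing LLMObs. The following is a minimal sketch of that listener pattern, assuming a simplified event hub: the `on` and `dispatch` functions below are stand-ins written for this example rather than ddtrace's `core` module, and the header name and listener are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Simplified stand-in for an event hub; not ddtrace's implementation.
_listeners: Dict[str, List[Callable[..., None]]] = {}


def on(event_id: str, callback: Callable[..., None]) -> None:
    """Register a callback for an event ID."""
    _listeners.setdefault(event_id, []).append(callback)


def dispatch(event_id: str, args: Tuple[Any, ...] = ()) -> None:
    """Call every registered callback; a no-op when nobody is listening."""
    for callback in _listeners.get(event_id, []):
        callback(*args)


activated: List[str] = []


def activate_llmobs_context(request_headers: Dict[str, str], context: Any) -> None:
    """Hypothetical listener that an LLMObs-like component would register when enabled."""
    parent_id = request_headers.get("x-example-llmobs-parent-id")  # hypothetical header name
    if parent_id is not None:
        activated.append(parent_id)


headers = {"x-example-llmobs-parent-id": "123"}

# With no listener registered, the dispatching side does no extra work.
dispatch("http.activate_distributed_headers", (headers, None))
calls_before_registration = list(activated)

# Once the listener is registered, the same dispatch call reaches it.
on("http.activate_distributed_headers", activate_llmobs_context)
dispatch("http.activate_distributed_headers", (headers, None))
```

The dispatching module only knows the event ID, so users who never enable the listening component never pay for importing it.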

ddtrace/llmobs/_constants.py

+2

@@ -45,6 +45,8 @@
 DROPPED_IO_COLLECTION_ERROR = "dropped_io"
 DROPPED_VALUE_TEXT = "[This value has been dropped because this span's size exceeds the 1MB size limit.]"
 
+ROOT_PARENT_ID = "undefined"
+
 # Set for traces of evaluator integrations e.g. `runner.integration:ragas`.
 # Used to differentiate traces of Datadog-run operations vs user-application operations.
 RUNNER_IS_INTEGRATION_SPAN_TAG = "runner.integration"

ddtrace/llmobs/_context.py

+59

@@ -0,0 +1,59 @@
+import contextvars
+from typing import Optional
+from typing import Union
+
+from ddtrace._trace.context import Context
+from ddtrace._trace.provider import DefaultContextProvider
+from ddtrace._trace.span import Span
+from ddtrace.ext import SpanTypes
+
+
+ContextTypeValue = Optional[Union[Context, Span]]
+
+
+_DD_LLMOBS_CONTEXTVAR: contextvars.ContextVar[ContextTypeValue] = contextvars.ContextVar(
+    "datadog_llmobs_contextvar",
+    default=None,
+)
+
+
+class LLMObsContextProvider(DefaultContextProvider):
+    """Context provider that retrieves contexts from a context variable.
+    It is suitable for synchronous programming and for asynchronous executors
+    that support contextvars.
+    """
+
+    def __init__(self) -> None:
+        super(DefaultContextProvider, self).__init__()
+        _DD_LLMOBS_CONTEXTVAR.set(None)
+
+    def _has_active_context(self) -> bool:
+        """Returns whether there is an active context in the current execution."""
+        ctx = _DD_LLMOBS_CONTEXTVAR.get()
+        return ctx is not None
+
+    def _update_active(self, span: Span) -> Optional[Span]:
+        """Updates the active LLMObs span.
+        The active span is updated to be the span's closest unfinished LLMObs ancestor span.
+        """
+        if not span.finished:
+            return span
+        new_active: Optional[Span] = span
+        while new_active and new_active.finished:
+            new_active = new_active._parent
+            if new_active and not new_active.finished and new_active.span_type == SpanTypes.LLM:
+                break
+        self.activate(new_active)
+        return new_active
+
+    def activate(self, ctx: ContextTypeValue) -> None:
+        """Makes the given context active in the current execution."""
+        _DD_LLMOBS_CONTEXTVAR.set(ctx)
+        super(DefaultContextProvider, self).activate(ctx)
+
+    def active(self) -> ContextTypeValue:
+        """Returns the active span or context for the current execution."""
+        item = _DD_LLMOBS_CONTEXTVAR.get()
+        if isinstance(item, Span):
+            return self._update_active(item)
+        return item
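The provider above keeps the active span in a `contextvars.ContextVar`. As a rough illustration of why that helps with the thread-safety problem described in the commit message, the sketch below contrasts a context variable with a plain shared object. The variable name, `shared_holder`, and `worker` are hypothetical and exist only for this example; this is not the multithreading support that the commit message defers to a separate PR.

```python
import contextvars
import threading
from typing import Dict, Optional

# Hypothetical variable for this sketch; the commit's own variable is _DD_LLMOBS_CONTEXTVAR.
_ACTIVE: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "example_active_llmobs_span",
    default=None,
)

# A plain shared object, standing in for state stored on a shared context object.
shared_holder: Dict[str, str] = {"parent_id": "main-span"}


def worker(name: str, seen: Dict[str, Optional[str]]) -> None:
    """Each worker records its own active span in both places."""
    _ACTIVE.set(name)                 # only affects this thread's context
    shared_holder["parent_id"] = name  # overwrites the value for everyone
    seen[name] = _ACTIVE.get()


_ACTIVE.set("main-span")
seen: Dict[str, Optional[str]] = {}
threads = [threading.Thread(target=worker, args=(f"worker-{i}", seen)) for i in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

main_value = _ACTIVE.get()  # unchanged by the workers
```

After the workers finish, the main thread still sees its own active value, while the shared dictionary holds whichever worker wrote last.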

ddtrace/llmobs/_integrations/base.py

+10 -18

@@ -18,11 +18,8 @@
 from ddtrace.internal.telemetry import telemetry_writer
 from ddtrace.internal.telemetry.constants import TELEMETRY_NAMESPACE
 from ddtrace.internal.utils.formats import asbool
-from ddtrace.llmobs._constants import PARENT_ID_KEY
-from ddtrace.llmobs._constants import PROPAGATED_PARENT_ID_KEY
 from ddtrace.llmobs._llmobs import LLMObs
 from ddtrace.llmobs._log_writer import V2LogWriter
-from ddtrace.llmobs._utils import _get_llmobs_parent_id
 from ddtrace.settings import IntegrationConfig
 from ddtrace.trace import Pin
 from ddtrace.trace import Span
@@ -138,21 +135,16 @@ def trace(self, pin: Pin, operation_id: str, submit_to_llmobs: bool = False, **k
         span.set_tag(_SPAN_MEASURED_KEY)
         self._set_base_span_tags(span, **kwargs)
         if submit_to_llmobs and self.llmobs_enabled:
-            if span.get_tag(PROPAGATED_PARENT_ID_KEY) is None:
-                # For non-distributed traces or spans in the first service of a distributed trace,
-                # The LLMObs parent ID tag is not set at span start time. We need to manually set the parent ID tag now
-                # in these cases to avoid conflicting with the later propagated tags.
-                parent_id = _get_llmobs_parent_id(span) or "undefined"
-                span._set_ctx_item(PARENT_ID_KEY, str(parent_id))
-                telemetry_writer.add_count_metric(
-                    namespace=TELEMETRY_NAMESPACE.MLOBS,
-                    name="span.start",
-                    value=1,
-                    tags=(
-                        ("integration", self._integration_name),
-                        ("autoinstrumented", "true"),
-                    ),
-                )
+            LLMObs._instance._activate_llmobs_span(span)
+            telemetry_writer.add_count_metric(
+                namespace=TELEMETRY_NAMESPACE.MLOBS,
+                name="span.start",
+                value=1,
+                tags=(
+                    ("integration", self._integration_name),
+                    ("autoinstrumented", "true"),
+                ),
+            )
         return span
 
     @classmethod

ddtrace/llmobs/_integrations/bedrock.py

+2 -6

@@ -4,18 +4,16 @@
 from typing import Optional
 
 from ddtrace.internal.logger import get_logger
+from ddtrace.llmobs import LLMObs
 from ddtrace.llmobs._constants import INPUT_MESSAGES
 from ddtrace.llmobs._constants import METADATA
 from ddtrace.llmobs._constants import METRICS
 from ddtrace.llmobs._constants import MODEL_NAME
 from ddtrace.llmobs._constants import MODEL_PROVIDER
 from ddtrace.llmobs._constants import OUTPUT_MESSAGES
-from ddtrace.llmobs._constants import PARENT_ID_KEY
-from ddtrace.llmobs._constants import PROPAGATED_PARENT_ID_KEY
 from ddtrace.llmobs._constants import SPAN_KIND
 from ddtrace.llmobs._integrations import BaseLLMIntegration
 from ddtrace.llmobs._integrations.utils import get_llmobs_metrics_tags
-from ddtrace.llmobs._utils import _get_llmobs_parent_id
 from ddtrace.trace import Span
 
 
@@ -34,9 +32,7 @@ def _llmobs_set_tags(
         operation: str = "",
     ) -> None:
         """Extract prompt/response tags from a completion and set them as temporary "_ml_obs.*" tags."""
-        if span.get_tag(PROPAGATED_PARENT_ID_KEY) is None:
-            parent_id = _get_llmobs_parent_id(span) or "undefined"
-            span._set_ctx_item(PARENT_ID_KEY, parent_id)
+        LLMObs._instance._activate_llmobs_span(span)
         parameters = {}
         if span.get_tag("bedrock.request.temperature"):
             parameters["temperature"] = float(span.get_tag("bedrock.request.temperature") or 0.0)

ddtrace/llmobs/_integrations/langgraph.py

+3 -2

@@ -8,12 +8,13 @@
 from ddtrace.llmobs._constants import INPUT_VALUE
 from ddtrace.llmobs._constants import NAME
 from ddtrace.llmobs._constants import OUTPUT_VALUE
+from ddtrace.llmobs._constants import PARENT_ID_KEY
+from ddtrace.llmobs._constants import ROOT_PARENT_ID
 from ddtrace.llmobs._constants import SPAN_KIND
 from ddtrace.llmobs._constants import SPAN_LINKS
 from ddtrace.llmobs._integrations.base import BaseLLMIntegration
 from ddtrace.llmobs._integrations.utils import format_langchain_io
 from ddtrace.llmobs._utils import _get_attr
-from ddtrace.llmobs._utils import _get_llmobs_parent_id
 from ddtrace.llmobs._utils import _get_nearest_llmobs_ancestor
 from ddtrace.trace import Span
 from ddtrace.trace import tracer
@@ -175,7 +176,7 @@ def _default_span_link(span: Span):
     the span is linked to its parent's input.
     """
     return {
-        "span_id": str(_get_llmobs_parent_id(span)) or "undefined",
+        "span_id": span._get_ctx_item(PARENT_ID_KEY) or ROOT_PARENT_ID,
         "trace_id": "{:x}".format(span.trace_id),
         "attributes": {"from": "input", "to": "input"},
     }
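One side effect of the `span_id` change may be worth spelling out. Assuming the removed helper could return `None` for a span with no LLMObs parent (as the `or "undefined"` fallbacks in the removed code suggest), wrapping the value in `str()` before the `or` would keep the fallback from ever applying. The two functions below are hypothetical reductions of the old and new expressions, not code from the repository.

```python
from typing import Optional

ROOT_PARENT_ID = "undefined"  # sentinel added in _constants.py by this commit


def old_style(parent_id: Optional[int]) -> str:
    """Mirrors the removed expression: str(...) is applied before the fallback."""
    # str(None) is the non-empty string "None", which is truthy,
    # so the right-hand side of `or` is never reached.
    return str(parent_id) or ROOT_PARENT_ID


def new_style(parent_id: Optional[str]) -> str:
    """Mirrors the added expression: the fallback applies to a missing value."""
    return parent_id or ROOT_PARENT_ID


old_root = old_style(None)
new_root = new_style(None)
new_child = new_style("123")
```

Under that assumption, the old expression would emit the literal string `"None"` for root spans, whereas the new one yields the `ROOT_PARENT_ID` sentinel.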
