Add MeasurementProcessor specification to Metrics SDK #4318
Add status field Co-authored-by: Robert Pająk <[email protected]>
Co-authored-by: Reiley Yang <[email protected]>
For a `MeasurementProcessor` registered directly on SDK `MeterProvider`, the `measurement` mutations MUST be visible in next registered processors.
Do we allow the processor to "drop" the measurement (e.g. the processor decided that it doesn't want the measurement) or other operations beyond modifications on the value and attributes?
Related question (thus decided to put it here).
Shouldn't the processor also be used when evaluating `Enabled`?
Shouldn't we also add an `OnEnabled` hook?
Related comment in other issue:
To allow processors to "drop" measurements, they must be somehow connected to the `MetricReader`. I agree that it would be a cool feature to have, providing great flexibility.
The Lightstep Metrics SDK implements a MeasurementProcessor interface which was narrowly scoped to allow modifying the set of attributes for a measurement. In that use-case, we would take the incoming gRPC metadata from the context, look up specific headers, and apply header values as attribute values.
I admit I am not sure what reasons a user would have to modify measured values. Are there well-known use-cases? I found @jack-berg mentioned "unit conversion" here, but I am not sure how that would work--the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?
That SDK does not permit dropping measurements. Speaking also to @pellared's question about Enabled and whether measurement processors should intercept Enabled calls, I would recommend No. See my position on passing context to the metrics enabled method, #4256 (comment), which states the same. I am nervous about letting measurement processors change measurements and selectively enable/disable call sites because IMO it will make interpreting the resulting data very difficult.
As an example, suppose we have a measurement processor that is designed to redact sensitive attribute values. IMO it would be better to change attributes, not to drop events, because otherwise a user can be easily misled. Suppose we have a counter which counts requests with an attribute for success (boolean) and a client ID (string). We have a policy that says client IDs should not resemble e-mail addresses, otherwise they are invalid. The two options are to redact the client ID (e.g., give it a value like "redacted") or to drop the measurement. If we drop the measurement, all sorts of queries might be impacted. What's my success rate? I have no idea because an unknown number of redacted measurements were dropped.
Therefore, I would propose that measurement processors can only modify attributes, not values, and not drop events.
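To make the redact-versus-drop argument concrete, here is a minimal Python sketch. The `Measurement` shape, the `redact` and `drop` helpers, and the crude e-mail check are all assumptions made for illustration; none of them come from the PR or from any SDK.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Measurement:
    """Hypothetical measurement shape: a value plus its attributes."""
    value: int
    attributes: dict = field(default_factory=dict)


# Deliberately crude "looks like an e-mail address" check, for illustration only.
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+")


def is_invalid_client(m):
    return EMAIL_LIKE.fullmatch(str(m.attributes.get("client_id", ""))) is not None


def redact(m):
    """Option 1: keep the event, replace the offending attribute value."""
    if not is_invalid_client(m):
        return m
    return Measurement(m.value, {**m.attributes, "client_id": "redacted"})


def drop(m):
    """Option 2: discard the whole event (None means dropped)."""
    return None if is_invalid_client(m) else m


def success_rate(measurements):
    total = sum(m.value for m in measurements)
    succeeded = sum(m.value for m in measurements if m.attributes["success"])
    return succeeded / total


events = [
    Measurement(1, {"success": True, "client_id": "alice"}),
    Measurement(1, {"success": False, "client_id": "bob@example.com"}),
    Measurement(1, {"success": True, "client_id": "carol"}),
    Measurement(1, {"success": False, "client_id": "dave@example.com"}),
]

after_redact = [redact(m) for m in events]
after_drop = [m for m in map(drop, events) if m is not None]

print(success_rate(after_redact))  # 0.5, the true rate
print(success_rate(after_drop))    # 1.0, both failures were silently dropped
```

In this contrived data set the invalid client IDs happen to coincide with the failures, so dropping inflates the success rate from 0.5 to 1.0 while redaction leaves it intact.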
> the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?

Providing this feature without the ability to do unit conversion or drop measurements would be a miss. We can solve the lack of knowledge about the unit by giving the processor access to instrument metadata. I think it could make sense to allow measurements processors to be configurable at the view level, in which case we might also consider allowing views to modify the unit of the resulting stream. Users could then compose a view which: 1. Adds a processor for unit conversion. 2. Adjusts the resulting stream's unit.
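As a rough illustration of unit conversion with access to instrument metadata, consider the Python sketch below. The `Instrument` and `Measurement` classes and the `seconds_to_milliseconds` function are hypothetical. The sketch also assumes the processor itself may rewrite the unit carried with the measurement, whereas the comment above suggests the view would adjust the stream's unit.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instrument:
    """Hypothetical instrument metadata visible to a processor."""
    name: str
    unit: str


@dataclass(frozen=True)
class Measurement:
    """Hypothetical measurement carrying a reference to its instrument."""
    value: float
    attributes: tuple  # assumed shape: ((key, value), ...)
    instrument: Instrument


def seconds_to_milliseconds(m):
    """Convert values recorded in seconds; leave every other unit untouched."""
    if m.instrument.unit != "s":
        return m
    return replace(
        m,
        value=m.value * 1000,
        instrument=replace(m.instrument, unit="ms"),
    )


latency = Measurement(0.25, (("http.route", "/users"),), Instrument("request.duration", "s"))
converted = seconds_to_milliseconds(latency)
print(converted.value, converted.instrument.unit)  # 250.0 ms
```

Without the instrument metadata the function could not tell a duration in seconds from any other number, which is the gap the comment above points out.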
OK, I'll come around on this topic. I see how dropping metric events is a useful feature, despite the potential for difficult consequences. Dropping metric events is not very different from sampling traces at 0%. Just like 0% sampling (which we call "non probabilistic"), there is a loss of information, but that is intentional.
@jack-berg Given your statement, I think it means that the `Measurement` type should be defined as a 3-tuple (Value, Attributes, Instrument). This model works for me--and it resembles the OpenCensus "stats" API. Tangentially, I see a potential for us to form new APIs (like OpenCensus) which accept a list of measurements atomically and apply a single timestamp (e.g., or process the dynamic context once for multiple events).
Let me pose a thought experiment. What does a MeasurementProcessor do better than you could achieve simply by wrapping a MeterProvider with a new instance containing the desired logic? I'm looking at the complexity trade-off here. I see how the desire to modify units comes about -- especially with the base-2 exponential histogram -- we see a desire to change seconds to/from milliseconds w/o loss of information as a compelling use-case. In the wrapped-MeterProvider scenario, the units-conversion wrapper would ("simply") register a new instrument with the delegate MeterProvider having different units and divide/multiply the value on its way through.
I thought of another case that I'm aware of, which calls for modifying the instrument kind, i.e., more than just a change of unit. I'm aware of use-cases for synchronous UpDownCounter instruments where the user would like to separate positive from negative values as two Counters. In this case, the two absolute value instruments convey the rate of ups and downs as separate information. Still, the input-to-output mapping is 1:1.
I prefer to think of MeasurementProcessor as something like syntactic sugar for the example I described above, meaning that it can be defined abstractly as a wrapper of meter providers with a per-instrument event translation rule. There seems to be a potential -- do we know any use-cases? -- for one metric API event to translate into more than one metric API event on the wrapped meter provider. In this sense, we could define MeasurementProcessor as a per-instrument function that maps one input measurement into a list of zero or more output measurements, enabling both dropping and proliferation of events.
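The "one input measurement to a list of zero or more output measurements" idea can be sketched in Python as follows, using the UpDownCounter split as the example. The `Measurement` tuple, the `.up`/`.down` naming, and the choice to drop zero deltas are assumptions made for this illustration only.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    """Hypothetical 3-tuple; the instrument is identified by name only."""
    value: int
    attributes: dict
    instrument: str


def split_up_down(m):
    """Map one UpDownCounter event to zero or one Counter event."""
    if m.value > 0:
        return [m._replace(instrument=m.instrument + ".up")]
    if m.value < 0:
        # Counters only accept non-negative increments, so record the magnitude.
        return [m._replace(value=-m.value, instrument=m.instrument + ".down")]
    return []  # assumption: a zero delta is dropped rather than forwarded


for delta in (5, -2, 0):
    print(split_up_down(Measurement(delta, {}, "queue.items")))
```

Each non-zero input produces exactly one output, matching the 1:1 mapping described above; returning a list merely leaves room for the dropping and proliferation cases.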
> I think it means that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument). This model works for me--and it resembles the OpenCensus "stats" API.

@jmacd I think this makes sense. Having access to an `Instrument` inside the processor makes it very powerful.

> I think it could make sense to allow measurements processors to be configurable at the view level, in which case we might also consider allowing views to modify the unit of the resulting stream. Users could then compose a view which: 1. Adds a processor for unit conversion. 2. Adjusts the resulting stream's unit.

@jack-berg I'm reading the `View` specification, which explicitly mentions that views work on the "metric" level. Therefore, configuring processors on the `View`s (instead of on `MeterProvider`) would require updating the `View` specification as well, unless I'm misunderstanding something.
Regarding dropping `Measurements`, changing instrument kinds, modifying the value, or even creating new `Measurements` on the fly (e.g., split `UpDownCounter` into two counters), we could make the proposed `Measure()` method return an array of `Measurements` instead of `Void`.
@Blinkuu About view-attached processors: I am generally wary of making the metrics SDK more complex, and the idea of making measurement processors view-specific has me vaguely worried. One fear is that this will limit the potential for MeasurementProcessors to be optimized.
If there are multiple readers and multiple measurement processors, do we evaluate the chain of processors per reader or once? I would prefer once.
I fear we're letting implementation details into the specification, if we dictate the use of a "next" processor here. I don't think a next processor is required. A better API for the processor IMO would be to return a measurement, so a signature like
type Processor interface {
    Process(context.Context, Measurement) (_ Measurement, valid bool)
}
In other words, the return value is an optional Measurement. I suggest the specification use pseudo-code such as the following, to answer the question raised in https://github.com/open-telemetry/opentelemetry-specification/pull/4318/files#r1915514480 about the order of operations, and this makes the drop-behavior clear:
func (sdki *sdkInstrument) onEvent(ctx context.Context, m Measurement) {
    for _, processor := range sdki.processors() {
        if mr, valid := processor.Process(ctx, m); valid {
            m = mr
        } else {
            // measurement was dropped
            return
        }
    }
    // instrument-specific logic for final processed measurement `m`
    // ...
}
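The same order of operations can be written as a small runnable Python sketch. Here `None` plays the role of `valid == false`, and `Measurement`, `run_processors`, and the two example processors are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    """Hypothetical measurement shape."""
    value: float
    attributes: dict = field(default_factory=dict)


def run_processors(processors, context, measurement):
    """Apply processors in registration order; None means 'dropped'."""
    for process in processors:
        measurement = process(context, measurement)
        if measurement is None:
            return None  # later processors never see a dropped measurement
    return measurement


def drop_negative(context, m):
    return m if m.value >= 0 else None


def add_tenant(context, m):
    # Copies a value from the (assumed dict-like) context into the attributes.
    return Measurement(m.value, {**m.attributes, "tenant": context["tenant"]})


pipeline = [drop_negative, add_tenant]
ctx = {"tenant": "acme"}
print(run_processors(pipeline, ctx, Measurement(3)))
print(run_processors(pipeline, ctx, Measurement(-1)))  # None
```

Because the SDK owns the loop, each processor's output is exactly what the next one receives, and a drop is visible at a single place.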
If there's an argument in favor of view-specific measurement processors, I would suggest surveying the implementors of the various metrics SDKs view mechanisms for their opinion. Otherwise, I think it's much simpler to explain to users what's happening with measurement processors: they literally change the events that enter the SDK and are seen by all metric readers alike.
@Blinkuu regarding an array of measurements. This is another question that could impact performance. In other words, unless we need it, I don't think one measurement should be allowed to become more than one measurement. In the example I gave of translating an UpDownCounter into two Counters, one input event becomes one output event.
I'm happy to revisit the API. I think the one you're proposing aligns more with my original idea but doesn't allow duplicate measurements.
My personal use case does not require duplicating measurements. Furthermore, I don't work with metrics enough to have a strong opinion on this subject. But it's been raised a few times, hence the current design allows it.

> In the example I gave of translating an UpDownCounter into two Counters, one input event becomes one output event.

Could you expand on this? How would you implement the split of counters in a `MeasurementProcessor` that implements your API:
type Processor interface {
    Process(context.Context, Measurement) (_ Measurement, valid bool)
}
Co-authored-by: Reiley Yang <[email protected]>
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Still working on this; will try to provide another iteration early next year.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
`OnMeasure` is called when a `Measurement` is recorded. This method is called synchronously on the thread that emitted the `Measurement`, therefore it SHOULD NOT block or throw exceptions.
**Parameters:**
Shouldn't the `MeasurementProcessor` also have access to the instrumentation scope (connected with the instrument that was used to emit the measurement) and the resource (associated with the meter provider)? The LogRecord and Span processors have access to this data.
This would allow, e.g., adding a processor that makes some changes to measurements emitted by a concrete instrumentation library.
The high-level idea is that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument).
This isn't concretely defined right now, as the `Measurement` itself is vaguely defined. The primary reason is we don't want to impose implementation details, allowing for optimal/idiomatic approaches.
> The primary reason is we don't want to impose implementation details, allowing for optimal/idiomatic approaches.

The languages do not need to copy/implement the abstractions one to one.

> The high-level idea is that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument).

I totally agree with it and I want to have it defined. Otherwise languages may not support all of the data which is relevant for processing. Ambiguity like this has caused us trouble at least a few times.
This is the reason it's not explicit: #4318 (comment)
I think we should explicitly call them out as separate parameters in the specification. Then it is the language SIG's decision whether to combine them into individual types or keep them separated. I think this would be fine for both @jack-berg (#4318 (comment)) and me (#4318 (comment)). Also notice that I am calling out instrumentation scope and resource, which are distinct from the measurement that you see as a 3-tuple (Value, Attributes, Instrument).
@jack-berg Does it sound reasonable?
@Blinkuu Please do not make any changes until at least @jack-berg agrees 😉
Is there any prototype? @Blinkuu, are you prototyping it in Go somewhere?
Not yet - I won't have time to prototype it this quarter. I could gladly use some help. Happy to help with the review. Here's a high-level idea for what the implementation could look like: https://go.dev/play/p/wPZRm5xk3nO
I'm pretty excited about this idea and will try to find some time to build a prototype implementation in opentelemetry-java. Hopefully it won't be too difficult.
When I finish some things related to logs I can try prototyping it in Go.
This is super cool to see. It would be relatively simple to implement in Python since the internal impl already behaves like this, e.g. `Counter.add()`
* `context` - the resolved `Context` (the explicitly passed `Context` or the current `Context`)
* `measurement` - a [Measurement](./api.md#measurement) that was recorded
* `next` - this allows the `MeasurementProcessor` to pass the measurements to the next `MeasurementProcessor` in the chain. It can be a reference to the next `MeasurementProcessor`, a bound callback to invoke `OnMeasure` on the next processor in the chain without an explicit reference to the next processor, or something else. [OpenTelemetry SDK](../overview.md#sdk) authors MAY decide the language idiomatic approach.
Not sure how liberally we can interpret the "language idiomatic approach". For example, would it be reasonable to allow `OnMeasure()` to return a new Measurement and have the SDK chain them together instead of recursing?
for mp in mps:
    measurement = mp.on_measure(measurement)
    if measurement == "DROP":
        break
I would rather suggest:
for mp in mps:
    measurement = mp.on_measure(measurement)
    if measurement is None:
        break
I think that a design where a processor returns the measurement that is passed to the next processor would be easier to implement and test.
This would also prevent the situation described in https://github.com/open-telemetry/opentelemetry-specification/pull/4318/files#r1962103972
The current design with `next()` allows duplicating measurements into downstream processors. For example, you can take one measurement and transform it into two. We discussed an API like this where the `OnMeasure()` function would return a list of measurements. The conclusion was that such an API would be slightly confusing.
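For comparison with the return-value style, here is a minimal Python sketch of a `next`-based chain built from bound callbacks. It shows both dropping (by not calling `next`) and duplication (by calling it twice). `build_chain`, `_bind`, the dict-shaped measurements, and both processors are hypothetical, and `next` deliberately shadows the Python builtin to mirror the parameter name in the spec text.

```python
def build_chain(processors, sink):
    """Bind each processor to the one after it; `sink` ends the chain."""
    next_fn = sink
    for processor in reversed(processors):
        next_fn = _bind(processor, next_fn)
    return next_fn


def _bind(processor, next_fn):
    # A separate function so each lambda captures its own processor/next pair.
    return lambda measurement: processor(measurement, next_fn)


def drop_negative(measurement, next):
    # Dropping means simply not invoking `next`.
    if measurement["value"] >= 0:
        next(measurement)


def emit_squared_copy(measurement, next):
    # Invoking `next` twice turns one measurement into two.
    next(measurement)
    next({**measurement, "value": measurement["value"] ** 2, "is_squared": True})


recorded = []
on_measure = build_chain([drop_negative, emit_squared_copy], recorded.append)
on_measure({"value": 3})
on_measure({"value": -1})
print(recorded)  # [{'value': 3}, {'value': 9, 'is_squared': True}]
```

The processors run in registration order, and everything that reaches `sink` is what the instrument-specific aggregation would see.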
That being said, I see some people arguing we shouldn't allow duplication of measurements. I'm keeping the debate open for now.
> I see some people arguing we shouldn't allow duplication of measurements

Can you link to the comments? Are there any reasons why it should be disallowed? If there is no good reason I think that the design should be open to different advanced use cases (e.g. like this for logs: #4407).
`MeasurementProcessors` can be registered directly on SDK `MeterProvider` and they are invoked in the same order as they were registered.
Each processor registered on the `MeterProvider` is part of a pipeline.
Could this be implicit and not something the user has to configure? I'm wondering if we need to expose the DefaultProcessor at all.
A `MeasurementProcessor` MAY freely modify `measurement` for the duration of the `OnMeasure` call.
A `MeasurementProcessor` SHOULD invoke `next`. A `MeasurementProcessor` MAY decide to drop the `Measurement` by not invoking the next processor.
I imagine some possible use cases where users would expect the whole pipeline to run regardless. For example, a self observability processor that counts the number of dropped measurements.
> I imagine some possible use cases where users would expect the whole pipeline to run regardless.

I'm not following. Run regardless of what?
Run regardless of whether the point is dropped.
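One way to picture the self-observability concern under the `next`-based design: a processor placed after a dropping processor never runs for dropped points, so counting drops would need something that wraps the dropping processor instead. The Python sketch below is only an illustration under that assumption; `DropCounter` and `drop_negative` are hypothetical names, and plain numbers stand in for measurements.

```python
class DropCounter:
    """Wraps another processor and counts calls where `next` was never invoked."""

    def __init__(self, inner):
        self.inner = inner
        self.dropped = 0

    def on_measure(self, measurement, next):
        forwarded = False

        def tracking_next(m):
            nonlocal forwarded
            forwarded = True
            next(m)

        self.inner(measurement, tracking_next)
        if not forwarded:
            self.dropped += 1


def drop_negative(measurement, next):
    if measurement >= 0:
        next(measurement)


recorded = []
counter = DropCounter(drop_negative)
for value in (4, -1, 7, -3):
    counter.on_measure(value, recorded.append)
print(counter.dropped, recorded)  # 2 [4, 7]
```

With a return-value design the SDK loop itself could observe drops, which is one of the trade-offs discussed elsewhere in this conversation.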
* `context` - the resolved `Context` (the explicitly passed `Context` or the current `Context`)
* `measurement` - a [Measurement](./api.md#measurement) that was recorded
* `next` - this allows the `MeasurementProcessor` to pass the measurements to the next `MeasurementProcessor` in the chain. It can be a reference to the next `MeasurementProcessor`, a bound callback to invoke `OnMeasure` on the next processor in the chain without an explicit reference to the next processor, or something else. [OpenTelemetry SDK](../overview.md#sdk) authors MAY decide the language idiomatic approach.
May a processor create new measurements by invoking `next()` multiple times? Dumb example:
def on_measure(measurement, next):
    measurement_squared = measurement.copy()
    measurement_squared.attributes["is_squared"] = True
    measurement_squared.value = measurement.value ** 2
    next(measurement)
    next(measurement_squared)
Yes, this is currently possible. I see some conflicting views regarding this. The use cases I wanted to cover do not require it, but it's been raised in this PR before. I'm happy to revisit this.
I think the current design makes sense, but similarly to e.g. logs we should document what a measurement processor can mutate when dealing with measurements. For instance, I think that instrumentation scope and resource should come as arguments to `OnMeasure`, but `next` should accept neither instrumentation scope nor resource. This way the processor would not be able to change the instrumentation scope or resource (similarly to `LogRecordProcessor` and `SpanProcessor`).
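A minimal Python sketch of that shape, assuming hypothetical `Scope` and `Resource` types and dict-shaped measurements: the SDK passes scope and resource into every processor, while `next` accepts only the measurement, so a processor can read them but has no way to hand changed ones downstream.

```python
from typing import NamedTuple


class Scope(NamedTuple):
    """Hypothetical instrumentation scope."""
    name: str


class Resource(NamedTuple):
    """Hypothetical resource."""
    service_name: str


def tag_http_library(measurement, scope, resource, next):
    # Only touch measurements emitted by one (made-up) instrumentation library.
    if scope.name == "example.http.client":
        measurement = {**measurement, "service": resource.service_name}
    next(measurement)  # `next` takes the measurement only


def build_chain(processors, scope, resource, sink):
    next_fn = sink
    for processor in reversed(processors):
        next_fn = _bind(processor, scope, resource, next_fn)
    return next_fn


def _bind(processor, scope, resource, next_fn):
    # The SDK, not the processor, supplies scope and resource at every hop.
    return lambda measurement: processor(measurement, scope, resource, next_fn)


recorded = []
resource = Resource("checkout")
http_chain = build_chain([tag_http_library], Scope("example.http.client"), resource, recorded.append)
db_chain = build_chain([tag_http_library], Scope("example.db.client"), resource, recorded.append)
http_chain({"value": 1})
db_chain({"value": 1})
print(recorded)  # [{'value': 1, 'service': 'checkout'}, {'value': 1}]
```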
Co-authored-by: Aaron Abbott <[email protected]>
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Fixes #4298
This PR adds the `MeasurementProcessor` concept to the Metrics SDK specification. The goal is to allow use cases such as:
Context