[exporter/stefexporter] Add basic STEF exporter implementation #37564
base: main
Conversation
I think this has lots of concurrency issues, and I would suggest simplifying the design, because otherwise this is impossible to get right.
// stefWriter is not safe for concurrent writing, protect it.
s.stefWriterMutex.Lock()
defer s.stefWriterMutex.Unlock()
Doing synchronous requests across cloud regions (since, if I understand correctly, that is what STEF is for) is a questionable design in my opinion. Should we at least have multiple connections that we use at the same time?
I agree with you, this is not a good implementation for a streaming protocol. I will rework it once we discuss and agree on how we want streaming exporters like this to work.
@bogdandrutu I agree, I don't like the design myself. If we can find a simpler way I will be happier. Here are my constraints:

Prefer single gRPC stream

Why? Two reasons:

Sync API of Exporter Helper

The current exporter helper design requires the exportMetrics() call to block synchronously until the sent data is confirmed to be delivered to the destination via ACK messages that the destination sends back on the same gRPC stream. When exportMetrics() returns, the metric data is removed from the queue (and is garbage collected). If we change exportMetrics() to return before data is confirmed to be delivered, without waiting for ACK messages, then there is a chance that the data will be lost if the gRPC connection breaks before the STEF data is actually delivered to the destination.

Furthermore, if exportMetrics() were to return immediately after encoding STEF data, that data most likely would not be written to the gRPC stream at all, since STEF encoders buffer data into fairly large frames before writing them to the gRPC stream. To guarantee that encoded data is sent over the gRPC stream, exportMetrics() has to issue a Flush() call to the STEF encoder. If this is done for every single exportMetrics() call, it can significantly reduce the compression ratio, since there is typically a fixed overhead per STEF frame (Flush() sends the current frame and starts a new one). In my experiments the difference is about 2x worse compression if you Flush() every time (on the datasets I have). This is unacceptable and defeats the purpose of STEF. Note that, as described above, even issuing a Flush() call every time does not guarantee delivery, so this is still not good enough for reliable delivery.

It would be ideal if the exporter helper design decoupled the act of consuming from the queue from the act of deleting from the queue. This would be perfect for asynchronous protocols like STEF. For example, a hypothetical async exporter design could look like this:

func exportMetrics(ctx context.Context, md pdata.Metrics, ack func(id SomeIDType)) (id SomeIDType, err error)

With this API we would implement the STEF exporter's exportMetrics() to return immediately after encoding md into the STEF stream, returning the id of the written record. The STEF exporter would later asynchronously call the ack() func when it receives delivery confirmation from the destination. This would also allow a much, much simpler implementation of the STEF exporter; I would delete 90% of the code that you (and I) don't like.

I have briefly discussed this topic with @dmitryax, but I think this is a much bigger effort and for now we have to work with the exporter helper API we have. If you have any thoughts on how to simplify the design within the current constraints, or if you think there is a better way to handle asynchronous sending, please tell.
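As a rough illustration of the hypothetical async API described above (not code from this PR): the SomeIDType, pendingAcks, and onAck names are assumptions made here for the sketch, and pmetric.Metrics is used in place of the older pdata.Metrics name.

// Hypothetical async exporter API sketch. SomeIDType, pendingAcks and onAck
// are illustrative names, not part of the actual exporterhelper package.
package stefexporter

import (
	"context"
	"sync"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

// SomeIDType identifies a record written to the STEF stream.
type SomeIDType uint64

type asyncExporter struct {
	mu          sync.Mutex
	nextID      SomeIDType
	pendingAcks map[SomeIDType]func(SomeIDType)
}

// exportMetrics encodes md into the STEF stream and returns immediately with
// the id of the written record; ack is invoked later, when the destination
// confirms delivery on the same gRPC stream.
func (e *asyncExporter) exportMetrics(
	ctx context.Context, md pmetric.Metrics, ack func(id SomeIDType),
) (SomeIDType, error) {
	e.mu.Lock()
	defer e.mu.Unlock()

	// Encode md into the STEF writer here (omitted in this sketch).

	e.nextID++
	id := e.nextID
	e.pendingAcks[id] = ack // remember the callback until the ACK arrives
	return id, nil
}

// onAck is called by the stream reader goroutine when the destination
// acknowledges all records up to and including ackID.
func (e *asyncExporter) onAck(ackID SomeIDType) {
	e.mu.Lock()
	defer e.mu.Unlock()
	for id, ack := range e.pendingAcks {
		if id <= ackID {
			ack(id)
			delete(e.pendingAcks, id)
		}
	}
}

With such an API the exporter helper, not the exporter, would own the decision of when a queue entry may be deleted, which is what makes the STEF exporter itself so much smaller.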
@tigrannajaryan The OTel-Arrow exporter supports a limited number of streams, and I would agree, the concurrency wasn't easy to get right. In the experiment (https://opentelemetry.io/blog/2024/otel-arrow-production/), we found that a single stream leads to problems associated with high latency: there's a point at which your batches are large enough to get the compression benefit you want, and past that point it's better to add a stream. I've suggested it once, now twice: I think it will be worth the time and energy to generalize the otelarrow codebase to let it support multiple codecs for large-frame compression protocols like STEF and OTAP, so that most of the code between the two is shared. I suspect at least 90% of the exporter/receiver codebase is not directly concerned with the OTAP representation, because most of the challenge is handling gRPC streams and cancellation.
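For illustration only, a minimal sketch of the multi-stream idea: spread concurrent exports over a small fixed pool of streams instead of serializing on one. The streamPool and stefStream names are assumptions for this sketch, not the otelarrow or STEF API.

// Hypothetical sketch of round-robin export over a small pool of gRPC
// streams; streamPool and stefStream are illustrative names only.
package stefexporter

import (
	"context"
	"sync/atomic"
)

type stefStream interface {
	// Export blocks until the data written on this stream is acknowledged.
	Export(ctx context.Context, payload []byte) error
}

type streamPool struct {
	streams []stefStream
	next    atomic.Uint64
}

// pick returns the next stream in round-robin order, so concurrent export
// calls do not all wait on a single stream's ACK latency.
func (p *streamPool) pick() stefStream {
	n := p.next.Add(1)
	return p.streams[int(n)%len(p.streams)]
}

func (p *streamPool) export(ctx context.Context, payload []byte) error {
	return p.pick().Export(ctx, payload)
}

The trade-off is the one described above: each stream maintains its own encoder state, so adding streams costs some compression ratio in exchange for lower latency.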
@jmacd I agree, it would be great to have a general solution for streaming exporters. I will see if I can find time to look at the OTAP codebase.
Force-pushed from dcbdb78 to a3eaeb8
@bogdandrutu I simplified the implementation to a completely basic one, eliminating a significant portion of the concurrency control, so it should now be easier to reason about. Please take another look. This implementation is extremely basic and is not what I would like to see in the production version. A proper version would not block waiting for acks. I think we should discuss how exactly we want exporters like this to be written, and perhaps have generic helpers for these use cases, as @jmacd suggested. Since that's a longer story, I think it is worth having the basic implementation as a reference and improving it after we decide on the direction.
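A rough sketch of what such a blocking implementation looks like, to make the trade-off concrete. The field and method names (encodeAndFlush, ackCond, lastAckedID) are assumptions made for this illustration, not the PR's actual code.

// Illustrative sketch of a blocking exportMetrics: encode, flush, then wait
// until the destination's ACK covers the record we just wrote. Names here
// are assumptions, not the PR's actual code.
package stefexporter

import (
	"context"
	"sync"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

type stefExporterSketch struct {
	stefWriterMutex sync.Mutex
	ackCond         *sync.Cond // signaled by the ACK reader goroutine
	lastAckedID     uint64     // highest record id the destination confirmed
}

// encodeAndFlush converts md to STEF records and forces the frame onto the
// gRPC stream, returning the id of the last written record (body omitted).
func (s *stefExporterSketch) encodeAndFlush(md pmetric.Metrics) (uint64, error) {
	// ... write records via the STEF writer, then Flush() ...
	return 0, nil
}

// exportMetrics blocks until the destination acknowledges the written data,
// which is what the current synchronous exporter helper API requires.
func (s *stefExporterSketch) exportMetrics(ctx context.Context, md pmetric.Metrics) error {
	// The STEF writer is not safe for concurrent writing, protect it.
	s.stefWriterMutex.Lock()
	sentID, err := s.encodeAndFlush(md)
	s.stefWriterMutex.Unlock()
	if err != nil {
		return err
	}

	// Wait until lastAckedID advances past sentID; only then may the
	// exporter helper drop this batch from its queue.
	s.ackCond.L.Lock()
	defer s.ackCond.L.Unlock()
	for s.lastAckedID < sentID {
		s.ackCond.Wait()
	}
	return nil
}

Flushing on every export, as noted earlier in the thread, is what costs roughly 2x in compression ratio, which is why this shape is acceptable only as a reference implementation.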
Force-pushed from 637de4b to 62bcc5a
Force-pushed from 62bcc5a to d4cc29f
Force-pushed from d4cc29f to 5b3c9aa
Description
Added a STEF exporter implementation for metrics, sending data over a gRPC stream. For now only the queuing and retry exporter helpers are used. We will need to decide later whether other helpers are needed for this exporter.
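For orientation, a minimal sketch of how a pusher is typically wired into the collector's queuing and retry helpers. The exact helper names (NewMetrics vs. NewMetricsExporter, QueueConfig vs. QueueSettings) have changed across collector versions, so treat the specific identifiers below as assumptions rather than the PR's actual factory code.

// Hypothetical wiring of a STEF pusher into the standard exporter helpers.
package stefexporter

import (
	"context"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/config/configretry"
	"go.opentelemetry.io/collector/exporter"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// newMetricsExporter wraps pushMetrics (which blocks until the destination
// ACKs the data) with the queuing and retry helpers mentioned above.
func newMetricsExporter(
	ctx context.Context,
	set exporter.Settings,
	cfg component.Config,
	pushMetrics func(context.Context, pmetric.Metrics) error,
	queueCfg exporterhelper.QueueConfig,
	retryCfg configretry.BackOffConfig,
) (exporter.Metrics, error) {
	return exporterhelper.NewMetrics(
		ctx, set, cfg,
		pushMetrics,
		exporterhelper.WithQueue(queueCfg), // buffer batches until the stream confirms them
		exporterhelper.WithRetry(retryCfg), // retry on transient stream failures
	)
}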
Testing
Unit tests that verify connecting, reconnecting, sending, and acking of data are included.
Documentation
Added to README.
Future Work
More extensive test coverage is desirable and will likely be added in the future.
We likely want to implement a STEF receiver and add STEF as a tested protocol to our testbed.