docs: Add ADR dir and error handling ADR #2664

Open
wants to merge 7 commits into
base: main


Contributor

@scottgerring commented Feb 14, 2025

Fixes #2571

Capturing decision record for error handling in repo with new docs.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@scottgerring marked this pull request as ready for review February 14, 2025 09:59
@scottgerring requested a review from a team as a code owner February 14, 2025 09:59

codecov bot commented Feb 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.3%. Comparing base (29eda05) to head (651cc41).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2664   +/-   ##
=====================================
  Coverage   79.3%   79.3%           
=====================================
  Files        123     123           
  Lines      22654   22654           
=====================================
  Hits       17970   17970           
  Misses      4684    4684           


@scottgerring
Contributor Author

@cijothomas re-wrote a bunch to reflect discussion and current state!


### When to box custom errors

Note above that we do not box anything into `InternalFailure`. Our rule here is that if the caller cannot reasonably be expected to handle a particular error variant, we will use a simplified interface that returns only a descriptive string. In the concrete example we are using with the exporters, we have a [strong signal in the opentelemetry-specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#export) that indicates concretely that the error types are not actionable by the caller.
Member

Love this phrasing!

@scottgerring
Contributor Author

@cijothomas is this good to merge? It'd be great to have a concrete example in place so we can start to follow the pattern - for instance for the tracing interop Björn is working on

@@ -0,0 +1,5 @@
# Architectural Decision Records

This directory contains architectural decision records (ADRs) made for the opentelemetry-rust project. These allow us to consolidate discussion, options, and outcomes around key architectural decisions. You can read more about ADRs [here](https://adr.github.io/).
Member

I'd avoid links to adr.github.io or similar sites. We can simply document the designs here, without necessarily adhering to any particular version of the ADR format.


Note above that we do not box anything into `InternalFailure`. Our rule here is that if the caller cannot reasonably be expected to handle a particular error variant, we will use a simplified interface that returns only a descriptive string. In the concrete example we are using with the exporters, we have a [strong signal in the opentelemetry-specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#export) that indicates concretely that the error types are not actionable by the caller.

If the caller may potentially recover from an error, we will follow [Canonical's Rust best practices](https://canonical.github.io/rust-best-practices/error-and-panic-discipline.html) and instead preserve the nested error.
Member

I'd suggest removing the link to the Canonical guide. It is not clear whether we'll always follow it.


This approach generalises across both **signals** and **trait methods**. For example, returning to our exporter traits, we have a trait that looks the same for each signal, with the same three methods. Upon closer inspection ([#2600](https://github.com/open-telemetry/opentelemetry-rust/issues/2600)), the potential error set is the same both between the methods *and* between the signals; this means we can use a single shared error type across both axes:

```rust
Member

Let's put this under a header ("Example" or something similar), so it's easy to link to from other discussions.


## Considered Options

**Option 1: Continue as is**
Member

I still suggest listing only the decision here to avoid a long read. At the end of the doc, we can add a considered-alternatives section and move this there.


## Accepted Option

**Option 3**
Member

I suggest making it very easy for a future reader to find the design guidance without having to scan through the rest of the doc, i.e., let's put the chosen design right here, with the reason it was chosen just below it.

Everything else can be moved to the bottom of the doc. https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/log/DESIGN.md#rejected-alternatives has something like this.

Our preference for error types is thus:

1. A consolidated error type that covers all methods of a particular "trait type" (e.g., signal export)
1. Separate error types per method of a particular trait type (e.g., `SdkShutdownResult`, `SdkExportResult`), but only _if the error types need to diverge_
Member

The final outcome is that there is no separate Export vs. Shutdown result.

pub trait LogExporter {
fn export(...) -> OTelSdkResult;
fn shutdown(...) -> OTelSdkResult;
fn force_flush(...) -> OTelSdkResult;
Member

nit - indentation

pub trait SpanExporter {
fn export(...) -> OTelSdkResult;
fn shutdown(...) -> OTelSdkResult;
fn force_flush(...) -> OTelSdkResult;
Member

nit - indentation
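
For reference, here is a minimal sketch of the outcome discussed in this thread: both exporter traits return one shared result type from all three methods. It is an illustration under stated assumptions, not the SDK's actual definitions. The parameter lists are elided in the diff, so `LogBatch`, `SpanBatch`, and `NoopLogExporter` are hypothetical stand-ins, and the `AlreadyShutdown` and `Timeout` variants are assumed (only `InternalFailure` is named in the ADR text quoted here).

```rust
use std::fmt;
use std::time::Duration;

/// Local stand-in for the shared SDK error. Only `InternalFailure` appears
/// in the quoted ADR text; the other two variants are assumptions.
#[derive(Debug)]
pub enum OTelSdkError {
    AlreadyShutdown,
    Timeout(Duration),
    InternalFailure(String),
}

impl fmt::Display for OTelSdkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OTelSdkError::AlreadyShutdown => write!(f, "already shut down"),
            OTelSdkError::Timeout(d) => write!(f, "timed out after {:?}", d),
            OTelSdkError::InternalFailure(msg) => write!(f, "internal failure: {}", msg),
        }
    }
}

impl std::error::Error for OTelSdkError {}

/// One result type shared across methods *and* across signals.
pub type OTelSdkResult = Result<(), OTelSdkError>;

/// Hypothetical payload types; the real signatures are elided in the diff.
pub struct LogBatch(pub Vec<String>);
pub struct SpanBatch(pub Vec<String>);

pub trait LogExporter {
    fn export(&self, batch: LogBatch) -> OTelSdkResult;
    fn shutdown(&mut self) -> OTelSdkResult;
    fn force_flush(&mut self) -> OTelSdkResult;
}

pub trait SpanExporter {
    fn export(&self, batch: SpanBatch) -> OTelSdkResult;
    fn shutdown(&mut self) -> OTelSdkResult;
    fn force_flush(&mut self) -> OTelSdkResult;
}

/// Hypothetical exporter that discards everything; it exists only to show
/// the shared result type in use.
#[derive(Default)]
pub struct NoopLogExporter {
    is_shutdown: bool,
}

impl LogExporter for NoopLogExporter {
    fn export(&self, _batch: LogBatch) -> OTelSdkResult {
        if self.is_shutdown {
            return Err(OTelSdkError::AlreadyShutdown);
        }
        Ok(())
    }

    fn shutdown(&mut self) -> OTelSdkResult {
        if self.is_shutdown {
            return Err(OTelSdkError::AlreadyShutdown);
        }
        self.is_shutdown = true;
        Ok(())
    }

    fn force_flush(&mut self) -> OTelSdkResult {
        Ok(())
    }
}
```

Because every method returns the same type, a caller can handle export, flush, and shutdown failures with a single `match` rather than one per method.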


Note above that we do not box anything into `InternalFailure`. Our rule here is that if the caller cannot reasonably be expected to handle a particular error variant, we will use a simplified interface that returns only a descriptive string. In the concrete example we are using with the exporters, we have a [strong signal in the opentelemetry-specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#export) that indicates concretely that the error types are not actionable by the caller.

If the caller may potentially recover from an error, we will follow [canonical's rust best practices](https://canonical.github.io/rust-best-practices/error-and-panic-discipline.html) and instead preserve the nested error.
Member

Just to be clear: if the internal-error variant is defined as `InternalFailure(String)`, it strictly holds a string, and no boxing is possible. It would be good to clarify that, or to give an example of a variant that does box.
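
To make the distinction concrete, the sketch below contrasts the two cases. `ExportError::InternalFailure(String)` flattens the cause into a message, so nothing is boxed and `source()` returns `None`. The hypothetical `ConfigError::InvalidPort` variant boxes the nested error so a caller can inspect it. The names `ExportError`, `ConfigError`, `parse_port`, and `flatten` are assumptions invented for this illustration and are not part of the SDK.

```rust
use std::error::Error;
use std::fmt;

/// Caller cannot act on the cause: keep only a descriptive string.
#[derive(Debug)]
pub enum ExportError {
    InternalFailure(String),
}

impl fmt::Display for ExportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExportError::InternalFailure(msg) => write!(f, "internal failure: {}", msg),
        }
    }
}

// The default `source()` returns `None`: the cause was flattened away.
impl Error for ExportError {}

/// Hypothetical error for an API where the caller may recover:
/// the nested error is boxed and preserved.
#[derive(Debug)]
pub enum ConfigError {
    InvalidPort {
        input: String,
        source: Box<dyn Error + Send + Sync + 'static>,
    },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::InvalidPort { input, .. } => write!(f, "invalid port {:?}", input),
        }
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ConfigError::InvalidPort { source, .. } => {
                // Drop the `Send + Sync` bounds to match the trait signature.
                let inner: &(dyn Error + 'static) = &**source;
                Some(inner)
            }
        }
    }
}

/// Preserves the underlying parse error inside the boxed `source` field.
pub fn parse_port(input: &str) -> Result<u16, ConfigError> {
    input.parse::<u16>().map_err(|e| ConfigError::InvalidPort {
        input: input.to_string(),
        source: Box::new(e),
    })
}

/// Flattens any error into the string-only variant.
pub fn flatten(cause: &dyn Error) -> ExportError {
    ExportError::InternalFailure(format!("configuration error: {}", cause))
}
```

A caller of `parse_port` can reach the original `ParseIntError` through `source()` and `downcast_ref`, whereas a caller holding an `ExportError` only ever sees the message.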

Successfully merging this pull request may close these issues.

Error handling ADR
3 participants