Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/httpforwarder] Shutdown should wait server exit #37735

Merged

Conversation

pjanotti
Copy link
Contributor

@pjanotti pjanotti commented Feb 6, 2025

Description

Ensures that the Shutdown waits for the server to release the listen port and finish the corresponding go routine.

Although a bug on the component it doesn't seem to deserve a changelog since it was detected via tests and there is no issue around the shutdown of the component.

Link to tracking issue

Fixes #37716

Testing

Repro the issue locally and validated no repro with the changes.

Documentation

N/A

@pjanotti pjanotti requested review from atoulme and a team as code owners February 6, 2025 01:53
@pjanotti pjanotti added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Feb 6, 2025
Comment on lines +67 to +69
err := h.server.Close()
h.shutdownWG.Wait()
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using Shutdown ?

Copy link
Contributor Author

@pjanotti pjanotti Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of Shutdown vs Close is orthogonal to the issue being fixed in this PR: deterministic release of the port used by the component. In both cases there is still the need to wait the call to Serve(listener) to return.

I tend to prefer Close since, in certain cases Shutdown can take much longer given that it needs to wait existing connections to finish gracefully. That said the component owners will have more context if they want to change the current behavior in that regard.

Copy link
Member

@bogdandrutu bogdandrutu Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may miss something:

// Shutdown gracefully shuts down the server without interrupting any
// active connections. Shutdown works by first closing all open
// listeners, then closing all idle connections, and then waiting
// indefinitely for connections to return to idle and then shut down.
// If the provided context expires before the shutdown is complete,
// Shutdown returns the context's error, otherwise it returns any
// error returned from closing the [Server]'s underlying Listener(s).

A.k.a because it returns the error from the listener that function that you wait for it must be done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take into account the possible interleaves of the calls to Serve and Shutdown/Close. The ownership of listener is transferred to the http.Server when the call to Serve(listener) happens. However, since that happens in a goroutine it is possible that Shutdown is called before the http.Server took the ownership of listener. In this case the listener is not closed until the call to Serve happens. Serve adds a deferred call to listener.Close, even if Shutdown was already called, which ensures that the port is released by the time the method returns.

While this may be rather improbable in practice, it is better for the component to behave deterministically with the shutdown ensuring no left over goroutine and/or port.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bogdandrutu PTAL at reply above

Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a solution tofor the flaky test. The Close vs Shutdown discussion can be solved in a separate PR if needed

@dmitryax dmitryax merged commit 408f780 into open-telemetry:main Feb 20, 2025
162 checks passed
@github-actions github-actions bot added this to the next release milestone Feb 20, 2025
yiquanzhou added a commit to dash0hq/opentelemetry-collector-contrib that referenced this pull request Feb 24, 2025
* main: (55 commits)
  [chore] Update core dependencies (open-telemetry#38124)
  Add kafka topics observer implementation (open-telemetry#38060)
  [exporter/splunk_hec] Mute errors from draining the response body (open-telemetry#38118)
  [chore] [exporter/splunk_hec] Remove dead code (open-telemetry#38113)
  Add support for JUnit test results (open-telemetry#37941)
  [chore] amend changelog for prometheus receiver change (open-telemetry#38109)
  [chore] Fix dead links in issue-triaging.md (open-telemetry#38105)
  [chore] fix deprecation (open-telemetry#38107)
  [exporter/coralogix] Add new batch options to Coralogix exporter (open-telemetry#38082)
  [chore][exporter/datadog] fix integration test (open-telemetry#38091)
  [chore] Update otel to unblock contrib test in core repo (open-telemetry#38100)
  [chore] Bump go-version match to 1.23 (open-telemetry#38099)
  [exporter/elasticsearch] Add _metric_names_hash to avoid metric rejections (open-telemetry#37511)
  elasticsearchexporter: refactor encoding; drop metrics support from raw/none/bodymap mapping modes (open-telemetry#37928)
  [exporter/stefexporter] Fix incorrectly implemented STEF exporter zstd compression option (open-telemetry#38089)
  [exporter/clickhouse] Add client info for identifying exporter in `system.query_log` (open-telemetry#37146)
  [chore] Prepare release 0.120.1 (open-telemetry#38055)
  [extension/httpforwarder] Shutdown should wait server exit (open-telemetry#37735)
  receiver/prometheusremotewrite: Add two fields timestamp and value. (open-telemetry#37895)
  [reciver/sqlqueryreceiver] Add support for SapASE (sybase) (open-telemetry#37773)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension/httpforwarder Skip Changelog PRs that do not require a CHANGELOG.md entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[flaky test][exporter/httpforwarder] TestComponentLifecycle is flaky
5 participants