Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stale host / port bug for failure detector #15116

Merged
merged 1 commit into from
Feb 25, 2025

Conversation

yashmayya
Copy link
Collaborator

  • The query service port for a server (used for MSE queries) can change across server restarts.
  • There is currently an edge case scenario with the failure detector mechanism (See Rewrite FailureDetector interface and implementations to also work with the multi-stage engine #15005, Add connection based FailureDetector #8491 for background) in a scenario where there are MSE queries pre server restart that have established gRPC channels to the server host / port, and there aren't any MSE queries post server restart. Here, if an SSE query failure triggers the failure detector mechanism to mark the server as unhealthy, the server won't be marked as healthy until the retry period ends or if an MSE query is sent and a new gRPC channel is established to the server's new query service port.
  • This patch fixes the issue to not use stale host / port information for the failure detector's retry mechanism.

@codecov-commenter
Copy link

codecov-commenter commented Feb 24, 2025

Codecov Report

Attention: Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.

Project coverage is 63.45%. Comparing base (59551e4) to head (abaf5b3).
Report is 1753 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/query/service/dispatch/QueryDispatcher.java 0.00% 8 Missing ⚠️
...roker/requesthandler/GrpcBrokerRequestHandler.java 0.00% 3 Missing ⚠️
...requesthandler/MultiStageBrokerRequestHandler.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15116      +/-   ##
============================================
+ Coverage     61.75%   63.45%   +1.70%     
- Complexity      207     1480    +1273     
============================================
  Files          2436     2748     +312     
  Lines        133233   154441   +21208     
  Branches      20636    23816    +3180     
============================================
+ Hits          82274    98000   +15726     
- Misses        44911    49045    +4134     
- Partials       6048     7396    +1348     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.41% <0.00%> (+1.70%) ⬆️
java-21 63.34% <0.00%> (+1.72%) ⬆️
skip-bytebuffers-false 63.44% <0.00%> (+1.69%) ⬆️
skip-bytebuffers-true 63.31% <0.00%> (+35.58%) ⬆️
temurin 63.45% <0.00%> (+1.70%) ⬆️
unittests 63.45% <0.00%> (+1.70%) ⬆️
unittests1 56.06% <0.00%> (+9.17%) ⬆️
unittests2 33.98% <0.00%> (+6.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yashmayya yashmayya requested a review from gortiz February 25, 2025 05:12
@yashmayya yashmayya merged commit fffa736 into apache:master Feb 25, 2025
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants