Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for retry cancellation #2456

Merged
merged 4 commits into from
Feb 4, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 11 additions & 8 deletions src/Polly.Core/Retry/RetryResilienceStrategy.cs
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ protected internal override async ValueTask<Outcome<T>> ExecuteCore<TState>(Func
TelemetryUtil.ReportExecutionAttempt(_telemetry, context, outcome, attempt, executionTime, handle);
}

if (context.CancellationToken.IsCancellationRequested || isLastAttempt || !handle)
if (isLastAttempt || !handle)
{
return outcome;
}
Expand Down Expand Up @@ -100,17 +100,20 @@ protected internal override async ValueTask<Outcome<T>> ExecuteCore<TState>(Func
await DisposeHelper.TryDisposeSafeAsync(resultValue, context.IsSynchronous).ConfigureAwait(context.ContinueOnCapturedContext);
}

// stryker disable once all : no means to test this
if (delay > TimeSpan.Zero)
try
{
try
context.CancellationToken.ThrowIfCancellationRequested();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the ThrowIfCancellationRequested here might be too late. The OnRetry telemetry event is reported and the OnRetry callback is already executed. But there will be no new retry attempt if cancellation is requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this. The event may be named "on retry," but the actual meaning is more like "on handled by the retry policy." Currently, any outcome that is not returned to the caller is considered "handled" and triggers this event. Users don't want to wait for the next attempt (which may or may not happen). We want to know the strategy was triggered since this tells us the callback completed and why the outcome was not returned.

Cancellation in .NET is cooperative. It's normal for code to finish doing certain important tasks until it comes to a better "stopping place" to acknowledge the cancellation. Logging/telemetry for what has just happened I think falls into this category. I would be open to skipping the "delay" calculation and logging a zero if cancellation is triggered, but I'm also not convinced this adds much value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now the strategy works like this on high level for a retry attempt (happy path):

  • user provided callback is executed
  • telemetry event is reported
  • delay is calculated
  • onretry is executed
  • delay is waited

The execution can be stopped due to the following circumstances: the outcome is not handled by the strategy, the attempts are exhausted. From one side treating the cancellation in a different way feels a bit odd. But I agree that if the user provided callback executed then the telemetry and OnRetry hook should be performed as well because they allow the consumers to get insights what happened.

The OnRetryArguments serves multiple purposes. It tells about the past (outcome, duration, etc.) but also shares some information about the future (delay). You can access the information whether the cancellation was requested via the context (context.CancellationToken.IsCancellationRequested) but since it is not a top-level field I have doubts that anyone has ever checked it. IMHO making this information as a top-level field would make the 0/-1 delay more meaningful by providing contextual information.

IMHO the best would be to have something like this:

flowchart TD
    Args[OnRetryAgruments]
    Current[CurrentAttempt]
    Next[NextAttempt]

    Args --> Current
    Args --> Next

    Current --> Duration
    Current --> Outcome
    Current --> A[etc.]

    Next --> IsCancelled
    Next --> Delay
    Next --> b[etc.]
Loading

Maybe in V9 😛


// stryker disable once all : no means to test this
if (delay > TimeSpan.Zero)
{
await _timeProvider.DelayAsync(delay, context).ConfigureAwait(context.ContinueOnCapturedContext);
}
catch (OperationCanceledException e)
{
return Outcome.FromException<T>(e);
}

}
catch (OperationCanceledException e)
{
return Outcome.FromException<T>(e);
}

if (incrementAttempts)
Expand Down
131 changes: 111 additions & 20 deletions test/Polly.Core.Tests/Retry/RetryResilienceStrategyTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -31,39 +31,130 @@ public void ExecuteAsync_EnsureResultNotDisposed()
}

[Fact]
public async Task ExecuteAsync_CancellationRequested_EnsureNotRetried()
public async Task ExecuteAsync_CanceledBeforeExecution_EnsureNotExecuted()
{
SetupNoDelay();
var sut = CreateSut();
using var cts = new CancellationTokenSource();
cts.Cancel();
var context = ResilienceContextPool.Shared.Get(cts.Token);
var executed = false;

var result = await sut.ExecuteOutcomeAsync((_, _) => { executed = true; return Outcome.FromResultAsValueTask("dummy"); }, context, "state");
result.Exception.ShouldBeOfType<OperationCanceledException>();
var result = await sut.ExecuteOutcomeAsync(
(_, _) =>
{
executed = true;
return Outcome.FromResultAsValueTask(new object());
},
ResilienceContextPool.Shared.Get(new CancellationToken(canceled: true)),
default(object));

result.Exception.ShouldBeAssignableTo<OperationCanceledException>();
executed.ShouldBeFalse();
}

[Fact]
public async Task ExecuteAsync_CancellationRequestedAfterCallback_EnsureNotRetried()
public async Task ExecuteAsync_CanceledDuringExecution_EnsureResultReturned()
{
using var cts = new CancellationTokenSource();
var sut = CreateSut();
using var cancellation = new CancellationTokenSource();
var executions = 0;

var result = await sut.ExecuteOutcomeAsync(
(_, _) =>
{
executions++;
cancellation.Cancel();
return Outcome.FromResultAsValueTask(new object());
},
ResilienceContextPool.Shared.Get(cancellation.Token),
default(object));

result.Exception.ShouldBeNull();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assert that the result from the delegate is indeed returned?

executions.ShouldBe(1);
}

[Fact]
public async Task ExecuteAsync_CanceledDuringExecution_EnsureNotExecutedAgain()
{
var reported = false;

_options.ShouldHandle = _ => PredicateResult.True();
_options.OnRetry = _ =>
{
cts.Cancel();
return default;
};
_options.OnRetry =
args =>
{
reported = true;
return default;
};

var sut = CreateSut(TimeProvider.System);
var context = ResilienceContextPool.Shared.Get(cts.Token);
var executed = false;
var sut = CreateSut();
using var cancellation = new CancellationTokenSource();
var executions = 0;

var result = await sut.ExecuteOutcomeAsync(
(_, _) =>
{
executions++;
cancellation.Cancel();
return Outcome.FromResultAsValueTask(new object());
},
ResilienceContextPool.Shared.Get(cancellation.Token),
default(object));

result.Exception.ShouldBeAssignableTo<OperationCanceledException>();
executions.ShouldBe(1);
reported.ShouldBeTrue();
}

[Fact]
public async Task ExecuteAsync_CanceledAfterExecution_EnsureNotExecutedAgain()
{
using var cancellation = new CancellationTokenSource();

_options.ShouldHandle = _ => PredicateResult.True();
_options.OnRetry =
args =>
{
cancellation.Cancel();
return default;
};

var sut = CreateSut();
var executions = 0;

var result = await sut.ExecuteOutcomeAsync(
(_, _) =>
{
executions++;
return Outcome.FromResultAsValueTask(new object());
},
ResilienceContextPool.Shared.Get(cancellation.Token),
default(object));

result.Exception.ShouldBeAssignableTo<OperationCanceledException>();
executions.ShouldBe(1);
}

[Fact]
public async Task ExecuteAsync_CanceledDuringDelay_EnsureNotExecutedAgain()
{
_options.ShouldHandle = _ => PredicateResult.True();

using var cancellation = _timeProvider.CreateCancellationTokenSource(_options.Delay);

var sut = CreateSut();
var executions = 0;

var resultTask = sut.ExecuteOutcomeAsync(
(_, _) =>
{
executions++;
return Outcome.FromResultAsValueTask(new object());
},
ResilienceContextPool.Shared.Get(cancellation.Token),
default(object));

_timeProvider.Advance(_options.Delay);
var result = await resultTask;

var result = await sut.ExecuteOutcomeAsync((_, _) => { executed = true; return Outcome.FromResultAsValueTask("dummy"); }, context, "state");
result.Exception.ShouldBeOfType<OperationCanceledException>();
executed.ShouldBeTrue();
result.Exception.ShouldBeAssignableTo<OperationCanceledException>();
executions.ShouldBe(1);
}

[Fact]
Expand Down