S3 crt client timeouts and retry strategy are not working. #2971
Comments
CRT retries and timeouts work differently from the regular C++ SDK config options, and the CRT S3 client currently does not honor them. We have a backlog feature request, #2594, to improve the situation. As an alternative, you can configure lowSpeedLimit to tell CRT to kill requests that are too slow. |
Hi @DmitriyMusatkin, I tried setting lowSpeedLimit to 75 MBps, which should kill requests running below 75 MBps, and ran the test.
|
Hi, |
The way the low speed limit is configured in the SDK is to kill the connection if throughput dips under the specified number for a given number of intervals (3 by default, otherwise derived from the request timeout) - https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp#L330. It is hard to tell exactly what's going on, and we might need trace-level logs to debug it further. Note: CRT already breaks up GET requests into part requests based on the configured part size, and your code does the same thing, so there might be some weird interaction between the two going on. |
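The low speed rule described above can be sketched as a small standalone function. This is an illustration only, not the SDK's or CRT's code: `FindKillInterval` is a hypothetical name, and the sketch assumes that throughput is sampled once per interval and that the slow intervals must be consecutive, which the comment above does not state explicitly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the AWS SDK or CRT.
// bytesPerInterval[i] is the number of bytes received during interval i.
// Returns the index of the interval at which the connection would be killed,
// or -1 if it is never killed.
// Assumption: the connection is killed once throughput stays below
// lowSpeedLimit for maxSlowIntervals consecutive intervals.
int FindKillInterval(const std::vector<std::uint64_t>& bytesPerInterval,
                     std::uint64_t lowSpeedLimit,
                     int maxSlowIntervals = 3)
{
    int consecutiveSlow = 0;
    for (std::size_t i = 0; i < bytesPerInterval.size(); ++i) {
        if (bytesPerInterval[i] < lowSpeedLimit) {
            ++consecutiveSlow;
            if (consecutiveSlow >= maxSlowIntervals) {
                return static_cast<int>(i);
            }
        } else {
            consecutiveSlow = 0;  // a fast interval resets the count
        }
    }
    return -1;
}
```

Under these assumptions, a request that finishes before the third slow interval is never killed, however slow it was while it ran.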
Hi @DmitriyMusatkin, I'm attaching the trace logs for this. Can you please check them? |
Bad link? It just points to this issue. |
Sorry @DmitriyMusatkin, here it is: aws_sdk_2024-06-04-11.log |
The log you shared covers less than 2 seconds.
You can see that the tasks that monitor throughput were all canceled within 1 second, while they are scheduled to run after 1 second (referring to here). Those tasks were canceled because the request had completed.
As @DmitriyMusatkin said, the SDK will only kill the connection if it stays slow for a certain time. |
Hi @TingDaoK @DmitriyMusatkin,
https://getshared.com/SvtM9D7w Can you please explain why it takes more time than the configured lowSpeedLimit and target throughput allow? When will the SDK kill the connection, and what is the exact threshold for considering it slow? |
Hi, |
We reviewed the logs.
What seems to happen in your case is:
|
Hi @DmitriyMusatkin, thanks for the detailed analysis.
Attaching the download link for the trace logs for issue 2: LINK. Can you please check this? |
Hi, |
|
@DmitriyMusatkin : Could you please point me to the place in the code where partSize is being used for parallel requests? So far I couldn't find a place where this value is being used as you described and I'm trying to understand how the parallelization works (e.g. by dynamically spawning more threads or by using async IO on the socket or ...). |
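For orientation, here is a minimal sketch of what splitting a GET into part requests by part size amounts to in terms of byte ranges. It does not show where the CRT does this or how it schedules the part requests; `BuildPartRanges` is a hypothetical helper, and the 8 MiB part size in the usage note is an arbitrary illustrative value.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not the CRT's implementation.
// Splits an object of objectSize bytes into HTTP Range header values,
// one per part of at most partSize bytes.
std::vector<std::string> BuildPartRanges(std::uint64_t objectSize,
                                         std::uint64_t partSize)
{
    std::vector<std::string> ranges;
    if (partSize == 0) {
        return ranges;  // avoid an infinite loop on a zero part size
    }
    for (std::uint64_t start = 0; start < objectSize; start += partSize) {
        // Range ends are inclusive, so the last byte of a part is end - 1.
        const std::uint64_t end = std::min(start + partSize, objectSize) - 1;
        ranges.push_back("bytes=" + std::to_string(start) + "-" +
                         std::to_string(end));
    }
    return ranges;
}
```

With a 25 MiB read and an 8 MiB part size, this yields four ranges: three full parts and one 1 MiB remainder. If the caller also splits its reads into ranges, each of those reads may in turn be split again.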
Describe the bug
We set the S3 CRT client configs mentioned below.
We are reading 25 MB at a time and measuring how long each GetObject call takes.
Each GetObject call takes 500 ms on average.
Why does the S3 CRT client take longer even though we set all the timeouts?
We even tried with no retry strategy, but it made no difference and the behavior is the same.
GetObject() does not exit even after the timeout is reached. Once we saw GetObject() take 10 seconds for 25 MB.
Basically, the behavior is the same even if we don't set timeouts or assign a custom retry strategy.
We are running this on a c5a.4xlarge instance in the ap-south-1 region, and the S3 bucket is in the same region and account.
Please tell us what's wrong with this and suggest how to reduce the time taken by GetObject().
Expected Behavior
With zero retries, GetObject should exit once the timeout is reached.
Current Behavior
With zero retries, GetObject does not exit even after the timeout is reached.
Reproduction Steps
zero retries
default strategy
Possible Solution
No response
Additional Information/Context
No response
AWS CPP SDK version used
1.11.269
Compiler and Version used
gcc (GCC) 4.8.5
Operating System and version
CentOS Linux and version 7