
Data integrity when uploading objects using BlockingInputStreamAsyncRequestBody #5957

Open
1 task
sne11ius opened this issue Mar 13, 2025 · 2 comments
Labels
bug This issue is a bug. investigating This issue is being investigated and/or work is in progress to resolve the issue. p1 This is a high priority issue

Comments

@sne11ius

sne11ius commented Mar 13, 2025

Describe the bug

This issue reproduces a bug in the AWS SDK for Java (S3 Async Client) where data integrity is not preserved when uploading large objects (e.g., 50 MB) using BlockingInputStreamAsyncRequestBody. The SHA-256 digest of the uploaded data does not match the digest of the downloaded data, indicating corruption or mishandling somewhere in the upload/download path.

If I reduce the stream size (e.g., to 10 MB), the test succeeds.

import static java.util.Base64.getEncoder;
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.AwsCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.core.async.AsyncResponseTransformer;
import software.amazon.awssdk.core.async.BlockingInputStreamAsyncRequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.PutObjectResponse;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

class S3ClientTest
{
    @Test
    public void test() throws NoSuchAlgorithmException, IOException
    {
        final String bucketName = "redacted";
        final String objectPath = UUID.randomUUID().toString();
        final int size = 1024 * 1024 * 50;
        final Random r = new Random("nice".hashCode());
        AwsCredentials credentials = AwsBasicCredentials.create("redacted", "redacted");
        S3AsyncClient s3AsyncClient = S3AsyncClient
            .builder()
            .credentialsProvider(StaticCredentialsProvider.create(credentials))
            .region(Region.EU_NORTH_1)
            .multipartEnabled(true)
            .forcePathStyle(true)
            .build();

        // Content length is unknown up front, so pass null.
        BlockingInputStreamAsyncRequestBody body = AsyncRequestBody.forBlockingInputStream(null);

        CompletableFuture<PutObjectResponse> put =
            s3AsyncClient.putObject(req -> req.bucket(bucketName).key(objectPath).build(), body);
        // Deterministic pseudo-random source of `size` bytes, digested on the fly during upload.
        DigestInputStream inputStream = new DigestInputStream(new InputStream()
        {
            int current = 0;

            @Override
            public int read()
            {
                if (current++ >= size)
                {
                    return -1;
                }
                return r.nextInt(256);
            }
        }, MessageDigest.getInstance("SHA-256"));
        body.writeInputStream(inputStream);
        put.join();
        String uploadDigest = getEncoder().encodeToString(inputStream.getMessageDigest().digest());
        // Download the object and digest its bytes while discarding them.
        ResponseInputStream<GetObjectResponse> res = s3AsyncClient
            .getObject(req -> req.bucket(bucketName).key(objectPath).build(),
                AsyncResponseTransformer.toBlockingInputStream())
            .join();
        DigestInputStream digestInputStream = new DigestInputStream(res, MessageDigest.getInstance("SHA-256"));
        digestInputStream.transferTo(new OutputStream()
        {
            @Override
            public void write(int b) { }
        });
        digestInputStream.close();
        String downloadDigest = getEncoder().encodeToString(digestInputStream.getMessageDigest().digest());
        assertEquals(uploadDigest, downloadDigest);
    }
}
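As a sanity check, the digest-comparison technique the test relies on can be exercised without S3 at all: stream the same deterministic bytes through two SHA-256 DigestInputStreams and compare the Base64-encoded digests. This standalone sketch (class name, sizes, and seeds are illustrative) should always pass, which isolates the mismatch above to the SDK upload/download path rather than the hashing logic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Random;

public class DigestCheck
{
    // Produce `size` pseudo-random bytes from a fixed seed, so repeated calls yield identical streams.
    static InputStream deterministicStream(int size, long seed)
    {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return new ByteArrayInputStream(data);
    }

    // Drain `in` through a SHA-256 DigestInputStream and return the Base64-encoded digest.
    static String sha256Base64(InputStream in) throws IOException, NoSuchAlgorithmException
    {
        DigestInputStream din = new DigestInputStream(in, MessageDigest.getInstance("SHA-256"));
        din.transferTo(OutputStream.nullOutputStream()); // discard the bytes, keep the digest
        din.close();
        return Base64.getEncoder().encodeToString(din.getMessageDigest().digest());
    }

    public static void main(String[] args) throws Exception
    {
        String a = sha256Base64(deterministicStream(1024 * 1024, 42L));
        String b = sha256Base64(deterministicStream(1024 * 1024, 42L));
        if (!a.equals(b))
        {
            throw new AssertionError("digests differ");
        }
        System.out.println("digests match");
    }
}
```

With this passing locally, any digest mismatch in the full test must originate between `writeInputStream` and the downloaded response stream.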

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

I would expect this test to succeed.

Current Behavior

The test fails because the upload digest (uploadDigest) does not match the download digest (downloadDigest).

Reproduction Steps

Run the test

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.30.38

JDK version used

openjdk version "21.0.5" 2024-10-15 LTS OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS) OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (build 21.0.5+11-LTS, mixed mode, sharing)

Operating System and version

Ubuntu 24.04.1 LTS

@sne11ius sne11ius added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 13, 2025
@sne11ius sne11ius changed the title Incorrect Data Integrity with AWS S3 Async Client When Uploading Large Objects Using BlockingInputStreamAsyncRequestBody Data integrity with when uploading objects using BlockingInputStreamAsyncRequestBody Mar 13, 2025
@sne11ius sne11ius changed the title Data integrity with when uploading objects using BlockingInputStreamAsyncRequestBody Data integrity when uploading objects using BlockingInputStreamAsyncRequestBody Mar 13, 2025
@bhoradc bhoradc added investigating This issue is being investigated and/or work is in progress to resolve the issue. p1 This is a high priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Mar 13, 2025
@bhoradc

bhoradc commented Mar 13, 2025

Hi @sne11ius,

Thank you for reporting the issue and providing the minimal reproducible code sample.

We are able to reproduce the reported scenario: the upload and download digests differ for file sizes greater than ~20 MB. We are investigating this issue further.
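A size threshold like ~20 MB suggests the multipart upload code path. For anyone wanting to probe the boundary, the S3AsyncClient builder accepts an explicit MultipartConfiguration; this is a configuration sketch only (the byte values are illustrative, not recommendations), shown here to make the threshold/part-size knobs visible:

```java
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration;

S3AsyncClient client = S3AsyncClient.builder()
    .multipartEnabled(true)
    .multipartConfiguration(MultipartConfiguration.builder()
        .thresholdInBytes(16L * 1024 * 1024)      // switch to multipart above 16 MB (illustrative)
        .minimumPartSizeInBytes(8L * 1024 * 1024) // 8 MB parts (illustrative)
        .build())
    .build();
```

Lowering the threshold should make the mismatch reproducible with smaller objects if the multipart path is at fault.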

Regards,
Chaitanya

@zoewangg zoewangg pinned this issue Mar 14, 2025
@zoewangg zoewangg unpinned this issue Mar 14, 2025
@sne11ius
Author

Thanks, that was a quick fix. We updated to 2.31.1 and it works now.
Could this also fix #4272 by any chance?

2 participants