S3A S3Seekable stream refactor + move S3AInputStream creation to factory under S3AStore #7295

Open: wants to merge 8 commits into base branch feature-HADOOP-19363-analytics-accelerator-s3

@rajdchak commented Jan 17, 2025

Description of PR

Move S3A input stream creation into the new stream factory.

How was this patch tested?

Tested using the integration tests

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue ID (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

steveloughran and others added 7 commits January 16, 2025 12:20
First iteration
* Factory interface with a parameter object creation method
* Base class AbstractS3AInputStream for all streams to create
* S3AInputStream subclasses it and has a factory
* Production and test code to use it

Not done
* Input stream callbacks pushed down to S3Store
* S3Store to dynamically choose factory at startup, stop in close()
* S3Store to implement the factory interface, completing final binding
  operations (callbacks, stats)

Change-Id: I8d0f86ca1f3463d4987a43924f155ce0c0644180
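The structure described in this first commit (a factory interface whose single creation method takes a parameter object, a shared abstract base class for all streams, and a classic stream that subclasses it and carries its own factory) can be sketched in plain Java. This is a minimal illustration, not the PR's code: the names ObjectInputStream, ObjectInputStreamFactory, ObjectReadParameters and readObject follow the later commits and the diff, but ClassicInputStream is a hypothetical stand-in for S3AInputStream, and the in-memory byte array standing in for the S3 client is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical parameter object: everything a factory needs to open one object. */
final class ObjectReadParameters {
  private final String key;
  private final byte[] data; // stand-in for the S3 client and callbacks in this sketch

  ObjectReadParameters(String key, byte[] data) {
    this.key = key;
    this.data = data;
  }

  String getKey() {
    return key;
  }

  byte[] getData() {
    return data;
  }
}

/** Abstract base class for every stream a factory can create. */
abstract class ObjectInputStream extends InputStream {
  private final String key;

  protected ObjectInputStream(ObjectReadParameters parameters) {
    this.key = parameters.getKey();
  }

  String getKey() {
    return key;
  }
}

/** Factory interface with a single creation method taking the parameter object. */
interface ObjectInputStreamFactory {
  ObjectInputStream readObject(ObjectReadParameters parameters) throws IOException;
}

/** Hypothetical stand-in for S3AInputStream: subclasses the base class, has its own factory. */
final class ClassicInputStream extends ObjectInputStream {
  private final byte[] data;
  private int pos;

  ClassicInputStream(ObjectReadParameters parameters) {
    super(parameters);
    this.data = parameters.getData();
  }

  @Override
  public int read() {
    // Return the next byte as 0-255, or -1 at end of data.
    return pos < data.length ? (data[pos++] & 0xff) : -1;
  }

  /** The stream's factory; narrows the return type and drops the checked exception. */
  static final class Factory implements ObjectInputStreamFactory {
    @Override
    public ClassicInputStream readObject(ObjectReadParameters parameters) {
      return new ClassicInputStream(parameters);
    }
  }
}
```

Because callers only see the factory interface and the parameter object, adding a field to ObjectReadParameters does not change any factory's method signature.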
Revision

API: make clear this is part of the fundamental store model:

* abstract stream class is now ObjectInputStream
* interface is ObjectInputStreamFactory
* move to package org.apache.hadoop.fs.s3a.impl.model

Implementation: the prefetching stream is created this way too;
this adds one extra parameter.

Maybe we should pass the conf down too.

Change-Id: I5bbb5dfe585528b047a649b6c82a9d0318c7e91e
Change-Id: If42bdd0b227c4da07c62a410a998e6d8c35581f6
Moves all prefetching-related options into the prefetching stream
factory; they are removed from the standard ReadOpContext, so
a new PrefetchingOptions object is passed around instead.

Stream factories can now declare how many extra shared threads they
want and whether or not to create a future pool around the bounded pool.
This is used in S3AFileSystem when creating its thread pools; that class
no longer reads in any of the prefetching options.
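A sketch of how the filesystem might size its bounded pool purely from the active factory's declaration, without reading any prefetch options itself. StreamThreadRequirements, ThreadAwareStreamFactory and ThreadPoolSizing are hypothetical names, and the FuturePool class here is a tiny stand-in built on CompletableFuture, not Hadoop's own future-pool type.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical declaration of what a stream factory wants from the shared pool. */
final class StreamThreadRequirements {
  final int sharedThreads;        // extra threads to add to the bounded pool
  final boolean createFuturePool; // whether to wrap the bounded pool in a future pool

  StreamThreadRequirements(int sharedThreads, boolean createFuturePool) {
    this.sharedThreads = sharedThreads;
    this.createFuturePool = createFuturePool;
  }
}

/** Hypothetical factory view: only the thread declaration matters here. */
interface ThreadAwareStreamFactory {
  StreamThreadRequirements threadRequirements();
}

/** Minimal future-pool stand-in: submits work to the bounded pool and returns futures. */
final class FuturePool {
  private final ExecutorService pool;

  FuturePool(ExecutorService pool) {
    this.pool = pool;
  }

  CompletableFuture<Void> submit(Runnable task) {
    return CompletableFuture.runAsync(task, pool);
  }
}

/** Filesystem-side sizing: only the factory's declaration is consulted. */
final class ThreadPoolSizing {
  static int boundedPoolSize(int baseThreads, ThreadAwareStreamFactory factory) {
    return baseThreads + factory.threadRequirements().sharedThreads;
  }

  static ExecutorService createBoundedPool(int baseThreads, ThreadAwareStreamFactory factory) {
    int size = boundedPoolSize(baseThreads, factory);
    // Fixed thread count; the task queue itself is unbounded in this sketch.
    return new ThreadPoolExecutor(size, size, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
  }

  static Optional<FuturePool> maybeCreateFuturePool(ExecutorService pool,
      ThreadAwareStreamFactory factory) {
    return factory.threadRequirements().createFuturePool
        ? Optional.of(new FuturePool(pool))
        : Optional.empty();
  }
}
```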

All tests which enable/disable prefetching, or probe for its state,
now use S3ATestUtils methods for this.
This saves them from having to explicitly unset two properties
and set the new input stream type, and keeps any further
complications in test setup in one place in future.
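The test-helper idea can be sketched as follows. It assumes a plain Map in place of Hadoop's Configuration, and the helper class and method names are hypothetical (the actual S3ATestUtils method names are not shown on this page). fs.s3a.input.stream.type comes from the review discussion; fs.s3a.prefetch.enabled is the existing prefetch flag, used only as an example of a legacy property to clear.

```java
import java.util.Map;

/** Hypothetical test helper; the real S3ATestUtils method names may differ. */
final class StreamTypeTestHelper {
  /** Stream type key named in the review discussion. */
  static final String INPUT_STREAM_TYPE = "fs.s3a.input.stream.type";
  /** Existing prefetch flag, used here as an example of a legacy property to clear. */
  static final String LEGACY_PREFETCH_ENABLED = "fs.s3a.prefetch.enabled";

  private StreamTypeTestHelper() {
  }

  static Map<String, String> enablePrefetching(Map<String, String> conf) {
    return setStreamType(conf, "prefetch");
  }

  static Map<String, String> disablePrefetching(Map<String, String> conf) {
    return setStreamType(conf, "classic");
  }

  /** Probe the state through the same key the helpers set. */
  static boolean isPrefetchingEnabled(Map<String, String> conf) {
    return "prefetch".equals(conf.get(INPUT_STREAM_TYPE));
  }

  private static Map<String, String> setStreamType(Map<String, String> conf, String type) {
    conf.remove(LEGACY_PREFETCH_ENABLED); // one place to clear legacy flags
    conf.put(INPUT_STREAM_TYPE, type);
    return conf;
  }
}
```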

Everything under S3AStore is a service, so the service lifecycle matches everywhere,
and the store just adds to its list of managed services for start/stop/close
integration.
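A minimal sketch of the store acting as a service container: it keeps a list of child services, starts them in registration order and stops them in reverse. LifecycleService and ManagedStore are hypothetical simplifications, loosely modelled on Hadoop's CompositeService rather than its actual API (Hadoop's Service interface also has init, close and explicit states).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal lifecycle; Hadoop's real Service interface has more states. */
interface LifecycleService {
  void start();

  void stop();
}

/** Store that manages child services: start in registration order, stop in reverse. */
final class ManagedStore implements LifecycleService {
  private final List<LifecycleService> services = new ArrayList<>();

  /** Register a child (for example, the active stream factory). */
  void addService(LifecycleService service) {
    services.add(service);
  }

  @Override
  public void start() {
    for (LifecycleService service : services) {
      service.start();
    }
  }

  @Override
  public void stop() {
    // Reverse order, so services registered later (which may depend on earlier ones) stop first.
    for (int i = services.size() - 1; i >= 0; i--) {
      services.get(i).stop();
    }
  }
}
```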

+ adjust assertions in ITestS3AInputStreamLeakage for prefetching
+ update the prefetching.md doc for factory changes
+ javadocs
+ add string values of type names to Constants

Once the analytics stream is in, a full doc on "stream performance"
will be needed.

The package for these classes is now impl.streams.

Change-Id: Id6356d2ded2c477ba16cbb9027ac0cfbece2a542
Push factory construction into the enum itself

Store implements stream capabilities, which are then
relayed to the active factory. This avoids the FS having
to know what capabilities are available in the stream.
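Capability relaying might look like the sketch below. CapabilityProbe mirrors the shape of Hadoop's StreamCapabilities.hasCapability(String), but the interface and classes here are hypothetical, and the capability strings used with them are arbitrary examples.

```java
import java.util.Locale;
import java.util.Set;

/** Same shape as Hadoop's StreamCapabilities.hasCapability(String); hypothetical here. */
interface CapabilityProbe {
  boolean hasCapability(String capability);
}

/** A factory that knows which capabilities the streams it creates offer. */
final class CapableStreamFactory implements CapabilityProbe {
  private final Set<String> capabilities; // expected in lower case

  CapableStreamFactory(Set<String> capabilities) {
    this.capabilities = capabilities;
  }

  @Override
  public boolean hasCapability(String capability) {
    return capabilities.contains(capability.toLowerCase(Locale.ROOT));
  }
}

/** The store relays probes to the active factory; the filesystem only talks to the store. */
final class RelayingStore implements CapabilityProbe {
  private final CapabilityProbe activeFactory;

  RelayingStore(CapabilityProbe activeFactory) {
    this.activeFactory = activeFactory;
  }

  @Override
  public boolean hasCapability(String capability) {
    return activeFactory.hasCapability(capability);
  }
}
```

Swapping the active factory changes the answers the filesystem sees without any change to the filesystem class.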

Abstract base class for stream factories.

Change-Id: Ib757e6696f29cc7e0e8edd1119e738c6adc6f98f
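Pushing factory construction into the enum, together with an abstract base class for factories, can be sketched with a constructor reference per constant. All type and constant names here are hypothetical; the PR's real enum (referred to as StreamKind in the review below) may be structured differently.

```java
import java.util.function.Supplier;

/** Hypothetical abstract base class shared by all stream factories. */
abstract class AbstractStreamFactory {
  private final String name;

  protected AbstractStreamFactory(String name) {
    this.name = name;
  }

  String getName() {
    return name;
  }
}

final class ClassicStreamFactory extends AbstractStreamFactory {
  ClassicStreamFactory() {
    super("classic");
  }
}

final class PrefetchStreamFactory extends AbstractStreamFactory {
  PrefetchStreamFactory() {
    super("prefetch");
  }
}

/** Each constant carries a constructor reference, so the enum itself builds factories. */
enum StreamKindSketch {
  CLASSIC(ClassicStreamFactory::new),
  PREFETCH(PrefetchStreamFactory::new);

  private final Supplier<AbstractStreamFactory> constructor;

  StreamKindSketch(Supplier<AbstractStreamFactory> constructor) {
    this.constructor = constructor;
  }

  /** Create a new factory instance for this kind. */
  AbstractStreamFactory newFactory() {
    return constructor.get();
  }
}
```

With this shape, adding a stream type means adding one constant; there is no separate switch statement to keep in sync.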
Change-Id: Id79f8aa019095c1601bb0b2a282c51bdb0b7b817
Conflicts:
  hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java

Change-Id: I1eddd195a9a3e3332bfaac2e225acf69774c3ce8

@fuatbasik left a comment

Thanks a lot @rajdchak for this change. I left some minor comments.

@@ -230,7 +232,23 @@ public class S3AStoreImpl
@Override
protected void serviceInit(final Configuration conf) throws Exception {

objectInputStreamFactory = createStreamFactory(conf);
if(conf.getBoolean(ANALYTICS_ACCELERATOR_ENABLED_KEY, ANALYTICS_ACCELERATOR_ENABLED_DEFAULT)) {

Are we still doing this, or are we using the new StreamKind? See here:

Adds a new config, fs.s3a.input.stream.type, which can be set to classic, prefetch, or analytics. I believe this is better than having multiple prefetch.enabled and analytics.enabled flags.
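With a single enumerated key, serviceInit needs one lookup instead of several boolean probes. This sketch assumes a Map-based configuration, case-insensitive matching, and classic as the default when the key is unset; the StreamType enum is hypothetical, with its values taken from the quoted comment.

```java
import java.util.Map;

/** Hypothetical enum of the stream types named in the quoted comment. */
enum StreamType {
  CLASSIC, PREFETCH, ANALYTICS;

  static final String KEY = "fs.s3a.input.stream.type";

  /** Resolve the stream type; an unset key means CLASSIC in this sketch. */
  static StreamType fromConfig(Map<String, String> conf) {
    String value = conf.getOrDefault(KEY, "classic").trim();
    for (StreamType type : values()) {
      if (type.name().equalsIgnoreCase(value)) {
        return type;
      }
    }
    throw new IllegalArgumentException("Unknown value for " + KEY + ": " + value);
  }
}
```

Unknown values fail fast here rather than silently falling back; a real implementation might instead log a warning and use the default.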

LOG.info("Using S3SeekableInputStream");
if(analyticsAcceleratorCRTEnabled) {
LOG.info("Using S3 CRT client for analytics accelerator S3");
s3AsyncClient = S3CrtAsyncClient.builder().maxConcurrency(600).build();

Similar to the others, shall we move this into a method, getOrCreateAsyncCRTClient? Or maybe even change the existing method to decide whether or not to use CRT?
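The suggested refactoring could take the shape of a lazily initialised, cached client whose CRT-or-standard choice is made once. The builders are passed in as Supplier instances so the sketch runs without the AWS SDK; AsyncClientHolder and getOrCreateAsyncClient are hypothetical names. In the store, the CRT supplier would presumably wrap the builder call from the diff, S3CrtAsyncClient.builder().maxConcurrency(600).build().

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder; C is the client type (presumably the SDK's S3AsyncClient). */
final class AsyncClientHolder<C> {
  private final boolean useCrt;
  private final Supplier<C> crtBuilder;
  private final Supplier<C> standardBuilder;
  private C client; // guarded by this

  AsyncClientHolder(boolean useCrt, Supplier<C> crtBuilder, Supplier<C> standardBuilder) {
    this.useCrt = useCrt;
    this.crtBuilder = crtBuilder;
    this.standardBuilder = standardBuilder;
  }

  /** Build the client on first call, choosing CRT or standard once; later calls reuse it. */
  synchronized C getOrCreateAsyncClient() {
    if (client == null) {
      client = useCrt ? crtBuilder.get() : standardBuilder.get();
    }
    return client;
  }
}
```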


import static org.apache.hadoop.fs.s3a.Constants.*;

public class S3SeekableInputStreamFactory extends AbstractObjectInputStreamFactory {

What about renaming this to S3ASeekableInputStreamFactory? This is in line with the S3ASeekableInputStream name, and we could also get rid of the fully qualified references in the lines below.


@Override
public ObjectInputStream readObject(final ObjectReadParameters parameters) throws IOException {
return new S3ASeekableStream(

I wonder if we should rename this class to S3ASeekableInputStream, since it now extends ObjectInputStream.

Author

Kept this name to follow the other factories, PrefetchingInputStreamFactory and ClassicObjectInputStreamFactory.

Ah, I was asking about renaming S3ASeekableStream to S3ASeekableInputStream.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 51 new or modified test files.
_ feature-HADOOP-19363-analytics-accelerator-s3 Compile Tests _
+0 🆗 mvndep 5m 57s Maven dependency ordering for branch
+1 💚 mvninstall 30m 13s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 compile 16m 43s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 compile 15m 45s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 checkstyle 4m 12s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 mvnsite 1m 43s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 javadoc 1m 38s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 25s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 2m 40s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 shadedclient 37m 38s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 38m 4s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
+1 💚 mvninstall 1m 3s the patch passed
+1 💚 compile 17m 26s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javac 17m 26s the patch passed
+1 💚 compile 15m 7s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 javac 15m 7s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 10s /results-checkstyle-root.txt root: The patch generated 47 new + 34 unchanged - 6 fixed = 81 total (was 40)
+1 💚 mvnsite 1m 38s the patch passed
-1 ❌ javadoc 0m 47s /patch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04.
-1 ❌ javadoc 0m 44s /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga.
+1 💚 spotbugs 3m 11s the patch passed
+1 💚 shadedclient 37m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 29m 49s /patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt hadoop-hdfs-rbf in the patch passed.
+1 💚 unit 3m 0s hadoop-aws in the patch passed.
-1 ❌ asflicense 0m 57s /results-asflicense.txt The patch generated 1 ASF License warnings.
241m 21s
Reason Tests
Failed junit tests hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/1/artifact/out/Dockerfile
GITHUB PR #7295
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint
uname Linux 581ce05f3dd3 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision feature-HADOOP-19363-analytics-accelerator-s3 / 26977dc
Default Java Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/1/testReport/
Max. process+thread count 3736 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 6m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 51 new or modified test files.
_ feature-HADOOP-19363-analytics-accelerator-s3 Compile Tests _
+0 🆗 mvndep 5m 39s Maven dependency ordering for branch
+1 💚 mvninstall 19m 17s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 compile 9m 12s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 compile 8m 34s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 checkstyle 1m 54s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 mvnsite 1m 0s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 javadoc 0m 57s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 50s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 1m 30s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 shadedclient 22m 7s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 22m 24s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 20s Maven dependency ordering for patch
+1 💚 mvninstall 0m 43s the patch passed
+1 💚 compile 9m 59s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javac 9m 59s the patch passed
+1 💚 compile 8m 38s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 javac 8m 38s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 2m 1s /results-checkstyle-root.txt root: The patch generated 42 new + 34 unchanged - 6 fixed = 76 total (was 40)
+1 💚 mvnsite 1m 1s the patch passed
-1 ❌ javadoc 0m 28s /patch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04.
-1 ❌ javadoc 0m 32s /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga.
+1 💚 spotbugs 1m 59s the patch passed
+1 💚 shadedclient 22m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 38m 54s /patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt hadoop-hdfs-rbf in the patch passed.
+1 💚 unit 2m 24s hadoop-aws in the patch passed.
-1 ❌ asflicense 0m 42s /results-asflicense.txt The patch generated 1 ASF License warnings.
172m 16s
Reason Tests
Failed junit tests hadoop.fs.contract.router.TestRouterHDFSContractRootDirectorySecure
hadoop.fs.contract.router.TestRouterHDFSContractOpenSecure
hadoop.fs.contract.router.TestRouterHDFSContractSetTimes
hadoop.fs.contract.router.TestRouterHDFSContractConcatSecure
hadoop.fs.contract.router.TestRouterHDFSContractGetFileStatus
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/3/artifact/out/Dockerfile
GITHUB PR #7295
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint
uname Linux 02891828666b 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision feature-HADOOP-19363-analytics-accelerator-s3 / 6fc63b7
Default Java Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/3/testReport/
Max. process+thread count 3341 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 43s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 51 new or modified test files.
_ feature-HADOOP-19363-analytics-accelerator-s3 Compile Tests _
+0 🆗 mvndep 5m 42s Maven dependency ordering for branch
+1 💚 mvninstall 33m 2s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 compile 16m 28s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 compile 15m 26s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 checkstyle 5m 11s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 mvnsite 1m 45s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 javadoc 1m 43s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 1m 28s feature-HADOOP-19363-analytics-accelerator-s3 passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 2m 47s feature-HADOOP-19363-analytics-accelerator-s3 passed
+1 💚 shadedclient 36m 41s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 37m 6s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for patch
+1 💚 mvninstall 1m 3s the patch passed
+1 💚 compile 15m 58s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javac 15m 58s the patch passed
+1 💚 compile 15m 6s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 javac 15m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 48s /results-checkstyle-root.txt root: The patch generated 47 new + 34 unchanged - 6 fixed = 81 total (was 40)
+1 💚 mvnsite 1m 41s the patch passed
-1 ❌ javadoc 0m 50s /patch-javadoc-hadoop-tools_hadoop-aws-jdkUbuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04.txt hadoop-aws in the patch failed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04.
-1 ❌ javadoc 0m 44s /patch-javadoc-hadoop-tools_hadoop-aws-jdkPrivateBuild-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga.txt hadoop-aws in the patch failed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga.
+1 💚 spotbugs 3m 17s the patch passed
+1 💚 shadedclient 37m 47s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 30m 10s hadoop-hdfs-rbf in the patch passed.
+1 💚 unit 3m 6s hadoop-aws in the patch passed.
-1 ❌ asflicense 1m 5s /results-asflicense.txt The patch generated 1 ASF License warnings.
244m 9s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/2/artifact/out/Dockerfile
GITHUB PR #7295
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint
uname Linux c3a3e3db058e 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision feature-HADOOP-19363-analytics-accelerator-s3 / 98bc8f4
Default Java Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/2/testReport/
Max. process+thread count 3796 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-rbf hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7295/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
