S3A S3Seekable stream refactor + move S3AInputStream creation to factory under S3AStore #7295
base: feature-HADOOP-19363-analytics-accelerator-s3
First iteration
* Factory interface with a parameter object creation method
* Base class AbstractS3AInputStream for all streams to create
* S3AInputStream subclasses that and has a factory
* Production and test code to use it

Not done
* Input stream callbacks pushed down to S3Store
* S3Store to dynamically choose factory at startup, stop in close()
* S3Store to implement the factory interface, completing final binding operations (callbacks, stats)

Change-Id: I8d0f86ca1f3463d4987a43924f155ce0c0644180
Revision

API: Make clear this is part of the fundamental store Model:
* abstract stream class is now ObjectInputStream
* interface is ObjectInputStreamFactory
* move to package org.apache.hadoop.fs.s3a.impl.model

Implementation: Prefetching stream is created this way too; adds one extra parameter. Maybe we should pass conf down too.

Change-Id: I5bbb5dfe585528b047a649b6c82a9d0318c7e91e
Change-Id: If42bdd0b227c4da07c62a410a998e6d8c35581f6
Moves all prefetching stream related options into the prefetching stream factory; the standard ReadOpContext removes them, so a new PrefetchingOptions is passed around.

Stream factories can now declare how many extra shared threads they want and whether or not to create a future pool around the bounded pool. This is used in S3AFileSystem when creating its thread pools; this class no longer reads in any of the prefetching options.

All tests which enable/disable prefetching, or probe for its state, now use S3ATestUtils methods for this. This saves them from having to explicitly unset two properties and set the new input stream type, and avoids further complications in test setup in future.

Everything under S3AStore is a service, so the service lifecycle matches everywhere, and the store just adds to the list of managed services for start/stop/close integration.

+ adjust assertions in ITestS3AInputStreamLeakage for prefetching
+ update the prefetching.md doc for factory changes
+ javadocs
+ add string values of type names to Constants

Once the analytics stream is in, a full doc on "stream performance" will be needed.

Package for this stuff is now impl.streams

Change-Id: Id6356d2ded2c477ba16cbb9027ac0cfbece2a542
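The "factories declare their thread requirements" idea can be illustrated with a minimal sketch. Every name here (`ThreadOptionsSketch`, `ThreadAwareFactorySketch`, `PoolSizingSketch`) is hypothetical and only stands in for whatever the PR actually defines; the point is that the filesystem sizes its pool from what the factory reports rather than reading prefetching options itself.

```java
/** Hypothetical value object: what a stream factory asks the filesystem to provide. */
final class ThreadOptionsSketch {
  private final int sharedThreads;
  private final boolean createFuturePool;

  ThreadOptionsSketch(int sharedThreads, boolean createFuturePool) {
    this.sharedThreads = sharedThreads;
    this.createFuturePool = createFuturePool;
  }

  int sharedThreads() { return sharedThreads; }

  boolean createFuturePool() { return createFuturePool; }
}

/** Hypothetical slice of the factory interface that declares thread needs. */
interface ThreadAwareFactorySketch {
  ThreadOptionsSketch threadRequirements();
}

final class PoolSizingSketch {
  private PoolSizingSketch() { }

  /** Bounded pool size: the filesystem's own threads plus the factory's request. */
  static int totalThreads(int baseThreads, ThreadAwareFactorySketch factory) {
    return baseThreads + factory.threadRequirements().sharedThreads();
  }
}
```

With this shape, a factory that needs no extra threads simply reports zero, and the filesystem code stays unaware of which stream type is active.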
Push factory construction into the enum itself.

Store implements stream capabilities, which are then relayed to the active factory. This avoids the FS having to know what capabilities are available in the stream.

Abstract base class for stream factories.

Change-Id: Ib757e6696f29cc7e0e8edd1119e738c6adc6f98f
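As a rough sketch of what "factory construction in the enum" plus "store relays capabilities" could look like: each enum constant carries a constructor reference, and the store forwards capability probes to whichever factory it built. All type names and the capability string `sketch.prefetching` are invented for illustration and are not taken from the patch.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the PR's stream factory interface. */
interface SketchStreamFactory {
  boolean hasCapability(String capability);
}

final class ClassicSketchFactory implements SketchStreamFactory {
  @Override
  public boolean hasCapability(String capability) {
    return false;
  }
}

final class PrefetchSketchFactory implements SketchStreamFactory {
  @Override
  public boolean hasCapability(String capability) {
    return "sketch.prefetching".equals(capability);
  }
}

/** Each constant knows how to construct its own factory. */
enum SketchStreamType {
  CLASSIC(ClassicSketchFactory::new),
  PREFETCH(PrefetchSketchFactory::new);

  private final Supplier<SketchStreamFactory> constructor;

  SketchStreamType(Supplier<SketchStreamFactory> constructor) {
    this.constructor = constructor;
  }

  SketchStreamFactory newFactory() {
    return constructor.get();
  }
}

/** The store relays capability probes to the active factory. */
final class SketchStore {
  private final SketchStreamFactory factory;

  SketchStore(SketchStreamType type) {
    this.factory = type.newFactory();
  }

  boolean hasCapability(String capability) {
    return factory.hasCapability(capability);
  }
}
```

The caller only ever asks the store, so adding a new stream type means adding one enum constant and one factory class.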
Change-Id: Id79f8aa019095c1601bb0b2a282c51bdb0b7b817
Conflicts: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java Change-Id: I1eddd195a9a3e3332bfaac2e225acf69774c3ce8
26977dc to 98bc8f4
98bc8f4 to 6fc63b7
Thanks a lot @rajdchak for this change. I left some minor comments.
@@ -230,7 +232,23 @@ public class S3AStoreImpl
@Override
protected void serviceInit(final Configuration conf) throws Exception {

objectInputStreamFactory = createStreamFactory(conf);
if(conf.getBoolean(ANALYTICS_ACCELERATOR_ENABLED_KEY, ANALYTICS_ACCELERATOR_ENABLED_DEFAULT)) {
Are we still doing this, or using the new StreamKind? See here:
Adds a new config, fs.s3a.input.stream.type. This can be set to classic, prefetch, analytics. Believe this is better than having multiple prefetch.enabled and analytics.enabled flags.
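To make the single-option approach concrete, here is a small sketch of resolving `fs.s3a.input.stream.type` to one of the three values named above. It uses a plain `Map` in place of Hadoop's `Configuration`, and both the enum name and the choice of `classic` as the default are assumptions made for the example.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical enum for the three stream types named in the comment. */
enum InputStreamTypeSketch {
  CLASSIC, PREFETCH, ANALYTICS;

  static final String KEY = "fs.s3a.input.stream.type";

  /** Resolve the stream type; a Map stands in for Hadoop's Configuration. */
  static InputStreamTypeSketch fromConfig(Map<String, String> conf) {
    // Assumption: "classic" is used when the option is unset.
    String raw = conf.getOrDefault(KEY, "classic").trim().toLowerCase(Locale.ROOT);
    switch (raw) {
      case "classic":
        return CLASSIC;
      case "prefetch":
        return PREFETCH;
      case "analytics":
        return ANALYTICS;
      default:
        throw new IllegalArgumentException("Unknown value for " + KEY + ": " + raw);
    }
  }
}
```

One option with a closed set of values rules out contradictory settings, which separate boolean flags for prefetching and analytics would allow.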
LOG.info("Using S3SeekableInputStream");
if(analyticsAcceleratorCRTEnabled) {
LOG.info("Using S3 CRT client for analytics accelerator S3");
s3AsyncClient = S3CrtAsyncClient.builder().maxConcurrency(600).build();
Similar to the others, shall we move this to a method getOrCreateAsyncCRTClient? Or maybe even change the existing method so that it decides whether to use CRT or not?
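A minimal sketch of the second suggestion, where one get-or-create method decides which client to build, might look like the following. `LazyClientHolderSketch` and its type parameter are hypothetical: `C` stands in for an async S3 client type, and the two suppliers stand in for the standard and CRT builder calls.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder that builds an async client on first use.
 * C stands in for an async S3 client type.
 */
final class LazyClientHolderSketch<C> {
  private final boolean crtEnabled;
  private final Supplier<C> standardBuilder;
  private final Supplier<C> crtBuilder;
  private C client;

  LazyClientHolderSketch(boolean crtEnabled,
      Supplier<C> standardBuilder,
      Supplier<C> crtBuilder) {
    this.crtEnabled = crtEnabled;
    this.standardBuilder = standardBuilder;
    this.crtBuilder = crtBuilder;
  }

  /** Single entry point: chooses CRT or standard once, then reuses the instance. */
  synchronized C getOrCreateAsyncClient() {
    if (client == null) {
      client = crtEnabled ? crtBuilder.get() : standardBuilder.get();
    }
    return client;
  }
}
```

Keeping the CRT decision inside the accessor means serviceInit no longer needs its own branch for client construction.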
import static org.apache.hadoop.fs.s3a.Constants.*;

public class S3SeekableInputStreamFactory extends AbstractObjectInputStreamFactory {
What about renaming this to S3ASeekableInputStreamFactory? This is in line with the S3ASeekableInputStream name, and we can also get rid of the full-path reference in the lines below.
@Override
public ObjectInputStream readObject(final ObjectReadParameters parameters) throws IOException {
return new S3ASeekableStream(
I wonder if we should rename this class to S3ASeekableInputStream, since it now implements ObjectInputStream.
Kept this name, following the others: PrefetchingInputStreamFactory and ClassicObjectInputStreamFactory.
Ah, I was asking about renaming S3ASeekableStream -> S3ASeekableInputStream.
💔 -1 overall
This message was automatically generated.
Description of PR
Move InputStreamCreation to the new Factory
How was this patch tested?
Tested using the integration tests
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?