
HADOOP-19131. Assist reflection IO with WrappedOperations class #6686

Merged

Conversation


@steveloughran steveloughran commented Mar 28, 2024

HADOOP-19131
Assist reflection IO with WrappedOperations class

How was this patch tested?

Needs new tests going through reflection, maybe some in the openFile contract to guarantee full use.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran

Prepared Parquet for this by renaming the vectorio package to org.apache.parquet.hadoop.util.wrappedio.

@steveloughran steveloughran force-pushed the s3/HADOOP-19131-WrappedOperations branch from 12c95ff to 668c1ce Compare April 5, 2024 16:51
@steveloughran steveloughran force-pushed the s3/HADOOP-19131-WrappedOperations branch 2 times, most recently from c1e52f5 to 827b41c Compare April 25, 2024 17:57
@steveloughran steveloughran force-pushed the s3/HADOOP-19131-WrappedOperations branch from 0dad2aa to e6241ab Compare May 20, 2024 20:26
@steveloughran steveloughran force-pushed the s3/HADOOP-19131-WrappedOperations branch from 128ba0c to 128e2d7 Compare May 29, 2024 13:21
steveloughran added a commit to steveloughran/iceberg that referenced this pull request May 29, 2024
This is in sync with apache/hadoop#6686,
which has renamed one of the methods to load.

The new DynamicWrappedIO class is based on one being written as
part of that PR; as both are based on the Parquet DynMethods class,
a copy-and-paste is straightforward.
@apache apache deleted a comment from hadoop-yetus May 29, 2024

@steveloughran steveloughran left a comment


  • I think I might cut the new read forms (parquet, orc) from the read policy, though parquet/1 and parquet/3 may be good

@steveloughran

@mukund-thakur this PR renames bulkDelete_PageSize to bulkDelete_pageSize to be consistent with everything else.

My iceberg PR apache/iceberg#10233 looks for the new name; it is now dynamic and should link up. We still need a way to test it (proposed: make it an option, "use if present", defaulting to true).

@apache apache deleted a comment from hadoop-yetus Jul 24, 2024
@steveloughran steveloughran force-pushed the s3/HADOOP-19131-WrappedOperations branch from b44df10 to a60f769 Compare July 26, 2024 16:41
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 31s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 13 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 27s Maven dependency ordering for branch
+1 💚 mvninstall 33m 19s trunk passed
+1 💚 compile 17m 27s trunk passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 compile 16m 20s trunk passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 checkstyle 4m 26s trunk passed
+1 💚 mvnsite 6m 0s trunk passed
+1 💚 javadoc 4m 52s trunk passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javadoc 5m 16s trunk passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 spotbugs 9m 28s trunk passed
+1 💚 shadedclient 34m 6s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 34m 33s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
+1 💚 mvninstall 3m 35s the patch passed
+1 💚 compile 16m 53s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javac 16m 53s the patch passed
+1 💚 compile 16m 15s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 javac 16m 15s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 27s /results-checkstyle-root.txt root: The patch generated 59 new + 66 unchanged - 3 fixed = 125 total (was 69)
+1 💚 mvnsite 5m 59s the patch passed
-1 ❌ javadoc 1m 12s /results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2.txt hadoop-common-project_hadoop-common-jdkUbuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2 with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 javadoc 5m 20s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 spotbugs 10m 39s the patch passed
+1 💚 shadedclient 34m 27s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 53s hadoop-common in the patch passed.
+1 💚 unit 226m 33s hadoop-hdfs in the patch passed.
+1 💚 unit 3m 21s hadoop-aws in the patch passed.
+1 💚 unit 2m 55s hadoop-azure in the patch passed.
+1 💚 unit 0m 55s hadoop-aliyun in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
510m 43s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/31/artifact/out/Dockerfile
GITHUB PR #6686
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle markdownlint
uname Linux dcb10dc97a56 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a60f769
Default Java Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/31/testReport/
Max. process+thread count 3660 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure hadoop-tools/hadoop-aliyun U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/31/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran

Javadocs:

hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/functional/FunctionalIO.java:83: warning: no @param for <R>

The checkstyle warnings are all about the use of _ in method names, except for one:

./hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/wrappedio/impl/TestWrappedStatistics.java:257:    snapshot.setCounter( "c1", 10);:24: '(' is followed by whitespace. [ParenPad]

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 11m 53s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 14 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 54s Maven dependency ordering for branch
+1 💚 mvninstall 33m 43s trunk passed
+1 💚 compile 17m 23s trunk passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 compile 16m 15s trunk passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 checkstyle 4m 21s trunk passed
+1 💚 mvnsite 6m 2s trunk passed
+1 💚 javadoc 4m 36s trunk passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javadoc 4m 53s trunk passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 spotbugs 9m 51s trunk passed
+1 💚 shadedclient 34m 10s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 34m 37s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 32s Maven dependency ordering for patch
+1 💚 mvninstall 3m 35s the patch passed
+1 💚 compile 16m 56s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javac 16m 56s the patch passed
+1 💚 compile 16m 16s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 javac 16m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 22s /results-checkstyle-root.txt root: The patch generated 60 new + 66 unchanged - 3 fixed = 126 total (was 69)
+1 💚 mvnsite 6m 0s the patch passed
+1 💚 javadoc 4m 44s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2
+1 💚 javadoc 5m 17s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
+1 💚 spotbugs 10m 42s the patch passed
+1 💚 shadedclient 34m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 19m 53s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
-1 ❌ unit 85m 10s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
-1 ❌ unit 0m 56s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ unit 0m 45s /patch-unit-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
-1 ❌ unit 0m 44s /patch-unit-hadoop-tools_hadoop-aliyun.txt hadoop-aliyun in the patch failed.
+0 🆗 asflicense 0m 46s ASF License check generated no output?
375m 8s
Reason Tests
Failed junit tests hadoop.util.functional.TestFunctionalIO
hadoop.hdfs.TestHdfsAdmin
hadoop.hdfs.web.TestWebHDFS
hadoop.hdfs.qjournal.TestSecureNNWithQJM
hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor
hadoop.hdfs.TestDistributedFileSystem
hadoop.hdfs.TestBlockTokenWrappingQOP
hadoop.hdfs.TestDFSClientExcludedNodes
hadoop.hdfs.TestErasureCodeBenchmarkThroughput
hadoop.hdfs.TestSecureEncryptionZoneWithKMS
hadoop.hdfs.qjournal.TestNNWithQJM
hadoop.hdfs.TestDFSRollback
hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter
hadoop.hdfs.TestWriteRead
hadoop.hdfs.TestReservedRawPaths
hadoop.hdfs.TestDFSUpgradeFromImage
hadoop.hdfs.web.TestWebHDFSXAttr
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/32/artifact/out/Dockerfile
GITHUB PR #6686
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle markdownlint
uname Linux c220f37dde5d 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b59846b
Default Java Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu120.04.2 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_412-8u412-ga-1~20.04.1-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/32/testReport/
Max. process+thread count 2251 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure hadoop-tools/hadoop-aliyun U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/32/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran

legitimate failure

 Expected to find '404' but got unexpected exception: java.io.UncheckedIOException: java.io.FileNotFoundException: missing
 at org.apache.hadoop.util.functional.FunctionRaisingIOE.unchecked(FunctionRaisingIOE.java:50)
 at org.apache.hadoop.util.functional.TestFunctionalIO.lambda$testUncheckedFunction$9(TestFunctionalIO.java:105)
 at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:500)
 at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:386)
 at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:455)
 at org.apache.hadoop.util.functional.TestFunctionalIO.testUncheckedFunction(TestFunctionalIO.java:104)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
 at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.FileNotFoundException: missing
 at org.apache.hadoop.util.functional.TestFunctionalIO.lambda$testUncheckedFunction$8(TestFunctionalIO.java:103)
 at org.apache.hadoop.util.functional.FunctionRaisingIOE.unchecked(FunctionRaisingIOE.java:48)
 ... 18 more

Class WrappedIO extended with more filesystem operations

- openFile()
- PathCapabilities
- StreamCapabilities
- ByteBufferPositionedReadable
  * test on supported filesystems (hdfs)
  * Plus tests with validation of degradation when IO methods are
    not found.

Explicitly add read policies for columnar, parquet and orc
Add IOStatistics context accessors and reset()

* columnar
* orc
* parquet
* avro

This is to make it clearer to the filesystem implementations that they
should optimize for whatever their data traces recommend.

Class DynamicWrappedIO to access the WrappedIO methods
through Parquet's DynMethods API.

This class is easy to copy and paste into
Parquet and Iceberg, where it can be used immediately.
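The DynMethods pattern described above can be sketched with plain JDK reflection: resolve a static method by name at runtime, and return null when the class or method is absent so the caller can fall back to an older API. The class and method names below are illustrative stand-ins, not the Hadoop or Parquet API.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Minimal sketch of dynamic static-method binding (DynMethods style). */
public final class DynamicBinding {

  /** Look up a static method; null if the class or method is missing. */
  public static Method bind(String className, String methodName, Class<?>... argTypes) {
    try {
      return Class.forName(className).getMethod(methodName, argTypes);
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return null;   // caller degrades gracefully to the older API
    }
  }

  /** Invoke a bound static method, rethrowing the underlying cause. */
  public static Object invoke(Method m, Object... args) throws Exception {
    try {
      return m.invoke(null, args);
    } catch (InvocationTargetException e) {
      // unwrap the reflection wrapper so callers see the real exception
      throw (Exception) e.getCause();
    }
  }

  public static void main(String[] args) throws Exception {
    // Present: java.lang.Integer.parseInt(String) resolves and invokes.
    Method parse = bind("java.lang.Integer", "parseInt", String.class);
    System.out.println(invoke(parse, "42"));
    // Absent: unknown methods yield null rather than an exception.
    System.out.println(bind("java.lang.Integer", "noSuchMethod") == null);
  }
}
```

Callers probe `bind(...) != null` once at load time and choose the dynamic or legacy code path accordingly, which is the "use if present" behaviour discussed elsewhere in this PR.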

Class WrappedStatistics to provide equivalent access to
IOStatistics interfaces, objects and operations.

Ability to
* Get a serializable IOStatisticsSnapshot from an IOStatisticsSource or
  IOStatistics instance
* Save an IOStatisticsSnapshot to file
* Convert an IOStatisticsSnapshot to JSON
* Given an object which may be an IOStatisticsSource, return an object
  whose toString() value is a dynamically generated, human readable summary.
  This is for logging.
* Separate getters to the different sections of IOStatistics.
* Mean values are returned as a Map.Pair<Long, Long> of (samples, sum)
  from which means may be calculated.
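The last point can be illustrated with a short sketch, using java.util's Map.Entry as the (samples, sum) pair type; the names here are illustrative, not the WrappedStatistics API itself.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Sketch: a mean statistic exposed as (samples, sum); the mean is derived. */
public final class MeanStat {

  /** Build the (samples, sum) pair as it would cross the reflection boundary. */
  public static Map.Entry<Long, Long> pair(long samples, long sum) {
    return new SimpleImmutableEntry<>(samples, sum);
  }

  /** Derive the mean; zero samples yields 0.0 rather than dividing by zero. */
  public static double mean(Map.Entry<Long, Long> p) {
    return p.getKey() == 0 ? 0.0 : (double) p.getValue() / p.getKey();
  }

  public static void main(String[] args) {
    System.out.println(mean(pair(4, 10)));   // 4 samples summing to 10
  }
}
```

Shipping the raw (samples, sum) pair rather than a precomputed mean lets callers merge statistics from multiple sources before computing an overall mean.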

Tuned AbstractContractBulkDeleteTest

* make setUp() an override of the existing setup();
  this makes initialization more deterministic.
* inline some variables in setup()

Important: this change renames bulkDelete_PageSize to be bulkDelete_pageSize
so it is consistent with all the new methods being added.

This is in sync with the initial implementation of PARQUET-2493,
tuning the code to suit actual use.

In particular:
- WrappedIO methods raise UncheckedIOExceptions
- DynamicWrappedIO methods unwrap these
- a static method switches between openFile() and open() based on
  method availability.

Change-Id: Ib4f177d5409156217f4c3d14f1c99adfe82b96d2
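The raise/unwrap contract above can be sketched in isolation: the reflection-friendly side throws UncheckedIOException so no checked exceptions cross the dynamic boundary, while the caller side restores the checked IOException. This is illustrative only; these are not the Hadoop classes themselves.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch of the "wrapped raises unchecked, dynamic caller unwraps" contract. */
public final class Unwrap {

  /** Reflection-friendly form: checked IOException becomes unchecked. */
  static long wrappedOp(boolean fail) {
    if (fail) {
      throw new UncheckedIOException(new IOException("missing"));
    }
    return 0L;
  }

  /** Caller side: restore the checked exception for ordinary Java callers. */
  public static long call(boolean fail) throws IOException {
    try {
      return wrappedOp(fail);
    } catch (UncheckedIOException e) {
      // UncheckedIOException.getCause() is declared to return IOException
      throw e.getCause();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(call(false));
    try {
      call(true);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because `Method.invoke` wraps checked exceptions anyway, keeping the wrapped methods unchecked avoids a double layer of wrapping on the reflection path.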
+ move the DynMethods and related classes under oah.utils.dynamic,
  marked as private.

Change-Id: I9ff52ab02d51bf2175862a3020b41e969088fb65
+ boolean to enable/disable footer caching.

These are all hints with no wiring up. Google GCS does footer
caching, and abfs has it as a WiP; those clients can adopt it
as desired.

The reason for the footer cache flag is that some query engines do
their own caching; having the input stream try to "be helpful" is
at best needless and at worst counterproductive.

Change-Id: Ibf5914d9fa327438790b946b29b9369d098ae14c
Indicates that locations are generated client side
and don't refer to real hosts.

If found, list calls which return LocatedFileStatus are low cost

Added for: file, s3a, abfs, oss

Change-Id: Id94be4cbf1a41ac84818c7b2e061423b9b24d149
Got a signature wrong; logging loading at debug to diagnose this.

Change-Id: I9c96ffe61d123b9461636380ef77f55d8ddbe3a4
Moving the unchecking into default methods on CallableRaisingIOE,
FunctionRaisingIOE etc. makes for a clean and flexible design.

Some test enhancements.

Change-Id: If25b6d0377bc9e4e8d4a6e689692ddfa96b1c756
javadoc, checkstyle and unit test for the new method

Change-Id: Id16d01c193814c46215c81e8040ffa7a25720f1c
@steveloughran steveloughran force-pushed the s3/HADOOP-19131-WrappedOperations branch from b59846b to 3fe9cdb Compare August 7, 2024 13:46
Declare that hbase is an HBase table; s3a maps it to random IO.
abfs recommends disabling prefetch for these files; it should do so
automatically when support for read policies is wired up.

Change-Id: I0823cd307a059bf0f3499e7555d9ccc87fb4ae70
@steveloughran steveloughran force-pushed the s3/HADOOP-19131-WrappedOperations branch from 3fe9cdb to 76b0afc Compare August 7, 2024 16:15
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 11m 59s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 15 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 9s Maven dependency ordering for branch
+1 💚 mvninstall 36m 50s trunk passed
+1 💚 compile 18m 46s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 compile 16m 13s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 checkstyle 4m 25s trunk passed
+1 💚 mvnsite 5m 58s trunk passed
+1 💚 javadoc 4m 47s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 5m 14s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 9m 35s trunk passed
+1 💚 shadedclient 34m 54s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 35m 23s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 3m 39s the patch passed
+1 💚 compile 17m 4s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javac 17m 4s the patch passed
+1 💚 compile 16m 3s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 javac 16m 3s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 5m 11s /results-checkstyle-root.txt root: The patch generated 58 new + 66 unchanged - 3 fixed = 124 total (was 69)
+1 💚 mvnsite 5m 44s the patch passed
+1 💚 javadoc 4m 27s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 5m 24s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 10m 26s the patch passed
+1 💚 shadedclient 36m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 28s hadoop-common in the patch passed.
+1 💚 unit 226m 10s hadoop-hdfs in the patch passed.
+1 💚 unit 3m 4s hadoop-aws in the patch passed.
+1 💚 unit 2m 54s hadoop-azure in the patch passed.
+1 💚 unit 0m 45s hadoop-aliyun in the patch passed.
+1 💚 asflicense 1m 9s The patch does not generate ASF License warnings.
528m 6s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/33/artifact/out/Dockerfile
GITHUB PR #6686
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle markdownlint
uname Linux e67f494e0409 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3fe9cdb
Default Java Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/33/testReport/
Max. process+thread count 4028 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure hadoop-tools/hadoop-aliyun U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/33/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 15 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 0s Maven dependency ordering for branch
+1 💚 mvninstall 35m 46s trunk passed
+1 💚 compile 19m 15s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 compile 17m 24s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 checkstyle 4m 45s trunk passed
+1 💚 mvnsite 5m 32s trunk passed
+1 💚 javadoc 4m 19s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 4m 51s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 9m 16s trunk passed
+1 💚 shadedclient 40m 31s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 40m 58s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 32s Maven dependency ordering for patch
+1 💚 mvninstall 3m 52s the patch passed
+1 💚 compile 20m 17s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javac 20m 17s the patch passed
+1 💚 compile 18m 24s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 javac 18m 24s the patch passed
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 5m 2s /results-checkstyle-root.txt root: The patch generated 58 new + 66 unchanged - 3 fixed = 124 total (was 69)
+1 💚 mvnsite 5m 59s the patch passed
+1 💚 javadoc 4m 27s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 4m 59s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 11m 19s the patch passed
+1 💚 shadedclient 38m 39s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 20m 26s hadoop-common in the patch passed.
+1 💚 unit 227m 59s hadoop-hdfs in the patch passed.
+1 💚 unit 3m 14s hadoop-aws in the patch passed.
+1 💚 unit 2m 54s hadoop-azure in the patch passed.
+1 💚 unit 0m 55s hadoop-aliyun in the patch passed.
+1 💚 asflicense 1m 13s The patch does not generate ASF License warnings.
532m 59s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/34/artifact/out/Dockerfile
GITHUB PR #6686
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle markdownlint
uname Linux 8d5fb1d05a35 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 76b0afc
Default Java Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/34/testReport/
Max. process+thread count 3385 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure hadoop-tools/hadoop-aliyun U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/34/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Change-Id: Ibd158b3a14bacc95059f0e4e86179e78bebdb53c
@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 11m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 15 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 10s Maven dependency ordering for branch
+1 💚 mvninstall 32m 59s trunk passed
+1 💚 compile 17m 50s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 compile 16m 7s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 checkstyle 4m 21s trunk passed
+1 💚 mvnsite 5m 58s trunk passed
+1 💚 javadoc 4m 52s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 5m 9s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 9m 41s trunk passed
+1 💚 shadedclient 34m 41s branch has no errors when building and testing our client artifacts.
-0 ⚠️ patch 35m 8s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
+1 💚 mvninstall 3m 39s the patch passed
+1 💚 compile 16m 59s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javac 16m 59s the patch passed
+1 💚 compile 16m 20s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 javac 16m 20s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 26s /results-checkstyle-root.txt root: The patch generated 58 new + 66 unchanged - 3 fixed = 124 total (was 69)
+1 💚 mvnsite 6m 4s the patch passed
+1 💚 javadoc 4m 44s the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 5m 22s the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 10m 36s the patch passed
+1 💚 shadedclient 34m 50s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 19m 54s hadoop-common in the patch passed.
+1 💚 unit 226m 42s hadoop-hdfs in the patch passed.
+1 💚 unit 3m 21s hadoop-aws in the patch passed.
+1 💚 unit 2m 56s hadoop-azure in the patch passed.
+1 💚 unit 0m 54s hadoop-aliyun in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
522m 37s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/35/artifact/out/Dockerfile
GITHUB PR #6686
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle markdownlint
uname Linux 2ebdeb4da928 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 20d385c
Default Java Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/35/testReport/
Max. process+thread count 3527 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-tools/hadoop-aws hadoop-tools/hadoop-azure hadoop-tools/hadoop-aliyun U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6686/35/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran

All the checkstyle warnings are from underscores; I tried to set up a style rule to disable this, but it didn't work as there are no checkstyle overrides in hadoop-common right now.


@mukund-thakur mukund-thakur left a comment


It's a big patch. I reviewed it a few weeks ago and checked again today. Overall it looks great to me: +1.
I just don't understand why we added fs.capability.virtual.block.locations in this patch.

@steveloughran

fs.capability.virtual.block.locations

It's to say "this fs makes up block locations": the cost of looking up block locations is much lower (no remote calls) and you don't really need to schedule work elsewhere.

Now that hasPathCapability() is being exported to legacy code, I felt this would be useful. Currently things look for the default (host == localhost) and go from there, but they only get to do that after the lookup.

@steveloughran steveloughran merged commit 55a5769 into apache:trunk Aug 14, 2024
4 checks passed
steveloughran added a commit to steveloughran/hadoop that referenced this pull request Aug 14, 2024
(apache#6686)


1. The class WrappedIO has been extended with more filesystem operations

- openFile()
- PathCapabilities
- StreamCapabilities
- ByteBufferPositionedReadable

All these static methods raise UncheckedIOExceptions rather than
checked ones.
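The wrapping convention can be sketched in plain Java; the `uncheck` helper and the `IOCall` interface below are illustrative stand-ins, not the actual Hadoop utility names:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class UncheckedWrap {
    // Functional interface for an operation that may throw a checked IOException.
    interface IOCall<T> { T run() throws IOException; }

    // Rethrow any checked IOException as an UncheckedIOException, so callers
    // (including reflection-based ones) need not declare checked exceptions.
    static <T> T uncheck(IOCall<T> call) {
        try {
            return call.run();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            uncheck(() -> { throw new IOException("simulated read failure"); });
        } catch (UncheckedIOException e) {
            // the original checked exception is preserved as the cause
            System.out.println(e.getCause().getMessage());
        }
    }
}
```

Keeping the original exception as the cause means a caller can still unwrap and inspect the underlying IOException when it needs to.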

2. The adjacent class org.apache.hadoop.io.wrappedio.WrappedStatistics
provides similar access to IOStatistics/IOStatisticsContext classes
and operations.

Allows callers to:
* Get a serializable IOStatisticsSnapshot from an IOStatisticsSource or
  IOStatistics instance
* Save an IOStatisticsSnapshot to file
* Convert an IOStatisticsSnapshot to JSON
* Given an object which may be an IOStatisticsSource, return an object
  whose toString() value is a dynamically generated, human readable summary.
  This is for logging.
* Separate getters to the different sections of IOStatistics.
* Mean values are returned as a Map.Pair<Long, Long> of (samples, sum)
  from which means may be calculated.
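A minimal sketch of deriving a mean from such a (samples, sum) pair, using the standard `Map.Entry` as a stand-in for the pair type:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class MeanFromPair {
    // A mean statistic arrives as (samples, sum); the mean is sum / samples.
    static double mean(Map.Entry<Long, Long> pair) {
        long samples = pair.getKey();
        long sum = pair.getValue();
        return samples == 0 ? 0.0 : (double) sum / samples;
    }

    public static void main(String[] args) {
        // e.g. 4 samples totalling 10 gives a mean of 2.5
        Map.Entry<Long, Long> pair = new SimpleImmutableEntry<>(4L, 10L);
        System.out.println(mean(pair));  // 2.5
    }
}
```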

There are examples of the dynamic bindings to these classes in:

org.apache.hadoop.io.wrappedio.impl.DynamicWrappedIO
org.apache.hadoop.io.wrappedio.impl.DynamicWrappedStatistics

These use DynMethods and other classes in the package
org.apache.hadoop.util.dynamic, which are based on the
Apache Parquet equivalents.
This makes it easier to re-implement these in that library and in others
which have their own fork of the classes (example: Apache Iceberg).
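The dynamic-binding pattern these classes follow can be sketched with plain Java reflection; the class and method names below are illustrative stand-ins, not the real Hadoop or Parquet APIs:

```java
import java.lang.reflect.Method;

// Look a static method up once by reflection, remember whether it is
// available, and degrade gracefully when it is not on the classpath.
public class DynamicBinding {
    private final Method method;

    DynamicBinding(String className, String methodName, Class<?>... params) {
        Method m = null;
        try {
            m = Class.forName(className).getMethod(methodName, params);
        } catch (ReflectiveOperationException e) {
            // class or method not found: the binding stays unavailable
        }
        method = m;
    }

    boolean available() { return method != null; }

    Object invoke(Object... args) throws Exception {
        if (method == null) {
            throw new UnsupportedOperationException("not bound");
        }
        return method.invoke(null, args);  // static method invocation
    }

    public static void main(String[] args) throws Exception {
        // Bind to a method that exists everywhere: Long.parseLong(String)
        DynamicBinding parse =
            new DynamicBinding("java.lang.Long", "parseLong", String.class);
        System.out.println(parse.available());   // true
        System.out.println(parse.invoke("42"));  // 42

        // Binding to a missing class fails soft instead of throwing
        DynamicBinding missing = new DynamicBinding("no.such.Class", "op");
        System.out.println(missing.available()); // false
    }
}
```

Probing availability up front lets callers choose between the new API and a fallback path, which is exactly what downstream libraries loading Hadoop reflectively need.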

3. The openFile() option "fs.option.openfile.read.policy" has
added specific file format policies for the core filetypes

* avro
* columnar
* csv
* hbase
* json
* orc
* parquet

S3A chooses the appropriate sequential/random policy as a result.

A policy `parquet, columnar, vector, random, adaptive` will use the parquet policy for
any filesystem aware of it, falling back to the first entry in the list which
the specific version of the filesystem recognizes.
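That fallback rule can be sketched as follows; the set of recognized policies here is illustrative, since each connector and version recognizes its own set:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the read-policy fallback: the first entry in the caller's
// comma-separated policy list that this filesystem recognizes wins.
public class ReadPolicyFallback {
    static final List<String> RECOGNIZED =
        Arrays.asList("adaptive", "random", "sequential", "vector", "parquet");

    static String choose(String policyOption) {
        for (String p : policyOption.split(",")) {
            String policy = p.trim();
            if (RECOGNIZED.contains(policy)) {
                return policy;
            }
        }
        return "adaptive";  // default when nothing in the list matches
    }

    public static void main(String[] args) {
        // A connector aware of "parquet" selects it outright...
        System.out.println(choose("parquet, columnar, vector, random, adaptive"));
        // ...an older one falls back to the first entry it does know.
        System.out.println(choose("columnar, random"));
    }
}
```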

4. New Path capability fs.capability.virtual.block.locations

Indicates that locations are generated client side
and don't refer to real hosts.

Contributed by Steve Loughran
steveloughran added a commit that referenced this pull request Aug 15, 2024
steveloughran added a commit to steveloughran/hadoop that referenced this pull request Aug 15, 2024
KeeProMise pushed a commit to KeeProMise/hadoop that referenced this pull request Sep 9, 2024
Hexiaoqiao pushed a commit to Hexiaoqiao/hadoop that referenced this pull request Sep 12, 2024
@anujmodi2021 (Contributor)

Hi @steveloughran
Please check my comment on the merged commit.
55a5769#r147058497

Sorry, I should have added that comment here itself.
Thanks
