HADOOP-18679. Add API for bulk/paged delete of files #6726
Conversation
A more minimal design that is easier to use and implement. The caller creates a BulkOperation, queries its page size, and then submits batches of paths to delete, each no larger than that size. The outcome of each call contains a list of failures. An S3A implementation is included to show how straightforward it is. Even with a single-entry page size, it is still more efficient to use this, as it doesn't try to recreate a parent dir or perform any probes to see if the path is a directory: it maps straight to a DELETE call. Change-Id: Ibe8737e7933fe03d39070e7138a710b50c3d60c2
Add methods in FileUtil to take an FS, cast it to a BulkDeleteSource, then perform the pageSize/bulkDelete operations. This makes reflection-based access straightforward: no new interfaces or types to work with, just two new methods with type-erased lists. Change-Id: I2d7b1bf8198422de635253fc087ca551a8bc6609
Change-Id: Ib098c07cc1f7747ed1a3131b252656c96c520a75
Using this PR to start with the initial design, implementation and services offered by having lower-level interaction with S3 pushed down into an S3AStore class, with an interface/impl split. The bulk delete callbacks now talk to the store, *not* s3afs, with some minor changes in behaviour (IllegalArgumentException is raised if root paths / are to be deleted). Mock tests are failing; I expected that: they are always brittle. What next? Get this in and then move lower-level fs ops over to the store, one method calling the s3client at a time, or in groups, as appropriate. The metrics of success are: * all callback classes created in S3A FS can work through the store * no direct s3client invocation in S3AFS. Change-Id: Ib5bc58991533fd5d13e78426e88b884b6ae5205c
Changing results of method calls, using Tuples.pair() to return Map.Entry() instances as immutable tuples. Change-Id: Ibdd5a5b11fe0a57b293b9cb623272e272c8bab69
and some minor prod changes.
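As a minimal sketch of the tuple shape that commit describes: an immutable Map.Entry pair. The JDK helper used here is only an assumed equivalent of what Tuples.pair() does, shown for illustration.

```java
import java.util.AbstractMap;
import java.util.Map;

// Sketch only: an immutable (key, value) tuple expressed as a Map.Entry,
// the shape the bulk delete result list uses. Tuples.pair() in the commit
// above presumably does something equivalent to this JDK class.
final class TupleSketch {
  static <K, V> Map.Entry<K, V> pair(K key, V value) {
    return new AbstractMap.SimpleImmutableEntry<>(key, value);
  }
}
```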
4bbb3b7 to a61f139 (Compare)
I've also done a PR #6738 which tunes the API to work with iceberg, having just written a PoC of the iceberg binding. Can you cherry-pick this PR onto your branch and then do the review comments? After that, please do not do any rebasing of your PR; that way, it is easier for me to keep my own branch in sync with your changes. Thanks.
PoC of iceberg integration, based on their S3FileIO one. The iceberg API passes in a collection of paths, which may span multiple filesystems. To handle this,
We are going to need a default FS impl which just invokes delete(path, false) and maps any IOE to a failure. Change-Id: If56bca7cb8529ccbfbb1dfa29cedc8287ec980d4
  */
public class DefaultBulkDeleteOperation implements BulkDelete {

    private final int pageSize;
this is always 1, isn't it? so much can be simplified here
- no need for the field
- no need to pass it in the constructor
- pageSize() to return 1
    validateBulkDeletePaths(paths, pageSize, basePath);
    List<Map.Entry<Path, String>> result = new ArrayList<>();
    // this for loop doesn't make sense as pageSize must be 1.
    for (Path path : paths) {
there's only ever going to be 1 entry here
      try {
        fs.delete(path, false);
        // What to do if this returns false?
        // I think we should add the path to the result list with value "Not Deleted".
good q. I'd say yes. Or actually do a getFileStatus() call and see what is there:
- file doesn't exist (not an error)
- path is a directory: add to result
key is that file not found isn't escalated (see the sketch below)
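A minimal sketch of how those two suggestions could combine: page size fixed at 1, a getFileStatus() probe when delete() returns false, and file-not-found swallowed rather than escalated. Class and member names here are illustrative only, not the committed DefaultBulkDeleteOperation.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the suggested handling; not the committed implementation.
class DefaultBulkDeleteSketch {

  private final FileSystem fs;

  DefaultBulkDeleteSketch(FileSystem fs) {
    this.fs = fs;
  }

  /** Page size is always 1 for the default implementation. */
  public int pageSize() {
    return 1;
  }

  public List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths)
      throws IOException {
    List<Map.Entry<Path, String>> result = new ArrayList<>();
    // at most one entry, as pageSize() == 1
    for (Path path : paths) {
      try {
        if (!fs.delete(path, false)) {
          // delete() returned false: probe to see why.
          try {
            FileStatus st = fs.getFileStatus(path);
            if (st.isDirectory()) {
              result.add(new SimpleImmutableEntry<>(path, "Path is a directory"));
            }
          } catch (FileNotFoundException ignored) {
            // file not found is not an error; nothing to report
          }
        }
      } catch (IOException e) {
        // map any other IOE to a failure entry
        result.add(new SimpleImmutableEntry<>(path, e.toString()));
      }
    }
    return result;
  }
}
```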
    bindReadOnlyRolePolicy(assumedRoleConfig, readOnlyDir);
    roleFS = (S3AFileSystem) destDir.getFileSystem(assumedRoleConfig);

    int range = 10;
on a store where bulk delete page size == 1, use that as the range
💔 -1 overall
This message was automatically generated.
Add bulk delete path capability true for all FS
- I'd prefer BulkDeleteSource to be in FileSystem, with the base implementation returning a DefaultBulkDeleteOperation instance.
- and so we should have one of the contract tests doing it directly through the API on the getFileSystem() instance
- other than that though, looking really good.
Commented. Now, one more thing based on my changes in #6686.
it's adding many more methods to this class, so I'm giving them the name of the class/interface + "_" + method name.
can you do the same here? some style checker will complain but it will help us to separate the methods in the new class.
other than that, all good
@@ -4980,4 +4982,17 @@ public MultipartUploaderBuilder createMultipartUploader(Path basePath)
    methodNotSupported();
    return null;
  }

  /**
   * Create a default bulk delete operation to be used for any FileSystem.
This doesn't hold for the subclasses. better to say
Create a bulk delete operation.
The default implementation returns an instance of {@link DefaultBulkDeleteOperation}
I don't understand what to do here.
@mukund-thakur I've tried to clarify what I mean: we separate the class/interface from the method/operation with a "_" character.
It makes a lot more sense when you look at my PR, which brings a lot more methods into the same class. Why all in the same class? Less reflection code.
   * @throws IllegalArgumentException path not valid.
   * @throws IOException problems resolving paths
   */
  public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException {
rename to bulkDelete_pageSize
   * @throws IOException IO problems including networking, authentication and more.
   * @throws IllegalArgumentException if a path argument is invalid.
   */
  public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs,
rename to bulkDelete_delete
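For callers who reach these static helpers reflectively, a sketch of what that could look like with the renamed methods. The WrappedIO class name is taken from the commit message further down; the full parameter list of bulkDelete_delete is assumed, since the snippet above is truncated.

```java
import java.lang.reflect.Method;
import java.util.Collection;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: reflection-based access to the WrappedIO static methods, so a library
// can use bulk delete without compiling against the new classes.
// Assumed signatures: bulkDelete_pageSize(FileSystem, Path) and
// bulkDelete_delete(FileSystem, Path, Collection<Path>).
final class WrappedIOReflectionSketch {

  static int pageSize(FileSystem fs, Path base) throws Exception {
    Class<?> wrapped = Class.forName("org.apache.hadoop.io.wrappedio.WrappedIO");
    Method m = wrapped.getMethod("bulkDelete_pageSize", FileSystem.class, Path.class);
    return (Integer) m.invoke(null, fs, base);
  }

  static List<?> delete(FileSystem fs, Path base, Collection<Path> paths) throws Exception {
    Class<?> wrapped = Class.forName("org.apache.hadoop.io.wrappedio.WrappedIO");
    Method m = wrapped.getMethod("bulkDelete_delete",
        FileSystem.class, Path.class, Collection.class);
    return (List<?>) m.invoke(null, fs, base, paths);
  }
}
```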
💔 -1 overall
This message was automatically generated.
mukund, if you can do those naming changes then I'm +1
+1. I was about to merge, then I realised that yetus wasn't ready. Here is my draft commit message:
Applications can create a BulkDelete implementation from a BulkDeleteSource; the BulkDelete interface provides the pageSize(): the maximum number of entries which can be deleted, and a bulkDelete(Collection paths) method which can take a collection up to pageSize() long.

This is optimized for object stores with bulk delete APIs; the S3A connector will offer the page size of fs.s3a.bulk.delete.page.size unless bulk delete has been disabled.

Even with a page size of 1, the S3A implementation is more efficient than delete(path) as there are no safety checks for the path being a directory or probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by all FileSystem implementations, with a page size of 1 and mapped to delete(pathToDelete, false).

To aid use through reflection APIs, the class org.apache.hadoop.io.wrappedio.WrappedIO has been created with "reflection friendly" methods.

Contributed by Mukund Thakur and Steve Loughran
merged to trunk. mukund, can you do the backport to branch-3.4.1, while we think about what to do for 3.3.9? speaking of which, we should think about that...

fyi #6686 adds dynamic load for the wrapped methods, and calls them through the tests.

While backporting this to branch-3.4 I see this failure. Will check if this is happening on trunk as well.

yeah, just seen those too. the mocked s3 client isn't getting the delete-one-object calls via the store object. got a small PR up to help debug it, which is not a fix.

problem is that store is null in the innermost s3afs; that triggers an NPE in deleteObject() before the aws client has its delete operation called, so the list of deleted paths is not updated. easiest immediate fix is to mock deleteObject(); longer term we should actually be stubbing the entire store, as that's what the interface/impl split is meant to assist.

fix this with mockito magic somehow. https://github.com/apache/hadoop/pull/6843/files

I tried to do it yesterday too. I think my solution would have pulled createStore() out into a method and overridden it, or added an S3AInternals.setStore() call.

update: #5081 does successfully show the staging UT failure (good!) but also some failures in hadoop-common due to the new interface... we need to modify the tests to indicate it is safe to not override the base implementation.
Applications can create a BulkDelete instance from a BulkDeleteSource; the BulkDelete interface provides the pageSize(): the maximum number of entries which can be deleted, and a bulkDelete(Collection paths) method which can take a collection up to pageSize() long.

This is optimized for object stores with bulk delete APIs; the S3A connector will offer the page size of fs.s3a.bulk.delete.page.size unless bulk delete has been disabled.

Even with a page size of 1, the S3A implementation is more efficient than delete(path) as there are no safety checks for the path being a directory or probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by all FileSystem implementations, with a page size of 1 and mapped to delete(pathToDelete, false). This means that callers do not need to have special case handling for object stores versus classic filesystems.

To aid use through reflection APIs, the class org.apache.hadoop.io.wrappedio.WrappedIO has been created with "reflection friendly" methods.

Contributed by Mukund Thakur and Steve Loughran

Conflicts:
    hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
    hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
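To make the flow described in that commit message concrete, here is a minimal caller-side sketch: get a BulkDelete for a base path, read its page size, and submit batches no larger than that. The createBulkDelete() factory name and the try-with-resources usage are assumptions for illustration, not quotes from the merged code.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: paged deletion under a base path. createBulkDelete() is the
// assumed factory method, and BulkDelete is assumed to be Closeable.
final class BulkDeleteUsageSketch {

  static void deleteAll(FileSystem fs, Path base, List<Path> toDelete) throws IOException {
    try (BulkDelete bulk = fs.createBulkDelete(base)) {
      int pageSize = bulk.pageSize();
      for (int i = 0; i < toDelete.size(); i += pageSize) {
        List<Path> page = toDelete.subList(i, Math.min(i + pageSize, toDelete.size()));
        // each entry in the result is a (path, error) pair for a failed deletion
        List<Map.Entry<Path, String>> failures = bulk.bulkDelete(page);
        if (!failures.isEmpty()) {
          throw new IOException("Failed to delete " + failures.size() + " path(s)");
        }
      }
    }
  }
}
```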
 * <li>The operation is treated as idempotent: network failures may
 * trigger resubmission of the request -any new objects created under a
 * path in the list may then be deleted.</li>
It is great that we call this out, but can we do better at the API level? Had the API taken Collection<FileStatus>, we could make the operation idempotent in most clouds by passing the version/generation id/etag in the request.
AWS SDK delete with a version id actually requires more IAM permissions than unversioned delete, which always removes the HEAD object, because granting that permission allows the caller to delete backups. Deployments where apps can delete HEAD but not versions are not unusual for this reason. This is why S3A doesn't use it even in simple listing -> delete calls where the status is known. You might also need to issue getFileStatus/list calls, which would massively increase the cost if the process didn't have those values already.

A bulk delete with a tuple of (path, version) for each entry could work, if the store could be configured to use that version ID/type. For S3A we would leave it off by default. The tuple would be Map.Entry to be reflection friendly. If you do think version/etag support would be a blocker to use, well, things haven't shipped yet, though @mukund-thakur is preparing a 3.4.1 alpha release. You (and it would be you, sorry) will need to modify the api with

This isn't that useful for table compaction, as the engines tend to use randomness in their names to spread the s3 store load across shards. But it could have other uses. For example, here's some work to do version printing, recovery and copy within the same bucket; it lets you pull out the layers underneath a directory tree: https://github.com/steveloughran/cloudstore/tree/main/src/main/extra/org/apache/hadoop/fs/s3a/extra

Question is: do we want to make something that complex part of a broader api with tests, specification, commitments to maintain etc, or do we just say call
#3633 shows what's required there.
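Purely as illustration of the (path, version) tuple idea above, a hypothetical sketch of what a versioned request entry could look like; nothing like this exists in the PR or in the shipped API, and the names and values are invented.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.Path;

// Hypothetical only: a versioned delete request as a list of (path, versionId/etag)
// tuples. The current API takes plain paths; this is merely the shape discussed above.
final class VersionedBulkDeleteSketch {

  static List<Map.Entry<Path, String>> versionedRequest() {
    return Arrays.asList(
        new SimpleImmutableEntry<>(
            new Path("s3a://bucket/table/data-0001.parquet"), "etag-or-version-id-1"),
        new SimpleImmutableEntry<>(
            new Path("s3a://bucket/table/data-0002.parquet"), "etag-or-version-id-2"));
  }
}
```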
BTW, the "is delete idempotent?" question is a recurrent one; search the mail lists for history on it, including even HDFS. In a world with parallel writers, it clearly isn't. But if you are going from paged list -> delete you are already getting non-atomic listings where race conditions can do odd things (rename file zzzzzzz to aaaaa and not have the listing find it at all). You can rebuild S3AFS with
Thank you for the context @steveloughran. Lots of good info above. Should we be adding an overload that takes Collection<FileStatus>? To close, I want to mention that idempotency of operations is not only useful in the face of multiple (application) writers. It also plays a role when only one writer is involved. The reason is the distributed nature of cloud storage and the load balancers/backends inside cloud storage implementations. For example:

Had the requests in (2) and (4) been idempotent, the writer wouldn't observe
That scenario is a key part of the whole delete idempotency discussion. And as noted, trying to implement idempotency there on S3 requires permissions which apps are often not trusted with.

As for abfs, I'd worry more about rename resilience under load than deletion. If you've not hit problems there: you've not generated enough load. Look at HADOOP-18613 and HADOOP-18012 there. Really you can't rely on delete or rename doing what you expect, so your commit protocol had better not rely on them. Though #6716 shows that even when I think I've done that, I can get caught out.

As for the "overload which takes FileStatus", look at the openFile() work and followup issues in the s3a and abfs codebases which caught us out. In particular, any virtual FS on top (hive, viewFS, maybe the databricks one...) builds its own FileStatus instances because the path in them is absolute and includes the full URL. Which means the file status passed in is often of a different type from any FS-specific implementation, lacks all version/etag info, and has a path which doesn't resolve. It'd have to take path and filestatus separately.

Returning to this API, look at apache/iceberg#10233 for the latest integration; it is only given strings to map to paths, which is what the new API takes.

For a commit process with idempotency, are you looking for rename with versioning? Or even a createFile() with an if-matches condition on underlying files...something Azure supports. Adding if-matches into createFile() is possible if you want to design that; the spec would need to say "failure MAY happen at close()" to support any store where the put/post only happens at the end. And you need a way to pass that version of the file being overwritten (etag, version) down. Assuming this was etag only, we have that API, though wrapper filesystems don't let you pick it up (not in the base class; adding a new field would break serialization across versions...).

If you want this and are prepared to do the work, at least for stores which implement it (azure), then I'm happy to supervise your effort, especially if we can get the MSFT engineers involved. Create that JIRA and once you put up your first design I'll assign the JIRA to you. It'd have target 3.4.2, but getEtag() and openFile() have shipped for a while:

String etag;
// status = fs.getFileStatus(dest); catch FNFE and set etag to ""
etag = status.getEtag();   // via EtagSource
FSDataOutputStream out = fs.createFile(dest)
    .must("fs.create.option.if-match", etag)
    .build();
...
out.close();
like I said, gets complex fast, but if you want something stable to work with current and future stores, it'd have to be the way. None of this needs new public FS APIs, just standard behaviour in stores which implement it. go for it!
HADOOP-18679
Adding tests on top of #6494
Description of PR
How was this patch tested?
For code changes:
Have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?