HADOOP-19233: ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over Blob Endpoint #7265

Open
wants to merge 14 commits into
base: trunk

Contributor

@bhattmanish98 commented Jan 2, 2025

Description of PR:


This PR is part of the series of work tracked under the parent Jira: [HADOOP-19179]
Jira for this Patch: [HADOOP-19233]

Currently, we only support rename and delete operations on the DFS endpoint. The reason for not supporting rename and delete operations on the Blob endpoint is that the Blob endpoint does not account for hierarchy. We need to ensure that the HDFS contracts are maintained when performing rename and delete operations. Renaming or deleting a directory over the Blob endpoint requires the client to handle the orchestration and rename or delete all the blobs within the specified directory.
 
The task outlines the considerations for implementing rename and delete operations for the FNS-blob endpoint to ensure compatibility with HDFS contracts.

  • Blob Endpoint Usage: The task addresses the need for abstraction in the code to maintain HDFS contracts while performing rename and delete operations on the blob endpoint, which does not support hierarchy.
  • Rename Operations: The AzureBlobFileSystem#rename() method will use a RenameHandler instance to handle rename operations, with separate handlers for the DFS and blob endpoints. This method includes prechecks, destination adjustments, and orchestration of directory renaming for blobs.
  • Atomic Rename: Atomic renaming is essential for blob endpoints, as it requires orchestration to copy or delete each blob within the directory. A configuration will allow developers to specify directories for atomic renaming, with a JSON file to track the status of renames.
  • Delete Operations: Delete operations are simpler than renames, requiring fewer HDFS contract checks. For blob endpoints, the client must handle orchestration, including managing orphaned directories created by AzCopy.
  • Orchestration for Rename/Delete: Orchestration for rename and delete operations over blob endpoints involves listing blobs and performing actions on each blob. The process must be optimized to handle large numbers of blobs efficiently.
  • Need for Optimization: Optimization is crucial because the ListBlob API can return a maximum of 5000 blobs at once, necessitating multiple calls for large directories. The task proposes a producer-consumer model to handle blobs in parallel, thereby reducing processing time and memory usage.
  • Producer-Consumer Design: The proposed design includes a producer to list blobs, a queue to store the blobs, and a consumer to process them in parallel. This approach aims to improve efficiency and mitigate memory issues.
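
The producer-consumer design described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ListActionSketch`, `Lister`, `Page`, and the continuation-token shape are hypothetical stand-ins for the paginated ListBlob call, and the per-blob action stands in for a copy or delete.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch of list-then-act orchestration; not the PR's classes. */
final class ListActionSketch {

  /** One page of a listing; nextToken is null on the last page (assumed shape). */
  static final class Page {
    final List<String> blobs;
    final String nextToken;

    Page(List<String> blobs, String nextToken) {
      this.blobs = blobs;
      this.nextToken = nextToken;
    }
  }

  /** Stand-in for a paginated list call; a null token requests the first page. */
  interface Lister {
    Page list(String continuationToken);
  }

  /** Sentinel telling a consumer to stop; compared by identity, never by value. */
  private static final String END = new String("END");

  /** Returns the number of blobs whose action completed without throwing. */
  static int run(Lister lister, Consumer<String> action,
      int consumers, int queueCapacity) {
    // The bounded queue caps memory: the producer blocks when consumers lag.
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
    AtomicInteger succeeded = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(consumers);
    for (int i = 0; i < consumers; i++) {
      pool.submit(() -> {
        try {
          for (String blob = queue.take(); blob != END; blob = queue.take()) {
            try {
              action.accept(blob);
              succeeded.incrementAndGet();
            } catch (RuntimeException e) {
              // A real handler would record the failure; the sketch skips it.
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    try {
      // Producer: page through the listing until no continuation token remains.
      String token = null;
      do {
        Page page = lister.list(token);
        for (String blob : page.blobs) {
          queue.put(blob);
        }
        token = page.nextToken;
      } while (token != null);
      // One sentinel per consumer so every worker thread terminates.
      for (int i = 0; i < consumers; i++) {
        queue.put(END);
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      pool.shutdownNow(); // no-op after a clean shutdown; stops workers on error
    }
    return succeeded.get();
  }
}
```

Because the queue is bounded, at most `queueCapacity` listed entries are buffered at a time, which is the memory benefit the description refers to; the right capacity and consumer count would depend on the workload.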

@bhattmanish98 changed the title from "HADOOP-19381: [ABFS] Support Rename and Delete operation over FNS-Blob endpoint" to "HADOOP-19233: [ABFS] Support Rename and Delete operation over FNS-Blob endpoint" on Jan 2, 2025

@bhattmanish98 changed the title from "HADOOP-19233: [ABFS] Support Rename and Delete operation over FNS-Blob endpoint" to "HADOOP-19233: ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over Blob Endpoint" on Jan 3, 2025
@bhattmanish98 marked this pull request as ready for review on January 6, 2025 07:00

Contributor

@anujmodi2021 left a comment

Added some comments

AUTHORIZATION_PERMISSION_MISS_MATCH("AuthorizationPermissionMismatch", HttpURLConnection.HTTP_FORBIDDEN, null),
ACCOUNT_REQUIRES_HTTPS("AccountRequiresHttps", HttpURLConnection.HTTP_BAD_REQUEST, null),
MD5_MISMATCH("Md5Mismatch", HttpURLConnection.HTTP_BAD_REQUEST,
"The MD5 value specified in the request did not match with the MD5 value calculated by the server."),
COPY_BLOB_FAILED("COPY_BLOB_FAILED", HttpURLConnection.HTTP_INTERNAL_ERROR, null),
Contributor

Error Codes should be in camelcase as others

Contributor Author

Taken!


package org.apache.hadoop.fs.azurebfs.enums;

public enum BlobCopyProgress {
Contributor

Javadoc for class and enums

Contributor Author

@bhattmanish98 Jan 16, 2025

Java Doc added.

@@ -49,6 +49,9 @@ public interface SASTokenProvider {
String SET_PERMISSION_OPERATION = "set-permission";
String SET_PROPERTIES_OPERATION = "set-properties";
String WRITE_OPERATION = "write";
String COPY_BLOB_DESTINATION = "copy-blob-dst";
Contributor

Need to discuss this change once, we do not support UDS for FNS Blob

Contributor Author

It was not in use anywhere, so removed it for now.

destination, sourceEtag, isAtomicRenameKey(source), tracingContext
);
incrementAbfsRenamePath();
return blobRenameHandler.execute();
Contributor

This might need rechecking. We do not want to return op as null.

Contributor Author

As per the offline discussion, made the changes.

final TracingContext tracingContext,
final boolean isNamespaceEnabled) throws AzureBlobFileSystemException {
getBlobDeleteHandler(path, recursive, tracingContext).execute();
return null;
Contributor

Why are we returning null here?

Contributor

+1

Contributor Author

Done the changes to return dummy response same as rename Path.

@@ -201,4 +238,25 @@ public int getAcquireRetryCount() {
public TracingContext getTracingContext() {
return tracingContext;
}

Contributor

Java docs for all public methods and classes

Contributor Author

@bhattmanish98 Jan 16, 2025

Java doc added.

* limitations under the License.
*/

package org.apache.hadoop.fs.azurebfs.services;
Contributor

Javadocs missing.

Contributor Author

Java doc added.


@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 11 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 3s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 40s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 1m 7s trunk passed
+1 💚 shadedclient 40m 1s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 21s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 4 new + 13 unchanged - 3 fixed = 17 total (was 16)
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 28s hadoop-tools_hadoop-azure-jdkUbuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 javadoc 0m 26s hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 spotbugs 1m 6s the patch passed
+1 💚 shadedclient 39m 57s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 39s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
134m 43s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/7/artifact/out/Dockerfile
GITHUB PR #7265
JIRA Issue HADOOP-19233
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux 79efdf0cd2bb 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 6863cf8
Default Java Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/7/testReport/
Max. process+thread count 535 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 11 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 45s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 41s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 32s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 1m 9s trunk passed
+1 💚 shadedclient 39m 58s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 21s hadoop-tools/hadoop-azure: The patch generated 0 new + 13 unchanged - 3 fixed = 13 total (was 16)
+1 💚 mvnsite 0m 32s the patch passed
+1 💚 javadoc 0m 29s hadoop-tools_hadoop-azure-jdkUbuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 javadoc 0m 25s hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 spotbugs 1m 8s the patch passed
+1 💚 shadedclient 39m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 39s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
134m 12s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/8/artifact/out/Dockerfile
GITHUB PR #7265
JIRA Issue HADOOP-19233
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux fc9e167f5b5d 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b4f157e
Default Java Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/8/testReport/
Max. process+thread count 550 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/8/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

* Blob copy API is an async API, this configuration defines polling duration
* for checking copy status {@value}
*/
public static final String FS_AZURE_BLOB_COPY_PROGRESS_WAIT_MILLIS = "fs.azure.blob.copy.progress.wait.millis";
Contributor

keep the comments formatting constant

Contributor Author

Taken!

@@ -104,5 +104,9 @@ public final class HttpHeaderConfigurations {
*/
public static final String X_MS_BLOB_CONTENT_MD5 = "x-ms-blob-content-md5";

public static final String X_MS_COPY_ID = "x-ms-copy-id";
Contributor

javadocs for new constants

Contributor Author

Taken!

@@ -359,6 +375,34 @@ public AbfsRestOperation listPath(final String relativePath, final boolean recur
return op;
}

private void fixAtomicEntriesInListResults(final AbfsRestOperation op,
Contributor

add javadocs for all methods wherever missing

@@ -338,6 +353,7 @@ public AbfsRestOperation listPath(final String relativePath, final boolean recur
requestHeaders);

op.execute(tracingContext);
fixAtomicEntriesInListResults(op, tracingContext);
Contributor

explain the need for the same

List<BlobListResultEntrySchema> filteredEntries = new ArrayList<>();
for (BlobListResultEntrySchema entry : listResultSchema.paths()) {
if (!takeListPathAtomicRenameKeyAction(entry.path(),
(int) (long) entry.contentLength(), tracingContext)) {
Contributor

double casting is not needed, we can use something like :-
Long longValue = 12345L;
int intValue = longValue.intValue();

Contributor Author

Taken!
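
For reference, the conversions discussed in this thread differ in how they treat values that do not fit in an `int`. The sketch below is illustrative only; `ContentLengthSketch` and its method names are hypothetical and not part of the PR. Both the double cast and `Long.intValue()` silently keep the low 32 bits, whereas `Math.toIntExact` throws on overflow.

```java
/** Hypothetical helper contrasting Long-to-int conversions; not from the PR. */
final class ContentLengthSketch {

  /** Narrowing conversion: silently keeps the low 32 bits, like (int) (long) v. */
  static int toIntTruncating(Long value) {
    return value.intValue();
  }

  /** Checked conversion: throws ArithmeticException if the value overflows int. */
  static int toIntChecked(Long value) {
    return Math.toIntExact(value);
  }
}
```

Which behaviour is appropriate depends on whether an oversized content length should be treated as an error at that call site.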

@@ -45,6 +45,8 @@ public enum AzureServiceErrorCode {
INVALID_SOURCE_OR_DESTINATION_RESOURCE_TYPE("InvalidSourceOrDestinationResourceType", HttpURLConnection.HTTP_CONFLICT, null),
RENAME_DESTINATION_PARENT_PATH_NOT_FOUND("RenameDestinationParentPathNotFound", HttpURLConnection.HTTP_NOT_FOUND, null),
INVALID_RENAME_SOURCE_PATH("InvalidRenameSourcePath", HttpURLConnection.HTTP_CONFLICT, null),
DIRECTORY_NOT_EMPTY_DELETE("DirectoryNotEmpty", HttpURLConnection.HTTP_CONFLICT,
Contributor

naming could be changed

} else {
throw new AbfsRestOperationException(HTTP_INTERNAL_ERROR,
AzureServiceErrorCode.UNKNOWN.getErrorCode(),
"FNS-Blob Rename was not successfull",
Contributor

typo: successful

Contributor

Should add path or file name as well where rename failed

Contributor Author

Taken!

throw ex;
}
}
return true;
Contributor

why are we directly returning true here and not checking for renameSrchasChanged as in the previous method ?

Contributor Author

The if condition above filters out paths that do not need a rename redo and returns false right there. For paths that do require a rename redo, we return true if no exception is raised.


/**
* Orchestrator for delete over Blob endpoint. Blob endpoint for flat-namespace
* account does not support director delete. This class is responsible for
Contributor

typo: directory

Contributor Author

Resolved!

try {
/*
* Delete the required path.
* Directory needs to be safely delete the path, as the path can be implicit.
Contributor

typo: grammar issue

Contributor Author

Updated!

this.isAtomicRenameRecovery = isAtomicRenameRecovery;
}

public BlobRenameHandler(final String src,
Contributor

javadocs

* @return true if the path contains a colon
*/
private boolean containsColon(Path p) {
return p.toUri().getPath().contains(":");
Contributor

COLON constant can be used

Contributor Author

Taken!

@Override
boolean takeAction(final Path path) throws AzureBlobFileSystemException {
return renameInternal(path,
createDestinationPathForBlobPartOfRenameSrcDir(dst, path, src));
Contributor

Method name can be shortened
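
The call in the snippet above derives, for each blob under the source directory, the path it should take under the destination. A minimal string-based sketch of that mapping follows, assuming normalized absolute paths and a non-root source directory. `RenameDestinationSketch` and `destinationFor` are hypothetical names, and the PR's helper operates on Hadoop `Path` objects rather than strings.

```java
/** Hypothetical sketch of mapping a blob under srcDir to its path under dstDir. */
final class RenameDestinationSketch {

  /** Drops a single trailing slash, leaving the root path "/" untouched. */
  private static String stripTrailingSlash(String path) {
    return path.length() > 1 && path.endsWith("/")
        ? path.substring(0, path.length() - 1)
        : path;
  }

  /**
   * Replaces the srcDir prefix of blobPath with dstDir.
   * Assumes srcDir is not the root and all paths are absolute and normalized.
   */
  static String destinationFor(String srcDir, String dstDir, String blobPath) {
    String src = stripTrailingSlash(srcDir);
    String dst = stripTrailingSlash(dstDir);
    if (blobPath.equals(src)) {
      return dst; // the directory marker itself
    }
    // Require a separator after the prefix so "/a/bc" is not treated as under "/a/b".
    if (!blobPath.startsWith(src + "/")) {
      throw new IllegalArgumentException(blobPath + " is not under " + src);
    }
    return dst + blobPath.substring(src.length());
  }
}
```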

try {
AbfsRestOperation copyPathOp = getAbfsClient().copyBlob(src, dst, leaseId,
tracingContext);
final String progress = copyPathOp.getResult()
Contributor

@anmolanmol1234 Jan 15, 2025

check for copyPathOp != null && copyPathOp.getResult() != null

tracingContext, null, false);
final String srcCopyPath = ROOT_PATH + getAbfsClient().getFileSystem()
+ src.toUri().getPath();
if (dstPathStatus.getResult() != null && (srcCopyPath.equals(
Contributor

should have check for dstPathStatus != null as well

}
final long pollWait = getAbfsClient().getAbfsConfiguration()
.getBlobCopyProgressPollWaitMillis();
while (handleCopyInProgress(dst, tracingContext, copyId)
Contributor

check copyId != null

Contributor

If handleCopyInProgress keeps returning PENDING, the code might enter an infinite loop of waiting. We should introduce a maximum wait time and fail if it is exceeded.
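
One possible shape for the bounded wait suggested here is sketched below. It is an assumption-laden illustration, not the PR's implementation: `CopyPollSketch`, `CopyStatus`, and `awaitCopy` are hypothetical, the status check is abstracted as a `Supplier`, and the choice of `IllegalStateException` for the timeout is arbitrary.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of polling an async copy with an upper bound on waiting. */
final class CopyPollSketch {

  /** Copy states as reported by the service for an asynchronous copy. */
  enum CopyStatus { PENDING, SUCCESS, ABORTED, FAILED }

  /**
   * Polls statusCheck every pollWaitMillis until the copy leaves PENDING.
   * Throws IllegalStateException if it is still pending after maxWaitMillis.
   */
  static CopyStatus awaitCopy(Supplier<CopyStatus> statusCheck,
      long pollWaitMillis, long maxWaitMillis) {
    // nanoTime is monotonic, so wall-clock adjustments cannot extend the wait.
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    CopyStatus status = statusCheck.get();
    while (status == CopyStatus.PENDING) {
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException(
            "Copy still pending after " + maxWaitMillis + " ms");
      }
      try {
        Thread.sleep(pollWaitMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for copy", e);
      }
      status = statusCheck.get();
    }
    return status;
  }
}
```

The caller still has to decide how to treat `ABORTED` and `FAILED`; the sketch only guarantees that the loop terminates.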


if (op.getResult() != null && copyId.equals(
op.getResult().getResponseHeader(X_MS_COPY_ID))) {
final String copyStatus = op.getResult()
Contributor

check for response header is not null is needed
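
The null checks requested in this and the preceding comments could be gathered into one guard, roughly as sketched below. This is a hypothetical illustration: `CopyIdMatchSketch` and `statusForCopy` are invented names, and a plain `Map` stands in for the response headers of the real operation result. `x-ms-copy-id` appears in the PR; `x-ms-copy-status` is the service's companion header.

```java
import java.util.Map;

/** Hypothetical null-safe check that a response belongs to the expected copy. */
final class CopyIdMatchSketch {

  static final String X_MS_COPY_ID = "x-ms-copy-id";
  static final String X_MS_COPY_STATUS = "x-ms-copy-status";

  /**
   * Returns the copy status header only when the response carries the expected
   * copy id; returns null if the headers, the id, or the match is missing.
   */
  static String statusForCopy(Map<String, String> responseHeaders,
      String expectedCopyId) {
    if (responseHeaders == null || expectedCopyId == null) {
      return null;
    }
    // Calling equals on the known non-null side tolerates a missing header.
    if (!expectedCopyId.equals(responseHeaders.get(X_MS_COPY_ID))) {
      return null;
    }
    return responseHeaders.get(X_MS_COPY_STATUS);
  }
}
```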

getAbfsClient().checkIsDir(op.getResult()),
extractEtagHeader(op.getResult()),
op.getResult() instanceof AbfsHttpOperation.AbfsHttpOperationWithFixedResultForGetFileStatus);
} catch (AzureBlobFileSystemException e) {
Contributor

@anmolanmol1234 Jan 15, 2025

better to catch AbfsRestOperationException itself ?

Contributor Author

Taken!

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

//import java.util.ArrayList;
Contributor

remove unused imports

Contributor Author

This will go once Ingress handler changes are taken.

renamePendingJsonFormatObj = objectMapper.readValue(contents,
RenamePendingJsonFormat.class);
} catch (JsonProcessingException e) {
return;
Contributor

Is it right to just return without throwing an exception ?

Contributor Author

Yes, this is expected. On a JSON processing error, we delete the JSON file and return.

abfsClient.append(path.toUri().getPath(), bytes,
appendRequestParameters, null, null, tracingContext);

// List<String> blockIdList = new ArrayList<>(Collections.singleton(blockId));
Contributor

Remove commented code

Contributor Author

This will go once Ingress handler changes are taken.

// List<String> blockIdList = new ArrayList<>(Collections.singleton(blockId));
// String blockList = generateBlockListXml(blockIdList);
// PutBlockList on the path.
String blockList = "";
Contributor

if flush is called on empty string, how does it take the blockId into usage ?

Contributor Author

Added this line just to pass the build. The commented-out line above calls generateBlockListXml, which requires the ingress handler changes. I will pick up those changes once they are merged, after which this line will no longer be needed.

* endpoint, the orchestration would be done by the client. The idempotency
* issue would not happen for blob endpoint.
*/
assertTrue(fs.getAbfsClient() instanceof AbfsDfsClient);
Contributor

can use getAbfsServiceType()

@@ -314,4 +347,278 @@ public void deleteBlobDirParallelThreadToDeleteOnDifferentTracingContext()
fs.delete(new Path("/testDir"), true);
fs.close();
}

private void assumeBlobClient() throws IOException {
Contributor

[optional] This method can be avoided and all calls to it can be replaced by getAbfsServiceType() == AbfsServiceType.BLOB

fs.create(new Path("testDir2/test4/file1"));
assertTrue(fs.exists(new Path("testDir2/test1/test2/test3/file")));
assertTrue(fs.exists(new Path("testDir2/test1/test2/test3/file1")));
Assert.assertTrue(fs.rename(new Path("testDir2/test1/test2/test3"),
Contributor

use assertions.assertThat

* API of {@link AzureBlobFileSystem} should recover the paused rename.
*/
@Test
public void testHBaseHandlingForFailedRenameWithListRecovery()
Contributor

This and the test below are almost similar, code can be reused

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 18m 7s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 11 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 40m 26s trunk passed
+1 💚 compile 0m 41s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 checkstyle 0m 31s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 41s trunk passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 33s trunk passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 spotbugs 1m 7s trunk passed
+1 💚 shadedclient 40m 0s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 28s the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
+1 💚 javac 0m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 20s hadoop-tools/hadoop-azure: The patch generated 0 new + 13 unchanged - 3 fixed = 13 total (was 16)
+1 💚 mvnsite 0m 31s the patch passed
+1 💚 javadoc 0m 28s hadoop-tools_hadoop-azure-jdkUbuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 javadoc 0m 26s hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga generated 0 new + 10 unchanged - 1 fixed = 10 total (was 11)
+1 💚 spotbugs 1m 8s the patch passed
+1 💚 shadedclient 40m 24s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 40s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
152m 57s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/9/artifact/out/Dockerfile
GITHUB PR #7265
JIRA Issue HADOOP-19233
Optional Tests dupname asflicense codespell detsecrets xmllint compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle
uname Linux 0fe1b624c68c 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e2ddbdf
Default Java Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu220.04-ga
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/9/testReport/
Max. process+thread count 615 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7265/9/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

testAtomicityRedoInvalidFile(fs);
}

private void testRenamePreRenameFailureResolution(final AzureBlobFileSystem fs)
Contributor

comments for tests

4 participants