diff --git a/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md b/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
index fdf366f95d34b..037212e79a094 100644
--- a/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
+++ b/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
@@ -69,7 +69,7 @@ with Hierarchical Namespaces.
## Hierarchical Namespaces (and WASB Compatibility)
A key aspect of ADLS Gen 2 is its support for
-[hierachical namespaces](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace)
+[hierarchical namespaces](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace)
These are effectively directories and offer high performance rename and delete operations
—something which makes a significant improvement in performance in query engines
writing data to, including MapReduce, Spark, Hive, as well as DistCp.
@@ -297,7 +297,7 @@ This is shown in the Authentication section.
## Authentication
-Authentication for ABFS is ultimately granted by [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/develop/authentication-scenarios).
+Authentication for ABFS is ultimately granted by [Azure Active Directory](https://docs.microsoft.com/en-us/azure/active-directory/develop/authentication-scenarios) (now Microsoft Entra ID).
The concepts covered there are beyond the scope of this document to cover;
developers are expected to have read and understood the concepts therein
@@ -332,7 +332,7 @@ possible
### AAD Token fetch retries
-The exponential retry policy used for the AAD token fetch retries can be tuned
+The exponential retry policy used for the AAD (now Entra ID) token fetch retries can be tuned
with the following configurations.
* `fs.azure.oauth.token.fetch.retry.max.retries`: Sets the maximum number of
retries. Default value is 5.
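For instance, capping the token fetch retries below the default is a single property; a minimal sketch (the companion backoff settings listed in this section follow the same `<property>` pattern):
```xml
<property>
  <name>fs.azure.oauth.token.fetch.retry.max.retries</name>
  <!-- Illustrative value; the documented default is 5 -->
  <value>3</value>
</property>
```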
@@ -652,8 +652,7 @@ CustomDelegationTokenManager interface.
{fully-qualified-class-name-for-implementation-of-CustomDelegationTokenManager-interface}
```
-In case delegation token is enabled, and the config `fs.azure.delegation.token
-.provider.type` is not provided then an IlleagalArgumentException is thrown.
+If delegation token support is enabled and the config `fs.azure.delegation.token.provider.type` is not provided, an IllegalArgumentException is thrown.
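As a rough sketch of wiring this up (the class name `com.example.MyDelegationTokenManager` is a hypothetical implementation of the CustomDelegationTokenManager interface, and `fs.azure.enable.delegation.token` is assumed to be the switch that enables the feature):
```xml
<property>
  <name>fs.azure.enable.delegation.token</name>
  <value>true</value>
</property>
<property>
  <name>fs.azure.delegation.token.provider.type</name>
  <!-- Hypothetical class implementing CustomDelegationTokenManager -->
  <value>com.example.MyDelegationTokenManager</value>
</property>
```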
### Shared Access Signature (SAS) Token Provider
@@ -663,7 +662,7 @@ To know more about how SAS Authentication works refer to
[Grant limited access to Azure Storage resources using shared access signatures (SAS)](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview)
There are three types of SAS supported by Azure Storage:
-- [User Delegation SAS](https://learn.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas): Recommended for use with ABFS Driver with HNS Enabled ADLS Gen2 accounts. It is Identity based SAS that works at blob/directory level)
+- [User Delegation SAS](https://learn.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas): Recommended for use with ABFS Driver with HNS Enabled ADLS Gen2 accounts. It is an identity-based SAS that works at blob/directory level.
- [Service SAS](https://learn.microsoft.com/en-us/rest/api/storageservices/create-service-sas): Global and works at container level.
- [Account SAS](https://learn.microsoft.com/en-us/rest/api/storageservices/create-account-sas): Global and works at account level.
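As a minimal sketch of plugging in a SAS token provider (the class name `com.example.MySASTokenProvider` is hypothetical; `fs.azure.account.auth.type` and `fs.azure.sas.token.provider.type` are assumed to be the relevant settings, which are covered in more detail later in this section):
```xml
<property>
  <name>fs.azure.account.auth.type</name>
  <value>SAS</value>
</property>
<property>
  <name>fs.azure.sas.token.provider.type</name>
  <!-- Hypothetical implementation of the SASTokenProvider extension point -->
  <value>com.example.MySASTokenProvider</value>
</property>
```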
@@ -754,16 +753,16 @@ requests. User can specify them as fixed SAS Token to be used across all the req
```
- 1. Fixed SAS Token:
- ```xml
- <property>
-   <name>fs.azure.sas.fixed.token</name>
-   <value>FIXED_SAS_TOKEN</value>
- </property>
- ```
+ 2. Account SAS (Fixed SAS Token at Account Level):
+ ```xml
+ <property>
+   <name>fs.azure.sas.fixed.token</name>
+   <value>FIXED_SAS_TOKEN</value>
+ </property>
+ ```
- Replace `FIXED_SAS_TOKEN` with fixed Account/Service SAS. You can also
-generate SAS from Azure portal. Account -> Security + Networking -> Shared Access Signature
+ - Replace `FIXED_SAS_TOKEN` with the fixed Account/Service SAS. You can also
+ generate a SAS from the Azure portal: Account -> Security + Networking -> Shared Access Signature
- **Security**: Account/Service SAS requires account keys to be used which makes
them less secure. There is no scope of having delegated access to different users.
@@ -864,16 +863,16 @@ Azure OAuth tokens.
Consult the source in `org.apache.hadoop.fs.azurebfs.extensions`
and all associated tests to see how to make use of these extension points.
-_Warning_ These extension points are unstable.
+_Warning_: These extension points are unstable.
### Networking Layer:
ABFS Driver can use the following networking libraries:
- ApacheHttpClient:
- Library Documentation.
- - Default networking library.
- JDK networking library:
- Library documentation.
+ - Default networking library.
The networking library can be configured using the configuration `fs.azure.networking.library`
while initializing the filesystem.
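A minimal sketch of selecting the networking library, assuming the value names `APACHE_HTTP_CLIENT` and `JDK_HTTP_URL_CONNECTION` (verify the accepted values against the driver's configuration reference):
```xml
<property>
  <name>fs.azure.networking.library</name>
  <!-- Assumed value name for the Apache client; JDK_HTTP_URL_CONNECTION would select the JDK library -->
  <value>APACHE_HTTP_CLIENT</value>
</property>
```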
@@ -1007,13 +1006,13 @@ greater than or equal to 0.
retries of IO operations. Currently this is used only for the server call retry
logic. Used within `AbfsClient` class as part of the ExponentialRetryPolicy. This
value indicates the smallest interval (in milliseconds) to wait before retrying
-an IO operation. The default value is 3000 (3 seconds).
+an IO operation. The default value is 500 milliseconds.
`fs.azure.io.retry.max.backoff.interval`: Sets the maximum backoff interval for
retries of IO operations. Currently this is used only for the server call retry
logic. Used within `AbfsClient` class as part of the ExponentialRetryPolicy. This
value indicates the largest interval (in milliseconds) to wait before retrying
-an IO operation. The default value is 30000 (30 seconds).
+an IO operation. The default value is 25000 (25 seconds).
`fs.azure.io.retry.backoff.interval`: Sets the default backoff interval for
retries of IO operations. Currently this is used only for the server call retry
@@ -1023,7 +1022,7 @@ value. This random delta is then multiplied by an exponent of the current IO
retry number (i.e., the default is multiplied by `2^(retryNum - 1)`) and then
contstrained within the range of [`fs.azure.io.retry.min.backoff.interval`,
`fs.azure.io.retry.max.backoff.interval`] to determine the amount of time to
-wait before the next IO retry attempt. The default value is 3000 (3 seconds).
+wait before the next IO retry attempt. The default value is 500 milliseconds.
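The three backoff intervals can be tuned together; a sketch using the defaults documented above (all values in milliseconds):
```xml
<property>
  <name>fs.azure.io.retry.min.backoff.interval</name>
  <value>500</value>
</property>
<property>
  <name>fs.azure.io.retry.backoff.interval</name>
  <value>500</value>
</property>
<property>
  <name>fs.azure.io.retry.max.backoff.interval</name>
  <value>25000</value>
</property>
```
Keeping the minimum and default intervals well below the maximum preserves the exponential growth of the delay across successive retries.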
`fs.azure.write.request.size`: To set the write buffer size. Specify the value
in bytes. The value should be between 16384 to 104857600 both inclusive (16 KB
@@ -1361,9 +1360,9 @@ Operation failed: "Server failed to authenticate the request.
Causes include:
* Your credentials are incorrect.
-* Your shared secret has expired. in Azure, this happens automatically
+* Your shared secret has expired. In Azure, this happens automatically.
* Your shared secret has been revoked.
-* host/VM clock drift means that your client's clock is out of sync with the
+* Host/VM clock drift means that your client's clock is out of sync with the
Azure servers —the call is being rejected as it is either out of date (considered a replay)
or from the future. Fix: Check your clocks, etc.
diff --git a/hadoop-tools/hadoop-azure/src/site/markdown/blobEndpoint.md b/hadoop-tools/hadoop-azure/src/site/markdown/blobEndpoint.md
index 07c499cea5db8..5116ded54f09d 100644
--- a/hadoop-tools/hadoop-azure/src/site/markdown/blobEndpoint.md
+++ b/hadoop-tools/hadoop-azure/src/site/markdown/blobEndpoint.md
@@ -15,7 +15,7 @@
# Azure Blob Storage REST API (Blob Endpoint)
## Introduction
-The REST API for Blob Storage defines HTTP operations against the storage account, containers(filesystems), and blobs.(files)
+The REST API for Blob Storage defines HTTP operations against the storage account, containers (filesystems), and blobs (files).
The API includes the operations listed in the following table.
| Operation | Resource Type | Description |
@@ -27,8 +27,8 @@ The API includes the operations listed in the following table.
| [List Blobs](#list-blobs) | Filesystem | Lists the paths under the specified directory inside container acting as hadoop filesystem. |
| [Put Blob](#put-blob) | Path | Creates a new path or updates an existing path under the specified filesystem (container). |
| [Lease Blob](#lease-blob) | Path | Establishes and manages a lease on the specified path. |
-| [Put Block](#put-block) | Path | Appends Data to an already created blob at specified path. |
-| [Put Block List](#put-block-list) | Path | Flushes The Appended Data to the blob at specified path. |
+| [Put Block](#put-block) | Path | Appends data to an already created blob at specified path. |
+| [Put Block List](#put-block-list) | Path | Flushes the appended data to the blob at specified path. |
| [Set Blob Metadata](#set-blob-metadata) | Path | Sets the user-defined attributes of the blob at specified path. |
| [Get Blob Properties](#get-blob-properties) | Path | Gets the user-defined attributes of the blob at specified path. |
| [Get Blob](#get-blob) | Path | Reads data from the blob at specified path. |
@@ -43,7 +43,7 @@ already exists, the operation fails.
Rest API Documentation: [Create Container](https://docs.microsoft.com/en-us/rest/api/storageservices/create-container)
## Delete Container
-The Delete Container operation marks the specified container for deletion. The container and any blobs contained within it.
+The Delete Container operation marks the specified container and any blobs contained within it for deletion.
Rest API Documentation: [Delete Container](https://docs.microsoft.com/en-us/rest/api/storageservices/delete-container)
## Set Container Metadata
@@ -67,7 +67,7 @@ Partial updates are not supported with Put Blob
Rest API Documentation: [Put Blob](https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob)
## Lease Blob
-The Lease Blob operation creates and manages a lock on a blob for write and delete operations. The lock duration can be 15 to 60 seconds, or can be infinite.
+The Lease Blob operation creates and manages a lock on a blob for operations such as creating a file, opening a file for write, and renaming. The lock duration can be 15 to 60 seconds, or can be infinite.
Rest API Documentation: [Lease Blob](https://docs.microsoft.com/en-us/rest/api/storageservices/lease-blob)
## Put Block
@@ -104,4 +104,4 @@ Rest API Documentation: [Copy Blob](https://docs.microsoft.com/en-us/rest/api/st
## Append Block
The Append Block operation commits a new block of data to the end of an existing append blob.
-Rest API Documentaion: [Append Block](https://learn.microsoft.com/en-us/rest/api/storageservices/append-block)
\ No newline at end of file
+Rest API Documentation: [Append Block](https://learn.microsoft.com/en-us/rest/api/storageservices/append-block)
\ No newline at end of file
diff --git a/hadoop-tools/hadoop-azure/src/site/markdown/fns_blob.md b/hadoop-tools/hadoop-azure/src/site/markdown/fns_blob.md
index 27934b2e25aa5..f104dfd463e59 100644
--- a/hadoop-tools/hadoop-azure/src/site/markdown/fns_blob.md
+++ b/hadoop-tools/hadoop-azure/src/site/markdown/fns_blob.md
@@ -18,10 +18,10 @@
The ABFS driver is recommended to be used only with HNS Enabled ADLS Gen-2 accounts
for big data analytics because of being more performant and scalable.
-However, to enable users of legacy WASB Driver to migrate to ABFS driver without
-needing them to upgrade their general purpose V2 accounts (HNS-Disabled), Support
+However, to allow users of the legacy WASB Driver to migrate to the ABFS driver without
+requiring them to upgrade their general purpose V2 accounts (HNS-Disabled), support
for FNS accounts is being added to ABFS driver.
-Refer to [WASB Deprication](./wasb.html) for more details.
+Refer to the [WASB Deprecation](./wasb.html) documentation for more details.
## Azure Service Endpoints Used by ABFS Driver
Azure Services offers two set of endpoints for interacting with storage accounts:
@@ -38,7 +38,7 @@ HNS Enabled accounts will still use DFS Endpoint which continues to be the
recommended stack based on performance and feature capabilities.
## Configuring ABFS Driver for FNS Accounts
-Following configurations will be introduced to configure ABFS Driver for FNS Accounts:
+The following configurations have been introduced to configure the ABFS Driver for FNS Accounts:
1. Account Type: Must be set to `false` to indicate FNS Account
```xml
@@ -47,19 +47,19 @@ Following configurations will be introduced to configure ABFS Driver for FNS Acc
```
-2. Account Url: It is the URL used to initialize the file system. It is either passed
-directly to file system or configured as default uri using "fs.DefaultFS" configuration.
-In both the cases the URL used must be the blob endpoint url of the account.
+2. Account Url: It is the URL used to initialize the file system. It can either be passed
+   directly to the file system or configured as the default URI using the "fs.defaultFS" configuration.
+   In both cases the URL used must be the blob endpoint URL of the account.
```xml
<property>
  <name>fs.defaultFS</name>
  <value>abfss://CONTAINER_NAME@ACCOUNT_NAME.blob.core.windows.net</value>
</property>
```
-3. Service Type for FNS Accounts: This will allow an override to choose service
-type specially in cases where any local DNS resolution is set for the account and driver is
-unable to detect the intended endpoint from above configured URL. If this is set
-to blob for HNS Enabled Accounts, FS init will fail with InvalidConfiguration error.
+3. Service Type for FNS Accounts: This allows an override to choose the service
+   type, especially in cases where local DNS resolution is set for the account and the driver is
+   unable to detect the intended endpoint from the configured URL above. If this is set
+   to blob for HNS-enabled accounts, FS initialization will fail with an InvalidConfiguration error.
```xml
<name>fs.azure.fns.account.service.type</name>
@@ -67,11 +67,11 @@ to blob for HNS Enabled Accounts, FS init will fail with InvalidConfiguration er
```
-4. Service Type for Ingress Operations: This will allow an override to choose service
-type only for Ingress Related Operations like [Create](./blobEndpoint.html#put-blob),
-[Append](./blobEndpoint.html#put-block),
-and [Flush](./blobEndpoint.html#put-block-list). All other operations will still use the
-configured service type.
+4. Service Type for Ingress Operations: This allows an override to choose the service
+   type only for ingress-related operations like [Create](./blobEndpoint.html#put-blob),
+   [Append](./blobEndpoint.html#put-block),
+   and [Flush](./blobEndpoint.html#put-block-list). All other operations will still use the
+   configured service type.
```xml
<name>fs.azure.ingress.service.type</name>
@@ -106,40 +106,33 @@ The following configs are related to rename and delete operations.
- `fs.azure.blob.copy.max.wait.millis`: Maximum time to wait for a blob copy
operation to complete. The default value is 5 minutes.
-- `fs.azure.blob.atomic.rename.lease.refresh.duration`: Blob rename lease
- refresh
+- `fs.azure.blob.atomic.rename.lease.refresh.duration`: Blob rename lease refresh
duration in milliseconds. This setting ensures that the lease on the blob is
- periodically refreshed during a rename operation to prevent other operations
+  periodically refreshed during a rename operation, preventing other operations
from interfering.
The default value is 60 seconds.
-- `fs.azure.blob.dir.list.producer.queue.max.size`: Maximum number of blob
- entries
+- `fs.azure.blob.dir.list.producer.queue.max.size`: Maximum number of blob entries
enqueued in memory for rename or delete orchestration. The default value is 2
times the default value of list max results, which is 5000, making the current
value 10000.
- `fs.azure.blob.dir.list.consumer.max.lag`: It sets a limit on how much blob
information can be waiting to be processed (consumer lag) during a blob
- listing
- operation. If the amount of unprocessed blob information exceeds this limit,
- the
- producer will pause until the consumer catches up and the lag becomes
+ listing operation. If the amount of unprocessed blob information exceeds this limit,
+ the producer will pause until the consumer catches up and the lag becomes
manageable. The default value is equal to the value of default value of list
- max
- results which is 5000 currently.
+ max results which is 5000 currently.
- `fs.azure.blob.dir.rename.max.thread`: Maximum number of threads per blob
- rename
- orchestration. The default value is 5.
+ rename orchestration. The default value is 5.
-- `fs.azure.blob.dir.delete.max.thread`: Maximum number of thread per
- blob-delete
- orchestration. The default value currently is 5.
+- `fs.azure.blob.dir.delete.max.thread`: Maximum number of threads per blob
+  delete orchestration. The default value currently is 5 (a combined example
+  follows this list).
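A combined sketch of the rename/delete tuning knobs above, using the defaults stated in this list (the copy wait and lease refresh settings are in milliseconds):
```xml
<property>
  <name>fs.azure.blob.copy.max.wait.millis</name>
  <value>300000</value> <!-- 5 minutes -->
</property>
<property>
  <name>fs.azure.blob.atomic.rename.lease.refresh.duration</name>
  <value>60000</value> <!-- 60 seconds -->
</property>
<property>
  <name>fs.azure.blob.dir.list.producer.queue.max.size</name>
  <value>10000</value>
</property>
<property>
  <name>fs.azure.blob.dir.list.consumer.max.lag</name>
  <value>5000</value>
</property>
<property>
  <name>fs.azure.blob.dir.rename.max.thread</name>
  <value>5</value>
</property>
<property>
  <name>fs.azure.blob.dir.delete.max.thread</name>
  <value>5</value>
</property>
```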
## Features currently not supported
-1. **User Delegation SAS** feature is currently not supported but we
+1. **User Delegation SAS** feature is currently not supported, but we
plan to bring support for it in the future.
Jira to track this
workitem : https://issues.apache.org/jira/browse/HADOOP-19406.
diff --git a/hadoop-tools/hadoop-azure/src/site/markdown/wasb.md b/hadoop-tools/hadoop-azure/src/site/markdown/wasb.md
index 270fd14da4c44..e176a5d890139 100644
--- a/hadoop-tools/hadoop-azure/src/site/markdown/wasb.md
+++ b/hadoop-tools/hadoop-azure/src/site/markdown/wasb.md
@@ -16,7 +16,7 @@
## Introduction
WASB Driver is a legacy Hadoop File System driver that was developed to support
-[FNS(FlatNameSpace) Azure Storage accounts](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
+[FNS (FlatNameSpace) Azure Storage accounts](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
that do not honor File-Folder syntax.
HDFS Folder operations hence are mimicked at client side by WASB driver and
certain folder operations like Rename and Delete can lead to a lot of IOPs with
@@ -93,5 +93,4 @@ Refer to [ABFS Authentication](abfs.html/authentication) for more details.
### ABFS Features Not Available for migrating Users
Certain features of ABFS Driver will be available only to users using HNS accounts with ABFS driver.
-1. ABFS Driver's SAS Token Provider plugin for UserDelegation SAS and Fixed SAS.
-2. Client Provided Encryption Key (CPK) support for Data ingress and egress.
+1. Client Provided Encryption Key (CPK) support for Data ingress and egress.