fns_blob.md changes #151

Open: wants to merge 2 commits into trunk

12 changes: 6 additions & 6 deletions hadoop-tools/hadoop-azure/src/site/markdown/blobEndpoint.md
@@ -15,7 +15,7 @@
# Azure Blob Storage REST API (Blob Endpoint)

## Introduction
-The REST API for Blob Storage defines HTTP operations against the storage account, containers(filesystems), and blobs.(files)
+The REST API for Blob Storage defines HTTP operations against the storage account, containers (filesystems), and blobs (files).
The API includes the operations listed in the following table.

| Operation | Resource Type | Description |
@@ -27,8 +27,8 @@ The API includes the operations listed in the following table.
| [List Blobs](#list-blobs) | Filesystem | Lists the paths under the specified directory inside container acting as hadoop filesystem. |
| [Put Blob](#put-blob) | Path | Creates a new path or updates an existing path under the specified filesystem (container). |
| [Lease Blob](#lease-blob) | Path | Establishes and manages a lease on the specified path. |
-| [Put Block](#put-block) | Path | Appends Data to an already created blob at specified path. |
-| [Put Block List](#put-block-list) | Path | Flushes The Appended Data to the blob at specified path. |
+| [Put Block](#put-block) | Path | Appends data to an already created blob at specified path. |
+| [Put Block List](#put-block-list) | Path | Flushes the appended data to the blob at specified path. |
| [Set Blob Metadata](#set-blob-metadata) | Path | Sets the user-defined attributes of the blob at specified path. |
| [Get Blob Properties](#get-blob-properties) | Path | Gets the user-defined attributes of the blob at specified path. |
| [Get Blob](#get-blob) | Path | Reads data from the blob at specified path. |
@@ -43,7 +43,7 @@ already exists, the operation fails.
Rest API Documentation: [Create Container](https://docs.microsoft.com/en-us/rest/api/storageservices/create-container)

## Delete Container
-The Delete Container operation marks the specified container for deletion. The container and any blobs contained within it.
+The Delete Container operation marks the specified container and any blobs contained within it for deletion.
Rest API Documentation: [Delete Container](https://docs.microsoft.com/en-us/rest/api/storageservices/delete-container)

## Set Container Metadata
@@ -67,7 +67,7 @@ Partial updates are not supported with Put Blob
Rest API Documentation: [Put Blob](https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob)

## Lease Blob
-The Lease Blob operation creates and manages a lock on a blob for write and delete operations. The lock duration can be 15 to 60 seconds, or can be infinite.
+The Lease Blob operation creates and manages a lock on a blob for create, open-for-write, and rename operations. The lock duration can be 15 to 60 seconds, or can be infinite.
Rest API Documentation: [Lease Blob](https://docs.microsoft.com/en-us/rest/api/storageservices/lease-blob)

## Put Block
@@ -104,4 +104,4 @@ Rest API Documentation: [Copy Blob](https://docs.microsoft.com/en-us/rest/api/st

## Append Block
The Append Block operation commits a new block of data to the end of an existing append blob.
-Rest API Documentaion: [Append Block](https://learn.microsoft.com/en-us/rest/api/storageservices/append-block)
+Rest API Documentation: [Append Block](https://learn.microsoft.com/en-us/rest/api/storageservices/append-block)
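
The Put Block and Put Block List operations in the table work as a pair: each Put Block call (`PUT https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME/BLOB_NAME?comp=block&blockid=...`) uploads one uncommitted block, and a Put Block List call (`?comp=blocklist`) commits the listed block IDs, in order, as the blob's content. The request body below is a minimal sketch of that commit step. The block IDs are placeholders for Base64-encoded IDs, and this is not necessarily the exact body the ABFS driver sends.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Body of a Put Block List request (PUT ...?comp=blocklist). -->
<!-- Placeholder IDs: real block IDs are Base64-encoded strings of equal length. -->
<BlockList>
  <!-- Latest: look for the block in the uncommitted list first,
       then in the committed list. -->
  <Latest>BASE64_BLOCK_ID_1</Latest>
  <Latest>BASE64_BLOCK_ID_2</Latest>
</BlockList>
```

The order of the entries defines the order of the data in the committed blob.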
59 changes: 26 additions & 33 deletions hadoop-tools/hadoop-azure/src/site/markdown/fns_blob.md
@@ -18,10 +18,10 @@
The ABFS driver is recommended to be used only with HNS Enabled ADLS Gen-2 accounts
for big data analytics because of being more performant and scalable.

-However, to enable users of legacy WASB Driver to migrate to ABFS driver without
-needing them to upgrade their general purpose V2 accounts (HNS-Disabled), Support
+However, to allow users of legacy WASB Driver to migrate to ABFS driver without
+requiring them to upgrade their general purpose V2 accounts (HNS-Disabled), support
for FNS accounts is being added to ABFS driver.
-Refer to [WASB Deprication](./wasb.html) for more details.
+Refer to [WASB Deprecation](./wasb.html) documentation for more details.

## Azure Service Endpoints Used by ABFS Driver
Azure Services offers two set of endpoints for interacting with storage accounts:
@@ -38,7 +38,7 @@ HNS Enabled accounts will still use DFS Endpoint which continues to be the
recommended stack based on performance and feature capabilities.

## Configuring ABFS Driver for FNS Accounts
-Following configurations will be introduced to configure ABFS Driver for FNS Accounts:
+The following configurations have been introduced to configure the ABFS driver for FNS accounts:
1. Account Type: Must be set to `false` to indicate FNS Account
```xml
<property>
@@ -47,31 +47,31 @@ Following configurations will be introduced to configure ABFS Driver for FNS Acc
</property>
```

-2. Account Url: It is the URL used to initialize the file system. It is either passed
-directly to file system or configured as default uri using "fs.DefaultFS" configuration.
-In both the cases the URL used must be the blob endpoint url of the account.
+2. Account Url: It is the URL used to initialize the file system. It is either passed
+directly to the file system or configured as the default URI using the "fs.defaultFS" configuration.
+In both cases the URL used must be the blob endpoint URL of the account.
```xml
<property>
<name>fs.defaultFS</name>
<value>abfss://CONTAINER_NAME@ACCOUNT_NAME.blob.core.windows.net</value>
</property>
```
-3. Service Type for FNS Accounts: This will allow an override to choose service
-type specially in cases where any local DNS resolution is set for the account and driver is
-unable to detect the intended endpoint from above configured URL. If this is set
-to blob for HNS Enabled Accounts, FS init will fail with InvalidConfiguration error.
+3. Service Type for FNS Accounts: This allows an override to choose the service
+type, especially in cases where local DNS resolution is set for the account and the driver is
+unable to detect the intended endpoint from the URL configured above. If this is set
+to blob for HNS-enabled accounts, FS initialization will fail with an InvalidConfiguration error.
```xml
<property>
<name>fs.azure.fns.account.service.type</name>
<value>BLOB</value>
</property>
```

-4. Service Type for Ingress Operations: This will allow an override to choose service
-type only for Ingress Related Operations like [Create](./blobEndpoint.html#put-blob),
-[Append](./blobEndpoint.html#put-block),
-and [Flush](./blobEndpoint.html#put-block-list). All other operations will still use the
-configured service type.
+4. Service Type for Ingress Operations: This allows an override to choose the service
+type only for ingress-related operations such as [Create](./blobEndpoint.html#put-blob),
+[Append](./blobEndpoint.html#put-block),
+and [Flush](./blobEndpoint.html#put-block-list). All other operations will still use the
+configured service type.
```xml
<property>
<name>fs.azure.ingress.service.type</name>
@@ -106,40 +106,33 @@ The following configs are related to rename and delete operations.
- `fs.azure.blob.copy.max.wait.millis`: Maximum time to wait for a blob copy
operation to complete. The default value is 5 minutes.

-- `fs.azure.blob.atomic.rename.lease.refresh.duration`: Blob rename lease
-refresh
+- `fs.azure.blob.atomic.rename.lease.refresh.duration`: Blob rename lease refresh
duration in milliseconds. This setting ensures that the lease on the blob is
-periodically refreshed during a rename operation to prevent other operations
+periodically refreshed during a rename operation, preventing other operations
from interfering.
The default value is 60 seconds.

-- `fs.azure.blob.dir.list.producer.queue.max.size`: Maximum number of blob
-entries
+- `fs.azure.blob.dir.list.producer.queue.max.size`: Maximum number of blob entries
enqueued in memory for rename or delete orchestration. The default value is 2
times the default value of list max results, which is 5000, making the current
value 10000.

- `fs.azure.blob.dir.list.consumer.max.lag`: It sets a limit on how much blob
information can be waiting to be processed (consumer lag) during a blob
-listing
-operation. If the amount of unprocessed blob information exceeds this limit,
-the
-producer will pause until the consumer catches up and the lag becomes
+listing operation. If the amount of unprocessed blob information exceeds this limit,
+the producer will pause until the consumer catches up and the lag becomes
manageable. The default value is equal to the value of default value of list
-max
-results which is 5000 currently.
+max results, which is currently 5000.

- `fs.azure.blob.dir.rename.max.thread`: Maximum number of threads per blob
-rename
-orchestration. The default value is 5.
+rename orchestration. The default value is 5.

-- `fs.azure.blob.dir.delete.max.thread`: Maximum number of thread per
-blob-delete
-orchestration. The default value currently is 5.
+- `fs.azure.blob.dir.delete.max.thread`: Maximum number of threads per blob
+delete orchestration. The default value is currently 5.

## Features currently not supported

-1. **User Delegation SAS** feature is currently not supported but we
+1. **User Delegation SAS** feature is currently not supported, but we
plan to bring support for it in the future.
Jira to track this
workitem : https://issues.apache.org/jira/browse/HADOOP-19406.
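
The rename and delete configs listed in the fns_blob.md changes above can be set in `core-site.xml`. The fragment below is a sketch that restates the documented defaults as explicit values, so each one can be tuned from a known baseline. It assumes `fs.azure.blob.copy.max.wait.millis` takes milliseconds, as its name suggests (5 minutes = 300000 ms); the lease refresh duration is documented as milliseconds (60 seconds = 60000 ms).

```xml
<configuration>
  <!-- Max time to wait for a blob copy: 5 minutes, in ms (unit assumed from the name). -->
  <property>
    <name>fs.azure.blob.copy.max.wait.millis</name>
    <value>300000</value>
  </property>
  <!-- Lease refresh interval during a rename: 60 seconds, in ms. -->
  <property>
    <name>fs.azure.blob.atomic.rename.lease.refresh.duration</name>
    <value>60000</value>
  </property>
  <!-- Max blob entries queued in memory: 2 x 5000 (default list max results). -->
  <property>
    <name>fs.azure.blob.dir.list.producer.queue.max.size</name>
    <value>10000</value>
  </property>
  <!-- Max consumer lag before the producer pauses: equals list max results. -->
  <property>
    <name>fs.azure.blob.dir.list.consumer.max.lag</name>
    <value>5000</value>
  </property>
  <!-- Threads per rename orchestration and per delete orchestration. -->
  <property>
    <name>fs.azure.blob.dir.rename.max.thread</name>
    <value>5</value>
  </property>
  <property>
    <name>fs.azure.blob.dir.delete.max.thread</name>
    <value>5</value>
  </property>
</configuration>
```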