docs: Clarifying where customers need the table bucket ARN vs Table ARN #743

Merged 1 commit on Feb 6, 2025
@@ -175,9 +175,9 @@ aws s3 cp s3table-iceberg-pyspark.py s3://<S3_BUCKET>/s3table-example/scripts/

Navigate to the example directory and submit the Spark job.

-### Step 5: Create Amazon S3 Table
+### Step 5: Create Amazon S3 table bucket

-This is the main step where you will create an S3 bucket that will be used for S3 Tables, which your PySpark job will access later.
+This is the main step: you create the S3 table bucket used by S3 Tables, which your PySpark job will access later.

Replace `<S3TABLE_BUCKET_NAME>` with your desired bucket name. Replace `<REGION>` with your AWS region.

@@ -188,7 +188,7 @@ aws s3tables create-table-bucket \
--name "<S3TABLE_BUCKET_NAME>"
```
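
The command prints the new table bucket's ARN, which the next steps need. If you would rather capture it in a shell variable than copy it by hand, a minimal sketch follows; the `--query 'arn'` path assumes the CLI response exposes the ARN under an `arn` field, so adjust it if your CLI version returns a different shape.

```bash
# Minimal sketch: create the table bucket and capture its ARN.
# Assumes the response exposes the ARN under an "arn" field; adjust
# the --query path if your AWS CLI version returns a different shape.
S3TABLE_BUCKET_ARN=$(aws s3tables create-table-bucket \
  --region "<REGION>" \
  --name "<S3TABLE_BUCKET_NAME>" \
  --query 'arn' \
  --output text)

# A table bucket ARN is scoped to the bucket (it ends in bucket/<name>);
# ARNs for individual tables inside the bucket are longer and are not
# what the following steps need.
echo "${S3TABLE_BUCKET_ARN}"
```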

-Make note of the S3TABLE ARN generated by this command. Verify the S3 Table ARN from AWS Console.
+Make note of the S3 table bucket ARN returned by this command, and verify it in the AWS Console.

![S3 table bucket in the AWS Console](img/s3table_bucket.png)
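
If you would rather verify from the CLI than the console, something like the sketch below works; `list-table-buckets` and `get-table-bucket` are `aws s3tables` subcommands, but treat the exact request and output fields as assumptions and check them against your CLI version.

```bash
# Minimal sketch: confirm the table bucket exists and re-read its ARN.
# Replace the placeholders with your region, account ID, and bucket name.
aws s3tables list-table-buckets --region "<REGION>"

# Or look up a single table bucket by the ARN captured earlier:
aws s3tables get-table-bucket \
  --table-bucket-arn "arn:aws:s3tables:<REGION>:<ACCOUNT_ID>:bucket/<S3TABLE_BUCKET_NAME>"
```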

@@ -197,8 +197,8 @@ aws s3tables create-table-bucket \
Update the Spark Operator YAML file as below:

- Open the [s3table-spark-operator.yaml](https://github.com/awslabs/data-on-eks/blob/main/analytics/terraform/spark-k8s-operator/examples/s3-tables/s3table-spark-operator.yaml) file in your preferred text editor.
-- Replace `<S3_BUCKET>` with your S3 bucket created by this blueprint(Check Terraform outputs). S3 Bucket is the place where you copied test data and sample spark job in the above steps.
-- REPLACE `<S3TABLE_ARN>` with your S3 Table ARN captured in the previous step.
+- Replace `<S3_BUCKET>` with the S3 bucket created by this blueprint (check the Terraform outputs). This is the bucket where you copied the test data and the sample Spark job in the steps above.
+- Replace `<S3TABLE_BUCKET_ARN>` with the S3 table bucket ARN captured in the previous step.

You can see a snippet of the Spark Operator job config below.

@@ -227,7 +227,7 @@ spec:
  mainApplicationFile: "s3a://<S3_BUCKET>/s3table-example/scripts/s3table-iceberg-pyspark.py"
  arguments:
    - "s3a://<S3_BUCKET>/s3table-example/input/"
-    - "<S3TABLE_ARN>"
+    - "<S3TABLE_BUCKET_ARN>"
  sparkConf:
    "spark.app.name": "s3table-example"
    "spark.kubernetes.driver.pod.name": "s3table-example"