---
layout: post
title: EMR Serverless Cross-Account Access to Iceberg Tables
subtitle: Learn how to make your Iceberg tables available for Spark jobs running cross-account on EMR Serverless
tags: blog
comments: false
---

A few weeks ago, I was working on a project where I had to access Iceberg tables from a Spark job running on EMR Serverless in another account. It was a bit tricky to set up, so I wrote this post to help anyone facing the same issue.

If you follow the EMR documentation on how to access Iceberg tables, you'll find the following recommended spark-submit parameters:

--conf spark.jars=/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
--conf spark.sql.catalog.<YOUR_CATALOG_NAME_HERE>=org.apache.iceberg.spark.SparkCatalog 
--conf spark.sql.catalog.<YOUR_CATALOG_NAME_HERE>.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog 
--conf spark.sql.catalog.<YOUR_CATALOG_NAME_HERE>.warehouse=s3://DOC-EXAMPLE-BUCKET/EXAMPLE-PREFIX/
--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory

After setting those and configuring cross-account access on the AWS Glue Data Catalog in the account where the Iceberg table lives, you'll get an error similar to `org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table table_name. StorageDescriptor#InputFormat cannot be null for table: table_name(Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)`.
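For reference, the cross-account access mentioned above is usually granted with a Glue resource policy in the account that owns the table. The sketch below builds a read-only policy document. The function name, the example role ARN, and the choice of `glue:GetDatabase` and `glue:GetTable` are my assumptions for illustration; your setup may need more actions (for example, for writes), and the job role still needs its own IAM permissions and access to the S3 data.

```python
import json


def build_glue_resource_policy(owner_account_id, region, consumer_role_arn, database, table):
    """Build a read-only Glue resource policy for one table (illustrative sketch)."""
    base = f"arn:aws:glue:{region}:{owner_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Role used by the Spark job in the other account (hypothetical ARN).
                "Principal": {"AWS": consumer_role_arn},
                # Assumed minimum for reads; extend if your job also writes.
                "Action": ["glue:GetDatabase", "glue:GetTable"],
                "Resource": [
                    f"{base}:catalog",
                    f"{base}:database/{database}",
                    f"{base}:table/{database}/{table}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    policy = build_glue_resource_policy(
        owner_account_id="111122223333",
        region="us-east-1",
        consumer_role_arn="arn:aws:iam::444455556666:role/emr-serverless-job-role",
        database="sales",
        table="orders",
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON could then be applied in the table owner's account, for example through the Glue console or the `put_resource_policy` call of the boto3 Glue client.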

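Assuming the permissions are set up correctly, you can fix the `StorageDescriptor#InputFormat cannot be null` error by adding the parameter below. The `glue.id` property tells Iceberg's GlueCatalog which account's Glue Data Catalog to use; without it, the catalog defaults to the account of the job's own credentials.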

--conf spark.sql.catalog.<YOUR_CATALOG_NAME_HERE>.glue.id=<ICEBERG_TABLE_ACCOUNT_ID>
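If you submit jobs programmatically, it can help to assemble all of these flags in one place. Here is a minimal sketch that joins the parameters from this post, including `glue.id`, into a single string. The function name and the example values (`glue_catalog`, the account ID) are hypothetical, and the jar path is simply the one used above.

```python
import re


def build_spark_submit_parameters(catalog_name: str, warehouse: str, glue_account_id: str) -> str:
    """Join the --conf flags from this post into one spark-submit parameter string."""
    if not re.fullmatch(r"\d{12}", glue_account_id):
        raise ValueError("glue_account_id must be a 12-digit AWS account ID")

    prefix = f"spark.sql.catalog.{catalog_name}"
    confs = {
        "spark.jars": "/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar",
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.warehouse": warehouse,
        # Account that owns the Iceberg table's Glue Data Catalog.
        f"{prefix}.glue.id": glue_account_id,
        "spark.hadoop.hive.metastore.client.factory.class": (
            "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        ),
    }
    # Dicts keep insertion order, so the flags come out in the order listed above.
    return " ".join(f"--conf {key}={value}" for key, value in confs.items())


if __name__ == "__main__":
    print(
        build_spark_submit_parameters(
            catalog_name="glue_catalog",
            warehouse="s3://DOC-EXAMPLE-BUCKET/EXAMPLE-PREFIX/",
            glue_account_id="111122223333",
        )
    )
```

On EMR Serverless, a string like this would typically go into the `sparkSubmitParameters` field of the job driver when starting a job run.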

After this, you can query the table with Spark SQL:

SELECT *
FROM <YOUR_CATALOG_NAME_HERE>.<DATABASE>.<TABLE_NAME>
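The same three-part name works from PySpark. Below is a small sketch, assuming plain unquoted identifiers (letters, digits, underscores); the helper names are mine, and `spark` is whatever `SparkSession` your job already has.

```python
import re

# Assumption: simple identifiers only; names needing backtick quoting are rejected.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified_table_name(catalog: str, database: str, table: str) -> str:
    """Return catalog.database.table after a basic identifier check."""
    for part in (catalog, database, table):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"{catalog}.{database}.{table}"


def read_iceberg_table(spark, catalog: str, database: str, table: str):
    """Load the cross-account Iceberg table as a DataFrame via SparkSession.table."""
    return spark.table(qualified_table_name(catalog, database, table))
```

Inside a job, `read_iceberg_table(spark, "glue_catalog", "sales", "orders").show()` would be the rough equivalent of the query above, with `glue_catalog`, `sales`, and `orders` standing in for your own names.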

That's all for this post, hope it helps!

2024-08-26-emr-cross-account-to-iceberg.md
