
Connect to S3 catalog #1683

Open · IanVlasov opened this issue Feb 19, 2025 · 1 comment

IanVlasov commented Feb 19, 2025

Question

Hello!
I am rather new to Iceberg; could you help me with the following question?

I have a catalog stored on S3, with the metadata stored alongside the data; the structure is simple:

.
└── iceberg/
    └── table_name/
        ├── data/
        │   └── *.parquet
        └── metadata/
            └── *.avro

I can successfully connect to it using Spark with the following parameters:

.config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
.config("spark.sql.defaultCatalog", "iceberg")
.config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
.config("spark.sql.catalog.iceberg.warehouse", "s3a://bucket_name/iceberg")
.config("spark.sql.catalog.iceberg.type", "hadoop")
.config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")

But PyIceberg insists on a metastore URI, and I don't know where to find one.
Is there any way to define a configuration that will work in this case?

kevinjqliu (Contributor) commented
Hey @IanVlasov
In the example above, you're using the hadoop catalog type in Spark. PyIceberg does not have an equivalent, and Iceberg discourages using the Hadoop catalog. I would recommend interacting with a catalog implementation instead.
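For example, here's a rough sketch using PyIceberg's SQL catalog backed by a local SQLite file, then registering your existing table's metadata with it. The catalog name, bucket, namespace, credentials, and metadata file name below are all placeholders; adjust them to your setup:

from pyiceberg.catalog import load_catalog

# Rough sketch: a SQL catalog backed by a local SQLite file.
# Bucket name, region, and credentials are placeholders.
catalog = load_catalog(
    "default",
    **{
        "type": "sql",
        "uri": "sqlite:///pyiceberg_catalog.db",
        "warehouse": "s3://bucket_name/iceberg",
        "s3.region": "us-east-1",
        "s3.access-key-id": "YOUR_KEY",
        "s3.secret-access-key": "YOUR_SECRET",
    },
)

# Register the existing table by pointing the catalog at its current
# metadata file (the exact file name under metadata/ is a placeholder).
catalog.create_namespace("db")
catalog.register_table(
    "db.table_name",
    "s3://bucket_name/iceberg/table_name/metadata/v1.metadata.json",
)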

If it's a one-time thing, you can try loading the metadata.json directly using StaticTable.
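A minimal sketch, assuming you can locate the table's current metadata file (the exact file name under metadata/ is a placeholder); note that a StaticTable is read-only:

from pyiceberg.table import StaticTable

# Load the table directly from its metadata file, bypassing any catalog.
table = StaticTable.from_metadata(
    "s3://bucket_name/iceberg/table_name/metadata/v1.metadata.json"
)
df = table.scan().to_pandas()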
