Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Hive Parquet ZSTD compression support #24761

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wraymo
Copy link

@wraymo wraymo commented Mar 19, 2025

Description

This PR adds ZSTD compression support for Hive Parquet.

Motivation and Context

Impact

Test Plan

  • Add hive.compression-codec=zstd to the hive catalog file
  • Start the coordinator and worker
  • Run the following queries
CREATE TABLE orders_2 ( id INT, items ARRAY<ROW(name VARCHAR, quantity INTEGER, price DOUBLE)> ) with (format='PARQUET');
INSERT INTO orders_2(id, items) VALUES (1, ARRAY[ROW('Apple', 10, 1.99), ROW('Banana', 5, 0.99)])

Before the change, an error ZSTD compression is not supported with PARQUET was reported. Now there's no error and when using parquet-tools to inspect the file, we can see compression: ZSTD
Also verified it with presto-native-execution

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... 
* ... 

Hive Connector Changes
* ... 
* ... 

If release note is NOT required, use:

== NO RELEASE NOTE ==

@wraymo wraymo requested a review from a team as a code owner March 19, 2025 22:49
@wraymo wraymo requested a review from presto-oss March 19, 2025 22:49
@wraymo wraymo changed the title add Hive Parquet ZSTD compression support Add Hive Parquet ZSTD compression support Mar 19, 2025
@steveburnett
Copy link
Contributor

hive.compression-codec is mentioned in Hive Configuration Properties in the documentation and GZIP is shown as the default.

Suggest adding a mention of ZSTD as an option for hive.compression-codec to the doc. If you have a better idea of where than this table, let me know. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants