fix: handle v2 uuid named json/parquet checkpoints #3222
base: main
ACTION NEEDED: delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3222 +/- ##
==========================================
- Coverage 72.31% 72.28% -0.04%
==========================================
Files 138 138
Lines 45398 45432 +34
Branches 45398 45432 +34
==========================================
+ Hits 32831 32840 +9
- Misses 10489 10510 +21
- Partials 2078 2082 +4
@zeevm could you add some tests?
@ion-elgreco If anyone can tell me exactly how to generate such a table in Databricks, I'll do it. Otherwise I don't see how to add a unit test; all I can do is verify the fix locally with my customers' table.
@zeevm I took a look at this pull request this morning, and I'm not sure we can safely merge it. I believe these uuid checkpoints are v2 checkpoints, which can be enabled via a table feature. They're structurally different from v1 checkpoints and may include sidecars, which contain additional checkpoint data that I believe delta-rs will currently ignore. Without sharing the table data (obviously 😆), can you share the error that you ran into? I can imagine a scenario where older versions of the transaction log were cleaned up and the historical information on a table would be in v2 checkpoints, causing us problems reading the table 🤔
@rtyler I've created a sample table of this kind that I can share; try opening this table
@rtyler I see that all the "sidecar" code is already implemented, so why would the library have a problem reading them?
There is an ability to parse a
Thanks for the sample table, I'll take a look at it shortly.
@rtyler any insights on the table I provided?
delta-kernel-rs supports reading v2 checkpoints. @roeap when do you expect your PR to be ready to move over to the new log replay, which relies more on kernel? I assume v2 checkpoints will be supported automatically by that change?
Description
Match V2 UUID named checkpoints and read JSON checkpoints
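For readers unfamiliar with the naming schemes involved, here is a minimal sketch of how checkpoint file names could be told apart. It is not the code from this PR: `CheckpointFile`, `classify_checkpoint`, `is_digits`, and `is_uuid` are hypothetical names chosen for illustration. It assumes the layouts described in the Delta protocol: `<version>.checkpoint.parquet` for classic checkpoints, `<version>.checkpoint.<part>.<total>.parquet` for multi-part checkpoints, and `<version>.checkpoint.<uuid>.json` or `.parquet` for v2 checkpoints, with a 20-digit zero-padded version.

```rust
/// Hypothetical classification of checkpoint file names (illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum CheckpointFile {
    /// `<version>.checkpoint.parquet`
    Classic { version: u64 },
    /// `<version>.checkpoint.<part>.<total>.parquet`
    MultiPart { version: u64, part: u32, total: u32 },
    /// `<version>.checkpoint.<uuid>.json` or `<version>.checkpoint.<uuid>.parquet`
    V2 { version: u64, uuid: String, is_json: bool },
}

/// True if `s` consists of exactly `len` ASCII digits.
fn is_digits(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit())
}

/// True if `s` looks like a hyphenated UUID (8-4-4-4-12 hex digits).
fn is_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let lens = [8usize, 4, 4, 4, 12];
    groups.len() == 5
        && groups
            .iter()
            .zip(lens)
            .all(|(g, n)| g.len() == n && g.bytes().all(|b| b.is_ascii_hexdigit()))
}

/// Classify a bare file name (no directory prefix); returns `None` for
/// anything that is not a recognised checkpoint name.
pub fn classify_checkpoint(file_name: &str) -> Option<CheckpointFile> {
    let parts: Vec<&str> = file_name.split('.').collect();
    if parts.len() < 3 || !is_digits(parts[0], 20) || parts[1] != "checkpoint" {
        return None;
    }
    let version: u64 = parts[0].parse().ok()?;
    match parts.as_slice() {
        [_, _, "parquet"] => Some(CheckpointFile::Classic { version }),
        [_, _, part, total, "parquet"] if is_digits(part, 10) && is_digits(total, 10) => {
            Some(CheckpointFile::MultiPart {
                version,
                part: part.parse().ok()?,
                total: total.parse().ok()?,
            })
        }
        [_, _, uuid, ext] if is_uuid(uuid) && (*ext == "json" || *ext == "parquet") => {
            Some(CheckpointFile::V2 {
                version,
                uuid: (*uuid).to_string(),
                is_json: *ext == "json",
            })
        }
        _ => None,
    }
}
```

Recognising the name is only the first step. As discussed in the conversation above, a v2 checkpoint may also reference sidecar files, so a reader that matches the name but ignores sidecars could still miss part of the table state.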
Related Issue(s)