Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] [Kernel] Delete a partition or delete Parquet files from a table #3988

Open
2 of 8 tasks
chrisparrinello opened this issue Dec 19, 2024 · 1 comment
Open
2 of 8 tasks
Labels
enhancement New feature or request

Comments

@chrisparrinello
Copy link

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

Our pipeline generates Parquet files outside of a Delta or Spark context so we use Delta Standalone to add the Parquet files to a Delta table for a partition. If the pipeline regenerates the Parquet files for a partition, we need the ability to be able to remove the old Parquet files for that partition and add the new Parquet files to the same partition.

Motivation

This feature will add more bulk-level updates versus row-level updates currently supported by Delta Kernel

Further details

Delta Standalone support AddFile and Remove file operations on the Delta log via code such as the following:

        OptimisticTransaction txn = deltaLog.startTransaction();
        txn.commit(newFiles, new Operation(Operation.Name.UPDATE), engineInfo);

The Delta Kernel equivalent for AddFile ss creating DataFileStatus objects and then using Transaction.generateAppendActions. There doesn't seem to be an equivalent in the code for a RemoveFile operation.

In lieu of a RemoveFile operation, we could work around that by having the ability to delete a partition. We can then recreate the partition via DataFileStatus/Transaction.generateAppendActions calls.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
@chrisparrinello chrisparrinello added the enhancement New feature or request label Dec 19, 2024
@glistening-apricots
Copy link

glistening-apricots commented Feb 3, 2025

General Options

It seems like there's at least two different options here, either of which would modify the "Transaction" interfaces in the Java and Rust kernels:

  1. Add an API like overwritePartitions(predicate).
  2. Add a generalized API that permits arbitrary remove-actions.

Notes on Option 1

Ideally, this API can be called multiple times, to handle use-cases (such as ours) in which the set of partitions being over-written is updated over time.

While a delta-kernel user could accumulate a single, large predicate to apply all at once, if a user knows at t1 that such-and-such partition(s) require overwriting, and they know at a much later t2 that such-and-such partition(s) require overwriting, it can benefit performance to apply these predicates at t1 and t2, respectively, instead of waiting until t2. This is because the evaluation-time required to evaluate predicates scales linearly w.r.t. the number of files in the targeted snapshot and w.r.t. the size of the predicate. So, if you can, applying the t1-predicate while the t2-predicate is still "being discovered" can provide greater parallelism and a faster transaction overall.

Absent caching of the resolved snapshot's addFile entries (which would be a nice option for high-memory environments), this approach can increase the number of transaction-log reconciliations, which can cost more money depending on the underlying storage, but it seems best to defer the resolution of this tradeoff to end-consumers of the kernel.

If the API can be called multiple times per-transaction, then it also needs to handle the case that predicate@t1 and predicate@t2 are possibly overlapping w.r.t. the partitions being over-written. It's not clear to me what's supposed to happen during action-reconciliation when there are multiple RemoveFile entries for a single file within a single transaction-log entry, especially if they (intentionally or unintentionally) end up with distinct copies of values such as the stats or deletionVector fields.

But if a single file must have at-most one RemoveFile per transaction-log entry (and per snapshot?), then the multi-predicate approach should take to avoid submitting duplicate RemoveFile entries.

Notes on Option 2

Personally, I prefer this option because my use-case is intended to eventually handle arbitrary DML operations anyways. Currently, this is done using Delta-Standalone's markFilesAsRead-API, alongside its ability to add and remove arbitrary files as part of a single transaction.

Aside from my use-case, my understanding is that the delta-rs library will eventually migrate to utilize delta-kernel-rs. Since delta-rs supports assorted operations that require RemoveFile entries (e.g. UPDATE), I imagine they'll eventually be able to use something like option-2, but I am not a contributor to delta-rs, so anyone else should feel free to correct me if I'm mistaken.

Another option is for each operation (UPDATE, delete-partitions, etc.) to be its own high-level operation for which kernel provides support, but for the sake of flexibility for end-users, I'd prefer if the API (additionally?) permitted arbitrary add and remove-actions. This would make it easier and faster for us to migrate our Delta-Standalone code since that API was designed around directly creating AddFile and RemoveFile actions.

Note on Isolation

New transactions can require retries for two general reasons, which can co-present for a single retry.

  1. The transaction's semantics depend upon some portion of the table's latest state, but that portion has changed.
  2. The transaction was going to be committed with transaction-log entry number N, but N is now unavailable.

If overwriting a partition, the list of files to remove must reflect the previous snapshot, so both types of retries may be required.

The design-doc for appends included a "withReadSet" API (that doesn't seem to have been implemented?), which, for option-2 (arbitrary RemoveFile support), could ensure the required validation is performed. See: https://docs.google.com/document/d/1Sqxug9RsqSQy2iPuT-GbRdSRui3sT45QnzlJAPdoGM4/edit?tab=t.0

For option-1 (delete-partition(s) support), a user-configurable retry-limit would be nice, where the retries themselves are implemented within kernel. Each retry would include the ability to retrieve the latest list of files that exist in each partition being over-written.

Finally, there are use-cases where serializable isolation is unnecessary (or guaranteed externally, e.g. a system that never performs concurrent writes on overlapping partitions); for such use-cases, some configuration of the validation-performed would be nice. The idea would be that if I know my writes never target the same partitions, then I should be able to perform concurrent partition-deletes without reading the transaction-log to verify that those partition-deletes are non-overlapping. If supported, such validation-skipping should be opt-out and not opt-in.

A singular retry-limit would be insufficient for enabling this since type-2 retries are necessary even if concurrent writes are not logically conflicting, yet in that case, type-1 retries (and associated validation) are unnecessary.

The Apache Iceberg Java library provides such configurability, to some extent: https://iceberg.apache.org/javadoc/0.11.0/org/apache/iceberg/IsolationLevel.html#:~:text=Two%20isolation%20levels%20are%20supported,see%20only%20already%20committed%20data.

Note on Acidity

For our use-case at least, the ability to merely delete partition(s) is insufficient. If design-1 were used, it would not help us migrate from Delta-Standalone unless it were additionally possible to append new data in the same transaction as partition-deletion. i.e. for us, if partition-deletion were one transaction, and related appends were another transaction, that would prevent us from migrating the relevant code.

Row-Level Deletes: we don't need them (yet)!

While we'd eventually like row-level delete-support, since Delta-Standalone does not support deletion-vectors, our existing use-cases can be covered by file-level deletes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants