
[tracking] Kernelize! #3298

Open
roeap opened this issue Mar 4, 2025 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@roeap
Collaborator

roeap commented Mar 4, 2025

Important

This is a living issue that we'll update with new issues / comments as we get more clarity on the concrete implementations.

Description

This is a tracking issue to align on and coordinate the integration of delta-kernel-rs into delta-rs.

Motivation

While tremendous strides have been made by the community to support more and more Delta features in delta-rs, we are still lagging behind, with more features on the way that users will want to leverage. This is exactly the use case the kernel libraries aim to address: a correct and complete implementation of the Delta protocol.

Kernel explicitly does not take an opinion on the I/O and execution aspects that are needed to actually consume and work with Delta tables. This is what delta-rs provides, leaving the current (high-level) user-facing APIs conceptually as is.

Execution

In simplified terms, adopting kernel means carving out the functionality that currently resides in

  • core/src/kernel (named so in preparation for being replaced by kernel)
  • core/src/protocol (mainly our snapshot code, that I wanted to update for quite a while now)
  • core/src/schema (only partition pruning remained in this module after previous updates)

At the heart of the migration is creating a new snapshot implementation (RFC in #3137) which provides all required machinery (the engine) to kernel and exposes methods tailored to the needs of delta-rs.

One potential avenue forward is to get the RFC merge-ready and merge it without it being "hooked up" to the rest of the crate. This PR also exposes a Snapshot trait (we already have something similar, but not quite fitting, I think) that we can hopefully leverage to refactor all the operations that require access to the snapshot, i.e. implement that trait for current snapshots. This should hopefully surface any missing APIs in kernel that we may yet require for full adoption.
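As a thought experiment, such a trait could look roughly like the following. This is a minimal sketch under assumed names (`Snapshot`, `EagerSnapshot`, `TableMetadata` are all illustrative here, not the actual shape proposed in the RFC):

```rust
// Hypothetical, simplified metadata; the real snapshot resolves protocol,
// schema, and file listings through kernel.
struct TableMetadata {
    version: i64,
    num_files: usize,
}

// Illustrative trait that both the current and a kernel-backed snapshot
// implementation could satisfy, so operations are refactored only once.
trait Snapshot {
    fn version(&self) -> i64;
    fn metadata(&self) -> &TableMetadata;
}

// Stand-in for today's eagerly loaded snapshot.
struct EagerSnapshot {
    meta: TableMetadata,
}

impl Snapshot for EagerSnapshot {
    fn version(&self) -> i64 {
        self.meta.version
    }
    fn metadata(&self) -> &TableMetadata {
        &self.meta
    }
}

fn main() {
    let snap = EagerSnapshot {
        meta: TableMetadata { version: 3, num_files: 12 },
    };
    // An operation written against the trait works for either backend.
    assert_eq!(snap.version(), 3);
    println!("snapshot at version {} with {} files", snap.version(), snap.metadata().num_files);
}
```

The point of the trait indirection is that table operations depend only on `Snapshot`, so swapping the eager implementation for a kernel-backed one becomes a local change.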

Challenges

  • kernel currently has only very limited write-path support, so we'll have to keep maintaining that for now. However, we can motivate the API designs based on our needs.
  • In terms of feature support there is no full overlap as of now. E.g. kernel supports deletion vectors, which delta-rs does not, but delta-rs supports generated columns, which are not yet part of kernel and still require some design (i.e. how to handle arbitrary SQL that an engine will need to parse).
  • While kernel offers great opportunities for performance enhancements, there are several areas that might take an initial hit until we can implement performance optimizations that work well with kernel. These mainly relate to less frequently requested actions such as Txn, CommitInfo, ...

Any feedback / concerns around proceeding with this is highly appreciated.


@roeap roeap added the enhancement New feature or request label Mar 4, 2025
@roeap roeap pinned this issue Mar 4, 2025
@roeap roeap self-assigned this Mar 4, 2025
@ion-elgreco
Collaborator

@roeap Based on the challenges you wrote, I think we should actually move forward with the current API as 1.0 for us.

One thing I would not like to see is hitting many regressions after switching to kernel; that would set us back many months, if not longer, on getting to a 1.0 release. I would rather see us doing this as 1.x work and then release 2.0 after that is stabilized.

@roeap
Collaborator Author

roeap commented Mar 4, 2025

@ion-elgreco - I do see the point, but are we under some time pressure? I think we still have a few correctness problems anyway that I personally would expect to be handled in a 1.0.

But yes, this is certainly not going to be done in a week or so.

Personally, I also still struggle a bit with how wide our APIs on the Delta table are, i.e. tailored to both log inspection and table scans. The SQL APIs are also still considered experimental, IIRC; do we have a design for that?

Finally getting to 1.0 is something I am tremendously looking forward to, but is there any motivation to rush it now that we are nearing the end? Knowing we might break things?

W.r.t. the challenges, I think we can cover most of this via a hybrid state where we leverage what kernel can do and layer our stuff on top of that.

@ion-elgreco
Collaborator

ion-elgreco commented Mar 7, 2025

@ion-elgreco - I do see the point, but are we under some time pressure? I think we still have a few correctness problems anyway that I personally would expect to be handled in a 1.0.

Which things exactly?

But yes, this is certainly not going to be done in a week or so.

This worries me; I want a Python 1.0 out as soon as possible.

Personally, I also still struggle a bit with how wide our APIs on the Delta table are, i.e. tailored to both log inspection and table scans. The SQL APIs are also still considered experimental, IIRC; do we have a design for that?

SQL APIs? You mean parsing predicates? We also have this experimental QueryBuilder in Python, which I'm not a major fan of.

Finally getting to 1.0 is something I am tremendously looking forward to, but is there any motivation to rush it now that we are nearing the end? Knowing we might break things?

But we shouldn't break things anymore for Python users. I've been thinking about this for a while, and I think we can do a similar thing as Polars does: make Python 1.0 but not the Rust crates.

@roeap
Collaborator Author

roeap commented Mar 7, 2025

Which things exactly?

I guess most if not all of this is on the rust side, but may break things on the python side.

  • NULL handling in partition skipping is definitely not correct.
  • Did we ever get around to handling the time travel / VACUUM / metadata cleanup interactions? I.e. time travel further back than log retention.
  • Are we using timestamps in commit infos for time travel?
  • Our casting logic in general might be too permissive.
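On the first point, the subtlety is SQL's three-valued logic: a predicate evaluated over a NULL partition value yields "unknown", not "false", and a file may only be pruned when the predicate is definitely false. A minimal sketch of that rule (all names here are illustrative, not delta-rs internals):

```rust
// Three-valued logic result for predicate evaluation.
#[derive(Debug, PartialEq)]
enum Ternary {
    True,
    False,
    Unknown,
}

// Evaluate `partition_col = literal` against one file's partition value,
// where `None` represents a NULL partition value.
fn eval_eq(partition_value: Option<&str>, literal: &str) -> Ternary {
    match partition_value {
        None => Ternary::Unknown, // NULL = 'x' is unknown, NOT false
        Some(v) if v == literal => Ternary::True,
        Some(_) => Ternary::False,
    }
}

// A file may be skipped only when the predicate is definitely false.
fn can_skip(result: Ternary) -> bool {
    result == Ternary::False
}

fn main() {
    // A file with a NULL partition value must survive `col = 'a'` pruning:
    assert!(!can_skip(eval_eq(None, "a")));
    // A mismatching non-NULL value may be pruned:
    assert!(can_skip(eval_eq(Some("b"), "a")));
    println!("NULL partition values are not incorrectly pruned");
}
```

A skipping implementation that treats NULL as an ordinary "not equal" value would silently drop rows, which is why this counts as a correctness issue rather than a performance one.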

I want a python 1.0 out as soon as possible

Do we have a pressing motivation for this? I agree that this is a priority, but are there time constraints?

we also have this experimental querybuilder in python which I'm not a major fan of

^^ this

But we shouldn't break things anymore for python users.

I may not be the best judge of this, but there are a few things I would take a close look at in terms of API maintenance. This may just be a personal feeling, but it seems to me we kept adding more and more parameters to write_deltalake specifically. Right now this function has 23 keyword parameters, some of which I believe may overlap and just be used in different cases. For instance (and I may be wrong 😆), it seems all of the following try to configure similar things, maybe in different branches:

  • file_options
  • max_rows_per_file
  • min_rows_per_group
  • max_rows_per_group
  • writer_properties
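As an illustration of what consolidating these could look like, the overlapping keyword arguments could collapse into a single builder-style configuration object. The following is a hedged sketch with made-up names, not the actual delta-rs or parquet API:

```rust
// Illustrative consolidated writer configuration; field names mirror the
// keyword arguments listed above but the types and defaults are invented.
#[derive(Debug, Clone)]
struct WriterProperties {
    max_rows_per_file: usize,
    min_rows_per_group: usize,
    max_rows_per_group: usize,
}

impl WriterProperties {
    fn builder() -> WriterPropertiesBuilder {
        WriterPropertiesBuilder {
            props: WriterProperties {
                max_rows_per_file: 10_000_000,
                min_rows_per_group: 64 * 1024,
                max_rows_per_group: 128 * 1024,
            },
        }
    }
}

struct WriterPropertiesBuilder {
    props: WriterProperties,
}

impl WriterPropertiesBuilder {
    fn max_rows_per_group(mut self, n: usize) -> Self {
        self.props.max_rows_per_group = n;
        self
    }
    fn build(self) -> WriterProperties {
        self.props
    }
}

fn main() {
    // One object with sensible defaults replaces several loose parameters:
    let props = WriterProperties::builder().max_rows_per_group(50_000).build();
    assert_eq!(props.max_rows_per_group, 50_000);
    println!("{:?}", props);
}
```

The appeal of a builder here is that defaults live in one place and new knobs don't grow the `write_deltalake` signature further.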

With kernel specifically, these two should "collapse" since kernel handles file skipping.

  • partition_filters
  • predicate

Also, I think we may need to deprecate the "pyarrow" engine altogether, as it cannot support all features.

Just my personal preference (so nothing that matters all that much 😆), but I'd like to remove the file* methods from DeltaTable and move them to something dedicated to log inspection.

@ion-elgreco
Collaborator

ion-elgreco commented Mar 7, 2025

Which things exactly?

I guess most if not all of this is on the rust side, but may break things on the python side.

  • NULL handling in partition skipping is definitely not correct.

Hmm, do you have a repro for this? This is the first time I hear about it.

  • Did we ever get around to handling the time travel / VACUUM / metadata cleanup interactions? I.e. time travel further back than log retention.

Yeah, I dove into it; it's actually working fine and throwing the correct errors when you do try to time travel beyond such a state.

  • Our casting logic in general might be too permissive.

Do you have an example?

we also have this experimental querybuilder in python which I'm not a major fan of

^^ this

Yes, but I'd preferably not have any SQL support in deltalake. We already have DuckDB and DataFusion, which can read Delta tables, so I never truly saw the purpose of this.

But we shouldn't break things anymore for python users.

I may not be the best judge of this, but there are a few things I would take a close look at in terms of API maintenance. This may just be a personal feeling, but it seems to me we kept adding more and more parameters to write_deltalake specifically. Right now this function has 23 keyword parameters, some of which I believe may overlap and just be used in different cases. For instance (and I may be wrong 😆), it seems all of the following try to configure similar things, maybe in different branches:

  • file_options
  • max_rows_per_file
  • min_rows_per_group
  • max_rows_per_group
  • writer_properties

With kernel specifically, these two should "collapse" since kernel handles file skipping.

  • partition_filters
  • predicate

Also, I think we may need to deprecate the "pyarrow" engine altogether, as it cannot support all features.

I think you missed this PR :) #3285

All of these options were dedicated to the PyArrow engine writer. Since that is now removed, I was able to restructure and simplify everything. I could only do this because we are closing in on 1.0.

Just my personal preference (so nothing that matters all that much 😆), but I'd like to remove the file* methods from DeltaTable and move them to something dedicated to log inspection.

Not sure what you have in mind here, but if log inspection provides the same thing to get a list of files with the same ease, then sure!

@roeap
Collaborator Author

roeap commented Mar 7, 2025

Yes, but I'd preferably not have any SQL support in deltalake. We already have DuckDB and DataFusion, which can read Delta tables, so I never truly saw the purpose of this.

Agreed, but how to proceed with these APIs then?

It's marked as experimental, so in theory we can always remove it, IMHO. But if others want to add things to it, they can do that.

Not sure what you have in mind here, but if log inspection provides the same thing to get a list of files with the same ease, then sure!

I guess yes. The main point being that clients wanting to read data should not need to worry about such things, and we should discourage that usage. Different, more operations-focused use cases that want to optimize / maintain the table have very different needs, i.e. they only want to inspect the log and read no data at all. I think each of these should have a dedicated API, maybe DeltaLog for the latter?

For me the ideal solution would be to get rid of all functions that expose any actions and the like, and expose all of this just as record batches.

In Python a DeltaLog could be fine, but it is also something we can introduce in 1.x.
