feat!: update DataFusion to 45.0 and Arrow to 54.1 #3503

Merged
merged 21 commits into from
Mar 7, 2025

timsaucer
Contributor

@timsaucer timsaucer commented Mar 3, 2025

This PR updates DataFusion to 45.0 and Arrow to 54.1.

Updating Arrow required updating PyO3 for the Python package, and the new PyO3 version has a series of breaking API changes.

github-actions bot commented Mar 3, 2025

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@timsaucer timsaucer changed the title Bump DataFusion to 45.0 feat: Update DataFusion to 45.0 Mar 3, 2025
@github-actions github-actions bot added the enhancement New feature or request label Mar 3, 2025
@codecov-commenter
codecov-commenter commented Mar 3, 2025

Codecov Report

Attention: Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.

Project coverage is 78.49%. Comparing base (9888678) to head (9a0ef7c).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/write.rs | 70.00% | 3 Missing ⚠️ |
| rust/lance/src/dataset.rs | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3503      +/-   ##
==========================================
- Coverage   78.49%   78.49%   -0.01%     
==========================================
  Files         253      253              
  Lines       94542    94543       +1     
  Branches    94542    94543       +1     
==========================================
- Hits        74213    74211       -2     
- Misses      17319    17327       +8     
+ Partials     3010     3005       -5     
| Flag | Coverage Δ |
|---|---|
| unittests | 78.49% <87.09%> (-0.01%) ⬇️ |

@timsaucer timsaucer marked this pull request as draft March 4, 2025 17:59
@github-actions github-actions bot added the python label Mar 5, 2025
@timsaucer timsaucer changed the title feat: Update DataFusion to 45.0 feat: update DataFusion to 45.0 and Arrow to 54.1 Mar 5, 2025
@eddyxu
Contributor

eddyxu commented Mar 5, 2025

We need to mark this as a breaking change.

@timsaucer
Contributor Author

@eddyxu I tried adding breaking-change text to a commit to see whether the GitHub Actions bot would add the tag, but it had no effect. Can you add the tag, or do I need to put something in the description?

@eddyxu eddyxu changed the title feat: update DataFusion to 45.0 and Arrow to 54.1 feat!: update DataFusion to 45.0 and Arrow to 54.1 Mar 5, 2025
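
For readers unfamiliar with the `!` marker: the Conventional Commits specification lets a header flag a breaking change with `!` placed directly before the `: ` separator, or with a `BREAKING CHANGE:` footer in the message body. The sketch below shows roughly how such a check could be written. It is not the check Lance's CI actually runs (that is not shown in this thread), and the function names `is_breaking_header` and `has_breaking_footer` are hypothetical.

```rust
/// Returns true when a Conventional Commits header such as `feat!: ...`
/// or `feat(scope)!: ...` carries the `!` breaking-change marker.
/// Hypothetical helper for illustration only.
fn is_breaking_header(header: &str) -> bool {
    // A header has the shape `type[(scope)][!]: description`.
    let Some((prefix, description)) = header.split_once(": ") else {
        return false;
    };
    if description.trim().is_empty() {
        return false;
    }
    // The marker must be the last character before the separator.
    let Some(unmarked) = prefix.strip_suffix('!') else {
        return false;
    };
    // Peel off an optional, non-empty "(scope)".
    let type_part = match unmarked.split_once('(') {
        Some((ty, rest)) => match rest.strip_suffix(')') {
            Some(scope) if !scope.is_empty() => ty,
            _ => return false,
        },
        None => unmarked,
    };
    !type_part.is_empty() && type_part.chars().all(|c| c.is_ascii_alphabetic())
}

/// Returns true when any line of the message starts with a breaking-change
/// footer token (`BREAKING CHANGE: ` or its synonym `BREAKING-CHANGE: `).
/// Hypothetical helper for illustration only.
fn has_breaking_footer(message: &str) -> bool {
    message.lines().any(|line| {
        line.starts_with("BREAKING CHANGE: ") || line.starts_with("BREAKING-CHANGE: ")
    })
}
```

Under this sketch, `is_breaking_header("feat!: update DataFusion to 45.0 and Arrow to 54.1")` is true, while the earlier titles `feat: Update DataFusion to 45.0` and `Bump DataFusion to 45.0` are not flagged.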
@timsaucer timsaucer force-pushed the feat/datafusion-45 branch from 8136543 to 4923988 Compare March 5, 2025 19:09
@timsaucer
Contributor Author

Looks like CI kept failing because there were changes on main that I didn't have yet, so I rebased and fixed the resulting failures.

@timsaucer timsaucer marked this pull request as ready for review March 5, 2025 20:28
@timsaucer timsaucer force-pushed the feat/datafusion-45 branch from 9a0ef7c to 75f4222 Compare March 6, 2025 17:41
Contributor

@westonpace westonpace left a comment

Thank you so much for taking this on. I had made an attempt earlier and gotten daunted by all the python changes.

I have a few minor suggestions, but overall things look good.

Comment on lines +316 to +317
let old_id: String = ob.getattr("old_id")?.extract()?;
let new_id: String = ob.getattr("new_id")?.extract()?;
Contributor

Why does extract work here, I wonder? I thought all the other string extract calls were changed to downcast.

Contributor

@wjones127 wjones127 left a comment

Thanks for working on this. I recognize the PyO3 changes were a lot.

I would rather not modify WriteDestination like that, since it makes the changes rather large. I found an alternative solution using PyO3's PyBackedStr and have put up a commit here: rerun-io@aa14dd3

@timsaucer
Contributor Author

Thank you both for the reviews, and especially for that commit. That is a much nicer solution than I came up with. Can one of you run the workflow again?

Contributor

@westonpace westonpace left a comment

One last tiny nit and then I'm good. Thanks again for taking this task on :)

Co-authored-by: Weston Pace <[email protected]>
@westonpace
Contributor

The linux-arm failure is a timeout. We can ignore it for now.

Contributor

@wjones127 wjones127 left a comment

Thanks for following up on those changes. Great work here.

@wjones127 wjones127 merged commit 3b9d546 into lancedb:main Mar 7, 2025
26 of 27 checks passed
@westonpace
Contributor

Sorry, I just noticed the PR title was feat!. Did this PR actually contain any breaking changes? I don't think it did.

@eddyxu
Contributor

eddyxu commented Mar 8, 2025

Bumping arrow counts as breaking change iiuc?

@timsaucer
Contributor Author

Also, anyone who uses the python crate as a dependency will be forced to update pyo3 which introduces many deprecations.

@westonpace
Contributor

> Bumping arrow counts as breaking change iiuc?

> Also, anyone who uses the python crate as a dependency will be forced to update pyo3 which introduces many deprecations.

Fair points. Some of our APIs take arrow data directly so our users would need to stay in lock-step.
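
To illustrate the lock-step point: Cargo treats semver-incompatible versions of a crate as separate crates, so a type from one major version is not the same type as its namesake in another. The sketch below imitates that with two modules. Everything in it is a hypothetical stand-in (the modules `arrow_v53` and `arrow_v54`, the simplified `RecordBatch` struct, and `write_batch`), not Arrow's or Lance's real API.

```rust
// Hypothetical stand-ins for two semver-incompatible releases of one crate.
// The real `RecordBatch` is far richer; one field is enough for the sketch.
mod arrow_v53 {
    pub struct RecordBatch {
        pub num_rows: usize,
    }
}

mod arrow_v54 {
    pub struct RecordBatch {
        pub num_rows: usize,
    }
}

/// Hypothetical library function compiled against the newer release.
/// It accepts only the newer crate's `RecordBatch`.
fn write_batch(batch: &arrow_v54::RecordBatch) -> usize {
    batch.num_rows
}

/// A caller still on the older release has to rebuild its data as the
/// newer type (or upgrade its own dependency) before calling the library.
fn upgrade(old: arrow_v53::RecordBatch) -> arrow_v54::RecordBatch {
    arrow_v54::RecordBatch {
        num_rows: old.num_rows,
    }
}

/// Shows what a caller on the older release runs into.
fn caller_on_old_version() -> usize {
    let old = arrow_v53::RecordBatch { num_rows: 3 };
    // write_batch(&old); // does not compile: the two `RecordBatch` types differ
    write_batch(&upgrade(old))
}
```

The two structs are field-for-field identical, yet the compiler rejects passing one where the other is expected. That is why a dependency bump in a library whose public functions take that dependency's types is visible to downstream users.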

5 participants