Feat(Source-S3): Use dataframe processing in place of singleton record operations (polars) #44194
Draft
aaronsteers wants to merge 70 commits into master from aj/source-s3/dataframe-ops
Commits (70, all by aaronsteers):
726a722 initial code scaffold for dataframe processing
faa8517 drive-by-fix: typo in type hint
9fc9cf7 Merge remote-tracking branch 'origin/master' into aj/source-s3/datafr…
50207aa drive-by-fix: missing .gitignore for test artifact
392c43b mark method as abstract
f8a7b7c `poetry add polars` (TODO: Move to extra)
0a00d1e implement `parse_records_to_dataframes()` for jsonl file type
cec132a add config option `FileBasedStreamConfig.bulk_mode`
09e8bf7 apply new enum class
c78a576 checkpoint: basic plumbing in place
1dab0c4 resolve version conflicts in `source-s3`
63a8dc2 minor fixes
2d3031d ability to step-debug "full refresh" acceptance tests
a7b1989 make polars part of the file-based extra
e8b4a2d fix lock check in airbyte-ci
92733e9 script to download secret config
cbb8777 fix extra args
07f7929 cleanup secret fetch script
d0da02a checkpoint: jsonl sync running successfully
87ea175 tidy secrets install script using latest pyairbyte features
82568e0 tidy some more
f8093ce make helper script slightly more reusable
a487c13 use local CDK in poetry
3a6305d add read_to_buffer stub
77120ab improve handling
1ca2ead add perftest
a92fa34 chore: perf tests
feac74a add code to stream partition class
ed0032b lint fixes
d1abb84 default to lazy
1e2e657 add `out` override arg
b8ff8cd Merge remote-tracking branch 'origin/master' into aj/source-s3/datafr…
b8c0b11 git: hide python venvs
2b0e986 update perf test script
9eece3f move perf test script to new poetry project
42ed3ea Merge remote-tracking branch 'origin/master' into aj/source-s3/datafr…
6744216 re-lock poetry in cdk and connector
b982e50 working sync and perf tests
1589768 chore: misc pr clean up
55e1f46 update perf test script
e00857f fix: add missing file-based columns
1c5b7ea chore: clean up pr
78a2a99 chore: clean up defaults
faa9068 clean up perf test script
952d90d improve default bulk mode handling
209caf7 add bulk mode logging
7112041 improve bulk mode resolve
644861f tidy
0901763 tidy
486fdab tidy jsonl parser comments
cd5a7dd rename variable
d7c7af7 fix type hint
0d80e81 update perf-test script
a94187d chore: update comment
015fd90 delete unused
4cec7a9 multiple fixes, refactoring, including change to concurrent cursor fo…
60f843c chore: add comment
6a20761 chore: add CLI entrypoint
5ebf102 update tests
87e43c7 update poetry projects and lock files
c930f69 remove very slow tests from acceptance tests
b2cd2c5 minor format stuff
51d511c clean up files
9526835 update comment
da92440 update poetry
b946e43 update perf test
1919b8d update cursor logic
2bcbc98 update concurrency
6aba77e update poetry
5478be1 buffered reads
@@ -1,3 +1,6 @@
venv
.venv
.venv-*
.gradle
.idea
*.iml
Since this will be surfaced to users, it would be nice to give them more information about how to choose. If we dynamically decide whether to use bulk mode when a user selects `AUTO`, we should also consider telling them the criteria we're using.
Also, if this is only available for JSONL to start, it should probably be in the `JsonlFormat` file.
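The two review comments ask that `AUTO` explain its criteria and that the option live with the JSONL format while only JSONL supports it. A minimal stdlib-only sketch of one way to satisfy both follows. Everything in it except the name `AUTO` and the idea of a `JsonlFormat` home is hypothetical: the `ENABLED`/`DISABLED` members, the `JsonlFormat`/`CsvFormat` dataclasses, `resolve_bulk_mode()`, and the file-size threshold used as the `AUTO` criterion are illustrations, not the PR's design.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class BulkMode(str, Enum):
    # Only AUTO appears in the review; ENABLED and DISABLED are hypothetical.
    AUTO = "AUTO"
    ENABLED = "ENABLED"
    DISABLED = "DISABLED"


# Hypothetical AUTO criterion; whatever the real one is, it should be documented for users.
AUTO_MIN_FILE_SIZE_BYTES = 10 * 1024 * 1024


@dataclass
class CsvFormat:
    """Hypothetical format config that does not support bulk mode."""
    delimiter: str = ","


@dataclass
class JsonlFormat:
    """Hypothetical format config: the option lives here while only JSONL supports it."""
    bulk_mode: BulkMode = BulkMode.AUTO


def resolve_bulk_mode(format_config: object, file_size_bytes: int) -> tuple[bool, str]:
    """Return (use_bulk, reason) so the decision can be logged and explained to users."""
    mode = getattr(format_config, "bulk_mode", None)
    if mode is None:
        return False, f"{type(format_config).__name__} does not support bulk mode"
    if mode is BulkMode.DISABLED:
        return False, "bulk mode disabled in config"
    if mode is BulkMode.ENABLED:
        return True, "bulk mode enabled in config"
    # AUTO: apply the (hypothetical) documented criterion.
    if file_size_bytes >= AUTO_MIN_FILE_SIZE_BYTES:
        return True, f"AUTO: file size {file_size_bytes} >= {AUTO_MIN_FILE_SIZE_BYTES} bytes"
    return False, f"AUTO: file size {file_size_bytes} < {AUTO_MIN_FILE_SIZE_BYTES} bytes"


if __name__ == "__main__":
    print(resolve_bulk_mode(JsonlFormat(), 50 * 1024 * 1024))
    print(resolve_bulk_mode(CsvFormat(), 50 * 1024 * 1024))
```

Returning the reason alongside the decision makes it easy to log (the commits include "add bulk mode logging") and to quote the same criteria in the option's user-facing description, which is what the first comment asks for.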