
Discussion: What to do in the event of no file_layout for an ArtifactSource #196

Open
graza-io opened this issue Feb 7, 2025 · 1 comment



graza-io commented Feb 7, 2025

Context

If the table provides no default file_layout and the configuration provides no file_layout, all items are currently collected.

For example, for an aws_s3_bucket source this pulls every object from the bucket root onwards.

Whilst this is the behaviour we would expect, is it correct?

Additional Notes:

  • What should we do if patterns are provided but no file_layout exists?
  • Should we warn or fail if there is neither a file_layout nor a default file_layout?
  • If no file_layout is provided, we get no date information, so collection_state grows large because it has to store every item.
  • This could also be problematic if the paths walked contain many different file types, since every file found is expected to match the designated mapper, etc.

@graza-io

Notes:

  • Build a distinct list of tp_source_locations (for files whose data was removed) and pass it to the collect request
  • Only do the above when no time information is available
    • Does the collection need to be more aware of granularity, etc.?
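The first two notes could look roughly like this. Everything named here is hypothetical: `removedFile`, `collectRequest`, its `SourceLocations` field, and the helper functions are placeholders for whatever the real collect request carries. The list is de-duplicated and sorted so the request is deterministic, and it is only attached when no time information is available, as the notes propose.

```go
package main

import (
	"fmt"
	"sort"
)

// removedFile is a hypothetical record for a file whose data was removed;
// SourceLocation holds its tp_source_location value.
type removedFile struct {
	Path           string
	SourceLocation string
}

// collectRequest is a hypothetical stand-in for the collect request;
// SourceLocations is an assumed field, not an existing SDK field.
type collectRequest struct {
	SourceLocations []string
}

// distinctSourceLocations returns each non-empty tp_source_location once, sorted.
func distinctSourceLocations(files []removedFile) []string {
	set := make(map[string]struct{}, len(files))
	for _, f := range files {
		if f.SourceLocation != "" {
			set[f.SourceLocation] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for loc := range set {
		out = append(out, loc)
	}
	sort.Strings(out)
	return out
}

// buildCollectRequest attaches the locations only when there is no time
// information; otherwise time-based collection state is relied on instead.
func buildCollectRequest(files []removedFile, hasTimeInfo bool) collectRequest {
	if hasTimeInfo {
		return collectRequest{}
	}
	return collectRequest{SourceLocations: distinctSourceLocations(files)}
}

func main() {
	files := []removedFile{
		{Path: "part-1", SourceLocation: "s3://bucket/x.json"},
		{Path: "part-2", SourceLocation: "s3://bucket/x.json"},
		{Path: "part-3", SourceLocation: "s3://bucket/a.json"},
	}
	fmt.Println(buildCollectRequest(files, false).SourceLocations)
	// [s3://bucket/a.json s3://bucket/x.json]
}
```

This does not answer the granularity question; it only shows that the de-duplication itself is cheap compared with storing every item in collection_state.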
