Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: ♻️ convert from PlantUML into pseudocode #1019

Merged
merged 2 commits into from
Jan 30, 2025

Conversation

lwjohnst86
Copy link
Member

Description

This PR converts the .puml file into Python pseudocode.

This PR needs an in-depth review.

Checklist

  • Updated documentation

@lwjohnst86 lwjohnst86 requested a review from a team as a code owner January 28, 2025 15:30
martonvago
martonvago previously approved these changes Jan 29, 2025
Comment on lines +6 to +8
extracts properties from the file into a `ResourceProperties` object. This
function is often followed by `edit_resource_properties()` to fill in any
remaining missing fields, like the `path` property field. Usually, you use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We'll have to think about how this works exactly, but it would be nice if the path was assigned automatically when linking the resource (properties) to a package (properties).)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, agreed. I think once we use it in the guides/examples, we'll have a better idea for it.

data_path: The path to a raw data file of a supported format.

Returns:
Outputs a `ResourceProperties` object. Use `write_resource_properties()`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll just have to make sure the flow still works if we do #995

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for sure, I already have it in my mind to refactor those once we've discussed/decided! 😀

check_is_supported_format(data_path)
# Make use of frictionless here?
properties = extract_properties_from_file(data_path)
return check_resource_properties(properties)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And probably ignore some expected errors like a missing path.
Or maybe there is also an argument for allowing bad resource properties to be returned by this function in general? If, for whatever reason, the extraction doesn't yield a correct set of properties, it is still useful to me as a user to have something I can fix/edit and not have to type it all out by hand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, that's a very good point! But yea, ignore some things. Maybe check some basic things that should definitely be there after extracting.

"""
check_is_file(data_path)
check_is_supported_format(data_path)
# Make use of frictionless here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a good question!

If we're okay with using frictionless, we could get around the task of writing a converter from their classes to our properties classes by having them output a dict and feeding that dict to our ResourceProperties.from_dict. On paper it sounds great!

The other, unquestionably more arduous, road would be to turn the data into a pandas dataframe (polars not yet supported for this), use pandera to extract the metadata, and convert that output to ResourceProperties.

@lwjohnst86 lwjohnst86 merged commit 2895ebc into main Jan 30, 2025
1 check passed
@lwjohnst86 lwjohnst86 deleted the docs/pseudocode-for-extract-properties branch January 30, 2025 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants