Refactor issue parser - lifecycle methods, use pydantic models, support reviewer lists #174
Conversation
…dels, do staged parsing and isolate special casing by treating each key separately, add support for reviewers as lists
i guess the thing that would be nice here re: pyOpenSci/pyopensci.github.io#434 is single-sourcing the template: like ideally we would give the human readable names as aliases or title or something and make a classmethod for the ReviewModel that generates that review template, so changing the review template would be a change to the ReviewModel and there is no gap between them where it's possible to forget. Then we could have example reviews from each time the template changes to make sure we still support the current and all prior review checklist formats.
it looks like a number of fields were marked as optional in
it would be nice to have something like VCR to record all the website updates and put that in pytest for local testing, bc rn i'm just using the CI as pytest since i don't have the GITHUB_TOKEN
ok there ya have it. since the model has a ton of optional params, i can't be sure that things are actually still working, but that's enough for one day. hopefully saves us some headaches in the future and makes it a little easier to tweak the reviewer template / speeds adoption of the pyos bot by making it easier to handle parsing multiple forms of review template
@sneakers-the-rat what would that look like? i've used, i think, VCR (cassettes?) in a package where we did api things but someone else set it up..
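For illustration only, here is roughly what a VCR-style "cassette" test could look like using the vcrpy library; the test name, cassette path, and URL below are hypothetical and not from this repo:

```python
# Hypothetical vcrpy example - records HTTP traffic once, then replays it offline.
import requests
import vcr


@vcr.use_cassette("tests/cassettes/issues.yaml", filter_headers=["authorization"])
def test_get_issues_replays_recorded_http():
    # first run: performs the real request and records it to the cassette file
    # later runs: replays the recorded response, so no network or GITHUB_TOKEN is needed
    resp = requests.get(
        "https://api.github.com/repos/pyOpenSci/software-submission/issues"
    )
    assert resp.status_code == 200
```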
@@ -35,27 +34,21 @@ def main():
# Get all issues for approved packages - load as dict
# TODO: this doesn't have to be in process issues at all. it could fully
# Call the github module
issues = process_review.return_response()
accepted_reviews = process_review.parse_issue_header(issues, 45)
issues = process_review.get_issues()
Above - if you rebase from main (i am pretty sure) i fixed this:
labels=["6/pyOS-approved 🚀🚀🚀"]
-->
labels=["6/pyOS-approved"]
this is why we had a magical deleted file in our last pr 🙈 i removed the emojis.
merged in main, got the new label!
thank you!! now we are no longer deleting all of the packages ✨ 😆
@sneakers-the-rat this is working really well. There is one other issue: when i run update-review-teams it used to populate the github data from each user's github profile - specifically name, given we always have ghusername. but now it doesn't seem to be populating that. Given this is a big refactor, should we handle that separately? or is this a small enough thing to fix in this pr (along with the other tiny thing that is the label name - i commented on this)?

thank you for this. I refactored this code from a much bigger mess to pydantic last year and i learned a TON. but i knew it was still really hard to maintain / debug. My brain isn't always great about knowing how to best refactor things!! so i really appreciate this pr. the test issue idea in the _data folder is so smart. We can then add other examples of issues to process in the future as things come up.
i am approving this.
We can fix the things i noted here or elsewhere.
also happy to provide merge / admin permissions. 🙌🏻
src/pyosmeta/models/base.py
Outdated
editor: dict[str, str | None] = {}
reviewer_1: dict[str, str | None] = {}
reviewer_2: dict[str, str | None] = {}
editor: ReviewUser = {}
Can you please add the eic field here as well? For now allow it to default to None, because we need to update all older reviews to have that field. It should exist for newer issues submitted. The goal is to make sure our EiCs get credit for their work in addition to the editors ✨
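A minimal sketch of what that addition could look like - the `ReviewUser` field names shown here are hypothetical placeholders, and this is not the actual commit:

```python
# Sketch only - ReviewUser's fields are placeholders, not the PR's definition.
from pydantic import BaseModel


class ReviewUser(BaseModel):
    github_username: str | None = None
    name: str | None = None


class ReviewModel(BaseModel):
    editor: ReviewUser | None = None
    reviewer_1: ReviewUser | None = None
    reviewer_2: ReviewUser | None = None
    # eic defaults to None: older review issues don't record an editor in chief,
    # while newer issue templates should populate it.
    eic: ReviewUser | None = None
```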
got it! cfdcad3
Amazing. thank you!!
@all-contributors please add @sneakers-the-rat for code, review
I've put up a pull request to add @sneakers-the-rat! 🎉
from pydantic import AnyUrl, BaseModel, ConfigDict, Field

class User(BaseModel):
@sneakers-the-rat we do this in stravalib but have a bot to grab things. Can you answer a few questions for me about this (it's awesome and i want to learn more & document it correctly)?
1. should we add datamodel-codegen to optional dependencies?
2. where is issue_schema.json coming from? with strava there is a swagger.json file that we grab online.
3. did you manually trim things to map to our workflow here?
thank you again.
@lwasser sorry I missed this question -
- nah, it was just a one-off to create the model from the JSON schema, and then I heavily modified and trimmed it anyway.
- the URL I got it from is in the docstring :)
- yep! Basically I took out all the fields that we weren't/probably wouldn't use and set `extra = "allow"` in the model config so that those extra fields would be there/wouldn't throw a validation error, but you'd get a static analysis warning if you tried to access them (hinting they should be added to the model formally). I also simplified the various `User` models - the blank models that now inherit from the `User` class were just duplicates. And I also made some fields optional, i believe, to let the tests run. But basically those models in the `github` module shouldn't be changed (unless github's API changes) - we just want those to reflect what comes back from the API, and the models we use internally in this package are separate.
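To make the `extra = "allow"` point concrete, here is a toy model showing the behaviour described above - it is not the actual generated github `User`/`Issue` model:

```python
# Toy illustration of extra="allow" - not the generated github models themselves.
from pydantic import BaseModel, ConfigDict


class User(BaseModel):
    model_config = ConfigDict(extra="allow")

    login: str  # a field we chose to model explicitly


# "site_admin" isn't declared on the model, but with extra="allow" it is kept
# instead of raising a validation error.
user = User(login="lwasser", site_admin=False)
print(user.login)        # validated field, known to type checkers
print(user.model_extra)  # {'site_admin': False}
# user.site_admin works at runtime, but a static analyzer will flag it -
# a hint that the field should be added to the model formally.
```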
hmm mysterious. seems worth it to fix. let me see
@lwasser here try this - 8ff2ac3 looks like the
Amazing. thank you. yes that CLI script is really really hacky. i really fought with how to create it in an easy-to-maintain and simple way. This is all running locally now. i'm going to merge, and also i'm going to bump your permissions here Jonny. The one item i'll open an issue about is the datamodel-codegen piece. I just want to document how that was done. thank you so so much!
Fix: #173
Wanted to do this for a while - i refactored the issue parser so it should be a bit easier to make modifications to it going forward.
Previously the field parsing logic was split in a bunch of different places, but this brings it into a linear set of lifecycle methods for parsing a review issue:
Currently:
- `comment_to_list`: split whole issue body into a list of lists, split by `\n` and `: `. Separate out package name
- `get_issue_meta`: iterate over list of lists with a fixed number of lines, processing key into snake_case, and handling special cases for different fields
- `clean_date_accepted_key`: a no op as far as i can tell?

This PR:
- takes either an `Issue` object (created from the github API's json schema) or a string (mostly for testing)
- `_split_header`: Split the header from the body of the issue
- `_header_as_dict`: Cast the header as a dictionary, preprocessing keys into snake case
- `_preprocess_meta`: Apply any preprocessing steps to header metadata, split out into separable preprocessing methods (eg in this case by gathering old reviewer forms in `_combine_reviewers`)
- `_parse_field`: Apply any special field parsing logic here, similar to `get_issue_meta` but only handling a single field at a time
- `_add_issue_metadata`: add any requested fields from the `Issue` model
- `_postprocess_meta`: do any postprocessing steps like gathering categories from the issue body, returning a `ReviewModel`
This gives clear places to add parsing logic, in a preprocessing, field processing, or postprocessing step. It also uses pydantic models instead of anonymous lists of lists, dicts of dicts, etc. so we can know what to expect at each of the stages. I noticed that a few of the methods were sort of awkward because they expected to operate over the whole list-of-lists of the `'\n'` and `': '` split issue body, so this separates things out a bit so they should be easier to maintain and test.
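Roughly, the new flow has this shape - a runnable toy sketch whose step names mirror the list above, but whose bodies are simplified stand-ins rather than the actual implementation (the real `_add_issue_metadata` step, which pulls fields off the github `Issue` model, is omitted since this toy only takes a string):

```python
# Toy sketch of the lifecycle described above - not the real pyosmeta code.
from pydantic import BaseModel


class ReviewModel(BaseModel):
    package_name: str | None = None
    reviewers: list[str] = []


class IssueParser:
    def parse_issue(self, issue: str) -> ReviewModel:
        header, body = self._split_header(issue)
        meta = self._header_as_dict(header)
        meta = self._preprocess_meta(meta)
        meta = {key: self._parse_field(key, val) for key, val in meta.items()}
        # the real flow also calls _add_issue_metadata() here for Issue fields
        meta = self._postprocess_meta(meta, body)
        return ReviewModel(**meta)

    def _split_header(self, issue: str) -> tuple[str, str]:
        # the review template separates header metadata from the body with `---`
        header, _, body = issue.partition("\n---\n")
        return header, body

    def _header_as_dict(self, header: str) -> dict[str, str]:
        # "Package Name: foo" -> {"package_name": "foo"}
        pairs = (line.split(": ", 1) for line in header.splitlines() if ": " in line)
        return {key.strip().lower().replace(" ", "_"): val.strip() for key, val in pairs}

    def _preprocess_meta(self, meta: dict) -> dict:
        # e.g. fold old `Reviewer 1:` / `Reviewer 2:` style fields into one list
        old_style = [meta.pop(key) for key in ("reviewer_1", "reviewer_2") if key in meta]
        if old_style and "reviewers" not in meta:
            meta["reviewers"] = ", ".join(old_style)
        return meta

    def _parse_field(self, key: str, value: str):
        # special-casing for a single field at a time lives here
        if key == "reviewers":
            return [name.strip() for name in value.split(",")]
        return value

    def _postprocess_meta(self, meta: dict, body: str) -> dict:
        # e.g. gather categories from checkboxes in the issue body (omitted here)
        return meta


print(IssueParser().parse_issue(
    "Package Name: examplepkg\nReviewer 1: @alice\nReviewer 2: @bob\n---\nreview body here"
))
```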
also:
- supports `Reviewer 1:`, `Reviewer 2:` ... and `Reviewers:` style specification of reviewers
- uses the `---` separator to split the header from the body first.
- renamed `return_response` to `get_issues` bc i couldn't tell what was supposed to come out of it, and `parse_issue_header` to `parse_issues` because it's plural (and created a `parse_issue` method to process just one)
- added a `tests/data` directory for sample inputs so tests can be a bit less cluttered by defining them inline
- `.env`: a bunch of tests failed expecting it to be there, and with one present, that test would fail because the monkeypatch only deleted the env variable but not the load dotenv function. I assume this is the case because the CI tests set a `GITHUB_TOKEN` env variable but not a `.env` file (see the sketch after this list)
- `pytest-mock` wasn't specified - not sure where that gets installed in CI, but it wasn't when running locally.
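A short sketch of the `.env` point above: to simulate "no token configured" in a test you have to neutralize both the environment variable and the `.env` loading. The `get_token` helper here is a stand-in for illustration, not pyosmeta's actual function:

```python
# Sketch only: get_token is a stand-in for "read .env, then the environment".
import os

import dotenv
import pytest


def get_token() -> str:
    dotenv.load_dotenv()  # would happily re-read a developer's local .env file
    return os.environ["GITHUB_TOKEN"]


def test_missing_token_raises(monkeypatch):
    # delete the variable from the process environment (what the old test did)...
    monkeypatch.delenv("GITHUB_TOKEN", raising=False)
    # ...and also stub out load_dotenv, so a local .env can't put it back
    monkeypatch.setattr(dotenv, "load_dotenv", lambda *args, **kwargs: False)

    with pytest.raises(KeyError):
        get_token()
```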