Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First full package review #76

Merged
merged 632 commits into from
Oct 18, 2024
Merged

First full package review #76

merged 632 commits into from
Oct 18, 2024

Conversation

davidsantiagoquevedo
Copy link
Member

Hello,

This PR is for the full package review of {vaccineff 0.0.4} before our submission to CRAN. The package has undergone significant refactoring since version 0.0.1 to enhance usability and simplify the estimation pipeline, based on feedback from user testing. Many functions are now internal, and {linelist} objects have been incorporated to improve data handling and reduce redundancy in external function inputs. Please see the changelog in NEWS.MD for more details. We have also improved the documentation to guide users through vaccine effectiveness estimation and the definition of vaccineff_data objects. This is a crucial aspect that we would like to assess in this PR.

davidsantiagoquevedo and others added 30 commits June 1, 2024 13:23
This is necessary for the logo to be detected by pkgdown and r-universe
Co-authored-by: Chris Hartgerink <[email protected]>
Co-authored-by: Chris Hartgerink <[email protected]>
@davidsantiagoquevedo
Copy link
Member Author

Thanks @davidsantiagoquevedo for all your hard work on this package! I know it's been a while in the making, and your effort shows 😊

I left some items for your considerations in the review. More general comments, are:

  • Variable/object names are frequently opaque or with typo's. It may be helpful to introduce more clarity into object naming. Examples:

    • imm_limit, delta_imm where elsewhere immunization_date is used
    • tf_follow_up v t_follow_up_at
    • thershold
    • balace_all
  • Naming of the test files does not line up with the R/ files, which makes it harder to see which functions have tests and which do not

  • For code consistency, it may be helpful to run styler::style_pkg()

  • There are quite some nested if/else statements. These can be hard to understand and if at all possible, I would recommend disentangling that logic to help keep things maintainable.

I hope this helps in the further refinement of the package! I did not at this time look at the tests, as there was already quite a lot of (new) code to consider. How do you regard the state of the testsuite in the current form?

Hi @chartgerink. Thanks for this in-depth review. I opened issues #78, #79, #80, #81, #82, #83 and #84 to address the suggestions that will imply major changes in the package. I marked the rest of the comments as solved and committed the solution in this PR.
I just left two open discussions for your review. Can you please have a look at them?

Please ensure all the vignettes adhere to the markdown lint standard. I found quite some out of the ordinary markdown throughout the vignettes, for example:

I am happy to submit an automated fix if you'd like. (npx markdownlint-cli vignettes/* --fix).
(#76 (comment))

Please ensure all the vignettes adhere to the markdown lint standard. I found quite some out of the ordinary markdown throughout the vignettes, for example:

I am happy to submit an automated fix if you'd like. (npx markdownlint-cli vignettes/* --fix).
(#76 (comment))

@davidsantiagoquevedo
Copy link
Member Author

Great work on the package! Just a few general comments:

  • Try to use the styler for better formatting.
  • Some examples took a little longer than expected (effectiveness, make_vaccineff_data, plot_coverage). Might be worth looking into it if. Maybe you can use a shorter dataset for the examples, or use a \donttest instruction.

Other minor comments are directly on the review. Looking forward to seeing it on CRAN soon!

Hi @jd-otero. Thanks for your review! You raised important points to improve the package! I will address the suggestions that imply major changes in issues #80, #84 and #88. The rest of the suggestions were committed in this PR.

We will reduce the size of the dataset (issue #82) to improve the accuracy of the results because it introduces an observational bias that the package cannot deal with. This will decrease the computation time of the examples.

@davidsantiagoquevedo
Copy link
Member Author

Really nice work @davidsantiagoquevedo and collaborators!

Overall, the package is well designed, has a simple interface and is thoroughly documented.

I've left some comments on the files. The functions seem well structured, I didn't have time to check all of the functions and can do this in another review in the future.

Some additional comments:

  • You have some cohort data in /data but you don't have a /data-raw folder giving this data provenance. It would be good to show how the data was generated.
  • The documentation for the cohortdata in R/cohortdata.R is also incomplete not stating where the data is from.
  • The cohortdata is stored a <data.table> object. But the {data.table} package is not a dependency of {vaccineff}, so when the package is loaded, {data.table} does not get loaded. This has some small effects such as printing:
library(vaccineff)
head(cohortdata)
#>         id sex age death_date death_other_causes vaccine_date_1 vaccine_date_2
#> 1 afade1b2   F  37       <NA>               <NA>           <NA>           <NA>
#> 2 556c8c76   M  19       <NA>               <NA>           <NA>           <NA>
#> 3 04edf85a   M  50       <NA>               <NA>           <NA>           <NA>
#> 4 7e51a18e   F   8       <NA>               <NA>           <NA>           <NA>
#> 5 c5a83f56   M  66       <NA>               <NA>           <NA>           <NA>
#> 6 7f675ec3   M  29       <NA>               <NA>     2044-04-09     2044-04-30
#>   vaccine_1 vaccine_2
#> 1      <NA>      <NA>
#> 2      <NA>      <NA>
#> 3      <NA>      <NA>
#> 4      <NA>      <NA>
#> 5      <NA>      <NA>
#> 6    BRAND1    BRAND1
library(data.table)
head(cohortdata)
#>          id    sex   age death_date death_other_causes vaccine_date_1
#>      <char> <char> <int>     <Date>             <Date>         <Date>
#> 1: afade1b2      F    37       <NA>               <NA>           <NA>
#> 2: 556c8c76      M    19       <NA>               <NA>           <NA>
#> 3: 04edf85a      M    50       <NA>               <NA>           <NA>
#> 4: 7e51a18e      F     8       <NA>               <NA>           <NA>
#> 5: c5a83f56      M    66       <NA>               <NA>           <NA>
#> 6: 7f675ec3      M    29       <NA>               <NA>     2044-04-09
#>    vaccine_date_2 vaccine_1 vaccine_2
#>            <Date>    <char>    <char>
#> 1:           <NA>      <NA>      <NA>
#> 2:           <NA>      <NA>      <NA>
#> 3:           <NA>      <NA>      <NA>
#> 4:           <NA>      <NA>      <NA>
#> 5:           <NA>      <NA>      <NA>
#> 6:     2044-04-30    BRAND1    BRAND1

Created on 2024-09-27 with reprex v2.1.0

but additionally it may mean that the efficiency gains of using <data.table> objects are not used by the user unless they have loaded {data.table} themselves. My recommendation would just be to resave the cohortdata as a <data.frame> which seems to me the simplest solution.

  • The documentation of cohortdata say that it is an object of class cross but that doesn't seem to be the case
  • Is there a reason cohortdata is stored both in /data and /inst/extdata. I assume you only need it /data if you want users to access it. Apologies if I'm missing something here.

Tangential question: Do you have any simulation functionality. This seems like a good area to develop {simulist} to simulate vaccination that can then be used to test {vaccineff}, we can discuss this once you've finished the review and released the package.

Hi @joshwlambert. Thanks for your review! The points you raised will be very beneficial for improving the structure of the package :) I agree with the suggestions on the data structure. I will address it in #82. The rest of your suggestions will be implemented in #78, #81, #85, #88, #89.

Expanding {simulist} for vaccination data is a really good idea! We have been discussing internally how to test the package with theoretical datasets and {simulist} came to our mind as well for this task. It would be nice to have a discussion about this in a few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants