First full package review #76

davidsantiagoquevedo · 2024-09-23T09:28:51Z

Hello,

This PR is for the full package review of {vaccineff 0.0.4} before our submission to CRAN. The package has undergone significant refactoring since version 0.0.1 to enhance usability and simplify the estimation pipeline, based on feedback from user testing. Many functions are now internal, and {linelist} objects have been incorporated to improve data handling and reduce redundancy in external function inputs. Please see the changelog in NEWS.MD for more details. We have also improved the documentation to guide users through vaccine effectiveness estimation and the definition of vaccineff_data objects. This is a crucial aspect that we would like to assess in this PR.


davidsantiagoquevedo · 2024-10-07T14:08:18Z

Thanks @davidsantiagoquevedo for all your hard work on this package! I know it's been a while in the making, and your effort shows 😊

I left some items for your consideration in the review. More general comments are:

Variable/object names are frequently opaque or contain typos. It may be helpful to introduce more clarity into object naming. Examples:

imm_limit, delta_imm where elsewhere immunization_date is used

tf_follow_up v t_follow_up_at

thershold

balace_all

Naming of the test files does not line up with the R/ files, which makes it harder to see which functions have tests and which do not

For code consistency, it may be helpful to run styler::style_pkg()

There are quite a few nested if/else statements. These can be hard to understand and, if at all possible, I would recommend disentangling that logic to help keep things maintainable.

I hope this helps in the further refinement of the package! I did not look at the tests at this time, as there was already quite a lot of (new) code to consider. How do you regard the state of the test suite in its current form?
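As a minimal sketch of what disentangling could look like, the two functions below return the same result, first with nested if/else and then with guard clauses. The function names, arguments and status labels are hypothetical and are not taken from {vaccineff}.

```r
# Hypothetical helper written with nested if/else (illustration only)
get_status_nested <- function(vaccine_date, end_cohort) {
  if (!is.na(vaccine_date)) {
    if (vaccine_date <= end_cohort) {
      status <- "v"
    } else {
      status <- "u"
    }
  } else {
    status <- "u"
  }
  status
}

# Same logic flattened with guard clauses: each special case exits early,
# so the main path is no longer indented inside the conditions
get_status_flat <- function(vaccine_date, end_cohort) {
  if (is.na(vaccine_date)) {
    return("u")
  }
  if (vaccine_date > end_cohort) {
    return("u")
  }
  "v"
}

end_cohort <- as.Date("2044-12-31")
get_status_nested(as.Date("2044-04-09"), end_cohort)
get_status_flat(as.Date("2044-04-09"), end_cohort)
get_status_flat(as.Date(NA), end_cohort)
```

Both versions take a single date; a vectorised variant would need a different approach, for example ifelse() or logical indexing.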

Hi @chartgerink. Thanks for this in-depth review. I opened issues #78, #79, #80, #81, #82, #83 and #84 to address the suggestions that imply major changes to the package. I marked the rest of the comments as resolved and committed the solutions in this PR.
I left two discussions open for your review. Could you please have a look at them?

Please ensure all the vignettes adhere to the markdown lint standard. I found quite a lot of unusual markdown throughout the vignettes, for example:

I am happy to submit an automated fix if you'd like. (npx markdownlint-cli vignettes/* --fix).
(#76 (comment))

davidsantiagoquevedo · 2024-10-11T09:41:53Z

Great work on the package! Just a few general comments:

Try to use the styler for better formatting.

Some examples took a little longer than expected (effectiveness, make_vaccineff_data, plot_coverage). It might be worth looking into this. Maybe you can use a smaller dataset for the examples, or wrap the slow ones in \donttest.
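A rough sketch of how the two options could be combined in a roxygen2 block: a fast example on a small dataset that always runs, plus a slower one wrapped in \donttest. The helper summarise_age() and its data are hypothetical and only illustrate the layout. Note that, as far as I know, \donttest examples are skipped by a plain R CMD check but are run by R CMD check --as-cran since R 4.0.0, so they should still finish in a reasonable time.

```r
#' Summarise ages in a cohort (hypothetical helper, not part of {vaccineff})
#'
#' @param data A data.frame with a numeric `age` column.
#' @return A named numeric vector with the mean and maximum age.
#' @examples
#' # Fast example on a small dataset: always run
#' small <- data.frame(age = c(37, 19, 50))
#' summarise_age(small)
#'
#' \donttest{
#' # Slower example on a larger dataset: skipped by a plain R CMD check
#' big <- data.frame(age = sample(0:90, 1e6, replace = TRUE))
#' summarise_age(big)
#' }
summarise_age <- function(data) {
  # Basic input validation before computing the summary
  stopifnot(is.data.frame(data), is.numeric(data$age))
  c(mean = mean(data$age), max = max(data$age))
}

summarise_age(data.frame(age = c(37, 19, 50)))
```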

Other minor comments are directly on the review. Looking forward to seeing it on CRAN soon!

Hi @jd-otero. Thanks for your review! You raised important points to improve the package! I will address the suggestions that imply major changes in issues #80, #84 and #88. The rest of the suggestions were committed in this PR.

We will reduce the size of the dataset (issue #82) to improve the accuracy of the results, because the current dataset introduces an observational bias that the package cannot deal with. This will also decrease the computation time of the examples.

davidsantiagoquevedo · 2024-10-12T12:52:57Z

Really nice work @davidsantiagoquevedo and collaborators!

Overall, the package is well designed, has a simple interface and is thoroughly documented.

I've left some comments on the files. The functions seem well structured; I didn't have time to check all of them and can do this in another review in the future.

Some additional comments:

You have some cohort data in /data but you don't have a /data-raw folder giving this data provenance. It would be good to show how the data was generated.

The documentation for the cohortdata in R/cohortdata.R is also incomplete: it does not state where the data is from.

The cohortdata is stored as a <data.table> object. But the {data.table} package is not a dependency of {vaccineff}, so when the package is loaded, {data.table} does not get loaded. This has some small effects such as printing:
library(vaccineff)
head(cohortdata)
#>         id sex age death_date death_other_causes vaccine_date_1 vaccine_date_2
#> 1 afade1b2   F  37       <NA>               <NA>           <NA>           <NA>
#> 2 556c8c76   M  19       <NA>               <NA>           <NA>           <NA>
#> 3 04edf85a   M  50       <NA>               <NA>           <NA>           <NA>
#> 4 7e51a18e   F   8       <NA>               <NA>           <NA>           <NA>
#> 5 c5a83f56   M  66       <NA>               <NA>           <NA>           <NA>
#> 6 7f675ec3   M  29       <NA>               <NA>     2044-04-09     2044-04-30
#>   vaccine_1 vaccine_2
#> 1      <NA>      <NA>
#> 2      <NA>      <NA>
#> 3      <NA>      <NA>
#> 4      <NA>      <NA>
#> 5      <NA>      <NA>
#> 6    BRAND1    BRAND1
library(data.table)
head(cohortdata)
#>          id    sex   age death_date death_other_causes vaccine_date_1
#>      <char> <char> <int>     <Date>             <Date>         <Date>
#> 1: afade1b2      F    37       <NA>               <NA>           <NA>
#> 2: 556c8c76      M    19       <NA>               <NA>           <NA>
#> 3: 04edf85a      M    50       <NA>               <NA>           <NA>
#> 4: 7e51a18e      F     8       <NA>               <NA>           <NA>
#> 5: c5a83f56      M    66       <NA>               <NA>           <NA>
#> 6: 7f675ec3      M    29       <NA>               <NA>     2044-04-09
#>    vaccine_date_2 vaccine_1 vaccine_2
#>            <Date>    <char>    <char>
#> 1:           <NA>      <NA>      <NA>
#> 2:           <NA>      <NA>      <NA>
#> 3:           <NA>      <NA>      <NA>
#> 4:           <NA>      <NA>      <NA>
#> 5:           <NA>      <NA>      <NA>
#> 6:     2044-04-30    BRAND1    BRAND1
Created on 2024-09-27 with reprex v2.1.0

but additionally it may mean that the efficiency gains of using <data.table> objects are not used by the user unless they have loaded {data.table} themselves. My recommendation would just be to resave the cohortdata as a <data.frame> which seems to me the simplest solution.

The documentation of cohortdata says that it is an object of class cross, but that doesn't seem to be the case.

Is there a reason cohortdata is stored both in /data and /inst/extdata? I assume you only need it in /data if you want users to access it. Apologies if I'm missing something here.

Tangential question: Do you have any simulation functionality? This seems like a good area to develop {simulist} to simulate vaccination that can then be used to test {vaccineff}. We can discuss this once you've finished the review and released the package.
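Coming back to the recommendation on resaving cohortdata as a <data.frame>: a minimal base R sketch of the conversion follows. It uses a small stand-in object that only mimics the class of the stored data (an assumption for illustration), and the object names and file path are hypothetical. In the package itself the same steps would typically live in a script under /data-raw, which would also document the data provenance.

```r
# Stand-in for the packaged dataset: a data.frame that also carries the
# "data.table" class, mimicking how cohortdata is stored (assumption)
cohort_dt <- data.frame(
  id = c("afade1b2", "556c8c76"),
  sex = c("F", "M"),
  age = c(37L, 19L),
  stringsAsFactors = FALSE
)
class(cohort_dt) <- c("data.table", "data.frame")

# as.data.frame() returns an object whose class is just "data.frame"
cohort_df <- as.data.frame(cohort_dt)
class(cohort_df)

# Resave the converted object; here to a temporary file. In a package,
# usethis::use_data(cohortdata, overwrite = TRUE) would write data/cohortdata.rda
rda_path <- file.path(tempdir(), "cohortdata.rda")
save(cohort_df, file = rda_path)
```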

Hi @joshwlambert. Thanks for your review! The points you raised will be very beneficial for improving the structure of the package :) I agree with the suggestions on the data structure. I will address it in #82. The rest of your suggestions will be implemented in #78, #81, #85, #88, #89.

Expanding {simulist} for vaccination data is a really good idea! We have been discussing internally how to test the package with theoretical datasets and {simulist} came to our mind as well for this task. It would be nice to have a discussion about this in a few weeks.

davidsantiagoquevedo and others added 30 commits June 1, 2024 13:23

fix: remove factor conversion in vaccine status

b569c57

refac: test for cohort_match_ auxiliary function

57262b6

tests for effectiveness

1bf51ba

refac: input check using checkmate

2a02031

refac: changed assert_class by test_class

5fc7608

fix: changed sapply for vapply gp()

8906f05

typo and lintr

fb80b0e

fix: snap for summary of effectiveness

6ef30df

feat: added tests for match_cohort() static

2616bee

roxygen2: updated and generated documentation of functions

4ce9e3b

fix: example for effectiveness

1d914ff

fix: removed unnecesary head() and negative time check

eb954be

fix: added stats prefix where missing

c27d32b

roxygen: updated example in documentation

fd6d222

feat: updated example in getstart for new estimation pipeline

e26d24d

Use standard (& expected) name for logo

a42aaea

This is necessary for the logo to be detected by pkgdown and r-universe

Render favicon for pkgdown

50d51a1

fix: input check for vacc_name_col and df where immunizing_vaccine_na…

5b03078

…me is created

feat: tests for make_immunization

dd7a6ae

added open arguments for generic method dataset

9aa35c8

feat: test for match_summary comparing only two datasets

0851b37

test for generic method

b64f9ea

roxygen

ea834a9

proofread and check in methodology, details of example

a2c4eec

fix: spelling checks

d5f988b

fix: added missing documentation of parameters and renamed parameters…

c28e5f7

… from generic methods

lintr: replaced expect_equal by expect_identical

4d4fa35

fix: missing parameter documentation

95cdde2

fix: parameters of generic functions

cd0fdac

roxygen: updated documentation

53305ba

davidsantiagoquevedo and others added 2 commits October 1, 2024 12:11

Update LICENSE.md

c4d93a6

Co-authored-by: Chris Hartgerink <[email protected]>

Update LICENSE

38c107f

Co-authored-by: Chris Hartgerink <[email protected]>

davidsantiagoquevedo mentioned this pull request Oct 1, 2024

centralize validations in handler_validations #81

Open

davidsantiagoquevedo and others added 9 commits October 1, 2024 12:38

style documentation set_status()

f63454f

Co-authored-by: Chris Hartgerink <[email protected]>

typo on control message

f35305b

Co-authored-by: Chris Hartgerink <[email protected]>

style: last line vignettes/vaccineff.Rmd

2525222

Co-authored-by: Chris Hartgerink <[email protected]>

typo: documentation

bf7c8bb

Co-authored-by: Chris Hartgerink <[email protected]>

fix: removed reference to example data

0a4ed20

typo: consistency

b5c673f

fix: typo thershold -> threshold

000a1c8

Co-authored-by: Chris Hartgerink <[email protected]>

Merge remote-tracking branch 'refs/remotes/origin/review' into review

2a55f76

fix: update threshold within function

04ffac7

This was referenced Oct 7, 2024

rename effectiveness object as vaccineff #83

Closed

create loglog plot from Cox model estimation #84

Closed

refac: definition of linelist object

810f6c1

Co-authored-by: Chris Hartgerink <[email protected]>

style: usethis::use_tidy_description()

6641ed0

davidsantiagoquevedo mentioned this pull request Oct 7, 2024

integrate plot_coverage into plot method for vaccineff_data #85

Closed

This was referenced Oct 8, 2024

Update markdown lint issue with headings #86

Merged

Reduce duplicate code in rematching procedure #87

Merged

davidsantiagoquevedo mentioned this pull request Oct 11, 2024

update packagetemplate #88

Open

This was referenced Oct 11, 2024

refac summary functions #89

Open

refactoring tests and R files #78

Open

reduce example data to improve permormance and accuracy of examples #82

Open

davidsantiagoquevedo merged commit c2e0618 into empty Oct 18, 2024

davidsantiagoquevedo deleted the review branch October 18, 2024 16:15

davidsantiagoquevedo mentioned this pull request Oct 18, 2024

Minor changes from first package review #92

Merged

davidsantiagoquevedo mentioned this pull request Nov 2, 2024

Create loglog plot using prediction from Cox model #98

Merged
