Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce example data to improve permormance and accuracy of examples #82

Open
davidsantiagoquevedo opened this issue Oct 7, 2024 · 0 comments
Assignees

Comments

@davidsantiagoquevedo
Copy link
Member

davidsantiagoquevedo commented Oct 7, 2024

During user test #3, it was noted that the estimations are biased due to the inclusion of several age groups without outcomes. This issue is open to improving the accuracy of the examples.

A few issues were also raised regarding the performance of examples and the data structure during the full package review in #76 (see below). These also will be addressed here.

  • You have some cohort data in /data but you don't have a /data-raw folder giving this data provenance. It would be good to show how the data was generated.
  • The documentation for the cohortdata in R/cohortdata.R is also incomplete not stating where the data is from.
  • The cohortdata is stored a <data.table> object. But the {data.table} package is not a dependency of {vaccineff}, so when the package is loaded, {data.table} does not get loaded. This has some small effects such as printing:
library(vaccineff)
head(cohortdata)
#>         id sex age death_date death_other_causes vaccine_date_1 vaccine_date_2
#> 1 afade1b2   F  37       <NA>               <NA>           <NA>           <NA>
#> 2 556c8c76   M  19       <NA>               <NA>           <NA>           <NA>
#> 3 04edf85a   M  50       <NA>               <NA>           <NA>           <NA>
#> 4 7e51a18e   F   8       <NA>               <NA>           <NA>           <NA>
#> 5 c5a83f56   M  66       <NA>               <NA>           <NA>           <NA>
#> 6 7f675ec3   M  29       <NA>               <NA>     2044-04-09     2044-04-30
#>   vaccine_1 vaccine_2
#> 1      <NA>      <NA>
#> 2      <NA>      <NA>
#> 3      <NA>      <NA>
#> 4      <NA>      <NA>
#> 5      <NA>      <NA>
#> 6    BRAND1    BRAND1
library(data.table)
head(cohortdata)
#>          id    sex   age death_date death_other_causes vaccine_date_1
#>      <char> <char> <int>     <Date>             <Date>         <Date>
#> 1: afade1b2      F    37       <NA>               <NA>           <NA>
#> 2: 556c8c76      M    19       <NA>               <NA>           <NA>
#> 3: 04edf85a      M    50       <NA>               <NA>           <NA>
#> 4: 7e51a18e      F     8       <NA>               <NA>           <NA>
#> 5: c5a83f56      M    66       <NA>               <NA>           <NA>
#> 6: 7f675ec3      M    29       <NA>               <NA>     2044-04-09
#>    vaccine_date_2 vaccine_1 vaccine_2
#>            <Date>    <char>    <char>
#> 1:           <NA>      <NA>      <NA>
#> 2:           <NA>      <NA>      <NA>
#> 3:           <NA>      <NA>      <NA>
#> 4:           <NA>      <NA>      <NA>
#> 5:           <NA>      <NA>      <NA>
#> 6:     2044-04-30    BRAND1    BRAND1

Created on 2024-09-27 with reprex v2.1.0

but additionally it may mean that the efficiency gains of using <data.table> objects are not used by the user unless they have loaded {data.table} themselves. My recommendation would just be to resave the cohortdata as a <data.frame> which seems to me the simplest solution.

  • The documentation of cohortdata say that it is an object of class cross but that doesn't seem to be the case
  • Is there a reason cohortdata is stored both in /data and /inst/extdata. I assume you only need it /data if you want users to access it. Apologies if I'm missing something here.

Tangential question: Do you have any simulation functionality. This seems like a good area to develop {simulist} to simulate vaccination that can then be used to test {vaccineff}, we can discuss this once you've finished the review and released the package.
(#76 (review))

The {qtl} package is a suggested dependency but is only used in the documentation of cohortdata which seems to be for the read.cross() function. Can this suggested dependency be removed, especially as it seems the cohortdata is not a <cross> object?
(#76 (comment))

  • Some examples took a little longer than expected (effectiveness, make_vaccineff_data, plot_coverage). Might be worth looking into it if. Maybe you can use a shorter dataset for the examples, or use a \donttest instruction.
    (First full package review #76 (review))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant