reduce example data to improve permormance and accuracy of examples #82

davidsantiagoquevedo · 2024-10-07T10:22:33Z

During user test #3, it was noted that the estimations are biased due to the inclusion of several age groups without outcomes. This issue is open to improving the accuracy of the examples.

A few issues were also raised regarding the performance of examples and the data structure during the full package review in #76 (see below). These also will be addressed here.

You have some cohort data in /data but you don't have a /data-raw folder giving this data provenance. It would be good to show how the data was generated.

The documentation for the cohortdata in R/cohortdata.R is also incomplete not stating where the data is from.

The cohortdata is stored a <data.table> object. But the {data.table} package is not a dependency of {vaccineff}, so when the package is loaded, {data.table} does not get loaded. This has some small effects such as printing:
library(vaccineff)
head(cohortdata)
#>         id sex age death_date death_other_causes vaccine_date_1 vaccine_date_2
#> 1 afade1b2   F  37       <NA>               <NA>           <NA>           <NA>
#> 2 556c8c76   M  19       <NA>               <NA>           <NA>           <NA>
#> 3 04edf85a   M  50       <NA>               <NA>           <NA>           <NA>
#> 4 7e51a18e   F   8       <NA>               <NA>           <NA>           <NA>
#> 5 c5a83f56   M  66       <NA>               <NA>           <NA>           <NA>
#> 6 7f675ec3   M  29       <NA>               <NA>     2044-04-09     2044-04-30
#>   vaccine_1 vaccine_2
#> 1      <NA>      <NA>
#> 2      <NA>      <NA>
#> 3      <NA>      <NA>
#> 4      <NA>      <NA>
#> 5      <NA>      <NA>
#> 6    BRAND1    BRAND1
library(data.table)
head(cohortdata)
#>          id    sex   age death_date death_other_causes vaccine_date_1
#>      <char> <char> <int>     <Date>             <Date>         <Date>
#> 1: afade1b2      F    37       <NA>               <NA>           <NA>
#> 2: 556c8c76      M    19       <NA>               <NA>           <NA>
#> 3: 04edf85a      M    50       <NA>               <NA>           <NA>
#> 4: 7e51a18e      F     8       <NA>               <NA>           <NA>
#> 5: c5a83f56      M    66       <NA>               <NA>           <NA>
#> 6: 7f675ec3      M    29       <NA>               <NA>     2044-04-09
#>    vaccine_date_2 vaccine_1 vaccine_2
#>            <Date>    <char>    <char>
#> 1:           <NA>      <NA>      <NA>
#> 2:           <NA>      <NA>      <NA>
#> 3:           <NA>      <NA>      <NA>
#> 4:           <NA>      <NA>      <NA>
#> 5:           <NA>      <NA>      <NA>
#> 6:     2044-04-30    BRAND1    BRAND1
Created on 2024-09-27 with reprex v2.1.0

but additionally it may mean that the efficiency gains of using <data.table> objects are not used by the user unless they have loaded {data.table} themselves. My recommendation would just be to resave the cohortdata as a <data.frame> which seems to me the simplest solution.

The documentation of cohortdata say that it is an object of class cross but that doesn't seem to be the case

Is there a reason cohortdata is stored both in /data and /inst/extdata. I assume you only need it /data if you want users to access it. Apologies if I'm missing something here.

Tangential question: Do you have any simulation functionality. This seems like a good area to develop {simulist} to simulate vaccination that can then be used to test {vaccineff}, we can discuss this once you've finished the review and released the package.
(#76 (review))

The {qtl} package is a suggested dependency but is only used in the documentation of cohortdata which seems to be for the read.cross() function. Can this suggested dependency be removed, especially as it seems the cohortdata is not a <cross> object?
(#76 (comment))

Some examples took a little longer than expected (effectiveness, make_vaccineff_data, plot_coverage). Might be worth looking into it if. Maybe you can use a shorter dataset for the examples, or use a \donttest instruction.
(First full package review #76 (review))

The text was updated successfully, but these errors were encountered:

davidsantiagoquevedo self-assigned this Oct 7, 2024

davidsantiagoquevedo mentioned this issue Oct 7, 2024

First full package review #76

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reduce example data to improve permormance and accuracy of examples #82

reduce example data to improve permormance and accuracy of examples #82

davidsantiagoquevedo commented Oct 7, 2024 •

edited

Loading

reduce example data to improve permormance and accuracy of examples #82

reduce example data to improve permormance and accuracy of examples #82

Comments

davidsantiagoquevedo commented Oct 7, 2024 • edited Loading

davidsantiagoquevedo commented Oct 7, 2024 •

edited

Loading