Primary censoring in right truncation simulation #207

Open
seabbs opened this issue Mar 5, 2025 · 3 comments

@seabbs

seabbs commented Mar 5, 2025

I was just reading the truncation code

#' @param delay A `function` (either anonymous or predefined) that

and noticed that it uses dates to define the truncation delay. I think that means the delay distribution should be at least primary censored, so that the distribution used in the simulation is the one users think they are specifying. This is because, I believe, the reference date is defined as a date and is therefore censored.

Potential solutions

  • Document this
  • Use a safer default
  • Do an internal correction to account for censoring (I believe this would look something like the simulation process in primary censored, i.e. approximating it by sampling from a uniform).
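
A rough sketch of what the third option could look like, using base R only. The lognormal delay and its parameters are placeholders chosen for illustration, not anything taken from the package: the primary event time within its day is approximated by a uniform draw, the continuous delay is added, and the result is floored back to a day.

```r
set.seed(1)
n <- 1e4

# Assumed continuous delay distribution (illustrative parameters only)
delay <- rlnorm(n, meanlog = 1.5, sdlog = 0.5)

# The primary event is recorded as a date, so its time within the day is
# unknown; approximate it with a uniform draw on [0, 1)
primary_day <- rep(0, n)
primary_time <- primary_day + runif(n)

# The secondary event occurs at primary time + delay and is recorded as a date
secondary_day <- floor(primary_time + delay)

# Delay as it would be computed from the two recorded dates (whole days)
observed_delay <- secondary_day - primary_day

# Compare the continuous delay with the delay recovered from dates
summary(delay)
summary(observed_delay)
```

The observed delays are whole days and can differ from the continuous delay by up to a day in either direction, which is the gap between the distribution a user specifies and the one that date-based data imply.
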
@joshwlambert
Member

Thanks for raising this @seabbs!

There has been a fairly substantial refactor of truncate_linelist() in PR #201. There is no longer a delay argument. Instead it now takes the truncation day (either as the number of days before the end of the outbreak or number of days after the start of the outbreak, or a <Date>).

Therefore, if I'm understanding correctly, your comment refers more to the reporting_delay argument in sim_linelist(), which controls the reporting delay distribution between $date_onset and $date_reporting ($date_reporting is then used by truncate_linelist() to determine which cases are subset).

The outbreak simulation runs in continuous time, with dates stored as double-precision numbers. So although R's default <Date> printing can misleadingly suggest that all dates are discrete, the dates in a line list output by sim_linelist() can fall, for example, halfway through a day. My understanding is therefore that, given this precision, the dates are not primary censored.
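
A small base R illustration of this point, using a made-up value rather than simulation output: a <Date> is a double counting days since 1970-01-01, so it can carry a fractional day that the default printing hides.

```r
# A Date built from a non-integer number of days since 1970-01-01
d <- as.Date(19370.13, origin = "1970-01-01")

# Printing and formatting show only the calendar day
format(d)

# The underlying value still holds the fractional part
unclass(d)

# So it is not equal to the "same" date stored as a whole number of days
d == as.Date("2023-01-13")
```
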

Let me know your thoughts.

@joshwlambert
Member

joshwlambert commented Mar 26, 2025

I think this issue can now be closed with respect to the original point. But before closing, on the topic of censored dates: the dates in the line list output by sim_linelist() are in continuous time, e.g.

set.seed(2)
ll <- simulist::sim_linelist()
ll$date_onset
#>  [1] "2023-01-01" "2023-01-13" "2023-01-18" "2023-01-18" "2023-01-15"
#>  [6] "2023-01-13" "2023-01-16" "2023-01-23" "2023-01-18" "2023-01-19"
#> [11] "2023-01-19" "2023-01-29" "2023-02-01" "2023-02-01" "2023-02-03"
#> [16] "2023-02-10"
ll$date_onset <- as.POSIXct(ll$date_onset)
ll$date_onset
#>  [1] "2023-01-01 00:00:00 UTC" "2023-01-13 03:11:03 UTC"
#>  [3] "2023-01-18 01:04:30 UTC" "2023-01-18 23:21:20 UTC"
#>  [5] "2023-01-15 21:36:08 UTC" "2023-01-13 23:12:39 UTC"
#>  [7] "2023-01-16 14:55:21 UTC" "2023-01-23 11:40:41 UTC"
#>  [9] "2023-01-18 21:47:14 UTC" "2023-01-19 19:18:30 UTC"
#> [11] "2023-01-19 01:54:39 UTC" "2023-01-29 09:01:01 UTC"
#> [13] "2023-02-01 08:00:38 UTC" "2023-02-01 01:15:18 UTC"
#> [15] "2023-02-03 07:05:22 UTC" "2023-02-10 18:42:20 UTC"

Created on 2025-03-26 with reprex v2.1.1

I'm thinking it might be useful to show the users how to easily convert these into daily interval dates, resulting in data that can be used to fit models from {primarycensored}.

I think the best place for this would be the wrangling-linelist.Rmd vignette, which is where random tips and tricks for working with simulated data go.

Another option is a new exported function (e.g. censor_linelist()) which could take an interval argument for daily, weekly, etc. censoring intervals, but I don't really want to increase the namespace of the package.
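
For reference, a sketch of what such a function might look like. censor_linelist() and its interval argument are hypothetical and not part of {simulist}; here interval is assumed to be a width in days (1 for daily, 7 for weekly), with intervals counted from 1970-01-01, so weekly intervals start on a Thursday. A toy data frame stands in for sim_linelist() output.

```r
# Hypothetical helper: floor every <Date> column to the start of its
# censoring interval. `interval` is the interval width in days.
censor_linelist <- function(linelist, interval = 1) {
  stopifnot(is.data.frame(linelist), is.numeric(interval), interval >= 1)
  # Identify which columns are <Date>
  is_date <- vapply(linelist, inherits, FUN.VALUE = logical(1), what = "Date")
  linelist[is_date] <- lapply(linelist[is_date], function(x) {
    # Floor the underlying day count to a multiple of `interval`
    as.Date(floor(as.numeric(x) / interval) * interval, origin = "1970-01-01")
  })
  linelist
}

# Toy data frame with continuous-time dates (made-up values)
ll <- data.frame(id = 1:3)
ll$date_onset <- as.Date(c(19358.00, 19370.13, 19375.97), origin = "1970-01-01")

# Daily and weekly censoring
unclass(censor_linelist(ll)$date_onset)
unclass(censor_linelist(ll, interval = 7)$date_onset)
```

Aligning weekly intervals to a particular weekday, or to the first date in the line list, would need an extra offset that this sketch leaves out.
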

@joshwlambert
Member

Jotting down a few notes:

  • Dates should be floored not rounded to maintain the correct day
as.Date(10957)
#> [1] "2000-01-01"
as.Date(10957.6)
#> [1] "2000-01-01"
as.Date(round(10957.6)) 
#> [1] "2000-01-02"
as.Date(floor(10957.6))
#> [1] "2000-01-01"

Created on 2025-03-26 with reprex v2.1.1

  • floor() cannot be directly applied on <Date> objects
floor(as.Date(10957.6))
#> Error in Math.Date(as.Date(10957.6)): floor not defined for "Date" objects

Created on 2025-03-26 with reprex v2.1.1

For daily censoring, it's probably as simple as:

set.seed(2)
ll <- simulist::sim_linelist()
unclass(ll$date_onset)
#>  [1] 19358.00 19370.13 19375.04 19375.97 19372.90 19370.97 19373.62 19380.49
#>  [9] 19375.91 19376.80 19376.08 19386.38 19389.33 19389.05 19391.30 19398.78
cens_ll <- as.data.frame(lapply(ll, FUN = function(x) {
  if (inherits(x, "Date")) {
    x <- as.Date(floor(as.numeric(x)))
  }
  x
}))
unclass(cens_ll$date_onset)
#>  [1] 19358 19370 19375 19375 19372 19370 19373 19380 19375 19376 19376 19386
#> [13] 19389 19389 19391 19398

Created on 2025-03-26 with reprex v2.1.1
