To Do List #70

Closed
65 tasks done
aspina7 opened this issue Feb 6, 2019 · 11 comments

Comments

@aspina7
Member

aspina7 commented Feb 6, 2019

Everywhere:

  • [Zhian] text size in plots doesn't respond to the base_size set with theme_set() in the setup chunk; it only changes when element_text is set in the plot itself. Need to adapt the outbreak and survey templates accordingly.

  • [Discussion] it would be good to also output a document (besides the Word report) which summarises which datasets were used (and their file paths), the number of starting and dropped cases, and where the output is saved. Maybe warnings/errors too? [moved to nice-to-haves PR]

  • [Zhian] in the tab_* functions, pass call. = FALSE to warning() when dropping NAs, as descriptive is no longer a user-facing function.

  • [Zhian] add a license to the repo? see Question: who is the copyright holder? #44

  • [Kate/Neale / Alex / Zhian] decide whether to also show code for turning explicit missing values (e.g. "Missing" strings) into implicit NAs. For example, to make "No answer" an implicit NA, you would use fct_recode(NULL = "No answer").

  • [Kate] update all sitrep::descriptive tables with groupers to use sitrep::rename_redundant and augment_redundant for renaming similar columns

  • [Neale] Add chi-squared test and t.test examples to the website (there is already an example of this in the vaccination survey) [moved to R4EPIs-website#19 "Show example code comparing HEV rdt positives v negatives" on the website]

  • [Zhian/Alex] consider dropping sitrep::descriptive in favour of arsenal::tableby (dplyr compatible? simple enough syntax for beginners?) - decided against, syntax too dense

  • [Neale] Under "## Installing and loading required packages", add that you can check where your packages are installed using .libPaths(), and give a link to the wiki/training material. Currently it only says "Program Files/R/R-[version]/library", which won't always be the case (particularly on MSF computers).

  • [Neale?] double check that case_when does what it should when NAs are involved. case_when doesn't leave NAs as NAs; an explicit is.na() condition is needed, see below:

```r
x <- c(NA, "good", "bad")

dplyr::case_when(
  is.na(x)    ~ NA_character_,  # keep missing values missing
  x == "good" ~ "YAY",
  x == "bad"  ~ "BOO!",
  TRUE        ~ "WAT"           # without the is.na() line, NA would end up here
)
#> [1] NA     "YAY"  "BOO!"
```
  • [Neale?] switch all factor handling to forcats (swap out recode_factor) - cross-reference with the point under "Surveys"; all of the factor cleaning needs a bit of a fix
  • [Neale] add to the intro of the templates that feedback is always welcome via GitHub, à la Add feedback section to templates and other documentation #33 (comment)
  • [Alex] check the ceramic package as a map tile solution. See the spatial analysis issue. The package is not quite there yet, but may be in the future.
  • [Alex] Clean up sitrep::univariate_analysis - a lot of unnecessary repetition. Also need to add a stratified option.
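A minimal sketch of the fct_recode(NULL = ...) approach mentioned in the list above, assuming a factor called answers with an explicit "No answer" level (the variable and its values are illustrative only). In forcats, recoding a level to NULL drops that level, so those entries become implicit NA; the base R lines show an equivalent without forcats.

```r
library(forcats)

# Illustrative survey answers where "No answer" is an explicit missing value
answers <- factor(c("Yes", "No", "No answer", "Yes", "No answer"))

# Recoding a level to NULL removes it, turning those values into implicit NA
cleaned <- fct_recode(answers, NULL = "No answer")

# Base R equivalent: blank the values, then drop the now-unused level
cleaned_base <- answers
cleaned_base[cleaned_base == "No answer"] <- NA
cleaned_base <- droplevels(cleaned_base)
```

Either way, the "No answer" level disappears from tables and the affected rows are counted as missing.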

Surveys:

  • [Alex] all surveys - fix factor cleaning according to Neale's examples from the outbreak templates (Neale may have already done this)
  • [Alex] mortality and nutrition: finalise the reason_no_consent chunk once the descriptive function can handle multiple-choice variables split over several columns (requires the new tab_linelist function)
  • [Alex] nutrition: add a table weighted by household of whether soap was received (requires fixing the weighting below).
  • [Alex/Zhian] Add an option for cluster design to add_weights. Input the number of clusters (i.e. villages), then the number of households within those clusters and the number of kids for each house... TO BE DISCUSSED once we hear back from the statistician
  • [Alex] add examples of stratified design and analysis by region.
  • [Alex] nutrition survey - confirm that weighted proportions are the same as those generated by the anthro package; see tabulate_survey and anthro_prevalence estimate discrepancies #140
  • [Alex] Nutrition: update the dictionary to replace measles with a programme penetrance variable (soap)
  • [Alex/Zhian] Fix factors in the surveys cleaning section (cause of death and no consent are somehow in the same cleaning step) - may just be for vaccination.
  • [Alex/Zhian] Add explicit NA replacement to the factor cleaning for vaccination (from mortality)
  • [Alex] Change mortality and vaccination to use the surv_weight variable after the change to the add_weight function output!
  • [Zhian - discuss with Alex] the chunk "descriptive_sampling_bias" throws an error if the age group variable has missing values (because descriptive now returns a row for missing values), and the population data frame obviously doesn't have that extra row, so the row numbers differ. Do we just add a comment?
  • [Alex] fix the cluster_hh_size chunk - the count of households is definitely wrong.
  • [Alex] add cluster to mortality survey (need to define in dict)
  • [Alex] add a descriptive table of non-consent reasons at the beginning of the results, where the sample is described
  • [Alex] add a weighted table for reason not vaccinated (to vacc survey template)
  • [Alex] double check in all templates that the correct order is used for counter and stratifier variables (e.g. in vaccination, you can't just swap to have age_group first in order to flip the table; use the new transpose argument!)
  • [Zhian] need to add design effect option to tabulate_survey (see example from epiet RAS case study)
  • [Zhian] tabulate_survey with pretty = TRUE returns the % symbol as well as the CI in each cell. It would be enough for the column heading to show % (95% CI) and for each cell to show e.g. 35 (21-90). This is relevant to all pretty merging functions...
  • [Zhian] tabulate_survey when stratified - possible to give an option for row or column proportions, as well as proportions of the total? (same as we do for the descriptive function)
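The cell format requested above could come from a small formatter like the one below. fmt_est_ci is a hypothetical helper, not an existing sitrep function: it merges an estimate and its 95% CI into one string without any unit, so that "%" (or "per 10,000") can live in the column heading instead. The same idea would also cover the fmt_ci_df point under "From measles".

```r
# Hypothetical helper (not part of sitrep): merge an estimate and its CI into
# a single cell such as "35 (21-90)", leaving the unit to the column heading
fmt_est_ci <- function(est, lower, upper, digits = 1) {
  fmt <- function(x) formatC(x, format = "f", digits = digits)
  paste0(fmt(est), " (", fmt(lower), "-", fmt(upper), ")")
}

# The unit and interval type go in the heading, not in every cell
tab <- data.frame(
  group = c("Vaccinated", "Not vaccinated"),
  est = fmt_est_ci(c(35, 12.34), c(21, 8.1), c(90, 17.6)),
  stringsAsFactors = FALSE
)
names(tab)[2] <- "% (95% CI)"
```

A hyphen separator would be ambiguous for negative bounds; that cannot happen for proportions, but a separator such as " to " may be safer for other estimates.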

Extras:

  • [NICE TO HAVE] if there is time, add options for sample size calculations; see Support sample size calculations for all three surveys #5 [moved to nice-to-have PR]
  • ~~[NICE TO HAVE] Consider implementing a variation of the wordr package which allows us to put page breaks in.~~ [moved to nice-to-have PR]
  • [NICE TO HAVE] on all plots - make the y-axis numbering go to the top of the axis; this happens because of expand(c(0,0)) [moved to nice-to-have PR]
  • [see: reconhub/incidence#105 "Add option to calculate/plot rolling averages"] on epicurves - possible to add a 2-weekly moving average? Issue posted on RECON. Long-term issue. [moved to nice-to-have PR]
  • [NICE TO HAVE] Consider adding an example of creating age_categories grouped in months for under-fives to the generic template (added in the cholera template; copy-paste if acceptable)
  • [NICE TO HAVE] add Kate's suggestion of removing previously infected cases from the denominator in consecutive epiweeks. (In theory this is methodologically correct if it can be confirmed that those counted as cases were in fact confirmed to have the disease in question. In practice there are likely many suspected cases, which would therefore not necessarily be removed from the at-risk category, so this is questionable.) Decided against: rare use case.
  • [NICE TO HAVE] Confirm if there is an existing way to have discrete categories in choropleth maps
    https://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/ Mense, just stick with what we have.
  • [NICE TO HAVE] adjust fmt_count to have an option for removing proportions, and an option for specifying a different denominator (from measles, but this works just as well with count...) - count is fine, stick with that.
  • [NICE TO HAVE] switch from cowplot to patchwork when it is released on CRAN (https://github.com/thomasp85/patchwork) - patchwork is not being released any time soon (it hasn't been worked on in a while).
  • [NICE TO HAVE] Add non binary gender option to plot_age_pyramid, see Make plot_age_distribution to include non-binary gender values #102 (comment)

Outbreaks:

  • [ZHIAN] Fix cowplot alignment of epicurves and AR/CFR plots, so that ticks are in the middle of the bars

From measles:

  • No lab data in the dictionary: add fake lab data with an example of how to merge it and create a case definition (see the generic outbreak template). Keep all of the above commented out, and comment out other analyses stratified by case definition (just leave them in as an example)
  • in gen_data, specify that only those who received the vaccine have a dose entered
  • add an example to the data cleaning section of setting dose to NA where the vaccine was not given (commented out)
  • add something showing how many NAs are in each variable (summary has that), and a bit on dropping rows with xzy missing or based on bla... just use dplyr filter
  • add an example of writing the cleaned dataset to Excel (or double check that we already have it there)
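For the missing-data and dose points above, a sketch assuming a linelist with hypothetical columns age_years, vaccinated and dose (the real dictionary names may differ): colSums(is.na()) counts missing values per variable, and dplyr's mutate() and filter() handle the dose recoding and the row dropping.

```r
library(dplyr)

# Hypothetical measles linelist (column names are illustrative only)
linelist <- data.frame(
  age_years  = c(4, NA, 12, 30),
  vaccinated = c("yes", "no", "no", "yes"),
  dose       = c(1, 2, NA, 2)
)

# Number of missing values in each variable
na_counts <- colSums(is.na(linelist))

linelist_clean <- linelist %>%
  # A dose only makes sense for vaccinated cases; set it to NA otherwise
  mutate(dose = if_else(vaccinated == "yes", dose, NA_real_)) %>%
  # Drop rows with a missing age
  filter(!is.na(age_years))
```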

  • [ZHIAN] when creating the epiweek variable, define it as a factor and add all weeks between the min and max as levels, so that we don't have to deal with zero-count weeks separately in tables.
  • [ZHIAN] consider an add_totals option for the proportions functions, so it just sums the counts of res, then runs the proportions function and bind_rows. If you look at what I did in the CFR section of the cholera template, having to bind_rows an overall and a group-specific CFR calculation is a bit long-winded....
  • [ZHIAN] Consider adding counts(proportions%) to inline_fun. See cholera template inline code before #### Demographics
  • [ZHIAN] fmt_ci_df(ar) adds a % sign at the end... but if it's per 10,000 population we don't want a % sign, as seen after the attack_rate code chunk
  • [ZHIAN] Get rid of do(..) and change functions to NSE? Goes back to issue #48
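A base R sketch of the epiweek idea, assuming ISO 8601 weeks labelled like 2019-W01 and a vector of onset dates (the dates are made up). Building the levels from every Monday between the first and last onset means weeks with no cases still appear in table() with a count of zero.

```r
onset <- as.Date(c("2019-01-02", "2019-01-03", "2019-01-21"))

# Label each date with its ISO 8601 week, e.g. "2019-W01"
iso_week <- function(d) format(d, "%G-W%V")

# Monday of the week containing the earliest date (%u: 1 = Monday ... 7 = Sunday)
first_monday <- min(onset) - (as.integer(format(min(onset), "%u")) - 1)

# One Monday per week up to the latest date, so every week gets a level
all_weeks <- iso_week(seq(first_monday, max(onset), by = "week"))

epiweek <- factor(iso_week(onset), levels = all_weeks)
counts <- table(epiweek)
```

Here weeks 2019-W02 and 2019-W03 show up with zero cases instead of being silently dropped.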

  • [ZHIAN/ALEX] Try to make tables that are too big fit nicely in the Word document output (maybe shorten column names or merge categories...)

  • [ALEX] add an option to put a ceiling on age_group, e.g. so that the highest group in months ends at 24 months (rather than being 24+)
  • [ALEX] update the descriptive function with an option to show percentages of the total, rather than column-specific percentages.
  • [ALEX] Mapping section: consider plotting choropleths as categories rather than as continuous... also make the points stuff better
  • [ALEX] make sure all the 95% CIs are merged in the document tables... (I think it's just the mortality section left over)
  • [ALEX] add Kate's admissions/exits table in separate tables
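For the age_group ceiling, base R's cut() already supports a closed upper bound: with right = FALSE each interval is [lower, upper), so ages at or above the last break fall outside every group and become NA instead of going into a 24+ group. This is a sketch with made-up ages in months, not the sitrep::age_categories interface.

```r
# Made-up ages in months
age_months <- c(1, 5, 11, 18, 23, 30)

# right = FALSE gives intervals [0, 6), [6, 12), [12, 24); the last break acts
# as a ceiling, so 30 months is outside every group and becomes NA
age_group <- cut(
  age_months,
  breaks = c(0, 6, 12, 24),
  right  = FALSE,
  labels = c("0-5", "6-11", "12-23")
)
```

If older children should be kept but labelled, adding Inf as a final break with a "24+" label restores the open-ended group.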

  • [DONE? ZHIAN?] Consider adding an option to age_pyramid which returns proportions rather than counts; and option to remove NAs; Horizontal_lines does not seem to work either....
  • [DONE?] When loading library(excel.link), there is a message about someone called Daniela - suppress messages...
  • [DONE] on epicurves, when you use scale_x_date(date_breaks = "1 week") - the axis labels change to full dates, is it possible to keep it with the default 2013-W01 for example?
  • [DONE?] the fmt_ci_df function doesn't work if you use the mergeCI function from the props functions (e.g. attack_rate)
  • [DONE] Find a better way to reference lines/chunks (is there some kind of hyperlink function?) for the "Introduction to this template" section - no real solution, just reference code chunk names
@zkamvar
Member

zkamvar commented Feb 7, 2019

on epicurves, when you use scale_x_date(date_breaks = "1 week") - the axis labels change to full dates, is it possible to keep it with the default 2013-W01 for example?

For this one, I think the answer is: don't use scale_x_date(date_breaks = "1 week") since incidence should do this by default for weekly incidence.

@zkamvar
Member

zkamvar commented Feb 7, 2019

When loading library(excel.link), there is a message about someone called Daniela - suppress messages...

Interesting design... the author uses the package startup message as a sort of dedication page.

The solution is suppressPackageStartupMessages(library("excel.link"))

@aspina7
Member Author

aspina7 commented Feb 7, 2019

on epicurves, when you use scale_x_date(date_breaks = "1 week") - the axis labels change to full dates, is it possible to keep it with the default 2013-W01 for example?

For this one, I think the answer is: don't use scale_x_date(date_breaks = "1 week") since incidence should do this by default for weekly incidence.

mmmm, I don't think it did though: it only put axis labels every couple of weeks. Or will it automatically add them all but make them slanted?

@zkamvar
Member

zkamvar commented Feb 7, 2019

mmmm, I don't think it did though: it only put axis labels every couple of weeks. Or will it automatically add them all but make them slanted?

By default, it will only put labels at six points along the curve. That can be changed by setting n_breaks = nrow(<incidence object>) in the plot function. I'll make a quick PR to fix it

@zkamvar
Member

zkamvar commented Feb 22, 2019

on epicurves - possible to add a 2 weekly moving average?

I'm not exactly sure what this means... Does this mean that you want the epicurve on a bi-weekly basis or that you want to track data on a bi-weekly basis?

@zkamvar
Member

zkamvar commented Feb 22, 2019

* Find a better way to reference lines/chunks (is there some kind of hyperlink function?), for "Introduction to this template" section

We've seen that referencing lines doesn't really work too well. Referencing chunks by name is the best way to go so far.

@zkamvar
Member

zkamvar commented Feb 22, 2019

Were we going to find a way around not using do(..)?

I believe the plan is to add a stratifying option to the proportion functions so that a lot of the boilerplate can go away.

@zkamvar
Member

zkamvar commented Feb 22, 2019

Consider adding an option to age_pyramid which returns proportions rather than counts; and option to remove NAs; Horizontal_lines does not seem to work either....

This is more than one point!

  • Option to return proportions, so basically each gender would scale to 100% of the total for that gender? It's essentially plotting the props from the "cases by age group and sex" table. [this is probably completely wrong - Alex to double check :) ]
  • Option to remove NAs seems worthwhile.
  • RE horizontal_lines: prove it.

Alex: haha, I think you already fixed horizontal lines and I didn't update this
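If "proportions" here means each sex scaling to 100%, the numbers behind such a pyramid are column proportions of the age group by sex table. A base R sketch with made-up cases (this is only one reading of the request, which the comment itself flags as uncertain):

```r
# Made-up cases with an age group and sex
cases <- data.frame(
  age_group = c("0-4", "0-4", "5-14", "15+", "15+", "15+"),
  sex       = c("f", "m", "f", "m", "m", "f")
)

counts <- table(cases$age_group, cases$sex)

# margin = 2: proportions within each column (sex), so each sex sums to 1
props <- prop.table(counts, margin = 2)
```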

@aspina7
Member Author

aspina7 commented Feb 22, 2019

on epicurves - possible to add a 2 weekly moving average?

I'm not exactly sure what this means... Does this mean that you want the epicurve on a bi-weekly basis or that you want to track data on a bi-weekly basis?

You add a line on top of the bars of the epicurve: for each week you take the x weeks before and the x weeks after to calculate the average. This essentially smooths out the epicurve, which is useful when your surveillance/reporting coverage fluctuates by week.

Can do it with rollapply in the zoo package, but I wonder if it's worth adding as an option to the incidence package?

Zhian: I think this is a reasonable request for the epicurve. It seems similar to the goals of reconhub/incidence#75 and reconhub/incidence#83

Alex: mmm, simpler than those requests. In most cases, for each week you take the counts from the x weeks before, average those counts and plot the result for that week. See the ECDC report.

Essentially it would be:
Case counts: week 1 = 2, week 2 = 3, week 3 = 3, week 4 = 4
2 week retrospective moving average: week 1 = NA, week 2 = NA, week 3 = 2.5, week 4 = 3
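The worked example above can be reproduced with a small base R function. retro_moving_average is a hypothetical name, and k is the number of preceding weeks: each week's value is the mean of the k weeks before it (the current week is excluded), so the first k weeks are NA. zoo::rollapply, mentioned above, is an alternative; this sketch just avoids the extra dependency.

```r
# Hypothetical helper: k-week retrospective moving average, i.e. for week i the
# mean of weeks i - k to i - 1 (the current week itself is not included)
retro_moving_average <- function(counts, k = 2) {
  out <- rep(NA_real_, length(counts))
  for (i in seq_along(counts)) {
    if (i > k) {
      out[i] <- mean(counts[(i - k):(i - 1)])
    }
  }
  out
}

weekly_counts <- c(2, 3, 3, 4)
ma <- retro_moving_average(weekly_counts, k = 2)
# ma: NA NA 2.5 3.0, matching the worked example
```

The centred version described first (x weeks before and after) would instead average counts[(i - k):(i + k)] and leave both ends as NA.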

Zhian: Yup, but then I would need to write a general function that annotates the epicurve, which would also fit those purposes.

@aspina7
Member Author

aspina7 commented Feb 22, 2019

Were we going to find a way around not using do(..)?

I believe the plan is to add a stratifying option to the proportion functions so that a lot of the boilerplate can go away.

you or me?

Zhian: I could give it a stab

@zkamvar
Member

zkamvar commented Feb 28, 2019

  • [ZHIAN] consider option of add_totals for proportions function,- so it just sums the counts of res, then runs proportions function and bind_rows. If you look at what I did in the CFR section of the cholera template, having to bind_rows of an overall and a group specific CFR calculation is a bit long winded....

I think this and getting rid of do() are intertwined....
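A dplyr sketch of the pattern being discussed, assuming a hypothetical linelist with sex and died columns: group-wise counts come from group_by() and summarise() (no do() needed), and the totals row is built by summing those counts before bind_rows(), which is roughly what an add_totals option would automate. Column names and data are illustrative only.

```r
library(dplyr)

# Hypothetical linelist: one row per case, with a grouping column and outcome
linelist <- data.frame(
  sex  = c("f", "f", "m", "m", "m"),
  died = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Counts per group
by_group <- linelist %>%
  group_by(sex) %>%
  summarise(cases = n(), deaths = sum(died))

# Totals row: sum the group counts rather than recomputing from the linelist
totals <- by_group %>%
  summarise(sex = "Total", cases = sum(cases), deaths = sum(deaths))

# Stack groups and totals, then compute the CFR once for all rows
cfr <- bind_rows(by_group, totals) %>%
  mutate(cfr_pct = 100 * deaths / cases)
```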

@aspina7 aspina7 changed the title Outbreak template remaining open points To Do List Apr 26, 2019
@zkamvar zkamvar mentioned this issue May 8, 2019
@zkamvar zkamvar pinned this issue Jun 5, 2019
@aspina7 aspina7 closed this as completed Sep 28, 2019
@aspina7 aspina7 unpinned this issue Sep 28, 2019