In this repository we characterized and compared, using statistical tools, the first four consecutive epidemiological waves in Bogotá, Colombia, which occurred between March 2020 and April 2022. We used the report of confirmed cases from the District Health Secretary of Bogotá and the genomic surveillance data published by the Global Initiative on Sharing All Influenza Data (GISAID). We focused mainly on the estimation of:
- The instantaneous reproduction number R(t).
- The transmissibility advantage between variants.
- The delay times for onset-to-hospitalisation, onset-to-ICU, onset-to-death, hospital stay, and ICU stay.
- The characterization of severe outcomes using the severe ratios: Hospitalisation/ICU Case Rate (H/ICU-CR), Case Fatality Ratio (CFR), Hospitalisation/ICU Fatality Rate (H/ICU-FR) per age group and wave; and the percentages of Hospitalisation, ICU admission and Deaths per age group and wave.
- Report of confirmed cases from the District Health Secretary of Bogotá (Private database - last update: 2022-08-02)
- Genomic surveillance data published by the Global Initiative on Sharing All Influenza Data (GISAID) (Public database - last update: 2022-08-02)
- Reproduction number R(t): we estimated the time-varying instantaneous reproduction number R(t) using the epidemiological package for R: EpiEstim.
- Transmissibility advantage: we evaluated the transmissibility advantage using a multinomial logistic regression with a single explanatory variable $t$ given by:
In the previous expression
Where
With these coefficients we computed the relative transmissibility advantage between two variants
The multinomial regressions were run in Stan using the PyStan library for Python.
- Probability distributions of delay times: we used a Bayesian hierarchical model adapted from this repository. We fitted initial parameters at the district level and then sampled the parameters for each wave as follows:
where
- Severe outcomes: all the results of this section were calculated for subpopulations $(i,g)$ defined by the waves $i \in$ {$1,2,3,4$} and the age groups $g \in$ {$all, 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+$}.
For the CFR and H/ICU-FR we used the following formula:
Where
For the H/ICU-CR we used:
In this case
For the percentages we used:
Where
In all cases we estimated 95% confidence intervals using binomial proportions.
All the folders, except plots and tables, follow the same structure: a scripts subfolder that contains the code needed to run the models and the analysis, and an outputs subfolder that contains the results.
- The folder epidemiological_distributions contains the following scripts:
  - Models implemented in Stan for the district level and the partial pooling for waves: model_exponential_district.stan, model_gamma_district.stan, model_gln_district.stan, model_log_normal_district.stan, model_weibull_district.stan, model_exponential_pool.stan, model_gamma_pool.stan, model_gln_pool.stan, model_log_normal_pool.stan, model_weibull_pool.stan.
  - Python scripts to run the models and extract the results: run_exponential.py, run_gamma.py, run_gln.py, run_log_normal.py, run_weibull.py.
  - bayes_factor.py: Python script to calculate the Bayes factor.
  - summarize_results.py: Python script that summarizes the main results of the Bayesian inference.
  - utilities_epi_dist.py: Python script with functions for preparing and cleaning the data, and tools for the statistical analysis.
- The folder genomics contains the following scripts:
  - genomics_functions.R, variants_record: functions implemented in R to process the genomic data.
  - process_data.py: Python script to generate the inputs for the model.
  - multinomial_model.stan: multinomial model implemented in Stan.
  - run_model.py: Python script to run the multinomial model.
  - process_results: Python script to process the results from the multinomial model.
- The folder rt contains the following script:
  - rt.R: R script to estimate the reproduction number.
- The folder severe_outcomes contains the following scripts:
  - percentages.py: Python script to calculate the percentages.
  - proportions.py: Python script to calculate the binomial proportions.
  - rates.py: Python script to calculate the CFR, HCR, ICU-CR, HFR and ICU-FR.
  - utilities_severity.py: Python script with tools used for the severity analysis.
- The folder waves contains the following scripts:
  - roots_confirmed_cases.py: Python script to find the roots of the epidemic curve using Gaussian smoothing and interpolation.
  - process_waves.py: Python script to process the waves after visual inspection of the roots.
  - utilities_waves.py: Python script with tools used for determining the waves.
- The folder tables contains the following scripts:
  - Python scripts to generate the tables of the supplementary materials: table_s1.py, table_s2.py, table_s4.py, table_s5.py.

  These scripts import and call functions from the utilities of each section and read the results contained in the corresponding outputs subfolder.
- The folder plots contains the following scripts:
  - plot_style.mplstyle: Matplotlib style sheet with the styles used to generate all the figures.
  - Python scripts to generate the figures included in the main document: figure_1.py, figure_2.py, figure_3.py, figure_4.py, figure_5.py.
  - individual_plots.py: Python script to generate the individual plots included in the supplementary materials and some visualizations of the analysis.
  - Scripts with the visualization functions for every section of the analysis: overview.py, results_epidemiological_distributions.py, results_genomics.py, results_rt.py, results_severe_outcomes.py, results_waves.py.
This YAML file contains information about the models, the paths used in the scripts, and the roots selected for the waves. It is loaded at the beginning of every script.