Untested rest of workflow
Ulthran committed Jan 21, 2025
1 parent 110c7c9 commit 4860cea
Showing 12 changed files with 198 additions and 108 deletions.
51 changes: 0 additions & 51 deletions README.md
@@ -1,54 +1,3 @@
This is a template to use to extend the [Sunbeam pipeline](https://github.com/sunbeam-labs/sunbeam). There are three major parts to a Sunbeam extension:

- `sbx_sga_env.yml` specifies the extension's dependencies
- `config.yml` contains configuration options that can be specified by the user when running an extension
- `sbx_sga.rules` contains the rules (logic/commands run) of the extension

## Creating an extension

Any dependencies (available through conda) required for an extension should be listed in the `sbx_[your_extension_name]_env.yml` file. An example of how to format this is shown in the `sbx_sga_env.yml` file. These dependencies are automatically handled and installed by Snakemake through conda (see below for how to make sure your rule finds this env file), so the user doesn't have to worry about installing dependencies themselves. You can also specify specific software versions like so:

```yaml
dependencies:
  - python=3.7
  - megahit<2
```

Rarely, you may want to specify different environments for different rules within your extension, in the event that different rules have different (potentially conflicting) requirements.

The `config.yml` contains parameters that the user might need to modify when running an extension. For example, if your downstream analysis is run differently depending on whether reads are paired- or single-end, it would probably be wise to include a `paired_end` parameter. Default values should be specified for each terminal key. As of Sunbeam v3.0, as long as the parameter config file is named `config.yml`, configuration options for installed extensions are automatically included in new config files generated by `sunbeam init` and `sunbeam config update`.
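As of this commit, `sbx_sga`'s own `config.yml` follows this pattern — a sketch with hypothetical placeholder paths standing in for the site-specific defaults:

```yaml
sbx_sga:
  mash_ref: RefSeq88n.msh            # Mash sketch database
  checkm_ref: /path/to/uniref100.KO.1.dmnd   # CheckM2 DIAMOND database
  bakta_ref: /path/to/bakta_db/db/   # Bakta annotation database
```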

Finally, `sbx_sga.rules` contains the actual logic for the extension, including required input and output files. A detailed discussion of Snakemake rule creation is beyond the scope of this tutorial, but definitely check out [the Snakemake tutorial](http://snakemake.readthedocs.io/en/stable/tutorial/basics.html) and any of the [extensions by sunbeam-labs](https://github.com/sunbeam-labs) for inspiration.

For each rule that needs dependencies from your environment file, point Snakemake at it in the rule definition like this:

```
rule example_rule:
    ...
    conda:
        "sbx_sga_env.yml"
    ...
```

The dependency .yml file can be named whatever you want, as long as you refer to it by the correct filename in whatever rule needs those dependencies. The path to the dependency .yml file is relative to the .rules file (which in most cases is in the same directory).

## Additional extension components

### .github/

This directory contains CI workflows for GitHub to run automatically on PRs, including tests and linting. If the linter raises errors, you can fix them by running `snakefmt` on any snakemake files and `black` on any python files. The release workflow will build and push a docker image for each environment in the extension.
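A sketch of the local fix-up, assuming `snakefmt` and `black` are installed (e.g. via `pip install snakefmt black`); both rewrite files in place, and the `|| true` guards only keep this illustration from aborting if a tool is missing:

```shell
# Reformat Snakemake rule files to satisfy the snakefmt linter
snakefmt sbx_sga.smk || true
# Reformat Python scripts to satisfy the black linter
black scripts/ || true
```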

### .tests/

This directory contains tests, broken down into types such as end-to-end (e2e) and unit, as well as data for running these tests.

### scripts/

This directory contains scripts that can be run by rules. Use it for any rules that need to run Python, R, or other scripted code.

### envs/*.Dockerfile

The Dockerfiles provided alongside the conda env specifications allow for containerized runs of Sunbeam (meaning each rule runs in a Docker container rather than a conda env).
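A minimal sketch of how such a Dockerfile might look, assuming a mambaforge base image and this extension's `envs/mash.yml`; the base image, tag, and paths are illustrative, not this repo's actual build:

```dockerfile
# Hypothetical envs/mash.Dockerfile: bake the conda env into an image
FROM condaforge/mambaforge:latest
COPY mash.yml /tmp/mash.yml
RUN mamba env create -f /tmp/mash.yml && mamba clean --all --yes
# Put the env's binaries first on PATH so rule commands resolve to them
ENV PATH=/opt/conda/envs/sga_mash/bin:$PATH
```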

(You can delete everything above this line)
-----------------------------------------------------------------

<img src="https://github.com/sunbeam-labs/sunbeam/blob/stable/docs/images/sunbeam_logo.gif" width="120" height="120" align="left" />

# sbx_sga
4 changes: 3 additions & 1 deletion config.yml
@@ -1,2 +1,4 @@
sbx_sga:
-  example_rule_options: 'DUMMY STRING'
+  mash_ref: RefSeq88n.msh
+  checkm_ref: /mnt/isilon/marc_genomics/shared_data/CheckM2_database/uniref100.KO.1.dmnd
+  bakta_ref: /mnt/isilon/marc_genomics/shared_data/bakta_db_5/db/
7 changes: 7 additions & 0 deletions envs/abritamr.yml
@@ -0,0 +1,7 @@
name: sga_abritamr
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- abritamr
7 changes: 7 additions & 0 deletions envs/bakta.yml
@@ -0,0 +1,7 @@
name: sga_bakta
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- bakta
6 changes: 6 additions & 0 deletions envs/checkm2.yml
@@ -0,0 +1,6 @@
name: sga_checkm2
channels:
- conda-forge
- bioconda
dependencies:
- checkm2
4 changes: 2 additions & 2 deletions envs/sbx_sga_env.yml → envs/mash.yml
@@ -1,6 +1,6 @@
-name: sbx_sga
+name: sga_mash
channels:
- conda-forge
- bioconda
dependencies:
-  - samtools
+  - mash
6 changes: 6 additions & 0 deletions envs/mlst.yml
@@ -0,0 +1,6 @@
name: sga_mlst
channels:
- bioconda
- conda-forge
dependencies:
- mlst
7 changes: 7 additions & 0 deletions envs/quast.yml
@@ -0,0 +1,7 @@
name: sga_quast
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- quast
17 changes: 0 additions & 17 deletions envs/sbx_template_env.Dockerfile

This file was deleted.

6 changes: 6 additions & 0 deletions envs/shovill.yml
@@ -0,0 +1,6 @@
name: sga_shovill
channels:
- conda-forge
- bioconda
dependencies:
- shovill
181 changes: 154 additions & 27 deletions sbx_sga.smk
@@ -1,4 +1,4 @@
-def get_template_path() -> Path:
+def get_sga_path() -> Path:
for fp in sys.path:
if fp.split("/")[-1] == "sbx_sga":
return Path(fp)
@@ -7,7 +7,8 @@ def get_template_path() -> Path:
)


-SBX_TEMPLATE_VERSION = open(get_template_path() / "VERSION").read().strip()
+ISOLATE_FP = Cfg["all"]["output_fp"] / "isolate"
+SBX_SBA_VERSION = open(get_sga_path() / "VERSION").read().strip()

try:
BENCHMARK_FP
@@ -20,47 +21,173 @@ except NameError:


localrules:
-all_template,
+all_sga,


-rule all_template:
+rule all_sga:
input:
-QC_FP / "mush" / "big_file.txt",
+# QC
+#expand(ISOLATE_FP / "fastqc" / "{sample}_{rp}_fastqc/fastqc_data.txt", rp=Pairs, sample=Samples),
+expand(ISOLATE_FP / "mash" / "{sample}_sorted_winning.tab", sample=Samples),
+# Assembly QC
+ISOLATE_FP / "checkm" / "quality_report.tsv",
+expand(ISOLATE_FP / "quast" / "{sample}" / "report.tsv", sample=Samples),
+# Typing
+ISOLATE_FP / "mlst" / "mlst_report.tsv",
+# Annotation
+expand(ISOLATE_FP / "bakta" / "{sample}" / "{sample}.txt", sample=Samples),
+# AMR Profiling
+expand(ISOLATE_FP / "abritamr" / "{sample}" / "amrfinder.out", sample=Samples),


-rule example_rule:
-"""Takes in cleaned .fastq.gz and mushes them all together into a file"""
+rule sga_mash:
input:
-expand(QC_FP / "cleaned" / "{sample}_{rp}.fastq.gz", sample=Samples, rp=Pairs),
+reads=expand(QC_FP / "decontam" / "{{sample}}_{rp}.fastq.gz", rp=Pairs),
output:
-QC_FP / "mush" / "big_file1.txt",
+agg=temp(ISOLATE_FP / "mash" / "{sample}.fastq"),
+win=temp(ISOLATE_FP / "mash" / "{sample}_winning.tab"),
+sort=ISOLATE_FP / "mash" / "{sample}_sorted_winning.tab",
+params:
+ref=Cfg["sbx_sga"]["mash_ref"],
log:
-LOG_FP / "example_rule.log",
+LOG_FP / "sga_mash_{sample}.log",
benchmark:
-BENCHMARK_FP / "example_rule.tsv"
+BENCHMARK_FP / "sga_mash_{sample}.tsv"
+conda:
+"envs/mash.yml"
+shell:
+"""
+zcat {input.reads} > {output.agg}
+mash screen -w -p 8 {params.ref} {output.agg} > {output.win} 2> {log}
+sort -gr {output.win} > {output.sort} 2>> {log}
+"""


rule sga_shovill:
input:
rp1=QC_FP / "decontam" / "{sample}_1.fastq.gz",
rp2=QC_FP / "decontam" / "{sample}_2.fastq.gz",
output:
contigs=ISOLATE_FP / "shovill" / "{sample}" / "contigs.fa",
log:
LOG_FP / "sga_shovill_{sample}.log",
benchmark:
BENCHMARK_FP / "sga_shovill_{sample}.tsv"
conda:
"envs/shovill.yml"
shell:
"""
shovill --assembler skesa --outdir $(dirname {output.contigs}) --R1 {input.rp1} --R2 {input.rp2} &> {log}
"""


### Assembly QC
rule sga_checkm:
input:
contigs=expand(
ISOLATE_FP / "shovill" / "{sample}" / "contigs.fa", sample=Samples
),
output:
quality_report=ISOLATE_FP / "checkm" / "quality_report.tsv",
params:
-opts=Cfg["sbx_sga"]["example_rule_options"],
+ref=Cfg["sbx_sga"]["checkm_ref"],
log:
LOG_FP / "sga_checkm.log",
benchmark:
BENCHMARK_FP / "sga_checkm.tsv"
conda:
-"envs/sbx_sga_env.yml"
-container:
-f"docker://sunbeamlabs/sbx_sga:{SBX_TEMPLATE_VERSION}"
+"envs/checkm2.yml"
shell:
-"cat {params.opts} {input} >> {output} 2> {log}"
+"""
+checkm2 predict \\
+-x fa \\
+-i $(dirname {input.contigs[0]}) \\
+-o $(dirname {output.quality_report}) \\
+--database_path {params.ref} \\
+&> {log}
+"""


-rule example_with_script:
-"""Take in big_file1 and then ignore it and write the results of `samtools --help` to the output using a python script"""
+rule sga_quast:
input:
-QC_FP / "mush" / "big_file1.txt",
+contigs=ISOLATE_FP / "shovill" / "{sample}" / "contigs.fa",
output:
-QC_FP / "mush" / "big_file.txt",
+quast_dir=ISOLATE_FP / "quast" / "{sample}" / "report.tsv",
log:
-LOG_FP / "example_with_script.log",
+LOG_FP / "sga_quast_{sample}.log",
benchmark:
-BENCHMARK_FP / "example_with_script.tsv"
+BENCHMARK_FP / "sga_quast_{sample}.tsv"
conda:
-"envs/sbx_sga_env.yml"
-container:
-f"docker://sunbeamlabs/sbx_sga:{SBX_TEMPLATE_VERSION}"
-script:
-"scripts/example_with_script.py"
+"envs/quast.yml"
+shell:
+"""
+quast.py \\
+-o $(dirname {output.quast_dir}) \\
+{input.contigs} \\
+&> {log}
+"""


### Typing
rule sga_mlst:
input:
contigs=expand(
ISOLATE_FP / "shovill" / "{sample}" / "contigs.fa", sample=Samples
),
output:
mlst=ISOLATE_FP / "mlst" / "mlst_report.tsv",
log:
LOG_FP / "sga_mlst.log",
benchmark:
BENCHMARK_FP / "sga_mlst.tsv"
conda:
"envs/mlst.yml"
shell:
"""
mlst {input.contigs} > {output.mlst} 2> {log}
"""


### Annotation
rule sga_bakta:
input:
contigs=ISOLATE_FP / "shovill" / "{sample}" / "contigs.fa",
output:
bakta=ISOLATE_FP / "bakta" / "{sample}" / "{sample}.txt",
params:
ref=Cfg["sbx_sga"]["bakta_ref"],
log:
LOG_FP / "sga_bakta_{sample}.log",
benchmark:
BENCHMARK_FP / "sga_bakta_{sample}.tsv"
conda:
"envs/bakta.yml"
shell:
"""
bakta --db {params.ref} \\
--output $(dirname {output.bakta}) \\
--prefix {wildcards.sample} \\
--skip-plot {input.contigs} \\
&> {log}
"""


### AMR Profiling
rule sga_abritamr:
input:
contigs=ISOLATE_FP / "shovill" / "{sample}" / "contigs.fa",
output:
abritamr=ISOLATE_FP / "abritamr" / "{sample}" / "amrfinder.out",
log:
LOG_FP / "sga_abritamr_{sample}.log",
benchmark:
BENCHMARK_FP / "sga_abritamr_{sample}.tsv"
conda:
"envs/abritamr.yml"
shell:
"""
abritamr run \\
--contigs {input.contigs} \\
--prefix {output.abritamr} \\
&> {log}
"""
10 changes: 0 additions & 10 deletions scripts/example_with_script.py

This file was deleted.
