Add GRIDSS preprocessing #175

Faizal-Eeman · 2024-09-19T00:03:45Z

Description

Add GRIDSS preprocessing step.

Closes #176

Testing Results

DNA A-mini
- sample: TWGSAMIN000001-N003-S03-F, TWGSAMIN000001-T003-S03-F
- input yaml: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-GRIDSS/TWGSAMIN000001-T003-S03-F.yaml
- config: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-GRIDSS/TWGSAMIN000001-T003-S03-F.config
- output: /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-GRIDSS/call-sSV-6.1.0/S2_v1.1.5/GRIDSS-2.13.2/intermediate/

Checklist

- I have read the code review guidelines and the code review best practice on GitHub checklist.
- I have reviewed the Nextflow pipeline standards.
- The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
- I have set up or verified the branch protection rule following the GitHub standards before opening this pull request.
- I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
- I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
- I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about the new version number and discuss it with the infrastructure team in this PR.)
- I have tested the pipeline on at least one A-mini sample with algorithm = ['delly', 'manta']. The paths to the test config files and output directories are attached above.

Faizal-Eeman · 2024-09-19T15:58:59Z

config/F16.config

@@ -4,6 +4,17 @@ process {
        memory = 1.GB
        }

+    withName: preprocess_BAM_GRIDSS {
+        cpus = 1
+        memory = 26.GB


Looks like only 6 GB of memory is utilized. Testing a run with 4 CPUs and 10 GB of memory; will update this thread once done.

The test succeeded with the 4 CPUs recommended by GRIDSS and 10 GB of memory. Log:
/hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-GRIDSS/preprocess-4cpu-10GBMem.log
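For reference, the setting this thread converged on (commit 2137ff1, "4 CPUs and 10GB memory for gridss preprocessing") would look roughly like the block below in config/F16.config. It is a sketch reconstructed from the thread and the commit message, not a copy of the merged file, so the surrounding layout is an assumption.

```groovy
// Sketch of the F16 resource override for GRIDSS preprocessing,
// reconstructed from this review thread (not copied from the merged file)
process {
    withName: preprocess_BAM_GRIDSS {
        cpus = 4         // 4 CPUs, per the GRIDSS recommendation cited above
        memory = 10.GB   // only ~6 GB was used under the earlier 26 GB setting
    }
}
```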

kiarod · 2024-09-26T20:34:18Z

main.nf

@@ -189,4 +200,12 @@ workflow {
            call_sSV_Manta.out.manta_vcfs.flatten()
            )
        }
+    if ('gridss2' in params.algorithm) {
+        preprocess_BAM_GRIDSS(
+            gridss_ch,


(Question | non-blocking): I'm trying to confirm that the preprocess_BAM_GRIDSS process is functioning as you want it to, but I need more context about the intended behavior.

How many times is preprocess_BAM_GRIDSS expected to run? It seems the gridss input is a sample-level tuple of [id, path, index]; is this pipeline going to be run with multiple samples? I am not familiar with the tools being used here, and I'm trying to understand how the sample(s) relate to the gridss_reference_fasta and gridss_reference_files channels. The preprocess_BAM_GRIDSS process declares the corresponding inputs from these channels as paths. However, from what I understand, these channels will be lists due to the .collect() operator. Are they singleton lists, i.e. channels consumed once? If there will be multiple samples, are these singletons meant to be reused for each sample? Or do the lists have many elements meant to be consumed one at a time as a channel? Does their order matter, i.e. do they need to be synced with the sample passed?

kiarod · 2024-09-26T21:02:34Z

module/gridss.nf

+    input:
+        tuple(val(sample_id), path(sample_bam), path(sample_index))
+        path(gridss_reference_fasta)
+        path(gridss_reference_files)


(Question | non-blocking): Is gridss_reference_files used?

yashpatel6

One minor question; can you also address @kiarod's questions?

config/F16.config

Faizal-Eeman · 2024-09-27T22:43:04Z

> One minor question; can you also address @kiarod's questions?

> It seems the gridss input is a sample-level tuple of [id, path, index]; is this pipeline going to be run with multiple samples?

GRIDSS calls variants on a single tumor/normal pair. However, preprocess_BAM_GRIDSS is only step 1 of 3 in the GRIDSS workflow, and it produces intermediate files per input BAM: one set of intermediate BAMs for the normal BAM and one for the tumor BAM. That is why this process takes one BAM at a time as input.

> How many times is preprocess_BAM_GRIDSS expected to run?

Twice: once for the normal BAM and once for the tumor BAM.

> I am not familiar with the tools being used here, and I'm trying to understand how the sample(s) relate to the gridss_reference_fasta and gridss_reference_files channels.

Each run takes one BAM together with the FASTA reference and its additional reference files (.fasta.idx, .fasta.gridsscache, .fasta.dict, etc.), which are expected to be in the same directory as the FASTA reference.
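To make this concrete, and to address the earlier question of whether gridss_reference_files is used, here is a minimal sketch of how such a process could be declared, assuming the inputs shown in the module/gridss.nf diff. Only the FASTA is named on the command line; the collected sidecar files are never referenced in the script and are declared only so that Nextflow symlinks them into the task directory beside the FASTA. The gridss flags, the output glob, and the heap headroom are assumptions, not the merged module, and should be checked against the GRIDSS 2.13.2 image.

```groovy
// Hypothetical sketch of preprocess_BAM_GRIDSS (not the merged module).
// Flags follow the `gridss` driver script's options; verify them for 2.13.2.
process preprocess_BAM_GRIDSS {
    input:
        tuple(val(sample_id), path(sample_bam), path(sample_index))
        path(gridss_reference_fasta)
        // Not referenced in the script: these files (.fasta.idx, .fasta.dict,
        // .fasta.gridsscache, etc.) only need to be staged next to the FASTA
        path(gridss_reference_files)

    output:
        // Assumed GRIDSS naming: <BAM name>.gridss.working under --workingdir
        tuple val(sample_id), path("*.gridss.working")

    script:
    // Hypothetical 2 GB of headroom between the JVM heap and the task memory
    def jvm_heap_gb = task.memory.toGiga() - 2
    """
    gridss \\
        --steps preprocess \\
        --reference ${gridss_reference_fasta} \\
        --threads ${task.cpus} \\
        --jvmheap ${jvm_heap_gb}g \\
        --workingdir . \\
        ${sample_bam}
    """
}
```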

> However, from what I understand, these channels will be lists due to the .collect() operator. Are they singleton lists, i.e. channels consumed once? If there will be multiple samples, are these singletons meant to be reused for each sample? Or do the lists have many elements meant to be consumed one at a time as a channel? Does their order matter, i.e. do they need to be synced with the sample passed?

gridss_reference_files is a list because of .collect(), which lets all of the files in the list be symlinked into the task's working directory. The order of the files in the list does not matter, since all of them only need to be present in the working directory when the process runs.
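The answer hinges on Nextflow's channel semantics: `collect()` emits a single list and returns a value channel, which any number of tasks can read, whereas the sample channel is a queue channel whose two items each trigger one task. A minimal wiring sketch under those semantics follows; apart from `params.algorithm` (used in main.nf), the `params.*` names and the sidecar glob are hypothetical.

```groovy
// Hypothetical wiring sketch; params names (other than params.algorithm)
// and the sidecar glob are illustrative assumptions
include { preprocess_BAM_GRIDSS } from './module/gridss'

workflow {
    // Queue channel with two items: one [id, BAM, index] tuple per BAM
    gridss_ch = Channel.of(
        [params.normal_id, file(params.normal_bam), file("${params.normal_bam}.bai")],
        [params.tumor_id,  file(params.tumor_bam),  file("${params.tumor_bam}.bai")]
    )

    // collect() returns a value channel, so the same reference is handed
    // to every task instead of being consumed by the first one
    gridss_reference_fasta = Channel.fromPath(params.gridss_reference_fasta).collect()
    gridss_reference_files = Channel.fromPath("${params.gridss_reference_fasta}.*").collect()

    if ('gridss2' in params.algorithm) {
        // Runs twice: once for the normal BAM, once for the tumor BAM
        preprocess_BAM_GRIDSS(gridss_ch, gridss_reference_fasta, gridss_reference_files)
    }
}
```

Because both reference channels are value channels, the number of tasks is set by gridss_ch alone. Had the FASTA been passed as a plain one-item queue channel instead, the process would run only once.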

Faizal-Eeman · 2024-10-01T04:08:00Z

F32 test - /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-GRIDSS/ILHNLNEV000002-T001-P01-F.F32/F32.test.log
F72 test - /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-GRIDSS/ILHNLNEV000009-T002-L01-F.F72/F72.test.log

yashpatel6

Looks good! Anything else to add @kiarod ?

kiarod

LGTM

Mootor added 12 commits September 6, 2024 15:47

add gridss nf module

dec0752

add preprocess gridss BAM process and update I/O/Script section

177089c

add gridss version and docker

0945ca0

add preprocess_BAM_GRIDSS

4caf21a

add gridss_reference_diir param to template.config

6493571

input preprocess channels; set variables; publish formatted filenames

b11d459

add gridss2 channels, ref and process call

c3f0e38

add gridss2 to algorithhms in template

25e70d7

update schema YAML

0be354f

add gridss ref dir to schema

7335cd7

Update CHANGELOG.md

a6e420c

update metadata.yaml

e9df509

Faizal-Eeman marked this pull request as ready for review September 19, 2024 15:56

Faizal-Eeman requested a review from a team as a code owner September 19, 2024 15:56

Faizal-Eeman requested a review from yashpatel6 September 19, 2024 15:56

Faizal-Eeman commented Sep 19, 2024

Mootor added 2 commits September 19, 2024 11:09

add jvmheap memory arg to gridss

c5534fa

4 CPUs and 10GB memory for gridss preprocessing

2137ff1

Faizal-Eeman force-pushed the mmootor-add-GRIDSS branch from e05cc69 to 2137ff1 September 19, 2024 18:14

yashpatel6 assigned yashpatel6 and kiarod Sep 20, 2024

kiarod reviewed Sep 26, 2024

Faizal-Eeman requested a review from kiarod September 26, 2024 21:34

yashpatel6 reviewed Sep 26, 2024

main.nf (outdated, resolved)

yashpatel6 reviewed Sep 27, 2024

module/gridss.nf (outdated, resolved)

module/gridss.nf (outdated, resolved)

Mootor added 5 commits September 27, 2024 10:03

change gridss ref dir to ref FASTA

63422d4

change gridss ref dir to ref FASTA

44199ae

reconfigure gridss ref files variable

49dc266

fix gridss ref file path

ad67aaa

store intermediate file files in their process and store sample files in sample dirs

a12d9bc

save logs tintask index dir

05805fb

Faizal-Eeman requested a review from yashpatel6 September 27, 2024 19:29

yashpatel6 reviewed Sep 27, 2024

config/F16.config (resolved)

Mootor added 2 commits September 27, 2024 15:46

add F32, F72 and M64 configs

432ee5d

update memory in F32, F72 and M64

64337fc

yashpatel6 approved these changes Oct 1, 2024

kiarod approved these changes Oct 1, 2024

Faizal-Eeman merged commit 253dd16 into main Oct 1, 2024
5 checks passed

Faizal-Eeman deleted the mmootor-add-GRIDSS branch October 2, 2024 19:15

nwiltsie mentioned this pull request Dec 9, 2024

NFTest cases are broken #188

Closed
