Add GRIDSS preprocessing #175
config/F16.config
@@ -4,6 +4,17 @@ process {
        memory = 1.GB
    }

    withName: preprocess_BAM_GRIDSS {
        cpus = 1
        memory = 26.GB
Looks like only 6 GB is actually used. I'm testing a run with 4 CPUs and 10 GB of memory and will update this thread once it's done.
The test succeeded with the 4 CPUs recommended by GRIDSS.
/hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-GRIDSS/preprocess-4cpu-10GBMem.log
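Based on the test reported above (4 CPUs, 10 GB of memory, with roughly 6 GB actually used), the resource block under review might be updated along the following lines. This is a sketch of the config change, not the committed diff: only the process name and the two values come from this thread.

```groovy
process {
    // Resources for the GRIDSS preprocessing step, using the values tested
    // in this thread: 4 CPUs (the GRIDSS recommendation) and 10 GB of memory.
    withName: preprocess_BAM_GRIDSS {
        cpus = 4
        memory = 10.GB
    }
}
```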
e05cc69 to 2137ff1
@@ -189,4 +200,12 @@ workflow {
            call_sSV_Manta.out.manta_vcfs.flatten()
        )
    }
    if ('gridss2' in params.algorithm) {
        preprocess_BAM_GRIDSS(
            gridss_ch,
(Question | non-blocking): I'm trying to confirm that the preprocess_BAM_GRIDSS process works as you intend, but I need more context about the intended behavior. How many times is preprocess_BAM_GRIDSS expected to run? It seems gridss_ch is a sample-level tuple of [id, path, index]; is this pipeline going to be run with multiple samples? I'm not familiar with the tools used here, and I'm trying to understand how the sample(s) relate to the gridss_reference_fasta and gridss_reference_files channels. preprocess_BAM_GRIDSS treats the corresponding inputs from these channels as paths. However, as I understand it, these channels will emit lists because of the .collect() operator. Are they singleton lists that act as channels consumed once? If there are multiple samples, is each singleton meant to be reused for every sample? Or do the lists have many elements meant to be consumed one at a time, like a channel? Does their order matter, i.e. do they need to be synced with the sample being passed?
input:
    tuple(val(sample_id), path(sample_bam), path(sample_index))
    path(gridss_reference_fasta)
    path(gridss_reference_files)
(Question | non-blocking): Is gridss_reference_files used?
One minor question; can you also address @kiarod's questions?
GRIDSS calls variants on a single tumor/normal pair. However, preprocess_BAM_GRIDSS is only step 1 of 3 in the GRIDSS workflow, and it produces intermediate files per input BAM: one set of intermediate BAMs for the normal BAM and one for the tumor BAM. That is why this process takes one BAM at a time as input.
It runs twice: once for the normal BAM and once for the tumor BAM.
The process requires one BAM input to be passed with a FASTA reference and additional reference files (
F32 test - /hot/software/pipeline/pipeline-call-sSV/Nextflow/development/unreleased/mmootor-add-GRIDSS/ILHNLNEV000002-T001-P01-F.F32/F32.test.log
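The answer above relies on standard Nextflow channel semantics: a queue channel with one element per BAM makes a process run once per element, while .collect() turns a reference channel into a value channel holding a single list, which every task can read. A minimal, self-contained sketch of that pattern follows. It assumes local files normal.bam, tumor.bam, their .bai indexes and a ref/ directory exist; the process name preprocess_demo and all file names are hypothetical, and this is not the pipeline's actual code.

```groovy
// Hypothetical sketch: process name, sample IDs and file paths are illustrative only.
nextflow.enable.dsl = 2

process preprocess_demo {
    input:
    tuple val(sample_id), path(sample_bam), path(sample_index)
    path(reference_fasta)
    path(reference_files)

    output:
    stdout

    script:
    """
    echo "${sample_id}: ${sample_bam} against ${reference_fasta}"
    """
}

workflow {
    // Queue channel with one element per BAM: the process runs once per
    // element, i.e. twice here (normal and tumor).
    bams_ch = Channel.of(
        ['normal', file('normal.bam'), file('normal.bam.bai')],
        ['tumor',  file('tumor.bam'),  file('tumor.bam.bai')]
    )

    // .collect() gathers all matching files into one list and returns a value
    // channel, which can be read any number of times, so both tasks receive
    // the same reference files and no ordering/syncing with the BAMs is needed.
    fasta_ch     = Channel.fromPath('ref/genome.fa').collect()
    ref_files_ch = Channel.fromPath('ref/genome.fa.*').collect()

    preprocess_demo(bams_ch, fasta_ch, ref_files_ch).view()
}
```

With both BAMs present, this should print one line per sample, each pairing that BAM with the same FASTA.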
Looks good! Anything else to add, @kiarod?
LGTM
Description
Add GRIDSS preprocessing step.
Closes #176
Testing Results
Checklist
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the github standards before opening this pull request.
I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about the new version number and discuss it with the infrastructure team in this PR.)
I have tested the pipeline on at least one A-mini sample with algorithm = ['delly', 'manta']. The paths to the test config files and output directories are attached above.