No error during align-DNA failure #252

alkaZeltser · 2022-12-06T22:15:06Z

Describe the issue
I am running some new samples through the metapipeline and testing various partitions (F72/F32/F16) to find the minimum requirement for my dataset. I found no issues when running on F72, but the other two partitions fail during the align-DNA process. The pipeline stops and errors out, but no descriptive error message from BWA-MEM2 is returned, so troubleshooting is difficult. The failure occurred about 5 hours into the F32 alignment and 12 hours into the F16 alignment. No completed BAMs were returned.

The test sample I'm using is from the recently registered /hot/data/PRAD/PRAD0000068
It is a single germline WGS sample (not tumor-normal pair).
More info here: https://github.com/uclahs-cds/dataset-register-file/pull/116

From successfully completed F72 test runs, I know that the aligned BAM of this sample is 110G - quite large.
I suspect this is a resource issue, but it would be nice to get a definitive error message from the aligner explaining why it stops.

Error messages in logs:

executor >  local (2), slurm (1)
[c2/74efbe] process > create_input_csv_metapipeli... [100%] 1 of 1 ✔
[24/73ecf1] process > create_config_json             [100%] 1 of 1 ✔
[54/a283d8] process > call_metapipeline_DNA (1)      [100%] 1 of 1, failed: 1 ✔
[54/a283d8] NOTE: Process `call_metapipeline_DNA (1)` terminated with an error exit status (1) -- Error is ignored

Dec-05 19:39:38.224 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'align_DNA:call_align_DNA (1)'

Caused by:
  Process `align_DNA:call_align_DNA (1)` terminated with an error exit status (1)

Command executed:

  nextflow run         /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/module/align_DNA/../../external/pipeline-align-DNA/main.nf         --sample_name EZPRLPUV000001-N001-B01-F         --aligner BWA-MEM2          --enable_spark true --mark_duplicates true --reference_fasta_bwa /hot/ref/tool-specific-input/BWA-MEM2-2.2.1/GRCh38-BI-20160721/index/genome.fa         --output_dir $(pwd)         --work_dir /scratch         --input_csv EZPRLPUV000001-N001-B01-F_align_DNA_input.csv         -c /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/module/align_DNA/default.config

Command exit status:
  1

The "Command output:" and "Command error:" sections of the log are both empty.

Pipeline release version: metapipeline-DNA: 3.0.0 align-DNA: 8.1.0
Cluster you are using (SGE/Slurm-Dev/Slurm-Test): Slurm-Dev
Node type (F2s (lowmem) / F72s (midmem) / M64s (execute)): F2 leading node, F32 and F16 work nodes.
Submission method (interactive/submission script): submission script
Actual submission script (python submission script, "nextflow run ...", etc.): /hot/user/nzeltser/project-disease-ProstateTumor-PRAD-000110-URGGermlineWGS/script/run-metapipeline.sh
Sbatch or qsub command and logs if applicable:
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/EZPRLPUV-test-F16.log
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/EZPRLPUV-test-F32.log
Config files:
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F16.config
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F32.config
Path to the working directory
Any logs produced by the pipeline
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/68/a854a5e349c2880341b4165070e5db
/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/54/a283d82752cf67f95b7f3514c1443b
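
Since the log itself carries no message, one option is to look inside the task work directories listed above. The following is a minimal sketch, assuming the directories still contain the hidden files Nextflow normally writes per task (`.exitcode`, `.command.out`, `.command.err`, `.command.log`). The function name `summarize_task_dir` is hypothetical and not part of any pipeline here.

```python
from pathlib import Path

# File names Nextflow normally writes into each task work directory.
TASK_FILES = (".exitcode", ".command.out", ".command.err", ".command.log")


def summarize_task_dir(work_dir, tail_lines=20):
    """Return the last lines of each Nextflow task file found in work_dir.

    Missing files map to None; empty files map to an empty list.
    (Hypothetical helper for illustration only.)
    """
    summary = {}
    for name in TASK_FILES:
        path = Path(work_dir) / name
        if not path.is_file():
            summary[name] = None
            continue
        lines = path.read_text(errors="replace").splitlines()
        # Keep only the tail, where a failure message would usually appear.
        summary[name] = lines[-tail_lines:] if tail_lines > 0 else []
    return summary
```

Calling `summarize_task_dir(...)` on each work directory would at least show whether the files are truly empty or simply were not relayed to the parent log.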

To Reproduce

Run

python3 /hot/user/nzeltser/tool-submit-nf/submit_nextflow_pipeline.py \
    --nextflow_script /hot/software/pipeline/metapipeline-DNA/Nextflow/release/3.0.0/main.nf \
    --nextflow_config /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_meta-pipeline_F32.config \
    --pipeline_run_name F32-TEST \
    --partition_type F2 \
    --nextflow_yaml /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/input/EDRN-Zeltser-PRAD-LPUV/EDRN-Zeltser-PRAD-LPUV_one_sample_test.yaml

Wait 5 hours.

Expected behavior
I don't actually expect this size of a sample to complete on an F16 node, maaaybe an F32, but I do expect an error message telling me why it failed.


yashpatel6 · 2022-12-06T22:19:27Z

This issue should really be in the align-DNA repository; this isn't an issue with the metapipeline itself.

alkaZeltser · 2022-12-06T22:20:57Z

> This issue should really be in the align-DNA repository; this isn't an issue with the metapipeline itself.

True, should I copy it over and remove this one?

yashpatel6 · 2022-12-06T22:22:37Z

You should be able to use the "Transfer issue" option in the column on the right, so you won't have to copy-and-paste or delete anything.

jarbet · 2022-12-09T17:19:20Z

@alkaZeltser: can you try directly running align-DNA instead of meta-pipeline and see if it gives a more informative error?

Also:

- I see you are using align-DNA v8.1.0, can you try 9.0.0?
- Can you send me the location of an align-DNA input.csv file for the fastq files you are testing?

alkaZeltser · 2022-12-09T20:07:52Z

> @alkaZeltser: can you try directly running align-DNA instead of meta-pipeline and see if it gives a more informative error?

I could.. but Paul told me to document the issue and move on :D
But I support anyone else's attempts if they so choose.

> I see you are using align-DNA v8.1.0, can you try 9.0.0?

I'm using what the metapipeline is pointing to, which is uclahs-cds/pipeline-align-DNA: 8.1.0

> Can you send me the location of an align-DNA input.csv file for the fastq files you are testing?

Here is the csv file generated by the metapipeline for my test sample:

/hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/00/922554427af447b234b14e51ebce1f/EZPRLPUV000001-N001-B01-F_metapipeline_DNA_input.csv

jarbet · 2022-12-12T17:19:01Z

I'm getting the following error when I test on F16 or F32 using v9.0.0 or the current branch:

BWA-MEM2
- input csv: /hot/project/disease/ProstateTumor/PRAD-000110-URGGermlineWGS/meta-pipeline/output/EDRN-Zeltser-PRAD-LPUV/test/00/922554427af447b234b14e51ebce1f/EZPRLPUV000001-N001-B01-F_metapipeline_DNA_input.csv
- config: /hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-no-error/BWA-MEM2.config
- output: /hot/software/pipeline/pipeline-align-DNA/Nextflow/development/unreleased/jarbet-no-error/align-DNA-9.0.0/test/log-align-DNA-9.0.0-20221212T031631Z


Error executing process > 'align_DNA_BWA_MEM2_workflow:run_MarkDuplicatesSpark_GATK'

Caused by:
  Process `align_DNA_BWA_MEM2_workflow:run_MarkDuplicatesSpark_GATK` input file name collision -- There are multiple input files for each of the following file names: BWA-MEM2-2.2.1_0000068_test_382644260-L002-sorted.bam, BWA-MEM2-2.2.1_0000068_test_382644260-L001-sorted.bam, BWA-MEM2-2.2.1_0000068_test_382644260-L003-sorted.bam, BWA-MEM2-2.2.1_0000068_test_382644260-L004-sorted.bam
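
The collision above is what Nextflow reports when two different input files staged into the same task share a file name. As a rough illustration of that check (not Nextflow's actual implementation), the sketch below groups a list of paths by file name and reports names that map to more than one distinct path. The function name `find_name_collisions` and the paths in the usage note are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_collisions(paths):
    """Map each file name to its distinct full paths when there is more than one.

    The same path listed twice is not a collision; two different
    directories holding the same file name are.
    """
    by_name = defaultdict(set)
    for p in paths:
        by_name[PurePosixPath(p).name].add(str(p))
    return {name: sorted(full) for name, full in by_name.items() if len(full) > 1}
```

For example, `find_name_collisions(["/a/L001-sorted.bam", "/b/L001-sorted.bam"])` would flag `L001-sorted.bam`. Running something like this over the sorted BAMs passed to `run_MarkDuplicatesSpark_GATK` could show which upstream tasks produced the duplicates.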

When checking the nextflow html report, there are no "failed" tasks. However, there is 1 "aborted" task for pipeval's remove_intermediate_files.

@yashpatel6 : did something change in pipeval recently that could be causing this error?

nkwang24 · 2023-05-08T03:38:48Z

I believe this is related to #229. When it fails during Spark, it seems like Spark isn't able to return the corresponding error message back to the main process resulting in no error message in the log.
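
If the Spark step does write its own log somewhere in the work directory, scanning that log directly may surface the message that never reaches the main process. The sketch below assumes such a log exists and is plain text. The patterns are illustrative guesses at typical JVM and shell failure markers, not a confirmed list for this pipeline, and `find_error_lines` is a hypothetical helper.

```python
import re

# Illustrative patterns only; they are assumptions, not an exhaustive list.
ERROR_PATTERN = re.compile(r"(Exception|OutOfMemoryError|ERROR|Killed)")


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines matching ERROR_PATTERN."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if ERROR_PATTERN.search(line)
    ]
```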

alkaZeltser transferred this issue from uclahs-cds/metapipeline-DNA Dec 6, 2022

jarbet self-assigned this Dec 12, 2022
