captus generating empty files on HPC #4

Open
korjent opened this issue Nov 24, 2023 · 5 comments

@korjent

korjent commented Nov 24, 2023

Hi Team,

I am trying to run Captus on an HPC (one node, multiple cores) rather than on my own computer.
Everything seems to be installed correctly and runs.
The clean step works fine.
But when I run assemble, I get only empty folders, even when trying the tutorial dataset.
Any advice?

(CAPTUS) [a1638383@p2-log-2 captus_test]$ captus_assembly assemble -r 01_clean_reads

Starting Captus-assembly: ASSEMBLE (2023-11-24 17:05:43)
Welcome to the de novo assembly step of Captus-assembly. In this step,
Captus will use MEGAHIT to assemble your input reads. It is also possible to
subsample a number of reads using reformat.sh from BBTools prior to assembly,
this is useful while performing tests or when including samples with
considerably higher sequencing depth in a dataset.
Since you provided a directory name, Captus will look in that location for
all the FASTQ files that contain the string '_R1' in their names and match them
with their respective '_R2' pairs. If the '_R2' can not be found, the sample is
treated as single-end. Sample names are derived from the text found before the
'_R1' string.The full set of reads per sample will be assembled, no subsampling
will be performed.
For more information, please see https://github.com/edgardomortiz/Captus

   Captus version: v1.0.0
          Command: /home/a1638383/mambaforge/envs/CAPTUS/bin/captus_assembly assemble -r 01_clean_reads
         Max. RAM: 248.7GB (out of 251.2GB)
     Max. Threads: 72 (out of 72)

     Dependencies:
          MEGAHIT: v1.2.9 OK
  megahit_toolkit: v1.2.9 OK
          BBTools: not used

 Python libraries:
            numpy: v1.26.0 OK
           pandas: v2.1.3 OK
           plotly: v5.18.0 OK

 Output directory: /hpcfs/users/a1638383/DeMux_Trim_runs/captus_test/02_assemblies
                   Output directory successfully created

Subsampling Reads with reformat.sh (2023-11-24 17:05:44)
Now Captus will randomly subsample 0 read pairs (or single-end reads) from
each sample prior to de novo assembly with MEGAHIT.

Skipping read subsampling step... (to enable provide a number of reads to subsample with '--sample_reads_target')

De Novo Assembly with MEGAHIT (2023-11-24 17:05:44)
Now Captus will perform de novo assembly with your input reads using
MEGAHIT. Both '--min_contig_len' (when set to 'auto') and '--k_list' will be
adjusted sample-wise according to the mean read length of the sample's FASTQ
files.

Concurrent assemblies: 4
RAM per assembly: 62.2GB
Threads per assembly: 18

           preset: CAPSKIM
           k_list: 31,39,47,63,79,95,111,127,143,159,175
        min_count: 2
      prune_level: 2
      merge_level: 20,0.95
   min_contig_len: auto
    max_contig_gc: 100.0%
    extra_options: None
          tmp_dir: /home/a1638383/captus_megahit_tmp

  Overwrite files: False
   Keep all files: False

Samples to assemble: 4

Output directories: /hpcfs/users/a1638383/DeMux_Trim_runs/captus_test/02_assemblies/[Sample_name]__captus-asm/01_assembly
A directory will be created for each sample

De novo assembling with MEGAHIT:
0%| | 0/4 [00:03<?, ?sample/s]
└─→ De novo assembly completed for 4 sample(s) [3.546s]

Skipping summarization step... (no assembly statistics files were produced)

MEGAHIT temporary directory '/home/a1638383/captus_megahit_tmp' deleted

Captus-assembly: ASSEMBLE -> successfully completed [4.899s] (2023-11-24 17:05:48)
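
For reference, the read-pairing rule that the log describes (find files containing '_R1', match each with its '_R2', fall back to single-end) can be sketched as below. This is a minimal illustration based only on the log text, not Captus's actual code; `pair_reads`, `pair_reads_in_dir`, and the list of FASTQ suffixes are hypothetical.

```python
from pathlib import Path

# Assumption: these suffixes are a guess at what counts as a FASTQ file.
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")


def pair_reads(file_names):
    """Group FASTQ file names into samples, following the rule printed in the log.

    Returns {sample_name: (r1_name, r2_name_or_None)}.
    """
    names = {n for n in file_names if n.endswith(FASTQ_SUFFIXES)}
    samples = {}
    for name in sorted(names):
        if "_R1" not in name:
            continue  # only files containing '_R1' start a sample
        # The sample name is the text found before the first '_R1'
        sample, _, rest = name.partition("_R1")
        r2 = f"{sample}_R2{rest}"
        # If the matching '_R2' file is missing, the sample is single-end
        samples[sample] = (name, r2 if r2 in names else None)
    return samples


def pair_reads_in_dir(reads_dir):
    """Apply pair_reads to the file names found directly inside a directory."""
    return pair_reads(p.name for p in Path(reads_dir).iterdir() if p.is_file())
```

Calling `pair_reads_in_dir("01_clean_reads")` should list the same samples that the log counts under "Samples to assemble", which helps rule out a file-naming problem.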

@edgardomortiz
Owner

Well, it seems like a MEGAHIT issue:
First, is that workstation a Mac? If so, please check the note in the README about the MEGAHIT version for Mac.
Second, if you still have the assemblies folder, could you attach the MEGAHIT logs for one sample here? (You can find them inside any sample's folder: megahit_brief.log and megahit_full.log.)
Third, if you already erased the folder, could you run Captus again using --debug to see more extensive error messages? (Please also send me any sample's MEGAHIT logs.)
Finally, it could also be that you don't have enough free space in the HOME folder (but this is unlikely for the test data).
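
The log and free-space checks above can be scripted roughly as follows. This is only a sketch: the log file names come from this thread, while the helper names `find_megahit_logs` and `free_space_gb` are hypothetical and are not part of Captus.

```python
import shutil
from pathlib import Path


def find_megahit_logs(assemblies_dir):
    """Return every megahit_*.log file found under the assemblies folder."""
    root = Path(assemblies_dir)
    return sorted(p for p in root.rglob("megahit_*.log") if p.is_file())


def free_space_gb(path):
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9
```

For example, `find_megahit_logs("02_assemblies")` lists the logs to attach (an empty list means MEGAHIT never wrote any), and `free_space_gb(Path.home())` reports the space left in HOME, where the MEGAHIT temporary directory is created in the run above.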

Thanks,

Edgardo

@korjent
Author

korjent commented Nov 24, 2023

Hi Edgardo,

Thank you for your prompt reply!
I ran it again with --debug.
I also ran the commands for Mac you suggested, although the HPC is not a Mac; I only use the terminal on my Mac to access it.

Attached is the folder I got for the tutorial.

Hope this helps…

cheers

Kor
[02_assemblies.zip](https://github.com/edgardomortiz/Captus/files/13457290/02_assemblies.zip)

@edgardomortiz
Owner

I see, so you reinstalled MEGAHIT and now it works? What version did you have before? I would like to replicate the issue to warn other users.

I guess with the new MEGAHIT you can now remove --debug and everything should work normally.

Edgardo

@korjent
Author

korjent commented Dec 5, 2023

Hi Edgardo,
I tried it on my own data and got this:

Starting Captus-assembly: ASSEMBLE (2023-12-05 15:48:23)
Welcome to the de novo assembly step of Captus-assembly. In this step,
Captus will use MEGAHIT to assemble your input reads. It is also possible to
subsample a number of reads using reformat.sh from BBTools prior to assembly,
this is useful while performing tests or when including samples with
considerably higher sequencing depth in a dataset.
Since you provided a directory name, Captus will look in that location for
all the FASTQ files that contain the string '_R1' in their names and match them
with their respective '_R2' pairs. If the '_R2' can not be found, the sample is
treated as single-end. Sample names are derived from the text found before the
'_R1' string.The full set of reads per sample will be assembled, no subsampling
will be performed.
For more information, please see https://github.com/edgardomortiz/Captus

   Captus version: v1.0.0
          Command: /home/a1638383/mambaforge/envs/CAPTUS/bin/captus_assembly assemble -r 01_clean_reads
         Max. RAM: 186.7GB (out of 188.6GB)
     Max. Threads: 80 (out of 80)

     Dependencies:
          MEGAHIT: v1.2.9 OK
  megahit_toolkit: v1.2.9 OK
          BBTools: not used

 Python libraries:
            numpy: v1.26.0 OK
           pandas: v2.1.3 OK
           plotly: v5.18.0 OK

 Output directory: /hpcfs/users/a1638383/DeMux_Trim_runs/HybCap113/02_assemblies
                   Output directory successfully created

Subsampling Reads with reformat.sh (2023-12-05 15:48:24)
Now Captus will randomly subsample 0 read pairs (or single-end reads) from
each sample prior to de novo assembly with MEGAHIT.

Skipping read subsampling step... (to enable provide a number of reads to subsample with '--sample_reads_target')

De Novo Assembly with MEGAHIT (2023-12-05 15:48:24)
Now Captus will perform de novo assembly with your input reads using
MEGAHIT. Both '--min_contig_len' (when set to 'auto') and '--k_list' will be
adjusted sample-wise according to the mean read length of the sample's FASTQ
files.

Concurrent assemblies: 20
RAM per assembly: 9.3GB
Threads per assembly: 4

           preset: CAPSKIM
           k_list: 31,39,47,63,79,95,111,127,143,159,175
        min_count: 2
      prune_level: 2
      merge_level: 20,0.95
   min_contig_len: auto
    max_contig_gc: 100.0%
    extra_options: None
          tmp_dir: /home/a1638383/captus_megahit_tmp

  Overwrite files: False
   Keep all files: False

Samples to assemble: 48

Output directories: /hpcfs/users/a1638383/DeMux_Trim_runs/HybCap113/02_assemblies/[Sample_name]__captus-asm/01_assembly
A directory will be created for each sample

De novo assembling with MEGAHIT:

0%| | 0/48 [00:00<?, ?sample/s]
0%| | 0/48 [00:04<?, ?sample/s]
└─→ De novo assembly completed for 48 sample(s) [4.560s]

Skipping summarization step... (no assembly statistics files were produced)

MEGAHIT temporary directory '/home/a1638383/captus_megahit_tmp' deleted

Captus-assembly: ASSEMBLE -> successfully completed [5.982s] (2023-12-05 15:48:29)


@korjent
Author

korjent commented Dec 5, 2023

I'm not sure whether I actually updated MEGAHIT.

cheers

Kor
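
One way to confirm which MEGAHIT the active environment actually picks up, and its version, is sketched below. It assumes that `megahit --version` prints a string such as "MEGAHIT v1.2.9" (matching the version format in the Captus log); the function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_megahit_version(text):
    """Extract a version such as 'v1.2.9' from version output, or None."""
    match = re.search(r"v\d+(?:\.\d+)+", text)
    return match.group(0) if match else None


def installed_megahit():
    """Return (path, version) for the megahit found on PATH, or (None, None)."""
    path = shutil.which("megahit")
    if path is None:
        return None, None
    # Assumption: the executable accepts '--version' and prints the version
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, parse_megahit_version(result.stdout + result.stderr)
```

Running `installed_megahit()` with the CAPTUS environment activated, before and after a reinstall, shows whether the executable path or version changed.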
