Releases: edgardomortiz/Captus

Captus v1.2.0

09 Mar 19:56
  • Extraction speed has been improved considerably by applying a pre-filter to the PSL hits produced by BLAT. Irrelevant hits and cross-locus overlaps (except for some plastome genes that are known to overlap) are now removed earlier, under the assumption that two different loci in a target file represent non-overlapping regions in the genome, which is what we expect if the loci were carefully selected for phylogenomics.
  • The depth of coverage filter for the extraction step has been completely rewritten. The new filter is applied locus-wise: Captus uses the depth of the contig where the best hit for the locus was found to determine the minimum depth allowed for other contigs to be considered part of the locus, using the formula 10^(log10(depth of contig with best hit) / depth_tolerance), where depth_tolerance can be set by the user with --nuc_depth_tolerance, --ptd_depth_tolerance, --mit_depth_tolerance, --dna_depth_tolerance, or --clr_depth_tolerance. With the default depth_tolerance of 2.0, if the best hit for a locus was found in a contig with a depth of coverage of 100.0, the minimum depth allowed for other contigs is 10^(log10(100) / 2.0) = 10.0 (one order of magnitude smaller); for a best-hit contig with a depth of 275.0, the minimum allowed depth becomes 16.6. Accepted tolerance values are decimals greater than or equal to 1.0, where 1.0 is the strictest tolerance (because it retains only the contig with the best hit and other contigs with higher depth). To replicate the behavior of versions before 1.1.0, use --ignore_depth in your captus extract command.
  • Default clustering parameters have changed (--cl_max_copies reduced from 5 to 3; also, when auto is used for --cl_min_samples, a sample proportion of 0.66 is chosen instead of 0.30) to produce fewer clusters that include more samples while keeping potential paralogy lower.
  • Installation instructions for Macs with Apple Silicon have been added to the README file.
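
The locus-wise depth threshold described above can be sketched in a few lines of Python. This is only an illustration of the formula as stated in these notes, not the code Captus runs; the function name `min_allowed_depth` and its argument names are hypothetical, and the logarithm is taken as base 10 because that is what reproduces the worked values (10.0 and 16.6).

```python
import math


def min_allowed_depth(best_hit_depth: float, depth_tolerance: float = 2.0) -> float:
    """Minimum contig depth allowed for a locus (hypothetical helper).

    Implements 10^(log10(best_hit_depth) / depth_tolerance) as written in the
    release notes. Names and error handling are assumptions for illustration.
    """
    if depth_tolerance < 1.0:
        raise ValueError("depth_tolerance must be >= 1.0")
    if best_hit_depth <= 0:
        raise ValueError("best_hit_depth must be positive")
    return 10 ** (math.log10(best_hit_depth) / depth_tolerance)


if __name__ == "__main__":
    # Depths of the best-hit contig used as examples in the notes, plus one more
    for depth in (100.0, 275.0, 1000.0):
        print(depth, round(min_allowed_depth(depth), 1))
```

With a tolerance of 2.0 the formula reduces to the square root of the best-hit depth, and with a tolerance of 1.0 it returns the best-hit depth itself, which is why 1.0 is the strictest setting.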

Captus v1.1.2

24 Feb 02:21
  • Ability to restart incomplete assemblies automatically (just rerun the same assemble command on the directory where the assembly was interrupted)
  • Defaults for all depth_tolerance arguments have been increased to 20 until more tests are performed
  • Tweaking of settings for second round of Scipio
  • When '--ignore_depth' was chosen, the logs still showed the '--depth_tolerance' parameters. Now a message indicates that the filters are disabled
  • Fix small bugs

Captus v1.1.1

29 Jan 02:52
  • If RAM is set to auto Captus will use 70% of available RAM for Java programs to avoid allocation errors
  • Empty assemblies produced by MEGAHIT are now logged as FAILED, and skipped for depth calculation and filtering
  • Code has been reformatted with the Ruff extension in VSCode

Captus v1.1.0

23 Dec 17:49

New in the assemble module:

  • Contig depth of coverage is now calculated by mapping the reads back to the contigs using Salmon right after the assembly with MEGAHIT. This is now the default behavior unless --disable_mapping is enabled.
  • The assembly is then automatically filtered by contig depth: if --disable_mapping is used, only contigs with depth of coverage >1x are retained; otherwise, contigs with depth of coverage >=1.5x are retained. The filtering threshold for depth can be changed with --min_contig_depth.
  • To replicate the behavior of previous versions use --disable_mapping --min_contig_depth 0.
  • The filtering can be repeated with --redo_filtering, without the need to reassemble, to try different values for --max_contig_gc and --min_contig_depth.
  • The assembly HTML report has been completely rewritten to reflect these changes.
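
As a rough sketch of the default depth filtering described in the list above (not the actual Captus implementation), the decision for a single contig could look like this. The function `keep_contig` is hypothetical, and the use of `>=` when the user supplies their own threshold is an assumption; the notes only state the two default comparisons.

```python
from typing import Optional


def keep_contig(
    depth: float,
    disable_mapping: bool = False,
    min_contig_depth: Optional[float] = None,
) -> bool:
    """Decide whether a contig passes the depth filter (illustrative only)."""
    if min_contig_depth is not None:
        # Assumption: a user-provided threshold is applied inclusively
        return depth >= min_contig_depth
    if disable_mapping:
        # Default stated in the notes when mapping is disabled: depth > 1x
        return depth > 1.0
    # Default stated in the notes when depth comes from mapping: depth >= 1.5x
    return depth >= 1.5


if __name__ == "__main__":
    print(keep_contig(1.2))                        # below the 1.5x default
    print(keep_contig(1.2, disable_mapping=True))  # above the 1x default
    print(keep_contig(0.0, disable_mapping=True, min_contig_depth=0))
```

Under these assumptions, a threshold of 0 keeps every contig, which matches the suggestion to combine --disable_mapping with --min_contig_depth 0 to reproduce the older behavior.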

New in the extract module:

  • Options --nuc_depth_tolerance, --ptd_depth_tolerance, --mit_depth_tolerance, and --dna_depth_tolerance allow filtering contigs by depth of coverage during locus extraction. Among the contigs with hits to a particular marker type (e.g., nuclear), the median of the depths of coverage is calculated, and the tolerance factor is used to determine the minimum (median / tolerance) and maximum (median * tolerance) depth allowed. The depth of coverage is taken from the contig names when they contain the pattern _cov_X.XX_.
  • To replicate the behavior of previous versions use --ignore_depth.
  • Added option --disable_stitching. By default, Captus recovers a locus across multiple contigs; this option forces the recovery of a locus in a single contig (for example, when providing chromosome-level genome assemblies).
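
The median-based depth window described for v1.1.0 can be illustrated with the following sketch. It is an interpretation of the text above rather than Captus code: the function names, the example contig names, the inclusive bounds, and the choice to keep contigs whose names lack a `_cov_X.XX_` field are all assumptions.

```python
import re
from statistics import median
from typing import List, Optional, Tuple

# Matches the depth field in names such as "k141_7_cov_12.50_len_900" (made-up name)
COV_PATTERN = re.compile(r"_cov_(\d+(?:\.\d+)?)_")


def depth_from_name(contig_name: str) -> Optional[float]:
    """Return the depth encoded in the contig name, or None if absent."""
    match = COV_PATTERN.search(contig_name)
    return float(match.group(1)) if match else None


def depth_bounds(depths: List[float], tolerance: float) -> Tuple[float, float]:
    """Minimum (median / tolerance) and maximum (median * tolerance) depth."""
    med = median(depths)
    return med / tolerance, med * tolerance


def filter_contigs_by_depth(contig_names: List[str], tolerance: float) -> List[str]:
    """Keep contigs whose depth falls inside the window (bounds assumed inclusive)."""
    depths = {name: depth_from_name(name) for name in contig_names}
    known = [d for d in depths.values() if d is not None]
    if not known:
        return list(contig_names)
    low, high = depth_bounds(known, tolerance)
    # Assumption: contigs without a parsable depth are left untouched
    return [n for n in contig_names if depths[n] is None or low <= depths[n] <= high]
```

For example, with depths of 10, 12, 100, 1 and 11 and a tolerance of 2.0, the median is 11, so the window is 5.5 to 22 and the contigs at 100x and 1x would be excluded.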

Other improvements or additions:

  • The accessory script creates a new reference target file with only the most common target per locus found during the extraction step. This new reference target set can be used to re-extract the loci and potentially improve the informed paralog filtering.
  • All the reports have been updated to include the version and command of Captus used.
  • Updated installation instructions and documentation.
  • Some long output filenames have been shortened.

Captus v1.0.1

02 Mar 19:48
  • During assembly of hits when extracting a miscellaneous DNA reference target, the delta in identity percentage between two hits to be considered compatible has been reduced from 5% to 3.33%; initial tests indicate a slight improvement in recovery.
  • In some edge cases, when translating a CDS reference target set, the same nucleotide sequence can produce a perfectly translated protein in more than one reading frame; we now give priority to positive reading frames in case of a tie.
  • The latest pandas versions introduced breaking changes; we provide a fix.
  • When creating a new miscellaneous DNA reference from clustering, each target sequence in a reference locus can have different strands. We add a method to uniformize the strand per reference locus.
  • Added an option to the align step to --only_collect the extracted markers and exit afterwards (requested by Diego Morales)
  • Fixed multiple small bugs.
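
The reading-frame tie-break mentioned in the list above could be expressed as in the sketch below. This is a hypothetical illustration, not the method used inside Captus: the function name, the idea of a per-frame score, and the frame numbering (+1 to +3, -1 to -3) are assumptions made for the example.

```python
from typing import Dict


def pick_reading_frame(scores: Dict[int, float]) -> int:
    """Pick the best-scoring frame, preferring positive frames on a tie.

    `scores` maps a frame (+1..+3 or -1..-3) to a translation quality score.
    If several positive frames tie, the first one in the mapping is returned.
    """
    # Sort key: score first, then whether the frame is positive (True > False)
    return max(scores, key=lambda frame: (scores[frame], frame > 0))


if __name__ == "__main__":
    print(pick_reading_frame({-1: 1.0, 2: 1.0}))  # tie, positive frame preferred
    print(pick_reading_frame({-1: 1.0, 2: 0.9}))  # no tie, best score wins
```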

Captus v1.0.0

21 Nov 10:57
  • Additional improvements to captusd bait: added options --min_expected_tiling and --remove_ambiguous_loci for the creation of baitsets and their corresponding reference target files.

Captus v0.9.99

14 Nov 15:47
Captus v0.9.99 Pre-release
  • Now any BUSCO lineage database can be used as reference target file, just download a .tar.gz from and provide the file path for Captus extraction
  • Added shortcut for captus_assembly as simply captus (data assembly)
  • Added entry point for captus_design and a shortcut as captusd (bait design)
  • The cluster step of bait design now reports mean number of copies per locus instead of just classifying it as single- or multi-copy
  • Added a function to create a reference target file (for locus extraction) after bait clustering and tiling
  • Code cleanup and minor cosmetic changes

Captus v0.9.98

30 Oct 11:03
Captus v0.9.98 Pre-release
  • Fixed potential problem with recognition of _R1. or _R1_ patterns in filenames
  • Support for FastQC v0.12.1 update (s-andrews/FastQC@fbd9cf5)
  • Speed up QC step during cleaning step
  • If the user provides a clustering threshold with --cl_min_identity then the miscellaneous DNA extraction is performed using the same identity.
  • Allow decimals in maximum average number of copies in a cluster via --cl_max_copies
  • Minor cosmetic improvements
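
To make the `_R1.` / `_R1_` filename patterns from the list above concrete, here is a minimal sketch of how such a read-pair marker might be detected. The regular expression, the function name, and the example filenames are assumptions for illustration; the pattern Captus actually uses may differ.

```python
import re
from typing import Optional

# "_R1" or "_R2" immediately followed by "." or "_"
READ_PATTERN = re.compile(r"_R([12])[._]")


def read_number(filename: str) -> Optional[int]:
    """Return 1 or 2 if the filename carries an _R1/_R2 marker, else None."""
    match = READ_PATTERN.search(filename)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for name in ("sampleA_R1.fq.gz", "sampleA_R2_001.fastq.gz", "sampleA_R1x.fq"):
        print(name, read_number(name))
```

Requiring a `.` or `_` right after the marker avoids matching names where `_R1` is only the start of a longer token.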

Captus v0.9.97

02 Sep 13:56
Captus v0.9.97 Pre-release
  • Fixed a bug in the extraction report happening when the extraction statistics tables are not sorted. This bug doesn't affect the output at all, just the report heatmap.

Captus v0.9.96

25 Aug 14:57
Captus v0.9.96 Pre-release
  • Fixed indentation bugs that prevented Falco or FastQC from running during the clean step and the subsampling of reads during the assemble step
  • Secret feature, coding genes databases can also be extracted as nucleotide
  • Code cleanup and minor fixes