Anatoly Boshkin edited this page Dec 10, 2021 · 46 revisions

Welcome to the sra-tools wiki!

ANNOUNCEMENTS:

December 10, 2021

NCBI’s SRA will switch its source build system to CMake in the next toolkit release (date TBD). This change is an important step toward improving developers’ productivity, as CMake provides unified, cross-platform support for multiple build systems. It affects developers who build the NCBI SRA tools from source. The old makefiles and build systems will no longer be supported once the change is made.

This change will also affect the structure of the GitHub repositories, which will be consolidated to provide an easier environment for building tools and libraries. Consolidating the NGS libraries and their dependencies provides better usage scope isolation and makes building more straightforward.

Affected repositories

  1. ncbi/ngs (https://github.com/ncbi/ngs)

    This repository will be frozen, and all of its code moved to the GitHub repository ncbi/sra-tools, under the subdirectory ngs/. All future modifications
    will take place in sra-tools.

  2. ncbi/ncbi-vdb (https://github.com/ncbi/ncbi-vdb)

    This project’s build system will be based on CMake. The libraries supporting access to VDB data via the NGS API will be moved to the GitHub repository ncbi/sra-tools.

The projects to move are:

| Old location (base URL: https://github.com/ncbi/ncbi-vdb) | New location (base URL: https://github.com/ncbi/sra-tools) |
| --- | --- |
| libs/ngs | ngs/ncbi/ngs |
| libs/ngs-c++ | ngs/ncbi/ngs-c++ |
| libs/ngs-jni | ngs/ncbi/ngs-jni |
| libs/ngs-py | ngs/ncbi/ngs-py |
| libs/vdb-sqlite | libs/vdb-sqlite |
| test/ngs-java | test/ngs-java |
| test/ngs-python | test/ngs-python |

  3. ncbi/sra-tools (https://github.com/ncbi/sra-tools)

    This project’s build system will be based on CMake. The project will acquire some new components:

    3a) NGS SDK (now under ngs/, formerly in the GitHub repository ncbi/ngs)

    3b) NGS-related VDB access libraries and their dependents, formerly in the GitHub repository ncbi/ncbi-vdb, as listed in the table above.
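
For scripts or documentation that still point at the old ncbi-vdb locations, the table of moved projects can be expressed as a small lookup. This is only an illustrative sketch: the `MOVED_PATHS` name and the `new_location` helper are hypothetical and not part of any NCBI tool, and the mapping simply mirrors the table above.

```python
# Old-to-new locations, copied from the table of moved projects.
# Keys are relative to https://github.com/ncbi/ncbi-vdb,
# values are relative to https://github.com/ncbi/sra-tools.
MOVED_PATHS = {
    "libs/ngs": "ngs/ncbi/ngs",
    "libs/ngs-c++": "ngs/ncbi/ngs-c++",
    "libs/ngs-jni": "ngs/ncbi/ngs-jni",
    "libs/ngs-py": "ngs/ncbi/ngs-py",
    "libs/vdb-sqlite": "libs/vdb-sqlite",
    "test/ngs-java": "test/ngs-java",
    "test/ngs-python": "test/ngs-python",
}


def new_location(old_path):
    """Map a path from the old ncbi-vdb layout to the sra-tools layout.

    Returns None when the path is not under one of the moved directories.
    (Hypothetical helper, for illustration only.)
    """
    old_path = old_path.strip("/")
    for old_prefix, new_prefix in MOVED_PATHS.items():
        # Match the directory itself or anything below it; requiring the
        # trailing "/" keeps "libs/ngs" from matching "libs/ngs-c++".
        if old_path == old_prefix or old_path.startswith(old_prefix + "/"):
            return new_prefix + old_path[len(old_prefix):]
    return None
```

A path that is not listed in the table, such as one under another ncbi-vdb library, yields `None` because it stays in ncbi-vdb.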
    

SRA Toolkit 2.11.3 October 25, 2021

fasterq-dump: fixed a bug so that the fasta and fasta-unsorted parameters work correctly.

SRA data are now available either with full base quality scores (SRA Normalized Format) or with simplified quality scores (SRA Lite), depending on user preference. Both formats can be streamed on demand to the same file types (fastq, sam, etc.), so both are compatible with existing workflows and applications that expect quality scores. However, the SRA Lite format is much smaller, which reduces storage footprint and data transfer times and allows dumps to complete more rapidly.

The SRA Toolkit defaults to the SRA Normalized Format, which includes full per-base quality scores, but users who do not require full base quality scores for their analysis can request the SRA Lite version to save time on data transfers. To request SRA Lite data when using the SRA Toolkit, set the "Prefer SRA Lite files with simplified base quality scores" option on the main page of the toolkit configuration. This instructs the tools to preferentially use the SRA Lite format when available (please be sure to use toolkit version 2.11.2 or later to access this feature).

The quality scores generated from SRA Lite files are the same for every base within a given read (quality = 30 or 3, depending on whether the Read Filter flag is set to 'pass' or 'reject'). Data in the SRA Normalized Format with full base quality scores will continue to have a .sra file extension, while SRA Lite files have a .sralite file extension. For more information please see our data formats page.
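
To illustrate what the simplified SRA Lite quality scores mean for downstream FASTQ output, the following minimal sketch builds the quality line a read would receive. It assumes the common Phred+33 ASCII encoding for FASTQ quality strings; the function name and constants are hypothetical and are not part of the toolkit.

```python
PASS_QUALITY = 30    # score used when the Read Filter flag is 'pass'
REJECT_QUALITY = 3   # score used when the Read Filter flag is 'reject'
PHRED_OFFSET = 33    # assumption: Phred+33 ASCII encoding of FASTQ qualities


def sralite_quality_line(read_length, read_filter):
    """Return the FASTQ quality string for one read with SRA Lite scores.

    Every base in the read gets the same score, chosen by the read filter.
    (Hypothetical helper, for illustration only.)
    """
    if read_filter == "pass":
        score = PASS_QUALITY
    elif read_filter == "reject":
        score = REJECT_QUALITY
    else:
        raise ValueError("read_filter must be 'pass' or 'reject'")
    return chr(score + PHRED_OFFSET) * read_length
```

Under these assumptions, a passing read of five bases gets the quality line `?????` and a rejected read of three bases gets `$$$`.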

SRA Toolkit 2.11.2 October 7, 2021

fasterq-dump: added flexible defline, fasta-unsorted, only-aligned, only-unaligned

fasterq-dump: new output format available: --fasta

fasterq-dump: option -t sets directory of all temp files (including VDB cache)

klib, ngs-tools, sra-tools: status messages (-v) are printed to stderr rather than stdout

ncbi-vdb, ngs-tools, sra-tools, vdb, vfs: added support for SRA Lite files with simplified base quality scores

prefetch, vfs: accept Percent Encoding in source URLs

sam-dump: fixed wrong value for SAM RNEXT field for unaligned records

sra-tools, sratools: verbose option displays information about the type of quality scores contained in the files being used

sratools: temporarily changes quality score preference when quality scores are not being used

sra-tools, vdb: environment variable NCBI_TMP_CACHE sets the caching directory, overriding any other setting

vdb-dump: --info produces valid JSON
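
Several of the 2.11.2 items above can be combined in one invocation. The sketch below only assembles the command line and environment for `fasterq-dump`; it does not run anything. It uses the `--fasta` format, the `-t` temp-directory option, and the `NCBI_TMP_CACHE` environment variable named in the notes above. The helper name, the accession, and the directory paths are placeholders chosen for illustration.

```python
import os


def build_fasterq_dump_call(accession, temp_dir=None, fasta=False, cache_dir=None):
    """Return (argv, env) suitable for subprocess.run; nothing is executed.

    (Hypothetical helper, for illustration only.)
    """
    argv = ["fasterq-dump"]
    if fasta:
        argv.append("--fasta")       # FASTA output format added in 2.11.2
    if temp_dir is not None:
        argv += ["-t", temp_dir]     # directory for all temp files
    argv.append(accession)

    env = dict(os.environ)           # copy, so the caller's environment is untouched
    if cache_dir is not None:
        env["NCBI_TMP_CACHE"] = cache_dir  # overrides any other caching setting
    return argv, env


# Placeholder accession and paths; substitute your own.
argv, env = build_fasterq_dump_call(
    "SRR000001", temp_dir="/scratch/tmp", fasta=True, cache_dir="/scratch/cache"
)
```

The resulting pair could then be passed to `subprocess.run(argv, env=env, check=True)` on a machine where the toolkit is installed.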


With release 2.9.1 of sra-tools we have finally made available the tool fasterq-dump, a replacement for the much older fastq-dump tool. As its name implies, it runs faster, and it is better suited to the large-scale conversion of SRA objects into FASTQ files that is common at sites with enough disk space for temporary files. fasterq-dump is multi-threaded and performs bulk joins, which improves performance compared to fastq-dump, which is single-threaded and performs joins on a per-record basis.

fastq-dump is still supported because it handles more corner cases than fasterq-dump, but it is likely to be deprecated in the future.

You can get more information about fasterq-dump in this Wiki at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump.