kwrodarmer edited this page Aug 20, 2019 · 46 revisions

Welcome to the sra-tools wiki!

ANNOUNCEMENTS:

2019-08-19
We have released version 2.10.0 of sra-tools, which operates natively within AWS and GCP cloud environments. Most of the functionality you are accustomed to has been preserved, although there are a few changes.

  1. This release allows access to public SRA data stored within cloud buckets, now including the ability to retrieve original submission files (raw, unharmonized, no error correction) with prefetch.
  2. The local caching model has changed to support original submission files: we have introduced the accession directory for prefetch that will contain any files you have requested related to a particular accession.
  3. Contrary to prior behavior, if you have not explicitly configured a designated cache area, prefetch will use the accession directory.
  4. Similarly, the converter (dumper) tools will make use of a process-local temporary cache area unless you have configured the toolkit to use a specific cache. Note: this behavior will temporarily use more local space, but is preferred for cluster operation.
  5. Access to data within the cloud will generally require setting up cloud-specific account credentials and making them known to the toolkit via vdb-config. The tools will not send out any credentials until you have agreed to accept charges within vdb-config. Your account information is required so that the cloud provider may assess egress charges and is not used in any way by NCBI or transmitted for any other purpose.
  6. As a special exception, access to cloud data from within a region that would not incur egress charges may be allowed without account credentials. In this case, you may configure the toolkit (using vdb-config) to send an environment credential, provided by the cloud service, as proof of your execution environment.
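
As a rough illustration of points 2 and 3, the sketch below builds a plain `prefetch` invocation and lists whatever ends up in the accession directory. It assumes that `prefetch` is on `PATH` and that the accession directory is created under the working directory and named after the accession; the second point is an assumption, not something stated above. The helper names are hypothetical and `SRR000001` is only a placeholder accession.

```python
import shutil
import subprocess
from pathlib import Path


def prefetch_command(accession):
    """Return the argv list for a plain `prefetch <accession>` call."""
    return ["prefetch", accession]


def accession_directory(accession, workdir="."):
    """Assumed location of the accession directory: <workdir>/<accession>."""
    return Path(workdir) / accession


def fetch(accession, workdir="."):
    """Run prefetch in workdir and list the files in the accession directory."""
    if shutil.which("prefetch") is None:
        raise RuntimeError("prefetch not found on PATH")
    # check=True raises CalledProcessError if prefetch exits with an error.
    subprocess.run(prefetch_command(accession), cwd=workdir, check=True)
    target = accession_directory(accession, workdir)
    # Return an empty list if the assumed directory layout does not hold.
    return sorted(p.name for p in target.iterdir()) if target.is_dir() else []
```

Calling `fetch("SRR000001")` would download the placeholder accession and return the file names found in its directory. Cloud access may additionally require the vdb-config setup described in points 5 and 6.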

With release 2.9.1 of sra-tools we made available the tool fasterq-dump, a replacement for the much older fastq-dump tool. As its name implies, it runs faster, and it is better suited to the large-scale conversion of SRA objects into FASTQ files that is common at sites with enough disk space for temporary files. fasterq-dump is multi-threaded and performs joins in bulk, whereas fastq-dump is single-threaded and performs joins on a per-record basis.
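
For scripted conversions, a minimal sketch of assembling a `fasterq-dump` command line might look like the following. It uses the tool's `--outdir`, `--threads`, and `--temp` options. The function name, the default values, and the placeholder accession `SRR000001` are illustrative assumptions rather than recommendations.

```python
def fasterq_dump_command(accession, outdir=".", threads=6, temp_dir=None):
    """Build an argv list for fasterq-dump (hypothetical helper)."""
    cmd = [
        "fasterq-dump", accession,
        "--outdir", str(outdir),     # where the FASTQ files are written
        "--threads", str(threads),   # number of worker threads
    ]
    if temp_dir is not None:
        # Temporary files can be large; point this at a disk with enough space.
        cmd += ["--temp", str(temp_dir)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(fasterq_dump_command("SRR000001", outdir="fastq", threads=8)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the toolkit is installed. Building the argument list separately keeps the command easy to inspect or log before anything is executed.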

fastq-dump is still supported because it handles more corner cases than fasterq-dump, but it is likely to be deprecated in the future.

You can get more information about fasterq-dump in this Wiki at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump.