Home
Welcome to the sra-tools wiki!
2019-08-19
We have released version 2.10.0 of `sra-tools`, whose tools operate natively within AWS and GCP cloud environments. Most of the functionality you are accustomed to has been preserved, although there are a few changes.
- This release allows access to public SRA data stored within cloud buckets, now including the ability to retrieve original submission files (raw, unharmonized, no error correction) with `prefetch`.
  - The local caching model has changed to support original submission files: we have introduced the accession directory for `prefetch`, which will contain any files you have requested related to a particular accession.
  - Contrary to prior behavior, if you have not specifically established a designated cache area, `prefetch` will use the accession directory.
  - Similarly, the converter (dumper) tools will make use of a process-local temporary cache area unless you have configured the toolkit for a specific cache. NB: this behavior will temporarily use more local space, but is preferred for cluster operation.
- Access to data within the cloud will generally require setting up cloud-specific account credentials and making them known to the toolkit via `vdb-config`. The tools will not send out any credentials until you have agreed to accept charges within `vdb-config`. Your account information is required so that the cloud provider may assess egress charges; it is not used in any way by NCBI or transmitted for any other purpose.
  - Access to cloud data from within a region that would not incur egress charges may be allowed without account credentials, as a special exception. In this case, you may configure the toolkit (using `vdb-config`) to send a cloud-service-provided environment credential as proof of your execution environment.
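As a rough illustration of the accession-directory model described above, the following Python sketch builds a `prefetch` command line and computes where the downloaded files would be expected to land. The helper names (`prefetch_command`, `expected_accession_dir`) are hypothetical, the accession `SRR000001` is only a placeholder, and the `--output-directory` option and the "one directory named after the accession" layout are assumptions that should be checked against `prefetch --help` for your installed version.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def prefetch_command(accession: str, output_dir: Optional[str] = None) -> List[str]:
    """Build (but do not run) a prefetch command line for one accession."""
    cmd = ["prefetch"]
    if output_dir is not None:
        # Assumption: --output-directory selects where the accession
        # directory is created; verify with `prefetch --help`.
        cmd += ["--output-directory", output_dir]
    cmd.append(accession)
    return cmd


def expected_accession_dir(accession: str, base: str = ".") -> Path:
    """Assumed layout: files for an accession sit in a directory named after it."""
    return Path(base) / accession


# Print the command instead of executing it, so the sketch is safe to run anywhere.
print(shlex.join(prefetch_command("SRR000001")))
```

To actually download, the list returned by `prefetch_command` could be passed to `subprocess.run(..., check=True)`. Cloud credentials are configured separately through `vdb-config`, for example in its interactive mode (`vdb-config --interactive`).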
With release 2.9.1 of `sra-tools` we have finally made available the tool `fasterq-dump`, a replacement for the much older `fastq-dump` tool. As its name implies, it runs faster, and it is better suited to the large-scale conversion of SRA objects into FASTQ files that is common at sites with enough disk space for temporary files. `fasterq-dump` is multi-threaded and performs bulk joins, which improves performance compared to `fastq-dump`, which performs joins on a per-record basis (and is single-threaded).
`fastq-dump` is still supported, as it handles more corner cases than `fasterq-dump`, but it is likely to be deprecated in the future.
You can get more information about `fasterq-dump` in this wiki at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump.
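Because `fasterq-dump` is multi-threaded and relies on temporary files, a typical invocation sets the thread count, an output directory, and a scratch location. The Python sketch below only assembles such a command line; the helper name `fasterq_dump_command` is hypothetical, the accession and paths are placeholders, and the thread count of 6 is merely an illustrative default. The options used (`--threads`, `--outdir`, `--temp`) should be confirmed with `fasterq-dump --help` for your version.

```python
import shlex
from typing import List, Optional


def fasterq_dump_command(
    accession: str,
    threads: int = 6,
    outdir: Optional[str] = None,
    temp: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a fasterq-dump command line."""
    cmd = ["fasterq-dump", accession, "--threads", str(threads)]
    if outdir is not None:
        # Directory that receives the resulting FASTQ files.
        cmd += ["--outdir", outdir]
    if temp is not None:
        # Scratch space for temporary files; ideally a fast local disk.
        cmd += ["--temp", temp]
    return cmd


# Print the command rather than executing it.
print(shlex.join(
    fasterq_dump_command("SRR000001", threads=8, outdir="fastq", temp="/scratch/tmp")
))
```

Keeping the command as a list of arguments avoids shell-quoting problems if it is later handed to `subprocess.run`.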