Remove UR:file:// and UR:ftp:// from ref search path, plus REF_PATH to EBI #1881

Open
wants to merge 1 commit into
base: develop

jkbonfield

While use of the EBI refget server was originally encouraged by the CRAM inventors, it became a self-imposed DDoS and is now unreliable due to explicit rate-limiting by the EBI. This removes EBI as a fallback when REF_PATH has not been set.

In doing this we discovered that we could still retrieve references (ironically also from EBI due to the test being a 1000genomes CRAM) via the SQ UR: tag supporting remote URIs. This behaviour is explicitly listed as not being supported in the samtools manpage and we believe it was an accidental ability added when switching from fopen to bgzf_open for reading the UR reference file.

Note this check must be in cram_populate_ref, and not in load_ref_portion or bgzf_open_ref, as the user still has the ability to explicitly request an external reference, e.g. via "samtools view -T URI".
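To illustrate why the placement matters, here is a minimal sketch of the policy, not the actual htslib code. The helper names (`is_remote_uri`, `ref_uri_allowed`), the `ref_source` enum and the list of schemes are all hypothetical, chosen only to show that a remote URI is rejected when it comes from the header's UR: tag but accepted when the user asked for it explicitly.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if uri begins with a remote scheme.
 * The scheme list is illustrative only, not what htslib checks. */
static int is_remote_uri(const char *uri) {
    static const char *schemes[] = { "http://", "https://", "ftp://" };
    size_t i;
    for (i = 0; i < sizeof(schemes) / sizeof(schemes[0]); i++) {
        if (strncmp(uri, schemes[i], strlen(schemes[i])) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical: where the reference request originated. */
enum ref_source {
    REF_FROM_HEADER_UR,  /* found automatically via the SQ UR: tag */
    REF_FROM_USER        /* given explicitly, e.g. with -T */
};

/* Returns 1 if the reference may be opened, 0 if it must be refused.
 * Only the automatic (header-driven) path filters out remote URIs. */
static int ref_uri_allowed(const char *uri, enum ref_source src) {
    if (src == REF_FROM_USER)
        return 1;
    return !is_remote_uri(uri);
}
```

If the filter lived in a shared low-level open routine instead, the `REF_FROM_USER` case could not be distinguished and explicit remote references would break too.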

open_path_mfile() now takes an extra 'int *local' argument, which is set to non-zero if the file found in REF_PATH is local. Non-local files are cached to REF_CACHE if it is set, but REF_CACHE no longer has a default value, as it did when EBI refget was the default REF_PATH. This means it should operate much as before, except for the lack of EBI defaults.
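The out-parameter and caching rule described above could look roughly like the following. This is a sketch under stated assumptions rather than the real open_path_mfile(): `classify_path_entry` and `should_cache` are hypothetical names, and treating an entry as non-local when it starts with http://, https:// or ftp:// is an assumption made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a search-path lookup. On success returns 0
 * and, if local is non-NULL, sets *local to non-zero when the entry
 * refers to a local file. Returns -1 for an empty entry. */
static int classify_path_entry(const char *entry, int *local) {
    int remote;
    if (entry == NULL || *entry == '\0')
        return -1;
    /* Assumption: these prefixes mark an entry as non-local. */
    remote = strncmp(entry, "http://", 7) == 0 ||
             strncmp(entry, "https://", 8) == 0 ||
             strncmp(entry, "ftp://", 6) == 0;
    if (local != NULL)
        *local = !remote;
    return 0;
}

/* Cache only non-local files, and only when a cache directory has been
 * set (e.g. from getenv("REF_CACHE")); there is no default directory. */
static int should_cache(int local, const char *ref_cache) {
    return !local && ref_cache != NULL && *ref_cache != '\0';
}
```

A caller would pass the result of `getenv("REF_CACHE")` as `ref_cache`, so an unset variable simply disables caching instead of falling back to a built-in location.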
