Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple query (destination) mapping to same target (source) violates assumption #12

Open
jblachly opened this issue May 28, 2019 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@jblachly
Copy link
Member

TL;DR: Situation of query->target relationship being surjective, rather than bijective is ambiguous: when the liftover is done from target->query, which query result(s) should be returned, and how?

This is in some ways also an interval tree issue, but should be probably handled at the level of swiftover because intervaltree is suitably generic and in swiftover we have the ability to define opCmp and opEquals

Specifically, when trying to insert an interval with identical tStart,tEnd coordinates as an interval already in the tree, the intervaltree assertion in insert that i == current.interval fails when there is a different qStart.

This could happen if >1 regions in the query(destination) map precisely to the same region in target (source). Imagine a duplicated gene with 2 copies in build hg38, but 1 copy in hg19. Two identical segments of identical length may map to teh same segment in hg19. This will cause an assertion failure because opCmp calls them equal (ret 0) but == operator recognizes that the ChainLinks are not truly identical (different qStart).

Here is an example from Esko's chains:

jsbmbp13:~/Documents/Development/blachlylab/swiftover$ lldb -- ./swiftover -t bed -c resources/esko/subset_chains.chain -i resources/esko/chromosome_20_genes.bed -o /tmp/out -u /tmp/un && wc -l /tmp/out /tmp/un
(lldb) target create "./swiftover"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 52, in <module>
    import weakref
  File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/weakref.py", line 14, in <module>
    from _weakref import (
ImportError: cannot import name _remove_dead_weakref
Current executable set to './swiftover' (x86_64).
(lldb) settings set -- target.run-args  "-t" "bed" "-c" "resources/esko/subset_chains.chain" "-i" "resources/esko/chromosome_20_genes.bed" "-o" "/tmp/out" "-u" "/tmp/un"
(lldb) b splaytree.d:679
Breakpoint 2: where = swiftover`_D12intervaltree9splaytree__T17IntervalSplayTreeTS9swiftover5chain9ChainLinkZQBw6insertMFNbQBqZPSQDrQDg__T16IntervalTreeNodeTQCyZQx + 817 at splaytree.d:679:21, address = 0x000000010005f181
(lldb) r
Process 42797 launched: '/Users/james/Documents/Development/blachlylab/swiftover/swiftover' (x86_64)
i != current.interval!
i: 60881262-60881613 -> (contig):9552164-9552515
error: need to add support for DW_TAG_base_type 'immutable(char)' encoded with DW_ATE = 0x10, bit_size = 8
Process 42797 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x000000010005f181 swiftover`_D12intervaltree9splaytree__T17IntervalSplayTreeTS9swiftover5chain9ChainLinkZQBw6insertMFNbQBqZPSQDrQDg__T16IntervalTreeNodeTQCyZQx(this=0x0000000104a40900, i=ChainLink @ 0x00007ffeefbfe7e0) at splaytree.d:679:21
   676 	                if (i != current.interval)
   677 	                {
   678 	                    printf("i != current.interval!\ni: %d-%d -> (contig):%d-%d\n", i.tStart, i.tEnd, i.qStart, i.qEnd );
-> 679 	                    printf("current.interval: %d-%d\n", current.interval.start, current.interval.end);
   680 	                }
   681 	                assert(i == current.interval, "Failed assumption: i == current.interval");
   682 	                splay(current);
Target 0: (swiftover) stopped.
(lldb) frame var
(IntervalSplayTree!(ChainLink) *) this = 0x0000000104a40900
(ChainLink) i = {
  tStart = 60881262
  tEnd = 60881613
  qStart = 9552164
  qContig = (length = 10, ptr = 0x000000010283bac0)
  invert = '\x01'
}
(IntervalTreeNode!(ChainLink) *) current = 0x0000000103858a80
(lldb) frame var current->interval
(ChainLink) current->interval = {
  tStart = 60881262
  tEnd = 60881613
  qStart = 1055
  qContig = (length = 10, ptr = 0x000000010283b7b0)
  invert = '\x01'
}
(lldb) frame var i.qContig.ptr -f s
(void *) i.qContig.ptr = "utg000007l\xffffffb0"
(lldb) frame var current->interval.qContig.ptr -f s
(void *) current->interval.qContig.ptr = "utg000002l\xffffffb0"

The i interval comes from this chain:
chain 569211 20 64444167 + 60314367 60885787 utg000007l 9556681 + 8985467 9556681 86

Whereas current.interval, which already exists in the tree, comes from a chain with qContig utg000002l (17 possibilities)


The assertion error can be fixed by simply having ChainLink::opCmp order them separately (by qContig,qStart), but then only a single result may be returned depending on how the intervaltree's "find" operation is implemented; when query->target is surjective and not bijective , the target->query liftover may not return all outputs.

On the other hand, if we leave opCmp alone (continue to return 0) but redefine opEquals to consider them equal, we need to make sure the intervaltree can return more than one result, even when queried directly, the latter of which is nonsensical if we are trying to retrieve precisely a specific interval (including specific qContig) from the tree.

Will try option 1: Modify opCmp to order by (qContig,qStart). Leave issue open until verify that it returns all intervals in query (Dest) for single target interval (source)

@jblachly jblachly added the bug Something isn't working label May 28, 2019
jblachly added a commit that referenced this issue May 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants