You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
TL;DR: Situation of query->target relationship being surjective, rather than bijective is ambiguous: when the liftover is done from target->query, which query result(s) should be returned, and how?
This is in some ways also an interval tree issue, but should be probably handled at the level of swiftover because intervaltree is suitably generic and in swiftover we have the ability to define opCmp and opEquals
Specifically, when trying to insert an interval with identical tStart,tEnd coordinates as an interval already in the tree, the intervaltree assertion in insert that i == current.interval fails when there is a different qStart.
This could happen if >1 regions in the query(destination) map precisely to the same region in target (source). Imagine a duplicated gene with 2 copies in build hg38, but 1 copy in hg19. Two identical segments of identical length may map to teh same segment in hg19. This will cause an assertion failure because opCmp calls them equal (ret 0) but == operator recognizes that the ChainLinks are not truly identical (different qStart).
Here is an example from Esko's chains:
jsbmbp13:~/Documents/Development/blachlylab/swiftover$ lldb -- ./swiftover -t bed -c resources/esko/subset_chains.chain -i resources/esko/chromosome_20_genes.bed -o /tmp/out -u /tmp/un && wc -l /tmp/out /tmp/un
(lldb) target create "./swiftover"
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 52, in <module>
import weakref
File "/usr/local/Cellar/python@2/2.7.14_3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/weakref.py", line 14, in <module>
from _weakref import (
ImportError: cannot import name _remove_dead_weakref
Current executable set to './swiftover' (x86_64).
(lldb) settings set -- target.run-args "-t" "bed" "-c" "resources/esko/subset_chains.chain" "-i" "resources/esko/chromosome_20_genes.bed" "-o" "/tmp/out" "-u" "/tmp/un"
(lldb) b splaytree.d:679
Breakpoint 2: where = swiftover`_D12intervaltree9splaytree__T17IntervalSplayTreeTS9swiftover5chain9ChainLinkZQBw6insertMFNbQBqZPSQDrQDg__T16IntervalTreeNodeTQCyZQx + 817 at splaytree.d:679:21, address = 0x000000010005f181
(lldb) r
Process 42797 launched: '/Users/james/Documents/Development/blachlylab/swiftover/swiftover' (x86_64)
i != current.interval!
i: 60881262-60881613 -> (contig):9552164-9552515
error: need to add support for DW_TAG_base_type 'immutable(char)' encoded with DW_ATE = 0x10, bit_size = 8
Process 42797 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: 0x000000010005f181 swiftover`_D12intervaltree9splaytree__T17IntervalSplayTreeTS9swiftover5chain9ChainLinkZQBw6insertMFNbQBqZPSQDrQDg__T16IntervalTreeNodeTQCyZQx(this=0x0000000104a40900, i=ChainLink @ 0x00007ffeefbfe7e0) at splaytree.d:679:21
676 if (i != current.interval)
677 {
678 printf("i != current.interval!\ni: %d-%d -> (contig):%d-%d\n", i.tStart, i.tEnd, i.qStart, i.qEnd );
-> 679 printf("current.interval: %d-%d\n", current.interval.start, current.interval.end);
680 }
681 assert(i == current.interval, "Failed assumption: i == current.interval");
682 splay(current);
Target 0: (swiftover) stopped.
(lldb) frame var
(IntervalSplayTree!(ChainLink) *) this = 0x0000000104a40900
(ChainLink) i = {
tStart = 60881262
tEnd = 60881613
qStart = 9552164
qContig = (length = 10, ptr = 0x000000010283bac0)
invert = '\x01'
}
(IntervalTreeNode!(ChainLink) *) current = 0x0000000103858a80
(lldb) frame var current->interval
(ChainLink) current->interval = {
tStart = 60881262
tEnd = 60881613
qStart = 1055
qContig = (length = 10, ptr = 0x000000010283b7b0)
invert = '\x01'
}
(lldb) frame var i.qContig.ptr -f s
(void *) i.qContig.ptr = "utg000007l\xffffffb0"
(lldb) frame var current->interval.qContig.ptr -f s
(void *) current->interval.qContig.ptr = "utg000002l\xffffffb0"
The i interval comes from this chain: chain 569211 20 64444167 + 60314367 60885787 utg000007l 9556681 + 8985467 9556681 86
Whereas current.interval, which already exists in the tree, comes from a chain with qContig utg000002l (17 possibilities)
The assertion error can be fixed by simply having ChainLink::opCmp order them separately (by qContig,qStart), but then only a single result may be returned depending on how the intervaltree's "find" operation is implemented; when query->target is surjective and not bijective , the target->query liftover may not return all outputs.
On the other hand, if we leave opCmp alone (continue to return 0) but redefine opEquals to consider them equal, we need to make sure the intervaltree can return more than one result, even when queried directly, the latter of which is nonsensical if we are trying to retrieve precisely a specific interval (including specific qContig) from the tree.
Will try option 1: Modify opCmp to order by (qContig,qStart). Leave issue open until verify that it returns all intervals in query (Dest) for single target interval (source)
The text was updated successfully, but these errors were encountered:
TL;DR: Situation of query->target relationship being surjective, rather than bijective is ambiguous: when the liftover is done from target->query, which query result(s) should be returned, and how?
This is in some ways also an interval tree issue, but should be probably handled at the level of swiftover because intervaltree is suitably generic and in swiftover we have the ability to define
opCmp
andopEquals
Specifically, when trying to insert an interval with identical tStart,tEnd coordinates as an interval already in the tree, the intervaltree assertion in
insert
thati == current.interval
fails when there is a different qStart.This could happen if >1 regions in the query(destination) map precisely to the same region in target (source). Imagine a duplicated gene with 2 copies in build hg38, but 1 copy in hg19. Two identical segments of identical length may map to teh same segment in hg19. This will cause an assertion failure because
opCmp
calls them equal (ret 0) but==
operator recognizes that the ChainLinks are not truly identical (different qStart).Here is an example from Esko's chains:
The
i
interval comes from this chain:chain 569211 20 64444167 + 60314367 60885787 utg000007l 9556681 + 8985467 9556681 86
Whereas
current.interval
, which already exists in the tree, comes from a chain with qContigutg000002l
(17 possibilities)The assertion error can be fixed by simply having ChainLink::opCmp order them separately (by qContig,qStart), but then only a single result may be returned depending on how the intervaltree's "find" operation is implemented; when query->target is surjective and not bijective , the target->query liftover may not return all outputs.
On the other hand, if we leave opCmp alone (continue to return 0) but redefine opEquals to consider them equal, we need to make sure the intervaltree can return more than one result, even when queried directly, the latter of which is nonsensical if we are trying to retrieve precisely a specific interval (including specific qContig) from the tree.
Will try option 1: Modify
opCmp
to order by (qContig,qStart). Leave issue open until verify that it returns all intervals in query (Dest) for single target interval (source)The text was updated successfully, but these errors were encountered: