Highlight copy number 2 #84

AlistairNWard · 2018-03-15T19:50:21Z

A key goal of the read coverage chart is to identify large regions of abnormal copy number, e.g. large deletions and duplications. These are often hidden in the noise. One possible solution is to reduce the opacity of the noisy data and highlight tracks of consistent coverage. Here is an example:

http://nv-dev-new.iobio.io/vue.bam.iobio/bamview?bam=https%3A%2F%2Fs3.amazonaws.com%2Fiobio%2Fsamples%2Fbam%2Fexample_data%2Ftest.bam&bai=&region=all

We should be able to quickly identify that:

a) chromosome 3 has a large chunk of missing data,
b) chromosome 6 is missing (issue #83), and
c) chromosome 7 has a large deletion.
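One way to prototype the opacity idea, sketched here under assumptions the issue does not specify: coverage is already binned into fixed-width windows, "noisy" means a bin that disagrees with the median of its neighbouring bins, and consistent bins are drawn fully opaque whatever their level. Under that reading, a sustained drop such as the chr7 deletion stays visible while isolated spikes fade. `bin_opacities` and its `window`, `tolerance` and `faded` parameters are hypothetical, not part of bam.iobio.

```python
from statistics import median


def bin_opacities(depths, window=5, tolerance=0.25, faded=0.2):
    """Return one opacity value per coverage bin.

    A bin counts as 'consistent' if its depth is within +/- tolerance
    (as a fraction) of the median depth of the window of bins centred
    on it. Consistent bins are drawn opaque (1.0); noisy bins are faded.
    """
    half = window // 2
    opacities = []
    for i, depth in enumerate(depths):
        # Local baseline: median of the neighbouring bins (clipped at the edges).
        neighbourhood = depths[max(0, i - half): i + half + 1]
        local = median(neighbourhood)
        if local == 0:
            # Zero local baseline: only a zero-depth bin agrees with it.
            consistent = depth == 0
        else:
            consistent = abs(depth - local) / local <= tolerance
        opacities.append(1.0 if consistent else faded)
    return opacities


# A spike (80) in a ~30x region fades; the sustained ~15x run stays opaque.
print(bin_opacities([30, 31, 29, 80, 30, 15, 16, 15, 14, 15]))
# -> [1.0, 1.0, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Using a local rather than a genome-wide baseline is a design choice: it highlights consistent tracks at any copy number instead of only those at copy number 2.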

Down the line, we could also optionally remove or highlight centromeres (or let the user toggle their removal), since these are regions where zero coverage is expected.
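A minimal sketch of the centromere masking step, assuming the centromere intervals for each chromosome are supplied by the caller (the coordinates in the usage line are made up, not real centromere positions) and that coverage bins are fixed-width and start at position 0. The helper `centromere_mask` is hypothetical; the resulting mask could drive hiding, highlighting, or a toggle in the chart.

```python
def centromere_mask(n_bins, bin_size, centromeres):
    """Return a list of booleans, True where a bin overlaps any centromere.

    Bins are half-open intervals [i * bin_size, (i + 1) * bin_size);
    centromeres are half-open (start, end) intervals on the same chromosome.
    """
    mask = [False] * n_bins
    for start, end in centromeres:
        # Index range of bins touched by [start, end), clipped to the chromosome.
        first = max(start // bin_size, 0)
        last = min((end - 1) // bin_size, n_bins - 1)
        for i in range(first, last + 1):
            mask[i] = True
    return mask


# Hypothetical example: 10 bins of 1 Mb and a made-up centromere at 3.5-5.2 Mb.
print(centromere_mask(10, 1_000_000, [(3_500_000, 5_200_000)]))
# -> bins 3, 4 and 5 are True, all others False
```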

anderspitman · 2018-08-20T17:10:38Z

@AlistairNWard could you provide more information on what a deletion or duplication is in this context? Is that deletion as in INDEL? For example, you identify chr7 as having a large deletion, but the coverage is above 0.

AlistairNWard · 2018-08-20T20:13:40Z

So, since the genome is diploid, we expect most of the coverage to come from reads drawn from 2 chromosomes, i.e. a copy number of 2. (INDEL is just a term covering INsertions and DELetions.) A deletion would be a case where one chromosome is missing a large segment of DNA with respect to the reference, while the second chromosome still has that sequence. In that region we would expect reads to come from only one chromosome, so we'd see half as much coverage (copy number 1). Duplications could lead to a copy number greater than 2.

So, on chr7 above (the scale is weird, since coverage shouldn't drop below zero), most of the chromosome has coverage drawn from 2 chromosomes, but there is one segment with half as much coverage. This is because one of the two copies of the chromosome in this sample has this segment of DNA deleted.
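The chr7 reasoning can be sketched as a ratio against a baseline. Assuming the median bin depth reflects copy number 2 (reasonable when most of the genome is diploid, as described above), each bin's copy number is roughly 2 × depth / median depth, so a heterozygous deletion at half the baseline depth comes out near 1. This sketch ignores effects such as GC bias, mapping quality, and sex chromosomes; `estimate_copy_number` is a hypothetical helper, not an existing bam.iobio function.

```python
from statistics import median


def estimate_copy_number(depths, ploidy=2):
    """Estimate copy number per bin as ploidy * depth / median(depth).

    Assumes a non-empty list of bin depths and that the median depth
    corresponds to the normal ploidy (2 for a diploid genome).
    """
    baseline = median(depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot establish a baseline")
    return [ploidy * d / baseline for d in depths]


# Bins at half the baseline depth (a heterozygous deletion) come out at ~1.
print(estimate_copy_number([40, 40, 20, 20, 40, 80]))
# -> [2.0, 2.0, 1.0, 1.0, 2.0, 4.0]
```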

So, basically, we can assume that most of the genome has copy number 2, and we want to highlight regions (of reasonable extent) where the coverage deviates significantly from copy number 2. This would include centromeres/telomeres and homozygous deletions (i.e. both of an individual's copies of the chromosome carry the same deletion) with CN=0, heterozygous deletions or other events giving rise to CN=1, and duplications with CN>2.
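A minimal sketch of this highlighting rule, under assumptions the thread leaves open: copy number has already been estimated per fixed-width bin, each bin is mapped to a coarse state using hypothetical cut-offs (below 0.5 is CN=0, below 1.5 is CN=1, below 2.5 is CN=2, otherwise CN>2), and "reasonable extent" is approximated by a hypothetical `min_bins` threshold on runs of consecutive bins in the same non-CN=2 state. `classify` and `abnormal_regions` are illustrative names; real data would likely need smoothing or a proper segmentation method rather than raw per-bin cut-offs.

```python
from itertools import groupby


def classify(cn):
    """Map an estimated copy number to a coarse state label."""
    if cn < 0.5:
        return "CN=0"
    if cn < 1.5:
        return "CN=1"
    if cn < 2.5:
        return "CN=2"
    return "CN>2"


def abnormal_regions(copy_numbers, bin_size, min_bins=3):
    """Return (start, end, state) for runs of bins that deviate from CN=2.

    A run is reported only if it spans at least `min_bins` consecutive
    bins in the same state. Coordinates are half-open, in bases.
    """
    regions = []
    index = 0
    for state, run in groupby(copy_numbers, key=classify):
        length = len(list(run))
        if state != "CN=2" and length >= min_bins:
            regions.append((index * bin_size, (index + length) * bin_size, state))
        index += length
    return regions


cns = [2.0, 2.1, 1.0, 0.9, 1.1, 2.0, 0.0, 2.0, 3.1, 2.9, 3.0, 3.2]
print(abnormal_regions(cns, bin_size=100_000))
# -> [(200000, 500000, 'CN=1'), (800000, 1200000, 'CN>2')]
```

The single zero-depth bin at index 6 is dropped because it is shorter than `min_bins`, which is how the sketch separates extended events from noise.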

Does this make sense? We can just have a call if not.

anderspitman · 2018-08-20T20:19:00Z

That's exactly what I had in mind. Much more clear now. Thanks!

AlistairNWard added the ready label Mar 15, 2018

AlistairNWard removed the ready label May 15, 2018

AlistairNWard mentioned this issue Aug 16, 2018

Default to 3 standard deviations #87 (Open)
