Highlight copy number 2 #84
Comments
@AlistairNWard could you provide more information on what a deletion or duplication is in this context? Is that deletion as in INDEL? For example, you identify chr7 as having a large deletion, but the coverage is above 0.
Since the genome is diploid, we expect the majority of coverage to represent reads drawn from 2 chromosomes, i.e. a copy number of 2. A deletion (INDEL is just a term covering INsertions and DELetions) would be a case where one chromosome has a large segment of DNA missing with respect to the reference, but the second chromosome has this sequence. In that region we would expect reads from only one chromosome, so we'd see half as much coverage (copy number 1). Duplications could lead to a copy number greater than 2. So, on chr7 above (the scale is weird, since this shouldn't drop below zero), the majority of the chromosome has coverage drawn from 2 chromosomes, but in one region there is a segment with half as much coverage. This is because one of the two copies of the chromosome in this sample has this segment of DNA deleted. So, basically, we can assume that the majority of the genome has copy number 2, and we want to highlight regions (of reasonable extent) where the coverage deviates significantly from copy number 2. This would include centromeres/telomeres and homozygous deletions (i.e. both of an individual's copies of the chromosome have the same deletion) with CN=0, heterozygous deletions or other events that give rise to CN=1, and duplications with CN>2. Does this make sense? We can just have a call if not.
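To make the coverage-to-copy-number reasoning concrete, here is a minimal Python sketch. It assumes coverage has already been averaged into fixed-size bins and that the median non-zero bin corresponds to copy number 2. The function name `estimate_copy_number` and the sample values are hypothetical and are not taken from bam.iobio.

```python
from statistics import median

def estimate_copy_number(bin_coverage, baseline_cn=2):
    """Scale per-bin coverage so that the typical (median) bin maps to baseline_cn.

    Assumption: most of the genome is diploid, so the median of the
    non-zero bins represents copy number 2.
    """
    typical = median(c for c in bin_coverage if c > 0)
    return [baseline_cn * c / typical for c in bin_coverage]

# Hypothetical mean coverage per bin: a heterozygous deletion (about half
# coverage), a zero-coverage region, and a duplication (about 1.5x coverage).
coverage = [30, 31, 29, 15, 14, 16, 30, 0, 0, 45]
cn = [round(x) for x in estimate_copy_number(coverage)]
print(cn)  # [2, 2, 2, 1, 1, 1, 2, 0, 0, 3]
```

Using the median rather than the mean keeps the baseline from being pulled around by the abnormal regions themselves. A sample with no coverage at all would need separate handling, since `median` raises an error on an empty sequence.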
That's exactly what I had in mind. Much clearer now. Thanks!
A significant goal of the read coverage chart is to identify (large) regions of abnormal copy number, e.g. large deletions, duplications etc. These can often be hidden in the noise. A potential solution is to change the opacity on the noisy data and highlight tracks of consistent coverage. Here is an example:
http://nv-dev-new.iobio.io/vue.bam.iobio/bamview?bam=https%3A%2F%2Fs3.amazonaws.com%2Fiobio%2Fsamples%2Fbam%2Fexample_data%2Ftest.bam&bai=&region=all
We should be able to quickly identify that:
a) chromosome 3 has a large chunk of missing data,
b) chromosome 6 is missing (issue #83)
c) chromosome 7 has a large deletion
Down the line, we can optionally remove or highlight centromeres (or add a toggle to hide them), as these are regions of expected zero coverage.
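One possible way to pick out the "tracks of consistent coverage" to highlight, sketched in Python under some assumptions: per-bin copy-number estimates are already available, and a region counts as worth highlighting only if it spans at least `min_bins` consecutive bins. The function name `abnormal_runs` and the `min_bins` threshold are hypothetical, not part of the existing chart code.

```python
from itertools import groupby

def abnormal_runs(copy_numbers, expected=2, min_bins=3):
    """Return (start, end, cn) for each run of at least min_bins consecutive
    bins whose rounded copy number differs from expected (end is exclusive)."""
    runs = []
    start = 0
    for cn, group in groupby(round(x) for x in copy_numbers):
        length = sum(1 for _ in group)
        if cn != expected and length >= min_bins:
            runs.append((start, start + length, cn))
        start += length
    return runs

# Hypothetical noisy estimates: one isolated dip (ignored as noise), a
# heterozygous deletion (CN=1), and a zero-coverage stretch (CN=0).
noisy = [2.1, 1.9, 1.2, 2.0, 1.1, 0.9, 1.0, 1.1, 2.0, 2.1, 0.0, 0.1, 0.0, 2.0]
runs = abnormal_runs(noisy)
print(runs)  # [(4, 8, 1), (10, 13, 0)]
```

Bins outside the returned runs could then be drawn at reduced opacity. CN=0 runs that overlap known centromere positions could be hidden or flagged by the toggle described above.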