Improved rendering of modifications #4647

Merged
merged 18 commits into from
Nov 10, 2024

Conversation

cmdcolin
Collaborator

@cmdcolin cmdcolin commented Nov 8, 2024

Rendering before this PR

image

the current main branch does not take the modification probability into account when rendering, but doing so is important for visualizing the data properly. the data can report multiple different modifications at the same position, and it is best to render only the most probable one.

SAMtags.pdf shows an example where both 5mC and 5hmC are listed for each position, and the user should choose the modification that has the highest probability (or even choose "no modification" if neither has a high probability). specifically, the probabilities of multiple modifications at the same position can't sum to more than 1.0, and i believe that, chemically, only one modification is possible at a given base; otherwise it would get its own chemical code. therefore, double counting modifications at a single position is misleading, but that's what our UI does on main
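
To make the "render only the most probable one" idea concrete, here is a minimal sketch. It is not the code from this PR: the `ModCall` shape and the `mostProbableCall` name are hypothetical, and the rule for falling back to "no modification" (the leftover probability beats every listed call) is an assumption chosen for illustration.

```typescript
// Hypothetical shape for one modification call at a single read position.
interface ModCall {
  type: string // e.g. 'm' for 5mC, 'h' for 5hmC
  prob: number // absolute probability in [0, 1]
}

// Returns the single most probable call, or undefined when "no modification"
// is at least as likely as the best listed modification.
function mostProbableCall(calls: ModCall[]): ModCall | undefined {
  let best: ModCall | undefined
  let sum = 0
  for (const call of calls) {
    sum += call.prob
    if (best === undefined || call.prob > best.prob) {
      best = call
    }
  }
  // The remainder is the probability that none of the listed modifications
  // is present (assumed rule: it wins ties against the best call).
  const unmodified = Math.max(0, 1 - sum)
  return best !== undefined && best.prob > unmodified ? best : undefined
}
```

With calls of 0.8 for 5mC and 0.1 for 5hmC this picks 5mC; with 0.2 and 0.1 the remainder of 0.7 wins and nothing would be drawn.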

Rendering after this PR

image

IGV rendering

i modified the rendering to more closely follow IGV. it is a bit of a copycat, but i believe IGV does a lot of things right.

image

text from SAMtags about the probabilities in the ML tag

https://samtools.github.io/hts-specs/SAMtags.pdf

"Note where several possible modifications are presented at the same site, the ML values represent the absolute probabilities of the modification call being correct and not the relative likelihood between the alternatives. These probabilities should not sum to above 1.0 (≈ 256 in integer encoding, allowing for some minor rounding errors), but may sum to a lower total with the remainder representing the probability that none of the listed modification types are present. In the example used above, the 6th C has 80% chance of being 5mC, 10% chance of being 5hmC and 10% chance of being an unmodified C"
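
The arithmetic in that passage can be sketched as follows. The SAMtags spec maps integer N to the probability range N/256 to (N+1)/256; using the midpoint of that range is a choice made here for illustration, and the function names `mlToProb` and `unmodifiedProb` are hypothetical, not from this PR.

```typescript
// Convert one ML byte (0..255) to a probability, using the midpoint of the
// bin [N/256, (N+1)/256) that the integer represents (assumed convention).
function mlToProb(ml: number): number {
  return (ml + 0.5) / 256
}

// Probability that none of the listed modifications is present at a site,
// given the ML bytes of every modification listed for that site.
function unmodifiedProb(mlValues: number[]): number {
  const sum = mlValues.reduce((acc, ml) => acc + mlToProb(ml), 0)
  // Clamp at 0 to tolerate the minor rounding errors the spec mentions.
  return Math.max(0, 1 - sum)
}
```

For the spec's example, ML bytes of 204 (about 80% 5mC) and 25 (about 10% 5hmC) leave a remainder of roughly 0.10 for an unmodified C.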

coverage calculation

in modification mode, the snpcoverage now draws the proportion of "modifiable" bases at a position that are modified, rather than the raw number of modifications at that position. this aligns better with user expectations and is what igv does. for example, at a CpG, only half the reads (those on the forward strand) will have a C at the C position, and the other half (those on the reverse strand) will have a G there, but only the C can be methylated. if all the C's there are methylated, we draw the position as "fully modified", maxing out the y-axis of the coverage track
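
A minimal sketch of that proportion, assuming a simplified per-read summary at one position. The `PileupEntry` shape and `modifiedFraction` name are hypothetical and do not reflect the actual snpcoverage data structures.

```typescript
// Hypothetical per-read summary at one reference position.
interface PileupEntry {
  modifiable: boolean // read has the base that can carry the modification (e.g. C)
  modified: boolean // that base was called as modified
}

// Fraction of modifiable bases that are modified: 1.0 means "fully modified",
// even if only half of the reads covering the position are modifiable.
function modifiedFraction(entries: PileupEntry[]): number {
  let modifiable = 0
  let modified = 0
  for (const entry of entries) {
    if (entry.modifiable) {
      modifiable++
      if (entry.modified) {
        modified++
      }
    }
  }
  // Avoid dividing by zero when no read is modifiable at this position.
  return modifiable === 0 ? 0 : modified / modifiable
}
```

With four reads at a CpG, two forward-strand C's that are both methylated and two reverse-strand G's, this gives 1.0 rather than 0.5.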

modification colors

I adjusted the color scheme of modifications to match IGV

retained the "methylation" mode

i was hoping to get rid of this, but in some cases the BAM data files simply do not indicate unmethylated positions

to my knowledge, igv is not able to draw the unmethylated CpGs in this case

image

image

@cmdcolin
Collaborator Author

cmdcolin commented Nov 8, 2024

this is a follow-up to the closed PR #4642

@cmdcolin cmdcolin added the bug Something isn't working label Nov 8, 2024
@cmdcolin cmdcolin force-pushed the lowprob branch 3 times, most recently from e7c6a4f to 78f72ab Compare November 9, 2024 09:20
@cmdcolin cmdcolin force-pushed the lowprob branch 2 times, most recently from 809ffc7 to 36555c2 Compare November 9, 2024 20:23
@cmdcolin cmdcolin force-pushed the lowprob branch 6 times, most recently from 83b6b5c to b7d7413 Compare November 10, 2024 03:09
@cmdcolin cmdcolin force-pushed the lowprob branch 2 times, most recently from d9d5c0f to 6861e03 Compare November 10, 2024 16:11
@cmdcolin cmdcolin merged commit 14e990c into main Nov 10, 2024
4 checks passed
@cmdcolin cmdcolin deleted the lowprob branch November 10, 2024 17:23