-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved rendering of modifications #4647
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
this is a follow up to closed PR here #4642 |
cmdcolin
force-pushed
the
lowprob
branch
3 times, most recently
from
November 9, 2024 09:20
e7c6a4f
to
78f72ab
Compare
cmdcolin
force-pushed
the
lowprob
branch
2 times, most recently
from
November 9, 2024 20:23
809ffc7
to
36555c2
Compare
cmdcolin
force-pushed
the
lowprob
branch
6 times, most recently
from
November 10, 2024 03:09
83b6b5c
to
b7d7413
Compare
cmdcolin
force-pushed
the
lowprob
branch
2 times, most recently
from
November 10, 2024 16:11
d9d5c0f
to
6861e03
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Rendering before this PR
the current main branch does not really consider the modification probability deeply, but it is actually important to do so to properly visualize it. this is because the data can report multiple different modifications at the same position, and it is best to only render the most probable one.
SAMtags.pdf shows an example where both 5mc and 5hmc are listed for each position, and the user should choose the modificaiton that has the highest probability (or even choose "no modification" if neither is a high probability. specifically, multiple modifications at the same position can't sum up to greater than 1.0 probability, and i believe, chemically, only one modification is even possible. otherwise it would get a chemical code. therefore, double counting modifications at a single position is misleading, but that's what we have in our UI on main
Rendering after this PR
IGV rendering
i modified the rendering to more closely follow IGV. it is a bit copy cat, but IGV does a lot of things right i believe.
text from SAMtags about the probabilities in the ML tag
https://samtools.github.io/hts-specs/SAMtags.pdf
"Note where several possible modifica-
tions are presented at the same site, the ML values represent the absolute probabilities of the modifi-
cation call being correct and not the relative likelihood between the alternatives. These probabilities
should not sum to above 1.0 (≈ 256 in integer encoding, allowing for some minor rounding errors),
but may sum to a lower total with the remainder representing the probability that none of the listed
modification types are present. In the example used above, the 6th C has 80% chance of being 5mC,
10% chance of being 5hmC and 10% chance of being an unmodified C"
coverage calculation
Now in coverage: now in modification mode the snpcoverage does not draw the raw number of modifications at the position, but the proportion of "modifiable" bases at that position. this aligns better with user expectations and is what igv does. therefore, for a CpG, on the forward strand, only half the reads will be a C at the CpG position, and the other half will be G on the reverse strand, but only the C can be methylated. but if all the C's there are methylated, then basically we can draw that as that this position is "fully modified", therefore maxing out the y-axis of the coverage track
modification colors
I adjusted the color scheme of modifications to match IGV
retained the "methylation" mode
i was hoping to maybe get rid of this, but the data files for BAM simply do not indicate unmethylated positions in some cases
igv is not to my knowledge, in this case, is not able to draw the unmethylated cpg's