Scaffold longer than reference genome due to NNNNs #172
Comments
Dear Vincent, I have seen the same behaviour while working with the Tribolium castaneum genome. When I scaffold my draft genome against the published reference, two very large contigs are joined with a gap of ~1 Mbp between them. I believe this behaviour is expected. However, one way to validate it is to draw a one-to-one dot plot between the reference and the query and check whether the introduced gaps make sense (you can use SibeliaZ or nucmer for this). I hope this helps. PS: I am not the author of this tool; I just used it for a project.
Dear @shivanshss, Thanks for the information. I will draw the dot plot.
Dear @vappiah, A dot plot between your query and the reference, made before running RagTag, would tell you whether there is a gap in your query that could have been filled with Ns at the time of scaffolding. A dot plot between your RagTag output and your original reference will tell you whether the gap positions look odd in any way. You may need to do some breakpoint analysis with the original reads used for the assembly to understand the gaps further. Additionally, I would draw a synteny plot between your original query and the reference (this is similar to the dot plot but slightly more informative). This is a sanity check to make sure that nothing unexpected is happening. If everything looks as expected, you don't have to worry about the Ns introduced at the time of scaffolding. I would also wait for the author to comment because, as I said earlier, I am not the author of this tool and they would know better. Hope it helps. Sincerely,
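Before drawing the plots, it may help to confirm that the extra length really is gap sequence. The snippet below is only a minimal sketch, not part of RagTag: it reads FASTA records, finds every run of Ns, and reports the scaffold length with and without them. The helper names `read_fasta` and `gap_summary` and the toy sequences are made up for illustration.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_summary(seq: str) -> Dict[str, object]:
    """Locate runs of N/n and compare total length with ungapped length."""
    # Each gap is stored as (0-based start, run length).
    gaps = [(m.start(), m.end() - m.start()) for m in re.finditer(r"[Nn]+", seq)]
    gap_bases = sum(length for _, length in gaps)
    return {
        "total_length": len(seq),
        "gap_bases": gap_bases,
        "ungapped_length": len(seq) - gap_bases,
        "gaps": gaps,
    }


# Toy input standing in for a scaffold FASTA (hypothetical sequences).
example = [">scaffold_1", "ACGTNNNNNN", "NNNNACGTAC", ">scaffold_2", "ACGT"]
summaries = {name: gap_summary(seq) for name, seq in read_fasta(example)}

for name, info in summaries.items():
    print(name, info["total_length"], info["gap_bases"], info["ungapped_length"])
```

For a real file, pass an open file handle instead of the list, e.g. `read_fasta(open("scaffold.fasta"))`, where `scaffold.fasta` is a placeholder name. If the ungapped length of each scaffold is close to the length of the input contigs that went into it, the difference from the reference is accounted for by the inserted Ns.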
Hi @malonge
I have used RagTag on several datasets, and every time the final sequence comes out longer than the reference sequence. I found that this is due to the NNNNs that RagTag introduces. Is this behaviour expected?
Vincent