Poisson Distribution #459

Open
Karzinisierung opened this issue Jan 14, 2025 · 0 comments
@Karzinisierung
Collaborator

Any DNA sequence can be read in 6 possible reading frames (3 on each strand). Running the whole algorithm on every reading frame individually would be computationally wasteful.
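For reference, a minimal sketch of how the six frames could be enumerated before any filtering. The function names `reverse_complement` and `six_frames` are hypothetical, and the input is assumed to be an uppercase A/C/G/T string.

```python
# Complement table for the reverse strand (assumes an uppercase A/C/G/T alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict:
    """Split seq into codons for each of the six reading frames.

    Keys are '+1', '+2', '+3' for the forward strand and '-1', '-2', '-3'
    for the reverse complement. Trailing partial codons are dropped.
    """
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames
```

Each frame comes back as a plain list of codons, so a cheap statistical filter can be applied per frame before anything more expensive runs.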

Apply a Poisson distribution to each of the 6 reading frames, with a mean expected STOP codon frequency of 3/64 per codon (3 of the 64 codons are stops). Genes are usually longer than 100 base pairs. Any long stretch with few or no stop codons is likely to encode a gene.
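A rough sketch of the Poisson step, assuming stops occur independently at 3/64 per codon. The helper names are hypothetical. The Poisson form approximates the exact binomial value (61/64)^n for a run of n stop-free codons.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}
STOP_RATE = 3 / 64  # per-codon stop probability proposed in this issue


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing exactly k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def stop_free_run_pvalue(n_codons: int, rate: float = STOP_RATE) -> float:
    """Probability of seeing zero stops in n_codons codons by chance."""
    return poisson_pmf(0, n_codons * rate)


def longest_stop_free_run(codons) -> int:
    """Length (in codons) of the longest stretch without a stop codon."""
    best = current = 0
    for codon in codons:
        current = 0 if codon in STOP_CODONS else current + 1
        best = max(best, current)
    return best
```

A frame would then be scored by passing `longest_stop_free_run(codons)` to `stop_free_run_pvalue`; smaller values mean the open stretch is less likely to be random.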

Stop codon frequency also correlates with GC content (the stop codons TAA, TAG and TGA are AT-rich). This may be used in conjunction with other code by multiplying the probabilities of two independent events.
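One possible way to account for GC content is to adjust the expected stop rate itself. The sketch below assumes a simple model that is not part of this issue: bases are independent, with A and T equally frequent and G and C equally frequent. The function name `stop_rate_from_gc` is hypothetical.

```python
def stop_rate_from_gc(gc_fraction: float) -> float:
    """Expected per-codon stop probability for a given GC fraction.

    Assumed model: independent bases, P(G) = P(C) = gc/2, P(A) = P(T) = (1 - gc)/2.
    Stop codons are TAA, TAG and TGA.
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    p_at = (1.0 - gc_fraction) / 2.0  # probability of A (same for T)
    p_gc = gc_fraction / 2.0          # probability of G (same for C)
    p_taa = p_at * p_at * p_at
    p_tag = p_at * p_at * p_gc
    p_tga = p_at * p_gc * p_at
    return p_taa + p_tag + p_tga
```

At 50% GC this reduces to the 3/64 figure, and the rate falls as GC content rises, so GC-rich sequence produces long stop-free stretches by chance more often.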

This requires a suitably strict p-value threshold. 0.05 will not work; something between 5×10^-6 and 5×10^-8 may be more appropriate, but it also risks eliminating good data.
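To get a feel for what a threshold implies, it can be converted into a minimum stop-free run length. This is a sketch assuming the zero-stop Poisson probability exp(-n × rate) with the 3/64 rate; `min_stop_free_codons` is a hypothetical helper.

```python
import math


def min_stop_free_codons(p_threshold: float, rate: float = 3 / 64) -> int:
    """Smallest run length n (in codons) with exp(-n * rate) <= p_threshold."""
    if not 0.0 < p_threshold < 1.0:
        raise ValueError("p_threshold must be strictly between 0 and 1")
    return math.ceil(-math.log(p_threshold) / rate)


for p in (0.05, 5e-6, 5e-8):
    n = min_stop_free_codons(p)
    print(f"p <= {p:g}: at least {n} codons ({3 * n} bp) without a stop")
```

Under these assumptions, 0.05 corresponds to 64 codons, 5×10^-6 to 261 codons and 5×10^-8 to 359 codons. Genes shorter than the chosen cutoff would be discarded, which is the trade-off noted above.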

Alternatively, one may take the candidate DNA sequence and submit it directly to BLAST to see what comes up. Overall, this approach should improve efficiency and resource use.

VerisimilitudeX self-assigned this Jan 24, 2025

2 participants