Number of reads per contig? #14

Open
Zachary-Muscavitch opened this issue Sep 2, 2024 · 1 comment

Comments

@Zachary-Muscavitch

I'm wondering if it would be possible to indicate the number of reads assembled into each contig. I am doing target capture on metagenomic extractions that contain multiple organisms, some of which are congeners. Thus, when extracting and aligning sequences, the putative paralogs aren't necessarily paralogs and may instead be homologs from different organisms.

Often, these homologs are present in vastly different concentrations in the source material, and this is reflected in the number of reads for each homolog. It would be nice if there were an option in the paralog filtering step to filter based on the number of reads per contig, as I usually want the contig with the greatest read depth and not the one that is most similar to the reference.

Maybe this is already somewhere in the output files.

@edgardomortiz
Owner

Dear @Zachary-Muscavitch ,

This kind of filtering is exactly what we are working on now. We will use Salmon to determine the real coverage of each contig, and we will also give the option to recover only targets that were assembled in a single contig.

In the meantime, you have all this info in the stats.tsv inside the extraction folder: the contig names contain the MEGAHIT estimate of coverage (cov_x.xxx).

I know right now it is a lot of parsing on your side, but I hope it helps until I finish the new filters.
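For anyone who needs a stopgap, the parsing could look roughly like the sketch below. It is not part of the pipeline: it pulls the `cov_x.xxx` value out of each contig name and keeps the highest-coverage contig per locus. The column names `locus` and `ctg_names` are assumptions for illustration, as is the idea that each row holds a single contig name, so check them against the header of your own stats.tsv before using it.

```python
import csv
import io
import re

# Matches the coverage estimate embedded in a contig name, e.g. "cov_12.500".
COV_RE = re.compile(r"cov_(\d+(?:\.\d+)?)")


def contig_coverage(contig_name):
    """Return the coverage embedded in a contig name, or None if absent."""
    match = COV_RE.search(contig_name)
    return float(match.group(1)) if match else None


def best_hit_per_locus(tsv_text, locus_col="locus", contig_col="ctg_names"):
    """Keep, for each locus, the contig with the highest embedded coverage.

    locus_col and contig_col are hypothetical column names; adjust them to
    match the real header. Assumes one contig name per row.
    """
    best = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        cov = contig_coverage(row[contig_col])
        if cov is None:
            continue  # no coverage tag in this contig name
        locus = row[locus_col]
        if locus not in best or cov > best[locus][1]:
            best[locus] = (row[contig_col], cov)
    return best
```

To run it on a real file, read the file contents first (for example `best_hit_per_locus(open("stats.tsv").read())`, with the path pointing at the file in your extraction folder). Note that this ranks contigs by the assembler's coverage estimate, not by an actual read count.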

Best,

Edgardo
