Can a custom reference dataset be a probe set? #20

Open
vincianem opened this issue Jan 15, 2025 · 6 comments
Comments

@vincianem

vincianem commented Jan 15, 2025

Hej Hej,

I have just started working with capture data, and I would like to use Captus to process it.

I have two datasets: one with flowering plants (Teucrium) and one with sea animals (Octocorallia). For the flowering plants it is simple, as Captus comes bundled with Mega353, but I wonder how to process the Octocorallia data. Can a probe set be used as a target file if I properly format the sequence names?

I am new to the Captus pipeline and to this type of data, so please correct me if I've gotten something wrong.

Best wishes,
Vinciane

@edgardomortiz
Owner

Hi @vincianem

You can provide any lineage set from the BUSCO database (https://busco-data.ezlab.org/v5/data/lineages/): just download the tar.gz file and provide its path to Captus in the extraction step with -n.
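
As a rough sketch of the step above, the following Python snippet only assembles the command line; it does not download anything or run Captus. The `-n` option comes from the reply above, while the `captus extract` subcommand name, the `-a` option for the assemblies directory, and both example paths are assumptions to check against the help output of your installed Captus version.

```python
import shlex


def build_extract_command(assemblies_dir, busco_lineage_tar):
    """Return the argument list for an extraction run.

    Assumptions (not stated in this thread): the subcommand is
    `captus extract` and `-a` points at the assemblies directory.
    `-n` receives the path to the downloaded BUSCO lineage tar.gz.
    """
    return [
        "captus", "extract",
        "-a", str(assemblies_dir),
        "-n", str(busco_lineage_tar),
    ]


if __name__ == "__main__":
    # Both paths are hypothetical placeholders; use your own.
    cmd = build_extract_command("02_assemblies", "downloads/lineage.tar.gz")
    # Print a shell-safe string to copy into a terminal or job script.
    print(shlex.join(cmd))
```

Printing the command instead of executing it makes it easy to review before launching a long run.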

Now, if you have a custom probe set, you must provide the sequences (the full locus sequences, e.g. CDS) from which the probes (120 bp segments) were derived.

I hope this helps. Do not hesitate to ask me if something is not clear.
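
To make the distinction between probes and targets concrete, here is a toy Python sketch of how 120 bp probes could be cut from one full locus sequence. The function name, the 60 bp step, and the example sequence are hypothetical; real probe design involves more than simple tiling. The point is only that the target file should hold the full locus sequence, not the short pieces.

```python
def tile_probes(locus_seq, probe_len=120, step=60):
    """Cut overlapping windows of `probe_len` bases from a locus sequence.

    `step` (hypothetical value) is the offset between consecutive probes.
    A locus shorter than one probe is returned whole.
    """
    if len(locus_seq) < probe_len:
        return [locus_seq]
    last_start = len(locus_seq) - probe_len
    return [locus_seq[i:i + probe_len] for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    # A made-up 300 bp "locus"; this is what belongs in a target file.
    locus = "ACGT" * 75
    probes = tile_probes(locus)
    # The probes are only fragments of the locus.
    print(len(locus), len(probes), {len(p) for p in probes})
```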

Edgardo

@vincianem
Author

vincianem commented Jan 16, 2025

Hi @edgardomortiz

Thank you for your answer!

We used the octocoral v.2 probe set, but the target file was not made available with it. How would one go about creating a robust target file from the probe set? For example, how did you create SeedPlantsPTD?

I also have a question regarding the ploidy level. I have at least 2n and 4n in my dataset. How is variation in ploidy taken into account in Captus?

Best wishes,
Vinciane

@edgardomortiz
Owner

Regarding the octocoral v2 probe set, I am not familiar with it, but perhaps you can contact the authors, or maybe the file is available as supplementary material with the paper where it was published. About ploidy: Captus can recover any number of divergent copies of a single locus, as long as they are different enough to be assembled as separate contigs.

In the case of the plastome proteins, I downloaded all the plastome proteins available in GenBank, then clustered them and manually curated the clusters.

Edgardo

@vincianem
Author

vincianem commented Feb 6, 2025

Thank you for your answers @edgardomortiz.

There was no target file provided with the octocoral v.2 paper. We have contacted the authors but have not heard back from them yet. So I used genome assemblies to produce my own target file with Phyluce.

I wonder, should loci for which I have multiple sequences be clustered in the final FASTA file? I formatted the sequence names according to the Captus manual.

That's great news about the ploidy!

Vinciane

@edgardomortiz
Owner

Hi @vincianem,

Sorry for the delay, the last few weeks have been extremely busy. You can have multiple sequences per locus in your target file (if I understood your question correctly; if not, please don't hesitate to ask again!).
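
For a quick check of such a file, a small Python sketch like the one below can list which sequences belong to which locus. It assumes headers of the form `>sequenceName-locusName`, where the text after the last hyphen is the locus name; that convention and the example names are assumptions here, so please confirm the exact format in the Captus manual.

```python
from collections import defaultdict


def loci_from_headers(fasta_text):
    """Group sequence names by locus, reading only the FASTA header lines.

    Assumed header format: '>sequenceName-locusName' (split on the last '-').
    Raises ValueError for headers that do not fit this pattern.
    """
    loci = defaultdict(list)
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # skip sequence lines
        parts = line[1:].split()
        header = parts[0] if parts else ""
        name, sep, locus = header.rpartition("-")
        if not sep or not name or not locus:
            raise ValueError(f"Header does not match 'name-locus': {line!r}")
        loci[locus].append(name)
    return dict(loci)


if __name__ == "__main__":
    # Hypothetical target file with two sequences for one locus.
    example = ">genomeA-uce1\nACGT\n>genomeB-uce1\nACGA\n>genomeA-uce2\nTTGA\n"
    for locus, names in loci_from_headers(example).items():
        print(locus, len(names), names)
```

Loci with several entries are simply the ones where the file offers more than one reference sequence.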

By the way, Captus design can help with designing probes and targets. If you want to try it, I could help.

Edgardo

@vincianem
Author

vincianem commented Feb 17, 2025

Hi @edgardomortiz,

No worries, Captus worked fine with the first target file I designed with Phyluce. That said, I have two more target files to create using existing sets of probes, genomes, and transcriptome assemblies. I would gladly give Captus design a try. Where can I find information on how Captus design works?

Best wishes, Vinciane
