captus paralog filter - references added in the wrong direction #12
Comments
Dear Ed,

Sorry for the really late reply, I was in the chaos of moving countries. I see that the sequences in your reference are in different reading frames, so Captus might be having trouble translating them consistently.

Captus translates the references in all six reading frames, then selects the reading frame that produces the fewest internal stop codons. If there is a tie between two reading frames, it prefers a positive reading frame. So maybe these are not CDS?

If you don't care about obtaining the amino acid format from the alignment step, and you are sure all sequences are in the same direction, you could provide the reference to Captus as miscellaneous DNA (-d AT1G03750.references.fasta).

If the amino acid output is necessary, then I would suggest verifying that these are translatable (preferably in reading frame 1), or at least translatable consistently for all of them.

Let me know if this helps!

Edgardo
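The frame-selection rule described in this reply can be sketched in a few lines of Python. This is only an illustration of the rule as stated above (fewest internal stop codons, ties broken in favour of a positive frame), not Captus's actual code. The function names, the use of the standard genetic code, the choice to ignore the final codon when counting internal stops, and the secondary tie-break on the lowest offset are all assumptions made for the example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, offset):
    """Translate seq starting at offset (0, 1 or 2); unknown codons become X."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def internal_stops(protein):
    """Count stop codons, ignoring the last position (a possible terminal stop)."""
    return protein[:-1].count("*")


def best_frame(seq):
    """Return (frame, internal_stops) for the frame with the fewest internal stops.

    Frames are numbered 1, 2, 3 (forward) and -1, -2, -3 (reverse complement).
    Ties prefer a positive frame, then the lowest offset (the latter is an
    assumption of this sketch).
    """
    candidates = []
    for strand, strand_seq in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            stops = internal_stops(translate(strand_seq, offset))
            candidates.append((stops, strand < 0, offset, strand * (offset + 1)))
    stops, _, _, frame = min(candidates)
    return frame, stops


# A complete toy CDS is resolved correctly in both orientations.
cds = "ATGCTAATCACTGATCATGGCTAA"
print(best_frame(cds))                      # (1, 0)
print(best_frame(reverse_complement(cds)))  # (-1, 0)

# A short fragment whose reverse complement has no stops in frame 1: under this
# rule the reversed sequence is accepted as given, in a positive frame.
fragment = "ATGGCTTGGAAACGTTAA"
print(best_frame(reverse_complement(fragment)))  # (1, 0)
```

The last case shows why short or partial sequences are fragile under a rule like this: when several frames are free of stop codons, the stop count carries no information about the true orientation, and the tie-break decides.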
Dear Edgardo, thanks for your reply – much appreciated. These are the sequences used for probe design, in some cases only partial exons, so that largely explains the issue. Hope your move went well, and hoping that you continue to develop Captus – we're finding that it plugs a lot of gaps that are issues in other pipelines.
Ed
From: Edgardo M. Ortiz ***@***.***>
Date: Friday, 2 August 2024 at 3:11 pm
To: edgardomortiz/Captus ***@***.***>
Cc: Ed Biffin ***@***.***>, Author ***@***.***>
Subject: Re: [edgardomortiz/Captus] captus paralog filter - references added in the wrong direction (Issue #12)
Dear Edgardo, I've noticed that when reference sequences are added to alignments, prior to informed paralog filtering, in some cases they are added in the reverse direction relative to the extracted sequences in the alignment. I'm using a custom reference file that comprises the sequences that were used for probe design, mostly sourced from 1KP and Phytozome. The references were generated by clustering with CD-HIT (longest sequence per cluster at a specified identity). I've attached an example alignment and also the references for that gene. I'm using v1.01. Any advice would be greatly appreciated.
AT1G03750.fna.txt
AT1G03750.references.txt
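One way to confirm which references in a file like this are reversed relative to the extracted sequences is to compare shared k-mers in both orientations. The sketch below is a hypothetical stand-alone check, not part of Captus; the function names, the default k of 11, and the simple majority rule are assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of k-mers in seq, after removing alignment gaps."""
    seq = seq.upper().replace("-", "")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orientation(reference, target, k=11):
    """Guess whether reference matches target as given or reverse-complemented."""
    target_kmers = kmers(target, k)
    forward = len(kmers(reference, k) & target_kmers)
    reverse = len(kmers(reverse_complement(reference), k) & target_kmers)
    if forward == reverse:
        return "undetermined"
    return "forward" if forward > reverse else "reverse"


# Toy data: the reference is a fragment of the target, once as is, once reversed.
target = "ATGCTA--ATCACTGATCATGGCTAA"
fragment = "CTAATCACTGATCATGG"
print(orientation(fragment, target, k=8))                      # forward
print(orientation(reverse_complement(fragment), target, k=8))  # reverse
```

Exact k-mer matching only works when the reference and the extracted sequence are similar enough to share k-mers, so divergent references may come back as "undetermined" and would need a proper pairwise alignment instead.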