Too many outliers #81
Comments
How many variants do you have as input?
@privefl Thanks for your quick response. Out of 7,856,218 SNPs, I identified 247,899 as outliers using a q-value threshold of < 0.01, and 37,743 using the Bonferroni-adjusted p-value threshold. However, with the Bonferroni method there was little overlap with the RDA and LFMM results. I reviewed the issue and confirmed that I have enough variants for pcadapt to calculate the Mahalanobis distance. Let me know if you need more information. Best regards,
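To put the counts above in proportion, here is a minimal arithmetic sketch. It uses only the numbers reported in this comment; the variable names are illustrative, and the "expected false discoveries" figure assumes the p-values are well calibrated, which is exactly what is in question in this thread.

```python
# Counts reported in the comment above.
n_snps = 7_856_218   # total SNPs given to pcadapt
n_q = 247_899        # outliers at q-value < 0.01
n_bonf = 37_743      # outliers after Bonferroni correction

# Share of all tested SNPs flagged by each criterion.
frac_q = n_q / n_snps
frac_bonf = n_bonf / n_snps

# A q-value cutoff of 0.01 targets an expected false discovery rate of 1%
# among the reported outliers (assumption: calibrated p-values).
expected_false_q = 0.01 * n_q

print(f"q < 0.01:   {frac_q:.2%} of SNPs, "
      f"~{expected_false_q:.0f} expected false discoveries")
print(f"Bonferroni: {frac_bonf:.2%} of SNPs")
```

Roughly 3% of all SNPs pass the q-value cutoff and about 0.5% pass the Bonferroni cutoff, which is why the thread goes on to ask about LD and sample size.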
FYI, I divided the chromosome into 200 parts and merged them as input for pcadapt, which may have caused some misordering of SNPs. Should I sort the SNPs according to their respective chromosomes?
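Whether the misordering matters here is not settled in this thread. As a general point, window-based LD clumping assumes that neighbouring rows are physical neighbours, so restoring genomic order before the analysis is a low-cost precaution. A minimal sketch of such a sort, using hypothetical records of the form (chromosome, position, SNP id):

```python
# Hypothetical records: (chromosome, position, SNP id), as they might look
# after concatenating chunks in an arbitrary order.
snps = [
    (2, 150, "snp_d"),
    (1, 900, "snp_b"),
    (1, 120, "snp_a"),
    (2, 40, "snp_c"),
]

# Sort by chromosome first, then by position within the chromosome.
sorted_snps = sorted(snps, key=lambda s: (s[0], s[1]))

# Sanity check: every record is at or after its predecessor in genomic order.
is_sorted = all(a[:2] <= b[:2] for a, b in zip(sorted_snps, sorted_snps[1:]))

for chrom, pos, snp_id in sorted_snps:
    print(chrom, pos, snp_id)
```

The same permutation would have to be applied to the genotype data itself, not only to the SNP annotation, so that rows and labels stay aligned.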
OK, the number of variants is not the problem then. The next obvious candidate is LD. Are you capturing any LD in the PCA?
Any update on this?
@privefl |
Then, maybe the results are fine. You might want to increase the |
I have 79 individuals. Alright, I will try increasing the size in LD.clumping and update you with the results. Thanks for your suggestion.
That's a very small sample.
They are clearly separated into 3 groups. What does the histogram of p-values look like?
What's the status on this issue? |
Hello, everyone,
I used pcadapt to identify environment-related outliers, but I obtained an excessive number of them. Is there anything I overlooked?
Best regards,
Sandy
PS: The code I used:
Here is the R session info:
Here is the plot output. The blue and light blue points are the outliers, while the black and grey points are the non-outliers. The grey dashed line marks the q-value threshold of 0.01, and the red line marks the Bonferroni-adjusted p-value threshold of 0.05. Additionally, the purple points are outliers that were also identified using other software, such as RDA or LFMM. The y-axis shows the unadjusted p-values on a -log10 scale.
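For reference, the height of the Bonferroni line on the -log10 axis can be worked out from the numbers in this thread. This is a minimal sketch that assumes the correction was applied over all 7,856,218 SNPs; the variable names are illustrative.

```python
import math

n_snps = 7_856_218   # number of tests, from the thread
alpha = 0.05         # level used for the red (Bonferroni) line

# Bonferroni: an adjusted p-value below alpha is equivalent to
# a raw p-value below alpha / n_tests.
raw_cutoff = alpha / n_snps

# The plot uses -log10(p), so the line sits at this height.
line_height = -math.log10(raw_cutoff)

print(f"raw p-value cutoff: {raw_cutoff:.3g}")
print(f"height on the -log10(p) axis: {line_height:.2f}")
```

The q-value line has no such closed form, because the raw p-value that corresponds to q = 0.01 depends on the whole distribution of p-values.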
I have also checked these plots, and everything seems to be okay. The plots include a histogram of p-values, a QQ plot, a histogram of Dj statistic, and loading scores plots.
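One way to make the histogram check less subjective is to compare the share of p-values in the lowest bin with the flat expectation: well-calibrated null p-values are roughly uniform on [0, 1], so each of 20 bins should hold about 5% of them. The sketch below uses simulated data and a hypothetical helper, `pvalue_bin_fractions`; it is not part of pcadapt.

```python
import random


def pvalue_bin_fractions(pvalues, n_bins=20):
    """Return the fraction of p-values in each of n_bins equal-width bins on [0, 1]."""
    if not pvalues:
        raise ValueError("pvalues must not be empty")
    counts = [0] * n_bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # p == 1.0 would index one past the end, so clamp it to the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(pvalues) for c in counts]


# Simulated example (not the data from this issue): 9,000 null p-values
# plus 1,000 very small p-values standing in for true signals.
rng = random.Random(0)
null_p = [rng.random() for _ in range(9_000)]
signal_p = [rng.random() * 0.01 for _ in range(1_000)]

fractions = pvalue_bin_fractions(null_p + signal_p)
print(f"lowest bin: {fractions[0]:.3f} (flat expectation: {1 / 20:.3f})")
```

A flat histogram with a single spike near zero is the usual healthy pattern; a histogram that slopes over its whole range suggests that the test statistic is miscalibrated rather than that there are many true outliers.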