snp_asGeneticPos() for hg38 #200
Comments
It seems these files are not available for hg38 (see e.g. joepickrell/1000-genomes-genetic-maps#2). If not on Windows, you can use |
Thanks for getting back to me! I was wondering if it is possible to match SNPs with rsid instead of the physical position. |
I've just added a new parameter |
Thank you very much! This is very helpful! I will try matching with rsid. |
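The idea of matching by rsID rather than by physical position can be illustrated with a small base-R sketch. The data frames `map` and `target`, their column names, and all values are hypothetical toy data, and this is not a description of how `snp_asGeneticPos()` works internally; it only shows why an rsID lookup does not depend on the genome build of the positions.

```r
# Toy genetic map (hypothetical values): rsIDs with hg19 positions and cM
map <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  pos_hg19 = c(1000, 2000, 3000, 4000),
  cM = c(0.10, 0.25, 0.40, 0.55),
  stringsAsFactors = FALSE
)

# Toy target data in another build: positions differ, rsIDs are shared
target <- data.frame(
  rsid = c("rs2", "rs4", "rs9"),
  pos_hg38 = c(2150, 4150, 9150),
  stringsAsFactors = FALSE
)

# Look up each target rsID in the map; variants absent from the map get NA
idx <- match(target$rsid, map$rsid)
target$cM <- map$cM[idx]

print(target)
```

Variants whose rsID is not in the map end up with a missing genetic position and would need to be handled separately.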
Hi Florian and users, Thank you very much for the previous comments and answers. They have been very helpful. I have the same issue as JinZhang1227, as my datasets are in hg38. I am following your tutorial, and I'd really appreciate it if you could tell me how I should modify the code to use the new rsid function. I've already installed the latest version of bigsnpr. Many thanks in advance, Paloma |
@pjordab Just add |
Does it work? Any update on this? |
Good morning, I have converted the files with the positions in cM to hg38, as my data contains SNP IDs in the format chr:position:allele:allele, so I cannot use the new rsid function you suggested. For some SNPs the conversion failed: either they are not included in the downloaded UCSC liftover file, or I deleted them because the new chromosome was described as unidentified/random or differed from the original one. I would like to share the files in case they are useful to other users, but I don't know how to do it. Even with the new files, which are sorted by position, the script stops with this error: 'infos.pos' is not sorted. Any idea how I can solve this? Many thanks! Paloma |
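A cleanup of this kind (dropping variants that land on random/unplaced contigs or on a different chromosome, then sorting again) might look like the following base-R sketch. The data frame `lifted`, its column names, and its values are hypothetical; the real liftover output will have its own layout.

```r
# Hypothetical liftover result: original chromosome, lifted contig, new position
lifted <- data.frame(
  chr_old = c(1, 1, 1, 2, 2),
  chr_new = c("chr1", "chr1_KI270706v1_random", "chr1", "chr3", "chr2"),
  pos_new = c(5000, 120, 3000, 700, 900),
  stringsAsFactors = FALSE
)

# Keep only variants that stayed on the same chromosome after liftover
same_chr <- lifted$chr_new == paste0("chr", lifted$chr_old)
clean <- lifted[same_chr, ]

# Liftover may change the relative order of variants, so sort again
clean <- clean[order(clean$chr_old, clean$pos_new), ]
rownames(clean) <- NULL

print(clean)
```

Sorting after the conversion matters because positions that were increasing in one build are not guaranteed to be increasing in the other.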
For which function do you get this error? For |
It happens after 2h of running the script. Here is my error.log:
Loading required package: bigstatsr
Warning message:
NAs introduced by coercion
Warning message:
NAs introduced by coercion
Attaching package: ‘dplyr’
The following objects are masked from ‘package:data.table’:
between, first, last
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Warning message:
NAs introduced by coercion
1,111,908 variants to be matched.
0 ambiguous SNPs have been removed.
665,308 variants have been matched; 0 were flipped and 326,513 were
reversed.
Error: 'infos.pos' is not sorted.
Execution halted
Message from Florian Privé ***@***.***> on Tue, 13 Apr 2021 at 9:22:
For which function do you get this error? For snp_asGeneticPos()?
I think it is related to this part of the script:
for (chr in 1:22) {
  ind.chr <- which(info_snp$chr == chr)
  ind.chr2 <- info_snp$`_NUM_ID_`[ind.chr]
  corr0 <- snp_cor(
    genotype,
    ind.col = ind.chr2,
    ncores = NCORES,
    infos.pos = POS2[ind.chr2],
    size = 3 / 1000
  )
  if (chr == 1) {
    ld <- Matrix::colSums(corr0^2)
    corr <- as_SFBM(corr0, tmp)
  } else {
    ld <- c(ld, Matrix::colSums(corr0^2))
    corr$add_columns(corr0, nrow(corr))
  }
}
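One way to localize such an error before starting the long-running loop is to check each chromosome's positions up front. The sketch below uses base R only; `info_snp` and `POS2` here are small stand-ins with hypothetical values for the objects in the script, and `check_chr()` is a hypothetical helper, not part of bigsnpr.

```r
# Stand-ins (hypothetical values) for the objects used in the loop above
info_snp <- data.frame(chr = c(1, 1, 1, 2, 2), `_NUM_ID_` = 1:5,
                       check.names = FALSE)
POS2 <- c(0.1, 0.3, 0.2, 0.5, NA)  # chr 1 is out of order, chr 2 has an NA

# For one chromosome, count missing values and test for non-decreasing order
check_chr <- function(chr) {
  ind <- info_snp$`_NUM_ID_`[info_snp$chr == chr]
  pos <- POS2[ind]
  data.frame(
    chr = chr,
    n_na = sum(is.na(pos)),
    unsorted = is.unsorted(pos, na.rm = TRUE)
  )
}

report <- do.call(rbind, lapply(unique(info_snp$chr), check_chr))
print(report)
```

Any chromosome flagged here can then be inspected directly, without waiting for the correlation step to fail.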
Hi Florian, Thank you very much for your answers. I've tried your solution and multiple variants of it, but it gives me an out-of-memory error: for (chr in 1:22){ I'd really appreciate any other suggestions. Many thanks! Paloma |
For the current problem, I don't think you can use For the issue about memory, please open another issue, as we are very far from the initial subject here. |
Ok, many thanks! For the first step, you mean this: for (chr in 1:22) { |
Yes, and |
Hi Florian, |
I'm not sure I understand how the problem is solved, since you say the reordering is actually not doing anything. And I'm not checking for strict sorting, so it is okay to have consecutive equal values. |
After observing that order() gives me exactly the same order, I've done this
for the 22 chr files:
ind.chr <- which(info_snp$chr == 22)
ind.chr2 <- info_snp$`_NUM_ID_`[ind.chr]
ord <- order(POS2[ind.chr2])
is.unsorted(POS2[ind.chr2], strictly = FALSE)
[1] FALSE
is.unsorted(POS2[ind.chr2], strictly = TRUE)
[1] TRUE
is.unsorted(ind.chr2, strictly = TRUE)
[1] FALSE
is.unsorted(ord, strictly = TRUE)
[1] FALSE
is.unsorted(POS2[ind.chr2[ord]])
[1] FALSE
is.unsorted(POS2[ind.chr2[ord]], strictly = TRUE)
[1] TRUE
From this: POS2[ind.chr2] is only "strictly" unsorted, and the new "ord" vector gives me
exactly the same order, but using this extra index vector resolves the
"'infos.pos' is not sorted" error.
Message from Florian Privé ***@***.***> on Wed, 21 Apr 2021 at 13:04:
… I'm not sure I understand how the problem is solved, since you say the
reordering is actually not doing anything.
And I'm not checking for strict sorting, so it is okay to have consecutive
equal values.
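The difference between the strict and non-strict checks discussed here can be reproduced with a toy vector in base R. The values are hypothetical, and this only illustrates the behavior of `is.unsorted()` and `order()`; it does not explain why the error went away in the script.

```r
# Toy positions with one tie (two variants sharing the same value)
pos <- c(1.0, 2.5, 2.5, 4.0)

# Non-strict check: consecutive equal values are accepted
print(is.unsorted(pos))

# Strict check: the tie counts as a violation
print(is.unsorted(pos, strictly = TRUE))

# Ordering data that is already non-decreasing changes nothing
ord <- order(pos)
print(identical(ord, seq_along(pos)))
```

So a vector can be "strictly unsorted" only because of ties, while still passing a non-strict sortedness check.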
Hi Florian, I'd like to use the function snp_asGeneticPos() with rsIDs to get the positions in cM. After installing the latest version of your package with remotes::install_github('privefl/bigsnpr'), I run: POS2 <- snp_asGeneticPos(CHR, POS, dir = ".", ncores = 1, rsid = RSID), where RSID is the vector that contains my rsIDs. Am I using the new function correctly? Many thanks! Paloma |
Yes, this should work. |
This one:
packageVersion("bigsnpr")
[1] ‘1.8.1’
Is this the latest one?
Message from Florian Privé ***@***.***> on Tue, 15 June 2021 at 2:05:
… Yes, this should work.
Are you sure the installation was successful? What is
packageVersion("bigsnpr")?
Yes, I am able to run info <- readRDS(url("https://ndownloader.figshare.com/files/25503788"))
with(info[1:100, ], bigsnpr::snp_asGeneticPos(chr, pos, rsid = rsid)) |
This is working: POS2 <- snp_asGeneticPos(CHR, POS, rsid = RSID) (instead of what I was using before: POS2 <- snp_asGeneticPos(CHR, POS, dir = ".", ncores = 1, rsid = RSID)). Many thanks! |
I still do not understand why you get the error with the other call, though. |
Neither do I, sorry. In order to get POS2 from RSID I am using the info_snp file, since my target data does not contain rsid. Could you please tell me if this is correct: CHR <-info_snp$chr for (chr in 1:22){ Many thanks!! |
You need to use |
Dear developer: |
You need the uncompressed files locally. Line 287 in ff127e2
Dear developer: snp_split(infos.chr, function(ind.chr, pos, dir, rsid) {
gzfile <- paste0(mapfile, ".gz")
}, combine = "c", pos = infos.pos, dir = dir, rsid = rsid, ncores = ncores) Best |
Just gunzip all the 22 chromosome files and run |
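If a command-line gunzip is not available, the decompression step can be sketched in base R. `gunzip_all()` is a hypothetical helper, not part of bigsnpr, and the file name `chr22.map.gz` is a made-up placeholder; the sketch assumes the map files are plain text and small enough to read into memory.

```r
# Decompress every .gz file in `dir`, writing the result next to it
gunzip_all <- function(dir) {
  gz_files <- list.files(dir, pattern = "\\.gz$", full.names = TRUE)
  for (gz in gz_files) {
    out <- sub("\\.gz$", "", gz)
    con <- gzfile(gz, open = "rt")  # text-mode read of the compressed file
    lines <- readLines(con)
    close(con)
    writeLines(lines, out)
  }
  invisible(sub("\\.gz$", "", gz_files))
}

# Usage on a temporary directory containing one toy compressed file
tmp_dir <- tempfile("maps")
dir.create(tmp_dir)
con <- gzfile(file.path(tmp_dir, "chr22.map.gz"), open = "wt")
writeLines(c("pos cM", "100 0.1"), con)
close(con)

out_files <- gunzip_all(tmp_dir)
print(readLines(out_files))
```

The original `.gz` files are left in place, so the uncompressed copies sit alongside them in the same directory.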
Dear developer, Best Regards |
I came across the same problem as you. Can you tell me how you dealt with it in the end? I don't understand how to change dir, or what to do next. |
That's the |
It works! Thank you! |
If you have an unrelated question, please open a new issue. |
Hi Florian, Could you specify where I should put My code is as follows:
for (chr in 1:22) {
  ind.chr <- which(df_beta$chr == chr)
  ind.chr2 <- df_beta$`_NUM_ID_`[ind.chr]
  ord <- order(POS2[ind.chr2])
  corr0 <- snp_cor(G, ind.col = ind.chr2[ord], size = 3 / 1000,
                   infos.pos = POS2[ind.chr2[ord]], ncores = NCORES)
  if (chr == 1) {
    ld <- Matrix::colSums(corr0^2)
    corr <- as_SFBM(corr0, tmp, compact = TRUE)
  } else {
    ld <- c(ld, Matrix::colSums(corr0^2))
    corr$add_columns(corr0, nrow(corr))
  }
}
Many thanks |
In the ifelse I guess. |
Hi Florian,
Thank you very much again for answering my earlier questions.
If I were to build the LD reference using a dataset with genome build hg38, could I still use the snp_asGeneticPos() function? My understanding is that the genetic map the function uses is based on hg19. Do you have any recommendations for hg38?
Thank you very much!