When returning all the works that are cited by and that cite a focal article, the number of edges in the returned edges data frame that point to the focal article should match the focal article's cited_by_count, but they usually do not.
I am trying to figure out whether this is an artifact of the data or whether I have misunderstood what oa_snowball returns.
Here is an example where I think the edge count should match but doesn't:
library(openalexR)
library(dplyr)  # for filter()

focal_article <- oa_fetch(
  entity = "works",
  doi = c("10.1056/nejmoa1000678"),
  verbose = TRUE
)
snowball_docs <- oa_snowball(
  identifier = focal_article$id,
  verbose = TRUE
)
edges <- snowball_docs$edges
# short OpenAlex ID of the focal work, without the URL prefix
id <- stringr::str_replace(focal_article$id, "https://openalex.org/", "")
# drop all works the focal work cites
edges <- edges |>
  filter(to == id)
# Raise error if edges don't match focal_article citation count
tryCatch({
  if (nrow(edges) != focal_article$cited_by_count) {
    stop("Number of edges doesn't match cited by count of focal article!")
  }
}, error = function(e) {
  cat("An error occurred: ", e$message, "\n")
})
Thanks for the report! Definitely not ideal, but it's likely the same situation also reported in #115
For what it's worth, in my experience with snowball searching it's pretty common to have mismatches between the cited-by number in a paper's records vs. its discoverable connections (even within the same database). You can just think of the number of articles returned by backward-searching in oa_snowball() as the absolute lower bound estimate of cited-by (which doesn't account for older papers, retracted papers, inaccessible papers, etc.).
The discrepancy doesn't seem to be very sizable fortunately, less than 10 or so per article in my estimation. Thanks!
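For anyone who wants to measure the gap rather than stop on it, here is a minimal sketch in base R. It assumes the edges data frame has the from/to columns used in the example above. The IDs and the reported count are made-up placeholders, not real OpenAlex values, so substitute snowball_docs$edges and focal_article$cited_by_count when running it against real data.

```r
# Toy edge list in the from/to shape assumed for snowball_docs$edges.
# All IDs and counts below are hypothetical, for illustration only.
edges <- data.frame(
  from = c("W1", "W2", "W3", "W0", "W0"),
  to   = c("W0", "W0", "W0", "W8", "W9"),
  stringsAsFactors = FALSE
)
focal_id <- "W0"
reported_cited_by_count <- 5L  # stand-in for focal_article$cited_by_count

# Keep only edges that point at the focal work, i.e. works that cite it.
inbound <- edges[edges$to == focal_id, ]
observed <- nrow(inbound)

# Report the gap instead of stopping, treating the observed count
# as a lower bound on the reported cited_by_count.
gap <- reported_cited_by_count - observed
message(sprintf(
  "observed = %d, reported = %d, gap = %d",
  observed, reported_cited_by_count, gap
))
```

A positive gap is consistent with the lower-bound reading described above. A negative gap would point to something else, such as duplicated edges.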
TimothyElder changed the title from "Number of edges from snowball_docs doesn't match cited_by_count" to "Number of edges from oa_snowball doesn't match cited_by_count" on Oct 16, 2023