New metadata fields for work entity #210

massimoaria · 2024-02-20T14:34:51Z

@trangdata
@yjunechoe
Recently, OpenAlex (OA) has added a lot of new metadata for the work entity.
In particular, the API now also reports keywords, topics, the grants that funded the research, the APC paid, etc.

At the moment the only way to access this information is to use the "list" format.

TO DO:
Modify the works2df() function so that the data frame also includes this new metadata. That way the "tibble" and "data.frame" output formats will return it too.

yjunechoe · 2024-02-20T15:06:23Z

Good point! And actually as a first step, I think it'd be helpful if we tracked somewhere what fields we already have covered vs. those that are new.

As a naive approach, this lists all fields from output="list" that are not present as columns in output="tibble":

library(openalexR)

tbl <- oa_fetch(id = "W2755950973")
lst <- oa_fetch(id = "W2755950973", output = "list")

sort(names(lst)[!names(lst) %in% colnames(tbl)])
#>  [1] "abstract_inverted_index"       "apc_list"                     
#>  [3] "apc_paid"                      "authorships"                  
#>  [5] "best_oa_location"              "biblio"                       
#>  [7] "cited_by_percentile_year"      "corresponding_author_ids"     
#>  [9] "corresponding_institution_ids" "countries_distinct_count"     
#> [11] "created_date"                  "fulltext_origin"              
#> [13] "has_fulltext"                  "indexed_in"                   
#> [15] "institutions_distinct_count"   "keywords"                     
#> [17] "locations"                     "locations_count"              
#> [19] "mesh"                          "ngrams_url"                   
#> [21] "open_access"                   "primary_location"             
#> [23] "primary_topic"                 "referenced_works_count"       
#> [25] "sustainable_development_goals" "title"                        
#> [27] "topics"                        "type_crossref"                
#> [29] "updated_date"

This of course doesn't mean we're missing coverage for all of these fields - some of them have been renamed in the df (e.g., authorships), intentionally dropped due to redundancy or low merit (e.g., title), or already covered via other means (e.g., we might not need ngrams_url given that we have the oa_ngrams() interface). But it's hard to distinguish those cases from a field like apc_list, which is clearly new and not yet covered.

So as a preliminary, maybe it's worth introducing something to internally track covered fields, like:

#' @keywords internal
covered_fields <- c("title", "authorships", ...)

Then we (or at least I) can get a clearer picture of what we're missing and have a programmatic way to track the introduction of new fields.
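
To make the tracking idea concrete, here is a minimal sketch of how such a vector could be compared against the field names the API returns. The names covered_fields_example and uncovered_fields() are hypothetical, the vectors are small stand-ins rather than the package's real coverage, and no API call is made:

```r
# Hypothetical stand-in for an internal vector of covered fields.
# The real contents would have to come from auditing works2df().
covered_fields_example <- c("title", "authorships", "ngrams_url")

# Stand-in for names(oa_fetch(..., output = "list")); a small subset
# of the field names listed above, hard-coded to avoid a network call.
api_fields <- c(
  "apc_list", "apc_paid", "authorships", "keywords",
  "ngrams_url", "title", "topics"
)

# Return the API fields that are not yet marked as covered.
uncovered_fields <- function(api_fields, covered = covered_fields_example) {
  sort(setdiff(api_fields, covered))
}

uncovered_fields(api_fields)
#> [1] "apc_list" "apc_paid" "keywords" "topics"
```

A check like this could also run in a test, so that a newly introduced API field shows up as a failure instead of going unnoticed.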

I can take a stab at this, then reconvene here to decide how to deal with the new fields? For example, it immediately jumps out to me that apc_paid and apc_list share similar structures - I think it may be worth combining them into a single list column apc of data frames. Ex:

Original:

lst$apc_list
#> $value
#> [1] 3680
#> 
#> $currency
#> [1] "USD"
#> 
#> $value_usd
#> [1] 3680
#> 
#> $provenance
#> [1] "doaj"

lst$apc_paid
#> $value
#> [1] 3680
#> 
#> $currency
#> [1] "USD"
#> 
#> $value_usd
#> [1] 3680
#> 
#> $provenance
#> [1] "doaj"

Formatted:

rbind.data.frame(
  c(type = "list", lst$apc_list),
  c(type = "paid", lst$apc_paid)
)
#>   type value currency value_usd provenance
#> 1 list  3680      USD      3680       doaj
#> 2 paid  3680      USD      3680       doaj
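
Since some works may report only one of the two fields (or neither), the formatting step above could be wrapped in a small helper. This is only a sketch: combine_apc() is a hypothetical name, not an existing openalexR function, and it assumes that both entries, when present, carry the same set of fields:

```r
# Hypothetical helper: combine apc_list and apc_paid into one data frame.
# Assumes both inputs, when present, are named lists with the same fields.
combine_apc <- function(apc_list = NULL, apc_paid = NULL) {
  entries <- list(list = apc_list, paid = apc_paid)
  # Drop whichever entry is missing for this work
  entries <- Filter(Negate(is.null), entries)
  if (length(entries) == 0) {
    return(NULL)
  }
  rows <- Map(function(type, x) {
    # Replace NULL values with NA so every field still becomes a column
    x <- lapply(x, function(v) if (is.null(v)) NA else v)
    data.frame(type = type, x, stringsAsFactors = FALSE)
  }, names(entries), entries)
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

# Usage with a value shaped like the list output shown above
apc <- list(value = 3680, currency = "USD", value_usd = 3680, provenance = "doaj")
combine_apc(apc_list = apc, apc_paid = apc)
```

Returning NULL when neither field is present is one possible convention; an empty data frame or NA would work as well, depending on how the other list columns in the df are handled.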

massimoaria · 2024-02-20T15:17:39Z

I totally agree

trangdata added the enhancement New feature or request label Jul 3, 2024
