Overview of the Feature Request
After discussion with @PaulBoon, I did some quick testing of the search API and realized that the post-Solr step, which adds information retrieved from the database entries, can slow search by a factor of 10. For a page size of 10, that's only a change from ~22 to ~250 ms, but for 300 it's the difference between ~200 ms and 2 seconds, and larger searches get worse.
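To make those numbers easier to compare, here is a rough back-of-the-envelope calculation of the slowdown factor and the extra time per result. It uses only the approximate timings quoted above; the helper name `summarize` is made up for illustration, and the results are only as precise as the "~" figures they come from.

```python
# Approximate timings quoted above, in milliseconds:
# (page size, time without the DB step, time with the DB step)
measurements = [
    (10, 22, 250),
    (300, 200, 2000),
]


def summarize(page_size, solr_only_ms, with_db_ms):
    """Return (slowdown factor, extra milliseconds per result)."""
    factor = with_db_ms / solr_only_ms
    extra_per_result_ms = (with_db_ms - solr_only_ms) / page_size
    return factor, extra_per_result_ms


for page_size, solr_only_ms, with_db_ms in measurements:
    factor, extra = summarize(page_size, solr_only_ms, with_db_ms)
    print(f"page size {page_size}: {factor:.1f}x slower, "
          f"~{extra:.1f} ms extra per result")
```

On these rough figures the slowdown is about 10x at both page sizes, and the added time grows with the number of results returned.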
It looks like the ability to skip that step was turned off in #6441 where the additional dataset information was added.
Given that the JSON returned w/o the added info (see example below) still seems pretty useful for many purposes, I wonder if, at a minimum, it would make sense to restore the ability to turn off the extra db retrieval from the API. Beyond that, it could be worth looking into whether the UI/SPA really need the additional info for the main display, and whether any necessary items can now be retrieved from Solr (since I think we've added fields since 2019) or could be added there, so we can avoid retrieving things from the db. (I haven't looked to see whether it is retrieving the entity itself or the calls to get sub-objects that affect performance. It could be that a change similar to our findDeep methods, getting everything needed in one query, would gain some performance even if we can't drop getting the entity completely.)
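As a sketch of what the API-level opt-out could look like from a client's side, the snippet below builds a search URL with the `query_entities` parameter set explicitly. The host name and the helper `search_url` are hypothetical, and this assumes the server would honor `query_entities=false` again, which it does not do while the flag is forced to true.

```python
from urllib.parse import urlencode


def search_url(base_url, query, per_page=10, query_entities=None):
    """Build a search API URL; query_entities is only sent when given.

    Assumption: the server accepts a `query_entities` query parameter,
    as the old flag did before it was forced on.
    """
    params = {"q": query, "per_page": per_page}
    if query_entities is not None:
        params["query_entities"] = "true" if query_entities else "false"
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Hypothetical host, used only for illustration.
print(search_url("https://dataverse.example.edu", "finches",
                 per_page=300, query_entities=False))
```

Leaving the parameter out entirely keeps today's behavior, so existing clients would not need to change.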
FWIW: The info with the old query_entities flag set to false (this is a random example dataset - didn't look for a maximal one)
{
"name": "Darwin's Finches",
"type": "dataset",
"url": "https://doi.org/10.5074/FKHWZP7X",
"global_id": "doi:10.5074/FKHWZP7X",
"description": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
"published_at": "2024-05-29T20:56:25Z",
"publisher": "dv96e4a254",
"citationHtml": "Finch, Fiona; Spruce, Sabrina, 2024, \"Darwin's Finches\", <a href=\"https://doi.org/10.5074/FKHWZP7X\" target=\"_blank\">https://doi.org/10.5074/FKHWZP7X</a>, Root, V1",
"identifier_of_dataverse": "dv96e4a254",
"name_of_dataverse": "dv96e4a254",
"citation": "Finch, Fiona; Spruce, Sabrina, 2024, \"Darwin's Finches\", https://doi.org/10.5074/FKHWZP7X, Root, V1",
"publicationStatuses": [
"Published"
],
"authors": [
"Finch, Fiona",
"Spruce, Sabrina"
]
}
With it true/as forced currently:
{
"name": "Darwin's Finches",
"type": "dataset",
"url": "https://doi.org/10.5074/FKHWZP7X",
"global_id": "doi:10.5074/FKHWZP7X",
"description": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
"published_at": "2024-05-29T20:56:25Z",
"publisher": "dv96e4a254",
"citationHtml": "Finch, Fiona; Spruce, Sabrina, 2024, \"Darwin's Finches\", <a href=\"https://doi.org/10.5074/FKHWZP7X\" target=\"_blank\">https://doi.org/10.5074/FKHWZP7X</a>, Root, V1",
"identifier_of_dataverse": "dv96e4a254",
"name_of_dataverse": "dv96e4a254",
"citation": "Finch, Fiona; Spruce, Sabrina, 2024, \"Darwin's Finches\", https://doi.org/10.5074/FKHWZP7X, Root, V1",
"publicationStatuses": [
"Published"
],
"storageIdentifier": "file://10.5074/FKHWZP7X",
"subjects": [
"Medicine, Health and Life Sciences",
"Astronomy and Astrophysics"
],
"fileCount": 0,
"versionId": 149,
"versionState": "RELEASED",
"majorVersion": 1,
"minorVersion": 0,
"createdAt": "2024-05-29T20:56:21Z",
"updatedAt": "2024-05-29T20:56:25Z",
"contacts": [
{
"name": "Finch, Fiona",
"affiliation": ""
}
],
"authors": [
"Finch, Fiona",
"Spruce, Sabrina"
]
}
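For a quick view of what the database step contributes, the snippet below compares the top-level keys of the two example responses above. The key sets are copied from those two examples only, so other datasets or object types may differ.

```python
# Top-level keys of the example response with query_entities=false.
solr_only_keys = {
    "name", "type", "url", "global_id", "description", "published_at",
    "publisher", "citationHtml", "identifier_of_dataverse",
    "name_of_dataverse", "citation", "publicationStatuses", "authors",
}

# Top-level keys of the example response with the DB step forced on.
with_db_keys = solr_only_keys | {
    "storageIdentifier", "subjects", "fileCount", "versionId",
    "versionState", "majorVersion", "minorVersion", "createdAt",
    "updatedAt", "contacts",
}

# Fields that only appear after the extra database retrieval.
added_by_db = sorted(with_db_keys - solr_only_keys)
print(len(added_by_db), "fields added by the DB step:")
for key in added_by_db:
    print(" ", key)
```

These are the fields that would either need to come from Solr or be dropped if the database retrieval were skipped.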
What kind of user is the feature intended for?
(Example user roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)
What inspired the request?
What existing behavior do you want changed?
Any brand new behavior do you want to add to Dataverse?
Any open or closed issues related to this feature request?
Are you thinking about creating a pull request for this feature?
Help is always welcome, is this feature something you or your organization plan to implement?
Not immediately, unless it is just to restore the ?query_entities param.