Feature Request: Improve search API speed #11285

Open
qqmyers opened this issue Feb 25, 2025 · 0 comments
Labels
Type: Feature a feature request

Overview of the Feature Request
After discussion with @PaulBoon, I did some quick testing of the search API and realized that the post-Solr step, which adds information retrieved from the database entries, can slow search by a factor of 10. For a page size of 10, that's only a change from ~22 ms to ~250 ms, but for 300 it's the difference between ~200 ms and ~2 seconds, and larger searches get worse.
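To put those rough numbers in per-result terms, here is a quick back-of-the-envelope calculation. It uses only the approximate timings quoted above; the function name is made up for illustration, and the figures come from quick testing rather than a proper benchmark.

```python
def per_item_overhead_ms(without_db_ms: float, with_db_ms: float, page_size: int) -> float:
    """Average extra time per result attributable to the post-Solr database step."""
    return (with_db_ms - without_db_ms) / page_size


# Approximate timings quoted above: (page size, ms without db step, ms with db step)
measurements = [(10, 22, 250), (300, 200, 2000)]

for page_size, fast_ms, slow_ms in measurements:
    overhead = per_item_overhead_ms(fast_ms, slow_ms, page_size)
    slowdown = slow_ms / fast_ms
    print(f"per_page={page_size}: ~{overhead:.1f} ms extra per result, ~{slowdown:.1f}x slower")
```

On these numbers the added cost works out to very roughly 6 to 23 ms per result, which is why it adds up quickly for large pages.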

It looks like the ability to skip that step was turned off in #6441 where the additional dataset information was added.

Given that the JSON returned without the added info (see example below) still seems pretty useful for many purposes, I wonder if, at a minimum, it would make sense to allow turning off the extra db retrieval from the API. Beyond that, it could be worth looking into whether the UI/SPA really needs the additional info for the main display, and whether any necessary items can now be retrieved from Solr (I think we've added fields since 2019), or could be added there, so that we can avoid retrieving things from the db. (I haven't looked to see whether it is the retrieval of the entity itself or the calls to get sub-objects that affect performance. A change similar to our findDeep methods, getting everything needed in one query, might gain some performance even if we can't drop getting the entity completely.)
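To make the two ideas concrete (an opt-out flag, and one batched lookup instead of one lookup per result), here is a minimal sketch. It is not Dataverse code: `build_items`, `lookup_many`, and the in-memory `FAKE_DB` are all hypothetical stand-ins, and the real implementation is Java rather than Python.

```python
from typing import Callable, Iterable

# Hypothetical shapes: a "doc" is the dict built from the Solr response alone;
# the lookup returns extra database-backed fields keyed by global_id.
Doc = dict
BatchLookup = Callable[[Iterable[str]], dict]


def build_items(solr_docs: list[Doc], query_entities: bool, lookup_many: BatchLookup) -> list[Doc]:
    """Return search items, optionally enriched with database-backed fields."""
    if not query_entities:
        # Fast path: return what Solr already provided, with no database round trips.
        return [dict(doc) for doc in solr_docs]

    # One batched lookup for the whole page instead of one lookup per result.
    extras_by_id = lookup_many(doc["global_id"] for doc in solr_docs)
    return [{**doc, **extras_by_id.get(doc["global_id"], {})} for doc in solr_docs]


# Stand-in for the database, used only to exercise the sketch.
lookup_calls = []  # records each simulated database round trip
FAKE_DB = {"doi:10.5074/FKHWZP7X": {"fileCount": 0, "versionState": "RELEASED"}}


def fake_lookup_many(global_ids):
    ids = list(global_ids)
    lookup_calls.append(ids)
    return {gid: FAKE_DB[gid] for gid in ids if gid in FAKE_DB}


solr_docs = [{"name": "Darwin's Finches", "type": "dataset", "global_id": "doi:10.5074/FKHWZP7X"}]

fast = build_items(solr_docs, query_entities=False, lookup_many=fake_lookup_many)
full = build_items(solr_docs, query_entities=True, lookup_many=fake_lookup_many)
print(len(lookup_calls), sorted(full[0]))
```

The fast path never touches the lookup, and the enriched path makes a single round trip per page regardless of page size. Whether a single query is actually feasible for all the sub-objects is exactly the open question above.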

FWIW, here is the info returned with the old query_entities flag set to false (this is a random example dataset; I didn't look for a maximal one):

{
  "name": "Darwin's Finches",
  "type": "dataset",
  "url": "https://doi.org/10.5074/FKHWZP7X",
  "global_id": "doi:10.5074/FKHWZP7X",
  "description": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
  "published_at": "2024-05-29T20:56:25Z",
  "publisher": "dv96e4a254",
  "citationHtml": "Finch, Fiona; Spruce, Sabrina, 2024, \"Darwin's Finches\", <a href=\"https://doi.org/10.5074/FKHWZP7X\" target=\"_blank\">https://doi.org/10.5074/FKHWZP7X</a>, Root, V1",
  "identifier_of_dataverse": "dv96e4a254",
  "name_of_dataverse": "dv96e4a254",
  "citation": "Finch, Fiona; Spruce, Sabrina, 2024, \"Darwin's Finches\", https://doi.org/10.5074/FKHWZP7X, Root, V1",
  "publicationStatuses": [
    "Published"
  ],
  "authors": [
    "Finch, Fiona",
    "Spruce, Sabrina"
  ]
}

With it set to true (as is currently forced):

{
  "name": "Darwin's Finches",
  "type": "dataset",
  "url": "https://doi.org/10.5074/FKHWZP7X",
  "global_id": "doi:10.5074/FKHWZP7X",
  "description": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
  "published_at": "2024-05-29T20:56:25Z",
  "publisher": "dv96e4a254",
  "citationHtml": "Finch, Fiona; Spruce, Sabrina, 2024, \"Darwin's Finches\", <a href=\"https://doi.org/10.5074/FKHWZP7X\" target=\"_blank\">https://doi.org/10.5074/FKHWZP7X</a>, Root, V1",
  "identifier_of_dataverse": "dv96e4a254",
  "name_of_dataverse": "dv96e4a254",
  "citation": "Finch, Fiona; Spruce, Sabrina, 2024, \"Darwin's Finches\", https://doi.org/10.5074/FKHWZP7X, Root, V1",
  "publicationStatuses": [
    "Published"
  ],
  "storageIdentifier": "file://10.5074/FKHWZP7X",
  "subjects": [
    "Medicine, Health and Life Sciences",
    "Astronomy and Astrophysics"
  ],
  "fileCount": 0,
  "versionId": 149,
  "versionState": "RELEASED",
  "majorVersion": 1,
  "minorVersion": 0,
  "createdAt": "2024-05-29T20:56:21Z",
  "updatedAt": "2024-05-29T20:56:25Z",
  "contacts": [
    {
      "name": "Finch, Fiona",
      "affiliation": ""
    }
  ],
  "authors": [
    "Finch, Fiona",
    "Spruce, Sabrina"
  ]
}
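For a quick view of what the database step actually adds, the top-level keys of the two example payloads can be compared. The key lists below are copied from the two examples above (a single random dataset, so other datasets or object types may differ); the helper name is just for illustration.

```python
# Top-level keys of the example returned without the database step.
SOLR_ONLY_KEYS = [
    "name", "type", "url", "global_id", "description", "published_at",
    "publisher", "citationHtml", "identifier_of_dataverse",
    "name_of_dataverse", "citation", "publicationStatuses", "authors",
]

# Top-level keys of the example returned with the database step.
WITH_DB_KEYS = [
    "name", "type", "url", "global_id", "description", "published_at",
    "publisher", "citationHtml", "identifier_of_dataverse",
    "name_of_dataverse", "citation", "publicationStatuses",
    "storageIdentifier", "subjects", "fileCount", "versionId",
    "versionState", "majorVersion", "minorVersion", "createdAt",
    "updatedAt", "contacts", "authors",
]


def added_fields(basic_keys, enriched_keys):
    """Keys present only in the enriched payload, in their original order."""
    basic = set(basic_keys)
    return [key for key in enriched_keys if key not in basic]


db_only = added_fields(SOLR_ONLY_KEYS, WITH_DB_KEYS)
print(db_only)
```

For this dataset that is ten extra fields: storageIdentifier, subjects, fileCount, the version and timestamp fields, and contacts. Those are the candidates to check against what is already indexed in Solr.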

What kind of user is the feature intended for?
(Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)

What inspired the request?

What existing behavior do you want changed?

Any brand new behavior do you want to add to Dataverse?

Any open or closed issues related to this feature request?

Are you thinking about creating a pull request for this feature?
Help is always welcome, is this feature something you or your organization plan to implement?

Not immediately, unless it is just to restore the ?query_entities param.

@qqmyers qqmyers added the Type: Feature a feature request label Feb 25, 2025