Native API - Get Dataset Download Count endpoint #11244

Open
g-saracca opened this issue Feb 11, 2025 · 11 comments · May be fixed by #11282
Labels
FY25 Sprint 17 FY25 Sprint 17 (2025-02-12 - 2025-02-26) GREI Re-arch Issues related to the GREI Dataverse rearchitecture Original size: 10 Size: 10 A percentage of a sprint. 7 hours. SPA.Q1.4 Dataset Page: Dataset Metrics SPA These changes are required for the Dataverse SPA Type: Feature a feature request

Comments

@g-saracca
Contributor

Overview of the Feature Request
To replicate the "classic" download count in the SPA, we need a new Native API endpoint that returns the dataset download count (not Make Data Count related), as used in this Dataverse repo code.

What kind of user is the feature intended for?
API User

What inspired the request?
SPA Q1 Roadmap

@g-saracca g-saracca added GREI Re-arch Issues related to the GREI Dataverse rearchitecture Original size: 10 Size: 10 A percentage of a sprint. 7 hours. SPA These changes are required for the Dataverse SPA SPA.Q1.4 Dataset Page: Dataset Metrics Type: Feature a feature request labels Feb 11, 2025
@qqmyers
Member

qqmyers commented Feb 11, 2025

To handle both the classic case and the MDC case, where counts from before MDC logging started need to be shown, you'll probably want an API call that accepts a cut-off date or has a preMDC=true flag as an option. FWIW, the counts used in JSF come from

public Long getDownloadCountByDatasetId(Long datasetId, LocalDate date) {
(for use with MDC, using the MDC start date from
public LocalDate getMDCStartDate() {
) or from the convenience method a few lines up that takes no date (for classic).

@GPortas GPortas moved this to SPRINT READY in IQSS Dataverse Project Feb 12, 2025
@g-saracca g-saracca moved this from SPRINT READY to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Feb 12, 2025
@cmbz cmbz added the FY25 Sprint 17 FY25 Sprint 17 (2025-02-12 - 2025-02-26) label Feb 12, 2025
@stevenwinship stevenwinship self-assigned this Feb 21, 2025
@stevenwinship stevenwinship moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project Feb 21, 2025
@stevenwinship
Contributor

stevenwinship commented Feb 24, 2025

Here is a sample of the JSON response with an old date, a new date, and no date.
API with optional date string: /api/datasets/{datasetId}/download/count?date=2025-02-20
{
  "id": 2,
  "downloadCount": 0,
  "date": "2025-02-20"
}
{
  "id": 2,
  "downloadCount": 4,
  "date": "2025-02-25"
}
{
  "id": 2,
  "downloadCount": 4
}
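
As a rough illustration of the request shape above, the sketch below builds the path with and without the optional `date` query parameter. The class and method names (`DownloadCountUrl`, `build`) are hypothetical helpers, not part of Dataverse. Only the path and the `date` parameter come from the sample, and this reflects the proposal as it stood at this point in the thread.

```java
import java.time.LocalDate;
import java.util.Optional;

// Hypothetical helper (not part of Dataverse): builds the request path from the sample above.
final class DownloadCountUrl {
    private DownloadCountUrl() {}

    // Path taken from the sample: /api/datasets/{datasetId}/download/count
    // The date is optional; when it is absent, no query string is appended.
    static String build(long datasetId, Optional<LocalDate> date) {
        String path = "/api/datasets/" + datasetId + "/download/count";
        // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd), matching ?date=2025-02-20
        return date.map(d -> path + "?date=" + d).orElse(path);
    }

    public static void main(String[] args) {
        System.out.println(build(2, Optional.of(LocalDate.of(2025, 2, 20))));
        System.out.println(build(2, Optional.empty()));
    }
}
```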

@g-saracca
Contributor Author

Hi @stevenwinship, the JSON output looks good!
Why is it necessary to send the date? Sorry, I hadn't read Jim's comment before.
I'm asking just to understand the logic around MDC vs. non-MDC.

@stevenwinship
Contributor

The returned date isn't really necessary, but the date sent in will allow you to get counts from before MDC was turned on. Counts from after MDC was turned on should be retrieved through a metrics API.

@stevenwinship stevenwinship linked a pull request Feb 24, 2025 that will close this issue
@qqmyers
Member

qqmyers commented Feb 24, 2025

The date MDC was turned on is a constant from the back-end.

Since MDC either is or isn't on, it seems like this one API call could send either just the count (internal), or the combination of counts (internal) prior to the cut-over, the cut-over date, and the MDC counts since. If there's a use case for getting internal downloads even when MDC is turned on (not needed in the current UI), it might make sense to have an includeMDC flag of some sort, but then we'll need an API so you can discover whether MDC is turned on in the back end.

@g-saracca
Contributor Author

Ok, I think I understand now.
For now we will show the classic downloads; then, when we have a way to know all the enabled flags/features, we will show the MDC counts using the existing API endpoints for that.
So for now, using this endpoint without sending a date as a query param will be enough.
Thanks!

@stevenwinship
Contributor

stevenwinship commented Feb 24, 2025

I think just a flag in this API could work, since the back end would know if MDC is on.
includeMDC = false - return only counts up to the MDC date, or all counts if MDC is off (DEFAULT)
includeMDC = true - return all downloads regardless of the state of MDC.
The MDC date can still be returned if MDC is on, so the UI has some context for the count returned.
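
The flag semantics proposed above can be sketched roughly as follows. This is a toy model under stated assumptions, not the Dataverse implementation: `DownloadCounter`, its `count` method, and the list-of-dates data shape are all hypothetical, and the choice to exclude downloads made on the MDC start date itself is an assumption the thread does not settle.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the proposed includeMDC flag; not the Dataverse implementation.
final class DownloadCounter {
    private DownloadCounter() {}

    // downloadDates: one entry per recorded download (assumed data shape).
    // mdcStartDate:  empty when MDC is off on the back end.
    static long count(List<LocalDate> downloadDates,
                      Optional<LocalDate> mdcStartDate,
                      boolean includeMDC) {
        if (includeMDC || !mdcStartDate.isPresent()) {
            // includeMDC=true, or MDC is off: return every download.
            return downloadDates.size();
        }
        LocalDate cutOff = mdcStartDate.get();
        // includeMDC=false with MDC on: only downloads before the cut-over.
        // Assumption: downloads on the start date itself are excluded.
        return downloadDates.stream().filter(d -> d.isBefore(cutOff)).count();
    }
}
```

With this shape the caller never needs to know whether MDC is on; the back end decides whether the cut-off applies.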

@qqmyers
Member

qqmyers commented Feb 24, 2025

That works, though I don't know if anyone needs includeMDC=false. The date is needed because the display will show it, e.g.

Image

@stevenwinship
Contributor

includeMDC=false is the default, so it isn't needed, but it might be useful to test with.

@qqmyers
Member

qqmyers commented Feb 24, 2025

Actually, since we have :DisplayMDCMetrics, there is a need for sending the internal downloads only, and the value should be for all time, ignoring the MDC logging start date. This is actually the case for Harvard Dataverse right now: MDC logging is on, but the display needs the all-time total for internal download counts, since the MDC counts are not displayed (because the log processing hasn't finished).

@stevenwinship
Contributor

The MDCStartDate will be returned only if the count was limited by that date.

{
  "id": 129,
  "downloadCount": 0,
  "MDCStartDate": "2019-10-01"
}
{
  "id": 129,
  "downloadCount": 0
}
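
One way to picture the rule above (include `MDCStartDate` only when it actually limited the count) is the sketch below. It is illustrative only: `DownloadCountResponse` and `build` are hypothetical names, and a plain `Map` stands in for whatever JSON builder the real endpoint uses. The field names `id`, `downloadCount`, and `MDCStartDate` are taken from the samples.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical response builder; a Map stands in for the real JSON builder.
final class DownloadCountResponse {
    private DownloadCountResponse() {}

    // limitingMdcStartDate: present only when the count was cut off at the MDC start date.
    static Map<String, Object> build(long datasetId,
                                     long downloadCount,
                                     Optional<LocalDate> limitingMdcStartDate) {
        // LinkedHashMap keeps the field order used in the samples.
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("id", datasetId);
        body.put("downloadCount", downloadCount);
        // The date field is omitted entirely when the count was not limited.
        limitingMdcStartDate.ifPresent(d -> body.put("MDCStartDate", d.toString()));
        return body;
    }
}
```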
