"identifier" field missing from .log file for Make Data Count, if I download right after publishing #11235

Open
Odysseas640 opened this issue Feb 10, 2025 · 5 comments

@Odysseas640

When I publish and download a dataset without reloading the page (or browsing away and coming back), the entries in the .log file for that dataset have no "identifier" field, and this causes counter-processor-1.05 to fail when reading the log file.

Steps to reproduce:

  1. Deploy https://github.com/IQSS/dataverse/blob/v6.5/docker/compose/demo/compose.yml
  2. Docker exec into the dataverse container and create a directory (I used /logs)
  3. Set the MDC log path with "curl -X PUT -d '/logs' http://localhost:8080/api/admin/settings/:MDCLogPath"
  4. Log in as dataverseAdmin
  5. Create a dataset with only one file, publish it (with the fake DOI publisher), and download it with "Access Dataset" without reloading the page or navigating away
  6. Look at the log file in /logs. The "identifier" field will be missing, so if you run "main.py" or "counter_daily.sh" from counter-processor-1.05 with that log file the next day, it will fail.
  7. Reload the dataset's page and download it again.
  8. Now the new entry in the log file will contain the identifier.

While testing this with the v6.5 compose.yml file, I also noticed that the home page doesn't list any of the datasets I created.

This is my log file. The first 3 entries are from pressing "Access Dataset" without reloading the page first. The 4th entry is from reloading the page, and the other 2 are from pressing "Access Dataset" after reloading the page.

2025-02-10T12:44:32+0000 172.18.0.1 fb8308858206b87b4145faadb4c3 - @dataverseAdmin /api/access/datafile/20 - local://194efe364fc-4bb612cac37c 7040 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0 - - - - - - - /api/access/datafile/20 -
2025-02-10T12:44:40+0000 172.18.0.1 fb8308858206b87b4145faadb4c3 - @dataverseAdmin /api/access/datafile/20 - local://194efe364fc-4bb612cac37c 7040 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0 - - - - - - - /api/access/datafile/20 -
2025-02-10T12:45:32+0000 172.18.0.1 fb8308858206b87b4145faadb4c3 - @dataverseAdmin /api/access/datafile/20 - local://194efe364fc-4bb612cac37c 7040 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0 - - - - - - - /api/access/datafile/20 -
2025-02-10T12:45:36+0000 172.18.0.1 fb8308858206b87b4145faadb4c3 - @dataverseAdmin /dataset.xhtml?persistentId=doi%3A10.5072%2FFK2%2FZGZHY9&version=DRAFT doi:10.5072/FK2/ZGZHY9 - - Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0 d7 grid tbd Admin, Dataverse 2025-02-10T12:44:14Z 1 - /dataset.xhtml?persistentId=doi%3A10.5072%2FFK2%2FZGZHY9&version=DRAFT 2025
2025-02-10T12:45:43+0000 172.18.0.1 fb8308858206b87b4145faadb4c3 - @dataverseAdmin /api/access/datafile/20 doi:10.5072/FK2/ZGZHY9 local://194efe364fc-4bb612cac37c 7040 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0 d7 grid tbd Admin, Dataverse 2025-02-10T12:44:14Z 1 - /api/access/datafile/20 2025
2025-02-10T12:45:52+0000 172.18.0.1 fb8308858206b87b4145faadb4c3 - @dataverseAdmin /api/access/datafile/20 doi:10.5072/FK2/ZGZHY9 local://194efe364fc-4bb612cac37c 7040 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0 d7 grid tbd Admin, Dataverse 2025-02-10T12:44:14Z 1 - /api/access/datafile/20 2025
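
To make the difference between the two kinds of entries easier to see, here is a minimal Python sketch that names each column and lists the ones that are empty. It assumes the file on disk is tab-separated with 19 columns and uses "-" for an empty value; the column names in FIELD_NAMES are my assumption, based on the Counter Processor log format, and are not taken from this log.

```python
# Assumed column names, in order; not confirmed by this thread.
FIELD_NAMES = [
    "event_time", "client_ip", "session_cookie_id", "user_cookie_id", "user_id",
    "request_url", "identifier", "filename", "size", "user_agent", "title",
    "publisher", "publisher_id", "authors", "publication_date", "version",
    "other_id", "target_url", "publication_year",
]


def parse_entry(line):
    """Split one tab-separated log line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(values)}")
    return dict(zip(FIELD_NAMES, values))


def empty_fields(entry):
    """Return the names of all columns whose value is the '-' placeholder."""
    return [name for name, value in entry.items() if value == "-"]
```

Under those assumptions, running empty_fields(parse_entry(line)) on one of the first three entries would list "identifier" together with the dataset metadata columns, while the entries written after the reload would not list "identifier".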

@Odysseas640 Odysseas640 added the Type: Bug a defect label Feb 10, 2025
@pdurbin
Member

pdurbin commented Feb 10, 2025

@Odysseas640 thanks for bringing this to our attention.

@stevenwinship is working on MDC right now in this issue:

He mentioned some bugs he's fixing in Counter Processor (we recently forked it to https://github.com/gdcc/counter-processor if you aren't aware). Let me check with him if this is one of them.

@qqmyers
Member

qqmyers commented Feb 10, 2025

FWIW - my guess at the root cause here is some caching - the call understands that the file is now released, and so writes a line, but it still has the old datasetversion state as draft, and so does not add its DOI. That could be fixed, or might go away with the SPA, but it is probably OK if counter-processor just ignores such a line. (Per the specs, we don't count downloads of files in draft versions, and this download, while coming from a newly published dataset, would have to be by the person who could have just downloaded from the draft version a moment earlier, so perhaps not a download that needs to be counted.)
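
As a rough illustration of "just ignore such a line", here is a sketch of a pre-filter that could be run over a log file before handing it to the processor. This is not the counter-processor code: the function name, the tab separator, the identifier position (seventh column) and the "#" prefix for header lines are all assumptions.

```python
def drop_entries_without_identifier(lines, identifier_index=6):
    """Keep header lines and entries that carry an identifier; drop the rest.

    Assumes tab-separated entries where '-' or an empty string means "no value".
    Lines too short to contain the identifier column are dropped as well.
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            # Assumed header/comment convention: pass through untouched.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > identifier_index and fields[identifier_index] not in ("", "-"):
            kept.append(line)
    return kept
```

It could be used by reading the file with readlines(), passing the list through the function, and writing the result to a new file, leaving the original log untouched.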

@stevenwinship
Contributor

This looks like a known issue: the entry is logged from writeGuestbookResponseRecord(GuestbookResponse guestbookResponse), and the MakeDataCountEntry constructor used for that MDC call has these comments:

//This version of the constructor is for the downloads tracked in FileDownloadServiceBean
//Technically you should be able to get to publishedVersion via the data file, but guestbook's datafile doesn't have that info
//This is passed a DataFile to log the file downloaded
public MakeDataCountEntry(FacesContext fc, DataverseRequestServiceBean dvRequestService, DatasetVersion publishedVersion, DataFile df) {

@stevenwinship
Contributor

Both guestbookResponse.getDatasetVersion() and guestbookResponse.getDataset().getReleasedVersion() return null.

I can make the fix in counter_processor to ignore these lines.

@stevenwinship
Contributor

stevenwinship commented Feb 20, 2025

counter-processor 1.06 has been released with the identifier bug fix

https://github.com/gdcc/counter-processor/releases/tag/v1.06
