
Delete Files From Dataset #11230

Merged

Conversation

@qqmyers (Member) commented on Feb 7, 2025

What this PR does / why we need it:
This PR adds a deleteFiles method to the native API for datasets, using PUT as the HTTP verb to match the existing delete metadata calls.
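
For illustration, a minimal client-side sketch of calling the new endpoint from Java (the exact path and payload format here are assumptions based on the PR description; the native API documentation added in this PR has the authoritative curl example):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeleteFilesSketch {
    public static void main(String[] args) throws Exception {
        String serverUrl = "https://demo.dataverse.org";  // assumed server
        String apiToken = System.getenv("API_TOKEN");      // assumed API token env var
        String datasetId = "1234";                         // assumed database ID of the dataset
        String fileIds = "[101, 102]";                     // JSON array of database file IDs to delete

        // PUT is used (rather than DELETE) to match the existing delete-metadata calls,
        // as described in the PR; the endpoint path below is an assumption.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(serverUrl + "/api/datasets/" + datasetId + "/deleteFiles"))
                .header("X-Dataverse-key", apiToken)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(fileIds))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}

Per the review discussion below, a 400 is returned if any of the listed IDs are not in the dataset.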

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:
Note for posterity: To be able to use DELETE with a payload, Payara has to have an option set:

asadmin set
configs.config.default-config.network-config.protocols.protocol.http-listener-1.http.allow-payload-for-undefined-http-methods=true

The consensus in Slack was not to use this approach, but if we decide to in the future, we'd need to include instructions for making this change in Payara.

Suggestions on how to test this:
There's an IT test that creates a dataset with files, tries to delete some, publishes the dataset, and then tries to delete others. It also tests using a bad fileId or datasetId, and using a user without permissions. Those or other cases could also be tested manually. The documentation (in the native API section) gives a curl example.
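
As a rough sketch of the failure-case assertions described above (REST Assured style, with a hypothetical endpoint path and placeholder tokens/IDs; the actual IT test in the PR is the reference):

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.anyOf;
import static org.hamcrest.Matchers.is;

public class DeleteFilesFailureCasesSketch {

    // A file ID that is not in the dataset should be rejected with a 400.
    static void badFileIdIsRejected(String apiToken, long datasetId) {
        given().header("X-Dataverse-key", apiToken)
                .header("Content-Type", "application/json")
                .body("[999999]") // bogus file ID
                .put("/api/datasets/" + datasetId + "/deleteFiles") // assumed path
                .then().statusCode(400);
    }

    // A user without permission on the dataset should not be able to delete files.
    static void userWithoutPermissionIsRejected(String noPermsToken, long datasetId, long realFileId) {
        given().header("X-Dataverse-key", noPermsToken)
                .header("Content-Type", "application/json")
                .body("[" + realFileId + "]")
                .put("/api/datasets/" + datasetId + "/deleteFiles") // assumed path
                .then().statusCode(anyOf(is(401), is(403))); // exact status is an assumption
    }
}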

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: included.

Additional documentation:

@coveralls commented on Feb 7, 2025

Coverage Status

coverage: 22.726% (-0.01%) from 22.736% when pulling af2bcc5 on GlobalDataverseCommunityConsortium:DANS-bulk_file_delete into 2210d16 on IQSS:develop.

@qqmyers added the Size: 3 (a percentage of a sprint; 2.1 hours), GDCC: DANS (related to GDCC work for DANS), and SPA (these changes are required for the Dataverse SPA) labels on Feb 8, 2025
@qqmyers marked this pull request as ready for review on February 8, 2025, 13:47
// A DataFile may have no PID of its own; fall back to the owning dataset's global ID.
if ((pid == null) && (dvObject instanceof DataFile df)) {
    pid = df.getOwner().getGlobalId();
}
pidProvider = PidUtil.getPidProvider(pid.getProviderId());
XMLStreamWriter xmlw = XMLOutputFactory.newInstance().createXMLStreamWriter(outputStream);
xmlw.writeStartElement("resource");
boolean deaccessioned = false;
Contributor

This change is already in a PR that is in QA. Once it's merged, please remove this and merge from develop.

@qqmyers removed their assignment on Feb 11, 2025
@stevenwinship removed their assignment on Feb 11, 2025
@cmbz added the FY25 Sprint 16 (2025-01-29 - 2025-02-12) label on Feb 12, 2025
@qqmyers added this to the 6.6 milestone on Feb 12, 2025
@cmbz added the FY25 Sprint 17 (2025-02-12 - 2025-02-26) label on Feb 12, 2025
fileIdList.add(((JsonNumber) value).longValue());
}
// Find the files to be deleted
List<FileMetadata> filesToDelete = dataset.getOrCreateEditVersion().getFileMetadatas().stream()
Contributor

@qqmyers : maybe require all the file ids listed in the request to be in the dataset, instead of just ignoring the ones not found?

Member Author

They aren't just ignored. Line 5400 below will return a 400 if all the files aren't in the dataset.

Contributor

OK, but still, it could fail faster and return a list of missing IDs.

Member Author

How could it fail faster?

Contributor

What I propose is compiling the list of file IDs in the request that are not in the dataset and returning the 400 status at line 5395 with a message in the body like "Files not found in dataset: [..ids..]". If I understand the code correctly, the failure will currently be thrown out of the commandEngine.submit() call.

Member Author

It is currently thrown out in the second of the next two if clauses. The first checks whether no files match at all, and the second checks whether there are fewer files to delete than were in the original list. (Basically, unless all the files are in the latest dataset version, you get a 400 error, but the message differs depending on whether none or only some were found.)

In general, I'm hesitant to do extra work to help an API caller correct their input - any script using the call should be getting the right info to send, and the list of file ids in the dataset is available through other calls. That said, if there's a use case where it is hard for a script to figure out what to send, we could return more details. Perhaps even send the ones that were found instead since that's really the useful list?
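
For illustration, a sketch of the fail-fast check being discussed, meant to slot in before the existing if clauses (variable names follow the quoted hunk above; badRequest and the message wording are assumptions):

// Requires java.util.Set, java.util.List, and java.util.stream.Collectors.
// Collect the IDs of files present in the edit version of the dataset.
Set<Long> presentIds = dataset.getOrCreateEditVersion().getFileMetadatas().stream()
        .map(fm -> fm.getDataFile().getId())
        .collect(Collectors.toSet());
// Any requested ID not in that set is reported back explicitly.
List<Long> missingIds = fileIdList.stream()
        .filter(id -> !presentIds.contains(id))
        .collect(Collectors.toList());
if (!missingIds.isEmpty()) {
    // Fail fast with the specific IDs instead of a generic "fewer files than requested" message.
    return badRequest("Files not found in dataset: " + missingIds);
}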

@ofahimIQSS self-assigned this on Feb 18, 2025
@ofahimIQSS (Contributor)

I see continuous integration keeps failing for this PR.

@ofahimIQSS (Contributor)

Testing passed internally; no issues found. Merging PR.

@ofahimIQSS merged commit b0d136c into IQSS:develop on Feb 25, 2025
12 checks passed
@ofahimIQSS removed their assignment on Feb 25, 2025