
Delete Files From Dataset #11230

Merged

Conversation

@qqmyers (Member) commented on Feb 7, 2025

What this PR does / why we need it:
This PR adds a deleteFiles method to the native API for datasets, using PUT as the HTTP verb to match the existing delete metadata calls.
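
For illustration, a minimal client-side sketch of calling the new endpoint from Java (the exact path and payload format here are assumptions based on the PR description; the native API documentation added in this PR has the authoritative curl example):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeleteFilesSketch {
    public static void main(String[] args) throws Exception {
        String serverUrl = "https://demo.dataverse.org";  // assumed server
        String apiToken = System.getenv("API_TOKEN");      // assumed API token env var
        String datasetId = "1234";                         // assumed database ID of the dataset
        String fileIds = "[101, 102]";                     // JSON array of database file IDs to delete

        // PUT is used (rather than DELETE) to match the existing delete-metadata calls,
        // as described in the PR; the endpoint path below is an assumption.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(serverUrl + "/api/datasets/" + datasetId + "/deleteFiles"))
                .header("X-Dataverse-key", apiToken)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(fileIds))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}

Per the review discussion below, a 400 is returned if any of the listed IDs are not in the dataset.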

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:
Note for posterity: To be able to use DELETE with a payload, Payara has to have an option set:

asadmin set
configs.config.default-config.network-config.protocols.protocol.http-listener-1.http.allow-payload-for-undefined-http-methods=true

The consensus in Slack was not to use this approach, but if we decide to in the future, we'd need to include instructions for making this change in Payara.

Suggestions on how to test this:
There's an IT test that creates a dataset with files, tries to delete some, publishes the dataset, and then tries to delete others. It also tests using a bad fileId or datasetId, and using a user without permissions. Those or other cases could also be tested manually. The documentation (in the native API section) gives a curl example.
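
As a rough sketch of the failure-case assertions described above (REST Assured style, with a hypothetical endpoint path and placeholder tokens/IDs; the actual IT test in the PR is the reference):

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.anyOf;
import static org.hamcrest.Matchers.is;

public class DeleteFilesFailureCasesSketch {

    // A file ID that is not in the dataset should be rejected with a 400.
    static void badFileIdIsRejected(String apiToken, long datasetId) {
        given().header("X-Dataverse-key", apiToken)
                .header("Content-Type", "application/json")
                .body("[999999]") // bogus file ID
                .put("/api/datasets/" + datasetId + "/deleteFiles") // assumed path
                .then().statusCode(400);
    }

    // A user without permission on the dataset should not be able to delete files.
    static void userWithoutPermissionIsRejected(String noPermsToken, long datasetId, long realFileId) {
        given().header("X-Dataverse-key", noPermsToken)
                .header("Content-Type", "application/json")
                .body("[" + realFileId + "]")
                .put("/api/datasets/" + datasetId + "/deleteFiles") // assumed path
                .then().statusCode(anyOf(is(401), is(403))); // exact status is an assumption
    }
}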

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: included.

Additional documentation:

@coveralls commented on Feb 7, 2025

Coverage Status

coverage: 22.726% (-0.01%) from 22.736% when pulling af2bcc5 on GlobalDataverseCommunityConsortium:DANS-bulk_file_delete into 2210d16 on IQSS:develop.

@qqmyers added the Size: 3 (a percentage of a sprint; 2.1 hours), GDCC: DANS (related to GDCC work for DANS), and SPA (these changes are required for the Dataverse SPA) labels on Feb 8, 2025
@qqmyers marked this pull request as ready for review on February 8, 2025, 13:47
// A DataFile may have no PID of its own; fall back to the owning dataset's global ID.
if ((pid == null) && (dvObject instanceof DataFile df)) {
    pid = df.getOwner().getGlobalId();
}
pidProvider = PidUtil.getPidProvider(pid.getProviderId());
XMLStreamWriter xmlw = XMLOutputFactory.newInstance().createXMLStreamWriter(outputStream);
xmlw.writeStartElement("resource");
boolean deaccessioned = false;
Contributor

This change is already in a PR that is in QA. Once it's merged, please remove this and merge from develop.

@qqmyers removed their assignment on Feb 11, 2025
@stevenwinship removed their assignment on Feb 11, 2025
@cmbz added the FY25 Sprint 16 (2025-01-29 - 2025-02-12) label on Feb 12, 2025
@qqmyers added this to the 6.6 milestone on Feb 12, 2025
@cmbz added the FY25 Sprint 17 (2025-02-12 - 2025-02-26) label on Feb 12, 2025
fileIdList.add(((JsonNumber) value).longValue());
}
// Find the files to be deleted
List<FileMetadata> filesToDelete = dataset.getOrCreateEditVersion().getFileMetadatas().stream()
Contributor

@qqmyers : maybe require all the file ids listed in the request to be in the dataset, instead of just ignoring the ones not found?

Member Author

They aren't just ignored. Line 5400 below will return a 400 if all the files aren't in the dataset.

Contributor

OK, but still, it could fail faster and return a list of missing IDs.

Member Author

How could it fail faster?

Contributor

What I propose is compiling the list of file IDs in the request that are not in the dataset and returning the 400 status at line 5395 with a message in the body like "Files not found in dataset: [..ids..]". If I understand the code correctly, the failure will currently be thrown out of the commandEngine.submit() call.

Member Author

It is currently thrown out in the second of the next two if clauses. The first checks whether no files match at all, and the second checks whether there are fewer files to delete than were in the original list. (Basically, unless all the files are in the latest dataset version, you get a 400 error, but the message differs depending on whether none or only some were found.)

In general, I'm hesitant to do extra work to help an API caller correct their input - any script using the call should be getting the right info to send, and the list of file ids in the dataset is available through other calls. That said, if there's a use case where it is hard for a script to figure out what to send, we could return more details. Perhaps even send the ones that were found instead since that's really the useful list?
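
For illustration, a sketch of the fail-fast check being discussed, meant to slot in before the existing if clauses (variable names follow the quoted hunk above; badRequest and the message wording are assumptions):

// Requires java.util.Set, java.util.List, and java.util.stream.Collectors.
// Collect the IDs of files present in the edit version of the dataset.
Set<Long> presentIds = dataset.getOrCreateEditVersion().getFileMetadatas().stream()
        .map(fm -> fm.getDataFile().getId())
        .collect(Collectors.toSet());
// Any requested ID not in that set is reported back explicitly.
List<Long> missingIds = fileIdList.stream()
        .filter(id -> !presentIds.contains(id))
        .collect(Collectors.toList());
if (!missingIds.isEmpty()) {
    // Fail fast with the specific IDs instead of a generic "fewer files than requested" message.
    return badRequest("Files not found in dataset: " + missingIds);
}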

@ofahimIQSS self-assigned this on Feb 18, 2025
@ofahimIQSS (Contributor)

I see continuous integration keeps failing for this PR.

@ofahimIQSS (Contributor)

Testing passed internally; no issues found. Merging PR.

@ofahimIQSS merged commit b0d136c into IQSS:develop on Feb 25, 2025
12 checks passed
@ofahimIQSS removed their assignment on Feb 25, 2025