Feature Request: (internal request) Add quota-like limit on the number of files in a dataset #11275

Open
landreev opened this issue Feb 19, 2025 · 1 comment
Labels: Size: 80 (a percentage of a sprint; 56 hours), Type: Feature (a feature request)

Comments

@landreev (Contributor)

Opening this issue per a recent discussion with, and at the request of, @sbarbosadataverse. This feature is long overdue and will be very useful to have.

Short version: thousands (or more) of files in a dataset cause all sorts of issues. Most users would be better off with their data repackaged into fewer, larger bundles. However, we make no attempt to stop users from uploading unlimited numbers of files, which results in problems both for them and for us. We need to start enforcing a limit as soon as possible.

More detailed justification/rationale: Dataverse is not particularly good at handling datasets with large numbers of files (more than a few hundred, really). It is unlikely that the SPA will fully fix, or even dramatically improve, this situation. (Datasets with thousands or tens of thousands of files - as is the case with a few monstrous examples at IQSS - are inherently hard for humans to page through and otherwise unmanageable, and there are inherent performance problems with such datasets in the API as well.) In reality, it is exceptionally rare for a depositor to have an actual need for that many separate files (a hard requirement for the individual files to have their own DOIs would be one such scenario). Most people would be far better off with their data repackaged as fewer, larger bundles. But since we make no effort to stop them, users simply upload however many files they have, without understanding the consequences.

In part this could be addressed by better education, documentation, and warnings. Ultimately, though, we need a mechanism for enforcing a hard stop when a pre-set limit is reached. In other words, it should be implemented along the same lines as the size-based storage quotas we already support: once the limit is reached, the dataset becomes read-only until some files are deleted. Just as with size quotas, it should be possible to set this limit for the instance as a whole, with an option to override it with specific limits for sub-collections or even individual datasets if needed.
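
A minimal sketch of how such a check might look, purely for illustration and not actual Dataverse code: every class and method name below (FileCountLimitChecker, LimitResolver, and so on) is hypothetical, and the dataset-over-collection-over-instance precedence simply mirrors the description above.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of a per-dataset file-count limit check.
 * Not Dataverse code; names and structure are assumptions.
 */
public class FileCountLimitChecker {

    /** Supplies configured limits at each level, if any. */
    public interface LimitResolver {
        Optional<Integer> datasetLimit(String datasetId);
        Optional<Integer> collectionLimit(String collectionAlias);
        Optional<Integer> instanceLimit();
    }

    private final LimitResolver resolver;

    public FileCountLimitChecker(LimitResolver resolver) {
        this.resolver = resolver;
    }

    /**
     * Resolves the effective limit, preferring the most specific setting
     * (dataset, then collection, then instance); empty means no limit is configured.
     */
    public Optional<Integer> effectiveLimit(String datasetId, String collectionAlias) {
        return resolver.datasetLimit(datasetId)
                .or(() -> resolver.collectionLimit(collectionAlias))
                .or(resolver::instanceLimit);
    }

    /**
     * Rejects an upload that would push the dataset past its effective limit.
     * Once the limit is reached, the dataset is effectively read-only for new
     * files until some existing files are deleted.
     */
    public void checkCanAddFiles(String datasetId, String collectionAlias,
                                 int currentFileCount, int filesToAdd) {
        effectiveLimit(datasetId, collectionAlias).ifPresent(limit -> {
            if (currentFileCount + filesToAdd > limit) {
                throw new IllegalStateException(
                        "Dataset file-count limit of " + limit + " would be exceeded ("
                        + currentFileCount + " existing + " + filesToAdd + " new).");
            }
        });
    }
}
```

Using Optional here makes "no limit configured at any level" an explicit, distinct outcome rather than overloading a sentinel value such as zero or -1, which matches how the size-based quotas treat "no quota set".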

The feature is important enough that I believe investing in adding it to the "old", JSF-based UI is entirely justified. I also believe the amount of UI work would be very manageable, since the functionality can be built around, or added to, the already existing size-based quota limits.

@landreev added the Type: Feature label on Feb 19, 2025
@cmbz commented on Feb 24, 2025:

2025/02/24

@cmbz added the Size: 80 label on Feb 24, 2025
Projects: No status
Development: No branches or pull requests
4 participants