Feature Request: (internal request) Add quota-like limit on the number of files in a dataset #11275

Open
landreev opened this issue Feb 19, 2025 · 1 comment
Labels: Size: 80 (a percentage of a sprint; 56 hours), Type: Feature (a feature request)

Comments

@landreev (Contributor)

Opening this issue per a recent discussion with, and at the request of, @sbarbosadataverse. This feature is long overdue and will be very useful to have.

Short version: thousands (or more) of files in a dataset cause all sorts of issues. Most users would be better off with their data repackaged into fewer, larger bundles. However, we make no attempt to stop users from uploading unlimited numbers of files, which results in problems both for them and for us. We need to start enforcing a limit as soon as possible.

More detailed justification/rationale: Dataverse is not particularly good at handling datasets with large numbers of files (more than a few hundred, really). It is unlikely that the SPA will fully fix, or even dramatically improve, this situation. (Datasets with thousands or tens of thousands of files - as is the case with a few monstrous examples at IQSS - are inherently hard for humans to page through and otherwise unmanageable, and there are inherent performance problems with such datasets in the API as well.) In reality, it is exceptionally rare for a depositor to have an actual need for that many separate files (a hard requirement for the individual files to have their own DOIs would be one such scenario). Most people would be far better off with their data repackaged as fewer, larger bundles. But since we make no effort to stop them, users simply upload however many files they have, without understanding the consequences.

In part this could be addressed by better education, documentation, and warnings. Ultimately, though, we need a mechanism for enforcing a hard stop when a pre-set limit is reached. In other words, it should be implemented along the same lines as the size-based storage quotas we already support: once the limit is reached, the dataset becomes read-only until some files are deleted. Just as with size quotas, it should be possible to set this limit for the instance as a whole, with an option to override it with specific limits for sub-collections or even individual datasets if needed.
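
A minimal sketch of how such a check might look, purely for illustration and not actual Dataverse code: every class and method name below (FileCountLimitChecker, LimitResolver, and so on) is hypothetical, and the dataset-over-collection-over-instance precedence simply mirrors the description above.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of a per-dataset file-count limit check.
 * Not Dataverse code; names and structure are assumptions.
 */
public class FileCountLimitChecker {

    /** Supplies configured limits at each level, if any. */
    public interface LimitResolver {
        Optional<Integer> datasetLimit(String datasetId);
        Optional<Integer> collectionLimit(String collectionAlias);
        Optional<Integer> instanceLimit();
    }

    private final LimitResolver resolver;

    public FileCountLimitChecker(LimitResolver resolver) {
        this.resolver = resolver;
    }

    /**
     * Resolves the effective limit, preferring the most specific setting
     * (dataset, then collection, then instance); empty means no limit is configured.
     */
    public Optional<Integer> effectiveLimit(String datasetId, String collectionAlias) {
        return resolver.datasetLimit(datasetId)
                .or(() -> resolver.collectionLimit(collectionAlias))
                .or(resolver::instanceLimit);
    }

    /**
     * Rejects an upload that would push the dataset past its effective limit.
     * Once the limit is reached, the dataset is effectively read-only for new
     * files until some existing files are deleted.
     */
    public void checkCanAddFiles(String datasetId, String collectionAlias,
                                 int currentFileCount, int filesToAdd) {
        effectiveLimit(datasetId, collectionAlias).ifPresent(limit -> {
            if (currentFileCount + filesToAdd > limit) {
                throw new IllegalStateException(
                        "Dataset file-count limit of " + limit + " would be exceeded ("
                        + currentFileCount + " existing + " + filesToAdd + " new).");
            }
        });
    }
}
```

Using Optional here makes "no limit configured at any level" an explicit, distinct outcome rather than overloading a sentinel value such as zero or -1, which matches how the size-based quotas treat "no quota set".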

The feature is important enough that I believe investing in adding it to the "old", JSF-based UI is entirely justified. I also believe the amount of UI work would be very manageable, since the functionality can be built around, or added to, the already existing size-based quota limits.

@landreev added the Type: Feature label on Feb 19, 2025
@cmbz commented on Feb 24, 2025:

2025/02/24

@cmbz added the Size: 80 label on Feb 24, 2025
Projects: No status
Development: No branches or pull requests
4 participants