Opening this issue per a recent discussion with, and at the request of, @sbarbosadataverse.
This feature is long overdue and will be very useful to have.
Short version: a Dataset with thousands (or more) of files causes all sorts of issues. Most users would be better off with their data repackaged into fewer, larger bundles. However, we make no attempt to stop users from uploading unlimited numbers of files, which results in problems for them and for us. So we need to start doing this ASAP.
More detailed justification/rationale: Dataverse is not particularly good at handling datasets with large numbers of files (more than a few hundred, really). It is not likely that the SPA will fully fix, or even dramatically improve, this situation. (Datasets with thousands or tens of thousands of files - as is the case with a few monstrous datasets at IQSS - are inherently hard for humans to page through and otherwise unmanageable, and there are inherent performance problems with such datasets in the API as well.) In reality, it is exceptionally rare that a depositor has an actual need for that many separate files (a hard requirement for the individual files to have their own DOIs would be one such scenario). Most people would be far better off with their data repackaged as fewer, larger bundles. But since we make no effort to stop them, users simply upload however many files they have, without understanding the consequences.
In part this could be addressed by better education/documentation/warnings. But ultimately, we need a mechanism for enforcing a hard stop when a pre-set limit is reached. In other words, it should be implemented along the same lines as the size-based storage quotas we now have: when the limit is reached, the dataset becomes read-only until/unless some files are deleted. Just like with the size quotas, it should be possible to set this limit for the instance as a whole, with an option to override it with specific limits for sub-collections or even individual datasets, if needed (a rough sketch of that cascade is below).
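A minimal, self-contained sketch of how that cascade could work, assuming the same resolution order as the size quotas (dataset-level override first, then the nearest collection-level override, then the instance-wide default). All class and method names here are hypothetical and for illustration only; they are not existing Dataverse classes or APIs.

```java
import java.util.Optional;

class FileCountLimitSketch {

    /** Hypothetical stand-in for a collection ("dataverse") with an optional per-collection limit. */
    record DvCollection(Optional<Integer> fileCountLimit, DvCollection parent) {}

    /** Hypothetical stand-in for a dataset with an optional per-dataset override and a current file count. */
    record Dataset(Optional<Integer> fileCountLimit, DvCollection owner, int currentFileCount) {}

    /** Instance-wide default, analogous to a JVM option or database setting; empty means "no limit". */
    static Optional<Integer> instanceDefaultLimit = Optional.of(10_000);

    /**
     * Resolve the effective limit: a dataset-level override wins, then the nearest
     * ancestor collection override, then the instance-wide default.
     */
    static Optional<Integer> effectiveLimit(Dataset ds) {
        if (ds.fileCountLimit().isPresent()) {
            return ds.fileCountLimit();
        }
        for (DvCollection c = ds.owner(); c != null; c = c.parent()) {
            if (c.fileCountLimit().isPresent()) {
                return c.fileCountLimit();
            }
        }
        return instanceDefaultLimit;
    }

    /**
     * Upload-path check: refuse new files once the limit is reached, making the
     * dataset read-only for additions until files are deleted (mirroring the size quotas).
     */
    static void checkCanAddFiles(Dataset ds, int filesBeingAdded) {
        effectiveLimit(ds).ifPresent(limit -> {
            if (ds.currentFileCount() + filesBeingAdded > limit) {
                throw new IllegalStateException(
                        "File count limit of " + limit + " reached; delete files before adding more.");
            }
        });
    }

    public static void main(String[] args) {
        DvCollection root = new DvCollection(Optional.empty(), null);
        DvCollection subCollection = new DvCollection(Optional.of(500), root); // per-collection override
        Dataset dataset = new Dataset(Optional.empty(), subCollection, 498);

        checkCanAddFiles(dataset, 1); // OK: 499 <= 500
        try {
            checkCanAddFiles(dataset, 3); // would exceed the 500-file limit
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

As with the size quotas, the same resolution logic could back both the API-level enforcement and the UI messaging, so the check would live in one place.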
The feature is important enough that I believe investing in adding it to the "old", JSF-based UI is entirely justified. I also believe the amount of UI work needed would be very manageable, since the functionality can be built around/added to the already existing size-based quota limits.