Add support for WebDataset #4

Open
2 of 3 tasks
jbilcke-hf opened this issue Mar 3, 2025 · 0 comments
Assignees
Labels
feature request New feature or request update available Feature of fix is pushed but needs testing

Comments

Owner

jbilcke-hf commented Mar 3, 2025

Context

When working with hundreds of videos in VMS, we often have to resort to uploading multiple .zip files (e.g. 1 GB each, to avoid mega-files).

This practice of having multiple archives containing .mp4 videos + .txt captions is nearly identical to the WebDataset file format, which is designed for large AI/ML training datasets.
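To illustrate the similarity, here is a minimal sketch of the WebDataset convention using only Python's standard `tarfile` module: a shard is a plain tar archive in which files sharing a basename (the key) form one sample, and the extension names the field. The helper names `write_shard` and `read_shard`, the sample keys, and the placeholder bytes are assumptions made for this example; they are not part of VMS or of the `webdataset` library.

```python
import io
import tarfile
from collections import defaultdict


def write_shard(samples, fileobj):
    """Write (key, {ext: bytes}) samples as a WebDataset-style tar shard."""
    with tarfile.open(fileobj=fileobj, mode="w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                # Files of one sample share the same basename: key.mp4, key.txt
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(fileobj):
    """Group tar members back into samples keyed by basename."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Assumes flat names: the key is everything before the first dot.
            key, _, ext = member.name.partition(".")
            samples[key][ext] = tar.extractfile(member).read()
    return dict(samples)


# Hypothetical samples; the bytes stand in for real .mp4 data.
demo_samples = [
    ("clip_0001", {"mp4": b"fake-video-bytes-1", "txt": b"a cat jumping"}),
    ("clip_0002", {"mp4": b"fake-video-bytes-2", "txt": b"a dog running"}),
]

buffer = io.BytesIO()
write_shard(demo_samples, buffer)
buffer.seek(0)
restored = read_shard(buffer)
```

Each existing .zip of `.mp4` + `.txt` pairs would map to one such tar shard, which is why the migration path looks short.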

Proposal

  • Add basic support for uploading/importing WebDataset
  • Implement end-to-end support for WebDataset (see branch webdataset)
  • Propose WebDataset support to Finetrainers

For point 2, "end-to-end support" means performing all of our processing and transformations (black band removal, captioning, etc.) inside the WebDataset archives, instead of on the OS file system.
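As a rough sketch of what processing inside the archive could look like, the snippet below streams samples from one tar shard to another and rewrites captions in memory, without extracting anything to disk. It uses only the standard library; `transform_shard`, `build_demo_shard`, and the caption cleanup function are hypothetical names chosen for illustration, and the actual `webdataset` branch may work differently.

```python
import io
import tarfile


def transform_shard(src, dst, transform_caption):
    """Stream a tar shard from src to dst, rewriting .txt members in memory."""
    with tarfile.open(fileobj=src, mode="r|*") as tar_in, \
            tarfile.open(fileobj=dst, mode="w|") as tar_out:
        for member in tar_in:
            if not member.isfile():
                continue
            payload = tar_in.extractfile(member).read()
            if member.name.endswith(".txt"):
                text = transform_caption(payload.decode("utf-8"))
                payload = text.encode("utf-8")
            # Video members (and any other field) are copied through unchanged.
            info = tarfile.TarInfo(name=member.name)
            info.size = len(payload)
            tar_out.addfile(info, io.BytesIO(payload))


def build_demo_shard():
    """Create a tiny in-memory shard with placeholder video bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [
            ("clip_0001.mp4", b"fake-video-bytes"),
            ("clip_0001.txt", b"  a cat jumping \n"),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


source = build_demo_shard()
target = io.BytesIO()
# Stand-in for a real captioning or cleanup step.
transform_shard(source, target, lambda text: text.strip().capitalize())

target.seek(0)
with tarfile.open(fileobj=target, mode="r") as tar:
    result = {m.name: tar.extractfile(m).read() for m in tar if m.isfile()}
```

A video transformation such as black band removal would slot into the same loop as another branch on the member extension, although in practice it would probably need a temporary file or a pipe to an external tool.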

Using WebDataset internally doesn't automatically make it possible to train on datasets larger than what Finetrainers can support. The idea is rather to give VMS a long-term direction: to be architecturally independent and to adopt a future-proof design.

The vision for VMS is to be a standalone app that can be used for annotation only, and to potentially support alternative training backends (Job API, Replicate, Fal, diffusion-pipe, etc.).

jbilcke-hf added the feature request (New feature or request) label Mar 3, 2025
jbilcke-hf self-assigned this Mar 3, 2025
jbilcke-hf added the update available (Feature of fix is pushed but needs testing) label Mar 5, 2025