
Allow images; Remove LLM generated prefixes; Allow JSON/JSONL; Fix bugs #158

Merged
merged 16 commits into main from dataset-improvements
Jan 6, 2025

Conversation

@a-r-r-o-w (Owner)

No description provided.

@a-r-r-o-w a-r-r-o-w requested a review from sayakpaul December 27, 2024 20:57
# - Using a CSV: caption_column and video_column must be some column in the CSV. One could
# make use of other columns too, such as a motion score or aesthetic score, by modifying the
# logic in CSV processing.
# - Using two files containing line-separated captions and relative paths to videos.
# - Using a JSON file containing a list of dictionaries, where each dictionary has a `caption_column` and `video_column` key.
@sayakpaul (Collaborator)

Like this?

[{"prompt": ..., "video": ...}, ...]

@a-r-r-o-w (Owner, Author)

Yep. An example dataset would be this: https://huggingface.co/datasets/omni-research/DREAM-1K
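For reference, a minimal sketch of how such a JSON/JSONL metadata file could be parsed into prompts and video paths (the filename and the default column names are illustrative, not the trainer's actual API):

import json
from pathlib import Path

def load_metadata(path: str, caption_column: str = "prompt", video_column: str = "video"):
    # JSON files hold a list of dicts; JSONL files hold one dict per line.
    text = Path(path).read_text()
    if path.endswith(".jsonl"):
        entries = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        entries = json.loads(text)
    prompts = [entry[caption_column] for entry in entries]
    videos = [entry[video_column] for entry in entries]
    return prompts, videos

prompts, videos = load_metadata("metadata.jsonl")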

Comment on lines 86 to 91
# Clean LLM start phrases
for i in range(len(self.prompts)):
    self.prompts[i] = self.prompts[i].strip()
    for phrase in COMMON_LLM_START_PHRASES:
        if self.prompts[i].startswith(phrase):
            self.prompts[i] = self.prompts[i].removeprefix(phrase).strip()
@sayakpaul (Collaborator)

Should this be user-configurable?

And maybe we should also note this in our data-prep guide?
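One possible shape for making it user-configurable, as a minimal sketch (the `remove_llm_prefixes` flag and the phrase list below are illustrative, not the repo's actual interface):

# Illustrative entries only; the real list lives in the repo's constants.
COMMON_LLM_START_PHRASES = (
    "In the video,",
    "This video showcases",
)

def clean_prompts(prompts, remove_llm_prefixes=False):
    prompts = [prompt.strip() for prompt in prompts]
    if remove_llm_prefixes:
        for i in range(len(prompts)):
            for phrase in COMMON_LLM_START_PHRASES:
                if prompts[i].startswith(phrase):
                    prompts[i] = prompts[i].removeprefix(phrase).strip()
    return prompts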

@sayakpaul (Collaborator) left a comment

Thanks!

@a-r-r-o-w a-r-r-o-w changed the title Remove LLM generated prefixes and allowing loading from JSON Allow images to be loaded; Remove LLM generated prefixes; Allow loading from JSON Dec 30, 2024
@a-r-r-o-w a-r-r-o-w changed the title Allow images to be loaded; Remove LLM generated prefixes; Allow loading from JSON Allow images; Remove LLM generated prefixes; Allow JSON/JSONL; Fix bugs Dec 31, 2024
@a-r-r-o-w (Owner, Author)

After the HunyuanVideo fixes, the precomputed vs non-precomputed runs match almost exactly when starting with the same parameters. The weights converge to very similar values, and the validation videos demonstrate this as well: https://api.wandb.ai/links/aryanvs/3aixk4xk

@sayakpaul (Collaborator)

Is this ready for another review?

@a-r-r-o-w (Owner, Author)

No.

@sayakpaul sayakpaul mentioned this pull request Jan 4, 2025
@a-r-r-o-w (Owner, Author)

Since this makes some changes to the README, I'd prefer to merge after #175. I'll also move the dataset.md file into the docs/ folder here, because it's currently in assets/.

@sayakpaul (Collaborator)

SG!

LMK if you would like me to review, too.

@@ -955,7 +956,9 @@ def validate(self, step: int, final_validation: bool = False) -> None:
     width=width,
     num_frames=num_frames,
     num_videos_per_prompt=self.args.num_validation_videos_per_prompt,
-    generator=self.state.generator,
+    generator=torch.Generator(device=accelerator.device).manual_seed(
@a-r-r-o-w (Owner, Author)

Here, if we use the state generator, we get different validation images/videos every time. That is not very indicative of whether training is working, so we want to ensure each validation generation starts from the same seed.
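To illustrate the difference, a minimal sketch (the seed 42 is arbitrary): a reused generator advances its internal state between validations, while a freshly seeded one reproduces the same noise every time.

import torch

# Reusing a stateful generator: successive validations sample different noise.
gen = torch.Generator().manual_seed(42)
a = torch.randn(4, generator=gen)
b = torch.randn(4, generator=gen)
assert not torch.equal(a, b)

# Fresh generator with a fixed seed: identical noise each validation, so
# differences between outputs reflect training progress, not sampling noise.
c = torch.randn(4, generator=torch.Generator().manual_seed(42))
d = torch.randn(4, generator=torch.Generator().manual_seed(42))
assert torch.equal(c, d)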

     revision: Optional[str] = None,
     cache_dir: Optional[str] = None,
     **kwargs,
 ) -> Dict[str, Union[nn.Module, FlowMatchEulerDiscreteScheduler]]:
     transformer = HunyuanVideoTransformer3DModel.from_pretrained(
         model_id, subfolder="transformer", torch_dtype=transformer_dtype, revision=revision, cache_dir=cache_dir
     )
-    scheduler = FlowMatchEulerDiscreteScheduler()
+    scheduler = FlowMatchEulerDiscreteScheduler(shift=shift)
@a-r-r-o-w (Owner, Author)

HunyuanVideo uses 7.0 as the flow shift for inference. By default this value is 1.0, which corresponds to the original flow matching objective, but there have been reports of success and even better results with varying values of shift, so I think it makes sense to support it.
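For context, the shift warps the sigma schedule toward higher noise levels. A minimal sketch of the transform (this mirrors how flow-matching schedulers typically apply a static shift, not the full scheduler logic):

import numpy as np

def shift_sigmas(sigmas: np.ndarray, shift: float) -> np.ndarray:
    # shift == 1.0 leaves the schedule unchanged (original flow matching);
    # larger values spend more steps at high noise, which reportedly helps
    # large video models such as HunyuanVideo (shift == 7.0 at inference).
    return shift * sigmas / (1 + (shift - 1) * sigmas)

sigmas = np.linspace(1.0, 1e-3, num=30)
assert np.allclose(shift_sigmas(sigmas, 1.0), sigmas)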



logger = get_logger(__name__)


class VideoDataset(Dataset):
# TODO(aryan): This needs a refactor with separation of concerns.
@a-r-r-o-w (Owner, Author)

Dataset part needs a complete rewrite. Will take it up in follow-up PRs

            },
            "dataloader_arguments": {
                "dataloader_num_workers": self.dataloader_num_workers,
                "pin_memory": self.pin_memory,
            },
            "diffusion_arguments": {
                "flow_resolution_shifting": self.flow_resolution_shifting,
@a-r-r-o-w (Owner, Author)

I still haven't had success with flow_resolution_shifting (it needs to be added for LTX, I believe), so we still don't handle adjusting the sigmas when this is specified. I'll add the actual logic that makes use of it in the image-to-video PR after further iterations.
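For reference, a sketch of the kind of resolution-dependent shifting used by SD3/Flux-style pipelines, which is roughly what flow_resolution_shifting would need to do (the constants are commonly cited defaults from those pipelines, used here purely for illustration):

import math

def calculate_mu(image_seq_len: int, base_seq_len: int = 256, max_seq_len: int = 4096,
                 base_shift: float = 0.5, max_shift: float = 1.15) -> float:
    # Interpolate the shift linearly in the latent sequence length, so
    # larger resolutions receive a stronger shift.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    return image_seq_len * m + (base_shift - m * base_seq_len)

def shift_sigma(sigma: float, mu: float) -> float:
    # Time-shift a single sigma in (0, 1] according to mu.
    return math.exp(mu) / (math.exp(mu) + (1 / sigma - 1))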

@a-r-r-o-w a-r-r-o-w requested a review from sayakpaul January 5, 2025 16:23
@a-r-r-o-w (Owner, Author)

@sayakpaul Ready for another review. Doing a small run to check that the validation generator changes work as expected.

         artifact_type = value["type"]
         artifact_value = value["value"]
         if artifact_type not in ["image", "video"] or artifact_value is None:
             continue

         extension = "png" if artifact_type == "image" else "mp4"
         filename = "validation-" if not final_validation else "final-"
-        filename += f"{step}-{accelerator.process_index}-{prompt_filename}.{extension}"
+        filename += f"{step}-{accelerator.process_index}-{index}-{prompt_filename}.{extension}"
@a-r-r-o-w (Owner, Author)

Just caught my eye: if we use the same prompt multiple times but want to validate at different resolutions, it might end up using the same filename. Including the index should be safer, imo.
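A quick illustration of the collision (values made up): without the index, two generations of the same prompt map to one filename.

step, process_index, prompt_filename = 500, 0, "a-cat-playing-piano"

# Old scheme: both resolutions of the same prompt collide on one name.
old = f"validation-{step}-{process_index}-{prompt_filename}.mp4"

# New scheme: the per-artifact index keeps every generation distinct.
new = [f"validation-{step}-{process_index}-{index}-{prompt_filename}.mp4" for index in range(2)]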

@a-r-r-o-w a-r-r-o-w force-pushed the dataset-improvements branch from 87b848d to 33a8f6b Compare January 5, 2025 21:24
@a-r-r-o-w (Owner, Author)

I can confirm that the validation generator changes work as expected. Here's a demo video using the same starting generator across different training steps:

output.mp4

@a-r-r-o-w (Owner, Author)

@sayakpaul Proceeding with the merge here, as two folks who've helped with initial feedback wanted to try this with FP8 training. Currently, creating a common branch with these changes and the FP8 changes produces a merge conflict, so I will resolve it for them in the FP8 branch. If you have any suggestions on things to improve here, I'm happy to iterate in future PRs.

@a-r-r-o-w a-r-r-o-w merged commit 38413aa into main Jan 6, 2025
1 check passed
@a-r-r-o-w a-r-r-o-w deleted the dataset-improvements branch January 6, 2025 13:22