Allow images; Remove LLM generated prefixes; Allow JSON/JSONL; Fix bugs #158
Conversation
# - Using a CSV: caption_column and video_column must be some column in the CSV. One could
#   make use of other columns too, such as a motion score or aesthetic score, by modifying the
#   logic in CSV processing.
# - Using two files containing line-separated captions and relative paths to videos.
# - Using a JSON file containing a list of dictionaries, where each dictionary has a `caption_column` and `video_column` key.
Like this?
[{"prompt": ..., "video": ...}, ...]
Yep. An example dataset would be this: https://huggingface.co/datasets/omni-research/DREAM-1K
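A minimal sketch of reading such a JSON list into parallel caption/video lists; the default key names `prompt` and `video` follow the example above, but the actual column names are configurable in the loader:

```python
import json

def load_json_dataset(path, caption_column="prompt", video_column="video"):
    """Read a JSON file containing a list of dicts and split it into
    parallel lists of captions and relative video paths."""
    with open(path, "r") as f:
        entries = json.load(f)
    prompts = [entry[caption_column] for entry in entries]
    videos = [entry[video_column] for entry in entries]
    return prompts, videos
```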
finetrainers/dataset.py
Outdated
# Clean LLM start phrases
for i in range(len(self.prompts)):
    self.prompts[i] = self.prompts[i].strip()
    for phrase in COMMON_LLM_START_PHRASES:
        if self.prompts[i].startswith(phrase):
            self.prompts[i] = self.prompts[i].removeprefix(phrase).strip()
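Pulled out of the class, the cleaning step is roughly this standalone function; the phrase list below is a small assumed sample, not the actual contents of `COMMON_LLM_START_PHRASES`:

```python
# Assumed sample; the real COMMON_LLM_START_PHRASES list is longer.
COMMON_LLM_START_PHRASES = (
    "In the video,",
    "This video shows",
)

def clean_prompt(prompt: str) -> str:
    """Strip a leading LLM boilerplate phrase, if present, plus surrounding whitespace."""
    prompt = prompt.strip()
    for phrase in COMMON_LLM_START_PHRASES:
        if prompt.startswith(phrase):
            prompt = prompt.removeprefix(phrase).strip()
    return prompt
```

Note that `str.removeprefix` requires Python 3.9+.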
Should this be user-configured?
And maybe we should also note this from our data-prep guide?
Thanks!
After the HunyuanVideo fixes, the precomputed vs non-precomputed runs match almost exactly when starting with the same parameters. The weights converge to very similar values and the validation videos demonstrate this as well: https://api.wandb.ai/links/aryanvs/3aixk4xk
Is this ready for another review?
No.
Since this makes some changes to the README, would prefer to merge after #175. Will move the dataset.md file into the
SG! LMK if you would like me to review, too. |
@@ -955,7 +956,9 @@ def validate(self, step: int, final_validation: bool = False) -> None:
                width=width,
                num_frames=num_frames,
                num_videos_per_prompt=self.args.num_validation_videos_per_prompt,
-               generator=self.state.generator,
+               generator=torch.Generator(device=accelerator.device).manual_seed(
Here, if we use the state, we get different validation images/videos every time. That is not very indicative of whether training is working, so we want to ensure each generation starts with the same seed.
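The idea can be sketched as a small helper; the function name and `seed` argument are illustrative, not the trainer's actual API:

```python
import torch

def make_validation_generator(seed: int, device: str = "cpu") -> torch.Generator:
    """Create a fresh generator with a fixed seed so every validation run
    starts from the same noise, making outputs comparable across steps."""
    return torch.Generator(device=device).manual_seed(seed)
```

Because a new generator is built per validation run, earlier sampling during training cannot advance its state.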
    revision: Optional[str] = None,
    cache_dir: Optional[str] = None,
    **kwargs,
) -> Dict[str, Union[nn.Module, FlowMatchEulerDiscreteScheduler]]:
    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id, subfolder="transformer", torch_dtype=transformer_dtype, revision=revision, cache_dir=cache_dir
    )
-   scheduler = FlowMatchEulerDiscreteScheduler()
+   scheduler = FlowMatchEulerDiscreteScheduler(shift=shift)
HunyuanVideo uses 7.0 as the flow shift for inference. By default this value is 1.0, which corresponds to the original flow matching objective, but there have been reports of success, and even better results, with varying values of shift, so I think it makes sense to support it.
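For reference, this is, to my understanding, the time-shift formula that flow-matching schedulers apply to their sigmas; a plain-Python sketch:

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Apply the flow-matching time shift to a sigma in [0, 1].
    shift=1.0 leaves sigmas unchanged (the original objective);
    larger shifts push intermediate sigmas toward the noisier end."""
    return shift * sigma / (1 + (shift - 1) * sigma)
```

The endpoints 0 and 1 are fixed points, so only the intermediate timesteps move.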
logger = get_logger(__name__)


class VideoDataset(Dataset):
    # TODO(aryan): This needs a refactor with separation of concerns.
Dataset part needs a complete rewrite. Will take it up in follow-up PRs
    },
    "dataloader_arguments": {
        "dataloader_num_workers": self.dataloader_num_workers,
        "pin_memory": self.pin_memory,
    },
    "diffusion_arguments": {
        "flow_resolution_shifting": self.flow_resolution_shifting,
I still haven't had success with flow_resolution_shifting (needs to be added for LTX I believe), so we still don't handle the case of adjusting sigmas when this is specified. Will add the actual logic that makes use of it in the image-to-video PR after further iterations.
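For context, resolution shifting as used by Flux-style pipelines interpolates a shift value from the latent sequence length, so higher resolutions sample at noisier timesteps. A sketch under that assumption; the default constants mirror values I believe those pipelines use and should be treated as placeholders:

```python
def calculate_resolution_shift(
    image_seq_len: int,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
) -> float:
    """Linearly interpolate a shift value ("mu") from the latent sequence
    length: short sequences get base_shift, long ones get max_shift."""
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
```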
@sayakpaul Ready for another review. Doing a small run to check if the validation generator changes work as expected.
        artifact_type = value["type"]
        artifact_value = value["value"]
        if artifact_type not in ["image", "video"] or artifact_value is None:
            continue

        extension = "png" if artifact_type == "image" else "mp4"
        filename = "validation-" if not final_validation else "final-"
-       filename += f"{step}-{accelerator.process_index}-{prompt_filename}.{extension}"
+       filename += f"{step}-{accelerator.process_index}-{index}-{prompt_filename}.{extension}"
Just caught my eye: if we use the same prompt multiple times but want to validate at different resolutions, we might end up writing to the same filename. Including the artifact index should be safer imo.
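The fix amounts to threading the artifact index into the name; extracted into a hypothetical helper for illustration:

```python
def validation_filename(step, process_index, index, prompt_filename,
                        artifact_type, final=False):
    """Build a filename for a validation artifact. Including `index` keeps
    names unique when one prompt produces multiple artifacts (e.g. at
    different resolutions)."""
    extension = "png" if artifact_type == "image" else "mp4"
    prefix = "final" if final else "validation"
    return f"{prefix}-{step}-{process_index}-{index}-{prompt_filename}.{extension}"
```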
87b848d to 33a8f6b
I can confirm that the validation generator changes work as expected. Here's a demo video using the same starting generator across different training steps: output.mp4
@sayakpaul Proceeding with merge here, as two folks who've helped with initial feedback wanted to try this with FP8 training. Currently, creating a common branch with these changes and the ones for FP8 creates a merge conflict, so I will resolve it for them in the FP8 branch. If you have any suggestions on things to improve here, happy to iterate in future PRs.