
[WIP][tests] add precomputation tests #234

Merged
merged 25 commits into main from add-precompute-tests on Feb 24, 2025

Conversation

sayakpaul
Collaborator

@sayakpaul sayakpaul commented Jan 21, 2025

Adds precomputation tests.

Currently, I have changed the bare-minimum to show the approach taken for the tests. After I have some reviews, I will propagate the changes to the rest of the supported models and make the PR ready for further reviews.

Some further comments in-line.

To run the tests from a DGX or any other internal CUDA machine without using CUDA, run:

```shell
CUDA_VISIBLE_DEVICES="" pytest tests/trainers/
```

Just LMK if you want something changed before proceeding with the review at this stage of the PR. I will make it happen.

TODOs

  • LTX
  • HunyuanVideo
  • Configure runner and action

Comment on lines 18 to 21
```python
try:
    tokenizer = T5Tokenizer.from_pretrained(model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir)
except Exception:
    tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir)
```
Collaborator Author


Not super proud of this, but we cannot use T5Tokenizer on the dummy T5 tokenizer checkpoint; it raises a sentencepiece error.

@sayakpaul sayakpaul requested a review from a-r-r-o-w January 21, 2025 08:30
Comment on lines +18 to +25
```python
try:
    tokenizer = T5Tokenizer.from_pretrained(
        model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir
    )
except Exception:
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, subfolder="tokenizer", revision=revision, cache_dir=cache_dir
    )
```
Collaborator Author


Not proud of the change, but T5Tokenizer cannot be used on a dummy T5 tokenizer checkpoint.

@sayakpaul
Collaborator Author

@a-r-r-o-w LMK what you think of the latest changes.

Owner

@a-r-r-o-w a-r-r-o-w left a comment


Thanks Sayak. The changes look good.

@sayakpaul
Collaborator Author

@a-r-r-o-w LMK if I can apply the tests to the rest of the models. I have taken care of addressing the rest of your feedback.

@a-r-r-o-w
Owner

Yes please, lgtm

@sayakpaul
Collaborator Author

@a-r-r-o-w I have now completed an initial precomputation test suite for all supported T2V models, an essential part of finetrainers!

The three tests are:

  • test_precomputation_txt_format_creates_files() -- ensures the expected number of files is created (and that the expected log messages are emitted).
  • test_precomputation_txt_format_matches_shapes() -- as the name suggests, ensures the created files have the expected shapes.
  • test_precomputation_txt_format_no_redo() -- ensures precomputation is skipped when precomputed data is already present, verified by checking the logger for "Precomputed conditions and latents found. Loading precomputed data".

I believe this should be sufficient for now but LMK.

I haven't yet configured the runner for CI. I want to do that after another round of review, and then it should be good to go.
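For illustration, the no-redo behavior the third test checks can be sketched with a toy precompute step. The `precompute` helper, file names, and directory layout below are hypothetical stand-ins, not the finetrainers API; only the logged message mirrors the one the real test asserts on:

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("precompute_demo")


def precompute(output_dir: Path, num_items: int = 3) -> list:
    """Write dummy latent files, skipping the work if they already exist."""
    output_dir.mkdir(parents=True, exist_ok=True)
    existing = sorted(output_dir.glob("latent_*.pt"))
    if len(existing) == num_items:
        # Same kind of message test_precomputation_txt_format_no_redo looks for.
        logger.info("Precomputed conditions and latents found. Loading precomputed data")
        return existing
    paths = []
    for i in range(num_items):
        path = output_dir / f"latent_{i}.pt"
        path.write_bytes(b"\x00" * 16)  # stand-in for a serialized tensor
        paths.append(path)
    return paths


# First call creates the files; the second call should skip and log instead.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "precomputed"
    first = precompute(out)
    second = precompute(out)
    assert len(first) == len(second) == 3
```

In a pytest test, the second call would be wrapped with the `caplog` fixture to assert that the skip message was actually emitted.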

@sayakpaul sayakpaul requested a review from a-r-r-o-w January 30, 2025 07:11
@sayakpaul sayakpaul marked this pull request as ready for review January 30, 2025 07:12
@sayakpaul
Collaborator Author

@a-r-r-o-w a gentle ping.

@a-r-r-o-w
Owner

I missed the email update on the previous message, so apologies for the delay. Thanks for working on improving this a lot. Will test and review very soon. The test suite will be super helpful for #245 as well 🤗

Owner

@a-r-r-o-w a-r-r-o-w left a comment


Thanks Sayak! Feel free to merge :)

@a-r-r-o-w
Owner

Will do a release after this PR. I finally found some time over the weekend to work on some pain points of the parallel PR and fixed a majority of the bugs, so I'm getting closer to merging it into main.

@sayakpaul
Collaborator Author

Thanks!

Will add CI for these tests in a follow-up.

Also, FYI, I excluded examples/_legacy from the formatting checks.

@sayakpaul sayakpaul merged commit 61d14a7 into main Feb 24, 2025
1 check passed
@sayakpaul sayakpaul deleted the add-precompute-tests branch February 24, 2025 04:04