[do not merge] Rebased refactor #270
base: main
Conversation
Co-authored-by: Will Lin <[email protected]>
Co-authored-by: Peiyuan Zhang <[email protected]>
Signed-off-by: <> Co-authored-by: Will Lin <[email protected]> Co-authored-by: Ubuntu <ubuntu@awesome-gpu-name-8-inst-2tbsnfodvpomxv4tukw2dkfgyvz.c.nv-brev-20240723.internal> Co-authored-by: Ubuntu <ubuntu@awesome-gpu-name-9-inst-2tpydiudxfu1jg9xvpflm7oexie.c.nv-brev-20240723.internal>
…/FastVideo into rebased-refactor
Add wan dit
# Copyright 2024 Stability AI, Katherine Crowson and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
#
# Modified from diffusers==0.29.2
#
# ==============================================================================
Suggestion for licensing:

- All files need Apache license headers. It is recommended to use the simplified one-liner:

  # SPDX-License-Identifier: Apache-2.0

- For files copied from other projects, if that project is under Apache-2.0 as well (e.g. vLLM/SGLang), then just:

  # SPDX-License-Identifier: Apache-2.0
  # Adapted from: <link to the original file>

- For files copied from projects under other licenses, you need to read their copyright terms and make sure you are allowed to copy the code. If so, the header could be:

  # Copyright 2024 Stability AI, Katherine Crowson and The HuggingFace Team. All rights reserved.
  # SPDX-License-Identifier: Apache-2.0
  # Adapted from: <link to the original file>

One way to avoid carrying the full license text in every copied file is to keep a license file per upstream project, such as https://github.com/awslabs/raf/tree/main/licenses
modules["text_encoder"] = text_encoder | ||
modules["text_encoder_2"] = text_encoder_2 |
Where are these modules used, and how are the keys (e.g., text_encoder_2) determined? A more general question: if a model provider wants to implement a pipeline, is it trivial for them to know how they should implement this function?
Same question, I think, as my one above.
self.add_stage("decoding_stage", | ||
DecodingStage()) | ||
|
||
def initialize_pipeline(self, inference_args: InferenceArgs): |
Ditto. It is not clear what should be initialized in this function.
# for stage in self._stages:
#     batch = stage(batch, inference_args)
I definitely prefer this programming model; otherwise it's meaningless to have stages.
If users want custom logic between stages, you should ask them to implement a custom stage (so of course the API for wrapping custom logic into a pipeline stage has to be clean and simple) and add that stage to the pipeline, as sketched below; otherwise it's hard to apply pipeline-level optimizations because you may not have the full picture.
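A minimal sketch of what wrapping custom logic as a stage could look like (assuming a PipelineStage base class with the _call_implementation hook seen in this PR; ScaleLatentsStage and its body are hypothetical):

```python
# Hypothetical custom stage; only _call_implementation and add_stage
# come from this PR, everything else is illustrative.
class ScaleLatentsStage(PipelineStage):

    def _call_implementation(self, batch, inference_args):
        # Arbitrary user logic that would otherwise live between stages.
        batch.latents = batch.latents * 0.5
        return batch

pipeline.add_stage("scale_latents_stage", ScaleLatentsStage())
```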
Returns:
    The updated batch information after this stage's processing.
"""
pass
- Should this raise NotImplementedError?
- Also, the name _call_implementation seems neither clear nor concise. Maybe just call()? It's better to make this a public API without the underscore prefix; see the sketch below.
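For reference, a minimal sketch of that shape (the base-class name is an assumption):

```python
class PipelineStage:

    def call(self, batch, inference_args):
        # Raising makes a missing override an explicit error instead of
        # a silent no-op `pass`.
        raise NotImplementedError(
            f"{type(self).__name__} must implement call()")
```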
# Just call the implementation directly if logging is disabled
return self._call_implementation(batch, inference_args)
For a better user experience, it's better to validate the input and output batch (e.g., check for expected keys) before and after calling the implementation.
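A sketch of that validation, assuming each stage declares the batch attributes it consumes and produces (expected_input_keys and expected_output_keys are hypothetical names, not part of this PR):

```python
def __call__(self, batch, inference_args):
    # Check the stage's declared inputs before running it.
    missing = [k for k in self.expected_input_keys
               if getattr(batch, k, None) is None]
    if missing:
        raise ValueError(f"{type(self).__name__}: missing inputs {missing}")

    batch = self._call_implementation(batch, inference_args)

    # Check the stage actually produced what it promised.
    missing = [k for k in self.expected_output_keys
               if getattr(batch, k, None) is None]
    if missing:
        raise ValueError(f"{type(self).__name__}: missing outputs {missing}")
    return batch
```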
prompt_embeds_2 = batch.prompt_embeds_2

# Run denoising loop
with self.progress_bar(total=num_inference_steps) as progress_bar:
We should not handle display concerns in this function.
# TODO(will): finalize the interface
with set_forward_context(num_step=i, inference_args=inference_args):
    # Run transformer
    noise_pred = self.transformer(
This pattern is confusing, because it means this stage cannot be used without the pipeline. Ideally every stage should be self-contained. For example:

```python
def __init__(self, ...):
    self.transformer = ...

def _call_implementation(self, ...):
    batch = self.transformer(...)
```
Co-authored-by: kevin314 <[email protected]>
inference_hunyuan_v1 instead of v1_inference_hunyuan?
predict.py (Outdated)

@@ -12,7 +12,7 @@
import torchvision
from cog import BasePredictor, Input, Path
from einops import rearrange

from transformers import LlamaForCausalLM
nit: I think you do need a blank line here to split the imports into three sections?
Please run the linter and code-style formatter.
I think we can delete this file? @SolitaryThinker
Do we still need a separate predict.py?
- First, this file feels a bit out of place.
- Second, remember we are pursuing a CLI-like interface, like fastvideo generate --args
env_setup.sh (Outdated)
nit: can we just put the dependencies in the toml and remove this file?
@@ -0,0 +1,213 @@
# SPDX-License-Identifier: Apache-2.0
not sure if you have done this, but if this is copied/adapted from vLLM, let's mark the source file in the header?
def load(self, model_path: str, architecture: str, inference_args: InferenceArgs):
    """Load the scheduler based on the model path, architecture, and inference args."""

    scheduler = get_scheduler(
Just educate me: what do we need to load for the scheduler?
fastvideo/v1/models/text_encoder.py (Outdated)
I am confused about the encoder design here. We have a models/encoder folder where we put different encoder models, yet we have a separate text_encoder.py. What are their boundaries?
So, why do we need a composed package? This package is rather thin; it seems to only provide an abstraction on the ComposePipeline object?
Maybe a simpler structure is just:

pipelines/
-- composed_pipeline_base.py
-- hunyuan_pipeline.py
-- pipeline_registry.py
-- stages/
...?
# prompt_template
prompt_template = PROMPT_TEMPLATE["image"]

# prompt_template_video
I feel the constant.py is unnecessary; we could just put everything into one hunyuan_pipeline.py.
+1, I think it's not good to split.
# or

batch = self.input_validation_stage(batch, inference_args)
Agree with @comaniac. I am confused here -- above you already have clearly defined stages, so the block below could just be discarded?
Agree with others' comments about avoiding the add_stage API in favor of something more fluent; see the sketch below.
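For example, a builder-style registration could chain stages instead of repeated add_stage statements (a sketch; with_stage is a hypothetical name, and only add_stage exists in this PR):

```python
# Hypothetical fluent builder layered on the existing add_stage API.
class ComposedPipelineBase:

    def with_stage(self, name, stage):
        self.add_stage(name, stage)
        return self  # returning self enables chaining

pipeline = (HunyuanVideoPipeline()
            .with_stage("input_validation_stage", InputValidationStage())
            .with_stage("decoding_stage", DecodingStage()))
```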
class HunyuanVideoPipeline(ComposedPipelineBase):

    def required_config_modules(self):
Why is this a func and not a property or a class var?
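Either alternative is a one-liner; a sketch (the module list is a guess from the keys seen elsewhere in this diff):

```python
# Option 1: class variable (pick one of the two options).
class HunyuanVideoPipeline(ComposedPipelineBase):
    required_config_modules = ["text_encoder", "text_encoder_2", "vae", "transformer"]

# Option 2: read-only property.
class HunyuanVideoPipeline(ComposedPipelineBase):
    @property
    def required_config_modules(self):
        return ["text_encoder", "text_encoder_2", "vae", "transformer"]
```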
def initialize_encoders(self, inference_args: InferenceArgs):
    self.initialize_encoders_v1(inference_args)
Seems like boilerplate?
# prompt_template
prompt_template = PROMPT_TEMPLATE["image"]

# prompt_template_video
+1, I think it's not good to split.
encoder_1.to(dtype=PRECISION_TO_TYPE[inference_args.text_encoder_precision])
encoder_1.requires_grad_(False)

print(f"keys: {self.modules.keys()}")
remove
device=inference_args.device if not inference_args.use_cpu_offload else "cpu",
)

encoder_2 = self.modules.pop("text_encoder_2")
Why do you have to pop here?
text_encoder_2 = TextEncoder(
    text_encoder=encoder_2,
    tokenizer=tokenizer_2,
    # text_encoder_type="text_encoder_2",
    max_length=inference_args.text_len_2,
    # text_encoder_precision=inference_args.text_encoder_precision,
    device=inference_args.device if not inference_args.use_cpu_offload else "cpu",
)
self.modules["text_encoder"] = text_encoder
self.modules["text_encoder_2"] = text_encoder_2
Why do you do this pop-and-set dance? It feels like you should have something more functional that doesn't make the user write all of this boilerplate -- for example, you set device in two places, and requires_grad_ and finding the tokenizer are basically boilerplate. See the sketch below.
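One possible shape for such a helper (a sketch; load_text_encoder and its signature are hypothetical, while TextEncoder, PRECISION_TO_TYPE, and the inference_args fields come from this diff):

```python
def load_text_encoder(module, tokenizer, max_length, inference_args):
    # Centralize the dtype/device/requires_grad boilerplate in one place.
    device = "cpu" if inference_args.use_cpu_offload else inference_args.device
    module.to(dtype=PRECISION_TO_TYPE[inference_args.text_encoder_precision])
    module.requires_grad_(False)  # inference only
    return TextEncoder(text_encoder=module,
                       tokenizer=tokenizer,
                       max_length=max_length,
                       device=device)

self.modules["text_encoder_2"] = load_text_encoder(
    self.modules["text_encoder_2"], tokenizer_2,
    inference_args.text_len_2, inference_args)
```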
modules["text_encoder"] = text_encoder | ||
modules["text_encoder_2"] = text_encoder_2 |
Same question, I think, as my one above.
- [ ] Attn backend (PY)
- [ ] Wan text encoder (PY)
- [ ] wan vae (Wei)
- [ ] wan pipeline (PY & Wei)
- [ ] Merge wan dit code to refactor (Will)
- [ ] Clean up code & loader directory (Will)
- [ ] hunyuan text encoder (Will)