> `FluxSingleTransformerBlock` parallelizes Attention and the Feed-Forward Network (FFN), and thus fuses the `to_out` linear layer of the attention block and the second linear layer of the FFN into one `proj_out` layer. We split this `proj_out` layer back into two linear layers, one used in the attention block and the other used as the FFN's second FC layer, following the Nunchaku implementation.
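For intuition, the splitting described above amounts to slicing the fused weight matrix along its input dimension, since `proj_out` is applied to the concatenation of the attention output and the FFN hidden states. Here is a minimal PyTorch sketch of that idea; `split_fused_proj_out` and the shape assumptions (attention output width equal to `proj_out.out_features`) are mine for illustration, not the actual Nunchaku code:

```python
import torch
import torch.nn as nn

def split_fused_proj_out(proj_out: nn.Linear, attn_dim: int):
    """Split a fused proj_out into two Linears along its input features.

    Assumes proj_out was applied as
        proj_out(torch.cat([attn_output, mlp_hidden], dim=-1))
    with attn_output of width attn_dim, as in FluxSingleTransformerBlock.
    """
    dim = proj_out.out_features
    mlp_hidden_dim = proj_out.in_features - attn_dim

    attn_to_out = nn.Linear(attn_dim, dim, bias=proj_out.bias is not None)
    ffn_fc2 = nn.Linear(mlp_hidden_dim, dim, bias=False)

    with torch.no_grad():
        # Columns of the fused weight line up with the concatenated inputs.
        attn_to_out.weight.copy_(proj_out.weight[:, :attn_dim])
        ffn_fc2.weight.copy_(proj_out.weight[:, attn_dim:])
        if proj_out.bias is not None:
            # The single fused bias can live in either branch.
            attn_to_out.bias.copy_(proj_out.bias)

    return attn_to_out, ffn_fc2

# Equivalence check: the two branches summed match the fused layer.
dim, mlp_ratio = 64, 4
fused = nn.Linear(dim + mlp_ratio * dim, dim)
attn_to_out, ffn_fc2 = split_fused_proj_out(fused, attn_dim=dim)

a = torch.randn(2, dim)              # attention output
m = torch.randn(2, mlp_ratio * dim)  # FFN hidden states
assert torch.allclose(
    fused(torch.cat([a, m], dim=-1)),
    attn_to_out(a) + ffn_fc2(m),
    atol=1e-5,
)
```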
In this code, we can see that only the `proj_out` layers of `FluxSingleTransformerBlock` are converted to `ConcatLinear`, with only a single split, `[module.proj_out.out_features]`. Can anyone help explain the reasoning behind this?

- Why is this conversion applied only to the `proj_out` of `FluxSingleTransformerBlock`?
- Should this operation be performed on all transformer-based diffusion models?
- Why `ConcatLinear`?

Thank you in advance.