[Feature] Can mask_strategy be made a learnable parameter? #219

wangshankun · 2025-02-26T03:26:25Z

Motivation

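Could mask_strategy be implemented as a layer of learnable parameters? Like the gate in MoBA, it could be learned directly during finetuning or distillation, which would remove a lot of the engineering adaptation work.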

Related resources

No response

BrianChen1129 · 2025-02-27T01:36:48Z

That is a great idea! But we haven't tried it.

foreverpiano · 2025-02-28T05:46:10Z

Yes, a gated sparse pattern would be cool and easy to adapt.

wangshankun · 2025-02-28T08:34:31Z

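> That is a great idea! But we haven't tried it.

@BrianChen1129
I'd like to give it a try, but the STA kernel implementation doesn't seem to have a backward pass. I should be able to use FlexAttention for finetuning first, right?
I'd also like to ask how much FlexAttention and the current STA kernel differ in execution efficiency.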

foreverpiano · 2025-02-28T08:37:27Z

7.30x vs 10.45x

yinian-lw · 2025-04-01T04:23:45Z

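> > That is a great idea! But we haven't tried it.
>
> @BrianChen1129
> I'd like to give it a try, but the STA kernel implementation doesn't seem to have a backward pass. I should be able to use FlexAttention for finetuning first, right?
> I'd also like to ask how much FlexAttention and the current STA kernel differ in execution efficiency.

Hi, did you end up implementing this? How well does it work?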
