Fix liveness of TMem address smem TV #4142
!test
@@ -239,14 +239,14 @@ TEST_F(TMemTest, InexactParallelType) {

   auto tv0 = makeContigConcreteTensor({2, 33});
   fusion.addInput(tv0);
-  auto tv1 = set(tv0); // gmem
+  auto tv1 = set(tv0); // smem
Using smem or gmem makes no difference in testing "InexactParallelType", but using smem could trigger the bug of liveness.
!test
LGTM. I guess another way to go would be to create a DeAllocTMem node which has the TV as an input and translate it in the inline PTX pass. Then that expr would be a TV op instead of the current Asm node that is inserted.
for (Val* input : expr->inputs()) {
  TensorView* input_tv = ir_utils::getTv(input);
  if (!input_tv) {
    continue;
  }
It seems like the only difference here is in properly handling kir::TensorIndex inputs. Is that right? I thought that should not be needed since this pass runs before indexing.
Yes exactly.
> I thought that should not be needed since this pass runs before indexing.

In general yes, but there are always exceptions, such as if you are using smem for non-tensors like mbarrier or TMem address.
Ah right, because it's a kir::Asm, it has TensorIndex inputs upon creation.
To use TMem, we must have a smem tensor that stores the TMem address, and we use smem_tv[0] = tcgen05::alloc() to store the allocated TMem address. The liveness of this smem TV should span from the tcgen05::alloc to the tcgen05::dealloc. However, we are currently unable to get that because tcgen05::dealloc(smem_tv[0]) is not a "TV Op". Due to this bug, the smem address tensor is allocated overlapping with other smem tensors. This PR fixes this bug by handling both TV ops and non-TV ops correctly.