Fix liveness of TMem address smem TV #4142

Merged: 1 commit into main from liveness on Mar 26, 2025
Conversation

@zasdfgbnm (Collaborator) commented Mar 26, 2025

To use TMem, we must have a smem tensor for storing the TMem address, and use smem_tv[0] = tcgen05::alloc() to store the allocated TMem address. The liveness of this smem TV should span from the tcgen05::alloc to the tcgen05::dealloc. However, we are currently unable to compute that interval, because tcgen05::dealloc(smem_tv[0]) is not a "TV op". Due to this bug, the smem address tensor is allocated so that it overlaps with other smem tensors. This PR fixes the bug by handling both TV ops and non-TV ops correctly.

@zasdfgbnm (Collaborator, Author) commented:
!test


Description

  • Handle IfThenElse nodes in BufferReuseDebugPrinter

  • Improve liveness analysis for IfThenElse nodes

  • Update memory type for tv1 in TMemTest


Changes walkthrough 📝

Relevant files:

Enhancement: csrc/device_lower/pass/alias_memory.cpp (+18/-10)
Improve liveness analysis and handle IfThenElse nodes
  • Implement handling of IfThenElse nodes in BufferReuseDebugPrinter
  • Add scope management for IfThenElse nodes in AllocationInfoMap
  • Update liveness analysis to handle non-TensorView inputs and outputs

Bug fix: tests/cpp/test_tmem.cpp (+2/-2)
Update memory type for tv1 in TMemTest
  • Change memory type of tv1 from Global to Shared in TMemTest

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Possible Issue

The handle method for kir::IfThenElse now throws an exception if the debug printer is not set. This could lead to runtime errors if the debug printer is not initialized.

      indent();
      os_ << "IF " << node->predicate()->toString() << ":\n";
    }

Code Duplication

The loops over inputs and outputs in collectLivenessInfoOfExpr duplicate the check of whether a Val is a TensorView. This could be refactored to avoid the duplication.

    for (Val* input : expr->inputs()) {
      TensorView* input_tv = ir_utils::getTv(input);
      if (!input_tv) {
        continue;
      }
      auto alloc_info = getAllocInfoFromTV(input_tv);
      if (alloc_info) {
        if (!isSerialBroadcastResolution(input_tv, for_loops_)) {
          alloc_info->inner_live_interval->markRead(expr_pos);
          // Allocate is inlined in the innermost loop,
          //  so outer live interval is the same as inner.
          alloc_info->outer_live_interval->markRead(expr_pos);
        }
      }
    }
    for (Val* output : expr->outputs()) {
      TensorView* output_tv = ir_utils::getTv(output);
      if (!output_tv) {
        continue;
      }

    @@ -239,14 +239,14 @@ TEST_F(TMemTest, InexactParallelType) {

       auto tv0 = makeContigConcreteTensor({2, 33});
       fusion.addInput(tv0);
    -  auto tv1 = set(tv0); // gmem
    +  auto tv1 = set(tv0); // smem
@zasdfgbnm (Collaborator, Author) commented:
    Using smem or gmem makes no difference in testing "InexactParallelType", but using smem could trigger the bug of liveness.

@zasdfgbnm marked this pull request as ready for review on March 26, 2025 06:55
@zasdfgbnm (Collaborator, Author) commented:
    !test

@zasdfgbnm requested review from naoyam and jacobhinkle on March 26, 2025 16:43
@jacobhinkle (Collaborator) left a comment:

    LGTM. I guess another way to go would be to create a DeAllocTMem node which has the TV as an input and translate it in the inline PTX pass. Then that expr would be a TV op instead of the current Asm node that is inserted.

Comment on lines +947 to +951:

    for (Val* input : expr->inputs()) {
      TensorView* input_tv = ir_utils::getTv(input);
      if (!input_tv) {
        continue;
      }
@jacobhinkle (Collaborator) commented:
    It seems like the only difference here is in properly handling kir::TensorIndex inputs. Is that right? I thought that should not be needed since this pass runs before indexing.

@zasdfgbnm (Collaborator, Author) replied:

Yes exactly.

> I thought that should not be needed since this pass runs before indexing.

In general yes, but there are always exceptions, such as using smem for non-tensors like mbarrier or the TMem address.

@jacobhinkle (Collaborator) replied:
    Ah right, because it's a kir::Asm, it has TensorIndex inputs upon creation.

@zasdfgbnm merged commit 826f5a1 into main on Mar 26, 2025
53 checks passed
@zasdfgbnm deleted the liveness branch on March 26, 2025 17:35