Fix liveness of TMem address smem TV #4142

Merged: 1 commit into main from liveness on Mar 26, 2025
Conversation

@zasdfgbnm (Collaborator) commented Mar 26, 2025

To use TMem, we must have a smem tensor for storing the TMem address, and use smem_tv[0] = tcgen05::alloc() to store the allocated TMem address. The liveness of this smem TV should span from the tcgen05::alloc to the tcgen05::dealloc. However, we are currently unable to compute that interval, because tcgen05::dealloc(smem_tv[0]) is not a "TV op". Due to this bug, the smem address tensor is allocated so that it overlaps with other smem tensors. This PR fixes the bug by handling both TV ops and non-TV ops correctly.

@zasdfgbnm (Collaborator, Author) commented:
!test


Description

  • Handle IfThenElse nodes in BufferReuseDebugPrinter

  • Improve liveness analysis for IfThenElse nodes

  • Update memory type for tv1 in TMemTest


Changes walkthrough 📝

Relevant files:

Enhancement: csrc/device_lower/pass/alias_memory.cpp (+18/-10)
Improve liveness analysis and handle IfThenElse nodes
  • Implement handling of IfThenElse nodes in BufferReuseDebugPrinter
  • Add scope management for IfThenElse nodes in AllocationInfoMap
  • Update liveness analysis to handle non-TensorView inputs and outputs

Bug fix: tests/cpp/test_tmem.cpp (+2/-2)
Update memory type for tv1 in TMemTest
  • Change memory type of tv1 from Global to Shared in TMemTest

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Possible Issue

The handle method for kir::IfThenElse now throws an exception if the debug printer is not set. This could lead to runtime errors if the debug printer is not initialized.

      indent();
      os_ << "IF " << node->predicate()->toString() << ":\n";
    }

Code Duplication

The loops over inputs and outputs in collectLivenessInfoOfExpr duplicate the check of whether a Val is a TensorView. This could be refactored to avoid the duplication.

    for (Val* input : expr->inputs()) {
      TensorView* input_tv = ir_utils::getTv(input);
      if (!input_tv) {
        continue;
      }
      auto alloc_info = getAllocInfoFromTV(input_tv);
      if (alloc_info) {
        if (!isSerialBroadcastResolution(input_tv, for_loops_)) {
          alloc_info->inner_live_interval->markRead(expr_pos);
          // Allocate is inlined in the innermost loop,
          //  so outer live interval is the same as inner.
          alloc_info->outer_live_interval->markRead(expr_pos);
        }
      }
    }
    for (Val* output : expr->outputs()) {
      TensorView* output_tv = ir_utils::getTv(output);
      if (!output_tv) {
        continue;
      }

    @@ -239,14 +239,14 @@ TEST_F(TMemTest, InexactParallelType) {

       auto tv0 = makeContigConcreteTensor({2, 33});
       fusion.addInput(tv0);
    -  auto tv1 = set(tv0); // gmem
    +  auto tv1 = set(tv0); // smem
@zasdfgbnm (Collaborator, Author) commented:
    Using smem or gmem makes no difference in testing "InexactParallelType", but using smem could trigger the bug of liveness.

@zasdfgbnm marked this pull request as ready for review on March 26, 2025 06:55
@zasdfgbnm (Collaborator, Author) commented:
    !test

@zasdfgbnm requested review from naoyam and jacobhinkle on March 26, 2025 16:43
@jacobhinkle (Collaborator) left a comment:

    LGTM. I guess another way to go would be to create a DeAllocTMem node which has the TV as an input and translate it in the inline PTX pass. Then that expr would be a TV op instead of the current Asm node that is inserted.

Comment on lines +947 to +951:

    for (Val* input : expr->inputs()) {
      TensorView* input_tv = ir_utils::getTv(input);
      if (!input_tv) {
        continue;
      }
@jacobhinkle (Collaborator) commented:
    It seems like the only difference here is in properly handling kir::TensorIndex inputs. Is that right? I thought that should not be needed since this pass runs before indexing.

@zasdfgbnm (Collaborator, Author) replied:

Yes exactly.

> I thought that should not be needed since this pass runs before indexing.

In general yes, but there are always exceptions, such as using smem for non-tensors like mbarrier or the TMem address.

@jacobhinkle (Collaborator) replied:
    Ah right, because it's a kir::Asm, it has TensorIndex inputs upon creation.

@zasdfgbnm merged commit 826f5a1 into main on Mar 26, 2025
53 checks passed
@zasdfgbnm deleted the liveness branch on March 26, 2025 17:35