-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CIR] Lower nested local constant alloca #1261
Open
Lancern
wants to merge
2,199
commits into
llvm:main
Choose a base branch
from
Lancern:nested-local-const-lowering
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add more NFC skeleton while here.
Forgot to git add in cb0cb34
…lvm#971) The llvm's intrinsic `llvm.is.fpclass` is used to support multiple float point builtins: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-isfpclass > The `__builtin_isfpclass()` builtin is a generalization of functions > isnan, isinf, isfinite and some others defined by the C standard. It tests > if the floating-point value, specified by the first argument, falls into > any of data classes, specified by the second argument. I meant to support this by creating IntrinsicCallOp directly. But I can't make it due to llvm#480 since the return type of the intrinsic will mismatch. So I have to create a new Op for it. But I feel it might not be too bad. At least it is more explicit and more expressive.
Re llvm#958 > Consider the following code snippet `tmp.c`: > ``` > typedef struct { > int a, b; > } S; > > void foo(S s) {} > ``` > Running `bin/clang tmp.c -fclangir -Xclang -emit-llvm -Xclang -fclangir-call-conv-lowering -S -o -`, we get: > ``` > loc(fused["tmp.c":5:1, "tmp.c":5:16]): error: 'llvm.bitcast' op result #0 must be LLVM-compatible non-aggregate type, but got '!llvm.struct<"struct.S", (i32, i32)>' > ``` > We can also produce a similar error from this: > ``` > typedef struct { > int a, b; > } S; > > S init() { S s; return s; } > ``` > gives: > ``` > loc(fused["tmp.c":5:17, "tmp.c":5:24]): error: 'llvm.bitcast' op operand #0 must be LLVM-compatible non-aggregate type, but got '!llvm.struct<"struct.S", (i32, i32)>' > ``` > I've traced the errors back to `lib/CIR/Dialect/Transforms/TargetLowering/LowerFunction.cpp` in `LowerFunction::buildAggregateStore`, `castReturnValue`, and `buildAggregateBitcast`. > > `withElementType(SrcTy)` is currently commented out/ignored in `LowerFunction.cpp`, but it is important. > > This PR adds/fixes this and updates one test. I thought [about it](llvm#958 (comment)) and I understand adding `cir.bitcast` to circumvent the CIR checks, but I am not sure how we can ignore/drop the bitcast while lowering. I think we can just make the CIR casts correct. I have added a number of lowering tests to verify that the CIR is lowered properly. cc: @sitio-couto @bcardosolopes.
…s for source code global and local vars (llvm#1001) Now CIR supports annotations for both globals and locals. They all should just use the same set of annotation related globals including file name string, annotation name string, and arguments. This PR makes sure this is the case. FYI: for the test case we have, OG generates [ code ](https://godbolt.org/z/Ev5Ycoqj1), pretty much the same code except annotation variable names. This would fix the crash like > error: redefinition of symbol named '.str.annotation' > fatal error: error in backend: The pass manager failed to lower CIR to LLVMIR dialect!
Previously we didn't generate the index for array correct. The previous test is incorrect already: globals-neg-index-array.c ``` // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-cir %s -o %t.cir // RUN: FileCheck --input-file=%t.cir %s // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t.ll // RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s // RUN: %clang_cc1 -x c++ -triple x86_64-unknown-linux-gnu -emit-cir %s -o %t.cir // RUN: FileCheck --input-file=%t.cir %s // RUN: %clang_cc1 -x c++ -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t.ll // RUN: FileCheck --check-prefix=LLVM --input-file=%t.ll %s struct __attribute__((packed)) PackedStruct { char a1; char a2; char a3; }; struct PackedStruct packed[10]; char *packed_element = &(packed[-2].a3); // CHECK: cir.global external @PACKED = #cir.zero : !cir.array<!ty_PackedStruct x 10> {alignment = 16 : i64} loc(#loc5) // CHECK: cir.global external @packed_element = #cir.global_view<@PACKED, [-2 : i32, 2 : i32]> // LLVM: @PACKED = global [10 x %struct.PackedStruct] zeroinitializer + // LLVM: @packed_element = global ptr getelementptr inbounds ([10 x %struct.PackedStruct], ptr @PACKED, i32 0, i32 -2, i32 2) - // LLVM: @packed_element = global ptr getelementptr inbounds ([10 x %struct.PackedStruct], ptr @PACKED, i32 -2, i32 2) ``` Compile it with `-fclangir -S`, we got: ``` packed: .zero 30 packed_element: .quad packed-54 ``` but the traditional pipeline shows (https://godbolt.org/z/eTj96EP1E): ``` packed: .zero 30 packed_element: .quad packed-4 ``` this may be a simple mismatch.
…lvm#1009) The MLIR docs at https://mlir.llvm.org/docs/DefiningDialects/Operations/#literals specify that "An empty literal `` may be used to remove a space that is inserted implicitly after certain literal elements", so I inserted one before the `right` literal to remove the extra space that was being printed. Oddly, the bug is also fixed by inserting an empty literal _after_ the `left` literal, which leads me to believe that tablegen is inserting an implicit space after the `left` literal.
Close llvm#522 This solves the issue we can't handle `case` in nested scopes and we can't handle if the switch body is not a compound statement. The core idea of the patch is to introduce the `cir.case` operation to the language. Then we can get the cases by traversing the body of the `cir.switch` operation easily instead of counting the regions and the attributes. Every `cir.case` operation has a region and now the `cir.switch` has only one region too. But to make the analysis and optimizations easier, I add a new concept `simple form` here. That a simple `cir.switch` operation is: all the `cir.case` operation owned by the `cir.switch` lives in the top level blocks of the `cir.switch` region and there is no other operations except the ending `cir.yield`. This solves the previous `simplified for common-case` vs `general solution` discussion in llvm#522. After implemented this, I feel the correct answer to it is, we want a general solution for constructing and lowering the operations but we like simple and common case for analysis and optimizations. We just mixed the different phases. For other semantics, see `CIROps.td`. For lowering, we can make it generally by lower the cases one by one and finally lower the switch itself. Although this patch has 1000+ lines of changes, I feel it is relatively neat especially it erases some odd behaviors before. Tested with Spec2017's C benchmarks except 500.perlbench_r.
…pes (llvm#1004) This PR adds a support for return values of a struct type. There are two cases that are not covered by this PR and will be added later.
…1005) This PR adds several copy-pasted lines and a small test and now var args seems to work in the calling convention pass
Reviewers: smeenai Reviewed By: smeenai Pull Request: llvm#1022
We diverge from CodeGen here by delaying the function emission that happens for a global variable. However, due to situations where a global can be emitted while building out a function the old CGF might not be invalid. So we need to store it here just in case. Reviewers: bcardosolopes, smeenai Reviewed By: smeenai Pull Request: llvm#1023
This was declared but never implemented. Upon first usage in a later commit this fails to link. Reviewers: bcardosolopes, smeenai Reviewed By: smeenai Pull Request: llvm#1024
…tion lowering pass (llvm#1003) This PR adds initial function pointers support for the calling convention lowering pass. This is a suggestion, so any other ideas are welcome. Several ideas was described in the llvm#995 and basically what I'm trying to do is to generate a clean CIR code without additional `bitcast` operations for function pointers and without mix of lowered and initial function types. #### Problem Looks like we can not just lower the function type and cast the value since too many operations are involved. For instance, for the next simple code: ``` typedef struct { int a; } S; typedef int (*myfptr)(S); int foo(S s) { return 42 + s.a; } void bar() { myfptr a = foo; } ``` we get the next CIR for the function `bar` , before the calling convention lowering pass: ``` cir.func no_proto @bar() extra(#fn_attr) { %0 = cir.alloca !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>>, ["a", init] %1 = cir.get_global @foo : !cir.ptr<!cir.func<!s32i (!ty_S)>> cir.store %1, %0 : !cir.ptr<!cir.func<!s32i (!ty_S)>>, !cir.ptr<!cir.ptr<!cir.func<!s32i (!ty_S)>>> cir.return } ``` As one can see, first three operations depend on the function type. Once `foo` is lowered, we need to fix `GetGlobalOp`: otherwise the code will fail with the verification error since actual `foo` type (lowered) differs from the one currently expected by the `GetGlobalOp`. First idea would just rewrite only the `GetGlobalOp` and insert a bitcast after, so both `AllocaOp` and `StoreOp` would work witth proper types. Once the code will be more complex, we will need to take care about possible use cases, e.g. if we use arrays, we will need to track array accesses to it as well in order to insert this bitcast every time the array element is needed. One workaround I can think of: we fix the `GetGlobalOp` type and cast from the lowered type to the initial, and cast back before the actual call happens - but it doesn't sound as a good and clean approach (from my point of view, of course). So I suggest to use type converter and rewrite any operation that may deal with function pointers and make sure it has a proper type, and we don't have any unlowered function type in the program after the calling convention lowering pass. #### Implementation I added lowering for `AllocaOp`, `GetGlobalOp`, and split the lowering for `FuncOp` (former `CallConvLoweringPattern`) and lower `CallOp` separately. Frankly speaking, I tried to implement a pattern for each operation, but for some reasons the tests are not passed for windows and macOs in this case - something weird happens inside `applyPatternsAndFold` function. I suspect it's due to two different rewriters used - one in the `LoweringModule` and one in the mentioned function. So I decided to follow the same approach as it's done for the `LoweringPrepare` pass and don't involve this complex rewriting framework. Next I will add a type converter for the struct type, patterns for `ConstantOp` (for const arrays and `GlobalViewAttr`) In the end of the day we'll have (at least I hope so) a clean CIR code without any bitcasts for function pointers. cc @sitio-couto @bcardosolopes
We had some incorrect logic when creating functions and getting their address which resulted in spurious "definition with the same mangled name" errors. Fix that logic to match original CodeGen, which also fixes these errors. It's expected that decls can appear in the deferred decl list multiple times, and CodeGen has to guard against that. In the case that triggered the error, both `CIRGenerator::HandleInlineFunctionDefinition` and CIRGenModule were deferring the declaration. Something else I discovered here is that we emit these functions in the opposite order as regular CodeGen: https://godbolt.org/z/4PrKG7h9b. That might be a meaningful difference worth investigating further. Fixes llvm#991
…intrinsics (llvm#1020) In this PR, also changed `buildNeonShiftVector` to allow it generates negative shift values. When the shift value is negative, the shift amount vector is not used in any ShiftOp of IR (as they don't need sign to know shift direction), instead, it is just input argument to shift intrinsic function call.
…vm#1028) This PR fixes the notorious double whitespaces introduced by visibility attribute, for `cir.func` only. It uses "leading whitespace" for every print. And the printing of visibility attr is properly guarded by a check of `!isDefault()`. Double whitespaces in test files are removed.
Lowering Neon `vsetq_lane_f64` References: [Clang CGBuiltin Implementation](https://github.com/llvm/clangir/blob/2b1a638ea07ca10c5727ea835bfbe17b881175cc/clang/lib/CodeGen/CGBuiltin.cpp#L12348) [Builtin definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vsetq_lane_f64)
…64` (llvm#1255) Lowering Neon `vaddd_s64` and `vaddd_u64` - [Clang CGBuiltin Implementation](https://github.com/llvm/clangir/blob/2b1a638ea07ca10c5727ea835bfbe17b881175cc/clang/lib/CodeGen/CGBuiltin.cpp#L12460-L12462) - [vaddd_s64 Builtin definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vaddd_s64) - [vaddd_u64 Builtin definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vaddd_u64)
…64` (llvm#1256) Lowering `neon_vsubd_s64` and `neon_vsubd_u64` - [Clang CGBuiltin Implementation](https://github.com/llvm/clangir/blob/2b1a638ea07ca10c5727ea835bfbe17b881175cc/clang/lib/CodeGen/CGBuiltin.cpp#L12463-L12465) - [vaddd_s64 Builtin definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vsubd_s64) - [vaddd_u64 Builtin definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vsubd_u64)
…16` and `vdivh_f16` (llvm#1258) Lowering: - `vaddh_f16` - `vsubh_f16` - `vmulh_f16` - `vdivh_f16`
…1259) Lowering `vshld_n_s64` and `vshld_n_u64`
Lowering `vshrd_n_s64`
Lowering Neon `vset_lane_f64` References: [Clang CGBuiltin Implementation](https://github.com/llvm/clangir/blob/2b1a638ea07ca10c5727ea835bfbe17b881175cc/clang/lib/CodeGen/CGBuiltin.cpp#L12342) [Builtin definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vset_lane_f64)
…llvm#1265) [Neon definiton](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vmaxv_s8) [OG implementation](https://github.com/llvm/clangir/blob/04d7dcfb2582753f3eccbf01ec900d60297cbf4b/clang/lib/CodeGen/CGBuiltin.cpp#L13202) Implementation in this PR is different from OG as 1. avoided code duplication by extracting out the common pattern 2. avoided using i32 as return type of the intrinsic call, so eliminated the need for casting result of the intrinsic call. This way of OG's implementation is quite unnecessary IMHO, this is MAX, not ADD or MUL. After all, using the expected type as return type of intrinsic call produces [the same ASM code](https://godbolt.org/z/3nKG7fxPb).
I continue to use `csmith` and catch run time bags. Now it's time to fix the layout for the const structs. There is a divergence between const structs generated by CIR and the original codegen. And this PR makes one more step to eliminate it. There are cases where the extra padding is required - and here is a fix for some of them. I did not write extra tests, since the fixes in the existing already covers the code I added. The point is that now the layout for all of these structs in the LLVM IR with and without CIR is the same.
Class `CIRGenFunction` contained three identical functions that converted from a Clang AST type (`clang::QualType`) to a ClangIR type (`mlir::Type`): `convertType`, `ConvertType`, and `getCIRType`. This embarrassment of duplication needed to be fixed, along with cleaning up other functions that convert from Clang types to ClangIR types. The three functions `CIRGenFunction::ConvertType`, `CIRGenFunction::convertType`, and `CIRGenFunction::getCIRType` were combined into a single function `CIRGenFunction::convertType`. Other functions were renamed as follows: - `CIRGenTypes::ConvertType` to `CIRGenTypes::convertType` - `CIRGenTypes::ConvertFunctionTypeInternal` to `CIRGenTypes::convertFunctionTypeInternal` - `CIRGenModule::getCIRType` to `CIRGenModule::convertType` - `ConstExprEmitter::ConvertType` to `ConstExprEmitter::convertType` - `ScalarExprEmitter::ConvertType` to `ScalarExprEmitter::convertType` Many cases of `getTypes().convertType(t)` and `getTypes().convertTypeForMem(t)` were changed to just `convertType(t)` and `convertTypeForMem(t)`, respectively, because the forwarding functions in `CIRGenModule` and `CIRGenFunction` make the explicit call to `getTypes()` unnecessary.
Reland previously reverted attempt now that this passes ASANified `ninja check-clang-cir`. Original message: We are missing cleanups all around, more incremental progress towards fixing that. This is supposed to be NFC intended, but we have to start changing some bits in order to properly match cleanup bits in OG. Start tagging places with more MissingFeatures to allow us to incrementally improve the situation.
…#1262) This patch adds support for the following GCC function attributes: - `__attribute__((const))` - `__attribute__((pure))` The side effect information is attached to the call operations during CIRGen. During LLVM lowering, these information is consumed to further emit appropriate LLVM metadata on LLVM call instructions.
This patch fixes the LLVM project tests workflow on Linux. Two changes were needed. Firstly, some commands need to be performed with sudo now that the container executes as a non-root user. Second, we needed to change from `ubuntu-latest` to `ubuntu-22.04` as `ubuntu-latest` not defaults to `ubuntu-24.04` which causes `setup-python` to install a python executable linked against a newer version of glibc that is not found on ubuntu 22.04, which causes failures when CMake cannot execute the python interpreter that it finds. (cherry picked from commit a759176)
…m#1249) C/C++ functions returning void had an explicit !cir.void return type while not having any returned value, which was breaking a lot of MLIR invariants when the CIR dialect is used in a greater context, for example with the inliner. Now, a C/C++ function returning void has no return type and no return values, which does not break the MLIR invariant about the same number of return types and returned values. This change does not keeps the same parsing/pretty-printed syntax as before for compatibility like in llvm#1203 because it requires some new features from the MLIR parser infrastructure itself, which is not great. This uses an optional type for function return type. The default MLIR parser for optional parameters requires an optional anchor we do not have in the syntax, so use a custom FuncType parser to handle the optional return type.
Some passes not declared with TableGen did not have descriptions.
This PR adds a support for for default arguments
…#1283) Corresponding [OG code](https://github.com/llvm/clangir/blob/ef20d053b3d78c9d4c135e2811b303b7e5016d30/clang/lib/CodeGen/CGExprConstant.cpp#L846). [OG generated code here](https://godbolt.org/z/x6q333dMn), one notable diff is we're missing `inrange` which is reported in [issue 886 ](llvm#886). For now, I'm still using GlobalViewAttr to implement it so we can move things fast. But it might be worth considering approach [Comments in issue 258](llvm#258), especially we could incoporate [inrange info](llvm#886) to the attribute suggested there.
This PR adds CIRGen and LLVM lowering support for the following language features related to pointers to data members: - Comparisons between pointers to data members. - Casting from pointers to data members to boolean. - Reinterpret casts between pointers to data members.
This PR updates the `#cir.global_view` attribute and make it accept integer types as its result type.
…vm#1280) Resolves llvm#1266 After change: ```llvm %1 = alloca ptr, i64 1, align 8 store i32 1, ptr @g_arr, align 4 store i32 2, ptr getelementptr (i32, ptr @g_arr, i64 1), align 4 store i32 3, ptr getelementptr (i32, ptr @g_arr, i64 2), align 4 %2 = load i32, ptr @g, align 4 store i32 %2, ptr getelementptr (i32, ptr @g_arr, i64 3), align 4 store ptr getelementptr (i32, ptr getelementptr (i32, ptr @g_arr, i64 3), i64 1), ptr %1, align 8 ```
This does not change anything in practice, work in that direction should come next. We also want this to not affect existing tests to isolate upcoming changes.
This patch adds support for lowering local constants in nested scopes, including those in nested loops. For those constant allocas in non-loop inner scopes, this patch keeps their constant flags during alloca hoisting. LLVM lowering would correctly emit necessary invariant metadata for those allocas. For those constant allocas in a loop, this patch introduces a new operation `cir.invariant_group` that marks the beginning of the lifetime of the constant objects. This operation is put at the location of the alloca operation before hoisting them. This patch updates LLVM lowering to emit the necessary invariant metadata when loading and storing through such pointers. This patch takes care of the special case where the constant alloca represents a variable declared in the condition part of a while loop. In such a case, this patch removes the constant flag on the alloca operation when hositing them.
Lancern
force-pushed
the
nested-local-const-lowering
branch
from
January 24, 2025 08:02
0a13412
to
7a7d9cf
Compare
Rebased onto the latest |
Sorry I missed this, taking a closer look next week! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds support for lowering local constants in nested scopes, including those in nested loops.
For those constant allocas in non-loop inner scopes, this PR keeps their constant flags during alloca hoisting. LLVM lowering would correctly emit necessary invariant metadata for those allocas.
For those constant allocas in a loop, this PR introduces a new operation
cir.invariant_group
that marks the beginning of the lifetime of the constant objects. This operation is put at the location of the alloca operation before hoisting them. This PR updates LLVM lowering to emit the necessary invariant metadata when loading and storing through such pointers.This PR takes care of the special case where the constant alloca represents a variable declared in the condition part of a while loop. In such a case, this PR removes the constant flag on the alloca operation when hositing them.