Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False negatives clang-analyzer-core.StackAddressEscape when storing pointers/references in container #123459

Closed
chrchr-github opened this issue Jan 18, 2025 · 14 comments · Fixed by #125638 or #126986
Labels
clang:static analyzer false-negative good first issue https://github.com/llvm/llvm-project/contribute

Comments

@chrchr-github
Copy link

#include <string>
#include <vector>

struct S { std::string* s; };
std::vector<S> f() {
    std::vector<S> v;
    {
        std::string a{ "abc" };
        v.push_back({ &a });
    }
    return v;
}

struct T { std::string& s; };
std::vector<T> g() {
    std::vector<T> v;
    {
        std::string b{ "def" };
        v.push_back({ b });
    }
    return v;
}

int main() {
    return f()[0].s->size() + g()[0].s.size();
}

https://godbolt.org/z/sMb8Ebv9j

@abhishek-kaushik22
Copy link
Contributor

Sorry, what's the issue here? You are taking a reference/pointer to a local variable and the AddressStanitizer correctly points it out.

ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 1
=================================================================
==1==ERROR: AddressSanitizer: stack-use-after-return on address 0x72ca03809128 at pc 0x573ea7891af2 bp 0x7ffed6c87ab0 sp 0x7ffed6c87aa8
READ of size 8 at 0x72ca03809128 thread T0
    #0 0x573ea7891af1 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::size() const /opt/compiler-explorer/gcc-snapshot/lib/gcc/x86_64-linux-gnu/15.0.1/../../../../include/c++/15.0.1/bits/basic_string.h:1086:19
    #1 0x573ea7891557 in main /app/example.cpp:21:22
    #2 0x76ca05829d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)
    #3 0x76ca05829e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)
    #4 0x573ea77a93c4 in _start (/app/output.s+0x2e3c4)

Address 0x72ca03809128 is located in stack of thread T0 at offset 40 in frame
    #0 0x573ea7890d7f in f() /app/example.cpp:5

  This frame has 3 object(s):
    [32, 64) 'a' (line 7) <== Memory access at offset 40 is inside this variable
    [96, 97) 'ref.tmp' (line 7)
    [112, 120) 'ref.tmp1' (line 8)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-return /app/example.cpp:21:22 in main
Shadow bytes around the buggy address:
  0x72ca03808e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x72ca03808f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x72ca03808f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x72ca03809000: f1 f1 f1 f1 00 00 00 f2 f2 f2 f2 f2 f8 f8 f8 f3
  0x72ca03809080: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
=>0x72ca03809100: f5 f5 f5 f5 f5[f5]f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x72ca03809180: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x72ca03809200: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x72ca03809280: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x72ca03809300: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x72ca03809380: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1==ABORTING

@chrchr-github
Copy link
Author

Yes, but it should be possible to find the issues in static analysis,

@llvmbot
Copy link
Member

llvmbot commented Jan 18, 2025

@llvm/issue-subscribers-clang-static-analyzer

Author: None (chrchr-github)

~~~c++ #include <string> #include <vector>

struct S { std::string* s; };
std::vector<S> f() {
std::vector<S> v;
{
std::string a{ "abc" };
v.push_back({ &a });
}
return v;
}

struct T { std::string& s; };
std::vector<T> g() {
std::vector<T> v;
{
std::string b{ "def" };
v.push_back({ b });
}
return v;
}

int main() {
return f()[0].s->size() + g()[0].s.size();
}

https://godbolt.org/z/sMb8Ebv9j
</details>

@steakhal
Copy link
Contributor

Does it reproduce the FN if you avoid using the { and } initializers?
I know we have inaccurate modeling of those.
I wish someone could reduce the example futher by mocking out the std containers with homegrown stubs.
That would make it much easier to see where it goes wrong.

Thank you.

@steakhal steakhal added the needs-reduction Large reproducer that should be reduced into a simpler form label Jan 20, 2025
@chrchr-github
Copy link
Author

chrchr-github commented Jan 20, 2025

Here's a reduction of the pointer case:

struct S { int* p; };
S f() {
    S s;
    {
        int a = 1;
        s = S(&a);
    }
    return s;
}

int main() {
    return *f().p;
}

https://godbolt.org/z/MGzohEheE

This is caught: s.p = &a; or s = S{ &a };

@chrchr-github
Copy link
Author

chrchr-github commented Jan 20, 2025

Oddly, this is caught as C++ but not as C code:

struct S { int* p; };
struct S f() {
    struct S s;
    {
        int a = 1;
        s = (struct S){ &a };
    }
    return s;
}

int main() {
    return *f().p;
}

https://godbolt.org/z/Edjbsc8n9
Also not caught as C: s.p = &a;

@chrchr-github
Copy link
Author

More container-like:

struct S { int* p; };

template <typename V>
struct T {
    V* q{};
    T() = default;
    T(T&& rhs) { q = rhs.q; rhs.q = nullptr;}
    T& operator=(T&& rhs) { q = rhs.q; rhs.q = nullptr;}
    void push_back(const V& v) { if (q == nullptr) q = new V(v); }
    ~T() { delete q; }
};

T<S> f() {
    T<S> t;
    {
        int a = 1;
        t.push_back({ &a });
    }
    return t;
}

int main() {
    return *f().q->p;
}

https://godbolt.org/z/Ps5c9b4Tn

@steakhal steakhal removed the needs-reduction Large reproducer that should be reduced into a simpler form label Jan 21, 2025
@steakhal
Copy link
Contributor

Thank you for the several proposed reproducers. They helped a lot really.
I settled with the following:

struct S { int* p; };
S top() {
    int a;
    S t(&a);
    return t;
}

This should work AFAIK the checker. So when leaving the top-level function top the checkEndFunction callback should fire, that should trigger the necessary checks in the checker. Of course, because we don't have the diagnostic, something goes sideways in the mean time.
At first glance, it shouldn't be too hard to debug. One should follow step by step from the checkEndFunction, end eventually, we should reach the iterBindings where we should see that we have a binding referring to a stack variable of the stack we are about to pop.
https://clang.llvm.org/doxygen//StackAddrEscapeChecker_8cpp_source.html

I think this is a good first issue.

Remember to use the State->dump() to get an overview of how the state looks like (including the Store, that is iterated in the iterBindings).

@steakhal steakhal added the good first issue https://github.com/llvm/llvm-project/contribute label Jan 22, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 22, 2025

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

  1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
  2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
  3. Fix the issue locally.
  4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
  5. Create a Git commit.
  6. Run git clang-format HEAD~1 to format your changes.
  7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

@llvmbot
Copy link
Member

llvmbot commented Jan 22, 2025

@llvm/issue-subscribers-good-first-issue

Author: None (chrchr-github)

~~~c++ #include <string> #include <vector>

struct S { std::string* s; };
std::vector<S> f() {
std::vector<S> v;
{
std::string a{ "abc" };
v.push_back({ &a });
}
return v;
}

struct T { std::string& s; };
std::vector<T> g() {
std::vector<T> v;
{
std::string b{ "def" };
v.push_back({ b });
}
return v;
}

int main() {
return f()[0].s->size() + g()[0].s.size();
}

https://godbolt.org/z/sMb8Ebv9j
</details>

@Flandini
Copy link
Contributor

Flandini commented Jan 24, 2025

Looks like an issue with the callback in checkEndFunction not checking deep enough into the SVal of the binding.

For this:

struct S { int* p; };
S top() {
    int a;
    S t(&a);
    return t;
}

I get only one binding checked by checkEndFunction:

Referrer: SymRegion{reg_$0<struct S * this>}
Referred: SymRegion{reg_$4<int * Element{SymRegion{reg_$3<S && NonParamVarRegion{D930}>},0 S64b,struct S}.p>}

(edit:) and both of these have UnknownSpaceRegion as their memory space, so these are not warned on as a leak with the current bindings callback checks.

Similar, there are false negatives with other SVals where the stack capture is deeper in the structure, so these examples are also false negatives:

struct S { int* x; };
S test()
{
  int x = 14;
  return (S) { &x };
}
#include <functional>
std::function<int*()> test()
{
  int x = 14;
  return [&]() { return &x; };
}

So the fix will pretty much be inspecting the SVal of the binding more deeply for escaping stack regions. I think this is a pretty easy win to cover some more false negatives.

Working on a fix while the other benchmarks run

@steakhal
Copy link
Contributor

Looks like an issue with the callback in checkEndFunction not checking deep enough into the SVal of the binding.

Sounds good on the high level. I hope you know about SValVisitor, iterBindings and ScanReachableSymbols.
These are the tools that I can think of traversing different things.

github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 10, 2025
…hecker (#125638)

Fixes llvm/llvm-project#123459.

Previously, when the StackAddrEscapeChecker checked return values, it
did not scan into the structure of the return SVal. Now it does, and we
can catch some more false negatives that were already mocked out in the
tests in addition to those mentioned in
llvm/llvm-project#123459.

The warning message at the moment for these newly caught leaks is not
great. I think they would be better if they had a better trace of why
and how the region leaks. If y'all are happy with these changes, I would
try to improve these warnings and work on normalizing this SVal checking
on the `checkEndFunction` side of the checker also.

Two of the stack address leak test cases now have two warnings, one
warning from return expression checking and another from`
checkEndFunction` `iterBindings` checking. For these two cases, I prefer
the warnings from the return expression checking, but I couldn't figure
out a way to drop the `checkEndFunction` without breaking other
`checkEndFunction` warnings that we do want. Thoughts here?
Icohedron pushed a commit to Icohedron/llvm-project that referenced this issue Feb 11, 2025
…m#125638)

Fixes llvm#123459.

Previously, when the StackAddrEscapeChecker checked return values, it
did not scan into the structure of the return SVal. Now it does, and we
can catch some more false negatives that were already mocked out in the
tests in addition to those mentioned in
llvm#123459.

The warning message at the moment for these newly caught leaks is not
great. I think they would be better if they had a better trace of why
and how the region leaks. If y'all are happy with these changes, I would
try to improve these warnings and work on normalizing this SVal checking
on the `checkEndFunction` side of the checker also.

Two of the stack address leak test cases now have two warnings, one
warning from return expression checking and another from`
checkEndFunction` `iterBindings` checking. For these two cases, I prefer
the warnings from the return expression checking, but I couldn't figure
out a way to drop the `checkEndFunction` without breaking other
`checkEndFunction` warnings that we do want. Thoughts here?
@chrchr-github
Copy link
Author

The original example #123459 (comment) as well as
#123459 (comment) are still not caught. Should they be transferred to a new issue?

@Flandini
Copy link
Contributor

@chrchr-github, you are right. I only picked the latest reduced case here and didn't try out the previous ones which differ. Could we reopen this issue @steakhal @Xazax-hun? I see what the issue is for the others and will take a stab at it later today.

@steakhal steakhal reopened this Feb 12, 2025
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this issue Feb 14, 2025
…m#125638)

Fixes llvm#123459.

Previously, when the StackAddrEscapeChecker checked return values, it
did not scan into the structure of the return SVal. Now it does, and we
can catch some more false negatives that were already mocked out in the
tests in addition to those mentioned in
llvm#123459.

The warning message at the moment for these newly caught leaks is not
great. I think they would be better if they had a better trace of why
and how the region leaks. If y'all are happy with these changes, I would
try to improve these warnings and work on normalizing this SVal checking
on the `checkEndFunction` side of the checker also.

Two of the stack address leak test cases now have two warnings, one
warning from return expression checking and another from`
checkEndFunction` `iterBindings` checking. For these two cases, I prefer
the warnings from the return expression checking, but I couldn't figure
out a way to drop the `checkEndFunction` without breaking other
`checkEndFunction` warnings that we do want. Thoughts here?
steakhal pushed a commit that referenced this issue Feb 16, 2025
…frames (#126986)

Fixes #123459.

This changes checking of the returned expr to also look for memory
regions whose stack frame context was a child of the current stack frame
context, e.g., for cases like this given in #123459:

```
struct S { int *p; };
S f() {
  S s;
  {
    int a = 1;
    s.p = &a;
  }
  return s;
}
```
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this issue Feb 24, 2025
…m#125638)

Fixes llvm#123459.

Previously, when the StackAddrEscapeChecker checked return values, it
did not scan into the structure of the return SVal. Now it does, and we
can catch some more false negatives that were already mocked out in the
tests in addition to those mentioned in
llvm#123459.

The warning message at the moment for these newly caught leaks is not
great. I think they would be better if they had a better trace of why
and how the region leaks. If y'all are happy with these changes, I would
try to improve these warnings and work on normalizing this SVal checking
on the `checkEndFunction` side of the checker also.

Two of the stack address leak test cases now have two warnings, one
warning from return expression checking and another from`
checkEndFunction` `iterBindings` checking. For these two cases, I prefer
the warnings from the return expression checking, but I couldn't figure
out a way to drop the `checkEndFunction` without breaking other
`checkEndFunction` warnings that we do want. Thoughts here?
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this issue Feb 24, 2025
…frames (llvm#126986)

Fixes llvm#123459.

This changes checking of the returned expr to also look for memory
regions whose stack frame context was a child of the current stack frame
context, e.g., for cases like this given in llvm#123459:

```
struct S { int *p; };
S f() {
  S s;
  {
    int a = 1;
    s.p = &a;
  }
  return s;
}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer false-negative good first issue https://github.com/llvm/llvm-project/contribute
Projects
None yet
6 participants