profile (libbpf): tool enhancements #5181
base: master
There are systems in which more than one source of DSOs exists, so paths can overlap. Think of systems with containers or systemd-portabled loads. Those binaries can dynamically link to DSOs whose paths conflict with ones living in the root namespace, while the same-path DSOs might be completely different binaries.

This is a preparation commit for functionality that we want to add later: dumping the DSO list state from profile, for offline symbol resolution. When resolving symbols like that, let matches take the PID into account too, so the code does not go for bad addresses on the wrong binaries.

Signed-off-by: Gustavo Lima Chaves <[email protected]>
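As a rough illustration of why the PID has to be part of the match, here is a minimal Python sketch. It is not the code from this commit: the `Mapping` record, the `DsoTable` class and their fields are hypothetical names chosen for the example. It only shows that two processes can map the same path at the same addresses and still need separate entries.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Mapping:
    """One executable mapping of a DSO in one process (hypothetical record)."""
    pid: int
    start: int   # first mapped address
    end: int     # one past the last mapped address
    path: str    # path as seen by that process, not necessarily the root namespace


class DsoTable:
    """Mappings grouped per PID, so same-path DSOs from different roots never mix."""

    def __init__(self, mappings: Iterable[Mapping]) -> None:
        self._by_pid: Dict[int, List[Mapping]] = {}
        for m in sorted(mappings, key=lambda m: m.start):
            self._by_pid.setdefault(m.pid, []).append(m)

    def lookup(self, pid: int, addr: int) -> Optional[Mapping]:
        """Return the mapping of `pid` that contains `addr`, or None."""
        maps = self._by_pid.get(pid, [])
        starts = [m.start for m in maps]
        i = bisect_right(starts, addr) - 1
        if i >= 0 and maps[i].start <= addr < maps[i].end:
            return maps[i]
        return None


# Two processes map the same path at the same addresses; only the PID tells them apart.
table = DsoTable([
    Mapping(pid=100, start=0x7F0000000000, end=0x7F0000010000, path="/usr/lib/libfoo.so"),
    Mapping(pid=200, start=0x7F0000000000, end=0x7F0000010000, path="/usr/lib/libfoo.so"),
])
hit = table.lookup(200, 0x7F0000000420)
```

A path-only lookup would return whichever entry was registered first; keying on the PID keeps the containerized copy and the root-namespace copy apart.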
One could end up with no DSO state for those and thus missed flamechart name resolutions. Signed-off-by: Gustavo Lima Chaves <[email protected]>
…tempted

Instead, under that mode, more information is output: the DSO list/state is also dumped, besides the stack traces (and counts). The stack traces also only carry addresses; symbol names are never resolved. This is intended for systems where having the debug symbols in their images is prohibitive, but where one still wants to leverage BPF-based profiling.

Offline symbol resolution is still possible with this textual output, paired with the system's /proc/kallsyms contents and a serialized version of the known DSO tree. With the new tool mode, all that info is the output, not just the stack traces and counts.

With a version of the rootfs where profile ran (with the missing debug symbols added), paired with our added symbol resolution tool (profile_symbol_resolve.py) and the output in the new mode, one will be able to produce flamecharts outside of the running context, in an offline fashion.

For flamechart generation, we start with an opinionated take and choose https://github.com/jonhoo/inferno as the leveraged tool. One can easily make that an argument for other choices, after this.

NB: there is no support for folded stack output in the (offline) symbol resolution schema, as that mode does not output PIDs. It would be prohibitive to make the right DSO matches without that information.

NB: offline VDSO (Virtual Dynamically Shared Object) symbol resolution is also not possible. VDSO functions are hardly expected to be bottlenecks anyway, so we don't lose much here.

Signed-off-by: Gustavo Lima Chaves <[email protected]>
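The kernel half of the offline resolution described above can be sketched as follows. This is not profile_symbol_resolve.py; the function names are hypothetical. It assumes only the usual /proc/kallsyms line layout of `address type name [module]`, and maps an address to the closest preceding symbol plus an offset.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def parse_kallsyms(text: str) -> List[Tuple[int, str]]:
    """Parse saved /proc/kallsyms contents into a sorted (address, name) list."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms


def resolve_kernel(syms: List[Tuple[int, str]], addr: int) -> Optional[str]:
    """Return 'symbol+0xoffset' for the closest symbol at or below addr."""
    addrs = [a for a, _ in syms]
    i = bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = syms[i]
    return f"{name}+0x{addr - base:x}"


# Made-up sample in the kallsyms format; real addresses differ per boot.
sample = """\
ffffffff81000000 T _stext
ffffffff81000100 T do_one
ffffffff81000200 t do_two\t[some_module]
"""
symbols = parse_kallsyms(sample)
frame = resolve_kernel(symbols, 0xFFFFFFFF81000110)
```

Note that /proc/kallsyms shows zeroed addresses to unprivileged readers on many systems, so the dump has to be taken with sufficient privileges for this to work.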
This is a mode where the tool runs forever (until SIGINT), but does not profile the system right away. Profiling only kicks in if a given PSI threshold is hit for CPU, MEM or IO in the system. The threshold is passed as the share (percentage) of a 1s rolling window during which some process is stalled, system wide. When that happens, the duration argument is honored (it can't be left blank in this mode) and a profiling burst of that length takes place. After that, the tool keeps honoring new PSI watermark hits, with the same behavior.

TLDR: this is a mode for continuous system profiling, but only when the system is under stress.

Signed-off-by: Gustavo Lima Chaves <[email protected]>
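For readers unfamiliar with PSI triggers, the threshold described above translates into the kernel's trigger format roughly as in the sketch below. The helper names are hypothetical and this is not the code in the PR. It relies on the documented interface: write `<some|full> <stall µs> <window µs>` to a `/proc/pressure/*` file and poll it for `POLLPRI`.

```python
import os
import select
from typing import Optional


def psi_trigger_spec(percent: float, window_us: int = 1_000_000, kind: str = "some") -> str:
    """Build a PSI trigger string: stall time is `percent` of the rolling window."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    stall_us = int(round(window_us * percent / 100))
    return f"{kind} {stall_us} {window_us}"


def wait_for_pressure(resource: str, percent: float, timeout_ms: Optional[int] = None) -> bool:
    """Block until the trigger fires on /proc/pressure/<resource> (cpu, memory or io).

    Needs a kernel with PSI enabled and enough privileges to register triggers.
    Returns True when the threshold was crossed, False on timeout.
    """
    fd = os.open(f"/proc/pressure/{resource}", os.O_RDWR | os.O_NONBLOCK)
    try:
        os.write(fd, psi_trigger_spec(percent).encode() + b"\0")
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        events = poller.poll(timeout_ms)
        return any(ev & select.POLLPRI for _, ev in events)
    finally:
        os.close(fd)


# 15% of a 1s window means 150 ms of stall time within any rolling second.
spec = psi_trigger_spec(15)
```

A loop would call `wait_for_pressure("cpu", 15)` and start a profiling burst of the requested duration each time it returns True.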
…t mode

Especially for the "PSI loop" mode, outputting to stdout would make it clumsy to separate the info of the different runs again. This makes the tool capable of writing output to files named after the pattern passed to that option (in a loop mode, timestamp suffixes in local time are added).

Signed-off-by: Gustavo Lima Chaves <[email protected]>
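The naming scheme could look like the sketch below. The commit message does not state the exact suffix format, so the `%Y%m%d-%H%M%S` layout, the `.` separator and the `output_path` helper are assumptions made for illustration only.

```python
import time
from typing import Optional


def output_path(pattern: str, loop_mode: bool, now: Optional[float] = None) -> str:
    """Return the file name for one profiling run.

    Outside loop mode the pattern is used as-is; in loop mode a local-time
    timestamp suffix is appended so successive bursts land in separate files.
    The suffix format is an assumption, not taken from the PR.
    """
    if not loop_mode:
        return pattern
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    return f"{pattern}.{stamp}"


single = output_path("profile.out", loop_mode=False)
burst = output_path("profile.out", loop_mode=True)
```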
@@ -106,6 +107,7 @@ const char argp_program_doc[] =
" profile -p 185 # only profile process with PID 185\n"
" profile -L 185 # only profile thread with TID 185\n"
" profile -U # only show user space stacks (no kernel)\n"
" profile -A # only output addresses\n"
Can this not be achieved using the BPF_F_USER_BUILD_ID flag of the bpf_get_stackid helper function?
Hey, thanks @ekyooo, let me take a look at that before responding.
OK, finally time for this. Not sure I follow, though, I'm afraid. bpf_get_stackid will only honor the following flags, it seems:
- BPF_F_SKIP_FIELD_MASK
- BPF_F_USER_STACK
- BPF_F_FAST_STACK_CMP
- BPF_F_REUSE_STACKID.
What did I miss? Are you talking about employing a different set of BPF helpers at the BPF program level, when that mode is in place, to achieve the same effect? But the parent profile.c wants to walk symbol resolution, through syms__map_addr(), regardless, doesn't it?
Oh, it's not bpf_get_stackid but bpf_get_stack. Sorry for the confusion. Please refer to:
https://youtu.be/20SO5thkvhI?list=PLbzoR-pLrL6oj1rVTXLnV7cOuetvjKn9q&t=145
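For context on the suggestion: when bpf_get_stack is called with BPF_F_USER_STACK | BPF_F_USER_BUILD_ID, the buffer holds `struct bpf_stack_build_id` entries (a status, a 20-byte build ID, and an offset or raw IP) instead of plain addresses. The sketch below decodes such a buffer in user space. It is an illustration, not code from this PR: `decode_stack` is a hypothetical helper and the sample buffer is fabricated for the example.

```python
import struct
from typing import List, Optional, Tuple

# Status values from the kernel UAPI enum bpf_stack_build_id_status.
BPF_STACK_BUILD_ID_EMPTY = 0   # marks the end of the trace
BPF_STACK_BUILD_ID_VALID = 1   # build_id and file offset are valid
BPF_STACK_BUILD_ID_IP = 2      # no build ID found, entry holds the raw IP

# struct bpf_stack_build_id: __s32 status; u8 build_id[20]; __u64 offset_or_ip;
ENTRY = struct.Struct("=i20sQ")  # 4 + 20 + 8 = 32 bytes, matching the C layout


def decode_stack(raw: bytes) -> List[Tuple[Optional[str], int]]:
    """Decode a build-ID stack buffer into (build_id_hex or None, offset_or_ip)."""
    frames = []
    usable = len(raw) - len(raw) % ENTRY.size
    for off in range(0, usable, ENTRY.size):
        status, build_id, value = ENTRY.unpack_from(raw, off)
        if status == BPF_STACK_BUILD_ID_EMPTY:
            break
        if status == BPF_STACK_BUILD_ID_VALID:
            frames.append((build_id.hex(), value))
        else:
            frames.append((None, value))  # fall back to the raw address
    return frames


# Fabricated two-frame buffer followed by the end marker.
demo_id = bytes(range(20))
raw = (ENTRY.pack(BPF_STACK_BUILD_ID_VALID, demo_id, 0x1234)
       + ENTRY.pack(BPF_STACK_BUILD_ID_IP, bytes(20), 0x7F00DEAD)
       + ENTRY.pack(BPF_STACK_BUILD_ID_EMPTY, bytes(20), 0))
frames = decode_stack(raw)
```

A (build ID, offset) pair identifies the binary independently of its path or PID, which is why it is attractive for offline resolution; frames without a build ID still need the address-based path.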
I see what you mean. Yeah, even for my uses, I intend to downstream-fork the symbol resolution phase to adapt to company-only flows of symbol resolution.
BUILD-ID could indeed be one of the things serialized there. However, can that be an addition to this? This mode is already useful for some as-is, right? Not everybody will have BUILD-ID annotated binaries on their systems, to begin with.
I will revisit and think about BUILD-ID addition when I refine it internally (making something more generic to add here). Is that sound? Or do you want me to focus on something I missed?
Thanks.
@ekyooo happy to accommodate, though, if you work with me more deeply on your idea :) If my take is beneficial, gentle ping.
If there is no BUILD-ID, you can perform offline symbol resolution using the module name and module base offset that are output by the -v option.
However, since this information can vary with each build version, I think that using the build-id method could be more practical and has lower maintenance costs.
This is my opinion as a contributor, not a reviewer.
This enhances the profile tool with two main tracks: