profile (libbpf): tool enhancements #5181

Open: glima wants to merge 5 commits into base: master

@glima glima commented Jan 2, 2025

This enhances the profile tool with two main tracks:

  • offline symbol resolution capabilities and
  • PSI threshold trigger mode, for always-running profiling capabilities

glima added 5 commits January 2, 2025 09:46
Some systems have more than one source of DSOs, so paths can
overlap. Think of systems running containers or systemd-portabled
workloads: those binaries can dynamically link to DSOs whose paths
conflict with ones living in the root namespace, yet the same-path
DSOs might be completely different binaries.

This is a preparation commit for functionality that we want to add
later: dumping DSO list state from profile, for offline symbol
resolution.

When resolving symbols in that scenario, match by PID as well, so the
code does not look up addresses in the wrong binaries.

Signed-off-by: Gustavo Lima Chaves <[email protected]>
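
The PID-aware matching described in this commit can be sketched as follows. This is a minimal Python illustration, not the C code from the patch: the `Dso` record, its fields, `find_dso`, and `to_file_offset` are hypothetical names, and the sample mappings are made up. The offset translation is also simplified (a real resolver still has to map the file offset through the ELF segments).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dso:
    pid: int          # process whose address space holds this mapping
    path: str         # path as seen from inside that process
    start: int        # first mapped address
    end: int          # one past the last mapped address
    file_offset: int  # offset of the mapping within the file


def find_dso(dsos: list[Dso], pid: int, addr: int) -> Optional[Dso]:
    """Return the mapping of `pid` that contains `addr`, if any."""
    for dso in dsos:
        if dso.pid == pid and dso.start <= addr < dso.end:
            return dso
    return None


def to_file_offset(dso: Dso, addr: int) -> int:
    """Translate a runtime address into an offset within the DSO file."""
    return addr - dso.start + dso.file_offset


# Two processes map different binaries under the same path (made-up data).
dsos = [
    Dso(pid=100, path="/usr/lib/libfoo.so", start=0x7F0000000000,
        end=0x7F0000010000, file_offset=0x1000),
    Dso(pid=200, path="/usr/lib/libfoo.so", start=0x7F0000000000,
        end=0x7F0000020000, file_offset=0x2000),
]
```

Matching on path or address alone would be ambiguous here; adding the PID to the key is what picks the right binary.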
One could end up with no DSO state for those and thus missed
flamechart name resolutions.

Signed-off-by: Gustavo Lima Chaves <[email protected]>
…tempted

Instead, under that mode, more information is output: the DSO
list/state is dumped in addition to the stack traces (and counts). The
stack traces contain addresses only; symbol names are never resolved.

This is intended for systems where having the debug symbols in their
images is prohibitive, but where one still wants to leverage BPF-based
profiling. Offline symbol resolution is still possible with this
textual output, paired with the system's /proc/kallsyms contents and a
serialized version of the known DSO tree. With the new tool mode, all
that info is the output, not just the stack traces and counts.

With a version of the rootfs where profile ran (with the added missing
debug symbols), paired with our added symbol resolution
tool (profile_symbol_resolve.py) and the output in the new mode, one
will be able to produce flamecharts outside of the running context, in
an offline fashion.

For flamechart generation, we start with an opinionated take and
choose https://github.com/jonhoo/inferno as the tool. This can easily
be turned into an argument later, to allow other choices.

NB: there is no support for folded stack output in the (offline)
symbol resolution schema, as that mode does not output PIDs. Making
the right DSO matches without that information would be prohibitive.

NB: offline vDSO (virtual dynamic shared object) symbol resolution is
also not possible. vDSO functions are rarely expected to be
bottlenecks anyway, so we don't lose much here.

Signed-off-by: Gustavo Lima Chaves <[email protected]>
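
For the kernel side of offline resolution, a saved copy of /proc/kallsyms is enough to map an address back to a name. The sketch below is an assumption about how such a lookup could work, not the code in profile_symbol_resolve.py: `parse_kallsyms` and `resolve_kernel` are hypothetical helpers and the sample symbols are invented. It relies only on the kallsyms line format (hex address, type, name, optional module).

```python
import bisect


def parse_kallsyms(text):
    """Parse /proc/kallsyms text into a list of (address, name), sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        addr = int(fields[0], 16)
        if addr == 0:
            continue  # addresses read as zero when the kernel hides them
        symbols.append((addr, fields[2]))
    symbols.sort()
    return symbols


def resolve_kernel(symbols, addr):
    """Return (name, offset) for the closest symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    idx = bisect.bisect_right(addrs, addr) - 1
    if idx < 0:
        return None
    base, name = symbols[idx]
    return name, addr - base


# Invented sample in kallsyms format.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff81000100 T example_entry
ffffffff81000400 t example_helper [examplemod]
"""
symbols = parse_kallsyms(SAMPLE)
```

Calling `resolve_kernel(symbols, 0xffffffff81000150)` yields the enclosing symbol plus the offset into it, which is the form flamechart tools usually expect once the offset is dropped.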
This is a mode where the tool runs forever (until SIGINT), but does
not profile the system right away. Profiling only kicks in when a
given PSI threshold is hit for CPU, MEM, or IO. The threshold is
passed as the share (percentage) of a rolling 1s window during which
some process is stalled, system wide.

When that happens, the duration argument is going to be honored (can't
be left blank in this mode) and a profiling burst of that length will
take place.

After that, it will continue to honor new PSI watermark hits, with the
same behavior.

TLDR: this is a mode for continuous system profiling, but only when
the system is under stress.

Signed-off-by: Gustavo Lima Chaves <[email protected]>
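
The threshold described above maps onto the kernel's PSI trigger interface: a string of the form `some <stall µs> <window µs>` is written to a file under /proc/pressure, and the file descriptor is then polled for POLLPRI. The Python sketch below illustrates that mechanism under stated assumptions; it is not the C implementation from the patch, `psi_trigger` and `wait_for_pressure` are hypothetical names, and it watches a single resource rather than all three.

```python
import os
import select

PSI_FILES = {
    "cpu": "/proc/pressure/cpu",
    "memory": "/proc/pressure/memory",
    "io": "/proc/pressure/io",
}
WINDOW_US = 1_000_000  # 1 s window, as in the commit message


def psi_trigger(percent, window_us=WINDOW_US):
    """Build a PSI trigger string: 'some <stall us> <window us>'."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    stall_us = int(window_us * percent / 100)
    return f"some {stall_us} {window_us}"


def wait_for_pressure(resource, percent):
    """Block until the kernel reports that the threshold was crossed."""
    fd = os.open(PSI_FILES[resource], os.O_RDWR | os.O_NONBLOCK)
    try:
        # The kernel expects the trigger string to be NUL-terminated.
        os.write(fd, psi_trigger(percent).encode() + b"\0")
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        while True:
            for _, events in poller.poll():
                if events & select.POLLERR:
                    raise OSError("PSI event source is gone")
                if events & select.POLLPRI:
                    return
    finally:
        os.close(fd)
```

A loop mode would then call something like `wait_for_pressure("cpu", 15)`, run one profiling burst of the requested duration, and go back to waiting. Registering triggers typically needs elevated privileges.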
…t mode

Especially for the "PSI loop" mode, writing to stdout would make it
clumsy to tell the output of the different runs apart afterwards.

This makes the tool capable of writing output to files named in the
pattern passed to that option (if in a loop mode, timestamp suffixes
are added, with local time).

Signed-off-by: Gustavo Lima Chaves <[email protected]>
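
The naming scheme from this commit can be illustrated with a short Python sketch. The commit message only states that timestamp suffixes in local time are added in loop mode; the separator and the `%Y%m%d-%H%M%S` format used here are assumptions, as is the helper name `output_path`.

```python
from datetime import datetime
from typing import Optional


def output_path(pattern: str, loop_mode: bool,
                now: Optional[datetime] = None) -> str:
    """Return the file name for one run; loop runs get a local-time suffix."""
    if not loop_mode:
        return pattern
    now = now or datetime.now()  # naive datetime.now() is local time
    # Suffix format is an assumption, not taken from the patch.
    return f"{pattern}.{now.strftime('%Y%m%d-%H%M%S')}"
```

With a sortable timestamp suffix, the files from successive PSI-triggered bursts list in chronological order.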
@@ -106,6 +107,7 @@ const char argp_program_doc[] =
" profile -p 185 # only profile process with PID 185\n"
" profile -L 185 # only profile thread with TID 185\n"
" profile -U # only show user space stacks (no kernel)\n"
" profile -A # only output addresses\n"
Contributor

Can this not be achieved using the BPF_F_USER_BUILD_ID flag of the bpf_get_stackid helper function?

Author

Hey, thanks @ekyooo, let me take a look at that before responding.

Author

OK, finally time for this. Not sure I follow, though, I'm afraid. bpf_get_stackid will only honor the following flags, it seems:

  • BPF_F_SKIP_FIELD_MASK
  • BPF_F_USER_STACK
  • BPF_F_FAST_STACK_CMP
  • BPF_F_REUSE_STACKID.

What did I miss? Are you talking about employing a different set of bpf helpers at the BPF program level, should that mode be in place, that would get to the same effect? But the parent profile.c wants to walk symbol resolution, through syms__map_addr(), regardless, doesn't it?

Contributor

Oh, it's not bpf_get_stackid but bpf_get_stack. Sorry for the confusion. Please refer to:
https://youtu.be/20SO5thkvhI?list=PLbzoR-pLrL6oj1rVTXLnV7cOuetvjKn9q&t=145

Author

I see what you mean. Yeah, even for my uses, I intend to downstream-fork the symbol resolution phase to adapt to company-only flows of symbol resolution.

BUILD-ID could indeed be one of the things serialized there. However, can that be an addition to this? This mode is already useful for some as-is, right? Not everybody will have BUILD-ID annotated binaries for their systems, to begin with.

I will revisit and think about BUILD-ID addition when I refine it internally (making something more generic to add here). Is that sound? Or do you want me to focus on something I missed?

Thanks.

Author

@ekyooo happy to accommodate, though, if you work through your idea with me in more depth :) If my take is beneficial as it stands, consider this a gentle ping.

Contributor

If there is no BUILD-ID, you can perform offline symbol resolution using the module name and module base offset that are output by the -v option.

However, since this information can vary with each build version, I think that using the build-id method could be more practical and has lower maintenance costs.

This is my opinion as a contributor, not a reviewer.
