Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Rocestat PMDA for collecting and analyzing RoCE device metrics #2132

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mohith-kumar-thummaluru
Copy link
Contributor

This commit introduces the Rocestat PMDA, which enables the collection of hardware and software statistics for RDMA over Converged Ethernet (RoCE) devices. The PMDA provides essential metrics such as packet reception statistics, transport-level performance data, and hardware counters to facilitate system monitoring and analysis.

This integration enhances diagnostic capabilities, particularly for analyzing network performance in RoCE-enabled environments.

@mohith-kumar-thummaluru mohith-kumar-thummaluru marked this pull request as draft February 11, 2025 06:07
This commit introduces the Rocestat PMDA, which enables the collection of
hardware and software statistics for RDMA over Converged Ethernet (RoCE)
devices. The PMDA provides essential metrics such as packet reception
statistics, transport-level performance data, and hardware counters to
facilitate system monitoring and analysis.

This integration enhances diagnostic capabilities, particularly for
analyzing network performance in RoCE-enabled environments.

Signed-off-by: Mohith Kumar Thummaluru <[email protected]>
@mohith-kumar-thummaluru mohith-kumar-thummaluru marked this pull request as ready for review February 11, 2025 08:34
Copy link
Member

@natoscott natoscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohith-kumar-thummaluru hi, nice work - there's a few questions and observations inline. Also, we'll need to do some automated regression tests for this new agent - you might find the "lio" PMDA as a good reference python case there (see qa/lio/ and qa/1061)

@mohith-kumar-thummaluru
Copy link
Contributor Author

Hi Nathan,

Thanks for reviewing the code and providing your comments. I'll make the necessary updates and send a v2.

@mohith-kumar-thummaluru
Copy link
Contributor Author

@natoscott Need a suggestion.
I have developed PMCD client for this agent. Would you prefer submitting them as a separate PR, or should I include them in the same PR?


def get_up_interfaces(self):
"""Get the names of all interfaces that are in the Up state."""
result = subprocess.Popen(['/usr/bin/ibdev2netdev'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ... there is no /usr/bin/ibdev2netdev installed by default on my Ubuntu system, so I think we need some configure magic and/or packaging magic to make this an optionally built and packaged PMDA. Think outside the RH/Oracle box ... SuSE? all the Debian derivatives? *BSD? Windows? MacOS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a conditional check to verify whether the IB devices are configured. Based on this, the PMDA will be installed. I hope this logic works as expected. Let me know your thoughts on this.

@natoscott
Copy link
Member

@natoscott Need a suggestion. I have developed PMCD client for this agent. Would you prefer submitting them as a separate PR, or should I include them in the same PR?

Feel free to include it here. It will also need tests, man pages, etc. Also, consider using a pmrep(1) configuration file as the way to report these metrics, perhaps no new code/tests/docs would be needed that way?

- Removed binary notready protocol and unnecessary sleep.
- Eliminated redundant debugger logging wrappers.
- Cleaned up unused variables and imports.
- Fixed PM units for metrics.
- Moved delta/rate conversion calculation logic from the PMDA.
- Added conditional check in installation script to run only if IB devices are configured; exit otherwise.
- Integrated pylint static code checking in the Makefile.

Signed-off-by: Mohith Kumar Thummaluru <[email protected]>
Copy link
Member

@natoscott natoscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohith-kumar-thummaluru thanks for the updates - if further advice is needed (esp. with QA and man pages) please let us know. Also, the pylint makefile change has shown there's quite a few warnings to resolve - check the fedora CI results for that (that's where we enable pylint by default).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants