Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-48716: Investigate memory leak in the Watcher. #95

Merged
merged 8 commits into from
Feb 18, 2025

Conversation

tribeiro
Copy link
Member

In addition to fixing the memory leak, improve unit test reliability and remove references to idl.

…that all severities are NONE, after it has started to produce heartbeats .

There is a race condition here that, when we enable the rule, before we are publishing heartbeat it will trigger the severity. So the most realiable order of operations is, start publishing heartbeats and then enable the alarm.
…ation is done.

There is a race condition between a background task is cancelled and the actual cancelation is done. The test now will check it the escalation_timer_task is done, if it is it will assert that it was cancelled, if not it will await for the task and assert that it raises a CancelledError.
…lish an initial summaryState event for the ATDome CSC, used during the test.

In DDS every unit test was fully isolated but this is no longer the case with Kafka. So, if we don't publish an initial summaryState the Watcher CSC ends up getting whatever last state was published for this component, which interferes with the unit test. To ensure this does not happen the best thing to do is publish a value of the summary state the CSC would be in.
Copy link
Contributor

@teweitsai teweitsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@@ -28,7 +28,7 @@
STD_TIMEOUT = 5 # Max time to send/receive a topic (seconds)
LONG_TIMEOUT = 60 # Max Remote startup time (seconds)

index_gen = utils.index_generator()
index_gen = utils.index_generator(imin=1234)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please help to comment the fix here. I do not understand why the imin=1234 fixes the issue. Thanks!

@teweitsai
Copy link
Contributor

teweitsai commented Feb 18, 2025

For the remove of idl, I think you might want to modify the conda/meta.yaml, Jenkinsfile, and doc/conf.py as well.

…tion at a higher range such that it does not overlap with other unit tests that ran previously.
…leak.

Basically the original code passes a dictionary to the method and edits the dictionary in place. One have to be really carefull when doing this, especially as one would usually not expect that a passed argument be directly edited by a method. Furthermore, when an assignement is made, a local variable is created instead, which was the source of the memory leak.
@tribeiro tribeiro merged commit 50dc4e8 into develop Feb 18, 2025
4 checks passed
@tribeiro tribeiro deleted the tickets/DM-48716 branch February 18, 2025 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants