Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WorkloadTools.ValidateSiteLisits does not set empty lists by default. #12260

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

todor-ivanov
Copy link
Contributor

@todor-ivanov todor-ivanov commented Feb 11, 2025

Fixes #12250

Status

ready

Description

During the validation of the user provided request_args we have a step to mandatory check against the spec arguments from here:

validateArgumentsUpdate(schema, argumentDefinition, optionKey=None)

and subsequently the siteLists from here:

validateSiteLists(arguments)

Unfortunately the so defined function at WMWorloadTools.validateSiteLists sets the sitelists to empty lists by default (even in the case they are completely missing from the so provided by the user request_args):

def validateSiteLists(arguments):
whiteList = arguments.get("SiteWhitelist", [])
blackList = arguments.get("SiteBlacklist", [])
whiteList = makeList(whiteList)
blackList = makeList(blackList)
res = (set(whiteList) & set(blackList))
if len(res):
msg = "Validation failed: The same site cannot be white and blacklisted: %s" % list(res)
raise WMSpecFactoryException(msg)
# store the properly formatted values (list instead of string)
arguments["SiteWhitelist"] = whiteList
arguments["SiteBlacklist"] = blackList
return

This causes the following behavior:

During the call to validate a dictionary of arguments provided by the user which is missing the siteWhitelist or SiteBlacklist at ReqMgr2 - here:

workload.validateArgumentsPartialUpdate(request_args)

The siteLists are set to empty lists before we enter the sequence of steps to update both the request at ReqMgr and all the WorkQueuElements.

Here is a printout of this effect produced by an interactive debugging session again together with @mapellidario:

[11/Feb/2025:15:34:51]  Updating request "tivanov_TaskChain_LumiMask_multiRun_SiteListsTest_v7_241210_093449_7256" with these user-provided args: {'RequestPriority': 700}
Before Validation:{'RequestPriority': 700}
After  Validation:{'RequestPriority': 700, 'SiteWhitelist': [], 'SiteBlacklist': []}
Calling _handleNoStatusUpdate
[11/Feb/2025:15:34:51]  Handling request: tivanov_TaskChain_LumiMask_multiRun_SiteListsTest_v7_241210_093449_7256 with CurrentRequest status: running-open
[11/Feb/2025:15:34:51]  reqArgs: {'RequestPriority': 700, 'SiteWhitelist': []}

This way the so produced empty lists enter as argument difference in reqArgsDiff and are considered as a change to empty SiteWhilist and SiteBlacklist and upon that updated in the workflow spec. Causing the so reported bug.

Here is the result after applying the current patch:

[11/Feb/2025:15:52:11]  Updating request "tivanov_TaskChain_LumiMask_multiRun_SiteListsTest_v7_241210_093449_7256" with these user-provided args: {'RequestPriority': 900}
Before Validation:{'RequestPriority': 900}
After  Validation:{'RequestPriority': 900}
Calling _handleNoStatusUpdate
[11/Feb/2025:15:52:11]  Handling request: tivanov_TaskChain_LumiMask_multiRun_SiteListsTest_v7_241210_093449_7256 with CurrentRequest status: running-open
[11/Feb/2025:15:52:11]  reqArgs: {'RequestPriority': 900}

It is obvious now that the Request._handleNoStatusUpdate method is called without the empty siteLists

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

The PR introducing the bug was: #12099

External dependencies / deployment changes

N/A

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 68 comments to review
  • Pycodestyle check: succeeded
    • 18 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/351/artifact/artifacts/PullRequestReport.html

@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 1 warnings
    • 68 comments to review
  • Pycodestyle check: succeeded
    • 18 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/352/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for providing this fix. In addition to the comment along the code, I also have a comment on the WMWorkloadTools, but I'd rather have it here than along the code. So here it goes.

Even though that looks the way to go, I am not sure whether that is the correct system-wide fix.

When a workflow is created or assigned, the workflow factory is loaded and arguments are:

so I am afraid that with this change, we will be either causing an error somewhere else or setting these to None as default.

I'd suggest to test/investigate those cases as well.

@@ -33,6 +33,8 @@ def makeList(stringList):
throws a ValueError if the input is not well formed.
If the stringList is already of type list, then return it untouched.
"""
# NOTE: In case of a change to this function beware of side effects which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the Utils package is supposed to be agnostic of any WMCore-specific logic, I would suggest to remove this comment, please.

@todor-ivanov
Copy link
Contributor Author

ok this:

When a workflow is created or assigned, the workflow factory is loaded and arguments are:

Tells the whole story. This function has been already developed under the presumption there is no previously set parameters at the workflow. In addition to that, it will not only NOT work for updates of already existing parameters, but it would rather miss eventual SiteBlacklist vs. SiteWhitelists conflicts between the sets of already existing + new arguments. Because in the current version of this function it checks for such conflicts only in among the site lists provided with the call but not with the already existing ones.

I am going to change the code. And will not touch the old one in any way. It would be better in my opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Updating priority of a workflow from a script sets the whitelist to an empty list
3 participants