Update isolationLoadbalancer to use isolation group assignment #6725

Merged 1 commit on Mar 13, 2025

Conversation

natemort
Member

@natemort natemort commented Mar 10, 2025

Rather than arbitrarily assigning isolation groups to partitions, use the assignment stored in the database and cached in the client.

What changed?

  • Update isolation load balancer to use the assigned isolation groups
  • Refactor PartitionConfigProvider to expose partition configuration
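As a rough illustration of the first bullet, the sketch below looks up partitions from an explicit per-partition assignment instead of deriving them from the partition count. The `partitionConfig` type and `partitionsForGroup` helper are hypothetical stand-ins for illustration, not the actual Cadence types or the code in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// partitionConfig is a hypothetical stand-in for the cached partition
// configuration: partition ID -> isolation groups assigned to that partition.
type partitionConfig map[int][]string

// partitionsForGroup returns the sorted partition IDs assigned to group.
// ok is false when no partition is assigned to it, so a caller can fall back.
func partitionsForGroup(cfg partitionConfig, group string) (ids []int, ok bool) {
	for id, groups := range cfg {
		for _, g := range groups {
			if g == group {
				ids = append(ids, id)
				break
			}
		}
	}
	sort.Ints(ids)
	return ids, len(ids) > 0
}

func main() {
	cfg := partitionConfig{
		0: {"zone-a"},
		1: {"zone-b"},
		2: {"zone-a", "zone-b"},
	}
	ids, ok := partitionsForGroup(cfg, "zone-a")
	fmt.Println(ids, ok) // [0 2] true

	_, ok = partitionsForGroup(cfg, "zone-c")
	fmt.Println(ok) // false
}
```

The `ok` result mirrors the shape of the `getPartitionsForGroup` call quoted in the review, where a miss hands the request to the fallback balancer.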

Why?

  • Enables isolation group assignment to influence task/poller routing

How did you test it?

  • Unit tests

Potential risks

  • Bugs in PartitionConfigProvider could impact TaskList partitioning, but isolation group assignment is behind a feature flag.

Release notes

Documentation Changes

Comment on lines 87 to 95
partitions, ok := i.getPartitionsForGroup(isolationGroup, config.ReadPartitions)
if !ok {
	return i.fallback.PickReadPartition(taskListType, req, isolationGroup)
}

// Scaling down, we need to consider both sets of partitions
if numWrite := i.provider.GetNumberOfWritePartitions(req.GetDomainUUID(), taskList, taskListType); numWrite != nRead {
	writePartitions, ok := i.getPartitionsForGroup(isolationGroup, numWrite)
	if ok {
		for p := range writePartitions {
			partitions[p] = struct{}{}
		}
	}
}
Member

Do we still need this guardrail?

Member Author

It's no longer needed. It was a consequence of the previous approach, which arbitrarily assigned isolation groups based on the number of partitions. When the number of partitions was changing, we had to calculate what the assignments would have been with and without the change to make sure we drained partitions.
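To make the previous approach concrete, here is a simplified sketch of count-derived assignment and why a partition count change required looking at both the read and write sets. It is not the removed code: the function names are hypothetical, and the wrap-around rule for fewer partitions than groups is an assumption added so the example is complete.

```go
package main

import "fmt"

// stridePartitions derives a group's partitions purely from counts: the group
// at position index owns partitions index, index+n, index+2n, ... where n is
// the number of isolation groups. ASSUMPTION: when there are fewer partitions
// than groups, the group wraps around to partition index % partitionCount.
func stridePartitions(index, numGroups, partitionCount int) map[int]struct{} {
	partitions := make(map[int]struct{})
	if partitionCount <= 0 || numGroups <= 0 {
		return partitions
	}
	if numGroups > partitionCount {
		partitions[index%partitionCount] = struct{}{}
		return partitions
	}
	for j := index; j < partitionCount; j += numGroups {
		partitions[j] = struct{}{}
	}
	return partitions
}

// readPartitionsWhileScaling unions the sets derived from the read and write
// partition counts, so partitions that are being drained are still read.
func readPartitionsWhileScaling(index, numGroups, nRead, nWrite int) map[int]struct{} {
	partitions := stridePartitions(index, numGroups, nRead)
	if nWrite != nRead {
		for p := range stridePartitions(index, numGroups, nWrite) {
			partitions[p] = struct{}{}
		}
	}
	return partitions
}

func main() {
	// Third of 3 groups, scaling down from 4 read partitions to 2 write
	// partitions: the derived assignment moves from partition 2 to partition 0,
	// so both have to be read until partition 2 is drained.
	fmt.Println(readPartitionsWhileScaling(2, 3, 4, 2))
}
```

With an assignment stored per partition, the mapping no longer shifts when the counts change, which is why the union step could be dropped.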

for j := index; j < partitionCount; j += len(isolationGroups) {
partitions[j] = struct{}{}

res := make(map[int]any)
Member

nit: maybe add some comments. It took a lot of time to figure out what this is doing.

Member Author

I rewrote this to be more readable, good callout.

if nRead <= 1 {
return taskListName
domainName, err := i.domainIDToName(req.GetDomainUUID())
if err != nil || !i.isolationEnabled(domainName) {
Member

Should we use the fallback for the zero isolation groups case instead?

Member Author

That was covered implicitly by getPartitionsForGroup, but I've made it more explicit.
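A minimal sketch of what making the fallback explicit can look like, assuming a pared-down balancer interface. Every name here (`partitionPicker`, `plainPicker`, `isolationPicker`, the `assignments` map) is hypothetical and much simpler than the real load balancer in this PR.

```go
package main

import "fmt"

// partitionPicker is a hypothetical, pared-down load balancer interface.
type partitionPicker interface {
	PickPartition(taskList, isolationGroup string) string
}

// plainPicker stands in for the non-isolation balancer used as the fallback.
type plainPicker struct{}

func (plainPicker) PickPartition(taskList, _ string) string { return taskList }

// isolationPicker routes by isolation group and delegates when it cannot.
type isolationPicker struct {
	fallback    partitionPicker
	assignments map[string][]string // isolation group -> partition names (assumed shape)
}

func (p isolationPicker) PickPartition(taskList, group string) string {
	// Explicit guard: with zero isolation groups there is nothing to route by.
	if len(p.assignments) == 0 {
		return p.fallback.PickPartition(taskList, group)
	}
	// The per-group lookup still falls back when this group has no partitions.
	candidates := p.assignments[group]
	if len(candidates) == 0 {
		return p.fallback.PickPartition(taskList, group)
	}
	// A real balancer would spread load across candidates; the first is enough here.
	return candidates[0]
}

func main() {
	empty := isolationPicker{fallback: plainPicker{}}
	fmt.Println(empty.PickPartition("tl", "zone-a")) // tl

	assigned := isolationPicker{
		fallback:    plainPicker{},
		assignments: map[string][]string{"zone-a": {"tl-partition-1"}},
	}
	fmt.Println(assigned.PickPartition("tl", "zone-a")) // tl-partition-1
	fmt.Println(assigned.PickPartition("tl", "zone-b")) // tl
}
```

The first guard is redundant with the per-group check, which matches the reply above: the behavior was already covered, and the explicit check only makes the intent easier to read.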

…ignment

Rather than arbitrarily assigning isolation groups to partitions, use the assignment stored in the database and cached in the client.

Refactor PartitionConfigProvider to expose the full partition configuration.
Member

@shijiesheng shijiesheng left a comment

LGTM

Comment on lines +45 to +46
domainIDToName: domainIDToName,
isolationEnabled: isolationEnabled,
Member

nit: reduce the footprint of domainIDToName

Suggested change
-	domainIDToName:   domainIDToName,
-	isolationEnabled: isolationEnabled,
+	isolationEnabled: func(domainID string) bool {
+		domainName, err := domainIDToName(domainID)
+		if err != nil {
+			return false
+		}
+		return isolationEnabled(domainName)
+	},
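The suggestion folds the ID-to-name lookup into a single ID-based check, so the struct only needs one function field. A self-contained sketch of that composition follows; `wrapIsolationEnabled` and `stubLookup` are hypothetical helpers written for this example, not code from the PR.

```go
package main

import "fmt"

// wrapIsolationEnabled adapts a name-based flag check into an ID-based one.
// A failed domain lookup is treated as "isolation disabled".
func wrapIsolationEnabled(
	domainIDToName func(string) (string, error),
	isolationEnabled func(string) bool,
) func(string) bool {
	return func(domainID string) bool {
		domainName, err := domainIDToName(domainID)
		if err != nil {
			return false
		}
		return isolationEnabled(domainName)
	}
}

// stubLookup is a stand-in for the domain cache: it knows a single domain.
func stubLookup(domainID string) (string, error) {
	if domainID == "id-1" {
		return "orders", nil
	}
	return "", fmt.Errorf("unknown domain %q", domainID)
}

func main() {
	enabled := wrapIsolationEnabled(stubLookup, func(name string) bool {
		return name == "orders"
	})
	fmt.Println(enabled("id-1"), enabled("id-2")) // true false
}
```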

@natemort natemort merged commit 427f0dd into cadence-workflow:master Mar 13, 2025
22 checks passed
2 participants