A team in MLBatch is a group of users that share a resource quota. In Kueue, the `ClusterQueue` is the abstraction used to define a pool of resources (`cpu`, `memory`, `nvidia.com/gpu`, etc.) that is available to a team. A `LocalQueue` is the abstraction used by members of the team to submit workloads to a `ClusterQueue` for execution using those resources.
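For illustration, a minimal `ClusterQueue` granting a team a fixed quota of CPUs, memory, and GPUs might look like the following sketch. The queue name, flavor name, and quota values are placeholders, not prescribed defaults, and a `ResourceFlavor` named `default-flavor` is assumed to exist:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cluster-queue   # example name
spec:
  namespaceSelector: {}        # match all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor     # assumes a ResourceFlavor named default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 512Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 16
```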
Kubernetes built-in `ResourceQuotas` should not be used for resources that are being managed by `ClusterQueues`. The two quota systems are incompatible.
We strongly recommend maintaining a simple relationship between teams, namespaces, `ClusterQueues`, and `LocalQueues`. Each team should be assigned its own namespace containing a single `LocalQueue`, which is configured to be the only `LocalQueue` targeting the team's `ClusterQueue`.
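A sketch of the corresponding `LocalQueue` in the team's namespace, assuming the namespace is named `team-a` and the `ClusterQueue` is the hypothetical `team-a-cluster-queue` shown above:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default-queue                  # the only LocalQueue in the namespace
  namespace: team-a                    # the team's namespace
spec:
  clusterQueue: team-a-cluster-queue   # targets the team's ClusterQueue
```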
The quotas assigned to a `ClusterQueue` can be dynamically adjusted by a cluster admin at any time. Adjustments to quotas only impact queued workloads; workloads already admitted for execution are not impacted by quota adjustments.
For Kueue quotas to be effective, the sum of all quotas for each managed resource (`cpu`, `memory`, `nvidia.com/gpu`, `pods`) must remain less than or equal to the available cluster capacity for that resource. Concretely, for a cluster with 256 NVIDIA GPUs dedicated to MLBatch users, the cumulative `nominalQuota` for the `nvidia.com/gpu` resource should be at most 256. Quotas should be reduced when the available capacity shrinks, whether because of failures or because resources are allocated to non-batch workloads.
To facilitate the necessary quota adjustments, we recommend setting up a dedicated `ClusterQueue` for slack capacity that other `ClusterQueues` can borrow from. This queue should not be associated with any team, project, namespace, or local queue. Its `lendingLimit` should be adjusted dynamically to reflect changes in cluster capacity. If sized appropriately, this queue will make adjustments to other cluster queues unnecessary for small cluster capacity changes. The figure below shows this recommended setup for an MLBatch cluster with three teams. Beginning with RHOAI 2.12 (AppWrapper v0.23), the dynamic adjustment of the Slack `ClusterQueue` `lendingLimit` can be configured to be fully automated.
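As a sketch, such a slack `ClusterQueue` could be defined as follows; borrowing requires the slack queue and the team `ClusterQueues` to share a cohort, and the cohort name, flavor name, and quota values below are illustrative. The `lendingLimit` is the value an admin (or the automation mentioned above) would adjust as capacity changes:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: slack-cluster-queue
spec:
  namespaceSelector: {}
  cohort: default-cohort      # assumed to match the cohort of the team ClusterQueues
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 8
        lendingLimit: 8       # reduce when cluster capacity shrinks
      - name: "memory"
        nominalQuota: 64Gi
        lendingLimit: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
        lendingLimit: 8
```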
Every resource name occurring in the resource requests or limits of a workload must be covered by a `ClusterQueue` intended to admit the workload, even if the requested resource count is zero. For example, a `ClusterQueue` must cover `nvidia.com/roce_gdr`, possibly with an empty quota, to admit a `PyTorchJob` requesting:
```yaml
resources:
  requests:
    cpu: 1
    memory: 256Mi
    nvidia.com/roce_gdr: 0
  limits:
    cpu: 1
    memory: 256Mi
    nvidia.com/roce_gdr: 0
```
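An excerpt of a `ClusterQueue` spec that covers `nvidia.com/roce_gdr` with a zero quota might look like this sketch (names and the other quota values are illustrative):

```yaml
resourceGroups:
- coveredResources: ["cpu", "memory", "nvidia.com/gpu", "nvidia.com/roce_gdr"]
  flavors:
  - name: default-flavor
    resources:
    - name: "cpu"
      nominalQuota: 64
    - name: "memory"
      nominalQuota: 512Gi
    - name: "nvidia.com/gpu"
      nominalQuota: 16
    - name: "nvidia.com/roce_gdr"
      nominalQuota: 0           # covered, but with an empty quota
```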