
optionally disable clusterIP for Service Type=LoadBalancer #3623


Closed
xiaonancc77 wants to merge 1 commit

Conversation


@xiaonancc77 xiaonancc77 commented Oct 18, 2022

  • One-line PR description:
    This is useful for VIP-based implementations of Service Type=LoadBalancer where the ClusterIP is not needed. The largest motivation for this feature is that the number of load balancers is limited by the number of available ClusterIPs (a hypothetical API sketch follows below).
  • Other comments:
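
The KEP text itself is not quoted in this thread, so the following is only a hedged sketch of what such an opt-out could look like, modeled on the existing allocateLoadBalancerNodePorts field; the AllocateClusterIP name is purely illustrative and not necessarily what the PR proposes.

```go
// Hypothetical sketch only: the field name and shape are illustrative and are
// not taken from the KEP text, which is not quoted in this thread. It mirrors
// the existing allocateLoadBalancerNodePorts opt-out on ServiceSpec.
package v1

type ServiceSpec struct {
	// ... existing ServiceSpec fields elided ...

	// AllocateClusterIP (hypothetical) would control whether a ClusterIP is
	// allocated for a Service of Type=LoadBalancer. It would default to true
	// to preserve current behavior and could only be set to false for
	// LoadBalancer Services backed by a VIP implementation.
	// +optional
	AllocateClusterIP *bool `json:"allocateClusterIP,omitempty"`
}
```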

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 18, 2022
@k8s-ci-robot
Contributor

Welcome @xiaonancc77!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @xiaonancc77. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Oct 18, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xiaonancc77
Once this PR has been reviewed and has the lgtm label, please assign dcbw for approval by writing /assign @dcbw in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from dcbw October 18, 2022 07:11
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 18, 2022
* the number of load balancers is now limited to the number of available ClusterIPs.
* ClusterIPs are allocated for an LB even though they are not used.

Clusters that have integrations for Service Type=LoadBalancer but don't require a ClusterIP should have the option to disable ClusterIP allocation.
Member


How will the other Pods in the cluster be able to reach this Service?
I think this will break service discovery; for example, DNS resolves the Service name and returns the ClusterIP as an A record.
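
To make that concern concrete, here is a minimal sketch of the in-cluster discovery path it refers to, assuming the conventional cluster.local zone and a made-up Service my-svc in namespace my-ns; run from inside a Pod, the A record that comes back is the Service's ClusterIP.

```go
// Minimal illustration; only meaningful when run inside a Pod that uses the
// cluster DNS, and the Service/namespace names are made up.
package main

import (
	"fmt"
	"net"
)

func main() {
	// The cluster DNS answers this name with the Service's ClusterIP as an
	// A record, which is the discovery path that would break if no
	// ClusterIP were allocated.
	addrs, err := net.LookupHost("my-svc.my-ns.svc.cluster.local")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved to:", addrs)
}
```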

Author


  1. In some business scenarios, certain Services do not need to be reached by Pods in the cluster and are only accessed by requests from outside the cluster. In those cases a large number of unneeded ClusterIPs are allocated.

  2. In some production environments, access from within the cluster is also expected to go through the cloud provider load balancer (the Service external IP) instead of ipvs/iptables, because the cloud provider load balancer offers session persistence, health checks, traffic monitoring, and more.

  3. DNS: if access from within the cluster is needed, the DNS server can be configured with the k8s_external plugin so that it returns the external IP for the Service name (see the sketch after this list).
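
As a hedged illustration of point 3, assuming CoreDNS with the k8s_external plugin serving a made-up zone example.org (the Corefile fragment in the comments below should be checked against the CoreDNS docs), an in-cluster client could then resolve the Service by its external name instead of its ClusterIP:

```go
// Hedged illustration of the k8s_external idea; the zone example.org and the
// Service/namespace names are made up.
//
// Assumed CoreDNS Corefile fragment (illustrative only):
//
//	.:53 {
//	    kubernetes cluster.local
//	    k8s_external example.org
//	}
package main

import (
	"fmt"
	"net"
)

func main() {
	// With k8s_external configured, <service>.<namespace>.<zone> resolves to
	// the Service's external (load balancer) IPs rather than its ClusterIP.
	addrs, err := net.LookupHost("my-svc.my-ns.example.org")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved to:", addrs)
}
```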

Member


those are a lot of exceptions to justify changing a default behavior 😅

Author

@xiaonancc77 xiaonancc77 Oct 20, 2022


However, there is a real need for LB Services that are never accessed by Pods in the cluster, and in that case allocating a ClusterIP is really unnecessary.

@aojea
Member

aojea commented Oct 19, 2022

the number of load balancers is limited by the number of available ClusterIPs.

This is the part I find hardest to understand. ClusterIPs come from the Service CIDR, which is usually a private network and can be as large as a /12; load balancer IPs, however, have to be public, and those are much more expensive and scarce.

@xiaonancc77 xiaonancc77 reopened this Oct 20, 2022
@xiaonancc77
Author

the number of load balancers is limited by the number of available ClusterIPs.

This is the part I find hardest to understand. ClusterIPs come from the Service CIDR, which is usually a private network and can be as large as a /12; load balancer IPs, however, have to be public, and those are much more expensive and scarce.

First of all, this is true in the private cloud scenario.

However, in public cloud scenarios such as Tencent Cloud and Alibaba Cloud, the network segments of all k8s clusters' container networks within a subnet cannot overlap; in other words, the available network segments determine the number and size of clusters that can be purchased.
This constraint causes difficulties in our production environment: if we allocate a large service IP range to each cluster but do not actually need ClusterIPs in those clusters, the addresses are wasted, while a smaller service IP range is not enough for us to create Services.

In addition, public cloud vendors divide load balancer VIPs into public IPs and internal IPs. The internal IP of the load balancer is associated with the external IP of the k8s Service, which can also satisfy access from outside the cluster. At the same time, the internal IP, like the container IPs, belongs to the private network and is not as scarce.

Whether on a private or a public cloud, there really are Services bound to a load balancer that never need to be reached by Pods in the cluster.
In our production environment this problem has reached a scale that limits both the size and the number of our clusters.

@aojea
Member

aojea commented Oct 20, 2022

if we allocate a large service IP range to each cluster but do not actually need ClusterIPs in those clusters, the addresses are wasted

but the service IP range can overlap between clusters, and there are current limitations on the number of Services supported at scale; with a /20 you'll have the possibility to use 4093 ClusterIPs ... just curious, why don't you use the same Service CIDR for all the clusters?
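
For reference, one way to arrive at roughly that figure, assuming the allocator reserves the network and broadcast addresses and that one ClusterIP always goes to the default kubernetes Service:

```go
// Back-of-the-envelope arithmetic only; exact reservations depend on the
// allocator implementation.
package main

import "fmt"

func main() {
	prefixLen := 20
	total := 1 << (32 - prefixLen) // 4096 addresses in a /20
	reserved := 2                  // network and broadcast addresses (assumed)
	defaultSvc := 1                // ClusterIP of the default kubernetes Service
	fmt.Println(total - reserved - defaultSvc) // prints 4093
}
```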

@xiaonancc77
Author

xiaonancc77 commented Oct 20, 2022

With many public cloud providers, multiple clusters in the same subnet are not allowed to use overlapping network segments, including service IP ranges.

Even if all the clusters could use the same service IP range: if it is set too large, it crowds out container network IP resources; if it is set too small, it cannot be expanded when it runs out (many public cloud providers currently do not allow the service IP range to be modified after the cluster is created).

And when a ClusterIP is not required, there is really no need for mandatory allocation; this could be exposed as an option.


@xiaonancc77
Author

In the case where a ClusterIP is not required, there is really no need for mandatory allocation; this could be exposed as an option.

@xiaonancc77
Author

@aojea @caseydavenport @dcbw

@aojea
Member

aojea commented Oct 25, 2022

/assign @danwinship @thockin @khenidak

@thockin
Member

thockin commented Oct 31, 2022

I appreciate the KEP, but I don't think we want to do this. We did the equivalent for nodePort under duress, but nodePorts are ACTUALLY a limited resource. As @aojea says, you CAN reuse the same service CIDR on many clusters (in most implementations) and those CAN be very large. This new exception would have to propagate pretty far and wide - every discovery mechanism, including but not limited to DNS, would need to be aware that this previously required field could now be empty.

The risk of this change is quite large. I'll leave it open for a bit to collect feedback, but I'm -1 on this proposal.
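
For reference, the nodePort precedent mentioned above is, as far as I understand it, the allocateLoadBalancerNodePorts field on ServiceSpec; a minimal sketch of how that existing opt-out is used today (the Service name and port are made up):

```go
// Sketch of the existing opt-out referred to above; only the Service name
// and port are invented, the field itself is part of core/v1.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	allocate := false
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "example-lb"},
		Spec: corev1.ServiceSpec{
			Type:  corev1.ServiceTypeLoadBalancer,
			Ports: []corev1.ServicePort{{Port: 80}},
			// Existing opt-out: skip node port allocation for this LB.
			AllocateLoadBalancerNodePorts: &allocate,
		},
	}
	fmt.Println(svc.Name, *svc.Spec.AllocateLoadBalancerNodePorts)
}
```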

@robscott
Member

robscott commented Nov 3, 2022

@xiaonancc77 With the upcoming work on Gateway API for L4 load balancing, I don't think we'll have any expectation of also provisioning a ClusterIP; maybe it would be more straightforward to invest in that going forward instead of a new KEP on Service?

@khenidak
Contributor

khenidak commented Nov 3, 2022

The motivation

* the number of load balancers is now limited to the number of available ClusterIPs.
* ClusterIPs are allocated for an LB even though they are not used.

is centered around preserving ClusterIPs. This is especially painful to deal with if the cluster was set up with a small ClusterIP CIDR. I think we have a KEP to allow multiples of these CIDRs, which should also solve the problem this KEP is trying to solve, right?

@xiaonancc77
Author

xiaonancc77 commented Nov 4, 2022

@robscott Do you have a link to the relevant documentation, please? Thanks~~

@xiaonancc77
Author

@khenidak Thank you for your reply. You are right that being able to configure multiple service CIDRs would solve some of our pain points, but not all.

In a private network, the container CIDR, service CIDR, and LoadBalancer CIDR cannot overlap.
In the current model, a LoadBalancer Service consumes a service IP and a LoadBalancer IP at the same time.
Our business requires a large number of LoadBalancer Services in a private network, which quickly consumes its IP resources (half of those IPs are unnecessary).
This therefore limits the number and scale of clusters we can create in the same private network.

@shaneutt
Member

/cc @shaneutt

@MikeZappa87

"the number of loadbalancers is limited by the number of available clusterIP." <---- What is your service cluster cidr? It would be good to understand why you are running out of clusterIP's

@shaneutt
Member

shaneutt commented Dec 22, 2022

@robscott Do you have a link to the relevant documentation, please?

The project is https://gateway-api.sigs.k8s.io/ and the repository is https://github.com/kubernetes-sigs/gateway-api.

One of the long term goals of this project is to enable the Gateway resource as an alternative to Service type=LoadBalancer, kubernetes-sigs/gateway-api#223 is a potentially side-relevant issue, but I don't think we currently have a completely formal statement of this intent in our issues. This goal would appear to align with your goals, but we would need to start formalizing that intent further and pulling together the interested parties.

We would love to have you join one of the upcoming Gateway API community meetings and talk to us more about your use case and needs (note that they won't be starting back up until January 9th due to the holidays). Please feel free to put something on our agenda for an upcoming meeting, or if you prefer async we have a discussions board and we're in #sig-network-gateway-api on Kubernetes Slack.

"the number of loadbalancers is limited by the number of available clusterIP." <---- What is your service cluster cidr? It would be good to understand why you are running out of clusterIP's

I'm also very curious about the impetus for this change. Can you please help us to better understand:

  1. how you got into this resource depletion problem; what kind of numbers are we talking about in terms of LoadBalancers?
  2. are new clusters being deployed with this problem, or is this specific to older long-running clusters? If so, why?
  3. what is the friction with re-deploying clusters onto larger IP pools?

@thockin
Member

thockin commented Dec 22, 2022

In almost all implementations, the service clusterIP range is virtual - it never hits the wire. That said, you don't want it to overlap with real IPs that you use elsewhere in your network. It can be any range that you can afford to consume - link-local or RFC-1918 or CGNAT or class E or even public IPs you know you will never use. IPv6 should be even easier. You can also use the same range in every cluster.

Adding more API to economize on these has a real cost for maintenance and testing, and the cost is forever. I don't think we want to do this proposal this way - Gateway is our way out of the "stacked" model of Services.

/close
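
To make the "any range you can afford to consume, as long as it doesn't overlap real IPs" advice concrete, a small sketch of an overlap check; the CIDR values are examples, not recommendations:

```go
// Overlap check between a candidate (virtual) service CIDR and ranges that
// are actually routed in the network; the CIDR values are illustrative.
package main

import (
	"fmt"
	"net"
)

// cidrsOverlap reports whether two CIDRs share any addresses: that is the
// case exactly when one network contains the other's base address.
func cidrsOverlap(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

func main() {
	_, serviceCIDR, _ := net.ParseCIDR("100.64.0.0/16") // CGNAT space, never on the wire
	_, podCIDR, _ := net.ParseCIDR("10.128.0.0/14")     // a real, routed range
	fmt.Println("overlaps pod range:", cidrsOverlap(serviceCIDR, podCIDR))
}
```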

@k8s-ci-robot
Contributor

@thockin: Closed this PR.

In response to this:

In almost all implementations, the service clusterIP range is virtual - it never hits the wire. That said, you don't want it to overlap with real IPs that you use elsewhere in your network. It can be any range that you can afford to consume - link-local or RFC-1918 or CGNAT or class E or even public IPs you know you will never use. IPv6 should be even easier. You can also use the same range in every cluster.

Adding more API to economize on these has a real cost for maintenance and testing, and the cost is forever. I don't think we want to do this proposal this way - Gateway is our way out of the "stacked" model of Services.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
9 participants