Feature: allow the use of private networks #1567

Open
xavierleune wants to merge 7 commits into master from feature/private-network

Conversation

xavierleune (Contributor)

Hi there,

First, thank you for this awesome project! 🙌
This feature allows you to deploy k3s using only private IPs on your servers. It requires a VPN and NAT configuration, which are out of scope for this deployment.
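
A minimal sketch of the intended usage (illustrative only: disable_ipv4 and disable_ipv6 are the flags this PR introduces, but their exact shape and placement may differ):

    module "kube-hetzner" {
      # ... the usual configuration (token, SSH keys, nodepools, etc.) ...

      # Illustrative: run the nodes without public addresses. A VPN/NAT host
      # attached to the private network is a prerequisite (out of scope here).
      disable_ipv4 = true
      disable_ipv6 = true
    }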

This is my very first time working with Terraform, so I'm open to any feedback on this pull request as it can probably be improved.

Have a nice day!

Cf. #282, #1255

@xavierleune xavierleune force-pushed the feature/private-network branch from 15e7571 to e187c93 Compare November 22, 2024 10:18
cedric-lovit (Contributor) left a comment

I'm super excited for this feature, so thank you very much for putting effort into this!
You mentioned this was the first time you used Terraform... You did super well!

Welcome to the Terraform world! :D

nicolaracco

I attempted to switch to this implementation in an existing cookbook without making any other change to the kube.tf file. I’ve posted the current configuration here. However, I encountered several instances of the following error:

Error: Attempt to get attribute from null value
│ 
│   on .terraform/modules/kube-hetzner/modules/host/out.tf line 10, in output "private_ipv4_address":
│   10:   value = one(hcloud_server.server.network).ip
│     ├────────────────
│     │ hcloud_server.server.network is empty set of object
│ 
│ This value is null, so it does not have any attributes.
╵

Maybe it's expecting to always find the network referenced as an input? If I have some time later today, I'll try to debug the issue and add more info.
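
For what it's worth, a null-safe variant of that output would look something like this (just a sketch, not tested):

    # modules/host/out.tf (sketch): return null instead of failing when the
    # server has no attached network.
    output "private_ipv4_address" {
      value = try(one(hcloud_server.server.network).ip, null)
    }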

xavierleune (Contributor, Author)

@nicolaracco I confirm that the network should always be defined. It must be created before your cluster, because you have to be connected to this network through a non-k8s server.
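
For reference, a rough sketch of that prerequisite (IP ranges are examples, and the existing_network_id input name should be double-checked against the module's variables):

    # The private network must exist before the cluster, and your VPN/NAT
    # host must be attached to it (that part is out of scope here).
    resource "hcloud_network" "private" {
      name     = "k3s-private"
      ip_range = "10.0.0.0/8"
    }

    resource "hcloud_network_subnet" "k3s" {
      network_id   = hcloud_network.private.id
      type         = "cloud"
      network_zone = "eu-central"
      ip_range     = "10.0.0.0/16"
    }

    module "kube-hetzner" {
      # ... the rest of your configuration ...
      existing_network_id = hcloud_network.private.id # assumed input name
    }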

xavierleune (Contributor, Author)

@mysticaltech do you have an opinion on this PR? 🙏

mysticaltech (Collaborator)

@xavierleune Thanks for this, will review it tonight

4erdenko

> @xavierleune Thanks for this, will review it tonight

Hey, any updates?
Disabling public addresses on nodes is an important security feature.

mkajander

Tested this today (along with NAT + WireGuard) and was able to successfully deploy a cluster with public IPs disabled for the nodes.

mysticaltech previously approved these changes Jan 20, 2025

mysticaltech (Collaborator) left a comment

Looking good! Will merge and try it before releasing.

mysticaltech (Collaborator)

@xavierleune Here's my test deploy, the one I usually use for everything. Please make sure it deploys; this is important.

locals {
  # You have the choice of setting your Hetzner API token here or defining the
  # TF_VAR_hcloud_token env variable within your shell, such as: export TF_VAR_hcloud_token=xxxxxxxxxxx
  # If you choose to define it in the shell, this can be left as is.

  # Your Hetzner token can be found in your Project > Security > API Token (Read & Write is required).
  hcloud_token = "xxxxxx"
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token

  source = "../kube-hetzner"

  cluster_name = "test12"

  initial_k3s_channel = "v1.30"

  ssh_public_key  = file("/home/karim/.ssh/id_ed25519.pub")
  ssh_private_key = file("/home/karim/.ssh/id_ed25519")

  network_region = "eu-central"

  control_plane_nodepools = [
    {
      name        = "control-plane",
      server_type = "cx22",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1
    }
  ]

  agent_nodepools = [
    {
      name        = "agent-small",
      server_type = "cx22",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1
    }
  ]

  autoscaler_nodepools = [
    {
      name        = "autoscaled-small"
      server_type = "cx22",
      location    = "fsn1",
      min_nodes   = 1
      max_nodes   = 2
    }
  ]

  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"

}

provider "hcloud" {
  token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}

terraform {
  required_version = ">= 1.5.0"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.43.0"
    }
  }
}

output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}

variable "hcloud_token" {
  sensitive = true
  default   = ""
}

Currently, I'm getting the following error:

[Screenshot from 2025-01-20: Terraform error output]

mysticaltech (Collaborator) left a comment

@xavierleune The ball is in your court. Please, let's make sure this is fully backward compatible. See above. Thanks!

xavierleune (Contributor, Author)

@mysticaltech thanks for your feedback, I'll have a look!

Mikopet commented Jan 22, 2025

This is something I am also really looking forward to.

I have never run a k8s cluster with publicly available nodes, and this made me concerned. In fact, I was thinking about creating my own module because of that. But having this here would be way better :-)

Big thanks for the implementation!

xavierleune (Contributor, Author)

@mysticaltech I think that should do the trick; your test file now works as expected.

mysticaltech (Collaborator)

Thanks @xavierleune, will try ASAP

devops-fd

Super excited to get this live. When is this expected to merge?

devops-fd

Hello @xavierleune, I have tried to run your branch and got some issues with the post-creation script, as shown in the console:

module.kube-hetzner.null_resource.first_control_plane: Provisioning with 'remote-exec'...
module.kube-hetzner.null_resource.first_control_plane (remote-exec): Connecting to remote host via SSH...
module.kube-hetzner.null_resource.first_control_plane (remote-exec):   Host: 10.127.128.101
module.kube-hetzner.null_resource.first_control_plane (remote-exec):   User: root
module.kube-hetzner.null_resource.first_control_plane (remote-exec):   Password: false
module.kube-hetzner.null_resource.first_control_plane (remote-exec):   Private key: true
module.kube-hetzner.null_resource.first_control_plane (remote-exec):   Certificate: false
module.kube-hetzner.null_resource.first_control_plane (remote-exec):   SSH Agent: false
module.kube-hetzner.null_resource.first_control_plane (remote-exec):   Checking Host Key: false
module.kube-hetzner.null_resource.first_control_plane (remote-exec):   Target Platform: unix
module.kube-hetzner.null_resource.first_control_plane (remote-exec): Connected!
module.kube-hetzner.null_resource.first_control_plane (remote-exec): + /etc/cloud/rename_interface.sh
module.kube-hetzner.null_resource.first_control_plane: Still creating... [10s elapsed]
module.kube-hetzner.null_resource.first_control_plane (remote-exec): loop
module.kube-hetzner.null_resource.first_control_plane (remote-exec): Error: Device 'eth0' not found.
module.kube-hetzner.null_resource.first_control_plane (remote-exec): Error: unknown connection ''.
module.kube-hetzner.null_resource.first_control_plane (remote-exec): got failure exit code, repeating
module.kube-hetzner.null_resource.first_control_plane (remote-exec): loop

Since disabling the public interface results in a missing eth0, changes must be made to the script at locals.tf line 844 to take this case into consideration.

lefterisALEX

I think this block should handle the case where eth0 does not exist:

    # In case of a private-only network, eth0 may not exist
    if ip link show eth0 &>/dev/null; then
        eth0_connection=$(nmcli -g GENERAL.CONNECTION device show eth0)
        nmcli connection modify "$eth0_connection" \
          con-name eth0 \
          connection.interface-name eth0
    fi

gePower left a comment

Thanks for this code.

mysticaltech (Collaborator)

Thanks @xavierleune, on it, will do one final review and we can merge this.

mysticaltech (Collaborator) left a comment

General Review

Overall, this PR adds valuable functionality for running Kube-Hetzner clusters with private IPs or even purely private connectivity. The approach is sensible: it updates modules to handle disable_ipv4 and disable_ipv6 flags, modifies how the code retrieves the node address, and clarifies the user guide for NAT setups.

Below is a deeper review highlighting potential concerns:

  1. Connectivity Checks & Fallbacks

    • Various sections (e.g., modules/host/main.tf) now use coalesce(ipv4, ipv6, network ip). If both IPv4 and IPv6 are disabled, the module depends on a NAT or VPN for connectivity. You might want more explicit error handling for the scenario where even the private IP is unreachable.
  2. Resiliency of SSH Wait-Loop

    • The code that loops until the server is ready is set with a 600-second (or indefinite) wait. That might be okay, but in a worst-case scenario, it can stall a pipeline. Consider adding a strict timeout or additional checks to fail more gracefully.
  3. Backwards Compatibility

    • Changes appear backward compatible overall—existing setups that keep IPv4 enabled will continue to operate as before. Just ensure that users with older configurations do not break if they never define the new boolean flags.
  4. Validation & Documentation

    • For purely private setups, you now rely on user-managed NAT or VPN. That’s correct, but do consider adding more disclaimers about the need for a stable route, especially if both IPv4 and IPv6 are disabled.
  5. Maintainability

    • The repeated coalesce logic is absolutely correct, but if the code evolves in the future to support multiple interfaces or custom hostnames, a small helper or a local variable to unify the address selection might help; see the sketch right after this list.
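
For instance, something along these lines (an illustrative sketch, not code from this PR; the hcloud_server attribute names are from the provider, the ssh_address local is hypothetical):

    # Sketch: centralize the address selection instead of repeating coalesce().
    # Terraform's coalesce() skips null and empty-string values, so this falls
    # back from public IPv4 to IPv6 to the private network IP, and errors out
    # loudly if no address is available at all.
    locals {
      ssh_address = coalesce(
        hcloud_server.server.ipv4_address,
        hcloud_server.server.ipv6_address,
        try(one(hcloud_server.server.network).ip, null),
      )
    }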

Given these changes, I have some specific line-by-line suggestions below.

Summary

You’re on the right track. I recommend a few improvements to handle the purely private case more robustly and to keep the code a bit more self-explanatory. Great job overall!

mysticaltech (Collaborator)

@xavierleune Almost there brother, thanks in advance 🚀

xavierleune (Contributor, Author)

Hello there, sorry I've been away from this pull request for too long. I'll have a look at your feedback in the coming days!

@xavierleune xavierleune force-pushed the feature/private-network branch from 51dbf30 to c2bb343 Compare February 26, 2025 15:36
xavierleune (Contributor, Author)

This one might be the one 🤞

amedeopalopoli

> Thanks @xavierleune, on it, will do one final review and we can merge this.

Hi all,

In Hetzner private networks, I think we should add the command below in the init:
ip route add default via ${network-gateway}

In the same way as we did here, I think we should add a condition when we bring NetworkManager up.

cloudinit_runcmd_common = <<EOT
....
# Make sure the network is up
- [systemctl, restart, NetworkManager]
- [systemctl, status, NetworkManager]
- [ip, route, add, default, via, '172.31.1.1', dev, 'eth0']
....
EOT

In this case, I think we need to add the default network gateway if the public IP address is not enabled.

mysticaltech (Collaborator)

> [quoting @amedeopalopoli's comment above]

Thanks @xavierleune! What do you think about @amedeopalopoli's suggestion above?

xavierleune (Contributor, Author)

@amedeopalopoli thanks for the suggestion; however, I'm not sure we should do that. Without a public IP, we should instead add a route for 0.0.0.0/0 in Hetzner pointing to the private IP of your gateway; I followed this tutorial, as I pointed out in the readme. I think that trying to use the Hetzner gateway without a public IP would fail.
The setup I propose in the readme has been working for me for 3 months, and I recreated a control plane yesterday that has internet access. So I'm not sure how to deal with your suggestion ^^

lefterisALEX commented Feb 27, 2025

An example of how I have it working in my case:
Subnet: 192.168.100.0/24
NAT_GW IP: 192.168.100.2 (the IP of the instance I use as NAT gateway)

Then, on the instance (running Ubuntu) that has only a private interface, I do this:

    cat <<'EOF' >> /etc/systemd/network/10-enp7s0.network
    [Match]
    Name=enp7s0
    
    [Network]
    DHCP=yes
    Gateway=192.168.100.1
    EOF
    echo "DNS=9.9.9.9" | tee -a /etc/systemd/resolved.conf
    systemctl reload systemd-networkd
    systemctl restart systemd-resolved

Not 100% sure if I missed something, but I am happy to help if needed.

xavierleune (Contributor, Author)

@lefterisALEX thanks for the feedback; that looks a lot like the approach I was talking about. BTW, I think you might get rid of those steps by adding the route in the Hetzner routes for your subnet; it should then be automatic, like this but with my gateway ^^

[Screenshot from 2025-02-27: Hetzner network routes configuration]

aru-amedeo

> trying to use the Hetzner gateway without a public IP would fail

I'm sorry @xavierleune, but if we have nodes with only private IPs, they cannot reach the internet (0.0.0.0/0) unless you add
- [ip, route, add, default, via, '$gateway-network', dev, 'private-interface']

Am I wrong?

xavierleune (Contributor, Author)

> [quoting @aru-amedeo's comment above]

You can, if you add your gateway in the routes for 0.0.0.0/0, like in my previous screenshot; that will leverage NAT masquerade.
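
In Terraform terms, that route looks something like this (a sketch: the network reference and the gateway IP are examples; the gateway must be the private IP of your NAT host):

    # Route all egress (0.0.0.0/0) through the NAT host's private IP, so the
    # NAT masquerade there handles internet access for private-only nodes.
    resource "hcloud_network_route" "nat_egress" {
      network_id  = hcloud_network.private.id
      destination = "0.0.0.0/0"
      gateway     = "10.0.0.2" # example: private IP of the NAT gateway host
    }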

lefterisALEX

I think you need both, @xavierleune. If you look at the documentation you shared, in Step 3 (Configuring NAT) and Step 4 (Achieving a persistent configuration), it says that configuration is needed on the client side as well.
For example, for Ubuntu:

[Screenshots from 2025-02-27: client-side configuration steps from the tutorial]

xavierleune (Contributor, Author)

@lefterisALEX OK, I can see that, but from my understanding 10.0.0.1 is already declared because it's the subnet gateway for Hetzner. I think cloud-init already handles that part.

Here is an example for ip route show for a control plane created yesterday:

default via 10.0.0.1 dev eth1 proto dhcp src 10.127.128.101 metric 100
10.0.0.0/8 via 10.0.0.1 dev eth1 proto dhcp src 10.127.128.101 metric 100
10.0.0.1 dev eth1 proto dhcp scope link src 10.127.128.101 metric 100
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
10.42.3.0/24 via 10.42.3.0 dev flannel.1 onlink
10.42.4.0/24 via 10.42.4.0 dev flannel.1 onlink
10.42.5.0/24 via 10.42.5.0 dev flannel.1 onlink
10.42.6.0/24 via 10.42.6.0 dev flannel.1 onlink
10.42.7.0/24 via 10.42.7.0 dev flannel.1 onlink
10.42.8.0/24 via 10.42.8.0 dev flannel.1 onlink
10.42.9.0/24 via 10.42.9.0 dev flannel.1 onlink
169.254.169.254 via 10.0.0.1 dev eth1 proto dhcp src 10.127.128.101 metric 100

xavierleune (Contributor, Author)

@aru-amedeo I'm curious, did you try this branch? Because as far as my tests go, it's working just fine right now: cloud-init pulls the network config from the Hetzner API, so the default route uses the subnet gateway. It's just like for servers with a public interface, where without a route on 0.0.0.0/0 the main gateway from Hetzner (172.31.1.1) is the default route, without any need to declare it. The current Terraform does not configure the public gateway for servers with public interfaces; neither does it need to for servers with only a private network. Am I missing something here?

amedeopalopoli

> [quoting @xavierleune's comment above]

Hi @xavierleune, as far as I understood, when you create a private network, you should add #1567 (comment). This lets you use the network route when the destination address is part of the internal network, but not when it isn't (e.g. a public IP like 8.8.8.8). This is why I said this PR is not enough to achieve what you aim to do: it's missing the cloud-init piece to configure the default gateway so that public IPs are routable from nodes with no public interface attached.
