Joining a non-Talos node to a Talos cluster (Day 39)

How to add non-Talos nodes to a Talos cluster with HAProxy for KubePrism compatibility.


I have been meaning to figure out a way to get a non-Talos node to join my Talos cluster for a while, because I have this idea of running GPU machines from the cloud, and I would like them to show up as regular nodes.

Thanks to this GitHub issue, it ended up being surprisingly easy to do.

The taskfile

I created a Taskfile to automate the entire process. Here are the key parts:

version: '3'

vars:
  VIP: '{{.VIP | default "10.30.30.155"}}'
  TARGET: '{{.TARGET | default "10.30.10.101"}}'
  SSH_KEY: '{{.SSH_KEY | default "~/.ssh/devkey"}}'
  KUBE_VERSION: '{{.KUBE_VERSION | default ""}}' # Empty means auto-detect

tasks:
  join-ubuntu:
    desc: Join Ubuntu node to Talos cluster
    deps:
      - validate
    cmds:
      - task: prepare-node
      - task: copy-configs
      - task: setup-haproxy
      - task: start-kubelet
      - task: verify
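
Two things the snippet above leans on but doesn't show: the SSH_CMD/SCP_CMD helper variables and the validate task. Here's a minimal sketch of what they might look like; the helper names match the templates used later, but the bodies are assumptions, not the original Taskfile:

```yaml
vars:
  # Assumed helpers; the later tasks reference {{.SSH_CMD}} and {{.SCP_CMD}}
  SSH_CMD: 'ssh -i {{.SSH_KEY}} ubuntu@{{.TARGET}}'
  SCP_CMD: 'scp -i {{.SSH_KEY}}'

tasks:
  validate:
    desc: Sanity-check connectivity before touching anything
    cmds:
      - talosctl -n {{.VIP}} version   # Talos API reachable?
      - '{{.SSH_CMD}} true'            # SSH to the target works?
```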

Node prep

The prepare-node task sets up the prerequisites and installs needed packages:

prepare-node:
  desc: Prepare Ubuntu node
  cmds:
    - |
      {{.SSH_CMD}} 'sudo bash -s' << 'EOF'
      # Get node IP (adjust interface name as needed)
      NODE_IP=$(ip -4 addr show enp6s18 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')
      swapoff -a
      sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
      modprobe overlay
      modprobe br_netfilter
      cat > /etc/sysctl.d/k8s.conf <<EOT
      net.bridge.bridge-nf-call-iptables  = 1
      net.bridge.bridge-nf-call-ip6tables = 1
      net.ipv4.ip_forward                 = 1
      EOT
      sysctl --system
      apt-get install -y containerd
      mkdir -p /etc/containerd
      containerd config default | sed 's/SystemdCgroup = false/SystemdCgroup = true/' > /etc/containerd/config.toml
      systemctl restart containerd

      # Install Kubernetes components
      if [ -n "{{.KUBE_VERSION}}" ]; then
        KUBE_VER="{{.KUBE_VERSION}}"
      else
        KUBE_VER=$(curl -L -s https://dl.k8s.io/release/stable.txt | awk 'BEGIN { FS="." } { printf "%s.%s", $1, $2 }')
      fi
      
      mkdir -p /etc/apt/keyrings
      curl -fsSL https://pkgs.k8s.io/core:/stable:/${KUBE_VER}/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg --yes
      echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/${KUBE_VER}/deb/ /" > /etc/apt/sources.list.d/kubernetes.list

      apt-get update
      apt-get install -y kubelet kubeadm kubectl
      apt-mark hold kubelet kubeadm kubectl

      crictl config \
          --set runtime-endpoint=unix:///run/containerd/containerd.sock \
          --set image-endpoint=unix:///run/containerd/containerd.sock
      cat > /etc/default/kubelet <<EOT
      KUBELET_EXTRA_ARGS='--node-ip ${NODE_IP}'
      EOT
      EOF
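
One detail worth calling out: the auto-detect branch trims the full stable release string down to the major.minor series that the pkgs.k8s.io repo path expects. For example:

```shell
# dl.k8s.io/release/stable.txt returns a full version like "v1.33.2",
# but the pkgs.k8s.io repo path only wants the series ("v1.33").
echo "v1.33.2" | awk 'BEGIN { FS="." } { printf "%s.%s", $1, $2 }'
# prints: v1.33
```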

Talos configs

Here we fetch the needed files with talosctl and copy them to the Ubuntu machine:

copy-configs:
  desc: Copy Kubernetes configuration from Talos
  vars:
    OUT_DIR: _out
  cmds:
    - mkdir -p {{.OUT_DIR}}
    
    # Copy files from Talos
    - talosctl -n {{.VIP}} cat /etc/kubernetes/kubeconfig-kubelet > {{.OUT_DIR}}/kubelet.conf
    - talosctl -n {{.VIP}} cat /etc/kubernetes/bootstrap-kubeconfig > {{.OUT_DIR}}/bootstrap-kubelet.conf
    - talosctl -n {{.VIP}} cat /etc/kubernetes/pki/ca.crt > {{.OUT_DIR}}/ca.crt
    
    - 'perl -pi -e "s|server:.*|server: https://{{.VIP}}:6443|g" {{.OUT_DIR}}/kubelet.conf'
    - 'perl -pi -e "s|server:.*|server: https://{{.VIP}}:6443|g" {{.OUT_DIR}}/bootstrap-kubelet.conf'
    
    - |
      clusterDomain=$(talosctl -n {{.VIP}} get kubeletconfig -o jsonpath="{.spec.clusterDomain}")
      clusterDNS=$(talosctl -n {{.VIP}} get kubeletconfig -o jsonpath="{.spec.clusterDNS}")
      
      cat > {{.OUT_DIR}}/config.yaml <<EOF
      kind: KubeletConfiguration
      apiVersion: kubelet.config.k8s.io/v1beta1
      authentication:
        anonymous:
          enabled: false
        webhook:
          enabled: true
        x509:
          clientCAFile: /etc/kubernetes/pki/ca.crt
      authorization:
        mode: Webhook
      clusterDomain: "$clusterDomain"
      clusterDNS: $clusterDNS
      runtimeRequestTimeout: "0s"
      cgroupDriver: systemd
      containerRuntimeEndpoint: unix:///var/run/containerd/containerd.sock
      EOF
    
    # Copy to target and move to the correct locations
    - '{{.SCP_CMD}} {{.OUT_DIR}}/* ubuntu@{{.TARGET}}:~/'
    - |
      {{.SSH_CMD}} 'sudo bash -s' << 'EOF'
      mkdir -p /etc/kubernetes/pki /var/lib/kubelet
      mv /home/ubuntu/kubelet.conf /etc/kubernetes/kubelet.conf
      mv /home/ubuntu/bootstrap-kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf
      mv /home/ubuntu/ca.crt /etc/kubernetes/pki/ca.crt
      mv /home/ubuntu/config.yaml /var/lib/kubelet/config.yaml
      
      chmod 600 /etc/kubernetes/kubelet.conf
      chmod 600 /etc/kubernetes/bootstrap-kubelet.conf
      EOF

HAProxy

Talos uses KubePrism, a built-in load balancer that listens on 127.0.0.1:7445 and forwards to the API server.

When you join a non-Talos node, it doesn't have KubePrism, so components like Cilium fail to connect, and the node stays in a NotReady state.

The solution, without disabling KubePrism, is to run HAProxy on the Ubuntu node to mimic it.

The HAProxy setup:

setup-haproxy:
  desc: Install and configure HAProxy to mimic KubePrism
  cmds:
    - |
      {{.SSH_CMD}} 'sudo bash -s' << 'EOF'
      apt-get update -qq
      apt-get install -y haproxy
      
      # Check if KubePrism configuration already exists
      if ! grep -q "frontend kubeprism" /etc/haproxy/haproxy.cfg; then
        echo "Adding KubePrism configuration to HAProxy..."
        cat >> /etc/haproxy/haproxy.cfg <<EOT
      
      frontend kubeprism
        mode tcp
        bind localhost:7445
        default_backend api
      
      backend api
        mode tcp
        server lb {{.VIP}}:6443 check
      EOT
      else
        echo "KubePrism configuration already exists in HAProxy"
      fi
      
      # Enable and restart HAProxy
      systemctl enable haproxy
      systemctl restart haproxy
      EOF
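
join-ubuntu also runs start-kubelet and verify, which aren't shown above. A minimal sketch, assuming kubelet is managed by systemd and that verification just probes the KubePrism port through the new HAProxy listener:

```yaml
start-kubelet:
  desc: Enable and start kubelet on the target node
  cmds:
    - '{{.SSH_CMD}} "sudo systemctl enable --now kubelet"'

verify:
  desc: Confirm the node reaches the API server via the KubePrism port
  cmds:
    # -k because the API server certificate isn't issued for "localhost"
    - '{{.SSH_CMD}} "curl -sk https://localhost:7445/version"'
```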

Usage

Join an Ubuntu node with auto-detected Kubernetes version:

VIP=10.30.30.155 TARGET=10.30.10.101 task join-ubuntu

Or specify a version to match the Talos cluster:

VIP=10.30.30.155 TARGET=10.30.10.101 KUBE_VERSION=v1.33 task join-ubuntu

Check status:

# Check node status
kubectl get nodes

# Check logs
TARGET=10.30.10.101 task logs
TARGET=10.30.10.101 SERVICE=haproxy task logs
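
The logs task isn't shown earlier either; a plausible sketch, assuming it just tails journalctl over SSH, with SERVICE defaulting to kubelet:

```yaml
logs:
  desc: Tail service logs on the target node
  vars:
    SERVICE: '{{.SERVICE | default "kubelet"}}'
  cmds:
    - '{{.SSH_CMD}} "sudo journalctl -u {{.SERVICE}} -n 50 -f"'
```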

Listing the nodes now shows that the Ubuntu node has joined the cluster:

NAME          STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
daedalus-01   Ready    control-plane   71d   v1.32.3   10.30.30.141   <none>        Talos (v1.10.4)      6.12.31-talos      containerd://2.0.5
daedalus-02   Ready    control-plane   71d   v1.32.3   10.30.30.142   <none>        Talos (v1.10.4)      6.12.31-talos      containerd://2.0.5
daedalus-03   Ready    control-plane   71d   v1.32.3   10.30.30.143   <none>        Talos (v1.10.4)      6.12.31-talos      containerd://2.0.5
daedalus-21   Ready    <none>          71d   v1.32.3   10.30.30.134   <none>        Talos (v1.10.4)      6.12.31-talos      containerd://2.0.5
daedalus-22   Ready    <none>          71d   v1.32.3   10.30.30.135   <none>        Talos (v1.10.4)      6.12.31-talos      containerd://2.0.5
ubuntu        Ready    <none>          77s   v1.33.2   10.30.10.101   <none>        Ubuntu 24.04.1 LTS   6.8.0-63-generic   containerd://1.7.27

HAProxy on the node intercepts connections to localhost:7445 and forwards them to the Talos Kubernetes API server, so anything that expects the cluster's KubePrism endpoint keeps working unchanged.