

Talos Cluster Expansion and HA

Expanding Your Talos Kubernetes Cluster and Enabling High Availability

Overview

Adding a third node¹ and enabling high availability by promoting all nodes to the control plane.

With 3 nodes, running all of them as control plane gives you etcd quorum: the cluster survives any single node failure. This is the sweet spot for homelabs. With 4+ nodes, you'd keep 3 control plane nodes and add dedicated workers to reduce overhead.
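The quorum arithmetic behind this is simple: an n-member etcd cluster needs floor(n/2) + 1 members available, so a 4th member raises overhead without raising fault tolerance. A quick sketch:

```shell
# Quorum for an n-member etcd cluster is floor(n/2) + 1;
# fault tolerance is n minus quorum.
for n in 1 2 3 4 5; do
  echo "$n members: quorum $((n / 2 + 1)), tolerates $((n - n / 2 - 1)) failure(s)"
done
```

Note that 3 and 4 members both tolerate exactly one failure, which is why 3 control plane nodes is the efficient choice here.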

Tip: Having trouble? See v0.9.1 for what your setup should look like after completing this article.

Before You Begin

Prerequisites

Configure BIOS Settings

Before booting from USB, configure BIOS settings (press Delete on boot for GEEKOM XT15 MEGA):

| Setting | Location | Action |
| --- | --- | --- |
| Boot Device Priority | Boot | Set USB KEY: UEFI:... first |
| Wake Up by LAN | Advanced > Power Management Configuration | Enable |
| Power-On after Power-Fail | Advanced > Power Management Configuration | Set to Power On |

See Talos Linux USB Installation for full BIOS details.

Gather Hardware Info

With the new node booted from the Talos installer ISO:

talosctl get disks --insecure --nodes 192.168.1.32
talosctl get links --insecure --nodes 192.168.1.32 | grep -E "^NODE|up.*true"

Note the disk model and network interface name.

Add New Node

Clone the config from an existing node and adjust.

Node Config

talos/talconfig.yaml:

nodes:
  # ... existing nodes ...

  - hostname: talos-node-3
    ipAddress: 192.168.1.32
    controlPlane: false  # Start as worker, promote later
    installDiskSelector:
      size: ">= 2TB"
      model: WPBSN4M8-2TGP  # Same hardware = same model
    networkInterfaces:
      - interface: enp172s0
        addresses:
          - 192.168.1.32/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1
    schematic:
      customization:
        systemExtensions:
          officialExtensions:
            - siderolabs/i915
            - siderolabs/intel-ucode
            - siderolabs/iscsi-tools
    patches:
      - |-
        machine:
          kubelet:
            extraArgs:
              feature-gates: DynamicResourceAllocation=true
          files:
            - path: /etc/cri/conf.d/20-customization.part
              op: create
              content: |
                [plugins."io.containerd.cri.v1.runtime"]
                  cdi_spec_dirs = ["/var/run/cdi"]

Note: Copy the schematic and patches from an existing node to ensure identical configuration. Only hostname, ipAddress, and addresses differ.

Commit Node Config

git add talos/talconfig.yaml
git commit -m "feat(talos): add third node"

Promote All Nodes to Control Plane

With 3 nodes, enable HA by making all control plane.

Update Control Plane

talos/talconfig.yaml:

nodes:
  - hostname: talos-node-1
    controlPlane: true   # Already control plane

  - hostname: talos-node-2
    controlPlane: true   # Changed from false

  - hostname: talos-node-3
    controlPlane: true   # Changed from false

Commit Control Plane

git add talos/talconfig.yaml
git commit -m "feat(talos): make all control plane"

Regenerate Configs

Generate Machine Configs

cd ~/homelab/talos

SOPS_AGE_KEY_FILE=<(op document get "sops-key | homelab") \
  talhelper genconfig
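Before applying anything, it can help to confirm talhelper actually produced a machine config for the new node, since the apply step references it by path:

```shell
# Sanity check: talhelper writes one machine config per node under clusterconfig/
CFG=clusterconfig/homelab-cluster-talos-node-3.yaml
if [ -f "$CFG" ]; then
  echo "found $CFG"
else
  echo "missing $CFG" >&2
fi
```

If the file is missing, re-check the hostname in talconfig.yaml; talhelper derives the filename from the cluster name and hostname.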

Apply to New Node

Add node 3 to the cluster first. This brings etcd to 2 members before touching node 2.

Apply Config

# Node 3 is in maintenance mode, needs --insecure
talosctl apply-config --insecure \
  --nodes 192.168.1.32 \
  --file clusterconfig/homelab-cluster-talos-node-3.yaml

Watch Node Join

Node 3 reboots and joins the cluster. Wait for it to appear:

# Watch for node 3 to join (Ctrl+C when ready)
kubectl get nodes -w

Verify etcd Members

Verify etcd now has 2 members:

talosctl --nodes 192.168.1.30 etcd members

Promote Existing Worker

Now apply the updated config to node 2 to promote it to control plane.

Apply Config

# Node 2 is already configured (not maintenance mode), no --insecure
talosctl apply-config \
  --nodes 192.168.1.31 \
  --file clusterconfig/homelab-cluster-talos-node-2.yaml

Watch Node Rejoin

This triggers a reboot. Node 2 restarts with control plane services and joins etcd.

kubectl get nodes -w

Verify etcd Members

Verify etcd now has 3 members:

talosctl --nodes 192.168.1.30 etcd members

Verify Final Health

Cluster Health

talosctl --nodes 192.168.1.30 health

All checks should show OK.

GPU Discovery

Verify GPU discovered on new node:

kubectl get resourceslices

Should show ResourceSlices for all 3 nodes.

Maintain Nodes

With 3 control plane nodes, you can safely take one offline for maintenance. etcd maintains quorum with 2 of 3 nodes.

Get MAC Address

Before shutting down, note the MAC address for Wake on LAN (HW ADDR column):

talosctl --nodes 192.168.1.32 get links | grep -E "^NODE|enp"
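A quick sketch of stashing and sanity-checking the MAC before you need it; the address below is a placeholder, not a value from this setup:

```shell
# Placeholder MAC copied from the HW ADDR column above
MAC="aa:bb:cc:dd:ee:ff"
# wakeonlan expects six colon-separated hex octets
if echo "$MAC" | grep -Eiq '^([0-9a-f]{2}:){5}[0-9a-f]{2}$'; then
  echo "MAC format OK: $MAC"
else
  echo "invalid MAC: $MAC" >&2
fi
```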

Graceful Shutdown

Drains workloads and stops services cleanly before powering off. Use when physically moving a node or for extended maintenance.

talosctl --nodes 192.168.1.32 shutdown

Wake on LAN

Power on a shutdown node remotely (requires Wake on LAN enabled in BIOS per Talos Linux USB Installation):

wakeonlan <MAC_ADDRESS>

Reboot

Restarts a running node without full power cycle. Use after config changes that require a restart.

talosctl --nodes 192.168.1.32 reboot

Verify Cluster After Maintenance

When the node comes back online:

kubectl get nodes
talosctl --nodes 192.168.1.30 etcd members

Note: With 2 of 3 nodes running, the cluster remains fully operational. Avoid taking down 2 nodes simultaneously; etcd requires a majority.

Next Steps

With the new node added, update Tailscale to route traffic to it.

See: Tailscale ACL and Subnet Routes


Footnotes

  1. Sidero Labs, "Adding Nodes to a Cluster," docs.siderolabs.com. Accessed: Dec. 20, 2025. [Online]. Available: https://www.talos.dev/latest/talos-guides/howto/scaling-up/
