Homelab
Talos Cluster Expansion and HA
Expanding Your Talos Kubernetes Cluster and Enabling High Availability
Overview
Adding a third node[^1] and enabling high availability by promoting all nodes to control plane.
With 3 nodes, running all as control plane gives you etcd quorum - the cluster survives any single node failure. This is the sweet spot for homelabs. With 4+ nodes, you'd keep 3 control plane and add dedicated workers to reduce overhead.
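The quorum rule behind this is simple arithmetic: an etcd cluster of n members needs a majority (floor(n/2) + 1) of votes to stay writable, so it tolerates floor((n-1)/2) failures. A quick shell check makes the "sweet spot" visible:

```shell
# etcd quorum math: a cluster of n members needs a majority to stay writable
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))        # votes required
  tolerated=$(( (n - 1) / 2 ))   # members that can fail
  echo "members=$n quorum=$quorum tolerated=$tolerated"
done
```

Note that 4 members tolerate no more failures than 3 (still only one), which is why a fourth node is better spent as a dedicated worker than as a fourth etcd member.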
| Tip: | Having trouble? See v0.9.1 for what your setup should look like after completing this article. |
Before You Begin
Prerequisites
- Talhelper Cluster Bootstrap completed (cluster running)
- New node: BIOS configured per Talos Linux USB Installation
- New node: Booted from Talos installer ISO, in maintenance mode
Configure BIOS Settings
Before booting from USB, configure BIOS settings (press Delete on boot for GEEKOM XT15 MEGA):
| Setting | Location | Action |
|---|---|---|
| Boot Device Priority | Boot | Set USB KEY: UEFI:... first |
| Wake Up by LAN | Advanced > Power Management Configuration | Enable |
| Power-On after Power-Fail | Advanced > Power Management Configuration | Set to Power On |
See Talos Linux USB Installation for full BIOS details.
Gather Hardware Info
With the new node booted from the Talos installer ISO:
talosctl get disks --insecure --nodes 192.168.1.32
talosctl get links --insecure --nodes 192.168.1.32 | grep -E "^NODE|up.*true"
Note the disk model and network interface name.
Add New Node
Clone the config from an existing node and adjust.
Node Config
talos/talconfig.yaml:
nodes:
  # ... existing nodes ...
  - hostname: talos-node-3
    ipAddress: 192.168.1.32
    controlPlane: false # Start as worker, promote later
    installDiskSelector:
      size: ">= 2TB"
      model: WPBSN4M8-2TGP # Same hardware = same model
    networkInterfaces:
      - interface: enp172s0
        addresses:
          - 192.168.1.32/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1
    schematic:
      customization:
        systemExtensions:
          officialExtensions:
            - siderolabs/i915
            - siderolabs/intel-ucode
            - siderolabs/iscsi-tools
    patches:
      - |-
        machine:
          kubelet:
            extraArgs:
              feature-gates: DynamicResourceAllocation=true
          files:
            - path: /etc/cri/conf.d/20-customization.part
              op: create
              content: |
                [plugins."io.containerd.cri.v1.runtime"]
                  cdi_spec_dirs = ["/var/run/cdi"]
| Note: | Copy the schematic and patches from an existing node to ensure identical configuration. Only hostname, ipAddress, and addresses differ. |
Commit Node Config
git add talos/talconfig.yaml
git commit -m "feat(talos): add third node"
Promote All Nodes to Control Plane
With 3 nodes, enable HA by making all control plane.
Update Control Plane
talos/talconfig.yaml:
nodes:
  - hostname: talos-node-1
    controlPlane: true # Already control plane
  - hostname: talos-node-2
    controlPlane: true # Changed from false
  - hostname: talos-node-3
    controlPlane: true # Changed from false
Commit Control Plane
git add talos/talconfig.yaml
git commit -m "feat(talos): make all control plane"
Regenerate Configs
Generate Machine Configs
cd ~/homelab/talos
SOPS_AGE_KEY_FILE=<(op document get "sops-key | homelab") \
talhelper genconfig
Apply to New Node
Add node 3 to the cluster first. This brings etcd to 2 members before touching node 2.
Apply Config
# Node 3 is in maintenance mode, needs --insecure
talosctl apply-config --insecure \
--nodes 192.168.1.32 \
--file clusterconfig/homelab-cluster-talos-node-3.yaml
Watch Node Join
Node 3 reboots and joins the cluster. Wait for it to appear:
# Watch for node 3 to join (Ctrl+C when ready)
kubectl get nodes -w
Verify etcd Members
Verify etcd now has 2 members:
talosctl --nodes 192.168.1.30 etcd members
Promote Existing Worker
Now apply the updated config to node 2 to promote it to control plane.
Apply Config
# Node 2 is already configured (not maintenance mode), no --insecure
talosctl apply-config \
--nodes 192.168.1.31 \
--file clusterconfig/homelab-cluster-talos-node-2.yaml
Watch Node Rejoin
This triggers a reboot. Node 2 restarts with control plane services and joins etcd.
kubectl get nodes -w
Verify etcd Members
Verify etcd now has 3 members:
talosctl --nodes 192.168.1.30 etcd members
Verify Final Health
Cluster Health
talosctl --nodes 192.168.1.30 health
All checks should show OK.
GPU Discovery
Verify GPU discovered on new node:
kubectl get resourceslices
Should show ResourceSlices for all 3 nodes.
Maintain Nodes
With 3 control plane nodes, you can safely take one offline for maintenance. etcd maintains quorum with 2 of 3 nodes.
Get MAC Address
Before shutting down, note the MAC address for Wake on LAN (HW ADDR column):
talosctl --nodes 192.168.1.32 get links | grep -E "^NODE|enp"
Graceful Shutdown
Drains workloads and stops services cleanly before powering off. Use when physically moving a node or for extended maintenance.
talosctl --nodes 192.168.1.32 shutdown
Wake on LAN
Power on a shutdown node remotely (requires Wake on LAN enabled in BIOS per Talos Linux USB Installation):
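Under the hood, the "magic packet" a WoL tool broadcasts is just 6 bytes of 0xFF followed by the target MAC address repeated 16 times, sent over UDP (commonly to port 9). A minimal bash sketch of that payload — the MAC and broadcast address below are placeholders, not values from this cluster:

```shell
mac="aa:bb:cc:dd:ee:ff"                        # placeholder MAC, not a real node
# magic packet = 6 bytes of 0xFF + the MAC (separators stripped) repeated 16 times
payload="$(printf 'ff%.0s' {1..6})$(printf "${mac//:/}%.0s" {1..16})"
echo "payload bytes: $(( ${#payload} / 2 ))"   # 102-byte packet
# to actually send it as a UDP broadcast on port 9 (what wakeonlan does):
# printf '%b' "$(sed 's/../\\x&/g' <<< "$payload")" > /dev/udp/192.168.1.255/9
```

In practice the `wakeonlan` command below does exactly this; the sketch is only to show why the MAC address gathered earlier is all the tool needs.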
wakeonlan <MAC_ADDRESS>
Reboot
Restarts a running node without full power cycle. Use after config changes that require a restart.
talosctl --nodes 192.168.1.32 reboot
Verify Cluster After Maintenance
When the node comes back online:
kubectl get nodes
talosctl --nodes 192.168.1.30 etcd members
| Note: | With 2 of 3 nodes running, the cluster remains fully operational. Avoid taking down 2 nodes simultaneously - etcd requires majority. |
Next Steps
With the new node added, update Tailscale to route traffic to it.
See: Tailscale ACL and Subnet Routes
Resources
Footnotes
[^1]: Sidero Labs, "Adding Nodes to a Cluster," talos.dev. Accessed: Dec. 20, 2025. [Online]. Available: https://www.talos.dev/latest/talos-guides/howto/scaling-up/