

Talhelper Cluster Bootstrap

Bootstrapping a Talos Kubernetes Cluster with Talhelper and Intel GPU Extensions

Overview

In this article we generate Talos machine configurations, apply them to our nodes, and bootstrap the Kubernetes cluster. This is where we go from nodes in maintenance mode to a running cluster.

Tip: Having trouble? See v0.5.0 for what your setup should look like after completing this article.

Before You Begin

Prerequisites

What We're Setting Up

  • Talos machine configurations via talhelper with SOPS encryption
  • Two-node Kubernetes cluster with Intel GPU extensions
  • Control plane on Node 1, worker on Node 2

Configure Talhelper

We use talhelper1 for GitOps-friendly Talos config management with SOPS integration2.

Install Talhelper

brew install talhelper
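A quick sanity check that the binary landed on your PATH before moving on (a sketch; the exact version output varies by release):

```shell
# Confirm talhelper is installed and callable.
if command -v talhelper >/dev/null 2>&1; then
  talhelper --version
else
  echo "talhelper not found on PATH" >&2
fi
```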

Ignore Generated Configs

Add Talos-specific patterns to .gitignore:

# Generated Talos configs (regenerate with talhelper genconfig)
talos/clusterconfig/

Generated configs contain embedded secrets (certs, keys, tokens) - never commit them.

Commit Ignore Patterns

git add .gitignore
git commit -m "chore(git): ignore generated files"
Note: Talhelper creates a .gitignore inside clusterconfig/ with explicit entries for each generated file (e.g., homelab-cluster-talos-node-1.yaml, talosconfig). When you add nodes and regenerate, it updates automatically. The repo-root gitignore above is redundant but provides a belt-and-suspenders approach.
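For reference, the generated clusterconfig/.gitignore looks roughly like this (illustrative; the exact entries depend on your cluster and node names):

```
# talos/clusterconfig/.gitignore (written by talhelper genconfig)
homelab-cluster-talos-node-1.yaml
homelab-cluster-talos-node-2.yaml
talosconfig
```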

Generate Secrets

Generate cluster secrets once - these don't depend on hardware info:

cd ~/homelab/talos

# Generate secrets
talhelper gensecret > talsecret.sops.yaml

# Encrypt immediately (uses .sops.yaml from repo root)
sops -e -i talsecret.sops.yaml
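Before committing, it's worth confirming the in-place encryption actually happened. This sketch checks for the two telltale signs of an encrypted SOPS file: a top-level `sops:` metadata block and `ENC[...]` values in place of plaintext.

```shell
# Returns success only if the file carries SOPS metadata and encrypted values.
is_encrypted() {
  grep -q '^sops:' "$1" 2>/dev/null && grep -q 'ENC\[' "$1" 2>/dev/null
}

if is_encrypted talsecret.sops.yaml; then
  echo "encrypted - safe to commit"
else
  echo "looks like plaintext - do NOT commit" >&2
fi
```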

Commit Encrypted Secrets

git add talos/talsecret.sops.yaml
git commit -m "chore(talos): add encrypted secrets"
Warning: Do not modify talsecret.sops.yaml after your cluster is running unless you want to recreate the cluster. Changing these secrets will break cluster access.

Gather Hardware Info

# Get disk info (size, model)
talosctl get disks --insecure --nodes 192.168.1.30
talosctl get disks --insecure --nodes 192.168.1.31

# Get network interface names (filter to active interfaces)
talosctl get links --insecure --nodes 192.168.1.30 | grep -E "^NODE|up.*true"
talosctl get links --insecure --nodes 192.168.1.31 | grep -E "^NODE|up.*true"

Disk: Note the SIZE and MODEL columns for your NVMe. Use a range for size (e.g., ">= 2TB") because the pretty-printed size (2.0 TB) doesn't match the exact byte count. busPath is another option, but it has been broken in talhelper since Talos 1.93. See talhelper4 or Talos5 diskSelector docs for all options.

Network: The grep filters to active interfaces (OPER STATE: up, LINK STATE: true). Use that interface name (e.g., enp172s0, eth0).

Kubernetes version: Check the Talos release notes6 for your Talos version - the Images section lists registry.k8s.io/kube-apiserver:vX.Y.Z which shows the supported Kubernetes version.
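If you have several disks, a short filter can pull out just the SIZE and MODEL values you need. The sample table below is illustrative only; check the field positions against what `talosctl get disks` actually prints on your version.

```shell
# Sample output (hypothetical) standing in for `talosctl get disks`.
sample_disks=$(cat <<'EOF'
NODE           NAMESPACE   TYPE   ID        VERSION   SIZE     READ ONLY   TRANSPORT   MODEL
192.168.1.30   runtime     Disk   nvme0n1   1         2.0 TB   false       nvme        WPBSN4M8-2TGP
EOF
)

# Keep only rows whose TRANSPORT field is "nvme" and print size + model.
printf '%s\n' "$sample_disks" | awk '$9 == "nvme" {print "size:", $6, $7, "model:", $10}'
```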

Cluster Config

Create talos/talconfig.yaml with the values gathered above. See the Talos configuration reference7 for all available options:

---
clusterName: homelab-cluster
talosVersion: v1.11.6
kubernetesVersion: v1.34.1  # from Talos release notes
endpoint: https://192.168.1.30:6443
allowSchedulingOnControlPlanes: true

clusterPodNets:
  - 10.244.0.0/16
clusterSvcNets:
  - 10.96.0.0/12

cniConfig:
  name: flannel

nodes:
  - hostname: talos-node-1
    ipAddress: 192.168.1.30
    controlPlane: true
    installDiskSelector:
      size: <SIZE>    # e.g., ">= 2TB"
      model: <MODEL>  # e.g., WPBSN4M8-2TGP
    networkInterfaces:
      - interface: <INTERFACE>  # e.g., enp172s0
        addresses:
          - 192.168.1.30/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1
    schematic:
      customization:
        systemExtensions:
          officialExtensions:
            - siderolabs/i915         # Intel GPU drivers
            - siderolabs/intel-ucode  # Intel CPU microcode

  - hostname: talos-node-2
    ipAddress: 192.168.1.31
    controlPlane: false
    installDiskSelector:
      size: <SIZE>    # e.g., ">= 2TB"
      model: <MODEL>  # e.g., WPBSN4M8-2TGP
    networkInterfaces:
      - interface: <INTERFACE>  # e.g., enp172s0
        addresses:
          - 192.168.1.31/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1
    schematic:
      customization:
        systemExtensions:
          officialExtensions:
            - siderolabs/i915         # Intel GPU drivers
            - siderolabs/intel-ucode  # Intel CPU microcode

Notes:

  • allowSchedulingOnControlPlanes: true - Both nodes can run workloads (prepares for adding a third node later)
  • Intel GPU extensions (i9158, intel-ucode9) install the drivers into Talos10. A later article covers the Kubernetes device plugin for pod GPU access.
  • Full extension list11. Custom images with extensions can be generated via Talos Image Factory12.
Tip: Need to add extensions later? See Talos Upgrade and Extensions for how to update a running cluster.

What's in git:

  • talconfig.yaml - cluster definition (no secrets)
  • talsecret.sops.yaml - encrypted secrets

Commit Cluster Config

git add talos/talconfig.yaml
git commit -m "feat(talos): init cluster/node config"

iSCSI Tools Extension

Longhorn (distributed storage) requires iSCSI. Add the extension to both nodes:

schematic:
  customization:
    systemExtensions:
      officialExtensions:
        - siderolabs/i915
        - siderolabs/intel-ucode
        - siderolabs/iscsi-tools  # Required for Longhorn

Commit iSCSI Extension

git add talos/talconfig.yaml
git commit -m "feat(talos): add iscsi-tools extension for storage"

Generate Configs

# Decrypt secrets and generate configs
SOPS_AGE_KEY_FILE=<(op document get "sops-key | homelab") \
    talhelper genconfig

# Output goes to clusterconfig/ directory
ls clusterconfig/

Generated (gitignored):

  • clusterconfig/*.yaml - per-node machine configs
  • clusterconfig/talosconfig - CLI client config (365-day certs, regenerated each run)
Note: To manage the cluster from another machine, just clone the repo and run talhelper genconfig with your SOPS key - no need to transfer talosconfig separately.
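Before applying the configs, a quick check that genconfig produced everything we expect (a sketch; filenames follow the <clusterName>-<hostname>.yaml pattern):

```shell
# Report which generated files are present or missing under clusterconfig/.
for f in clusterconfig/homelab-cluster-talos-node-1.yaml \
         clusterconfig/homelab-cluster-talos-node-2.yaml \
         clusterconfig/talosconfig; do
  if [ -f "$f" ]; then
    echo "ok: $f"
  else
    echo "missing: $f" >&2
  fi
done
```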

Apply Configurations

Apply to Both Nodes

cd ~/homelab/talos

# Apply to Node 1 (control plane)
talosctl apply-config --insecure \
  --nodes 192.168.1.30 \
  --file clusterconfig/homelab-cluster-talos-node-1.yaml

# Apply to Node 2 (worker)
talosctl apply-config --insecure \
  --nodes 192.168.1.31 \
  --file clusterconfig/homelab-cluster-talos-node-2.yaml

Wait for both nodes to reboot and come online (2-3 minutes).

Nodes are no longer in insecure maintenance mode - talosctl now requires authentication via --talosconfig.

Verify Nodes are Reachable

# Check both nodes are up (doesn't require etcd)
talosctl --talosconfig clusterconfig/talosconfig --nodes 192.168.1.30 version
talosctl --talosconfig clusterconfig/talosconfig --nodes 192.168.1.31 version
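If the repeated flag gets tedious, talosctl also reads the TALOSCONFIG environment variable; exporting it once lets you drop --talosconfig for the rest of the session:

```shell
# Point talosctl at the generated client config for this shell session.
export TALOSCONFIG="$PWD/clusterconfig/talosconfig"
echo "talosctl will use: $TALOSCONFIG"
```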

Bootstrap the Cluster

Follow the Talos bootstrap guide13 to initialize the control plane.

Bootstrap etcd on Node 1

talosctl --talosconfig clusterconfig/talosconfig --nodes 192.168.1.30 bootstrap

This initializes the control plane and etcd (2-3 minutes until fully ready).

Watch Bootstrap Progress (optional)

talosctl --talosconfig clusterconfig/talosconfig --nodes 192.168.1.30 dmesg -f

Temporary errors during bootstrap are normal - the API server takes time to come up. Watch for these signs of success:

  • service[etcd](Running): Health check successful
  • service[kubelet](Running): Health check successful
  • machine is running and ready
  • CNI port logs (cni0: port ... entered forwarding state) indicate flannel is up

Verify Cluster Health

Once bootstrap completes, verify everything is healthy:

talosctl --talosconfig clusterconfig/talosconfig --nodes 192.168.1.30 health

All checks should show OK.

Verify Talos Health

Check etcd Members

talosctl --talosconfig clusterconfig/talosconfig --nodes 192.168.1.30 \
  etcd members

Expected: Single member (talos-node-1) since we have one control plane node.

Access Kubernetes

Get kubeconfig

talosctl --talosconfig clusterconfig/talosconfig --nodes 192.168.1.30 \
  kubeconfig clusterconfig/kubeconfig

Like talosconfig, the kubeconfig is regenerable, so there's no need to store it separately. When we set up Tailscale later, we'll regenerate it with Tailscale endpoints.

Check All Nodes

kubectl --kubeconfig clusterconfig/kubeconfig get nodes -o wide

Expected output:

NAME           STATUS   ROLES           AGE   VERSION
talos-node-1   Ready    control-plane   5m    v1.34.x
talos-node-2   Ready    <none>          4m    v1.34.x

Check System Pods

kubectl --kubeconfig clusterconfig/kubeconfig get pods -n kube-system

You should see14:

  • coredns pods (DNS)
  • kube-apiserver, kube-controller-manager, kube-scheduler (control plane)
  • kube-proxy (networking)
  • CNI pods (flannel or cilium)

Verify Control Plane Taint

kubectl --kubeconfig clusterconfig/kubeconfig describe node talos-node-1 | grep Taint

Expected: Since we set allowSchedulingOnControlPlanes: true, there should be no taint.

Worker node should also have no taints:

kubectl --kubeconfig clusterconfig/kubeconfig describe node talos-node-2 | grep Taint

Test the Cluster

Deploy Test Application

# Create a pod running nginx web server
# (PodSecurity warning about "restricted" policy is expected - it's warn mode, not enforced)
kubectl --kubeconfig clusterconfig/kubeconfig run nginx --image=nginx --port=80

# Expose it as a NodePort service (accessible from outside cluster)
# NodePort opens a port on every node's IP - we specify 30080 explicitly
kubectl --kubeconfig clusterconfig/kubeconfig expose pod nginx \
  --type=NodePort --port=80 \
  --overrides='{"spec":{"ports":[{"port":80,"nodePort":30080}]}}'

# Check pod is running
kubectl --kubeconfig clusterconfig/kubeconfig get pod nginx

# Verify service shows 80:30080/TCP
kubectl --kubeconfig clusterconfig/kubeconfig get svc nginx
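The 30080 above isn't arbitrary: NodePort services must use a port inside the cluster's service-node-port-range, which defaults to 30000-32767. A quick check before picking a port:

```shell
# Validate a candidate port against the default Kubernetes NodePort range.
port=30080
if [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]; then
  echo "$port is a valid NodePort"
else
  echo "$port is outside the default NodePort range" >&2
fi
```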

Test from Within Cluster

# Start a temporary debug pod with interactive shell (--rm deletes on exit)
kubectl --kubeconfig clusterconfig/kubeconfig run -it --rm debug \
  --image=busybox --restart=Never -- sh

# Inside container - test cluster DNS resolves service name to ClusterIP:
wget -O- http://nginx
exit

Test from Outside Cluster

Access nginx from your machine using any node's IP and the NodePort:

# Using Node 1
curl http://192.168.1.30:30080

# Using Node 2 (same port works on any node)
curl http://192.168.1.31:30080

You should see the nginx welcome page HTML.

Clean Up

# Delete the test pod and service
kubectl --kubeconfig clusterconfig/kubeconfig delete pod nginx
kubectl --kubeconfig clusterconfig/kubeconfig delete svc nginx

# Verify nothing left in default namespace
kubectl --kubeconfig clusterconfig/kubeconfig get all

Expected: Only service/kubernetes should remain (the built-in API service).

Next Steps

With the Kubernetes cluster running, we can bootstrap Flux for GitOps.

See: Flux CD Kubernetes GitOps

For Talos maintenance and expansion, see Talos Upgrade and Extensions.

Resources

Footnotes

  1. Budimanjojo, "Talhelper Documentation," budimanjojo.github.io. Accessed: Dec. 16, 2025. [Online]. Available: https://budimanjojo.github.io/talhelper/latest/

  2. Budimanjojo, "Talhelper Guides - Configuring SOPS," budimanjojo.github.io. Accessed: Dec. 16, 2025. [Online]. Available: https://budimanjojo.github.io/talhelper/latest/guides/#configuring-sops-for-talhelper

  3. Budimanjojo, "Talhelper Example Config," github.com. Accessed: Dec. 16, 2025. [Online]. Available: https://github.com/budimanjojo/talhelper/blob/master/example/talconfig.yaml

  4. Budimanjojo, "Talhelper diskSelector Reference," budimanjojo.github.io. Accessed: Dec. 16, 2025. [Online]. Available: https://budimanjojo.github.io/talhelper/latest/reference/configuration/#installdiskselector

  5. Sidero Labs, "diskSelector Configuration," talos.dev. Accessed: Dec. 16, 2025. [Online]. Available: https://docs.siderolabs.com/talos/v1.11/reference/configuration/v1alpha1/config#diskselector

  6. Sidero Labs, "Talos Releases," github.com. Accessed: Dec. 16, 2025. [Online]. Available: https://github.com/siderolabs/talos/releases

  7. Sidero Labs, "Talos Configuration Reference," talos.dev. Accessed: Dec. 16, 2025. [Online]. Available: https://www.talos.dev/latest/reference/configuration/

  8. Sidero Labs, "i915 Extension," github.com. Accessed: Dec. 16, 2025. [Online]. Available: https://github.com/siderolabs/extensions/tree/main/drm/i915

  9. Sidero Labs, "intel-ucode Extension," github.com. Accessed: Dec. 16, 2025. [Online]. Available: https://github.com/siderolabs/extensions/tree/main/firmware/intel-ucode

  10. J. Kueber, "Intel QuickSync on Talos," johanneskueber.com. Accessed: Dec. 16, 2025. [Online]. Available: https://johanneskueber.com/posts/proxmox_passthrough_talos/

  11. Sidero Labs, "Talos Extensions," github.com. Accessed: Dec. 16, 2025. [Online]. Available: https://github.com/siderolabs/extensions

  12. Sidero Labs, "Talos Image Factory," factory.talos.dev. Accessed: Dec. 16, 2025. [Online]. Available: https://factory.talos.dev/

  13. Sidero Labs, "Talos Bootstrap Guide," talos.dev. Accessed: Dec. 16, 2025. [Online]. Available: https://www.talos.dev/latest/introduction/getting-started/

  14. Kubernetes, "Kubernetes Components," kubernetes.io. Accessed: Dec. 16, 2025. [Online]. Available: https://kubernetes.io/docs/concepts/overview/components/
