Easy Kubernetes

Beyond `k3s`: Why I Deployed My Kubernetes Cluster with Talos Linux

The core tenet of DevOps is treating infrastructure as code. While building a Kubernetes cluster with standard tooling is achievable, the process invariably leads to the greatest pain point for any SysAdmin: configuration drift.

💡 Key Takeaways

State vs. Artifacts: Stop managing mutable OS state; start managing versioned infrastructure artifacts.

Immunity: Learn why “SSH-less” OS architecture is the gold standard for security and stability.

Automation: See how to combine OpenTofu (Terraform) and Talos to provision full clusters atomically.

Traditional cluster bootstrapping is a multi-step, imperative process executed across disparate tools and at different times. Consider the typical flow: Ansible sets up k3s on a base Debian VM, followed by manual adjustments for networking policies. Each action is a write operation on a live machine, and every one widens the surface area for human error, which in a DevOps context means failure.

I was tired of managing state. I needed to manage artifacts.

This realization led me to Talos Linux. If you are skeptical about replacing the OS layer underneath Kubernetes, follow the principles of immutable infrastructure to their conclusion and you will find that Talos is the logical endpoint for a resilient cluster.

🧱 The Immutable Advantage: Why Talos Changes the Game

Think of a traditional OS as a pet: you SSH into it, nurse it back to health, and patch its specific ailments.

Talos Linux is cattle. It isn’t designed to be “cared for”—it is designed to be replaced. If an OS node has a configuration issue, you don’t “fix” it; you destroy it and spin up a new, pristine version from your verified image. This eliminates the “configuration drift” graveyard where many production clusters go to die.

This immediately solves operational nightmares related to dependency conflicts, forgotten cleanup scripts, or the accidental execution of a single, detrimental apt upgrade command on a critical node. The node is never left in a half-updated, half-configured purgatory.

🚀 Building the Foundation with OpenTofu

The goal was to provision a multi-node cluster where the control plane nodes (API, Scheduler, Controller) and the worker nodes were all provisioned atomically from a single source of truth. OpenTofu was the natural fit for this orchestration layer.

Since Talos manages the entire node lifecycle, our OpenTofu workflow is streamlined: we define the desired state of the operating system, and Talos handles the complex operational choreography of joining the node to the cluster.

Here is a conceptual view of how we defined the cluster provisioning using OpenTofu.

Prerequisites

Before we begin, ensure you have:

A functioning Proxmox cluster.
A Talos Linux cloud-init template uploaded to Proxmox.
OpenTofu installed locally.

Provider Setup

I am running 3 Proxmox servers to host my VMs, complemented by a dedicated Talos Template VM. Following best development practices, everything that defines the environment—IPs, counts, names, etc.—should be versioned as variables.

You can set the following environment variables in your local shell (or GitLab CI environment) to authenticate the providers. OpenTofu reads any environment variable prefixed with TF_VAR_ as the input variable of the same name: OpenTofu Docs

export TF_VAR_gitlab_api_url=<GitLab API URL, e.g. https://gitlab.com/api/v4/>
export TF_VAR_gitlab_token=<GitLab access token that has API access>
export TF_VAR_pm_api_url=<URL to your Proxmox server>
export TF_VAR_pm_password=<Password for your Proxmox user>
export TF_VAR_pm_user=<User for Proxmox, e.g. root@pam>
export TF_VAR_unifi_api_url=<UniFi API URL>
export TF_VAR_unifi_api_key=<UniFi API key>

# OpenTofu keeps the `terraform` block name for compatibility.
terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.2-rc07"
    }
    talos = {
      source  = "siderolabs/talos"
      version = "0.11.0"
    }
    gitlab = {
      source  = "gitlabhq/gitlab"
      version = "19.0"
    }
    unifi = {
      source  = "ubiquiti-community/unifi"
      version = "0.41.25"
    }
  }
}

locals {
  gitlab = {
    iacRepoId = "7"
  }
  template  = "tails-Template"
  format    = "raw"
  dnsserver = "192.168.${local.vlan}.1"
  gateway   = "192.168.${local.vlan}.1"
  vlan      = 10
  control = {
    tags  = "control_dev"
    count = 3
    name = [
      "control01-dev",
      "control02-dev",
      "control03-dev"
    ]
    cores   = 2
    memory  = "4096"
    # size in GB
    drive   = 20
    storage = "cache-domains"
    node = [
      "mothership",
      "overlord",
      "vanguard"
    ]
    vmid = [
      "${local.vlan}11",
      "${local.vlan}12",
      "${local.vlan}13"
    ]
    ip = [
      "192.168.${local.vlan}.11",
      "192.168.${local.vlan}.12",
      "192.168.${local.vlan}.13"
    ]
  }
  worker = {
    tags  = "worker_dev"
    count = 3
    name = [
      "worker01-dev",
      "worker02-dev",
      "worker03-dev"
    ]
    cores   = 4
    memory  = "8192"
    # size in GB
    drive   = 120
    storage = "cache-domains"
    node = [
      "mothership",
      "overlord",
      "vanguard"
    ]
    vmid = [
      "${local.vlan}21",
      "${local.vlan}22",
      "${local.vlan}23"
    ]
    ip = [
      "192.168.${local.vlan}.21",
      "192.168.${local.vlan}.22",
      "192.168.${local.vlan}.23"
    ]
  }
  talos = {
    cluster_name = "dev"
    cluster_dns  = "kube.dev.durp.loc"
  }
}
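
The hand-written name, IP, and VMID lists above follow a regular pattern, so they could also be generated. The following is a minimal sketch of that idea, not the layout used in this post: every local ending in `_example` is a hypothetical name chosen to avoid clashing with the real locals.

```hcl
locals {
  # Hypothetical inputs for this sketch only.
  node_count_example = 3
  vlan_example       = 10

  # ["control01-dev", "control02-dev", "control03-dev"]
  control_name_example = [
    for i in range(local.node_count_example) : format("control%02d-dev", i + 1)
  ]

  # ["192.168.10.11", "192.168.10.12", "192.168.10.13"]
  control_ip_example = [
    for i in range(local.node_count_example) : "192.168.${local.vlan_example}.${11 + i}"
  ]

  # ["1011", "1012", "1013"]
  control_vmid_example = [
    for i in range(local.node_count_example) : "${local.vlan_example}${11 + i}"
  ]
}
```

Generating the lists keeps them the same length as the node count by construction, at the cost of hiding the concrete values from anyone skimming the file.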

1. Getting the DNS Setup

Because I use UniFi as my DNS server, I can use OpenTofu to create all the required A records. The shared cluster record lets me reach the control plane through one static endpoint regardless of which physical host a node runs on, and each server also gets its own DNS entry so you don’t need its IP to talk to it. Talos Docs

provider "unifi" {
  api_url        = var.unifi_api_url
  api_key        = var.unifi_api_key
  allow_insecure = true
}

variable "unifi_api_url" {
  description = "API URL for UniFi"
  type        = string
}

variable "unifi_api_key" {
  description = "API key for UniFi"
  type        = string
  sensitive   = true # keep the key out of plan/apply output
}

resource "unifi_dns_record" "control_records" {
  count       = local.control.count
  name        = "${local.control.name[count.index]}.durp.loc"
  enabled     = true
  record_type = "A"
  ttl         = 300
  value       = local.control.ip[count.index]
}

resource "unifi_dns_record" "worker_records" {
  count       = local.worker.count
  name        = "${local.worker.name[count.index]}.durp.loc"
  record_type = "A"
  value       = local.worker.ip[count.index]
  enabled     = true
  ttl         = 300
}

resource "unifi_dns_record" "cluster_endpoint" {
  count       = local.control.count
  name        = local.talos.cluster_dns
  record_type = "A"
  value       = local.control.ip[count.index]
  enabled     = true
  ttl         = 300
}
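
To see at a glance which hostnames the records above should produce, an output built from the same locals can help. This is a small sketch; the output name `node_fqdns_example` is hypothetical and not part of the original configuration.

```hcl
# Lists the per-node FQDNs derived from the locals used by the DNS records.
output "node_fqdns_example" {
  value = [
    for n in concat(local.control.name, local.worker.name) : "${n}.durp.loc"
  ]
}
```

After an apply, `tofu output node_fqdns_example` would print the six names, which can then be checked against the DNS server.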

2. Getting the VMs Setup

This step uses the Proxmox provider to clone your template and inject networking via cloud-init.

ℹ️ Why no SSH? Unlike traditional Linux, Talos has no SSH. All management is performed via the Talos API (gRPC). OpenTofu interacts with this API to deliver machine configurations.

provider "proxmox" {
  pm_parallel                 = 1
  pm_tls_insecure             = true
  pm_api_url                  = var.pm_api_url
  pm_user                     = var.pm_user
  pm_password                 = var.pm_password
  pm_debug                    = false
  pm_minimum_permission_check = false
}

variable "pm_api_url" {
  description = "API URL to Proxmox provider"
  type        = string
}

variable "pm_password" {
  description = "Password to Proxmox provider"
  type        = string
  sensitive   = true # keep the password out of plan/apply output
}

variable "pm_user" {
  description = "Username to Proxmox provider"
  type        = string
}

resource "proxmox_vm_qemu" "control" {
  count       = local.control.count
  ciuser      = "administrator"
  description = "Managed by OpenTofu"
  vmid        = local.control.vmid[count.index]
  name        = local.control.name[count.index]
  target_node = local.control.node[count.index]
  clone       = local.template
  tags        = local.control.tags
  qemu_os     = "l26"
  full_clone  = true
  os_type     = "cloud-init"
  agent       = 1
  cpu {
    cores = local.control.cores
    type  = "x86-64-v2-AES"
  }
  memory             = local.control.memory
  scsihw             = "virtio-scsi-pci"
  boot               = "order=scsi0"
  start_at_node_boot = true
  startup_shutdown {
    order            = -1
    shutdown_timeout = -1
    startup_delay    = -1
  }
  vga {
    type = "serial0"
  }
  serial {
    id   = 0
    type = "socket"
  }
  disks {
    ide {
      ide2 {
        cloudinit {
          storage = local.control.storage
        }
      }
    }
    scsi {
      scsi0 {
        disk {
          size    = local.control.drive
          format  = local.format
          storage = local.control.storage
        }
      }
    }
  }
  network {
    id     = 0
    model  = "virtio"
    bridge = "vmbr0"
    tag    = local.vlan
  }

  #Cloud Init Settings
  ipconfig0    = "ip=${local.control.ip[count.index]}/24,gw=${local.gateway}"
  searchdomain = "durp.loc"
  nameserver   = local.dnsserver
}

resource "proxmox_vm_qemu" "worker" {
  count       = local.worker.count
  ciuser      = "administrator"
  description = "Managed by OpenTofu"
  vmid        = local.worker.vmid[count.index]
  name        = local.worker.name[count.index]
  target_node = local.worker.node[count.index]
  clone       = local.template
  tags        = local.worker.tags
  qemu_os     = "l26"
  full_clone  = true
  os_type     = "cloud-init"
  agent       = 1
  cpu {
    cores = local.worker.cores
    type  = "x86-64-v2-AES"
  }
  memory             = local.worker.memory
  scsihw             = "virtio-scsi-pci"
  boot               = "order=scsi0"
  start_at_node_boot = true
  startup_shutdown {
    order            = -1
    shutdown_timeout = -1
    startup_delay    = -1
  }
  vga {
    type = "serial0"
  }
  serial {
    id   = 0
    type = "socket"
  }
  disks {
    ide {
      ide2 {
        cloudinit {
          storage = local.worker.storage
        }
      }
    }
    scsi {
      scsi0 {
        disk {
          size    = local.worker.drive
          format  = local.format
          storage = local.worker.storage
        }
      }
    }
  }
  network {
    id     = 0
    model  = "virtio"
    bridge = "vmbr0"
    tag    = local.vlan
  }

  #Cloud Init Settings
  ipconfig0    = "ip=${local.worker.ip[count.index]}/24,gw=${local.gateway}"
  searchdomain = "durp.loc"
  nameserver   = local.dnsserver
}
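
Both VM resources index into the name, node, vmid, and ip lists with count.index, so a list that is shorter than count fails partway through planning with an index error. One way to surface that earlier is a check block, sketched below under the assumption that your OpenTofu version supports check blocks (1.6 or later). The block name is hypothetical, and failed checks are reported as warnings rather than hard errors.

```hcl
check "node_lists_match_count_example" {
  assert {
    # Every per-node list must have exactly `count` entries.
    condition = alltrue([
      length(local.control.name) == local.control.count,
      length(local.control.node) == local.control.count,
      length(local.control.vmid) == local.control.count,
      length(local.control.ip) == local.control.count,
      length(local.worker.name) == local.worker.count,
      length(local.worker.node) == local.worker.count,
      length(local.worker.vmid) == local.worker.count,
      length(local.worker.ip) == local.worker.count,
    ])
    error_message = "Each name/node/vmid/ip list must contain exactly `count` entries."
  }
}
```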

3. Control Plane Nodes

Now that the VMs exist, we define the Talos-specific configuration. Setting allowSchedulingOnControlPlanes to false (the Talos default, made explicit here) keeps the control-plane taint in place, so application pods are scheduled on worker nodes only.

resource "talos_machine_secrets" "machine_secrets" {}

data "talos_client_configuration" "talosconfig" {
  cluster_name         = local.talos.cluster_name
  client_configuration = talos_machine_secrets.machine_secrets.client_configuration
  endpoints            = local.control.ip
}

data "talos_machine_configuration" "machineconfig_cp" {
  cluster_name     = local.talos.cluster_name
  cluster_endpoint = "https://${local.talos.cluster_dns}:6443"
  machine_type     = "controlplane"
  machine_secrets  = talos_machine_secrets.machine_secrets.machine_secrets

  config_patches = [
    yamlencode({
      cluster = {
        allowSchedulingOnControlPlanes = false
      }
    })
  ]
}

resource "talos_machine_configuration_apply" "cp_config_apply" {
  depends_on                  = [proxmox_vm_qemu.control]
  client_configuration        = talos_machine_secrets.machine_secrets.client_configuration
  machine_configuration_input = data.talos_machine_configuration.machineconfig_cp.machine_configuration
  count                       = local.control.count
  node                        = local.control.ip[count.index]
}
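
The config_patches list accepts more than one patch, so further machine settings can be layered on in the same way. As an illustration only, the sketch below builds a patch that pins the install disk. The local name is hypothetical, and /dev/sda is an assumption about how the scsi0 disk appears inside the VM; verify the device name on your own nodes before using it.

```hcl
locals {
  # Hypothetical extra patch; append it to config_patches if you need it.
  install_disk_patch_example = yamlencode({
    machine = {
      install = {
        disk = "/dev/sda" # assumption: the scsi0 disk shows up as /dev/sda
      }
    }
  })
}
```

Because yamlencode returns a plain string, the local can be added as another element of the config_patches list next to the existing scheduling patch.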

4. Worker Nodes

Worker nodes follow a similar pattern but use the worker machine type, which does not require the cluster bootstrapping logic reserved for control planes.

data "talos_machine_configuration" "machineconfig_worker" {
  cluster_name     = local.talos.cluster_name
  cluster_endpoint = "https://${local.talos.cluster_dns}:6443"
  machine_type     = "worker"
  machine_secrets  = talos_machine_secrets.machine_secrets.machine_secrets
}

resource "talos_machine_configuration_apply" "worker_config_apply" {
  depends_on                  = [proxmox_vm_qemu.worker]
  client_configuration        = talos_machine_secrets.machine_secrets.client_configuration
  machine_configuration_input = data.talos_machine_configuration.machineconfig_worker.machine_configuration
  count                       = local.worker.count
  node                        = local.worker.ip[count.index]
}
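
The worker configuration above carries no patches, but it accepts them just like the control plane does. A minimal sketch, assuming you want to label worker nodes for scheduling: the local name and the label key and value are made up for illustration.

```hcl
locals {
  # Hypothetical patch that adds a custom Kubernetes node label to workers.
  worker_labels_patch_example = yamlencode({
    machine = {
      nodeLabels = {
        "example.durp.loc/pool" = "general" # made-up label for illustration
      }
    }
  })
}
```

It would be passed as `config_patches = [local.worker_labels_patch_example]` on the worker talos_machine_configuration data source.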

5. Bootstrapping the cluster

With the infrastructure and configurations applied, the final step is to signal the Talos nodes to initialize the etcd/Kubernetes control plane.

resource "talos_machine_bootstrap" "bootstrap" {
  depends_on           = [talos_machine_configuration_apply.cp_config_apply]
  client_configuration = talos_machine_secrets.machine_secrets.client_configuration
  node                 = local.control.ip[0]
}

data "talos_cluster_health" "health" {
  depends_on           = [talos_machine_configuration_apply.cp_config_apply, talos_machine_configuration_apply.worker_config_apply]
  client_configuration = data.talos_client_configuration.talosconfig.client_configuration
  control_plane_nodes  = local.control.ip
  worker_nodes         = local.worker.ip
  endpoints            = data.talos_client_configuration.talosconfig.endpoints
}

resource "talos_cluster_kubeconfig" "kubeconfig" {
  depends_on           = [talos_machine_bootstrap.bootstrap, data.talos_cluster_health.health]
  client_configuration = talos_machine_secrets.machine_secrets.client_configuration
  node                 = local.control.ip[0]
}

output "talosconfig" {
  value     = data.talos_client_configuration.talosconfig.talos_config
  sensitive = true
}

output "kubeconfig" {
  value     = resource.talos_cluster_kubeconfig.kubeconfig.kubeconfig_raw
  sensitive = true
}

6. GitLab Variables (Optional)

Because I want to build additional automation tooling, I store both configs in GitLab as file-type CI/CD variables so other pipelines can use them. That can be done with the following:

provider "gitlab" {
  token    = var.gitlab_token
  base_url = var.gitlab_api_url
}

variable "gitlab_api_url" {
  description = "Gitlab API Url"
  type        = string
}

variable "gitlab_token" {
  description = "GitLab Token"
  type        = string
  sensitive   = true # keep the token out of plan/apply output
}

resource "gitlab_project_variable" "talosconfig" {
  project       = local.gitlab.iacRepoId
  key           = "${local.talos.cluster_name}_TALOSCONFIG"
  value         = data.talos_client_configuration.talosconfig.talos_config
  variable_type = "file"
}

resource "gitlab_project_variable" "kubeconfig" {
  project       = local.gitlab.iacRepoId
  key           = "${local.talos.cluster_name}_KUBECONFIG"
  value         = resource.talos_cluster_kubeconfig.kubeconfig.kubeconfig_raw
  variable_type = "file"
}

7. Putting it together

You can place all the code above into a single main.tf file or organize it into modules. To provision your cluster manually, run the following commands:

# Initialize providers and modules
tofu init 

# Preview the infrastructure changes
tofu plan

# Deploy the infrastructure and Talos configuration
tofu apply

Accessing Your Cluster

Once tofu apply finishes, your talosconfig and kubeconfig are stored as sensitive outputs. You can retrieve and use them immediately:

# Extract the kubeconfig to your local machine
tofu output -raw kubeconfig > kubeconfig.yaml

# Test connectivity to the cluster
kubectl --kubeconfig=kubeconfig.yaml get nodes
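
If you would rather have OpenTofu write the kubeconfig to disk during apply instead of extracting it afterwards, the hashicorp/local provider offers a resource for that. This is a sketch rather than part of the original setup: it assumes the local provider is available (it is not in the required_providers block above), and the resource name is hypothetical.

```hcl
# Writes the kubeconfig next to the configuration with owner-only permissions.
resource "local_sensitive_file" "kubeconfig_example" {
  content         = talos_cluster_kubeconfig.kubeconfig.kubeconfig_raw
  filename        = "${path.module}/kubeconfig.yaml"
  file_permission = "0600"
}
```

The file contains cluster credentials, so it should be excluded from version control.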

💡 Configuration Management: Since Talos is managed via API, keep your talosconfig secure. I recommend using the Talos CLI for deeper health checks, such as talosctl dashboard or talosctl health.

✨ Final Thoughts: Thinking Architecturally, Not Operationally

If you are treating your cluster as a collection of mutable servers you must manually patch, you are architecting for complexity.

By moving the infrastructure definition into OpenTofu and standardizing on the immutable, API-driven nature of Talos Linux, we shift the burden from “managing OS state” to “orchestrating artifacts.” This workflow ensures that your Proxmox-hosted nodes are not just containers for Kubernetes, but predictable, version-controlled components of your infrastructure stack.

For those looking to achieve true cluster resilience, adopting this declarative approach is the logical next step—turning your entire cluster into a single, verifiable, and entirely disposable deployment artifact.
