A Year of Homelabbing
This is a year of building, breaking, and rebuilding my homelab.
Before the homelab
I never liked minikube. A potentially bold statement to make, but something about it felt too abstracted.
I remember installing it, running minikube addons enable ingress, and honestly feeling like something was off. What's actually happening here? What can I mess around with? (For local Kubernetes now, Rancher Desktop is a much better starting point imho.)
So I went straight to kubeadm. First on my Mac, then on dedicated hardware once I got it. Bash scripts that SSH'd in and ran kubeadm commands. Not elegant, but it taught me what actually happens when you bootstrap a cluster: certificates, etcd, kubelet config, etc.
Eventually went HA with HAProxy and Keepalived for a floating VIP.
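For flavor, here's a minimal sketch of what that bootstrap config looked like, assuming a kubeadm ClusterConfiguration with the Keepalived VIP as the control plane endpoint. The version, VIP address, and subnets below are placeholders, not my actual values.

```yaml
# Sketch of a kubeadm ClusterConfiguration for an HA control plane.
# All addresses and versions are placeholders.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0
# Floating VIP owned by Keepalived; HAProxy behind it balances
# traffic across the API servers on the control plane nodes.
controlPlaneEndpoint: "10.0.30.100:6443"
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
```

Run kubeadm init --config with that on the first control plane node (plus --upload-certs), then kubeadm join --control-plane on the rest, and the VIP is what every kubeconfig points at.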
The Ansible detour
At some point I tried Ansible. Wrote playbooks for HAProxy and kubeadm setup.
It lasted maybe two weeks.
Ansible is good at what it does, but in my case it felt clunky and over-abstracted for something a Makefile could do with less ceremony. So I went back to Makefiles.
Multiple clusters
The old repo had three cluster approaches running simultaneously:
- Atlas: The kubeadm cluster, my original setup
- Prism: A K3s cluster, meant to host always-available components in case I shut down some machines in the other cluster
- Talos: The cluster I later migrated to, and the one that's currently active
Each had its own directory, its own tooling, its own domain (*.atlas.home.mrdvince.me, *.prism.home.mrdvince.me, *.talos.home.mrdvince.me).
This did create maintenance overhead though, and to be honest, one doesn't really need three clusters in a homelab.
Networking
Networking ended up being the most stable part once it was set up right.
Got a managed switch and OPNsense as the router. Set up VLANs for segmentation: Proxmox plus other non-k8s VMs on one, storage on another, cluster traffic on a third, and home devices on a fourth.
With firewall rules configured and CrowdSec added for intrusion detection, the main config was mostly done. Now I just add a new VLAN when I need one, among other operational tweaks.
Tailscale ties it together for access from anywhere.
On the Kubernetes side, I started with Cilium and MetalLB. Eventually I dropped MetalLB and let Cilium handle LoadBalancer IPs directly, then dropped kube-proxy too, letting Cilium do everything with eBPF.
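Roughly, the MetalLB replacement is just a CiliumLoadBalancerIPPool (exact field names vary a bit between Cilium versions), and the kube-proxy replacement is a Helm value. The pool name and CIDR here are made up for illustration.

```yaml
# Sketch: Cilium LB IPAM pool that hands out LoadBalancer IPs,
# taking over the job MetalLB used to do. CIDR is a placeholder.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
    - cidr: 10.0.40.0/27
```

On the Helm side, kubeProxyReplacement: true (with k8sServiceHost/k8sServicePort pointed at the API endpoint) is what lets the eBPF datapath take over service routing, so kube-proxy can go away entirely.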
Storage: the backbone
TrueNAS Scale became the foundation. Started with a USB controller passthrough setup (Days 13-15 in the blog), which was more involved than expected. ZFS with RAIDZ1, eventually grown via VDEV extension (Day 25).
This worked very well for a long time before I switched to a UGREEN NAS, also running TrueNAS Scale, and have since moved to RAIDZ2.
The storage architecture went through iterations:
- Local storage only (early days)
- NFS mounts from TrueNAS
- Longhorn for distributed block storage
- MinIO for S3-compatible object storage
- RustFS replaced MinIO (current)
Longhorn on Talos deserves its own mention. Talos is immutable, which means you can't just install packages. Getting Longhorn to work required Talos extensions (iscsi-tools, util-linux-tools) and kubelet mount patches.
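For context, the two pieces look roughly like this: an Image Factory schematic that bakes the extensions into the Talos image, and a machine config patch for the kubelet mount. Paths follow Longhorn's defaults; treat this as a sketch rather than my exact config.

```yaml
# Image Factory schematic (its own file): bake in the extensions
# Longhorn needs for iSCSI and mount tooling.
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/util-linux-tools
---
# Machine config patch: expose the host's Longhorn data directory
# inside the kubelet with shared mount propagation.
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn
        options:
          - bind
          - rshared
          - rw
```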
The storage layer now handles: Terraform state, database backups via CloudNativePG, GitLab artifacts, container registry storage, and anything else that needs persistence.
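As an example of how those consumers plug in, a CloudNativePG cluster can back up straight to the S3 endpoint. The names, endpoint, and retention below are illustrative, not copied from my manifests.

```yaml
# Sketch of a CloudNativePG cluster with object storage backups.
# Cluster name, bucket, endpoint, and secret names are placeholders.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
spec:
  instances: 2
  storage:
    size: 10Gi
    storageClass: longhorn        # block storage from Longhorn
  backup:
    retentionPolicy: "14d"
    barmanObjectStore:
      destinationPath: s3://postgres-backups/
      endpointURL: http://rustfs.storage.svc.cluster.local:9000
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
```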
GitOps
ArgoCD with the app-of-apps pattern was and still is the deployment model.
The pattern: push to git, ArgoCD syncs, applications deploy.
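The root of that tree is just another Application pointing at a directory of child Application manifests. A sketch, with the path and sync options as assumptions rather than a copy of my repo:

```yaml
# Root "app of apps": syncing this one Application makes ArgoCD create
# every child Application it finds under apps/ in the repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/mrdvince/homelab
    targetRevision: main
    path: apps                     # placeholder directory of child apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```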
For secrets, I played with sealed-secrets briefly but settled on SOPS with age encryption. It works with the Helmfile plugin to decrypt when ArgoCD applies.
Day 26 and Day 30 in the blog cover the secrets journey in detail.
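On the SOPS side, most of the setup is a .sops.yaml at the repo root telling it which files and fields to encrypt with which age recipient. A minimal sketch with a placeholder key:

```yaml
# .sops.yaml sketch: encrypt only the data/stringData fields of files that
# look like secrets, using an age recipient (placeholder public key below).
creation_rules:
  - path_regex: .*secrets.*\.ya?ml$
    encrypted_regex: ^(data|stringData)$
    age: age1examplepublickeyreplaceme
```

sops --encrypt --in-place on a matching file does the encryption, and the private key only needs to live where decryption happens at sync time.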
Talos won
By Day 32, after I came back from KubeCon London, I decided to give Talos a go; it had been on my list of things to try for a month before the conference. I set it up in HA mode and experimented with it for a while.
I started figuring out extensions and got Tailscale running on the Talos nodes too. The reason for installing Tailscale on the nodes was to join a cloud instance to my cluster, so I set out to figure out how to add non-Talos nodes. The goal was to try out Kueue, but I never really got to it.
Talos is opinionated in ways that initially frustrated me. No SSH, and configuring a machine through an API was new. But those constraints made sense once I wrapped my head around them. The cluster is reproducible.
I wrote a Talos module that handles cluster setup and upgrades. The entire setup is now managed by Terragrunt. I can destroy and recreate the cluster and know exactly what I'll get.
The current setup runs Talos v1.12. Control plane on one Proxmox node (avalon), workers on another (elysium). One cluster. Maybe a second for testing.
What actually changed
The main change over the whole span was going from multiple active clusters managed with Makefiles to a single active cluster. I'm currently migrating all the apps from the old Talos cluster to the new config.
Tooling shifted from a combination of Makefiles, shell scripts, and Terragrunt to just Terragrunt. Charts moved from ChartMuseum to GitLab's package registry. Container images now sync to a private registry via GitLab CI.
App structure is mostly the same; I just switched to ApplicationSets in ArgoCD for discovery.
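Discovery now looks roughly like this: a git directory generator stamps out one Application per directory, so adding an app is just adding a folder. Paths and naming here are assumptions for illustration.

```yaml
# ApplicationSet sketch: one Application per directory under apps/,
# named after the directory. Repo paths are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/mrdvince/homelab
        revision: main
        directories:
          - path: apps/*
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/mrdvince/homelab
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```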
The rebuild
I'm rebuilding the homelab again now. Not because something broke, but because a second pass lets me fold in everything the me of a year ago didn't know. Plus, it's homelabbing after all.
Now I know which apps I actually use, which monitoring metrics matter, which complexity was necessary. Goal this time: two clusters max. One main, one playground.
Current state
The stack as it stands:
- Infrastructure: Proxmox VE, Terragrunt/OpenTofu
- Kubernetes: Talos, Cilium CNI, Traefik ingress
- GitOps: ArgoCD with Helmfile plugin
- Observability: Prometheus, Grafana, Loki, Tempo, Alloy, Pyroscope
- Storage: Longhorn (block), CloudNativePG (Postgres), RustFS (S3), NFS (csi-driver-nfs)
- Auth: Authentik with OIDC for everything
- Secrets: SOPS with age
Apps are still being migrated over from the old cluster. Access is behind Tailscale.
I also decided to make the repo public: github.com/mrdvince/homelab