2026-04-10
Why I’m Getting the CKA
I run a K3s cluster that hosts 14 production services across 18 namespaces—Gitea, mail, monitoring, DNS, media, analytics, and more. It serves real traffic, handles real failures, and runs on infrastructure I built and maintain myself. I am not studying Kubernetes from scratch. I am formalizing knowledge I already use every day. This is why the CKA matters to me, and why I think Kubernetes itself is one of the most important technologies to understand right now.
What Kubernetes Actually Is
Kubernetes is a container orchestration system. That description is accurate and almost completely unhelpful for understanding why it matters. Here is a better one: Kubernetes is a declarative system for describing what your infrastructure should look like, and a control loop that continuously reconciles reality with that description.
You write a YAML manifest that says “I want three replicas of this application, exposed on port 443, with 512MB of memory, pulling this container image, with this environment variable pointing to that database.” You apply it. Kubernetes makes it happen. If a node dies, Kubernetes reschedules the pods elsewhere. If a new version fails its readiness checks, the rollout stalls and the old pods keep serving traffic until you fix it or roll back. If you change the manifest, Kubernetes works out what differs and rolls the change out gradually, which avoids downtime when readiness probes are configured properly.
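As a rough illustration, the manifest described in that quote might look something like the sketch below. The name `web`, the image reference, the `DATABASE_HOST` variable, and the database hostname are all placeholders I made up for the example, not anything from my cluster. Memory is written as `512Mi` (mebibytes), the unit Kubernetes manifests conventionally use, and the readiness probe is there because a rolling update can only avoid downtime if Kubernetes knows when a new pod is ready.

```yaml
# Sketch only: "web", the image, and the database host are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                        # "three replicas of this application"
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web                     # must match spec.selector.matchLabels
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.2   # placeholder image reference
          ports:
            - containerPort: 443
          env:
            - name: DATABASE_HOST    # hypothetical variable name
              value: postgres.databases.svc.cluster.local   # placeholder Service DNS name
          resources:
            requests:
              memory: 512Mi
            limits:
              memory: 512Mi
          readinessProbe:            # pod receives traffic only once this succeeds
            tcpSocket:
              port: 443
```

Saved to a file (say `deployment.yaml`, another placeholder), this would be applied with `kubectl apply -f deployment.yaml`. From then on the control plane treats it as the desired state and keeps working toward it.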
This is not magic. It is engineering—a decade of distributed systems research from Google, packaged into an open-source project that is now the standard way to run production workloads at every scale from a single-node homelab to the largest cloud deployments on earth.
Why Kubernetes Is Interesting
Most tools solve one problem. Kubernetes solves a category of problems, and it does so by providing primitives that compose rather than features that prescribe.
Pods are the unit of deployment: one or more containers that share a network namespace. Services provide stable networking in front of ephemeral pods. Deployments manage rollouts and rollbacks. ConfigMaps and Secrets separate configuration from code. Ingress routes external HTTP traffic. PersistentVolumeClaims abstract storage. Namespaces scope names, quotas, and policies. RBAC controls access. CronJobs schedule work.
Each of these is simple. The power comes from composition. A production deployment is a Deployment with resource limits, a Service, an Ingress rule, a PVC for persistent data, a ConfigMap for configuration, a Secret for credentials, and maybe a NetworkPolicy to restrict traffic. All of it described in version-controlled YAML, applied with a single command, and continuously enforced by the control plane.
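To make the composition point concrete, here is a minimal sketch of two of those pieces attaching to the same pods purely through labels. It assumes pods labeled `app: web` and an ingress controller running in a namespace called `ingress`. Both names are hypothetical. The NetworkPolicy only has an effect if the cluster's network plugin enforces NetworkPolicy.

```yaml
# Sketch only: "web" and the "ingress" namespace are hypothetical placeholders.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                 # selects pods by label, not by name or IP
  ports:
    - port: 443
      targetPort: 443
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-ingress
spec:
  podSelector:
    matchLabels:
      app: web               # same label, so the policy covers the same pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress   # label Kubernetes sets on each namespace
      ports:
        - protocol: TCP
          port: 443
```

Neither object knows about the other. They compose because both select on `app: web`, and the same mechanism lets a Deployment, a Service, and a policy be written and reviewed independently.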
Compare this to the alternative: SSH into a server, install packages, edit configuration files, start services with systemd, configure a reverse proxy, set up cron jobs, manage disk mounts, and hope that the next person who touches the server (or the next reboot) does not break something. I have done this. I still do it on some servers. Kubernetes is better.
What Running a Real Cluster Teaches You
Tutorials teach you to deploy nginx. Production teaches you everything else.
I have learned things from running my cluster that no course covers. When I consolidated six separate PostgreSQL instances into one shared deployment, I had to understand PVC migration, database connection strings across namespaces, and how to keep the old deployments at replicas=0 for rollback. When I replaced Prometheus with VictoriaMetrics to save 230MB of RAM, I had to verify PromQL compatibility, migrate Grafana datasources, and keep the old stack on standby. When I removed Longhorn (a distributed storage system that was using 2.5GB of RAM and 28 pods for no measurable benefit on same-datacenter nodes), I had to migrate every PVC to local-path storage without downtime.
These are the decisions that matter in production: resource optimization, migration strategy, rollback planning, understanding the tradeoffs between complexity and reliability. The CKA exam tests this kind of reasoning—not just “can you create a pod,” but “can you troubleshoot a broken service, upgrade a cluster, back up etcd, and configure RBAC under time pressure?”
Why the CKA Specifically
The CKA is a hands-on, performance-based exam. You get a terminal, a set of Kubernetes clusters, and a list of tasks. No multiple choice. You either fix the broken deployment or you do not. You either configure the network policy correctly or traffic leaks. The exam is two hours of real work on real clusters.
This format respects practitioners. Memorization alone will not pass it. The official Kubernetes documentation is available during the exam, but two hours leaves little time to learn from it, so you need muscle memory with kubectl, an intuition for where things break, and the ability to read error messages and reason about distributed system state under pressure. These are exactly the skills that matter in production.
The exam covers five domains:
| Domain | Weight |
|---|---|
| Cluster Architecture, Installation & Configuration | 25% |
| Workloads & Scheduling | 15% |
| Services & Networking | 20% |
| Storage | 10% |
| Troubleshooting | 30% |
Troubleshooting is 30%, nearly a third of the exam. This is the right emphasis. Anyone can create resources from documentation. Diagnosing why a service is unreachable, why a pod is stuck in CrashLoopBackOff, or why a node is NotReady requires understanding how the pieces fit together. I spend more time troubleshooting my own cluster than deploying new things, so the weighting matches my experience.
Why Now
Kubernetes is not going away. The CNCF ecosystem is the foundation of modern infrastructure, and Kubernetes is its center. Every major cloud provider offers managed Kubernetes. Most companies running containerized workloads are running them on Kubernetes or something built on top of it. The job market reflects this: DevOps, SRE, and platform engineering roles increasingly list Kubernetes and CKA as requirements or strong preferences.
I already run production Kubernetes. The CKA formalizes that into a credential that employers recognize. More importantly, the training fills gaps—I have not done a kubeadm cluster upgrade from scratch, I have not practiced etcd backup and restore on multi-node clusters, I have not worked through the full RBAC permission model systematically. The certification process forces me to close these gaps rather than working around them.
For anyone considering the CKA: if you are already running containers in production, the certification codifies what you know and fills in what you have been avoiding. If you are new to infrastructure, Kubernetes is the single most valuable technology to learn. Either way, the hands-on exam format means the certification actually means something—you cannot fake your way through a live terminal.