Running Highly Available Databases in Kubernetes — Part 1
Reality, Trade-offs, and First Principles
Introduction: The Question Teams Ask Too Late
A team migrates applications to Kubernetes. At first, everything seems simpler:
- Deployments become repeatable.
- Scaling becomes easier.
- Infrastructure feels standardized.
Then comes the inevitable question:
“Why don’t we run our databases in Kubernetes too?”
On paper, this is logical. Kubernetes handles workloads, offers declarative control, and promises self-healing. Many databases even advertise Kubernetes operators or “cloud-native” support. But when the databases in question are highly available, resilient, and production-critical, the problem is no longer deployment; it’s responsibility, correctness, and operational discipline.
This article is not about whether databases can run in Kubernetes. They can. It is about whether you should, under what conditions, and what responsibilities you inherit when you do.
What Kubernetes Is Optimized For (and What It Isn’t)
Before diving into databases, we need to be clear about Kubernetes’ design assumptions.
Kubernetes’ Core Design Goal
Kubernetes is optimized for workloads with the following characteristics (a minimal example follows the list):
- Ephemeral compute
Pods are expected to be temporary. If a pod dies or a node fails, the system reschedules workloads automatically. For stateless applications, this is fine: traffic routes elsewhere and nothing breaks. Databases rely on continuity. A pod hosting a primary replica going down mid-transaction can interrupt in-progress writes or replication, risking inconsistent state or split-brain.
- Stateless workloads
Stateless applications do not rely on in-memory state or local storage. Kubernetes schedules these freely and can replace them at will. A database’s state isn’t just files on disk; it includes in-memory caches, transaction logs, and leadership metadata. Losing these unexpectedly can cause downtime or divergence across replicas.
- Replaceable instances
Pods are treated as “cattle,” not “pets.” Their identity is ephemeral, and they can be replaced anywhere. Many distributed databases require stable identities for leader election, replication assignments, and quorum. Pod replacement can confuse these mechanisms and trigger unnecessary failovers.
- Fast rescheduling
Kubernetes favors speed: restart pods quickly, recover workloads rapidly. Databases need controlled recovery. Restarting a primary too quickly without replaying logs or synchronizing replicas can corrupt data or break quorum. Recovery is not about speed; it’s about correctness.
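To make the contrast concrete, here is a minimal sketch of the kind of workload these assumptions fit perfectly: a stateless Deployment whose replicas are fully interchangeable. All names and the image are illustrative. Notice that nothing a database needs (stable identity, durable local state, controlled shutdown) appears anywhere in it.

```yaml
# A minimal sketch of the workload Kubernetes is designed for: interchangeable,
# stateless replicas with no stable identity. All names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend          # hypothetical stateless service
spec:
  replicas: 3                 # any replica can serve any request
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
        - name: app
          image: example.com/web-frontend:1.0   # placeholder image
          ports:
            - containerPort: 8080
          # No volumes, no local state: a replacement pod is fully equivalent.
```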
Databases Are Built on the Opposite Assumptions
Highly available databases assume:
- Durable state
Databases guarantee that committed writes survive crashes, power loss, or node failures. This relies on fsync (forcing writes to storage) and write barriers (ensuring ordering).
Challenge: Not all PersistentVolumes or CSI drivers respect these guarantees. Some cloud volumes acknowledge writes before they are physically persisted. Network storage may reorder writes or delay commits, which can silently corrupt a database.
- Strict write ordering
Databases require that writes occur in sequence. For example, a bank must record a deposit before a withdrawal.
Challenge: Some network-backed volumes or misconfigured storage may reorder writes. Replicas can diverge, breaking consistency even if no pod has failed.
- Predictable I/O latency
Databases rely on consistent, low-latency I/O. Latency spikes can cause timeouts, replication lag, and failovers.
Challenge: Networked or shared storage often introduces unpredictable latency. Kubernetes sees the pod as “healthy” even if the database is struggling to write data on time.
- Stable identity and membership
Nodes must retain identities for leader election and consensus.
Challenge: Pods are disposable. Even StatefulSets only partially guarantee identity. A rescheduled pod may be treated as a new participant, confusing the cluster and risking data divergence.
- Controlled failover
Failover requires careful coordination to avoid split-brain or lost writes.
Challenge: Kubernetes’ automatic restarts or evictions may trigger failovers at the wrong time unless the orchestration is database-aware and properly coordinated (a minimal shutdown sketch follows this list).
- Consistency under partial failure
Databases must handle incomplete writes, replica lag, and network partitions safely.
Challenge: Kubernetes cannot enforce application-level consistency. A pod may appear healthy while the database silently diverges.
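To make the “controlled failover” point concrete: Kubernetes waits only as long as the pod spec tells it to before killing a container. Below is a sketch of a pod template fragment that buys a primary time to step down cleanly before termination; the demote script, image, and timings are hypothetical and depend entirely on your database and how it is orchestrated.

```yaml
# Sketch of a pod template fragment giving a database time to step down cleanly
# before Kubernetes kills it. The command, image, and timings are illustrative.
spec:
  terminationGracePeriodSeconds: 120   # default is 30s, often too short for a primary
  containers:
    - name: db
      image: example.com/somedb:1.0    # placeholder image
      lifecycle:
        preStop:
          exec:
            # Hypothetical hook: ask this node to hand off leadership / demote itself.
            # The preStop hook runs before SIGTERM is delivered to the container.
            command: ["/bin/sh", "-c", "/usr/local/bin/demote-if-primary.sh"]
```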
Why Teams Want Highly Available Databases in Kubernetes
Despite these challenges, teams are motivated to run databases in Kubernetes for rational reasons:
- Platform Standardization
Teams want a single deployment model, CI/CD pipeline, security surface, and operational workflow. Running databases elsewhere breaks this model.
- GitOps and Declarative Control
Kubernetes enables version-controlled infrastructure, reproducible environments, and auditable changes. Manual database operations feel “out of band.”
- Perceived Resilience
Kubernetes self-healing features (automatic pod restarts, node replacement, service rerouting) give the impression of database resilience.
Danger: Pod restart is not equivalent to database recovery or consistency.
Highly Available Databases: What “Resilient” Actually Means
High availability isn’t just uptime. It implies:
- Replication (synchronous or asynchronous)
- Automatic failover with quorum awareness
- Durable storage across failures
- Defined recovery point objectives (RPO) and recovery time objectives (RTO)
- Consistency under failure
A database being temporarily unavailable may not be catastrophic, but inconsistent data is. Kubernetes does not differentiate between downtime and silent corruption.
The Core Challenges of Running Databases in Kubernetes
1. State Is Not Just “Data on Disk”
Databases rely on:
- Disk ordering guarantees - Writes must be applied in the order they were issued to preserve consistency.
- fsync behavior and write barriers - Ensure committed writes are physically persisted before acknowledging them.
- Low and predictable latency - I/O must be reliable and timely for replication and consensus.
- Correct handling of partial failures - Replicas must detect incomplete writes and avoid diverging.
Problem: Kubernetes’ PersistentVolumes and CSI drivers do not always honor these guarantees. Network-backed storage may reorder writes, introduce latency spikes, or acknowledge incomplete writes.
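Much of this behavior is decided by the StorageClass long before the database starts. A hedged sketch follows; the provisioner shown is the AWS EBS CSI driver purely as an example, and the class name and parameters are illustrative, not a recommendation. Note that nothing here verifies fsync or write-barrier semantics; that still has to be tested against the actual driver and volume type.

```yaml
# Sketch of a StorageClass for database volumes. Parameter names vary by CSI
# driver; the values shown are illustrative only.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: db-fast-durable              # hypothetical name
provisioner: ebs.csi.aws.com         # example CSI driver; substitute your own
parameters:
  type: io2                          # provisioned-IOPS volume class (driver-specific)
  iops: "8000"
volumeBindingMode: WaitForFirstConsumer   # bind the volume in the zone where the pod lands
reclaimPolicy: Retain                     # do not delete data when the PVC is removed
allowVolumeExpansion: true
```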
2. Identity and Membership Matter
Distributed databases require stable identities for:
- Node IDs
- Replica IDs
- Peer membership
- Election history
Problem: Kubernetes treats pods as disposable. StatefulSets help, but:
- Pod recreation resets runtime state
- Pods may be scheduled across zones unpredictably
- Leader election may be disrupted
The database may survive, but only if its internal logic is stronger than Kubernetes’ assumptions.
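The piece Kubernetes can provide is a stable network identity. A headless Service (sketch below, names illustrative) gives StatefulSet pods predictable DNS names, assuming the StatefulSet’s serviceName references it. It stabilizes the name, not the runtime state behind it, which is why this alone does not solve membership.

```yaml
# Sketch of a headless Service that gives StatefulSet pods stable DNS names
# (e.g. db-0.db-headless.<namespace>.svc.cluster.local). Names are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None          # headless: DNS resolves to individual pod addresses
  selector:
    app: db                # illustrative label; must match the database pods
  ports:
    - name: sql
      port: 5432
```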
3. Failure Domains Do Not Align
- Kubernetes failure domains: pod failure, node failure, zone failure, control plane disruptions
- Database failure domains: disk failure, replication lag, split-brain, partial network partitions
Example: Node drain evicts a pod → database thinks primary crashed → failover begins → Kubernetes reschedules the original pod → two primaries exist briefly.
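A PodDisruptionBudget does not prevent that sequence, but it does keep routine, voluntary evictions such as drains from removing more members than the quorum can tolerate at once. A minimal sketch, assuming a three-member cluster and an illustrative label:

```yaml
# Sketch of a PodDisruptionBudget for a 3-member database cluster: voluntary
# evictions (e.g. node drain) may take down at most one member at a time.
# Involuntary failures (node crash, OOM kill) bypass PDBs entirely.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db-pdb
spec:
  maxUnavailable: 1        # alternatively: minAvailable: 2
  selector:
    matchLabels:
      app: db              # illustrative label; must match the database pods
```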
4. Restart ≠ Recovery
Kubernetes automatically restarts failed pods.
Databases require controlled recovery:
- WAL replay
- Replica catch-up
- Consistency checks
- Coordinated rejoin
Premature restarts can make a small failure catastrophic.
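One practical guardrail, sketched under the assumption that the database exposes an HTTP health endpoint (the paths, port, image, and timings here are hypothetical): a startupProbe with a generous failure budget keeps the kubelet from restarting a pod that is legitimately busy replaying its WAL, while the livenessProbe stays strict once startup has completed.

```yaml
# Container fragment: give a recovering database time to replay logs before
# liveness checks can restart it. Endpoints, port, and timings are illustrative.
containers:
  - name: db
    image: example.com/somedb:1.0    # placeholder image
    startupProbe:
      httpGet:
        path: /health/started        # hypothetical endpoint
        port: 8008
      periodSeconds: 10
      failureThreshold: 180          # up to ~30 minutes for WAL replay / catch-up
    livenessProbe:
      httpGet:
        path: /health/live           # hypothetical endpoint
        port: 8008
      periodSeconds: 10
      failureThreshold: 3            # only runs after the startupProbe succeeds
```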
StatefulSets: Necessary but Not Sufficient
StatefulSets provide:
- Stable pod names
- Stable volume attachment
- Ordered startup/shutdown (partially)
What they do not provide:
- Safe failover logic
- Data correctness guarantees
- Protection from operator mistakes
- Awareness of database health beyond liveness probes
A StatefulSet is a transport mechanism, not a safety net.
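For concreteness, here is a sketch of the parts of a StatefulSet that matter for the points above; the names, sizes, and storage class are illustrative (and reuse the hypothetical names from the earlier sketches). Everything it guarantees is about placement and plumbing; nothing in it knows whether the database inside is consistent or how to fail it over safely.

```yaml
# Sketch of a StatefulSet for a 3-member database. Names, sizes, and the
# storage class are illustrative. Note what is absent: no failover logic,
# no awareness of database-level health or consistency.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless           # stable DNS: db-0.db-headless, db-1.db-headless, ...
  replicas: 3
  podManagementPolicy: OrderedReady  # ordered startup/shutdown (the "partially" above)
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: example.com/somedb:1.0     # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/db
  volumeClaimTemplates:              # one PVC per pod, re-attached when the pod is rescheduled
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: db-fast-durable   # hypothetical StorageClass
        resources:
          requests:
            storage: 100Gi
```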
Storage Is the Hidden Single Point of Failure
Critical considerations:
- Replication below vs. above the database - Double-replication (at both the storage layer and the database layer) may create hidden failure dependencies.
- Snapshot correctness - Crash-consistent vs. application-consistent snapshots matter (see the sketch after this list).
- Backup restore speed - HA relies on predictable restoration times.
- Cross-zone durability - Storage may survive a node failure but not a zone failure.
- Performance isolation - Shared volumes may introduce unpredictable latency.
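On the snapshot point above: a CSI VolumeSnapshot taken while the database is running is crash-consistent at best; application consistency requires flushing or quiescing the database first. A minimal sketch, with the snapshot class and PVC name as assumptions (the PVC name follows the convention the earlier StatefulSet sketch would produce for pod db-0):

```yaml
# Sketch of a CSI VolumeSnapshot. Unless the database is flushed/quiesced before
# this is created, the result is crash-consistent at best, not application-consistent.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-0-snapshot              # illustrative name
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical snapshot class
  source:
    persistentVolumeClaimName: data-db-0   # PVC created by the volumeClaimTemplate for db-0
```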
Operational Coupling and Blast Radius
When running databases in Kubernetes:
- Cluster outages impact the database
- Control plane misconfigurations affect data services
- Cluster-wide changes (CNI, CSI, upgrades) touch the data plane
A failed kubectl apply can become a data incident.
This is not inherently wrong, but it is a responsibility that must be consciously accepted.
A First Principle to Keep in Mind
Kubernetes does not make databases simpler.
It moves complexity from infrastructure vendors to your team.
Choosing this path means choosing ownership.
Where This Leaves Us
At this point, we understand:
- Why teams want databases in Kubernetes
- Why highly available databases are fundamentally different
- Why Kubernetes primitives alone are insufficient
- Why resilience is harder than it appears
What we haven’t yet answered:
- How to design these systems responsibly
- When running databases in Kubernetes is justified
- What patterns actually work in production
These questions are covered in Part 2.



