Running Highly Available Databases in Kubernetes — Part 1
Reality, Trade-offs, and First Principles
Introduction: The Question Teams Ask Too Late
A team migrates applications to Kubernetes. At first, everything seems simpler:
- Deployments become repeatable.
- Scaling becomes easier.
- Infrastructure feels standardized.
Then comes the inevitable question:
“Why don’t we run our databases in Kubernetes too?”
On paper, this is logical. Kubernetes handles workloads, offers declarative control, and promises self-healing. Many databases even advertise Kubernetes operators or “cloud-native” support. But when the databases in question are highly available, resilient, and production-critical, the problem is no longer deployment; it’s responsibility, correctness, and operational discipline.
This article is not about whether databases can run in Kubernetes. They can. It is about whether you should, under what conditions, and what responsibilities you inherit when you do.
What Kubernetes Is Optimized For (and What It Isn’t)
Before diving into databases, we need to be clear about Kubernetes’ design assumptions.
Kubernetes’ Core Design Goal
Kubernetes is optimized for workloads with the following characteristics (a minimal example follows the list):
- Ephemeral compute
Pods are expected to be temporary. If a pod dies or a node fails, the system reschedules workloads automatically. For stateless applications, this is fine: traffic routes elsewhere and nothing breaks. Databases rely on continuity. A pod hosting a primary replica going down mid-transaction can interrupt in-progress writes or replication, risking inconsistent state or split-brain.
- Stateless workloads
Stateless applications do not rely on in-memory state or local storage. Kubernetes schedules these freely and can replace them at will. A database’s state isn’t just files on disk; it includes in-memory caches, transaction logs, and leadership metadata. Losing these unexpectedly can cause downtime or divergence across replicas.
- Replaceable instances
Pods are treated as “cattle,” not “pets.” Their identity is ephemeral, and they can be replaced anywhere. Many distributed databases require stable identities for leader election, replication assignments, and quorum. Pod replacement can confuse these mechanisms and trigger unnecessary failovers.
- Fast rescheduling
Kubernetes favors speed: restart pods quickly, recover workloads rapidly. Databases need controlled recovery. Restarting a primary too quickly without replaying logs or synchronizing replicas can corrupt data or break quorum. Recovery is not about speed; it’s about correctness.
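To make the contrast concrete, here is a minimal sketch of the kind of workload these assumptions fit perfectly: a stateless Deployment whose replicas are fully interchangeable. All names and the image are illustrative. Notice that nothing a database needs (stable identity, durable local state, controlled shutdown) appears anywhere in it.

```yaml
# A minimal sketch of the workload Kubernetes is designed for: interchangeable,
# stateless replicas with no stable identity. All names are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend          # hypothetical stateless service
spec:
  replicas: 3                 # any replica can serve any request
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
        - name: app
          image: example.com/web-frontend:1.0   # placeholder image
          ports:
            - containerPort: 8080
          # No volumes, no local state: a replacement pod is fully equivalent.
```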
Databases Are Built on the Opposite Assumptions
Highly available databases assume:
- Durable state
Databases guarantee that committed writes survive crashes, power loss, or node failures. This relies on fsync (forcing writes to storage) and write barriers (ensuring ordering).
Challenge: Not all PersistentVolumes or CSI drivers respect these guarantees. Some cloud volumes acknowledge writes before they are physically persisted. Network storage may reorder writes or delay commits, which can silently corrupt a database.
- Strict write ordering
Databases require that writes occur in sequence. For example, a bank must record a deposit before a withdrawal.
Challenge: Some network-backed volumes or misconfigured storage may reorder writes. Replicas can diverge, breaking consistency even if no pod has failed.
- Predictable I/O latency
Databases rely on consistent, low-latency I/O. Latency spikes can cause timeouts, replication lag, and failovers.
Challenge: Networked or shared storage often introduces unpredictable latency. Kubernetes sees the pod as “healthy” even if the database is struggling to write data on time.
- Stable identity and membership
Nodes must retain identities for leader election and consensus.
Challenge: Pods are disposable. Even StatefulSets only partially guarantee identity. A rescheduled pod may be treated as a new participant, confusing the cluster and risking data divergence.
- Controlled failover
Failover requires careful coordination to avoid split-brain or lost writes.
Challenge: Kubernetes’ automatic restarts or evictions may trigger failovers at the wrong time unless the orchestration is database-aware and properly coordinated (a minimal shutdown sketch follows this list).
- Consistency under partial failure
Databases must handle incomplete writes, replica lag, and network partitions safely.
Challenge: Kubernetes cannot enforce application-level consistency. A pod may appear healthy while the database silently diverges.
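To make the “controlled failover” point concrete: Kubernetes waits only as long as the pod spec tells it to before killing a container. Below is a sketch of a pod template fragment that buys a primary time to step down cleanly before termination; the demote script, image, and timings are hypothetical and depend entirely on your database and how it is orchestrated.

```yaml
# Sketch of a pod template fragment giving a database time to step down cleanly
# before Kubernetes kills it. The command, image, and timings are illustrative.
spec:
  terminationGracePeriodSeconds: 120   # default is 30s, often too short for a primary
  containers:
    - name: db
      image: example.com/somedb:1.0    # placeholder image
      lifecycle:
        preStop:
          exec:
            # Hypothetical hook: ask this node to hand off leadership / demote itself.
            # The preStop hook runs before SIGTERM is delivered to the container.
            command: ["/bin/sh", "-c", "/usr/local/bin/demote-if-primary.sh"]
```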
Why Teams Want Highly Available Databases in Kubernetes
Despite these challenges, teams are motivated to run databases in Kubernetes for rational reasons:
- Platform Standardization
Teams want a single deployment model, CI/CD pipeline, security surface, and operational workflow. Running databases elsewhere breaks this model.
- GitOps and Declarative Control
Kubernetes enables version-controlled infrastructure, reproducible environments, and auditable changes. Manual database operations feel “out of band.”
- Perceived Resilience
Kubernetes self-healing features (automatic pod restarts, node replacement, service rerouting) give the impression of database resilience.
Danger: Pod restart is not equivalent to database recovery or consistency.
Highly Available Databases: What “Resilient” Actually Means
High availability isn’t just uptime. It implies:
- Replication (synchronous or asynchronous)
- Automatic failover with quorum awareness
- Durable storage across failures
- Defined recovery point objectives (RPO) and recovery time objectives (RTO)
- Consistency under failure
A database being temporarily unavailable may not be catastrophic, but inconsistent data is. Kubernetes does not differentiate between downtime and silent corruption.
The Core Challenges of Running Databases in Kubernetes
1. State Is Not Just “Data on Disk”
Databases rely on:
- Disk ordering guarantees - Writes must be applied in the order they were issued to preserve consistency.
- fsync behavior and write barriers - Ensure committed writes are physically persisted before acknowledging them.
- Low and predictable latency - I/O must be reliable and timely for replication and consensus.
- Correct handling of partial failures - Replicas must detect incomplete writes and avoid diverging.
Problem: Kubernetes’ PersistentVolumes and CSI drivers do not always honor these guarantees. Network-backed storage may reorder writes, introduce latency spikes, or acknowledge incomplete writes.
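Much of this behavior is decided by the StorageClass long before the database starts. A hedged sketch follows; the provisioner shown is the AWS EBS CSI driver purely as an example, and the class name and parameters are illustrative, not a recommendation. Note that nothing here verifies fsync or write-barrier semantics; that still has to be tested against the actual driver and volume type.

```yaml
# Sketch of a StorageClass for database volumes. Parameter names vary by CSI
# driver; the values shown are illustrative only.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: db-fast-durable              # hypothetical name
provisioner: ebs.csi.aws.com         # example CSI driver; substitute your own
parameters:
  type: io2                          # provisioned-IOPS volume class (driver-specific)
  iops: "8000"
volumeBindingMode: WaitForFirstConsumer   # bind the volume in the zone where the pod lands
reclaimPolicy: Retain                     # do not delete data when the PVC is removed
allowVolumeExpansion: true
```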
2. Identity and Membership Matter
Distributed databases require stable identities for:
- Node IDs
- Replica IDs
- Peer membership
- Election history
Problem: Kubernetes treats pods as disposable. StatefulSets help, but:
- Pod recreation resets runtime state
- Pods may be scheduled across zones unpredictably
- Leader election may be disrupted
The database may survive, but only if its internal logic is stronger than Kubernetes’ assumptions.
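The piece Kubernetes can provide is a stable network identity. A headless Service (sketch below, names illustrative) gives StatefulSet pods predictable DNS names, assuming the StatefulSet’s serviceName references it. It stabilizes the name, not the runtime state behind it, which is why this alone does not solve membership.

```yaml
# Sketch of a headless Service that gives StatefulSet pods stable DNS names
# (e.g. db-0.db-headless.<namespace>.svc.cluster.local). Names are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None          # headless: DNS resolves to individual pod addresses
  selector:
    app: db                # illustrative label; must match the database pods
  ports:
    - name: sql
      port: 5432
```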
3. Failure Domains Do Not Align
- Kubernetes failure domains: pod failure, node failure, zone failure, control plane disruptions
- Database failure domains: disk failure, replication lag, split-brain, partial network partitions
Example: Node drain evicts a pod → database thinks primary crashed → failover begins → Kubernetes reschedules the original pod → two primaries exist briefly.
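A PodDisruptionBudget does not prevent that sequence, but it does keep routine, voluntary evictions such as drains from removing more members than the quorum can tolerate at once. A minimal sketch, assuming a three-member cluster and an illustrative label:

```yaml
# Sketch of a PodDisruptionBudget for a 3-member database cluster: voluntary
# evictions (e.g. node drain) may take down at most one member at a time.
# Involuntary failures (node crash, OOM kill) bypass PDBs entirely.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db-pdb
spec:
  maxUnavailable: 1        # alternatively: minAvailable: 2
  selector:
    matchLabels:
      app: db              # illustrative label; must match the database pods
```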
4. Restart ≠ Recovery
Kubernetes automatically restarts failed pods.
Databases require controlled recovery:
- WAL replay
- Replica catch-up
- Consistency checks
- Coordinated rejoin
Premature restarts can make a small failure catastrophic.
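One practical guardrail, sketched under the assumption that the database exposes an HTTP health endpoint (the paths, port, image, and timings here are hypothetical): a startupProbe with a generous failure budget keeps the kubelet from restarting a pod that is legitimately busy replaying its WAL, while the livenessProbe stays strict once startup has completed.

```yaml
# Container fragment: give a recovering database time to replay logs before
# liveness checks can restart it. Endpoints, port, and timings are illustrative.
containers:
  - name: db
    image: example.com/somedb:1.0    # placeholder image
    startupProbe:
      httpGet:
        path: /health/started        # hypothetical endpoint
        port: 8008
      periodSeconds: 10
      failureThreshold: 180          # up to ~30 minutes for WAL replay / catch-up
    livenessProbe:
      httpGet:
        path: /health/live           # hypothetical endpoint
        port: 8008
      periodSeconds: 10
      failureThreshold: 3            # only runs after the startupProbe succeeds
```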
StatefulSets: Necessary but Not Sufficient
StatefulSets provide:
- Stable pod names
- Stable volume attachment
- Ordered startup/shutdown (partially)
What they do not provide:
- Safe failover logic
- Data correctness guarantees
- Protection from operator mistakes
- Awareness of database health beyond liveness probes
A StatefulSet is a transport mechanism, not a safety net.
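For concreteness, here is a sketch of the parts of a StatefulSet that matter for the points above; the names, sizes, and storage class are illustrative (and reuse the hypothetical names from the earlier sketches). Everything it guarantees is about placement and plumbing; nothing in it knows whether the database inside is consistent or how to fail it over safely.

```yaml
# Sketch of a StatefulSet for a 3-member database. Names, sizes, and the
# storage class are illustrative. Note what is absent: no failover logic,
# no awareness of database-level health or consistency.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless           # stable DNS: db-0.db-headless, db-1.db-headless, ...
  replicas: 3
  podManagementPolicy: OrderedReady  # ordered startup/shutdown (the "partially" above)
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: example.com/somedb:1.0     # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/db
  volumeClaimTemplates:              # one PVC per pod, re-attached when the pod is rescheduled
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: db-fast-durable   # hypothetical StorageClass
        resources:
          requests:
            storage: 100Gi
```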
Storage Is the Hidden Single Point of Failure
Critical considerations:
- Replication below vs. above the database - Double-replication (at both the storage layer and the database layer) may create hidden failure dependencies.
- Snapshot correctness - Crash-consistent vs. application-consistent snapshots matter (see the sketch after this list).
- Backup restore speed - HA relies on predictable restoration times.
- Cross-zone durability - Storage may survive a node failure but not a zone failure.
- Performance isolation - Shared volumes may introduce unpredictable latency.
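On the snapshot point above: a CSI VolumeSnapshot taken while the database is running is crash-consistent at best; application consistency requires flushing or quiescing the database first. A minimal sketch, with the snapshot class and PVC name as assumptions (the PVC name follows the convention the earlier StatefulSet sketch would produce for pod db-0):

```yaml
# Sketch of a CSI VolumeSnapshot. Unless the database is flushed/quiesced before
# this is created, the result is crash-consistent at best, not application-consistent.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-0-snapshot              # illustrative name
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical snapshot class
  source:
    persistentVolumeClaimName: data-db-0   # PVC created by the volumeClaimTemplate for db-0
```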
Operational Coupling and Blast Radius
When running databases in Kubernetes:
- Cluster outages impact the database
- Control plane misconfigurations affect data services
- Cluster-wide changes (CNI, CSI, upgrades) touch the data plane
A failed kubectl apply can become a data incident.
This is not inherently wrong, but it is a responsibility that must be consciously accepted.
A First Principle to Keep in Mind
Kubernetes does not make databases simpler.
It moves complexity from infrastructure vendors to your team.
Choosing this path means choosing ownership.
Where This Leaves Us
At this point, we understand:
- Why teams want databases in Kubernetes
- Why highly available databases are fundamentally different
- Why Kubernetes primitives alone are insufficient
- Why resilience is harder than it appears
What we haven’t yet answered:
- How to design these systems responsibly
- When running databases in Kubernetes is justified
- What patterns actually work in production
These questions are covered in Part 2.



