Mapping Kubernetes Components to Linux Primitives (Updated)
How Kubernetes composes Linux namespaces, cgroups, netfilter/eBPF, and mounts to orchestrate containers
Introduction
Kubernetes orchestrates containers by composing established Linux primitives: namespaces for isolation, cgroups for resource control, virtual networking for Pod connectivity, and mounts for storage. Every Pod, Service, and Volume ultimately becomes Linux processes with specific isolation and kernel-enforced limits. Understanding these mappings clarifies scheduling, performance, and debugging behaviors in real clusters.
1) Pods → Namespaces + cgroups + virtual networking
A Pod is a group of Linux processes created with isolated namespaces and governed by cgroup limits. The kubelet instructs the container runtime (containerd/CRI-O), which uses clone()/unshare() to configure namespaces and mounts.
1.1 Linux namespaces used by Pods
| Namespace | Purpose | Kubernetes Usage |
|---|---|---|
| `mnt` | Filesystem isolation | Each container sees its own rootfs and mount table |
| `pid` | Process IDs | Containers cannot see/kill processes outside their PID namespace |
| `net` | Network stack | Pod gets its own netns with an eth0, routes, ARP table |
| `ipc` | SysV IPC, POSIX msg queues | Isolates IPC unless explicitly shared |
| `uts` | Hostname/domain | Pod has its own hostname; can differ from the node |
| `user` | UID/GID mapping | Used for rootless/user-namespace scenarios when enabled |
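You can observe namespace membership from userspace without any Kubernetes tooling. A minimal Python sketch (Linux only) that reads the `/proc/<pid>/ns` symlinks, which is essentially what `lsns` does:

```python
import os

def list_namespaces(pid="self"):
    """Map namespace type -> identifier for a process, read from /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}

# Two containers in the same Pod share net/ipc/uts namespaces: their
# identifiers (e.g. "net:[4026531840]") will be identical.
for name, ident in list_namespaces().items():
    print(f"{name:16s} {ident}")
```

Comparing the output for two PIDs in the same Pod versus different Pods makes the sharing model concrete.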
Runtime behavior: Low-level runtimes (e.g., runc, crun) perform namespace creation and apply seccomp/AppArmor/SELinux if configured.
1.2 cgroups (v1 and v2) and QoS
Kubernetes QoS classes map to cgroup controls:
- Guaranteed → hard CPU/memory limits assigned per container
- Burstable → shares/limits applied; can burst within node capacity
- BestEffort → no explicit requests/limits; lowest priority under pressure
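The request/limit translation can be illustrated with the well-known cgroup v1 formulas (cgroup v2 derives `cpu.weight` and `cpu.max` similarly); `period_us` is the kernel's default 100 ms CFS period. This is a sketch of the arithmetic, not the kubelet's exact code path:

```python
def cpu_cgroup_v1_values(request_mcpu, limit_mcpu=None, period_us=100_000):
    """Translate Kubernetes millicpu requests/limits into cgroup v1 CPU knobs:
    cpu.shares from the request (relative weight), cpu.cfs_quota_us from the limit."""
    shares = max(2, request_mcpu * 1024 // 1000)                   # cpu.shares
    quota = limit_mcpu * period_us // 1000 if limit_mcpu else -1   # -1 = no limit
    return shares, quota

print(cpu_cgroup_v1_values(500, 1000))  # requests: 500m, limits: 1 CPU → (512, 100000)
```

A BestEffort container has no request, so it falls back to the minimum weight and an unlimited quota, which is why it loses first under CPU pressure.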
Paths vary by cgroup mode and init system:
- cgroup v1 uses hierarchical paths like `/sys/fs/cgroup/<controller>/kubepods/...`
- cgroup v2 (unified) with systemd uses a slice-based layout (e.g., `kubepods.slice`)
Important: As of 2026, cgroup v2 is the default on modern distributions (Flatcar, Fedora, recent Ubuntu), but many production clusters still run v1 or mixed configurations. The systemd cgroup driver is recommended for v2 and becomes the default in Kubernetes 1.35+ with automatic detection via the `KubeletCgroupDriverFromCRI` feature gate.
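A quick programmatic check of which cgroup mode a process runs under (a sketch; on the unified v2 hierarchy, `/proc/self/cgroup` collapses to a single `0::<path>` line):

```python
def cgroup_mode():
    """Return 'v2' if this process is on the unified hierarchy, else 'v1/hybrid'."""
    with open("/proc/self/cgroup") as f:
        entries = [line.rstrip("\n").split(":", 2) for line in f if line.strip()]
    # v2: exactly one entry, hierarchy ID 0, empty controller list -> "0::<path>"
    unified = len(entries) == 1 and entries[0][0] == "0" and entries[0][1] == ""
    return "v2" if unified else "v1/hybrid"

print(cgroup_mode())
```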
1.3 Pod networking: veth pairs + bridge + netfilter/eBPF
CNI plugins attach the Pod netns to the host network:
- Create a veth pair: one end in the Pod (`eth0`), one on the host
- Connect the host end to a Linux bridge (commonly `cni0`) or switch fabric
- Assign the Pod IP and routes, and apply iptables/nftables/eBPF policies
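The plumbing above maps to a handful of `ip` commands. A sketch that only builds the command strings a simple bridge-style CNI plugin would execute (interface and bridge names are illustrative, and nothing is run here):

```python
def bridge_cni_commands(netns, pod_ip, host_if="veth0", pod_if="eth0", bridge="cni0"):
    """Commands to wire a Pod netns to a host bridge via a veth pair (not executed)."""
    return [
        f"ip link add {host_if} type veth peer name {pod_if}",
        f"ip link set {pod_if} netns {netns}",            # move the Pod end into the netns
        f"ip -n {netns} addr add {pod_ip} dev {pod_if}",  # assign the Pod IP
        f"ip -n {netns} link set {pod_if} up",
        f"ip link set {host_if} master {bridge} up",      # attach the host end to the bridge
    ]

for cmd in bridge_cni_commands("cni-1234", "10.244.1.7/24"):
    print(cmd)
```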
Service traffic:
- kube-proxy iptables mode installs chains like `KUBE-SERVICES` and DNATs cluster IPs to endpoints
- IPVS mode uses the kernel's IPVS load balancer for scale
- eBPF dataplanes (Cilium, Calico eBPF) implement kube-proxy replacement and load balancing in-kernel without iptables
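In iptables mode, equal-weight endpoint selection is built from `statistic`-match rules whose probabilities must compound correctly: rule *i* matches a fraction of whatever traffic the earlier rules let through. A sketch of that probability sequence for n endpoints:

```python
def endpoint_probabilities(n):
    """Per-rule match probabilities so each of n endpoints gets an equal 1/n share:
    rule i matches with probability 1/(n - i) of the still-unmatched traffic."""
    return [round(1 / (n - i), 5) for i in range(n)]

print(endpoint_probabilities(3))  # → [0.33333, 0.5, 1.0]
```

The final rule is unconditional (probability 1.0), which is why the last endpoint in a `KUBE-SVC-*` chain has no `statistic` match at all.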
2) Deployments & ReplicaSets → Controllers for Pods
Deployments and ReplicaSets do not directly create Linux primitives. They reconcile desired state (count, rollout strategy) into Pods, which are realized as Linux processes with namespaces and cgroups by kubelet + runtime.
3) Kubelet → systemd service + process supervisor
The kubelet typically runs as a systemd unit and is responsible for:
- Creating Pod sandboxes and containers via the CRI
- Applying cgroup limits and mounts
- Calling CNI to set up netns/veth/routes
- Managing probes, restarts, and garbage collection
On Linux hosts you will see unit files like kubelet.service and kubelet using /var/lib/kubelet/ to stage pod resources and mounts.
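A trimmed unit-file sketch for context; paths and flags vary by distro and installer, so treat this as illustrative rather than a drop-in file:

```ini
# /etc/systemd/system/kubelet.service (illustrative sketch)
[Unit]
Description=kubelet: The Kubernetes Node Agent
After=containerd.service

[Service]
ExecStart=/usr/bin/kubelet --config=/var/lib/kubelet/config.yaml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```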
4) Container runtimes → OCI + kernel syscalls
Runtimes (containerd, CRI-O) delegate to low-level OCI runtimes (runc, crun, kata-runtime) to:
- `clone()`/`unshare()` → create namespaces
- write to the cgroup filesystem → apply CPU/memory/IO limits
- mount syscalls → build container rootfs
- apply seccomp/AppArmor/SELinux profiles for syscall and access control
A container is, fundamentally, a Linux process tree with specific namespaces and cgroup constraints.
| Runtime | Type | Notes |
|---|---|---|
| containerd | CRI implementation | Built-in CRI plugin, default in most distros |
| CRI-O | CRI implementation | Lightweight, OpenShift default |
| runc | OCI runtime | Reference implementation, written in Go |
| crun | OCI runtime | Written in C, faster startup, better cgroup v2 support |
| kata-runtime | OCI runtime | Lightweight VM-based isolation |
5) Services & kube-proxy → iptables/IPVS/eBPF
| Mode | Implementation | Use Case |
|---|---|---|
| iptables | `KUBE-SERVICES`, `KUBE-NODEPORTS` chains, DNAT to endpoints | Small clusters, legacy compatibility |
| IPVS | Kernel IPVS load-balancer | Large clusters, better performance than iptables |
| eBPF | TC/XDP hooks, eBPF maps (Cilium, Calico) | Production scale, ~80% adoption in Cilium deployments |
Note: eBPF-based kube-proxy replacement eliminates conntrack overhead and provides O(1) service lookups. Cilium 1.17/1.18 (2025) reported a 40% reduction in policy latency and a 43% reduction in agent CPU usage during high churn.
6) Volumes → Linux mounts (bind, tmpfs, network, block)
Kubernetes volumes rely on standard Linux mount mechanics:
- `emptyDir`: backed by the node's filesystem by default; set `medium: Memory` to make it tmpfs (RAM-backed)
- `hostPath`: a bind mount of a host directory into the container's mount namespace
- Network volumes: NFS/CIFS/CSI drivers leverage kernel clients/device mappers
- Block volumes: raw block devices or filesystems mounted by kubelet/CSI
You can observe paths like:
/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<name>/
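A Memory-backed `emptyDir` appears as a tmpfs entry in the node's mount table. A sketch that scans `/proc/mounts` for tmpfs mounts; on a node, the kubelet volume path above would show up among them:

```python
def tmpfs_mounts():
    """Return mount points whose filesystem type is tmpfs, parsed from /proc/mounts."""
    mounts = []
    with open("/proc/mounts") as f:
        for line in f:
            device, mountpoint, fstype, *_ = line.split()
            if fstype == "tmpfs":
                mounts.append(mountpoint)
    return mounts

print(tmpfs_mounts())
```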
7) ConfigMaps & Secrets → tmpfs + bind mounts (projected volumes)
ConfigMaps and Secrets are materialized as files on the node (tmpfs for Secrets), then bind-mounted into the container. Secret files default to mode 0644 but are commonly tightened (e.g., 0400 via `defaultMode`), and Secrets may be encrypted at rest by the API server's KMS provider. Projected service account tokens auto-rotate.
8) Service Accounts → token files in the container filesystem
Service accounts are presented via projected volumes at paths like:
/var/run/secrets/kubernetes.io/serviceaccount/token
These are ordinary files from the container’s perspective, supplied by tmpfs + bind-mount mechanisms.
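The projected token itself is a JWT, i.e. plain text you can decode (though not verify) with nothing but base64. A sketch using a fabricated token, since the real file only exists inside a Pod; the claim values here are illustrative:

```python
import base64, json

def jwt_payload(token):
    """Decode a JWT's payload segment without signature verification."""
    seg = token.split(".")[1]
    seg += "=" * (-len(seg) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(seg))

# Fabricated header.payload.signature token; real tokens carry claims like
# "sub": "system:serviceaccount:<namespace>:<name>" plus audience and expiry.
payload = {"sub": "system:serviceaccount:default:demo", "exp": 1999999999}
fake = ".".join([
    base64.urlsafe_b64encode(b'{"alg":"none"}').decode().rstrip("="),
    base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip("="),
    "sig",
])
print(jwt_payload(fake)["sub"])  # → system:serviceaccount:default:demo
```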
9) Scheduler → resource-aware placement (Linux topology)
The scheduler does not create namespaces/cgroups. It makes placement choices using node capacity, topology (NUMA, CPU), taints/tolerations, and resource requests/limits. The kubelet then realizes Pods via Linux primitives.
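The filtering step can be sketched as a pure function over node capacities; resource names and units below (millicores, bytes) are illustrative, not the scheduler's actual API:

```python
def feasible_nodes(pod_requests, nodes):
    """Return names of nodes whose free allocatable resources cover the Pod's requests."""
    return [name for name, free in nodes.items()
            if all(free.get(res, 0) >= amount for res, amount in pod_requests.items())]

nodes = {
    "node-a": {"cpu": 1500, "memory": 2 * 1024**3},
    "node-b": {"cpu": 200,  "memory": 8 * 1024**3},
}
print(feasible_nodes({"cpu": 500, "memory": 1024**3}, nodes))  # → ['node-a']
```

The real scheduler then scores the surviving nodes (spreading, affinity, topology) before binding, after which the kubelet on the chosen node does the namespace/cgroup work.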
10) Node components → kernel-level integration
| Component | Linux primitive(s) used |
|---|---|
| kubelet | systemd service; mount syscalls; cgroups; interacts with CRI |
| container runtime | namespaces, cgroups, seccomp/AppArmor/SELinux |
| CNI plugins | netns, veth, bridges/routes, iptables/nftables, or eBPF |
| kube-proxy | iptables chains, IPVS, or replaced by eBPF |
Hands-On: Inspecting these mappings (quick cheats)
These commands are for debugging/education on a Linux node.
List namespaces for a container process:

```shell
sudo lsns -p <pid>
```

Enter a Pod's netns from the host:

```shell
sudo nsenter -t <pid> -n ip a
```

Inspect kube-proxy chains:

```shell
sudo iptables -t nat -S | grep KUBE-
```

See cgroup placement (paths differ between v1 and v2):

```shell
cat /proc/<pid>/cgroup
```

CNI bridge & veths:

```shell
ip link show | grep cni0
ip route
```

eBPF dataplane presence (if using Cilium, etc.):

```shell
sudo bpftool net
sudo bpftool map show   # list maps; dump one with: bpftool map dump id <id>
```

Verify cgroup version:

```shell
stat -fc %T /sys/fs/cgroup/   # cgroup2fs = v2, tmpfs = v1
```
Conclusion
Kubernetes layers orchestration logic on top of Linux’s namespaces, cgroups, netfilter/eBPF networking, and mounts. Pods become Linux process trees with isolated views and kernel-enforced limits; Services route via iptables/IPVS/eBPF; volumes are just Linux mounts; and the kubelet and runtimes are the glue. Mastering these mappings provides leverage for debugging, performance tuning, and secure design.
References
- https://www.cncf.io/reports/cilium-annual-report-2025/
- https://www.plural.sh/blog/the-ultimate-guide-to-cilium-kubernetes-networking/
- https://www.oneuptime.com/blog/how-to-implement-kube-proxy-replacement-with-ebpf/
- https://chkk.io/blog/cgroup-v1-to-v2-migration-in-kubernetes/
- https://www.cleanstart.com/container-orchestration-guide/
- https://www.linkedin.com/pulse/comparative-study-container-runtime-performance-runc-crun-kata-lb9ke/
- https://kubernetes.io/docs/setup/production-environment/container-runtimes/



