Mapping Kubernetes Components to Linux Primitives (Updated)
How Kubernetes composes Linux namespaces, cgroups, netfilter/eBPF, and mounts to orchestrate containers
Introduction
Kubernetes orchestrates containers by composing established Linux primitives: namespaces for isolation, cgroups for resource control, virtual networking for Pod connectivity, and mounts for storage. Every Pod, Service, and Volume ultimately becomes Linux processes with specific isolation and kernel-enforced limits. Understanding these mappings clarifies scheduling, performance, and debugging behaviors in real clusters.
1) Pods → Namespaces + cgroups + virtual networking
A Pod is a group of Linux processes created with isolated namespaces and governed by cgroup limits. The kubelet instructs the container runtime (containerd/CRI-O), which uses clone()/unshare() to configure namespaces and mounts.
1.1 Linux namespaces used by Pods
| Namespace | Purpose | Kubernetes Usage |
|---|---|---|
| `mnt` | Filesystem isolation | Each container sees its own rootfs and mount table |
| `pid` | Process IDs | Containers cannot see/kill processes outside their PID namespace |
| `net` | Network stack | Pod gets its own netns with an eth0, routes, ARP table |
| `ipc` | SysV IPC, POSIX msg queues | Isolates IPC unless explicitly shared |
| `uts` | Hostname/domain | Pod has its own hostname; can differ from the node |
| `user` | UID/GID mapping | Used for rootless/user-namespace scenarios when enabled |
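You can observe namespace membership from userspace without any Kubernetes tooling. A minimal Python sketch (Linux only) that reads the `/proc/<pid>/ns` symlinks, which is essentially what `lsns` does:

```python
import os

def list_namespaces(pid="self"):
    """Map namespace type -> identifier for a process, read from /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}

# Two containers in the same Pod share net/ipc/uts namespaces: their
# identifiers (e.g. "net:[4026531840]") will be identical.
for name, ident in list_namespaces().items():
    print(f"{name:16s} {ident}")
```

Comparing the output for two PIDs in the same Pod versus different Pods makes the sharing model concrete.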
Runtime behavior: Low-level runtimes (e.g., runc, crun) perform namespace creation and apply seccomp/AppArmor/SELinux if configured.
1.2 cgroups (v1 and v2) and QoS
Kubernetes QoS classes map to cgroup controls:
- Guaranteed → hard CPU/memory limits assigned per container
- Burstable → shares/limits applied; can burst within node capacity
- BestEffort → no explicit requests/limits; lowest priority under pressure
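The request/limit translation can be illustrated with the well-known cgroup v1 formulas (cgroup v2 derives `cpu.weight` and `cpu.max` similarly); `period_us` is the kernel's default 100 ms CFS period. This is a sketch of the arithmetic, not the kubelet's exact code path:

```python
def cpu_cgroup_v1_values(request_mcpu, limit_mcpu=None, period_us=100_000):
    """Translate Kubernetes millicpu requests/limits into cgroup v1 CPU knobs:
    cpu.shares from the request (relative weight), cpu.cfs_quota_us from the limit."""
    shares = max(2, request_mcpu * 1024 // 1000)                   # cpu.shares
    quota = limit_mcpu * period_us // 1000 if limit_mcpu else -1   # -1 = no limit
    return shares, quota

print(cpu_cgroup_v1_values(500, 1000))  # requests: 500m, limits: 1 CPU → (512, 100000)
```

A BestEffort container has no request, so it falls back to the minimum weight and an unlimited quota, which is why it loses first under CPU pressure.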
Paths vary by cgroup mode and init system:
- cgroup v1 uses hierarchical paths like `/sys/fs/cgroup/<controller>/kubepods/...`
- cgroup v2 (unified) with systemd uses a slice-based layout (e.g., `kubepods.slice`)
Important: As of 2026, cgroup v2 is the default on modern distributions (Flatcar, Fedora, recent Ubuntu), but many production clusters still run v1 or mixed configurations. The systemd cgroup driver is recommended for v2 and becomes the default in Kubernetes 1.35+ with automatic detection via the `KubeletCgroupDriverFromCRI` feature gate.
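A quick programmatic check of which cgroup mode a process runs under (a sketch; on the unified v2 hierarchy, `/proc/self/cgroup` collapses to a single `0::<path>` line):

```python
def cgroup_mode():
    """Return 'v2' if this process is on the unified hierarchy, else 'v1/hybrid'."""
    with open("/proc/self/cgroup") as f:
        entries = [line.rstrip("\n").split(":", 2) for line in f if line.strip()]
    # v2: exactly one entry, hierarchy ID 0, empty controller list -> "0::<path>"
    unified = len(entries) == 1 and entries[0][0] == "0" and entries[0][1] == ""
    return "v2" if unified else "v1/hybrid"

print(cgroup_mode())
```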
1.3 Pod networking: veth pairs + bridge + netfilter/eBPF
CNI plugins attach the Pod netns to the host network:
- Create a veth pair: one end in the Pod (`eth0`), one on the host
- Connect the host end to a Linux bridge (commonly `cni0`) or switch fabric
- Assign the Pod IP and routes, and apply iptables/nftables/eBPF policies
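The plumbing above maps to a handful of `ip` commands. A sketch that only builds the command strings a simple bridge-style CNI plugin would execute (interface and bridge names are illustrative, and nothing is run here):

```python
def bridge_cni_commands(netns, pod_ip, host_if="veth0", pod_if="eth0", bridge="cni0"):
    """Commands to wire a Pod netns to a host bridge via a veth pair (not executed)."""
    return [
        f"ip link add {host_if} type veth peer name {pod_if}",
        f"ip link set {pod_if} netns {netns}",            # move the Pod end into the netns
        f"ip -n {netns} addr add {pod_ip} dev {pod_if}",  # assign the Pod IP
        f"ip -n {netns} link set {pod_if} up",
        f"ip link set {host_if} master {bridge} up",      # attach the host end to the bridge
    ]

for cmd in bridge_cni_commands("cni-1234", "10.244.1.7/24"):
    print(cmd)
```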
Service traffic:
- kube-proxy iptables mode installs chains like `KUBE-SERVICES` and DNATs cluster IPs to endpoints
- IPVS mode uses the kernel's IPVS load balancer for scale
- eBPF dataplanes (Cilium, Calico eBPF) implement kube-proxy replacement and load balancing in-kernel without iptables
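In iptables mode, equal-weight endpoint selection is built from `statistic`-match rules whose probabilities must compound correctly: rule *i* matches a fraction of whatever traffic the earlier rules let through. A sketch of that probability sequence for n endpoints:

```python
def endpoint_probabilities(n):
    """Per-rule match probabilities so each of n endpoints gets an equal 1/n share:
    rule i matches with probability 1/(n - i) of the still-unmatched traffic."""
    return [round(1 / (n - i), 5) for i in range(n)]

print(endpoint_probabilities(3))  # → [0.33333, 0.5, 1.0]
```

The final rule is unconditional (probability 1.0), which is why the last endpoint in a `KUBE-SVC-*` chain has no `statistic` match at all.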
2) Deployments & ReplicaSets → Controllers for Pods
Deployments and ReplicaSets do not directly create Linux primitives. They reconcile desired state (count, rollout strategy) into Pods, which are realized as Linux processes with namespaces and cgroups by kubelet + runtime.
3) Kubelet → systemd service + process supervisor
The kubelet typically runs as a systemd unit and is responsible for:
- Creating Pod sandboxes and containers via the CRI
- Applying cgroup limits and mounts
- Calling CNI to set up netns/veth/routes
- Managing probes, restarts, and garbage collection
On Linux hosts you will see unit files like kubelet.service and kubelet using /var/lib/kubelet/ to stage pod resources and mounts.
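A trimmed unit-file sketch for context; paths and flags vary by distro and installer, so treat this as illustrative rather than a drop-in file:

```ini
# /etc/systemd/system/kubelet.service (illustrative sketch)
[Unit]
Description=kubelet: The Kubernetes Node Agent
After=containerd.service

[Service]
ExecStart=/usr/bin/kubelet --config=/var/lib/kubelet/config.yaml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```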
4) Container runtimes → OCI + kernel syscalls
Runtimes (containerd, CRI-O) delegate to low-level OCI runtimes (runc, crun, kata-runtime) to:
- `clone()`/`unshare()` → create namespaces
- write to the cgroup filesystem → apply CPU/memory/IO limits
- mount syscalls → build container rootfs
- apply seccomp/AppArmor/SELinux profiles for syscall and access control
A container is, fundamentally, a Linux process tree with specific namespaces and cgroup constraints.
| Runtime | Type | Notes |
|---|---|---|
| containerd | CRI implementation | Built-in CRI plugin, default in most distros |
| CRI-O | CRI implementation | Lightweight, OpenShift default |
| runc | OCI runtime | Reference implementation, written in Go |
| crun | OCI runtime | Written in C, faster startup, better cgroup v2 support |
| kata-runtime | OCI runtime | Lightweight VM-based isolation |
5) Services & kube-proxy → iptables/IPVS/eBPF
| Mode | Implementation | Use Case |
|---|---|---|
| iptables | `KUBE-SERVICES`, `KUBE-NODEPORTS` chains, DNAT to endpoints | Small clusters, legacy compatibility |
| IPVS | Kernel IPVS load-balancer | Large clusters, better performance than iptables |
| eBPF | TC/XDP hooks, eBPF maps (Cilium, Calico) | Production scale, ~80% adoption in Cilium deployments |
Note: eBPF-based kube-proxy replacement eliminates conntrack overhead and provides O(1) service lookups. Cilium 1.17/1.18 (2025) reported a 40% reduction in policy latency and a 43% reduction in agent CPU usage during high churn.
6) Volumes → Linux mounts (bind, tmpfs, network, block)
Kubernetes volumes rely on standard Linux mount mechanics:
- `emptyDir`: backed by the node's filesystem by default; set `medium: Memory` to make it tmpfs (RAM-backed)
- `hostPath`: a bind mount of a host directory into the container's mount namespace
- Network volumes: NFS/CIFS/CSI drivers leverage kernel clients/device mappers
- Block volumes: raw block devices or filesystems mounted by kubelet/CSI
You can observe paths like:
/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/<name>/
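A Memory-backed `emptyDir` appears as a tmpfs entry in the node's mount table. A sketch that scans `/proc/mounts` for tmpfs mounts; on a node, the kubelet volume path above would show up among them:

```python
def tmpfs_mounts():
    """Return mount points whose filesystem type is tmpfs, parsed from /proc/mounts."""
    mounts = []
    with open("/proc/mounts") as f:
        for line in f:
            device, mountpoint, fstype, *_ = line.split()
            if fstype == "tmpfs":
                mounts.append(mountpoint)
    return mounts

print(tmpfs_mounts())
```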
7) ConfigMaps & Secrets → tmpfs + bind mounts (projected volumes)
ConfigMaps and Secrets are materialized as files on the node (tmpfs for Secrets), then bind-mounted into the container. Secret files default to mode 0644 but are commonly tightened (e.g., 0400 via `defaultMode`), and Secrets may be encrypted at rest by the API server's KMS provider. Projected service account tokens auto-rotate.
8) Service Accounts → token files in the container filesystem
Service accounts are presented via projected volumes at paths like:
/var/run/secrets/kubernetes.io/serviceaccount/token
These are ordinary files from the container’s perspective, supplied by tmpfs + bind-mount mechanisms.
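The projected token itself is a JWT, i.e. plain text you can decode (though not verify) with nothing but base64. A sketch using a fabricated token, since the real file only exists inside a Pod; the claim values here are illustrative:

```python
import base64, json

def jwt_payload(token):
    """Decode a JWT's payload segment without signature verification."""
    seg = token.split(".")[1]
    seg += "=" * (-len(seg) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(seg))

# Fabricated header.payload.signature token; real tokens carry claims like
# "sub": "system:serviceaccount:<namespace>:<name>" plus audience and expiry.
payload = {"sub": "system:serviceaccount:default:demo", "exp": 1999999999}
fake = ".".join([
    base64.urlsafe_b64encode(b'{"alg":"none"}').decode().rstrip("="),
    base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip("="),
    "sig",
])
print(jwt_payload(fake)["sub"])  # → system:serviceaccount:default:demo
```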
9) Scheduler → resource-aware placement (Linux topology)
The scheduler does not create namespaces/cgroups. It makes placement choices using node capacity, topology (NUMA, CPU), taints/tolerations, and resource requests/limits. The kubelet then realizes Pods via Linux primitives.
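The filtering step can be sketched as a pure function over node capacities; resource names and units below (millicores, bytes) are illustrative, not the scheduler's actual API:

```python
def feasible_nodes(pod_requests, nodes):
    """Return names of nodes whose free allocatable resources cover the Pod's requests."""
    return [name for name, free in nodes.items()
            if all(free.get(res, 0) >= amount for res, amount in pod_requests.items())]

nodes = {
    "node-a": {"cpu": 1500, "memory": 2 * 1024**3},
    "node-b": {"cpu": 200,  "memory": 8 * 1024**3},
}
print(feasible_nodes({"cpu": 500, "memory": 1024**3}, nodes))  # → ['node-a']
```

The real scheduler then scores the surviving nodes (spreading, affinity, topology) before binding, after which the kubelet on the chosen node does the namespace/cgroup work.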
10) Node components → kernel-level integration
| Component | Linux primitive(s) used |
|---|---|
| kubelet | systemd service; mount syscalls; cgroups; interacts with CRI |
| container runtime | namespaces, cgroups, seccomp/AppArmor/SELinux |
| CNI plugins | netns, veth, bridges/routes, iptables/nftables, or eBPF |
| kube-proxy | iptables chains, IPVS, or replaced by eBPF |
Hands-On: Inspecting these mappings (quick cheats)
These commands are for debugging/education on a Linux node.
List namespaces for a container process:

```shell
sudo lsns -p <pid>
```

Enter a Pod's netns from the host:

```shell
sudo nsenter -t <pid> -n ip a
```

Inspect kube-proxy chains:

```shell
sudo iptables -t nat -S | grep KUBE-
```

See cgroup placement (paths differ between v1 and v2):

```shell
cat /proc/<pid>/cgroup
```

CNI bridge & veths:

```shell
ip link show | grep cni0
ip route
```

eBPF dataplane presence (if using Cilium, etc.):

```shell
sudo bpftool net
sudo bpftool map show   # list maps; dump one with: bpftool map dump id <id>
```

Verify cgroup version:

```shell
stat -fc %T /sys/fs/cgroup/   # cgroup2fs = v2, tmpfs = v1
```
Conclusion
Kubernetes layers orchestration logic on top of Linux’s namespaces, cgroups, netfilter/eBPF networking, and mounts. Pods become Linux process trees with isolated views and kernel-enforced limits; Services route via iptables/IPVS/eBPF; volumes are just Linux mounts; and the kubelet and runtimes are the glue. Mastering these mappings provides leverage for debugging, performance tuning, and secure design.
References
- https://www.cncf.io/reports/cilium-annual-report-2025/
- https://www.plural.sh/blog/the-ultimate-guide-to-cilium-kubernetes-networking/
- https://www.oneuptime.com/blog/how-to-implement-kube-proxy-replacement-with-ebpf/
- https://chkk.io/blog/cgroup-v1-to-v2-migration-in-kubernetes/
- https://www.cleanstart.com/container-orchestration-guide/
- https://www.linkedin.com/pulse/comparative-study-container-runtime-performance-runc-crun-kata-lb9ke/
- https://kubernetes.io/docs/setup/production-environment/container-runtimes/



