Kubernetes Production Deployment Patterns
Moving from development to production with Kubernetes requires more than just kubectl apply. Here are the patterns that separate hobby projects from enterprise-grade deployments.
Blue-Green Deployments
The safest pattern for critical applications:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1.2.0
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
How it works:
- Deploy new version (green) alongside current (blue)
- Test green deployment thoroughly
- Switch traffic at the load balancer or Service level (see the sketch below)
- If issues arise, instantly roll back to blue
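In Kubernetes, the simplest switch is repointing a Service's selector from one color to the other. A minimal sketch, assuming a Service named myapp-svc (a hypothetical name) fronts both deployments:

apiVersion: v1
kind: Service
metadata:
  name: myapp-svc        # hypothetical name; fronts both colors
spec:
  selector:
    app: myapp
    version: blue        # change to "green" to cut traffic over
  ports:
  - port: 80
    targetPort: 8080

Cutting over (and rolling back) is then a single patch of the selector:

kubectl patch service myapp-svc -p '{"spec":{"selector":{"app":"myapp","version":"green"}}}'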
Canary Releases for Risk Reduction
When you need to test with real users:
# Deploy the initial canary: 1 replica next to 19 stable ≈ 5% of traffic,
# assuming even load balancing across all pods behind the same Service
kubectl apply -f canary-deployment.yaml
# Gradually shift traffic by changing the replica ratio
kubectl scale deployment/app-canary --replicas=2   # 2 of 20 pods ≈ 10% traffic
kubectl scale deployment/app-canary --replicas=5   # 5 of 20 pods = 25% (scale stable down to 15)
# Monitor metrics before full rollout
kubectl get hpa
kubectl top pods -l app=myapp
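The canary-deployment.yaml referenced above isn't shown in full; a plausible sketch, differing from the stable deployment only in its version label, image tag, and replica count:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1              # start at roughly 5% of a 20-pod fleet
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:
        app: myapp         # same app label, so the shared Service routes to it
        version: canary
    spec:
      containers:
      - name: app
        image: myapp:v1.3.0   # hypothetical next version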
Key metrics to monitor:
- Error rates (keep below 0.1%; see the query sketch after this list)
- Latency P99 (should not increase >20%)
- Resource utilization (watch for memory leaks)
- Business metrics (conversion rates, etc.)
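As one way to check the first item, a PromQL sketch of the canary's error rate, assuming the app exports the common http_requests_total counter with status and version labels (adjust names to whatever your app actually emits):

# Canary 5xx rate over the last 5 minutes (assumed metric and label names)
sum(rate(http_requests_total{version="canary",status=~"5.."}[5m]))
  / sum(rate(http_requests_total{version="canary"}[5m]))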
Horizontal Pod Autoscaling Done Right
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods            # Pods metrics require a custom metrics adapter (e.g. Prometheus Adapter)
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 100
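Once applied, you can watch its scaling decisions and current metric readings with kubectl describe hpa app-hpa.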
Common pitfalls to avoid:
- Setting CPU targets too low (the autoscaler thrashes, repeatedly scaling up and down)
- Not accounting for warm-up time (tune the spec.behavior stabilization windows so new pods can warm up before the next scaling decision)
- Ignoring memory pressure (CPU-only targets miss leaks and OOM kills)
- Forgetting to set pod disruption budgets (see the sketch after this list)
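For that last pitfall, a minimal PodDisruptionBudget sketch (the name myapp-pdb is hypothetical) that keeps at least two pods running through voluntary disruptions such as node drains:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb        # hypothetical name
spec:
  minAvailable: 2        # never let voluntary evictions drop below 2 pods
  selector:
    matchLabels:
      app: myapp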
Multi-Region Deployment Strategy
For global applications, consider this architecture:
Region: us-east-1 (Primary)
├── 3 Availability Zones
├── Global Load Balancer
└── Database (async replication)
Region: eu-west-1 (Secondary)
├── 2 Availability Zones
├── Read-only workloads
└── Database replica
Traffic routing rules:
- Route users to nearest healthy region
- Fail over only when primary region has >5% error rate
- Use latency-based routing for API calls
- Implement circuit breakers between regions (see the sketch below)
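How circuit breaking looks depends on your stack; if you run a service mesh such as Istio, an outlier-detection rule along these lines would eject unhealthy upstreams (the host and name are illustrative):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-circuit-breaker   # illustrative name
spec:
  host: myapp.prod.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5   # eject an endpoint after 5 straight 5xx responses
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50    # never eject more than half the endpoints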
Observability: The Production Safety Net
Your deployment isn’t complete without:
# Prometheus monitoring
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: web
    interval: 15s
    path: /metrics
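Two things to note: ServiceMonitor is a Prometheus Operator CRD, so the Operator must be installed in the cluster, and the selector matches labels on a Service rather than on pods, with port: web referring to that Service's named port.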
Four golden signals to monitor:
- Latency - Time to serve requests
- Traffic - Requests per second
- Errors - Failed requests percentage
- Saturation - How “full” your service is
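As a starting point, the latency, traffic, and saturation signals might translate to PromQL like this, assuming the conventional http_request_duration_seconds histogram, the http_requests_total counter, and cAdvisor's container CPU metric (error rate was sketched in the canary section above):

# Latency: P99 request duration over 5 minutes (assumed histogram name)
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# Traffic: requests per second
sum(rate(http_requests_total[5m]))
# Saturation: per-pod CPU usage (compare against requests/limits)
sum(rate(container_cpu_usage_seconds_total{pod=~"myapp.*"}[5m])) by (pod)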
Conclusion
Production Kubernetes requires thinking beyond pods and services. Implement these patterns gradually:
- Start with proper readiness/liveness probes
- Add HPA with conservative limits
- Implement blue-green for critical apps
- Add multi-region for global scale
Remember: The goal isn’t just deployment—it’s reliable, observable, and maintainable operations.