Kubernetes Backup Strategies
Best practices for backing up Kubernetes clusters and applications
Implementing robust backup strategies is crucial for data protection and disaster recovery. This guide covers essential backup practices for Kubernetes.
Prerequisites
- Basic understanding of Kubernetes
- Access to a Kubernetes cluster
- kubectl CLI tool installed
- Familiarity with storage concepts
Project Structure
.
├── backup/
│   ├── velero/        # Velero configurations
│   ├── volumes/       # Volume backup configs
│   ├── etcd/          # etcd backup configs
│   └── scripts/       # Backup scripts
└── monitoring/
    ├── backup/        # Backup monitoring
    └── alerts/        # Backup alerts
Velero Setup
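Velero itself has to be installed before any backup resources can be applied. Below is a minimal install sketch using the Velero CLI; the AWS plugin version, bucket name, and credentials file path are example values, so adjust them for your environment.

# Install the Velero server components and the AWS object-storage plugin
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket my-backup-bucket \
  --backup-location-config region=us-west-2 \
  --secret-file ./credentials-velero

# Confirm the server pod is running
kubectl get pods -n velero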
1. Backup Schedule
Velero's Backup resource has no schedule field; a recurring backup is declared with a Schedule resource whose template holds the backup spec:
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 1 * * *"   # daily at 01:00
  template:
    includedNamespaces:
      - "*"
    excludedNamespaces:
      - kube-system
    includedResources:
      - "*"
    excludedResources:
      - nodes
      - events
    ttl: 720h   # keep each backup for 30 days
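After applying the manifest (here assumed to be saved under the backup/velero/ directory from the project layout above), the schedule and the backups it produces can be checked with the Velero CLI:

kubectl apply -f backup/velero/daily-backup.yaml   # hypothetical file path
velero schedule get
velero backup get
velero backup describe daily-backup-<TIMESTAMP> --details   # inspect a specific run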
2. Storage Location
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-backup-bucket
  config:
    region: us-west-2
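Once applied, the storage location should report an Available phase before any backups are scheduled. A quick check, assuming the default velero namespace:

velero backup-location get
kubectl get backupstoragelocation -n velero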
Volume Backup
1. Volume Snapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass
  source:
    persistentVolumeClaimName: data-pvc
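A snapshot is only usable for restore once the CSI driver marks it ready. A quick way to verify, assuming the snapshot CRDs and a CSI snapshot controller are installed in the cluster:

kubectl apply -f snapshot.yaml        # hypothetical filename
kubectl get volumesnapshot data-snapshot   # READYTOUSE should become true
kubectl get volumesnapshotcontent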
2. Snapshot Class
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-snapclass
driver: hostpath.csi.k8s.io
deletionPolicy: Delete
parameters:
  csi.storage.k8s.io/snapshotter-secret-name: snapshotter-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: default
etcd Backup
1. Backup Job
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
spec:
  schedule: "0 */6 * * *"   # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          # must run on a control-plane node so 127.0.0.1:2379 reaches etcd
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          containers:
            - name: etcd-backup
              image: registry.k8s.io/etcd:3.5.9-0
              command:
                - /bin/sh
                - -c
                - |
                  ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db \
                    --endpoints=https://127.0.0.1:2379 \
                    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                    --cert=/etc/kubernetes/pki/etcd/server.crt \
                    --key=/etc/kubernetes/pki/etcd/server.key
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                - name: backup
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /var/backups/etcd   # host directory for snapshots; adjust as needed
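It is worth verifying each snapshot file rather than trusting the job's exit code alone. A sketch using etcdctl, assuming the snapshot landed in /var/backups/etcd on the control-plane node (on newer etcd releases, etcdutl snapshot status is the preferred equivalent):

# Run on the control-plane node that holds the hostPath volume; filename is an example of the pattern above
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd/etcd-snapshot-20240101-000000.db \
  --write-out=table
# The table lists the snapshot's hash, revision, total keys, and size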
2. Restore Configuration
apiVersion: v1
kind: Pod
metadata:
  name: etcd-restore
spec:
  # schedule this pod onto the control-plane node being restored (e.g. via nodeName or nodeSelector)
  containers:
    - name: etcd
      image: registry.k8s.io/etcd:3.5.9-0
      command:
        - /bin/sh
        - -c
        - |
          ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
            --data-dir=/var/lib/etcd-restore \
            --name=restored-etcd \
            --initial-cluster=restored-etcd=https://127.0.0.1:2380 \
            --initial-cluster-token=restored-etcd-token \
            --initial-advertise-peer-urls=https://127.0.0.1:2380
      volumeMounts:
        - name: backup
          mountPath: /backup
        - name: restore-data
          mountPath: /var/lib/etcd-restore
  restartPolicy: Never
  volumes:
    - name: backup
      hostPath:
        path: /var/backups/etcd
    - name: restore-data
      hostPath:
        path: /var/lib/etcd-restore
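The restore command only rebuilds a data directory; etcd still has to be pointed at it. On a kubeadm control-plane node the remaining steps look roughly like the following hedged sketch, though the exact procedure depends on your setup and the member name chosen above:

# On the control-plane node, after the snapshot has been restored to /var/lib/etcd-restore
mv /var/lib/etcd /var/lib/etcd.old        # keep the old data dir as a fallback
mv /var/lib/etcd-restore /var/lib/etcd    # swap in the restored data
# kubeadm runs etcd as a static pod; editing its manifest restarts it automatically
vi /etc/kubernetes/manifests/etcd.yaml    # adjust data-dir/hostPath and member name flags if they differ
kubectl get pods -n kube-system           # confirm etcd and the API server come back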
Monitoring Setup
1. Backup Monitoring
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backup-monitor
spec:
  selector:
    matchLabels:
      app: velero
  endpoints:
    - port: metrics
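Before wiring up alerts, confirm that Velero is actually exposing metrics. By default the server listens on port 8085; a quick local check, assuming the default deployment name:

kubectl -n velero port-forward deploy/velero 8085:8085 &
curl -s http://localhost:8085/metrics | grep velero_backup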
2. Alert Configuration
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: backup-alerts
spec:
  groups:
    - name: backup
      rules:
        - alert: BackupFailed
          # alert on new failures; the raw counter never resets, so comparing it to 0 would fire forever
          expr: increase(velero_backup_failure_total[1h]) > 0
          for: 15m
          labels:
            severity: critical
Best Practices Checklist
- ✅ Regular, automated backups
- ✅ Copies in more than one location
- ✅ Encryption at rest and in transit
- ✅ Defined retention policy
- ✅ Regular restore testing
- ✅ Monitoring and alerting on backup jobs
- ✅ Documented backup and restore procedures
- ✅ Access control on backup storage
- ✅ Tested disaster recovery plan
- ✅ Compliance with retention requirements
Backup Patterns
Full Backup
- Complete cluster state
- All resources
- Volume data
- Configuration
Incremental Backup
- Changed resources
- Delta snapshots
- Efficient storage
- Quick backup
Selective Backup (see the CLI sketch after this list)
- Critical namespaces
- Important data
- Configuration only
- Application state
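As a concrete example of the selective pattern, Velero can back up only the namespaces and label-selected resources that matter; the namespace names, label, and TTL here are illustrative values:

velero backup create critical-apps \
  --include-namespaces production,payments \
  --selector app.kubernetes.io/part-of=checkout \
  --ttl 168h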
Common Pitfalls
- ❌ Infrequent backups
- ❌ Untested restores
- ❌ Missing monitoring
- ❌ Poor encryption
- ❌ Insufficient retention
Restore Procedures
1. Velero Restore
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-production
  namespace: velero
spec:
  # for backups created by the daily-backup Schedule, use the timestamped name, e.g. daily-backup-20240101010000
  backupName: daily-backup
  includedNamespaces:
    - "*"
  excludedNamespaces:
    - kube-system
  restorePVs: true
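The same restore can be driven from the CLI, which is often quicker for ad-hoc recovery; the backup name shown is an example of the timestamped names a Schedule produces:

velero restore create restore-production --from-backup daily-backup-20240101010000
velero restore describe restore-production --details
velero restore logs restore-production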
2. Volume Restore
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
spec:
  dataSource:
    name: data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Backup Testing
1. Test Restore
apiVersion: batch/v1
kind: Job
metadata:
  name: backup-test
spec:
  template:
    spec:
      containers:
        - name: restore-test
          image: test:latest   # placeholder image containing the restore test script
          command: ["./test-restore.sh"]
      restartPolicy: Never
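An alternative that avoids touching production at all is to restore into a scratch namespace and run the checks there; --namespace-mappings rewrites namespaces during the restore (the namespace and backup names here are examples):

velero restore create restore-test \
  --from-backup daily-backup-20240101010000 \
  --namespace-mappings production:restore-test
kubectl get all -n restore-test
kubectl delete namespace restore-test   # clean up once the checks pass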
2. Validation
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-validation
data:
  validate.sh: |
    #!/bin/bash
    # Validate restored resources
    kubectl get all --all-namespaces
    # Validate restored data (appdb is a placeholder database; add credentials for your environment)
    kubectl exec database-0 -- mysql appdb -e "SELECT COUNT(*) FROM users"
Compliance Requirements
1. Retention Policy
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: compliance-backup
  namespace: velero
spec:
  schedule: "0 0 * * *"   # daily at midnight
  template:
    ttl: 2160h   # 90 days
    includedNamespaces:
      - "*"
2. Audit Configuration
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    resources:
      - group: velero.io
        resources: ["backups", "restores"]
Conclusion
Implementing these backup strategies helps ensure data protection and business continuity in your Kubernetes clusters. Test restores regularly and keep backup procedures up to date as the cluster evolves.