Kubernetes Backup Strategies

Best practices for backing up Kubernetes clusters and applications

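A complete Kubernetes backup covers three layers: API resources, persistent volume data, and the etcd datastore behind the control plane. This guide walks through Velero for resources, CSI volume snapshots for data, and etcdctl snapshots for etcd, then covers monitoring, restore procedures, restore testing, and retention.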

Prerequisites

  • Basic understanding of Kubernetes
  • Access to a Kubernetes cluster
  • kubectl CLI tool installed
  • Familiarity with storage concepts

Project Structure

.
├── backup/
│   ├── velero/          # Velero configurations
│   ├── volumes/         # Volume backup configs
│   ├── etcd/            # etcd backup configs
│   └── scripts/         # Backup scripts
└── monitoring/
    ├── backup/          # Backup monitoring
    └── alerts/          # Backup alerts

Velero Setup

1. Backup Configuration

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: daily-backup
  namespace: velero
spec:
  includedNamespaces:
  - "*"
  excludedNamespaces:
  - kube-system
  includedResources:
  - "*"
  excludedResources:
  - nodes
  - events
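  # A Backup runs once and has no schedule field. To run this daily at 01:00,
  # move the spec into the template of a Schedule with schedule: "0 1 * * *".
  ttl: 720h # 30 days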

2. Storage Location

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-backup-bucket
  config:
    region: us-west-2

Volume Backup

1. Volume Snapshot

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass
  source:
    persistentVolumeClaimName: data-pvc

2. Snapshot Class

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-hostpath-snapclass
driver: hostpath.csi.k8s.io
deletionPolicy: Delete
parameters:
  csi.storage.k8s.io/snapshotter-secret-name: snapshotter-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: default

etcd Backup

1. Backup Job

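apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          # Host networking on a control-plane node makes 127.0.0.1:2379 reach etcd.
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
          restartPolicy: OnFailure
          containers:
          - name: etcd-backup
            image: registry.k8s.io/etcd:3.5.0
            command:
            - /bin/sh
            - -c
            - |
              ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db \
                --endpoints=https://127.0.0.1:2379 \
                --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                --cert=/etc/kubernetes/pki/etcd/server.crt \
                --key=/etc/kubernetes/pki/etcd/server.key
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup
            hostPath:
              path: /var/backups/etcd # assumed host directory
              type: DirectoryOrCreate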

2. Restore Configuration

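apiVersion: v1
kind: Pod
metadata:
  name: etcd-restore
spec:
  # Run once on the node that holds the snapshot (for example via nodeName).
  restartPolicy: Never
  containers:
  - name: etcd
    image: registry.k8s.io/etcd:3.5.0
    command:
    - /bin/sh
    - -c
    - |
      ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
        --data-dir=/var/lib/etcd-restore \
        --name=restored-etcd \
        --initial-cluster=restored-etcd=https://127.0.0.1:2380 \
        --initial-cluster-token=restored-etcd-token \
        --initial-advertise-peer-urls=https://127.0.0.1:2380
    volumeMounts:
    - name: backup
      mountPath: /backup
    - name: restore
      mountPath: /var/lib/etcd-restore
  volumes:
  - name: backup
    hostPath:
      path: /var/backups/etcd # assumed host directory holding etcd-snapshot.db
  - name: restore
    hostPath:
      path: /var/lib/etcd-restore # must be empty before the restore
      type: DirectoryOrCreate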

Monitoring Setup

1. Backup Monitoring

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backup-monitor
spec:
  selector:
    matchLabels:
      app: velero
  endpoints:
  - port: metrics

2. Alert Configuration

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: backup-alerts
spec:
  groups:
  - name: backup
    rules:
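    - alert: BackupFailed
      # The metric is a counter that never resets, so alert on new failures
      # in the last hour rather than on the lifetime total.
      expr: increase(velero_backup_failure_total[1h]) > 0
      for: 5m
      labels:
        severity: critical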
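
A failure counter only moves when a backup runs and fails; it stays silent if backups stop running altogether. A staleness rule can cover that gap. The sketch below assumes Velero's velero_backup_last_successful_timestamp metric is being scraped and that backups are created by a Schedule. The rule name, the 24-hour threshold, and the severity are illustrative choices, not values from this guide.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: backup-staleness-alerts   # hypothetical name
spec:
  groups:
  - name: backup-staleness
    rules:
    - alert: BackupStale
      # Seconds since the last successful backup per schedule; 86400 s = 24 h (assumed threshold)
      expr: time() - velero_backup_last_successful_timestamp{schedule!=""} > 86400
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "No successful Velero backup for schedule {{ $labels.schedule }} in 24h"
```

The threshold should be somewhat longer than the backup interval so that a single slow run does not trigger the alert.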

Best Practices Checklist

  1. ✅ Regular backups
  2. ✅ Multiple backup locations
  3. ✅ Encryption
  4. ✅ Retention policy
  5. ✅ Backup testing
  6. ✅ Monitoring
  7. ✅ Documentation
  8. ✅ Access control
  9. ✅ Disaster recovery
  10. ✅ Compliance
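
As one way to approach item 2, Velero can hold more than one BackupStorageLocation, and a backup selects one through spec.storageLocation. The following is a minimal sketch; the location name secondary, the bucket my-backup-bucket-dr, the region us-east-1, and the backup name offsite-backup are all hypothetical.

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: secondary                 # hypothetical second location
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-backup-bucket-dr   # hypothetical bucket in another region
  config:
    region: us-east-1
---
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: offsite-backup            # hypothetical name
  namespace: velero
spec:
  storageLocation: secondary      # write this backup to the second location
  includedNamespaces:
  - "*"
  ttl: 720h
```

Keeping the second bucket in a different region, or a different account, means a single regional outage or credential compromise is less likely to take out both copies.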

Backup Patterns

Full Backup

  • Complete cluster state
  • All resources
  • Volume data
  • Configuration

Incremental Backup

  • Changed resources
  • Delta snapshots
  • Efficient storage
  • Quick backup

Selective Backup

  • Critical namespaces
  • Important data
  • Configuration only
  • Application state
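
With Velero, the selective pattern can be sketched by narrowing namespaces, resource types, and labels on a Backup. The example below is illustrative only: the production namespace, the backup: critical label, and the backup name are assumptions, and snapshotVolumes is turned off to keep this a configuration-only backup.

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: critical-config-backup   # hypothetical name
  namespace: velero
spec:
  includedNamespaces:
  - production                   # hypothetical critical namespace
  includedResources:
  - deployments
  - configmaps
  - secrets
  labelSelector:
    matchLabels:
      backup: critical           # hypothetical label applied to important objects
  snapshotVolumes: false         # configuration only, no volume snapshots
  ttl: 720h
```

A narrow backup like this finishes quickly and is cheap to keep, so it can run more often than a full backup.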

Common Pitfalls

  1. ❌ Infrequent backups
  2. ❌ Untested restores
  3. ❌ Missing monitoring
  4. ❌ Poor encryption
  5. ❌ Insufficient retention

Restore Procedures

1. Velero Restore

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-production
  namespace: velero
spec:
  backupName: daily-backup
  includedNamespaces:
  - "*"
  excludedNamespaces:
  - kube-system
  restorePVs: true

2. Volume Restore

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
spec:
  dataSource:
    name: data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Backup Testing

1. Test Restore

apiVersion: batch/v1
kind: Job
metadata:
  name: backup-test
spec:
  template:
    spec:
      containers:
      - name: restore-test
        image: test:latest
        command: ["./test-restore.sh"]
      restartPolicy: Never
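
The contents of test-restore.sh are not shown here. One common way to exercise a restore without touching live workloads is to restore into a scratch namespace using Velero's namespaceMapping. In this sketch the source namespace production and the target namespace restore-test are hypothetical; daily-backup is the backup defined earlier in this guide.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-test
  namespace: velero
spec:
  backupName: daily-backup
  includedNamespaces:
  - production                 # hypothetical namespace to verify
  namespaceMapping:
    production: restore-test   # restore into a scratch namespace instead of the original
```

After validation, deleting the scratch namespace removes the restored copies.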

2. Validation

apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-validation
data:
  validate.sh: |
    #!/bin/bash
    # Stop at the first failing check so the exit code reflects the result
    set -euo pipefail
    # Validate restored resources
    kubectl get all --all-namespaces
    # Validate restored data
    kubectl exec database-0 -- mysql -e "SELECT COUNT(*) FROM users"

Compliance Requirements

1. Retention Policy

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: compliance-backup
  namespace: velero
spec:
  schedule: "0 0 * * *"
  template:
    ttl: 2160h # 90 days
    includedNamespaces:
    - "*"

2. Audit Configuration

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
  - group: velero.io
    resources: ["backups", "restores"]

Conclusion

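Together, Velero backups, volume snapshots, and etcd snapshots cover the cluster's resources, data, and control-plane state. A backup only counts once a restore from it has been tested, so run restore tests on a schedule and revisit these procedures whenever the cluster changes.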
