ArgoCD Monitoring and Alerting
Comprehensive guide for monitoring ArgoCD and setting up alerts
This guide covers monitoring and alerting for ArgoCD: Prometheus metrics collection, Grafana dashboards, alert rules, notification templates, logging, and performance tuning.
Prometheus Integration
1. ServiceMonitor Configuration
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
    - port: metrics
      interval: 30s
  namespaceSelector:
    matchNames:
      - argocd
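2. API Server Metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server-metrics
  namespace: argocd
  labels:
    # Same label as argocd-metrics so the same Prometheus instance selects it
    release: prometheus
spec:
  selector:
    matchLabels:
      # The metrics port is exposed by the argocd-server-metrics Service,
      # not by the argocd-server Service itself
      app.kubernetes.io/name: argocd-server-metrics
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s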
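The repo server is scraped the same way. The following is a minimal sketch, assuming a default installation in which the `argocd-repo-server` Service carries the label `app.kubernetes.io/name: argocd-repo-server` and exposes a port named `metrics`. The resource name `argocd-repo-server-metrics` and the `release: prometheus` label are assumptions carried over from the examples above and should match your Prometheus selector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-repo-server-metrics   # assumed name, pick any
  namespace: argocd
  labels:
    release: prometheus              # assumed to match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-repo-server
  endpoints:
    - port: metrics
      interval: 30s
```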
Grafana Dashboards
1. ArgoCD Overview Dashboard
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "true"
data:
  argocd-overview.json: |
    {
      "title": "ArgoCD Overview",
      "panels": [
        {
          "title": "Sync Status",
          "type": "gauge",
          "datasource": "Prometheus",
          "targets": [
            {
              "expr": "sum(argocd_app_info{sync_status=\"Synced\"})"
            }
          ]
        },
        {
          "title": "Health Status",
          "type": "gauge",
          "datasource": "Prometheus",
          "targets": [
            {
              "expr": "sum(argocd_app_info{health_status=\"Healthy\"})"
            }
          ]
        }
      ]
    }
2. Application Performance Dashboard
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-performance-dashboard
  namespace: monitoring
  labels:
    # Needed for the Grafana sidecar to load this dashboard, as in the overview example
    grafana_dashboard: "true"
data:
  app-performance.json: |
    {
      "title": "Application Performance",
      "panels": [
        {
          "title": "Sync Duration",
          "type": "graph",
          "targets": [
            {
              "expr": "rate(argocd_app_sync_duration_seconds_sum[5m])"
            }
          ]
        },
        {
          "title": "Resource Operations",
          "type": "graph",
          "targets": [
            {
              "expr": "sum(rate(argocd_app_k8s_request_total[5m])) by (verb)"
            }
          ]
        }
      ]
    }
Alert Rules
1. Sync Status Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: monitoring
spec:
  groups:
    - name: argocd
      rules:
        - alert: ApplicationOutOfSync
          # argocd_app_info carries sync_status and health_status as labels
          expr: |
            sum(argocd_app_info{sync_status!="Synced"}) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Application out of sync for more than 15 minutes
            description: "{{ $value }} applications are out of sync"
        - alert: ApplicationUnhealthy
          expr: |
            sum(argocd_app_info{health_status!="Healthy"}) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Application health check failed
            description: "{{ $value }} applications are unhealthy"
2. Performance Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-performance-alerts
  namespace: monitoring
spec:
  groups:
    - name: argocd-performance
      rules:
        - alert: HighSyncFailureRate
          # argocd_app_sync_total labels the sync result as "phase"
          expr: |
            rate(argocd_app_sync_total{phase="Failed"}[5m]) > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: High sync failure rate detected
        - alert: SlowSync
          expr: |
            argocd_app_sync_duration_seconds > 300
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Sync operation taking too long
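The rules above tag each alert with `severity: warning` or `severity: critical`. One way to act on those labels is an Alertmanager routing tree. The following is a minimal sketch of an Alertmanager configuration; the receiver names (`default`, `team-chat`, `oncall`) are hypothetical placeholders and have no integrations configured.

```yaml
route:
  receiver: default            # fallback for alerts that match no child route
  group_by: [alertname]
  routes:
    - matchers:
        - severity="critical"
      receiver: oncall         # hypothetical receiver for paging
    - matchers:
        - severity="warning"
      receiver: team-chat      # hypothetical receiver for a chat channel
receivers:
  - name: default
  - name: oncall
  - name: team-chat
```

Each receiver would still need a concrete integration (for example a Slack or email configuration) before any notification is delivered.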
Notification Templates
1. Slack Notifications
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  service.slack: |
    token: $slack-token
  template.app-sync-status: |
    message: |
      Application {{.app.metadata.name}} sync status is {{.app.status.sync.status}}
      Application details: {{.context.argocdUrl}}/applications/{{.app.metadata.name}}
  template.app-health-status: |
    message: |
      Application {{.app.metadata.name}} health status is {{.app.status.health.status}}
      Application details: {{.context.argocdUrl}}/applications/{{.app.metadata.name}}
  trigger.on-sync-status-changed: |
    - when: app.status.sync.status == 'OutOfSync'
      send: [app-sync-status]
  trigger.on-health-status-changed: |
    - when: app.status.health.status == 'Degraded'
      send: [app-health-status]
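The ConfigMap above only defines the Slack service, templates, and triggers. For a message to be sent, the `$slack-token` reference has to resolve to a key in `argocd-notifications-secret`, and an Application has to subscribe to a trigger. The following is a minimal sketch; the channel name `deployments` and the token placeholder are hypothetical, and the second document is a fragment to merge into an existing Application rather than a complete manifest.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: argocd-notifications-secret
  namespace: argocd
stringData:
  slack-token: <your-slack-bot-token>   # placeholder, replace with a real token
---
# Fragment: add these annotations to an existing Application
metadata:
  annotations:
    # Format: notifications.argoproj.io/subscribe.<trigger>.<service>: <recipient>
    notifications.argoproj.io/subscribe.on-sync-status-changed.slack: deployments
    notifications.argoproj.io/subscribe.on-health-status-changed.slack: deployments
```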
2. Email Notifications
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  # Same ConfigMap as the Slack example: merge these keys into it
  # rather than applying a second object with the same name
  service.email: |
    host: smtp.gmail.com
    port: 587
    from: argocd@yourcompany.com
  template.app-sync-failed: |
    email:
      subject: Application {{.app.metadata.name}} sync failed
    message: |
      Application {{.app.metadata.name}} sync operation failed.
      Time: {{.app.status.operationState.finishedAt}}
      Error: {{.app.status.operationState.message}}
  trigger.on-sync-failed: |
    - when: app.status.operationState != nil and app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]
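Most SMTP servers, including the `smtp.gmail.com` host used above, require authentication. The following is a minimal sketch of the email service with credentials and of an Application subscribing to the `on-sync-failed` trigger. The secret key names `email-username` and `email-password` and the recipient address are assumptions; the keys would need to exist in `argocd-notifications-secret`.

```yaml
# Keys to merge into the existing argocd-notifications-cm
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.email: |
    host: smtp.gmail.com
    port: 587
    from: argocd@yourcompany.com
    username: $email-username   # assumed key in argocd-notifications-secret
    password: $email-password   # assumed key in argocd-notifications-secret
---
# Fragment: add this annotation to an existing Application
metadata:
  annotations:
    notifications.argoproj.io/subscribe.on-sync-failed.email: team@yourcompany.com
```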
Custom Metrics
1. Application-specific Metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: myapp
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
      metricRelabelings:
        # Keep only series whose metric name starts with app_
        - sourceLabels: [__name__]
          regex: 'app_.*'
          action: keep
2. Resource Usage Metrics
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-metrics
spec:
  groups:
    - name: resources
      rules:
        - record: argocd:app_resource_usage:memory
          expr: |
            sum(
              container_memory_usage_bytes{container!=""}
            ) by (namespace, pod)
        - record: argocd:app_resource_usage:cpu
          expr: |
            sum(
              rate(container_cpu_usage_seconds_total{container!=""}[5m])
            ) by (namespace, pod)
Logging Configuration
1. Logging Setup
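apiVersion: v1
kind: ConfigMap
metadata:
  # Component log settings are read from argocd-cmd-params-cm, not argocd-cm
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  controller.log.level: "debug"
  controller.log.format: "json"
  reposerver.log.level: "debug"
  reposerver.log.format: "json"
  server.log.level: "debug"
  server.log.format: "json"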
2. Log Aggregation
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: argocd-logs
spec:
  filters:
    - tag_normaliser: {}
    - parser:
        remove_key_name_field: true
        reserve_data: true
        parse:
          type: json
  match:
    - select:
        labels:
          app.kubernetes.io/name: argocd-server
  localOutputRefs:
    - elasticsearch
Best Practices Checklist
- Set up basic metrics
- Configure detailed dashboards
- Implement alerting
- Enable notifications
- Monitor resources
- Track performance
- Aggregate logs
- Add custom metrics
- Review monitoring configuration regularly
- Document the setup
Performance Optimization
1. Resource Limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
spec:
  template:
    spec:
      containers:
        - name: argocd-server
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1024Mi
2. Scaling Configuration
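apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
spec:
  replicas: 3
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                # Without a labelSelector the term matches no pods and has no effect
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: argocd-repo-server
                topologyKey: kubernetes.io/hostname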
Conclusion
Monitoring and alerting are essential to keeping an ArgoCD installation healthy. Review dashboards, alert thresholds, and notification triggers regularly so they keep pace with the applications ArgoCD manages.