Dashboard templates
SRE on-call
- 4 widgets at the top: error rate / p95 latency / RPS / saturation (the four golden signals)
- Variables: env=prod (static), service from a label_values(...) query
- Auto-refresh: 30s
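
As a rough illustration of the on-call template above, the sketch below builds a dashboard model as a plain Python dict and prints it as JSON. The field names follow the general shape of Grafana's dashboard JSON (refresh, templating.list, panels, gridPos), but the panel type, the grid sizes, and the label_values(up, service) query are assumptions for the example, not values from these notes.

```python
import json


def build_oncall_dashboard(service_label_query: str) -> dict:
    """Return a minimal Grafana-style dashboard model for the on-call template."""
    titles = ["Error rate", "p95 latency", "RPS", "Saturation"]
    # Four equal-width panels on the top row (assumes a 24-column grid).
    panels = [
        {
            "id": index + 1,
            "title": title,
            "type": "timeseries",  # assumed panel type
            "gridPos": {"h": 6, "w": 6, "x": 6 * index, "y": 0},
        }
        for index, title in enumerate(titles)
    ]
    return {
        "title": "SRE on-call",
        "refresh": "30s",  # auto-refresh interval from the notes
        "templating": {
            "list": [
                # Static variable pinned to prod.
                {"name": "env", "type": "custom", "query": "prod"},
                # Query-backed variable; the query string is supplied by the caller.
                {"name": "service", "type": "query", "query": service_label_query},
            ]
        },
        "panels": panels,
    }


# Hypothetical query: the metric and label names are placeholders.
dashboard = build_oncall_dashboard("label_values(up, service)")
print(json.dumps(dashboard, indent=2))
```

Generating the model from code keeps the four top-row widgets and the two variables identical across services. Real panels would also need data source and query targets, which are omitted here.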
Capacity planning
- 7-day CPU/RAM graphs across the fleet
- predict_linear for memory:
  predict_linear(node_memory_used_percent[7d], 86400 * 30)  # projected value 30 days from now
- Authentication metrics: new users / active / churn
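
To make the forecast less of a black box, here is a minimal Python sketch of the idea behind predict_linear: fit a least-squares line to the samples in the window, then extend it a fixed number of seconds past the evaluation time. It is an approximation for intuition, not Prometheus's implementation. The function name, the sample data, and the choice of the last sample's timestamp as the evaluation time are all assumptions.

```python
def predict_linear_sketch(samples, horizon_seconds, eval_time=None):
    """Fit a least-squares line to (timestamp, value) samples and
    extrapolate it horizon_seconds past eval_time."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a line")
    if eval_time is None:
        # Assumption: evaluate at the timestamp of the newest sample.
        eval_time = samples[-1][0]

    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("all samples share one timestamp")

    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + horizon_seconds)


DAY = 86400
# Made-up data: memory use grows one percentage point per day for a week.
samples = [(day * DAY, 50.0 + day) for day in range(8)]
forecast = predict_linear_sketch(samples, 30 * DAY)
print(round(forecast, 2))  # 87.0
```

A straight-line fit is not clamped, so a percentage metric can be forecast above 100. On a capacity panel that usually means the limit is reached before the 30 days are up.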
Business KPI
- Revenue (a custom metric)
- Signups / day
- Funnel: visits → signup → trial → paid
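
The funnel widget comes down to two ratios per stage: conversion from the previous stage, and conversion from the top of the funnel. A small sketch of that arithmetic, using invented counts and a hypothetical helper name:

```python
def funnel_conversion(stages):
    """Given ordered (name, count) stages, return one row per transition:
    (label, step_rate, overall_rate)."""
    rows = []
    top_count = stages[0][1]
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against empty stages so the panel shows 0 rather than failing.
        step_rate = count / prev_count if prev_count else 0.0
        overall_rate = count / top_count if top_count else 0.0
        rows.append((f"{prev_name} -> {name}", step_rate, overall_rate))
    return rows


# Illustrative numbers only, not real data.
stages = [("visits", 10000), ("signup", 800), ("trial", 200), ("paid", 50)]
rows = funnel_conversion(stages)
for label, step_rate, overall_rate in rows:
    print(f"{label}: step {step_rate:.1%}, overall {overall_rate:.2%}")
```

Showing both rates on the panel helps separate a weak step (low step rate) from a small audience at the top (low overall rate across every stage).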
Embed for customers
- 1-2 widgets: success rate + latency
- Public share with a 30-day TTL
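
The notes do not say which sharing mechanism backs the public link. Assuming it is a Grafana snapshot, whose create request accepts an expiry in seconds, the 30-day TTL could be computed as in this sketch. The helper name and the placeholder dashboard body are hypothetical, and no HTTP call is made.

```python
from datetime import datetime, timedelta, timezone

SHARE_TTL = timedelta(days=30)  # TTL from the notes


def snapshot_payload(dashboard: dict, now: datetime):
    """Build a snapshot-style request body and the matching expiry time."""
    expires_seconds = int(SHARE_TTL.total_seconds())
    # Assumption: the share endpoint takes the dashboard model plus "expires" in seconds.
    payload = {"dashboard": dashboard, "expires": expires_seconds}
    return payload, now + SHARE_TTL


created = datetime(2024, 1, 1, tzinfo=timezone.utc)  # example timestamp
payload, expires_at = snapshot_payload({"title": "Customer status"}, created)
print(payload["expires"], expires_at.isoformat())
```

Recording the computed expiry next to the customer record makes it easier to renew the link before it lapses.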