Cardinality management

What cardinality is

Cardinality is the number of unique combinations of label values. Each combination is a separate time series in the TSDB. As a rough scale: 10k series is nothing, 1M hurts, 10M brings the TSDB down.
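
As a rough illustration of how quickly this multiplies, the worst-case series count for one metric is the product of the distinct-value counts of its labels. The label names and counts below are invented for the example, not taken from a real system.

```python
from math import prod

# Hypothetical distinct-value counts per label for a single metric.
label_value_counts = {
    "method": 5,
    "status": 8,
    "path": 50,
    "user_id": 5000,
}

def max_series(counts):
    """Upper bound on series count: product of distinct values per label."""
    return prod(counts.values())

with_user_id = max_series(label_value_counts)
without_user_id = max_series(
    {k: v for k, v in label_value_counts.items() if k != "user_id"}
)

print(with_user_id)     # 10000000
print(without_user_id)  # 2000
```

In practice not every combination occurs, so the real count is usually lower, but a single unbounded label is enough to move a metric from the harmless range into the dangerous one.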

The main sources of an explosion

  1. user_id / session_id in labels — thousands of users × N other labels = millions of series
  2. HTTP path without templating, e.g. /users/123/orders instead of /users/:id/orders
  3. Timestamps in labels — never
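
For the path case, a minimal sketch of templating is shown here. It assumes that ID segments are either purely numeric or UUIDs; the function name template_path is hypothetical, and routes with other ID formats would need their own rules.

```python
import re

# Assumption: IDs are numeric segments or UUIDs; adjust for your routes.
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)

def template_path(path: str) -> str:
    """Replace ID-like path segments with ':id' so the label stays bounded."""
    out = []
    for segment in path.split("/"):
        if segment.isdigit() or _UUID.match(segment):
            out.append(":id")
        else:
            out.append(segment)
    return "/".join(out)

print(template_path("/users/123/orders"))  # /users/:id/orders
```

If the web framework already exposes the matched route pattern, using that as the label value is likely more reliable than guessing from the raw path.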

How to find it

Start by listing the org's metrics:

GET /api/v1/orgs/:slug/metrics

Then, for a suspicious metric, list its labels:

GET /api/v1/orgs/:slug/metrics/:name/labels

More than 1000 distinct values for a single label is a red flag.
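
The check itself can be sketched as below. The response format of the labels endpoint is not documented here, so the input is assumed to already be a mapping of label name to a list of observed values; the function name and sample data are hypothetical.

```python
RED_FLAG_THRESHOLD = 1000  # rule of thumb from this article

def high_cardinality_labels(label_values, threshold=RED_FLAG_THRESHOLD):
    """Return (label, distinct_count) pairs above the threshold, largest first."""
    flagged = [
        (label, len(set(values)))
        for label, values in label_values.items()
        if len(set(values)) > threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical data shaped as {label: [values...]}; the real response
# of the labels endpoint may look different.
sample = {
    "method": ["GET", "POST"],
    "user_id": [str(i) for i in range(5000)],
}
print(high_cardinality_labels(sample))  # [('user_id', 5000)]
```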

The fix

  • Template the URL path (/users/:id/...)
  • Drop labels on the agent via include/exclude configuration
  • Move user_id into logs, not metrics
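
Where the agent configuration cannot be changed, the same split can be approximated in application code. This is a sketch under assumptions: the excluded label names and the split_labels helper are hypothetical, and it uses only the standard logging module rather than any particular metrics client.

```python
import logging

logger = logging.getLogger("requests")

# Assumption: these are the unbounded labels in your setup.
EXCLUDED_LABELS = {"user_id", "session_id"}

def split_labels(labels):
    """Split labels into (metric_labels, log_fields)."""
    metric_labels = {k: v for k, v in labels.items() if k not in EXCLUDED_LABELS}
    log_fields = {k: v for k, v in labels.items() if k in EXCLUDED_LABELS}
    return metric_labels, log_fields

metric_labels, log_fields = split_labels(
    {"method": "GET", "path": "/users/:id/orders", "user_id": "123"}
)

# High-cardinality fields go to the log record, not to the metric.
logger.info("request handled", extra=log_fields)

# Only the bounded labels remain for the metric.
print(metric_labels)  # {'method': 'GET', 'path': '/users/:id/orders'}
```

The per-user detail stays searchable in logs, while the metric keeps a bounded label set.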