feat(ingester): Add cortex_ingester_active_metric_names gauge per user #7514
yeya24 wants to merge 1 commit into
// Not registered automatically, but only if activeSeriesEnabled is true.
activeMetricNamesPerUser: prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "cortex_ingester_active_metric_names",
	Help: "Number of unique metric names in the ingester head per user.",
Nit: how about "TSDB head" instead of "ingester head"?
- Help: "Number of unique metric names in the ingester head per user.",
+ Help: "Number of unique metric names in the TSDB head per user.",
userDB.activeSeries.Purge(purgeTime)
i.metrics.activeSeriesPerUser.WithLabelValues(userID).Set(float64(userDB.activeSeries.Active()))
i.metrics.activeNHSeriesPerUser.WithLabelValues(userID).Set(float64(userDB.activeSeries.ActiveNativeHistogram()))
i.metrics.activeMetricNamesPerUser.WithLabelValues(userID).Set(float64(userDB.seriesInMetric.ActiveMetricNames()))
Nit: the two lines above derive from userDB.activeSeries.*, which applies the -ingester.active-series-metrics-idle-timeout sliding window (10m default). This line derives from userDB.seriesInMetric, which counts anything still alive in the TSDB head, effectively up to the block range (~2h).
Since this gauge is grouped under the Active Series Tracker feature, operators will reasonably expect active_metric_names to use the same window as active_series. Two options:
1. Move the counter to ActiveSeries (track metric-name refcounts alongside active/activeNativeHistogram). Incremental cost on series creation/purge, O(1) read. Matches the window.
2. Keep using seriesInMetric, but tighten the Help text to "unique metric names with at least one series in the head (not windowed like active_series)" so the semantic difference is explicit.
Option 1 is cleaner; Option 2 is acceptable if the current data source is what you want.
> Since this gauge is grouped under the Active Series Tracker feature
Umm no, Active Series Tracker is a new feature introduced in #7476. It is different from the active series metrics we added a long time ago. They are not related.
I will remove this metric from the doc to avoid confusion.
f91ebe6 to e1f5c03
Expose the number of unique metric names (distinct __name__ values) per tenant in the ingester head as a new Prometheus gauge metric. The data is sourced from the existing seriesInMetric counter which already tracks series counts per metric name via TSDB lifecycle callbacks.

The metric is registered when -ingester.active-series-metrics-enabled is true (same gate as cortex_ingester_active_series) and updated in the same periodic loop alongside active series counts.

This enables operators to monitor metric name cardinality per tenant without additional overhead, as the underlying data structure already exists.

Signed-off-by: Ben Ye <benye@amazon.com>
e1f5c03 to 49190e6
Summary

Expose the number of unique metric names (distinct __name__ values) per tenant in the ingester head as a new Prometheus gauge metric: cortex_ingester_active_metric_names.

Motivation

Operators need visibility into metric name cardinality per tenant to detect cardinality explosions at the metric name level (as opposed to series level which cortex_ingester_active_series already covers).

Changes

- pkg/ingester/user_state.go: Added ActiveMetricNames() method to metricCounter that returns the total number of unique metric names across all shards.
- pkg/ingester/metrics.go: Added cortex_ingester_active_metric_names GaugeVec (labels: user), registered under activeSeriesEnabled gate, with cleanup in deletePerUserMetrics.
- pkg/ingester/ingester.go: Set the gauge in updateActiveSeries loop and clean up on TSDB close.
- pkg/ingester/user_state_test.go: Unit test for ActiveMetricNames().
- docs/configuration/v1-guarantees.md: Listed as experimental feature.

How it works

The seriesInMetric (metricCounter) already tracks a sharded map of metricName → seriesCount maintained via TSDB PostCreation/PostDeletion callbacks. The number of keys in this map equals the number of unique metric names in the head. No new data structures or tracking overhead is introduced.

Testing

- TestMetricCounter_ActiveMetricNames: verifies count increases/decreases correctly as series are added/removed.
- (TestExpandedCachePostings_Race and TestIngester_Push are unrelated race conditions on master).
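
As an illustration of the "How it works" section, the following is a simplified sketch of a sharded metricName → seriesCount map with a key-count read. It is an assumption-laden stand-in, not the code in pkg/ingester/user_state.go: the shard layout, the FNV hash, and the increase/decrease method names are chosen here for illustration only, and only ActiveMetricNames() is a name taken from the PR.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// metricCounterShard guards one slice of the metricName -> seriesCount map.
type metricCounterShard struct {
	mtx sync.Mutex
	m   map[string]int
}

// metricCounter is a simplified, hypothetical version of a sharded counter.
type metricCounter struct {
	shards []metricCounterShard
}

func newMetricCounter(numShards int) *metricCounter {
	shards := make([]metricCounterShard, numShards)
	for i := range shards {
		shards[i].m = make(map[string]int)
	}
	return &metricCounter{shards: shards}
}

// shardFor picks a shard by hashing the metric name (hash choice is an assumption).
func (c *metricCounter) shardFor(metric string) *metricCounterShard {
	h := fnv.New32a()
	h.Write([]byte(metric))
	return &c.shards[h.Sum32()%uint32(len(c.shards))]
}

// increase would be driven by a series-creation callback.
func (c *metricCounter) increase(metric string) {
	s := c.shardFor(metric)
	s.mtx.Lock()
	s.m[metric]++
	s.mtx.Unlock()
}

// decrease would be driven by a series-deletion callback; the key is
// removed once its last series is gone so that len(map) stays meaningful.
func (c *metricCounter) decrease(metric string) {
	s := c.shardFor(metric)
	s.mtx.Lock()
	defer s.mtx.Unlock()
	n, ok := s.m[metric]
	if !ok {
		return
	}
	if n <= 1 {
		delete(s.m, metric)
		return
	}
	s.m[metric] = n - 1
}

// ActiveMetricNames sums the number of keys across all shards.
func (c *metricCounter) ActiveMetricNames() int {
	total := 0
	for i := range c.shards {
		s := &c.shards[i]
		s.mtx.Lock()
		total += len(s.m)
		s.mtx.Unlock()
	}
	return total
}

func main() {
	c := newMetricCounter(4)
	c.increase("http_requests_total")
	c.increase("http_requests_total")
	c.increase("up")
	fmt.Println(c.ActiveMetricNames()) // 2

	c.decrease("up")
	fmt.Println(c.ActiveMetricNames()) // 1
}
```

Because each shard is locked separately, the total is not an atomic snapshot across shards, which is usually acceptable for a periodically updated gauge.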