
[otap-df-telemetry] add duration metric helper for processors #2211

Merged
jmacd merged 45 commits into open-telemetry:main from
jmacd:jmacdonald/processor-duration-histogram
Mar 17, 2026

Conversation

@jmacd (Contributor) commented Mar 5, 2026

Add a node.processor metric set with process.success.duration and process.failed.duration Mmsc instruments for measuring the wall-clock duration of the work done in a process() call. A closure is used to prevent the inclusion of async-await points in the measurement.

The metric is registered via the node telemetry context.

This is intended to be gated by MetricLevel >= Normal.

Fixes #2210.
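The "closure prevents async-await points" idea can be sketched in a few lines. This is a minimal stand-in, not the actual otap-df-telemetry API: `DurationRecorder` here plays the role of the real `Mmsc` instrument (which also tracks min/max), and only sum and count are kept. Because `timed()` takes a plain synchronous `FnOnce`, the compiler makes it impossible for an `.await` suspension to land inside the measured span.

```rust
use std::time::Instant;

/// Hypothetical stand-in for the `Mmsc` duration instrument:
/// accumulates observed durations in nanoseconds.
#[derive(Default)]
struct DurationRecorder {
    sum_nanos: u64,
    count: u64,
}

impl DurationRecorder {
    /// Time a synchronous closure. `f` is an `FnOnce`, not a future,
    /// so `.await` cannot appear inside it; the measurement can never
    /// include async suspension time.
    fn timed<T>(&mut self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.sum_nanos += start.elapsed().as_nanos() as u64;
        self.count += 1;
        out
    }
}
```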

jmacd and others added 5 commits March 4, 2026 22:38
Add a node.processor metric set with a process.duration Mmsc instrument
that automatically measures the wall-clock duration of every processor's
process() call. The timing is performed at the engine wrapper level in
ProcessorWrapper::start(), so all processors are instrumented without
requiring changes to the Processor trait or individual implementations.

The metric is registered via the node telemetry context during
prepare_runtime() and reported on each CollectTelemetry interval
(1-second periodic) as well as at final shutdown.

This is intended to be gated by MetricLevel >= Normal once open-telemetry#2169 merges.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace engine-level automatic process() timing with an opt-in
approach where processors time their own compute sections using
the new Mmsc::timed() scoped API.

Processors opt in by:
- Adding a process_duration: Mmsc field to their metric set
- Wrapping their compute section in timed()
- Reporting via the existing CollectTelemetry flow

Instrumented processors: filter, attributes, KQL recordset,
and condense attributes. Processors like retry and fanout that
don't do meaningful compute are not instrumented.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nto jmacdonald/processor-duration-histogram
Check effect_handler.node_interests() for CONSUMER_METRICS before
calling timed(). At Basic or None level, the Instant::now() overhead
is skipped entirely. Aligns with the MetricLevel policy from PR open-telemetry#2169.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move process_duration: Mmsc from each processor's individual metric
set into a shared ProcessDuration type in the engine crate. This
encapsulates the metric set, the MetricLevel gating (via Interests),
and reporting into a single reusable wrapper.

Each call site is now a single line:
  self.process_duration.timed(interests, || compute())

Deleted per-processor metrics files that only contained
process_duration (KQL, condense). Removed process_duration from
filter and attributes metric sets that have other counters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
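The one-line call-site shape described above can be sketched as follows, with hypothetical names (`CONSUMER_METRICS` as a plain bitflag, a simplified `ProcessDuration` holding only sum and count). The point of the gate is that at Basic or None level even `Instant::now()` is skipped:

```rust
use std::time::Instant;

/// Hypothetical interest bitflag standing in for the engine's Interests type.
const CONSUMER_METRICS: u32 = 0b1;

/// Simplified stand-in for the shared ProcessDuration wrapper.
#[derive(Default)]
struct ProcessDuration {
    sum_nanos: u64,
    count: u64,
}

impl ProcessDuration {
    /// Run `f`, recording its wall-clock duration only when some
    /// consumer has registered interest in metrics.
    fn timed<T>(&mut self, interests: u32, f: impl FnOnce() -> T) -> T {
        if interests & CONSUMER_METRICS == 0 {
            // Basic/None level: skip even the Instant::now() overhead.
            return f();
        }
        let start = Instant::now();
        let out = f();
        self.sum_nanos += start.elapsed().as_nanos() as u64;
        self.count += 1;
        out
    }
}
```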
@github-actions github-actions bot added the rust Pull requests that update Rust code label Mar 5, 2026
codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 87.16216% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.71%. Comparing base (87aaf03) to head (6deafe3).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #2211    +/-   ##
========================================
  Coverage   87.71%   87.71%            
========================================
  Files         575      577     +2     
  Lines      197290   198107   +817     
========================================
+ Hits       173044   173775   +731     
- Misses      23720    23806    +86     
  Partials      526      526            
Components Coverage Δ
otap-dataflow 89.74% <87.16%> (-0.01%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.61% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 52.44% <ø> (ø)
quiver 91.91% <ø> (ø)

jmacd and others added 4 commits March 5, 2026 19:22
Replace the closure-based timed() approach and mutable-reference guard
with a lightweight TimingGuard that captures the start instant without
borrowing the ProcessDuration. This avoids borrow checker conflicts
when the processor needs to call methods on &self/&mut self between
start and stop.

API: ProcessDuration::start(interests) -> TimingGuard
     timing.stop(&mut self.process_duration)

All four processors (filter, attributes, KQL recordset, condense)
now use this pattern with minimal diff against upstream - no code
re-indentation or method extraction required.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
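The guard pattern in this commit can be sketched like so (names and types are simplified stand-ins, not the real engine API). The key property is that `TimingGuard` owns only the start `Instant` and never borrows the `ProcessDuration`, so between `start` and `stop` the processor is free to take `&self` or `&mut self` borrows:

```rust
use std::time::Instant;

/// Guard holding only the start instant; it borrows nothing,
/// avoiding borrow-checker conflicts with the processor's fields.
struct TimingGuard {
    start: Option<Instant>,
}

/// Simplified stand-in for the duration metric wrapper.
#[derive(Default)]
struct ProcessDuration {
    sum_nanos: u64,
    count: u64,
}

impl ProcessDuration {
    /// Capture the start instant only when metrics are enabled.
    fn start(metrics_enabled: bool) -> TimingGuard {
        TimingGuard { start: metrics_enabled.then(Instant::now) }
    }

    /// Consume the guard and record the elapsed time, if any was captured.
    fn stop(&mut self, guard: TimingGuard) {
        if let Some(start) = guard.start {
            self.sum_nanos += start.elapsed().as_nanos() as u64;
            self.count += 1;
        }
    }
}
```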
Introduce a Timer type that encapsulates Instant and the
elapsed-to-nanos conversion. Mmsc::timed() and TimingGuard now
both delegate to Timer, eliminating raw Instant usage from callers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove elapsed_nanos() from Timer. Instead, Mmsc::record_timer()
consumes the Timer directly and handles the elapsed-to-nanos
conversion internally. TimingGuard::stop() now delegates straight
to record_timer().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Enables Normal+ level metrics (including process_duration) so the
example config demonstrates the full metrics picture out of the box.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jmacd jmacd marked this pull request as ready for review March 5, 2026 20:28
@jmacd jmacd requested a review from a team as a code owner March 5, 2026 20:28
Comment thread rust/otap-dataflow/crates/engine/src/process_duration.rs Outdated
Comment thread rust/otap-dataflow/crates/engine/src/process_duration.rs Outdated
Comment thread rust/otap-dataflow/crates/engine/src/process_duration.rs Outdated
Comment thread rust/otap-dataflow/crates/otap/src/filter_processor.rs Outdated
jmacd (Contributor, Author) commented Mar 12, 2026

e.g.,

# HELP process_duration_min Wall-clock duration of the processor compute section, in nanoseconds.
# TYPE process_duration_min gauge
process_duration_min{set="processor.process.duration",node_id="debug",node_urn="urn:otel:processor:debug",node_type="processor",pipeline_id="main",pipeline_group_id="default",core_id="19",numa_node_id="0",process_instance_id="AGOOAMAILN73HA4PQSJ3WPDKSA",host_id="JoshCorpSurfaceLaptop",container_id="",custom="{node_sequence=2, pipeline_name=MyCustomPipelineName}"} 4139070 1773287780962
# HELP process_duration_max Wall-clock duration of the processor compute section, in nanoseconds.
# TYPE process_duration_max gauge
process_duration_max{set="processor.process.duration",node_id="debug",node_urn="urn:otel:processor:debug",node_type="processor",pipeline_id="main",pipeline_group_id="default",core_id="19",numa_node_id="0",process_instance_id="AGOOAMAILN73HA4PQSJ3WPDKSA",host_id="JoshCorpSurfaceLaptop",container_id="",custom="{node_sequence=2, pipeline_name=MyCustomPipelineName}"} 4409694 1773287780962
# HELP process_duration_sum Wall-clock duration of the processor compute section, in nanoseconds.
# TYPE process_duration_sum counter
process_duration_sum{set="processor.process.duration",node_id="debug",node_urn="urn:otel:processor:debug",node_type="processor",pipeline_id="main",pipeline_group_id="default",core_id="19",numa_node_id="0",process_instance_id="AGOOAMAILN73HA4PQSJ3WPDKSA",host_id="JoshCorpSurfaceLaptop",container_id="",custom="{node_sequence=2, pipeline_name=MyCustomPipelineName}"} 8548764 1773287780962
# HELP process_duration_count Wall-clock duration of the processor compute section, in nanoseconds.
# TYPE process_duration_count counter
process_duration_count{set="processor.process.duration",node_id="debug",node_urn="urn:otel:processor:debug",node_type="processor",pipeline_id="main",pipeline_group_id="default",core_id="19",numa_node_id="0",process_instance_id="AGOOAMAILN73HA4PQSJ3WPDKSA",host_id="JoshCorpSurfaceLaptop",container_id="",custom="{node_sequence=2, pipeline_name=MyCustomPipelineName}"} 2 1773287780962

jmacd and others added 2 commits March 16, 2026 19:15
…nto jmacdonald/processor-duration-histogram

Revert to Cell<Mmsc> with &self on timed() so the closure can
borrow sibling fields on the processor.  This structurally prevents
timing from spanning .await points (reviewer feedback).

- Remove start_timer() and record_elapsed() — all 5 processors now
  use timed() exclusively
- Debug processor: timed() wraps sync proto decode only
- Filter/KQL/attributes/condense: timed() wraps sync compute work

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
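A sketch of the `Cell`-based `&self` approach this commit reverts to (simplified stand-in types, not the actual API): because `timed()` takes `&self` and the instrument lives in a `Cell`, the closure can freely borrow sibling data mutably, while the synchronous `FnOnce` bound still structurally excludes `.await` points from the span:

```rust
use std::cell::Cell;
use std::time::Instant;

/// Simplified stand-in for the Mmsc instrument; Copy so it fits in a Cell.
#[derive(Default, Clone, Copy)]
struct Mmsc {
    sum_nanos: u64,
    count: u64,
}

#[derive(Default)]
struct ComputeDuration {
    inner: Cell<Mmsc>,
}

impl ComputeDuration {
    /// `&self` receiver: the caller keeps full freedom to borrow other
    /// fields of the processor inside the closure.
    fn timed<T>(&self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        // Cell get/set round-trip: zero runtime borrow-check cost.
        let mut m = self.inner.get();
        m.sum_nanos += start.elapsed().as_nanos() as u64;
        m.count += 1;
        self.inner.set(m);
        out
    }
}
```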
Comment thread rust/otap-dataflow/crates/engine/src/process_duration.rs Outdated
jmacd and others added 3 commits March 16, 2026 21:10
All processor call sites now use effect_handler.timed(&self.compute_duration, || { ... })
instead of self.compute_duration.timed(effect_handler.node_interests(), || { ... }).

The EffectHandler already knows its interests, so processors no longer
need to thread them through manually.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Constrain timed() to FnOnce() -> Result<T, E> and route elapsed time
into separate compute.duration.success and compute.duration.failed
instruments based on the closure outcome. This lets operators
distinguish compute time from error-path time in duration metrics.

All callers already return Result, so no call-site changes needed.

Add end-to-end test verifying both metric names appear in the
telemetry registry after report().

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
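The success/failure routing can be sketched as below, again with hypothetical stand-in types. Constraining the closure to return `Result<T, E>` lets the wrapper pick the instrument after the fact, so call sites stay unchanged:

```rust
use std::cell::Cell;
use std::time::Instant;

/// Simplified stand-in holding separate success/failed duration instruments.
#[derive(Default)]
struct ComputeDuration {
    success_nanos: Cell<u64>,
    success_count: Cell<u64>,
    failed_nanos: Cell<u64>,
    failed_count: Cell<u64>,
}

impl ComputeDuration {
    /// Time the closure and route the elapsed nanoseconds into the
    /// success or failed instrument based on the Result outcome.
    fn timed<T, E>(&self, f: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
        let start = Instant::now();
        let out = f();
        let elapsed = start.elapsed().as_nanos() as u64;
        let (nanos, count) = if out.is_ok() {
            (&self.success_nanos, &self.success_count)
        } else {
            (&self.failed_nanos, &self.failed_count)
        };
        nanos.set(nanos.get() + elapsed);
        count.set(count.get() + 1);
        out
    }
}
```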
RefCell supports non-Copy types (e.g. Box<Histogram>, [u64; N] bucket
arrays) that Cell cannot handle. The borrow is non-reentrant by
construction since timed() runs a sync closure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread rust/otap-dataflow/crates/engine/src/process_duration.rs Outdated
Comment on lines +161 to +162
self.metrics.log_signals_consumed.add(log_signals_consumed);
self.metrics.log_signals_filtered.add(log_signals_filtered);
Contributor commented:
nit: The counter updates (self.metrics.log_signals_consumed.add(...), self.metrics.span_signals_consumed.add(...), etc.) are inside the timed() closure, so they're included in the compute duration measurement. These are cheap add() calls so the impact is minimal, but for purity the counters could be updated after the closure returns (similar to how the attributes processor updates its counters outside timed()).

jmacd (Contributor, Author) replied:
I agree, fixed. And also-- I sort of don't care whether metric updates are included in compute time (a) because the distortions will be small, (b) because compute is compute. 😀

Extract the repeated TelemetryRegistryHandle/ControllerContext/
PipelineContext boilerplate into a shared helper in
crates/engine/src/testing/mod.rs. Use it in process_duration tests
and condense_attributes_processor test helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment on lines +46 to +47
/// closure. `RefCell` is used instead of `Cell` to support future
/// instrument types that are not `Copy` (e.g. histogram buckets).
Contributor commented:
nit: Compared to Cell which has zero runtime borrow-check cost, RefCell does come with a runtime borrow-check cost for every borrow_mut call. I understand that Cell might not help in future when we move to Histograms, but I guess that switch could have been made when we actually needed it.

jmacd (Contributor, Author) replied:
Replaced with Cell for now!

Comment thread rust/otap-dataflow/configs/fake-kql-debug-noop.yaml Outdated
jmacd and others added 2 commits March 16, 2026 16:20
Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>
}
})?;
let req =
effect_handler.timed(&self.compute_duration, || {
Member commented:
This timing only covers protobuf decode. Most of the debug processor’s actual work happens afterward in process_log / process_metric / process_trace, so this metric will underreport the real processing cost.

jmacd (Contributor, Author) replied:
I've gone back and forth on this! The blocking calls to the console will appear as compute if we're not careful, but I guess the work in formatting can be measured. Not sure if it's worth it. I mainly added this in the debug processor to test the code path because it's used in many test configs.

Member replied:
Yeah it's just an observation, not blocking specifically for debug processor.

jmacd and others added 2 commits March 17, 2026 00:06
…ters outside timed()

Address PR review feedback:
- Replace RefCell<Mmsc> with Cell<Mmsc> for zero runtime borrow-check
  overhead, since Mmsc is Copy.
- Move filter processor counter .add() calls outside the timed() closure
  so only actual compute work is measured.
- Fix pre-existing test bug with swapped metric field name order.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nto jmacdonald/processor-duration-histogram
jmacd and others added 4 commits March 17, 2026 08:10
…jmacd/otel-arrow into jmacdonald/processor-duration-histogram
The method only uses &self.pipeline immutably, so &mut self was
unnecessary and caused a borrow conflict with &self.compute_duration
inside the timed() closure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jmacd jmacd enabled auto-merge March 17, 2026 18:25
@jmacd jmacd added this pull request to the merge queue Mar 17, 2026
Merged via the queue into open-telemetry:main with commit f71e43b Mar 17, 2026
67 of 68 checks passed
@jmacd jmacd deleted the jmacdonald/processor-duration-histogram branch March 17, 2026 19:14
cijothomas pushed a commit to cijothomas/otel-arrow that referenced this pull request Mar 17, 2026
…elemetry#2211)

Add a `node.processor` metric set with a `process.success.duration` and
`process.failed.duration` Mmsc instrument for measuring the wall-clock
duration of the work done in a process() call. A closure is used to
prevent inclusion of async-await points in the measurement.

The metric is registered via the node telemetry context.

This is intended to be gated by MetricLevel >= Normal

Fixes open-telemetry#2210.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>

Labels

rust Pull requests that update Rust code

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[otap-df-telemetry] Processor duration measurement helper

4 participants