Conversation
Implements periodic worker heartbeat RPCs that report worker status, slot usage, poller info, and task counters to the server. Key components: - HeartbeatManager: per-namespace scheduler that aggregates heartbeats from all workers sharing that namespace - PollerTracker: tracks in-flight poll count and last successful poll time - WorkflowClientOptions.workerHeartbeatInterval: configurable interval (default 60s, range 1-60s, negative to disable) - TrackingSlotSupplier: extended with slot type reporting - Worker: builds SharedNamespaceWorker heartbeat data from activity, workflow, and nexus worker stats - TestWorkflowService: implements recordWorkerHeartbeat, describeWorker, and shutdownWorker RPCs for testing
9eeea9b to
bfd3fc6
Compare
temporal-sdk/src/main/java/io/temporal/internal/worker/ActivityWorker.java
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/worker/ActivityWorker.java
Show resolved
Hide resolved
| private final AtomicInteger totalProcessedTasks = new AtomicInteger(); | ||
| private final AtomicInteger totalFailedTasks = new AtomicInteger(); |
There was a problem hiding this comment.
All the workers now have something like this - can we abstract these out into something else? Seems like we could use a shared interface or base class.
There was a problem hiding this comment.
moved into a new TaskCounter class
| DescribeNamespaceRequest.newBuilder() | ||
| .setNamespace(workflowClient.getOptions().getNamespace()) | ||
| .build()); | ||
| boolean heartbeatsSupported = |
There was a problem hiding this comment.
In my PR I've bundled a bunch of capability stuff up into a class, FYI, this will want to go in there
There was a problem hiding this comment.
Thanks for the heads up, had Claude pull in your new class, so merge conflict should be minimal now. Feel free to merge first
temporal-sdk/src/test/java/io/temporal/internal/worker/HeartbeatManagerTest.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/test/java/io/temporal/worker/WorkerHeartbeatIntegrationTest.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/test/java/io/temporal/worker/WorkerHeartbeatIntegrationTest.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/test/java/io/temporal/worker/WorkerHeartbeatIntegrationTest.java
Outdated
Show resolved
Hide resolved
… use separate TaskCounter, get TaskQueueType live not just on start
temporal-sdk/src/main/java/io/temporal/internal/worker/LocalActivityWorker.java
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/worker/WorkflowWorker.java
Outdated
Show resolved
Hide resolved
temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowService.java
Outdated
Show resolved
Hide resolved
temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowService.java
Show resolved
Hide resolved
… anticipate merge conflict
temporal-sdk/src/main/java/io/temporal/internal/worker/WorkflowWorker.java
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/worker/WorkflowWorker.java
Outdated
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/worker/HeartbeatManager.java
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/worker/WorkflowWorker.java
Show resolved
Hide resolved
temporal-sdk/src/main/java/io/temporal/internal/worker/HeartbeatManager.java
Show resolved
Hide resolved
…r testNoHeartbeatsSentWhenDisabled
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit ff47c66. Configure here.
| // Only signal shutdown — don't awaitTermination from within the scheduler's own thread | ||
| shuttingDown.set(true); | ||
| scheduler.shutdown(); | ||
| return; |
There was a problem hiding this comment.
HeartbeatManager UNIMPLEMENTED handling prevents proper shutdown cleanup
Low Severity
When the server returns UNIMPLEMENTED, heartbeatTick calls shuttingDown.set(true) and scheduler.shutdown() (graceful). Later, when HeartbeatManager.shutdown() is called, SharedNamespaceWorker.shutdown() uses compareAndSet(false, true) which fails since shuttingDown is already true, causing it to return immediately without calling shutdownNow() or awaitTermination(). This means HeartbeatManager.shutdown() won't wait for the scheduler thread to fully terminate.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit ff47c66. Configure here.


What was changed
Implements periodic worker heartbeat RPC that reports worker status, slot usage, poller info, host metrics, and sticky cache counters to the Temporal server. Includes HeartbeatManager, PollerTracker, and integration tests covering all heartbeat fields.
HeartbeatManager— Per-namespace heartbeat scheduler. Workers register/unregister; scheduler fires at the configured interval. Gracefully shuts down if server returns UNIMPLEMENTED.PollerTracker— Tracks in-flight poll count and last successful poll time per worker type. Only records success when a poll returns actual work.WorkflowClientOptions.workerHeartbeatInterval— New option to configure heartbeat interval. Defaults to 60s. Can be set between 1-60s, or a negative duration to disable.Worker.getActiveTaskQueueTypes()— Reports WORKFLOW, ACTIVITY, and NEXUS (only when Nexus services are registered, matching Go SDK).Worker.buildHeartbeat()— Assembles the full WorkerHeartbeat proto with slot info, poller info, host metrics, sticky cache counters, and timestamps.TrackingSlotSupplier.getSlotSupplierKind()— Reports FixedSize vs ResourceBased in heartbeats.Why?
New feature!
Checklist
Closes Worker Heartbeating #2716
How was this tested:
Note
Medium Risk
Adds a new periodic heartbeat RPC path and threads that run in production workers, and changes worker shutdown requests/poller instrumentation, which could affect load and shutdown behavior if misconfigured or server capability detection is wrong.
Overview
Implements periodic worker heartbeating to Temporal Server: workers now register a per-namespace scheduled heartbeat callback that reports status plus runtime stats (slot usage, poller counts/last success, sticky cache hit/miss counts, host metrics, plugin info, and optional deployment version) and disables itself on
UNIMPLEMENTED.Introduces new tracking primitives (
PollerTracker,TaskCounter,NamespaceCapabilities) and wires them through workflow/activity/nexus pollers and task handlers so heartbeats can report in-flight pollers and processed/failed task counts;TrackingSlotSuppliernow exposes supplier kind and used-slot count for reporting.Extends
WorkflowClientOptionswith experimentalworkerHeartbeatInterval(default 60s, 1–60s allowed, negative disables) and updates worker shutdown to send richerShutdownWorkerRequestincludingtaskQueue,workerInstanceKey, active task queue types, and a final shutting down heartbeat. CI dev-server startup enablesfrontend.ListWorkersEnabledto support new worker listing tests.Reviewed by Cursor Bugbot for commit ff47c66. Bugbot is set up for automated code reviews on this repo. Configure here.