
BurstObservatory fails to timely health-check dynamically registered outbound handlers (simplified reverse proxy) #5750

@vemneyy


Integrity requirements

  • I have read all the comments in the issue template and ensured that this issue meets the requirements.
  • I confirm that I have read the documentation, understand the meaning of all the configuration items I wrote, and did not pile up seemingly useful options or default values.
  • I provided the complete config and logs, rather than just providing the truncated parts based on my own judgment.
  • I searched issues and did not find any similar issues.
  • The problem can be successfully reproduced in the latest Release

Description

When using the simplified reverse proxy configuration (VLESS user-level reverse field), BurstObservatory health checks experience a significant and unpredictable delay before the reverse proxy outbound is first checked. With the standard app/reverse portal/bridge configuration health checks work immediately on startup.

The BurstObservatory scheduler in app/observatory/burst discovers outbound tags via outbound.Manager.Select() (app/proxyman/outbound). This discovery happens at two points:

  1. Initial check immediately on StartScheduler(), one-shot
  2. Periodic checks every interval × samplingCount (default: 1min × 10 = 10 minutes)

The simplified reverse proxy (VLESS inbound GetReverse() in proxy/vless/inbound) registers its outbound handler dynamically via outbound.Manager.AddHandler() only when the first reverse client connects. This happens after the Observatory has already started.

Comparison with standard reverse config

With the standard app/reverse configuration ("reverse": {"portals": [...]}), the portal outbound is registered during config initialization before Observatory starts. So Select() finds it immediately on the first call. No delay.

Impact

  • Balancer strategies (leastPing, leastLoad) that rely on Observatory results will treat rproxy-* outbounds as having no observation data for up to 10 minutes
  • Depending on the strategy, unobserved outbounds may simply be ignored, leading to unpredictable routing behavior

Possible solutions

A. Early re-checks in StartScheduler (smallest change)

Add a few early re-check passes (e.g. at T+5s, T+15s, T+30s) in app/observatory/burst/healthping.go to catch dynamically registered outbounds during the startup window. Zero architectural changes, zero new interfaces, zero memory overhead.

Pros: minimal diff, no cross-module coupling
Cons: outbounds appearing after 30s still wait for the next ticker

B. Handler-added callback in outbound.Manager (moderate change)

Add a listener/callback mechanism to app/proxyman/outbound.Manager that BurstObserver subscribes to. When AddHandler() is called, the callback triggers an immediate re-check.

Possible interface addition:

// in features/outbound/outbound.go or app/proxyman/outbound/outbound.go
type HandlerChangeListener interface {
    OnHandlerAdded(tag string)
}

// or simpler — just a func callback:
func (m *Manager) SetOnHandlerAdded(f func())

Pros: precise, instant detection, no polling
Cons: requires interface change or type assertion; cross-module coupling
I think this is the best solution.
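To make the callback variant concrete, here is a hypothetical stand-in for app/proxyman/outbound.Manager showing only how the notification could be wired; apart from AddHandler, the field and method names are assumptions, not the real API:

```go
package main

import "sync"

// Manager is a simplified stand-in for the outbound manager.
type Manager struct {
	mu             sync.Mutex
	taggedHandler  map[string]bool // stand-in for the real handler map
	onHandlerAdded func(tag string)
}

func NewManager() *Manager {
	return &Manager{taggedHandler: make(map[string]bool)}
}

// SetOnHandlerAdded registers the callback the observatory would use to
// trigger an immediate re-check.
func (m *Manager) SetOnHandlerAdded(f func(tag string)) {
	m.mu.Lock()
	m.onHandlerAdded = f
	m.mu.Unlock()
}

// AddHandler stores the handler, then notifies the listener outside the
// lock so the callback cannot deadlock against the manager.
func (m *Manager) AddHandler(tag string) {
	m.mu.Lock()
	m.taggedHandler[tag] = true
	cb := m.onHandlerAdded
	m.mu.Unlock()
	if cb != nil {
		cb(tag)
	}
}
```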

C. Remove or TTL-limit Select() caching (small change, 1 file)

Remove the tagsCache in outbound.Manager.Select() or add a short TTL. This makes every Select() call re-scan taggedHandler.

Pros: simple, solves the root cause
Cons: Select() is called by balancers on every request — removing cache impacts hot path performance

Reproduction Method

T+0s      BurstObservatory.Start()
          → selector() → Select(["proxy-", "rproxy-"])
          → finds: ["proxy-us", "proxy-de", "proxy-jp"]
          → rproxy-home does NOT exist yet
          → Check() runs for proxy-* only 
          → result cached in tagsCache

T+3s      Reverse client connects
          → GetReverse() → AddHandler("rproxy-home")
          → tagsCache reset (new sync.Map)
          → rproxy-home is now in taggedHandler

T+3s–10m  Observatory is idle, waiting for ticker
          → rproxy-home gets NO health checks

T+10m     Ticker fires → selector() → Select()
          → cache miss → re-scans taggedHandler
          → finds: ["proxy-de", "proxy-jp", "proxy-us", "rproxy-home"]
          → rproxy-home finally gets health-checked

The delay between outbound registration and the first health check is up to interval × samplingCount (10 minutes by default). It is also non-deterministic: it depends on when the reverse client connects relative to the ticker cycle.

Environment

  • Xray-core (recent versions)
  • burstObservatory with subjectSelector matching both regular and reverse proxy outbound tags

Configuration example

{
  "burstObservatory": {
    "subjectSelector": ["proxy-", "rproxy-"],
    "pingConfig": {
      "destination": "https://connectivitycheck.gstatic.com/generate_204",
      "interval": "1m",
      "sampling": 10
    }
  }
}

Outbound tags:

  • proxy-us, proxy-de, proxy-jp are regular outbounds that exist at startup
  • rproxy-home is a simplified reverse proxy outbound, created dynamically on the first client connection
