apache · masaori335 · Apr 20, 2026 · Apr 30, 2026 · Apr 30, 2026 · Apr 30, 2026
diff --git a/connect_attempts_report.md b/connect_attempts_report.md
@@ -0,0 +1,305 @@
+# Report of connect attempts bugs
+
+- affected versions: 10.1.x and smaller
+
+Relevant configs:
+
+- `proxy.config.http.connect_attempts_max_retries`
+- `proxy.config.http.connect_attempts_max_retries_down_server`
+- `proxy.config.http.connect_attempts_rr_retries`
+- `proxy.config.http.down_server.cache_time` (the `fail_window`)
+
+PRs:
+
+- https://github.com/apache/trafficserver/pull/12846
+- https://github.com/apache/trafficserver/pull/12880
+- https://github.com/apache/trafficserver/pull/13092
+- https://github.com/apache/trafficserver/pull/13102
+
+## Net effect between 10.1.x and later
+
+On 10.1.x and earlier, the three `connect_attempts_*` configs behaved
+erratically:
+
+- `connect_attempts_max_retries_down_server` was effectively unreachable
+  because the `is_down()` check was inverted (#12846).
+- Single-address origins ignored down-state entirely — `select_best_http`
+  unconditionally returned the first (and only) entry (#12880).
+- The "mark host DOWN" threshold was gated by `connect_attempts_rr_retries`
+  instead of `connect_attempts_max_retries` (#13102).
+- The fail-window clock used `client_request_time` instead of the actual
+  failure time, so the window could start in the past or be pre-expired
+  (#13102).
+- After `select_next_rr()`, `dst_addr` was not updated, so RR "switches"
+  silently kept hitting the same target (#13102).
+- `mark_host_failure()` ran at end-of-txn against the post-switch
+  `dns_info.active`, so failures on A could be charged to B (#13102).
+- RR exhaustion terminated retries early even when the current target
+  still had per-host budget (#13102).
+
+The combined effect of PRs #12846, #12880, #13092, and #13102 is a
+coherent model in which each config has a well-defined role tied to an
+explicit `UP / DOWN / SUSPECT` state for the active target.
+
+### Worked examples
+
+All examples use the ATS defaults:
+
+| Config                                       | Default |
+|----------------------------------------------|---------|
+| `connect_attempts_max_retries`               | 3       |
+| `connect_attempts_max_retries_down_server`   | 1       |
+| `connect_attempts_rr_retries`                | 3       |
+| `down_server.cache_time` (`fail_window`)     | 60s     |
+
+#### Example 1 — Single-address origin goes offline
+
+Client sends repeated requests to an origin whose only A record points to
+a down IP. How many connect attempts does ATS make before the IP is
+marked DOWN in HostDB?
+
+**10.1.x behavior — IP marked DOWN after 12 connect attempts (3 failed
+transactions).**
+
+Why: `mark_host_failure()` is only called from
+`do_hostdb_update_if_necessary()`, which is only reached at the end of a
+transaction (via `handle_server_connection_not_open`). So `fail_count`
+increments **once per failed transaction**, not once per attempt. The
+threshold is `connect_attempts_rr_retries` (default `3`) — the wrong
+config, but that's what 10.1.x uses. Per-transaction there are `1 +
+connect_attempts_max_retries = 4` attempts.
+
+| Attempt | Txn | `fail_count` after | marked DOWN? |
+|---------|-----|--------------------|--------------|
+| 1       | 1   | 0                  | no           |
+| 2       | 1   | 0                  | no           |
+| 3       | 1   | 0                  | no           |
+| 4       | 1   | 1 (end of txn)     | no           |
+| 5       | 2   | 1                  | no           |
+| 6       | 2   | 1                  | no           |
+| 7       | 2   | 1                  | no           |
+| 8       | 2   | 2 (end of txn)     | no           |
+| 9       | 3   | 2                  | no           |
+| 10      | 3   | 2                  | no           |
+| 11      | 3   | 2                  | no           |
+| **12**  | 3   | 3 ≥ 3              | **yes**      |
+
+And because `select_best_http()` returns `info[0]` unconditionally for
+single-address records (#12880), every one of those 12 attempts actually
+hits the network. Even after the IP is marked DOWN, subsequent requests
+keep trying it because the single-address code path ignores down-state.
+
+**Later behavior — IP marked DOWN after 4 connect attempts (1 failed
+transaction).**
+
+Why: `do_hostdb_update_if_necessary()` is now called after **each**
+failure in `handle_response_from_server()`, so `fail_count` increments
+**once per attempt**. The threshold is `max_retries + 1 = 4` (for an UP
+target). Per-transaction there are still `1 + max_retries = 4` attempts,
+so the threshold is reached on the last attempt of the first failed
+transaction.
+
+| Attempt | Txn | `fail_count` after | marked DOWN? |
+|---------|-----|--------------------|--------------|
+| 1       | 1   | 1                  | no           |
+| 2       | 1   | 2                  | no           |
+| 3       | 1   | 3                  | no           |
+| **4**   | 1   | 4 ≥ 4              | **yes**      |
+
+Subsequent requests within `fail_window` (60s):
+`select_best_http()` now honors down-state for single-address records,
+so HostDB lookup returns `nullptr` and the SM reports `OriginDown`
+immediately — **zero connect attempts**.
+
+After `fail_window` elapses, the entry is SUSPECT. The next request gets
+`connect_attempts_max_retries_down_server + 1 = 2` attempts (1 probe +
+1 retry). Success → UP; failure → back to DOWN for another 60s.
+
+#### Example 2 — Round-robin with one bad backend (A fails, B/C healthy)
+
+Defaults apply: `rr_retries = 3`, `max_retries = 3`. A is unresponsive,
+B and C are fine. Initial active = A.
+
+**10.1.x behavior — client gets 502, and the wrong host is blamed.**
+
+Two compounding bugs: (a) after `select_next_rr()` is called, neither
+`dns_info.addr` nor `server_info.dst_addr` is updated, so the next
+connect still goes to A's address. (b) `mark_host_failure()` is called
+once at end-of-txn against `dns_info.active`, which by that point is
+the *post-switch* host (B), so B's `fail_count` is incremented for A's
+failures.
+
+| Attempt | dst_addr (target) | dns_info.active | retry_attempts after | `(retry+1) % 3 == 0`? | Result |
+|---------|-------------------|-----------------|----------------------|----------------------|--------|
+| 1       | A                 | A               | 1                    | no                   | fail   |
+| 2       | A                 | A               | 2                    | no                   | fail   |
+| 3       | A                 | A → switched to B (`select_next_rr`); dst_addr untouched | 3 | **yes** | fail |
+| 4       | A (still!)        | B               | 3 ≥ 3, give up       | n/a                  | fail → 502 to client |
+
+End of txn: `mark_host_failure()` increments `B.fail_count` even
+though B was never tried. Repeated requests do this 3 times → **B
+gets marked DOWN** while A (the actually-broken host) keeps getting
+hit on every transaction.
+
+**Later behavior — client gets 200; A's failures are correctly
+recorded against A.**
+
+`do_hostdb_update_if_necessary()` runs after each failure (so the
+correct `dns_info.active` is in scope), and after `select_next_rr()`
+the SM explicitly assigns `dns_info.addr` and `server_info.dst_addr`
+to the new target.
+
+| Attempt | dst_addr (target) | dns_info.active | A.fail_count after | retry_attempts after | RR switch? | Result |
+|---------|-------------------|-----------------|--------------------|----------------------|------------|--------|
+| 1       | A                 | A               | 1                  | 1                    | no         | fail   |
+| 2       | A                 | A               | 2                  | 2                    | no         | fail   |
+| 3       | A                 | A               | 3                  | 3                    | **yes** → active=B, dst_addr=B | fail |
+| **4**   | **B**             | B               | 3 (unchanged)      | n/a                  | n/a        | **success → 200** |
+
+A finishes this transaction with `fail_count = 3` (not yet at the
+threshold of `max_retries + 1 = 4`). The next request that hashes to
+A bumps it to 4 and marks A DOWN; subsequent requests within
+`fail_window` skip A entirely until the window expires.
+
+If B and C had also been DOWN (no selectable alternate),
+`select_next_rr()` returns false and the SM stays on A using its
+remaining per-host retry budget instead of giving up — new in
+#13102.
+
+**Later behavior**
+
+- Failure timestamp is `ts_clock::now()` at the moment `mark_down`
+  is called, so the full `fail_window` is honored regardless of how
+  long the transaction took to decide the host was bad.
+
+
+## Summary of bug-fix PRs
+
+### PR #12846 — Fix `HostDBInfo::is_down` condition
+
+**Bug.** `HostDBInfo::is_down()` had an inverted comparison:
+
+```cpp
+// Before
+return (last_fail != TS_TIME_ZERO) && (last_fail + fail_window < now);
+// After
+return (last_fail != TS_TIME_ZERO) && (now <= last_fail + fail_window);
+```
+
+The pre-fix code reported a host as DOWN only *after* `fail_window` had elapsed —
+the opposite of intent. During the fail window (when the host should be
+considered blocked) it returned `false`.
+
+**Behavior change.** A failed host is now correctly treated as DOWN for the
+entire `fail_window` following `last_failure`, and returns to UP only after
+the window has elapsed. Every downstream path that calls `is_down()` (retry
+budget, RR selection, negative-cached check) is impacted.
+
+### PR #12880 — Check state of HostDBInfo in `select_best_http`
+
+**Bug.** For single-address (non round-robin) records,
+`HostDBRecord::select_best_http()` bypassed the state check:
+
+```cpp
+// Before
+best_alive = &info[0];                       // unconditional
+// After
+if (info[0].select(now, fail_window)) {
+  best_alive = &info[0];
+}
+```
+
+A single-address host that was marked down would still be returned from
+HostDB and reused for the next request.
+
+**Behavior change.** Single-address origins now honor `down_server` state the
+same way RR origins do: if the only target is DOWN and still within the fail
+window, HostDB lookup fails (`nullptr`), the SM reports `OriginDown`, and no
+connection attempt is made until the window expires. The gold autest
+`dns_host_down` was updated accordingly (second request returns 500 instead
+of 502 once the first failure marked the IP down).
+
+### PR #13092 — Clarify `HostDBInfo` state
+
+**Not a bug fix — a model/refactor.** Introduces an explicit tri-state for
+upstream health and reshapes the API around it:
+
+```cpp
+enum class HostDBInfo::State { UP, DOWN, SUSPECT };
+```
+
+- **UP** — no known failure; normal selection.
+- **DOWN** — `_last_failure` set, `now` is within `fail_window`; not eligible.
+- **SUSPECT** — `fail_window` has elapsed; a probe is permitted. A successful
+  response transitions it to UP via `mark_up()`; another failure returns it
+  to DOWN.
+
+API renames: `is_alive` → `is_up`, `mark_active_server_alive` →
+`mark_active_server_up`. `mark_down` now takes `fail_window`. `select()` is
+replaced by callers using `is_down()` directly. `last_failure` / `fail_count`
+become private atomics (`_last_failure`, `_fail_count`).
+
+**Behavior change.** The previous two-state model collapsed SUSPECT into UP
+once `fail_window` expired, which meant a recovering host was treated
+identically to a never-failed host for retry sizing. With SUSPECT explicit,
+callers (notably the retry machinery in #13102) can apply
+`connect_attempts_max_retries_down_server` to probing traffic and
+`connect_attempts_max_retries` to UP traffic.
+
+### PR #13102 — Fix connect attempt retries
+
+**Bugs.** The retry machinery in
+`HttpTransact::handle_response_from_server()` and
+`HttpSM::mark_host_failure()` had several interlocking issues:
+
+1. **Wrong threshold used to mark host down.** `mark_host_failure()` called
+   `increment_fail_count(..., connect_attempts_rr_retries, ...)`. The RR
+   switch-over config was reused as the "mark down" threshold, so a host was
+   marked DOWN after `connect_attempts_rr_retries` failures instead of
+   `connect_attempts_max_retries + 1` total attempts.
+2. **Wrong clock for failure timestamp.**
+   `do_hostdb_update_if_necessary()` recorded the failure at
+   `t_state.client_request_time` (request-receipt time), not
+   `ts_clock::now()`. For long requests the fail window started in the past,
+   sometimes expiring before the failure was even recorded.
+3. **RR exhaustion short-circuited all retries.** When the
+   `connect_attempts_rr_retries` boundary was hit and no other RR member was
+   selectable, the old code gave up, even though the current target still
+   had per-host retry budget.
+4. **`connect_attempts_max_retries_down_server` had no effect in practice.**
+   It was only consulted through `is_server_negative_cached()`, which
+   depended on the inverted `is_down()` from #12846. In many cases it was
+   either never applied or applied when it shouldn't have been.
+5. **`connect_attempts_max_retries_down_server == 0` blocked the probe.**
+   A separate code path in `do_http_server_open()` refused to connect at all
+   when the target was negative-cached and the config was zero — preventing
+   the SUSPECT-state probe that the fail window was designed to allow.
+6. **Retry config range allowed overflow.** `[0-255]` with `uint8_t`
+   threshold = `max_retries + 1` wraps to 0.
+7. Rename `proxy.config.http.connect_attempts_max_retries_down_server` with `proxy.config.http.connect_attempts_max_retries_suspicious_server` to clarify behavior.
+   The `proxy.config.http.connect_attempts_max_retries_down_server` config is deprecated.
+
+
+
+**Behavior change.**
+
+- Retry budget is now driven by HostDB state (via the new helper
+  `HttpTransact::origin_server_connect_attempts_max_retries`):
+  - `UP`      → `connect_attempts_max_retries`
+  - `DOWN`    → 0 (bail out immediately)
+  - `SUSPECT` → `connect_attempts_max_retries_down_server`
+- `mark_host_failure` uses `max_retries + 1` as the attempt budget and
+  `ts_clock::now()` as the failure time.
+- RR switch and per-host retry are separated: if RR has no selectable
+  alternate, the SM stays on the current target and keeps retrying up to
+  the per-host budget rather than giving up.
+- The `do_http_server_open()` early-bail on "negative cached +
+  down_server=0" is removed — DOWN vs SUSPECT is now the single source of
+  truth.
+- `proxy.config.http.connect_attempts_max_retries`,
+  `…_max_retries_down_server`, and `…_rr_retries` clamped to `[0-254]` to
+  avoid the `uint8_t` overflow when adding 1 for the total-attempt count.
+- A new warning is logged at config reconfigure when
+  `connect_attempts_max_retries_down_server == 0` and
+  `connect_attempts_rr_retries > 0`, because that combination prevents any
+  probe of a recovering (SUSPECT) origin.
diff --git a/doc/admin-guide/files/records.yaml.en.rst b/doc/admin-guide/files/records.yaml.en.rst
@@ -1812,15 +1812,28 @@ Origin Server Connect Attempts
    The maximum number of connection retries |TS| can make when the origin server is not responding.
    Each retry attempt lasts for `proxy.config.http.connect_attempts_timeout`_ seconds.  Once the maximum number of retries is
    reached, the origin is marked down (as controlled by `proxy.config.http.connect.down.policy`_.  After this, the setting
-   `proxy.config.http.connect_attempts_max_retries_down_server`_ is used to limit the number of retry attempts to the known down origin.
+   `proxy.config.http.connect_attempts_max_retries_suspect_server`_ is used to limit the number of retry attempts when the origin is
+   in the SUSPECT state (recovering after `proxy.config.http.down_server.cache_time`_ has elapsed).
+
+.. ts:cv:: CONFIG proxy.config.http.connect_attempts_max_retries_suspect_server INT 1
+   :reloadable:
+   :overridable:
+
+   Maximum number of connection retries |TS| can make while an origin is in the SUSPECT state (the first request after
+   `proxy.config.http.down_server.cache_time`_ has elapsed on a previously-down origin). The total attempt budget for a SUSPECT
+   origin is therefore ``connect_attempts_max_retries_suspect_server + 1`` (the initial probe plus each retry). If any attempt
+   succeeds, the origin transitions back to UP; if all attempts fail, the origin returns to DOWN for another
+   `proxy.config.http.down_server.cache_time`_ seconds. Typically smaller than `proxy.config.http.connect_attempts_max_retries`_
+   so the recovering origin is not flooded.
 
 .. ts:cv:: CONFIG proxy.config.http.connect_attempts_max_retries_down_server INT 1
    :reloadable:
    :overridable:
+   :deprecated:
 
-   Maximum number of connection attempts |TS| can make while an origin is marked down per request.  Typically this value is smaller than
-   `proxy.config.http.connect_attempts_max_retries`_ so an error is returned to the client faster and also to reduce the load on the down origin.
-   The timeout interval `proxy.config.http.connect_attempts_timeout`_ in seconds is used with this setting.
+   This setting is deprecated in favor of :ts:cv:`proxy.config.http.connect_attempts_max_retries_suspect_server`. If the
+   deprecated setting is set explicitly and the replacement is not, the deprecated value is mirrored forward and a warning is
+   logged. If both are set explicitly, the new setting wins and the deprecated value is ignored.
 
 .. ts:cv:: CONFIG proxy.config.http.connect_attempts_retry_backoff_base INT 0
    :reloadable: