Skip to content

[WorkItem #19558] Resolved ptests failure in trident package#17427

Open
Ratiranjan5 wants to merge 1 commit into
microsoft:3.0-devfrom
Kanishk-Bansal:topic_trident-3.0
Open

[WorkItem #19558] Resolved ptests failure in trident package#17427
Ratiranjan5 wants to merge 1 commit into
microsoft:3.0-devfrom
Kanishk-Bansal:topic_trident-3.0

Conversation

@Ratiranjan5
Copy link
Copy Markdown

@Ratiranjan5 Ratiranjan5 commented May 25, 2026

Merge Checklist

All boxes should be checked before merging the PR (just tick any boxes which don't apply to this PR)

  • The toolchain has been rebuilt successfully (or no changes were made to it)
  • The toolchain/worker package manifests are up-to-date
  • Any updated packages successfully build (or no packages were changed)
  • Packages depending on static components modified in this PR (Golang, *-static subpackages, etc.) have had their Release tag incremented.
  • Package tests (%check section) have been verified with RUN_CHECK=y for existing SPEC files, or added to new SPEC files
  • All package sources are available
  • cgmanifest files are up-to-date and sorted (./cgmanifest.json, ./toolkit/scripts/toolchain/cgmanifest.json, .github/workflows/cgmanifest.json)
  • LICENSE-MAP files are up-to-date (./LICENSES-AND-NOTICES/SPECS/data/licenses.json, ./LICENSES-AND-NOTICES/SPECS/LICENSES-MAP.md, ./LICENSES-AND-NOTICES/SPECS/LICENSE-EXCEPTIONS.PHOTON)
  • All source files have up-to-date hashes in the *.signatures.json files
  • sudo make go-tidy-all and sudo make go-test-coverage pass
  • Documentation has been updated to match any changes to the build system
  • Ready to merge

Summary

What does the PR accomplish, why was it needed?

  • Issue: flaky ptest failures due to nondeterministic error kind on HTTP retry timeout exhaustion.
  • fix: return TimedOut consistently when the HTTP retry deadline is exhausted.

AI Analysis relevance (In %) / Reasoning
AI Analysis relevance (In %): N/A (no work Item was created for it)
Upstream reference: microsoft/trident#458

Change Log
  • new file: SPECS/trident/0001-http-return-timedout-when-retry-deadline-is-exhauste.patch
  • modified: SPECS/trident/trident.spec
Does this affect the toolchain?

NO

Associated issues
Links to CVEs
Test Methodology
  • image

@microsoft-github-policy-service microsoft-github-policy-service Bot added Packaging 3.0-dev PRs Destined for AzureLinux 3.0 labels May 25, 2026
@Kanishk-Bansal Kanishk-Bansal added the ptest package testing (%check section in spec) label May 25, 2026
@Kanishk-Bansal
Copy link
Copy Markdown

Build

@Kanishk-Bansal Kanishk-Bansal marked this pull request as ready for review May 25, 2026 06:21
@Kanishk-Bansal Kanishk-Bansal requested a review from a team as a code owner May 25, 2026 06:21
@Ratiranjan5
Copy link
Copy Markdown
Author

Build

Buddy build has passed.

Copy link
Copy Markdown

@suresh-thelkar suresh-thelkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the following questions, can you please go through them and answer?

  1. Upstream PR for the actual fix. "Upstream reference: microsoft/trident#458" points to the PR that introduced the code, not one that contains this fix. Please open a corresponding PR against microsoft/trident (main) that lands the same TimedOut change in crates/trident/src/io_utils/http/mod.rs::retriable_request_sender, and link that PR here. Right now this patch is a downstream-only divergence from upstream main.

  2. Justify collapsing the error classification. The current patch replaces both branches (response.error_for_status().map_err(http_to_io_err) and Err(http_to_io_err(e))) with an unconditional IoErrorKind::TimedOut. That means a final-attempt 404, 401/403, 5xx, or a transport error (is_connect, is_request) — which http_to_io_err maps to NotFound / PermissionDenied / ConnectionAborted / ConnectionRefused / InvalidData — will now all be reported as TimedOut. Either:

narrow the fix so the status/transport-derived IoErrorKind is preserved on the final attempt and only synthesize TimedOut when the underlying error has no better classification, or
explain why losing that classification is acceptable for all callers.

  1. Why patch the source instead of fixing the ptest? The PR summary says the failure is "nondeterministic error kind on HTTP retry timeout exhaustion." Was fixing the test (accept either TimedOut or the mapped HTTP kind) considered? If yes, why was changing production behavior preferred?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.0-dev PRs Destined for AzureLinux 3.0 Packaging ptest package testing (%check section in spec)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants