Sandbox forward reports success without a reachable local listener #1878

@shiju-nv

Agent Diagnostic

The agent reproduced this against freshly fetched upstream origin/main.

The existing integration harness already provides a fake mTLS OpenShell server, create_ssh_session, get_sandbox, isolated config directories, and a fake ssh binary. The fake ssh binary exits successfully and creates no listener:

#!/bin/sh
exit 0

sandbox_forward returns Ok(()) even though no background SSH process is trackable and no local port listener exists.
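For readers without the harness at hand, a stub of this kind could be installed roughly as follows. This is only a sketch and not the harness's actual install_fake_ssh: the helper name write_fake_ssh is hypothetical, and it assumes a Unix host.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};

/// Hypothetical helper (Unix only): writes an executable `ssh` stub into `dir`
/// that exits zero without opening any listener.
fn write_fake_ssh(dir: &Path) -> io::Result<PathBuf> {
    let path = dir.join("ssh");
    fs::write(&path, "#!/bin/sh\nexit 0\n")?;
    // Mark the stub executable so it can stand in for the real binary.
    fs::set_permissions(&path, fs::Permissions::from_mode(0o755))?;
    Ok(path)
}
```

A test would still need to make that directory take precedence when ssh is resolved; how the harness arranges that is not shown in this issue.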

Description

Actual behavior: On current origin/main, run::sandbox_forward(..., background = true, ...) can return success after the ssh -f command exits zero even when the CLI cannot discover the forked SSH process and no local TCP listener is reachable. The user can be left with a reported forward that cannot be stopped by OpenShell and may not be serving traffic at all.

The foreground forward path lacks the same health check, as seen in the source: after a grace period it prints the local access URL without probing the listener.
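A bounded connect probe is one way to check the listener before reporting the URL. The sketch below is an assumption, not OpenShell code: wait_for_listener, its polling interval, and its per-attempt timeout are hypothetical, and it uses only the standard library.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: returns true once a TCP connect to 127.0.0.1:`port`
/// succeeds, or false if `timeout` elapses first.
fn wait_for_listener(port: u16, timeout: Duration) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    let deadline = Instant::now() + timeout;
    loop {
        // Bound each attempt so a single slow connect cannot stall the probe.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Short pause between attempts (illustrative value).
        sleep(Duration::from_millis(50));
    }
}
```

With a helper like this, the access URL would be printed only when the probe returns true, and an error returned otherwise.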

Expected behavior: Forwarding should report success only after the local listener is connectable. Background mode should fail closed if PID discovery fails, should probe the listener before writing the PID file, and should clean up any discovered background SSH process if the listener never becomes reachable.
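The ordering described here can be sketched as a small function. This is a hedged illustration rather than the actual sandbox_forward implementation: finish_background_forward and its closure parameters are hypothetical stand-ins for PID discovery, the listener probe, process cleanup, and the PID file write.

```rust
/// Hypothetical final step of a background forward, run after `ssh -f` exits zero.
/// Each side effect is a closure so the ordering is visible without real processes.
fn finish_background_forward(
    discovered_pid: Option<u32>,
    listener_reachable: impl FnOnce() -> bool,
    kill: impl FnOnce(u32),
    write_pid_file: impl FnOnce(u32),
) -> Result<(), String> {
    // Fail closed: an untracked forward could not be stopped later.
    let pid = discovered_pid
        .ok_or_else(|| "Could not discover backgrounded SSH process".to_string())?;

    // Probe before recording anything on disk.
    if !listener_reachable() {
        // Clean up the process that was found so nothing is left orphaned.
        kill(pid);
        return Err(format!(
            "local listener never became reachable; stopped ssh pid {pid}"
        ));
    }

    // Only a reachable, tracked forward gets a PID file.
    write_pid_file(pid);
    Ok(())
}
```

Both error strings would satisfy the assertion in the reproduction test, which accepts either "Could not discover" or "listener".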

Reproduction Steps

Add the following temporary test after test_tls in crates/openshell-cli/tests/sandbox_create_lifecycle_integration.rs:

#[tokio::test]
async fn sandbox_forward_background_fails_when_ssh_process_untracked_and_listener_missing() {
    let server = run_server().await;
    let fake_ssh_dir = tempfile::tempdir().unwrap();
    let xdg_dir = tempfile::tempdir().unwrap();
    let _env = test_env(&fake_ssh_dir, &xdg_dir);
    let tls = test_tls(&server);
    install_fake_ssh(&fake_ssh_dir);

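    // Bind to port 0 so the OS picks a free port, then drop the listener
    // so that nothing is listening on that port when the forward runs.
    let listener = std::net::TcpListener::bind(("127.0.0.1", 0)).unwrap();
    let port = listener.local_addr().unwrap().port();
    drop(listener);
    let spec = openshell_core::forward::ForwardSpec::new(port);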

    let err = run::sandbox_forward(&server.endpoint, "forward-repro", &spec, true, &tls)
        .await
        .expect_err(
            "background forward should fail when ssh exits successfully but no background process or local listener exists",
        );

    let text = format!("{err:?}");
    assert!(
        text.contains("Could not discover") || text.contains("listener"),
        "unexpected error: {text}",
    );
}

Run the cargo test command shown under Logs and observe that the test fails because sandbox_forward returned Ok(()).

Environment

  • OS: macOS
  • Docker/Podman/Kubernetes: not required for this reproduction
  • OpenShell: origin/main at e73745f10e7decc55c889e5e101fdb00baabfa8c
  • Gateway mode: local fake mTLS gRPC server from the OpenShell CLI integration tests
  • Sandbox driver: not driver-specific; test calls sandbox_forward directly
  • SSH implementation: test fake ssh binary that exits zero and creates no listener

Logs

cargo test -p openshell-cli --test sandbox_create_lifecycle_integration sandbox_forward_background_fails_when_ssh_process_untracked_and_listener_missing -- --nocapture

Finished `test` profile [unoptimized + debuginfo] target(s) in 2m 32s
Running tests/sandbox_create_lifecycle_integration.rs (debug/deps/sandbox_create_lifecycle_integration-b7f9d98331efe6d5)

running 1 test
! Could not discover backgrounded SSH process; forward may be running but is not tracked

thread 'sandbox_forward_background_fails_when_ssh_process_untracked_and_listener_missing' (354098477) panicked at crates/openshell-cli/tests/sandbox_create_lifecycle_integration.rs:785:10:
background forward should fail when ssh exits successfully but no background process or local listener exists: ()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test sandbox_forward_background_fails_when_ssh_process_untracked_and_listener_missing ... FAILED

failures:

failures:
    sandbox_forward_background_fails_when_ssh_process_untracked_and_listener_missing

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 15 filtered out; finished in 0.35s

error: test failed, to rerun pass `-p openshell-cli --test sandbox_create_lifecycle_integration`

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why

Labels

  • gator:validated: Gator validated this issue as ready for work
  • state:triage-needed: Opened without agent diagnostics and needs triage
