[Feature]: structured output from shell steps (opt-in output_format: json) #2962

@doquanghuy

Description

Problem Statement

No workflow step that runs external code can hand a typed value to a later step. Shell steps expose only {exit_code, stdout, stderr} (src/specify_cli/workflows/steps/shell/__init__.py:42-44). Command steps expose the same keys, and because their stdout streams, it is empty in run state (see SK-style discussions around streaming). Workflow inputs are scalar-only (string/number/boolean). The practical consequence: a fan-out can never consume a dynamically computed collection, because there is no end-to-end path from "a script computed a list" to "items: receives a list".

#2960 (from_json filter) covers the consumption side for shell stdout, but a structural opt-in on the step itself would make the contract explicit and also benefit steps whose stdout isn't reliably capturable.

Proposed Solution

An opt-in output_format: json on shell steps:

- id: emit
  type: shell
  run: "python extract.py"
  output_format: json

When set, stdout is parsed as JSON and exposed under output.data. The raw stdout/stderr/exit_code keys are unchanged, so there is no merge/clobber ambiguity, and later steps can use {{ steps.emit.output.data.items }}. A parse failure fails the step with a clear error: declaring the format is a contract, and silence would hide wiring bugs. The change is fully backward-compatible: without the key, behavior is byte-identical.
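
For discussion, here is a minimal sketch of the behavior described above. It is not the reference implementation: the function name `build_shell_output` and the error type `StepOutputError` are hypothetical, and the real step class may structure this differently. It only illustrates the three rules: raw keys stay untouched, parsed JSON lands under `data`, and a parse failure raises instead of passing silently.

```python
import json
from typing import Any, Dict, Optional


class StepOutputError(Exception):
    """Hypothetical error raised when a step breaks its declared output contract."""


def build_shell_output(
    exit_code: int,
    stdout: str,
    stderr: str,
    output_format: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the output mapping for a shell step (illustrative only)."""
    # The raw keys are always present and never modified.
    output: Dict[str, Any] = {
        "exit_code": exit_code,
        "stdout": stdout,
        "stderr": stderr,
    }

    # Without the opt-in key, behavior is the same as today.
    if output_format is None:
        return output

    if output_format != "json":
        raise StepOutputError(f"unsupported output_format: {output_format!r}")

    try:
        # Parsed value goes under a separate key, so nothing is merged or clobbered.
        output["data"] = json.loads(stdout)
    except json.JSONDecodeError as exc:
        # Declaring the format is a contract: fail loudly on invalid JSON.
        raise StepOutputError(
            f"output_format is json, but stdout is not valid JSON: {exc}"
        ) from exc

    return output
```

With this shape, `build_shell_output(0, '{"items": [1, 2]}', "", output_format="json")["data"]["items"]` would be the list a later `items:` expression receives, while a call without `output_format` returns only the three raw keys.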

This is a proposal with a reference implementation — happy to rework toward whichever direction you prefer.

Alternatives Considered

  • output_file: <path> — the step writes JSON to a file the engine reads into output.*; better for large payloads and streaming-stdout steps, slightly more machinery.
  • A declared named-outputs: schema (map output names to JSON paths) — the most expressive, biggest API surface.
  • Status quo + #2960's from_json — works for shell stdout only, and leaves the contract implicit.

Component

Workflow engine (shell step)

AI Agent (if applicable)

n/a

Use Cases

  • Fan-out over runtime-computed items (extract → items: {{ steps.emit.output.data.items }}).
  • Conditions/args consuming structured tool results without string hacks.
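
To make the fan-out use case concrete, here is a hedged sketch of what a producer script such as extract.py might look like. The payload contents and the helper name `build_payload` are invented for illustration. The one point that follows from the proposal is that stdout must contain only the JSON document, so any diagnostics go to stderr.

```python
import json
import sys
from typing import Any, Dict, List


def build_payload(names: List[str]) -> Dict[str, Any]:
    """Wrap the computed collection under an "items" key (hypothetical shape)."""
    return {"items": [{"name": name} for name in names]}


def main() -> None:
    # Placeholder data; a real script would compute this at runtime.
    payload = build_payload(["alpha", "beta"])

    # Diagnostics go to stderr so they cannot corrupt the JSON on stdout.
    print(f"extracted {len(payload['items'])} items", file=sys.stderr)

    # stdout carries exactly one JSON document.
    json.dump(payload, sys.stdout)


if __name__ == "__main__":
    main()
```

Under the proposal, a later step could then fan out over {{ steps.emit.output.data.items }} and receive a list of mappings instead of a string.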

Coordination note: I'm aware open PR #2443 also touches steps/shell/__init__.py (security opt-in) — this change is isolated to the structured-output addition and I'm happy to rebase/coordinate in whichever order suits.

AI disclosure (per CONTRIBUTING.md): this issue and the accompanying reference PR were authored with AI assistance (Claude); verified by running the repo's test suite locally.
