Using CodexOpt with Codex

Use this guide when your repo already has Codex instruction files and you want CodexOpt to improve them safely.

CodexOpt works with the same files Codex loads:

  • AGENTS.md
  • .codex/skills/**/SKILL.md
  • .agents/skills/**/SKILL.md

Start With A Preview

Run this from the repo where you use Codex:

uv run codexopt improve

This command:

  1. finds AGENTS.md and SKILL.md files
  2. mines starter tasks from git history and skill descriptions
  3. runs the reflective optimizer in preview mode
  4. shows what would change
  5. writes review artifacts under .codexopt/

The default preview stays offline. It does not spend Codex or API budget unless you ask it to.

Run The Live Codex Loop

Use live mode when you want CodexOpt to evaluate actual Codex behavior:

uv run codexopt improve --live

Live mode uses codex exec as the optimizer and judge. CodexOpt evaluates the candidate instruction file, captures feedback from the run, proposes a focused rewrite, and keeps the rewrite only when it improves held-out tasks.
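The accept-or-reject loop described above can be sketched in a few lines. This is only an illustration of the idea, not CodexOpt's implementation: `reflective_improve`, `toy_evaluate`, and `toy_propose` are hypothetical names, and the toy functions stand in for the `codex exec` judge and optimizer.

```python
from __future__ import annotations

from typing import Callable, Sequence


def reflective_improve(
    current: str,
    train_tasks: Sequence[str],
    heldout_tasks: Sequence[str],
    evaluate: Callable[[str, Sequence[str]], tuple[float, list[str]]],
    propose: Callable[[str, list[str]], str],
    rounds: int = 3,
) -> str:
    """Keep a rewrite only when it scores higher on the held-out tasks."""
    best = current
    best_score, _ = evaluate(best, heldout_tasks)
    for _ in range(rounds):
        # Feedback comes from the training tasks only.
        _, feedback = evaluate(best, train_tasks)
        candidate = propose(best, feedback)
        # The gate uses held-out tasks the proposer never saw feedback for.
        candidate_score, _ = evaluate(candidate, heldout_tasks)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


def toy_evaluate(text: str, tasks: Sequence[str]) -> tuple[float, list[str]]:
    """Hypothetical judge: a task passes when its keyword appears in the text."""
    missing = [task for task in tasks if task not in text.lower()]
    score = 1.0 - len(missing) / len(tasks) if tasks else 0.0
    return score, [f"missing: {task}" for task in missing]


def toy_propose(text: str, feedback: list[str]) -> str:
    """Hypothetical optimizer: address the first piece of feedback."""
    if not feedback:
        return text
    keyword = feedback[0].split(": ", 1)[1]
    return text + f"\n- Always cover {keyword}."
```

Calling `reflective_improve("Release skill", ["changelog"], ["changelog", "tests"], toy_evaluate, toy_propose)` accepts the first rewrite because it raises the held-out score, then rejects later proposals that do not improve it further.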

Apply The Result

After reviewing the preview, apply validated changes:

uv run codexopt improve --live --apply

CodexOpt writes backups before changing files.

Review The Report

Write a Markdown report after any run:

uv run codexopt report --output codexopt-report.md

The report shows:

  • files found
  • files improved
  • validation score movement
  • accepted reflective edits
  • sampled feedback that led to the edit
  • fallback notes when CodexOpt had to use a weaker signal

Step-By-Step Workflow

Use this flow when you want more control than the improve command gives you:

uv run codexopt init
uv run codexopt scan
uv run codexopt benchmark
uv run codexopt optimize skills --engine reflective
uv run codexopt apply --kind skills --dry-run
uv run codexopt report --output codexopt-report.md

Review the dry-run diff, then apply:

uv run codexopt apply --kind skills

For AGENTS.md, run the equivalent optimize and dry-run steps:

uv run codexopt optimize agents --engine reflective --file AGENTS.md
uv run codexopt apply --kind agents --dry-run

Add Simple Task Evidence

Task evidence tells CodexOpt what “better” means for your repo.

Create tasks.md:

- Update changelog entries for patch releases.
- Add regression tests before changing parser behavior.
- Summarize risky changes in the final response.

Reference it in codexopt.yaml:

evidence:
  task_files:
    - tasks.md

Then run:

uv run codexopt improve

CodexOpt splits these tasks into train and validation sets. A candidate must improve the held-out validation score before it can win.
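To make the split concrete, here is a minimal sketch of turning a bullet list like tasks.md into train and validation sets. How CodexOpt actually parses and splits tasks is not described here, so `parse_tasks`, `split_tasks`, and the `validation_fraction` default are all assumptions for illustration.

```python
from __future__ import annotations

import hashlib


def parse_tasks(markdown: str) -> list[str]:
    """Collect the text of each '- ' bullet line."""
    return [line[2:].strip() for line in markdown.splitlines() if line.startswith("- ")]


def split_tasks(tasks: list[str], validation_fraction: float = 0.34) -> tuple[list[str], list[str]]:
    """Split tasks deterministically so reruns see the same held-out set."""
    if len(tasks) < 2:
        raise ValueError("need at least two tasks to hold one out")
    # Ordering by content hash keeps the split stable when the file is reordered.
    ordered = sorted(tasks, key=lambda task: hashlib.sha256(task.encode("utf-8")).hexdigest())
    n_validation = min(max(1, round(len(ordered) * validation_fraction)), len(ordered) - 1)
    return ordered[:-n_validation], ordered[-n_validation:]
```

With the three example tasks above, this yields two training tasks and one validation task. A file with only a handful of tasks leaves very little held-out signal, which is one reason to add more tasks over time.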

Mine Starter Tasks

If you do not have task evidence yet, generate a starter file:

uv run codexopt tasks init

Review the generated codexopt-tasks.json, trim anything noisy, then add it to evidence.task_files in codexopt.yaml.

Add Command Rollouts

Use command rollouts when a deterministic verifier can decide whether a skill supports a workflow.

Create skill-rollouts.json:

[
  {
    "name": "release-skill-smoke",
    "description": "Verify the release skill mentions changelog and tests.",
    "command": "python scripts/verify_release_skill.py",
    "timeout_seconds": 30,
    "expected_stdout_contains": "ok"
  }
]

Reference it in codexopt.yaml:

evidence:
  task_files:
    - skill-rollouts.json

Run:

uv run codexopt improve

CodexOpt copies the repo to a temporary directory, writes the candidate SKILL.md, runs the verifier, and uses pass rate as a strong reward signal.
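The rollout above points at scripts/verify_release_skill.py, which you write yourself. A minimal sketch of such a verifier might look like the following. The skill path and the required terms are hypothetical; adjust both to your repo.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Hypothetical values: point these at your own skill and wording.
DEFAULT_SKILL_PATH = ".codex/skills/release/SKILL.md"
REQUIRED_TERMS = ("changelog", "test")


def missing_terms(skill_text: str, required: tuple[str, ...] = REQUIRED_TERMS) -> list[str]:
    """Return the required terms that the skill text never mentions."""
    lowered = skill_text.lower()
    return [term for term in required if term not in lowered]


def main(argv: list[str]) -> int:
    """Print 'ok' on stdout and return 0 only when every term is present."""
    path = Path(argv[1]) if len(argv) > 1 else Path(DEFAULT_SKILL_PATH)
    if not path.is_file():
        # Failures go to stderr so stdout never contains "ok" by accident.
        print(f"fail: {path} not found", file=sys.stderr)
        return 1
    missing = missing_terms(path.read_text(encoding="utf-8"))
    if missing:
        print("fail: missing " + ", ".join(missing), file=sys.stderr)
        return 1
    print("ok")
    return 0
```

To use it as a script, end the file with `raise SystemExit(main(sys.argv))`. Because the rollout checks stdout for "ok", keeping failure messages on stderr avoids false passes.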

Add Codex Rollouts

Use Codex rollouts when you want to test how Codex behaves with a candidate skill.

Create codex-rollouts.json:

[
  {
    "name": "codex-release-notes",
    "backend": "codex",
    "description": "Ask Codex to use the candidate release skill on a release-note task.",
    "codex_prompt": "Use the local release skill to update CHANGELOG.md for a patch release.",
    "timeout_seconds": 120,
    "expected_final_response_contains": "CHANGELOG.md",
    "expected_command_contains": "git status",
    "expected_file_change": "CHANGELOG.md",
    "expected_file_contains": {
      "path": "CHANGELOG.md",
      "contains": "Patch"
    }
  }
]

Run live mode:

uv run codexopt improve --live

CodexOpt runs codex exec --json in a temporary repo copy and records the trajectory:

  • final response
  • command executions
  • file changes
  • token usage
  • errors
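One way to read the expected_* fields is as simple substring and membership checks against the recorded trajectory. The sketch below assumes that reading. The trajectory shape (`final_response`, `commands`, `changed_files`) and the `check_expectations` helper are hypothetical and do not reflect CodexOpt's internal format or scoring.

```python
from __future__ import annotations


def check_expectations(rollout: dict, trajectory: dict) -> list[str]:
    """Return a description of each expectation the trajectory fails to meet."""
    failures: list[str] = []
    changed = trajectory.get("changed_files", {})  # assumed: path -> new content

    want = rollout.get("expected_final_response_contains")
    if want and want not in trajectory.get("final_response", ""):
        failures.append(f"final response does not mention {want!r}")

    want = rollout.get("expected_command_contains")
    if want and not any(want in command for command in trajectory.get("commands", [])):
        failures.append(f"no executed command contains {want!r}")

    want = rollout.get("expected_file_change")
    if want and want not in changed:
        failures.append(f"{want} was not changed")

    want = rollout.get("expected_file_contains")
    if want and want["contains"] not in changed.get(want["path"], ""):
        failures.append(f"{want['path']} does not contain {want['contains']!r}")

    return failures
```

An empty list means every stated expectation held. Thinking about rollouts this way can help you write expectations that are specific enough to fail when the skill is ignored.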

What SkillOpt Means In CodexOpt

CodexOpt now includes SkillOpt-style discipline in the Codex workflow:

  • train and validation task splits
  • bounded edits
  • validation-gated acceptance
  • rollout-based reward when available
  • textual feedback that drives reflective mutation
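As an illustration of what a bounded edit could mean, the sketch below counts added and removed lines between two versions of an instruction file and rejects candidates that exceed a budget. The budget and both function names are assumptions; this guide does not specify how CodexOpt bounds its edits.

```python
import difflib


def changed_line_count(original: str, candidate: str) -> int:
    """Count lines that were added or removed between the two versions."""
    diff = difflib.ndiff(original.splitlines(), candidate.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def within_edit_budget(original: str, candidate: str, max_changed_lines: int = 10) -> bool:
    """Hypothetical gate: accept only candidates that stay under the line budget."""
    return changed_line_count(original, candidate) <= max_changed_lines
```

A replaced line counts twice here, once as a removal and once as an addition, so a budget of 10 allows roughly five rewritten lines.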

For most users, the entry point is still simple:

uv run codexopt improve --live