
Commit 58daba3

sjarmak and claude committed
fix: infrastructure bugs in 5 feature task Dockerfiles and F1 scorers
- tensorrt-mxfp4: add clone-as-claude pattern (agent couldn't write to /workspace)
- vscode-stale-diagnostics: add clone-as-claude pattern + npm install in test.sh for sg_only clone-at-verify (npx tsc fails without node_modules)
- servo-scrollend: add libclang-dev to both Dockerfile and Dockerfile.sg_only (cargo check fails because bindgen requires libclang.so)
- k8s-runtime-object + envoy-grpc: add normalize_repo() to F1 scorer to handle sg-evals mirror names (sg-evals/api--f32ed1d6 → api matches kubernetes/api)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c861f42 commit 58daba3

7 files changed

Lines changed: 97 additions & 10 deletions

benchmarks/ccb_feature/envoy-grpc-server-impl-001/tests/test.sh

Lines changed: 37 additions & 2 deletions
@@ -137,10 +137,45 @@ try:
         write_reward(0.0)
         sys.exit(0)
 
+    # ── Repo name normalization ────────────────────────────────────────────
+    def normalize_repo(name):
+        """Normalize repo names to a canonical short form for matching.
+
+        Handles sg-evals mirror names and upstream org/repo names by
+        extracting just the repo basename (last path segment).
+
+          sg-evals/istio--2300e245     -> istio
+          github.com/sg-evals/istio--h -> istio
+          istio/istio                  -> istio
+          envoyproxy/go-control-plane  -> go-control-plane
+          emissary-ingress/emissary    -> emissary
+        """
+        n = name.strip()
+        # Strip URL prefixes
+        for prefix in ("github.com/", "https://github.com/"):
+            if n.startswith(prefix):
+                n = n[len(prefix):]
+        # Strip sg-evals/ prefix
+        if n.startswith("sg-evals/"):
+            n = n[len("sg-evals/"):]
+        # Strip --hexhash suffix (8+ hex chars after --)
+        n = re.sub(r'--[0-9a-f]{7,}$', '', n)
+        # Take just the last path segment (repo basename)
+        if "/" in n:
+            n = n.rsplit("/", 1)[-1]
+        return n
+
     # ── Build composite keys ─────────────────────────────────────────────
     def make_key(entry, fields):
-        """Build a composite key tuple from an entry's field values."""
-        return tuple(str(entry.get(f, "")).strip() for f in fields)
+        """Build a composite key tuple from an entry's field values.
+        Normalizes 'repo' field to handle sg-evals mirror names."""
+        parts = []
+        for f in fields:
+            val = str(entry.get(f, "")).strip()
+            if f == "repo":
+                val = normalize_repo(val)
+            parts.append(val)
+        return tuple(parts)
 
     expected_keys = [make_key(e, key_fields) for e in expected]
     reported_keys = [make_key(r, key_fields) for r in reported]
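
To make the behavior of this hunk concrete, the following standalone sketch copies the body of `normalize_repo()` from the diff and runs it on a few names. The sample inputs are illustrative, not taken from the benchmark data. Two details are worth noting: the pattern strips suffixes of 7 or more lowercase hex characters (the inline comment says 8+), and a placeholder suffix such as `--h` is not hex of that length, so it is left in place.

```python
import re

def normalize_repo(name):
    # Body copied from the diff above; only the docstring is omitted.
    n = name.strip()
    for prefix in ("github.com/", "https://github.com/"):
        if n.startswith(prefix):
            n = n[len(prefix):]
    if n.startswith("sg-evals/"):
        n = n[len("sg-evals/"):]
    n = re.sub(r'--[0-9a-f]{7,}$', '', n)
    if "/" in n:
        n = n.rsplit("/", 1)[-1]
    return n

# Illustrative inputs (assumed for this sketch, not from the task fixtures).
samples = [
    "sg-evals/istio--2300e245",                     # mirror name, 8 hex chars
    "https://github.com/sg-evals/istio--2300e245",  # full URL form
    "istio/istio",                                  # upstream org/repo
    "envoyproxy/go-control-plane",                  # single hyphens are kept
    "sg-evals/api--f32ed1d",                        # 7 hex chars: still stripped
    "sg-evals/istio--h",                            # placeholder: not stripped
]

for s in samples:
    print(f"{s!r:50} -> {normalize_repo(s)!r}")
```

With this normalization, a reported `sg-evals/istio--2300e245` and an expected `istio/istio` both reduce to `istio`, which is what lets the composite keys match.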

benchmarks/ccb_feature/k8s-runtime-object-impl-001/tests/test.sh

Lines changed: 36 additions & 2 deletions
@@ -137,10 +137,44 @@ try:
         write_reward(0.0)
         sys.exit(0)
 
+    # ── Repo name normalization ────────────────────────────────────────────
+    def normalize_repo(name):
+        """Normalize repo names to a canonical short form for matching.
+
+        Handles sg-evals mirror names and upstream org/repo names by
+        extracting just the repo basename (last path segment).
+
+          sg-evals/api--f32ed1d6        -> api
+          github.com/sg-evals/api--hash -> api
+          kubernetes/api                -> api
+          envoyproxy/go-control-plane   -> go-control-plane
+        """
+        n = name.strip()
+        # Strip URL prefixes
+        for prefix in ("github.com/", "https://github.com/"):
+            if n.startswith(prefix):
+                n = n[len(prefix):]
+        # Strip sg-evals/ prefix
+        if n.startswith("sg-evals/"):
+            n = n[len("sg-evals/"):]
+        # Strip --hexhash suffix (8+ hex chars after --)
+        n = re.sub(r'--[0-9a-f]{7,}$', '', n)
+        # Take just the last path segment (repo basename)
+        if "/" in n:
+            n = n.rsplit("/", 1)[-1]
+        return n
+
     # ── Build composite keys ─────────────────────────────────────────────
     def make_key(entry, fields):
-        """Build a composite key tuple from an entry's field values."""
-        return tuple(str(entry.get(f, "")).strip() for f in fields)
+        """Build a composite key tuple from an entry's field values.
+        Normalizes 'repo' field to handle sg-evals mirror names."""
+        parts = []
+        for f in fields:
+            val = str(entry.get(f, "")).strip()
+            if f == "repo":
+                val = normalize_repo(val)
+            parts.append(val)
+        return tuple(parts)
 
     expected_keys = [make_key(e, key_fields) for e in expected]
     reported_keys = [make_key(r, key_fields) for r in reported]
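
The diff does not show how the scorer turns `expected_keys` and `reported_keys` into a score, so the following is only a sketch of why the normalization matters, assuming a plain set-based F1. The function name `f1_from_keys`, the `(repo, path)` key shape, and the file names are all hypothetical.

```python
def f1_from_keys(expected_keys, reported_keys):
    """Set-based F1 over composite key tuples (assumed scoring rule)."""
    expected, reported = set(expected_keys), set(reported_keys)
    true_positives = len(expected & reported)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(reported)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)

# Hypothetical (repo, path) keys for illustration only.
expected = [("api", "types.go"), ("apimachinery", "interfaces.go")]

# Keys built without repo normalization: mirror names never match upstream.
reported_raw = [
    ("sg-evals/api--f32ed1d6", "types.go"),
    ("kubernetes/apimachinery", "interfaces.go"),
]

# The same entries after the repo field is reduced to its basename.
reported_normalized = [("api", "types.go"), ("apimachinery", "interfaces.go")]

print(f1_from_keys(expected, reported_raw))         # no overlap
print(f1_from_keys(expected, reported_normalized))  # full overlap
```

Under this assumed rule, an agent that reports the right files under mirror names scores zero before the fix and full marks after it, which matches the infrastructure bug the commit message describes.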

benchmarks/ccb_feature/servo-scrollend-event-feat-001/environment/Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     build-essential \
     autoconf \
     pkg-config \
+    libclang-dev \
     && rm -rf /var/lib/apt/lists/*
 
 # Create claude user and writable work dirs before clone.

benchmarks/ccb_feature/servo-scrollend-event-feat-001/environment/Dockerfile.sg_only

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     ca-certificates \
     python3 \
     curl \
+    libclang-dev \
     && rm -rf /var/lib/apt/lists/*
 
 WORKDIR /workspace

benchmarks/ccb_feature/tensorrt-mxfp4-quant-feat-001/environment/Dockerfile

Lines changed: 8 additions & 3 deletions
@@ -11,17 +11,22 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     ninja-build \
     && rm -rf /var/lib/apt/lists/*
 
+# Create claude user and writable work dirs before clone.
+RUN (adduser --disabled-password --gecos '' claude 2>/dev/null || true) && \
+    mkdir -p /workspace /logs && \
+    chown -R claude:claude /workspace /logs
+
 # Clone the actual TensorRT-LLM repository at pinned commit
 # This ensures BOTH baseline and MCP agents have identical file access
+USER claude
 RUN git clone --depth 1 https://github.com/sg-evals/TensorRT-LLM--b98f3fca.git . && \
     git config user.email "agent@example.com" && \
     git config user.name "Agent"
+USER root
 
 # Install Python dependencies (basic setup only)
 # Full build may require CUDA and other deps, but we need the source files
 RUN pip install --upgrade pip && \
     pip install pybind11==2.13.6 2>&1 || true
 
-# Task setup complete
-# Note: Both baseline and MCP agents now have access to the real TensorRT-LLM source.
-# Baseline will use local grep/find/rg. MCP will use Sourcegraph semantic search.
+ENTRYPOINT []
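
The clone-as-claude pattern exists so that the checked-out tree is owned by the user the agent runs as. One way to catch a regression of this kind early is a preflight check run as that user before the task starts. The sketch below is not part of the commit; `is_writable_dir` is a hypothetical helper, and the directory names are the ones created in the Dockerfile above.

```python
import os

def is_writable_dir(path):
    """Return True if path is a directory the current user can enter and write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)

if __name__ == "__main__":
    # Directories the Dockerfile chowns to the claude user.
    for d in ("/workspace", "/logs"):
        status = "ok" if is_writable_dir(d) else "NOT writable"
        print(f"{d}: {status}")
```

Note that `os.access` reports on the calling user, so the check is only meaningful when run as `claude`; run as root it will generally pass regardless of ownership.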

benchmarks/ccb_feature/vscode-stale-diagnostics-feat-001/environment/Dockerfile

Lines changed: 8 additions & 3 deletions
@@ -11,15 +11,20 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     python3-pip \
     && rm -rf /var/lib/apt/lists/*
 
+# Create claude user and writable work dirs before clone.
+RUN (adduser --disabled-password --gecos '' claude 2>/dev/null || true) && \
+    mkdir -p /workspace /logs && \
+    chown -R claude:claude /workspace /logs
+
 # Clone the actual VS Code repository at pinned commit (1.96.0)
 # This ensures BOTH baseline and MCP agents have identical file access
+USER claude
 RUN git clone --depth 1 https://github.com/sg-evals/vscode--138f619c.git . && \
     git config user.email "agent@example.com" && \
     git config user.name "Agent"
+USER root
 
 # Install VS Code dependencies (required for testing)
 RUN npm install --legacy-peer-deps 2>&1 | tail -10 || true
 
-# Task setup complete
-# Note: Both baseline and MCP agents now have access to the real VS Code source.
-# Baseline will use local grep/find/rg. MCP will use Sourcegraph semantic search.
+ENTRYPOINT []

benchmarks/ccb_feature/vscode-stale-diagnostics-feat-001/tests/test.sh

Lines changed: 6 additions & 0 deletions
@@ -68,6 +68,12 @@ fi
 
 echo "Running VS Code test suite..."
 
+# ── Install dependencies if missing (needed after sg_only clone-at-verify) ─
+if [ ! -d "node_modules" ] && [ -f "package.json" ]; then
+    echo "Installing npm dependencies (post-clone)..."
+    npm install --legacy-peer-deps 2>&1 | tail -5 || true
+fi
+
 # ── TypeScript type-check ────────────────────────────────────────────────
 # VS Code uses its own tsconfig files. Run tsc --noEmit to verify the
 # modified TypeScript still type-checks. If it fails, score is 0.
