
Commit f7322b2

Address Copilot review feedback
- Include untracked (newly created) files in diff-scope scan set via git ls-files --others --exclude-standard
- Fix staged scope to scan the index version of each file using git show :<path> into a temp file, ensuring scan matches what will actually be committed
- Add json_escape() helper to escape backslashes and double quotes in file paths and redacted values before JSON string concatenation
- Replace echo with printf for grep pipes to handle lines starting with -n/-e safely
- Replace em dashes with colons and commas throughout
1 parent 87cf333 commit f7322b2


2 files changed: +46 additions, -26 deletions


hooks/secrets-scanner/README.md

Lines changed: 11 additions & 11 deletions
```diff
@@ -12,12 +12,12 @@ Scans files modified during a GitHub Copilot coding agent session for accidental

 AI coding agents generate and modify code rapidly, which increases the risk of hardcoded secrets slipping into the codebase. This hook acts as a safety net by scanning all modified files at session end for 20+ categories of secret patterns, including:

-- **Cloud credentials** AWS access keys, GCP service account keys, Azure client secrets
-- **Platform tokens** GitHub PATs, npm tokens, Slack tokens, Stripe keys
-- **Private keys** RSA, EC, OpenSSH, PGP, DSA private key blocks
-- **Connection strings** Database URIs (PostgreSQL, MongoDB, MySQL, Redis, MSSQL)
-- **Generic secrets** API keys, passwords, bearer tokens, JWTs
-- **Internal infrastructure** Private IP addresses with ports
+- **Cloud credentials**: AWS access keys, GCP service account keys, Azure client secrets
+- **Platform tokens**: GitHub PATs, npm tokens, Slack tokens, Stripe keys
+- **Private keys**: RSA, EC, OpenSSH, PGP, DSA private key blocks
+- **Connection strings**: Database URIs (PostgreSQL, MongoDB, MySQL, Redis, MSSQL)
+- **Generic secrets**: API keys, passwords, bearer tokens, JWTs
+- **Internal infrastructure**: Private IP addresses with ports

 ## Features

```

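Each category in the list above is backed by entries in the script's `PATTERNS` array, stored as `NAME|severity|regex` strings (the `INTERNAL_IP_PORT` entry is visible in the scan-secrets.sh diff). The sketch below shows how such entries can be split and applied to a line. The two regexes are widely documented token shapes chosen for illustration, not copied from `scan-secrets.sh`, and `classify_line` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Illustrative entries in the NAME|severity|regex format.
# These regexes are common public token shapes, NOT the script's own patterns.
PATTERNS=(
  "AWS_ACCESS_KEY|critical|(AKIA|ASIA)[0-9A-Z]{16}"
  "GITHUB_PAT|critical|ghp_[A-Za-z0-9]{36}"
)

# classify_line (hypothetical) prints "NAME severity" for each pattern the line matches.
classify_line() {
  local line=$1 entry pattern_name severity regex
  for entry in "${PATTERNS[@]}"; do
    # When there are more fields than names, read assigns the remainder,
    # pipes included, to the last name, so (AKIA|ASIA) stays intact.
    IFS='|' read -r pattern_name severity regex <<< "$entry"
    if printf '%s\n' "$line" | grep -qE -e "$regex"; then
      printf '%s %s\n' "$pattern_name" "$severity"
    fi
  done
}
```

For example, `classify_line 'aws_access_key_id = AKIAABCDEFGHIJKLMNOP'` reports the AWS entry, while a line with no token prints nothing. Because `read` keeps everything after the second `|` in `regex`, alternations inside a pattern do not need escaping in this format.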
````diff
@@ -155,7 +155,7 @@ See the full list in `scan-secrets.sh`.
 ---- ---- ------- --------
 lib/auth.py 45 AWS_ACCESS_KEY critical

-🚫 Session blocked resolve the findings above before committing.
+🚫 Session blocked: resolve the findings above before committing.
 Set SCAN_MODE=warn to log without blocking, or add patterns to SECRETS_ALLOWLIST.
 ```

````

```diff
@@ -175,8 +175,8 @@ Scan events are written to `logs/copilot/secrets/scan.log` in JSON Lines format:

 This hook pairs well with the **Session Auto-Commit** hook. When both are installed, order them so that `secrets-scanner` runs first:

-1. Secrets scanner runs at `sessionEnd` catches leaked secrets
-2. Auto-commit runs at `sessionEnd` only commits if all previous hooks pass
+1. Secrets scanner runs at `sessionEnd`, catches leaked secrets
+2. Auto-commit runs at `sessionEnd`, only commits if all previous hooks pass

 Set `SCAN_MODE=block` to prevent auto-commit when secrets are detected.

```

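The ordering above depends on each hook's exit status: in block mode the scanner exits 1 when it has findings (the `exit 1` in the last scan-secrets.sh hunk). The sketch below models "only commits if all previous hooks pass" with a hypothetical sequential runner. It is not how Copilot itself dispatches `sessionEnd` hooks; `run_hooks_in_order` and the stand-in hook names are assumptions for illustration.

```bash
#!/usr/bin/env bash
# run_hooks_in_order (hypothetical): run each hook command in order and stop
# at the first one that exits non-zero, so later hooks never run.
run_hooks_in_order() {
  local hook
  for hook in "$@"; do
    if ! "$hook"; then
      printf 'hook failed, stopping: %s\n' "$hook" >&2
      return 1
    fi
  done
  return 0
}
```

With a scanner stand-in that returns 1 (as block mode does on findings), an auto-commit stand-in placed after it is never invoked; swapping the order would let the commit happen before the scan, which is why the README asks for `secrets-scanner` first.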
```diff
@@ -196,7 +196,7 @@ To temporarily disable the scanner:

 ## Limitations

-- Pattern-based detection does not perform entropy analysis or contextual validation
+- Pattern-based detection; does not perform entropy analysis or contextual validation
 - May produce false positives for test fixtures or example code (use the allowlist to suppress these)
-- Scans only text files binary secrets (keystores, certificates in DER format) are not detected
+- Scans only text files; binary secrets (keystores, certificates in DER format) are not detected
 - Requires `git` to be available in the execution environment
```

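The first limitation mentions entropy analysis, which this hook does not perform. For context only, the sketch below computes the Shannon entropy of a string in bits per character; random-looking credentials tend to score higher than ordinary identifiers. `shannon_entropy` is a hypothetical helper and no threshold from the hook is implied.

```bash
#!/usr/bin/env bash
# shannon_entropy (hypothetical): print the Shannon entropy of a single-line
# string in bits per character, rounded to two decimals.
# Empty input produces no output.
shannon_entropy() {
  printf '%s' "$1" | LC_ALL=C awk '{
    n = length($0)
    for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
    h = 0
    for (ch in count) {
      p = count[ch] / n
      h -= p * log(p) / log(2)   # sum of -p * log2(p)
    }
    printf "%.2f\n", h
  }'
}
```

A string of one repeated character scores 0, and four distinct characters score 2 bits per character; a real check would combine a score like this with a length and character-set filter.
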
hooks/secrets-scanner/scan-secrets.sh

Lines changed: 35 additions & 15 deletions
```diff
@@ -20,7 +20,7 @@ fi

 # Ensure we are in a git repository
 if ! git rev-parse --is-inside-work-tree &>/dev/null; then
-  echo "⚠️ Not in a git repository skipping secrets scan"
+  echo "⚠️ Not in a git repository, skipping secrets scan"
   exit 0
 fi

```

```diff
@@ -43,6 +43,10 @@ else
   while IFS= read -r f; do
     [[ -n "$f" ]] && FILES+=("$f")
   done < <(git diff --name-only --diff-filter=ACMR HEAD 2>/dev/null || git diff --name-only --diff-filter=ACMR 2>/dev/null)
+  # Also include untracked new files (created during the session, not yet in HEAD)
+  while IFS= read -r f; do
+    [[ -n "$f" ]] && FILES+=("$f")
+  done < <(git ls-files --others --exclude-standard 2>/dev/null)
 fi

 if [[ ${#FILES[@]} -eq 0 ]]; then
```
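
The diff-scope file list built in the hunk above can be reproduced in isolation. The sketch below is a hypothetical helper (`collect_changed_files`) that emits the same two sources in the same order: tracked files changed relative to `HEAD` (falling back to a plain `git diff` when `HEAD` does not exist yet), then untracked files that are not ignored.

```bash
#!/usr/bin/env bash
# collect_changed_files (hypothetical): one path per line, mirroring the
# diff-scope logic of scan-secrets.sh.
collect_changed_files() {
  # Tracked files added/copied/modified/renamed relative to HEAD; in a repo
  # with no commits yet, fall back to the working tree vs index diff
  git diff --name-only --diff-filter=ACMR HEAD 2>/dev/null ||
    git diff --name-only --diff-filter=ACMR 2>/dev/null
  # Untracked files never appear in `git diff`, so the lists do not overlap;
  # --exclude-standard honours .gitignore and .git/info/exclude
  git ls-files --others --exclude-standard 2>/dev/null
}
```

Usage inside a script could be `mapfile -t FILES < <(collect_changed_files)`. Before this commit, a file created during the session but never `git add`-ed fell into neither list and was not scanned at all.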
```diff
@@ -69,7 +73,7 @@ is_allowlisted() {
   return 1
 }

-# Binary file detection skip files that are not text
+# Binary file detection: skip files that are not text
 is_text_file() {
   local filepath="$1"
   [[ -f "$filepath" ]] && file --brief --mime-type "$filepath" 2>/dev/null | grep -q "^text/" && return 0
```
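
`is_text_file` relies on the `file` utility, which minimal containers do not always ship. The sketch below is a hypothetical variant that keeps the same `^text/` check when `file` exists and otherwise falls back to a different heuristic, `grep -I`, which skips files it detects as binary. Depending on the libmagic version, `file` may also report some text formats (JSON, for example) under an `application/*` MIME type, which a `^text/` check skips; that is worth checking on the target system.

```bash
#!/usr/bin/env bash
# is_text_file_portable is a hypothetical variant of the script's is_text_file.
is_text_file_portable() {
  local filepath=$1
  [[ -f "$filepath" ]] || return 1
  if command -v file >/dev/null 2>&1; then
    # Same check as the script: accept only text/* MIME types
    file --brief --mime-type "$filepath" 2>/dev/null | grep -q '^text/'
  else
    # Fallback heuristic: grep -I never matches files it detects as binary
    # (e.g. containing NUL bytes); '.' also rejects empty files
    LC_ALL=C grep -Iq . "$filepath"
  fi
}
```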
```diff
@@ -135,16 +139,24 @@ PATTERNS=(
   "INTERNAL_IP_PORT|medium|\b(10\.\d{1,3}\.\d{1,3}\.\d{1,3}|172\.(1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}|192\.168\.\d{1,3}\.\d{1,3}):\d{2,5}\b"
 )

+# Escape a string value for safe embedding in a JSON string literal
+json_escape() {
+  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
+}
+
 # Store findings as tab-separated records
 FINDINGS=()

 scan_file() {
   local filepath="$1"
+  # read_path: the actual file to scan; defaults to filepath (working tree)
+  # When SCOPE=staged, callers pass a temp file with the staged content instead
+  local read_path="${2:-$1}"

-  # Skip if file does not exist (e.g., deleted)
-  [[ -f "$filepath" ]] || return 0
+  # Skip if source does not exist (e.g., deleted)
+  [[ -f "$read_path" ]] || return 0

-  # Skip binary files
+  # Skip binary files (type detection uses the original path for MIME lookup)
   if ! is_text_file "$filepath"; then
     return 0
   fi
```
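
The `json_escape()` helper added above covers backslashes and double quotes. Because `sed` works line by line, a newline inside a value passes through unescaped and would still break the JSON line; tabs and carriage returns are likewise left raw. The sketch below is a hypothetical extension in pure Bash parameter expansion that also handles those three control characters. Other control characters (below U+0020) would still need `\uXXXX` escapes, which it does not attempt.

```bash
#!/usr/bin/env bash
# json_escape_full (hypothetical): escape backslash, double quote, newline,
# carriage return and tab for use inside a JSON string literal.
json_escape_full() {
  local s=$1
  local bs='\' dq='"'
  # Keeping the backslash in a variable and quoting the expansions avoids
  # version-dependent backslash handling in ${var//pattern/replacement}
  s=${s//"$bs"/"$bs$bs"}    # \  -> \\   (must run first)
  s=${s//"$dq"/"$bs$dq"}    # "  -> \"
  s=${s//$'\n'/"${bs}n"}    # newline -> \n
  s=${s//$'\r'/"${bs}r"}    # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}    # tab -> \t
  printf '%s' "$s"
}
```

This also avoids spawning a `sed` process per value, which matters little for a handful of findings but adds up when many lines match.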
```diff
@@ -162,22 +174,22 @@ scan_file() {
     for entry in "${PATTERNS[@]}"; do
       IFS='|' read -r pattern_name severity regex <<< "$entry"

-      if echo "$line" | grep -qE "$regex" 2>/dev/null; then
+      if printf '%s\n' "$line" | grep -qE "$regex" 2>/dev/null; then
         # Extract the matched fragment (redacted for logging)
         local match
-        match=$(echo "$line" | grep -oE "$regex" 2>/dev/null | head -1)
+        match=$(printf '%s\n' "$line" | grep -oE "$regex" 2>/dev/null | head -1)

         # Check allowlist
         if [[ ${#ALLOWLIST[@]} -gt 0 ]] && is_allowlisted "$match"; then
           continue
         fi

         # Skip if this looks like a placeholder or example
-        if echo "$match" | grep -qiE '(example|placeholder|your[_-]|xxx|changeme|TODO|FIXME|replace[_-]?me|dummy|fake|test[_-]?key|sample)'; then
+        if printf '%s\n' "$match" | grep -qiE '(example|placeholder|your[_-]|xxx|changeme|TODO|FIXME|replace[_-]?me|dummy|fake|test[_-]?key|sample)'; then
           continue
         fi

-        # Redact the match for safe logging show first 4 and last 4 chars
+        # Redact the match for safe logging: show first 4 and last 4 chars
         local redacted
         if [[ ${#match} -le 12 ]]; then
           redacted="[REDACTED]"
```
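
The two steps in this hunk, extracting the first match through a `printf` pipe and redacting it, can be sketched as standalone helpers. Both `first_match` and `redact` are hypothetical; the "≤ 12 characters → `[REDACTED]`" rule and the first-4/last-4 idea come from the hunk, but the `...` separator is an assumption because the script's `else` branch is not shown.

```bash
#!/usr/bin/env bash
# first_match (hypothetical): print the first substring of a line matching an
# extended regex. printf '%s\n' writes the line verbatim, whereas bash's
# builtin echo treats a line that is exactly "-n", "-e" or "-E" as an option.
# Passing the regex via -e also stops grep reading a leading "-" as an option.
first_match() {
  local line=$1 regex=$2
  printf '%s\n' "$line" | grep -oE -e "$regex" 2>/dev/null | head -n 1
}

# redact (hypothetical): hide short matches entirely; keep the first and last
# 4 characters of longer ones. The "..." separator is an assumption.
redact() {
  local match=$1
  if (( ${#match} <= 12 )); then
    printf '%s\n' '[REDACTED]'
  else
    printf '%s...%s\n' "${match:0:4}" "${match: -4}"
  fi
}
```

For instance, with `line=-n`, `echo "$line"` prints nothing in Bash, so the old pipeline had nothing to match, while `first_match '-n' '^-n$'` returns `-n`.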
```diff
@@ -189,13 +201,21 @@ scan_file() {
         FINDING_COUNT=$((FINDING_COUNT + 1))
       fi
     done
-  done < "$filepath"
+  done < "$read_path"
 }

 echo "🔍 Scanning ${#FILES[@]} modified file(s) for secrets..."

 for filepath in "${FILES[@]}"; do
-  scan_file "$filepath"
+  if [[ "$SCOPE" == "staged" ]]; then
+    # Scan the staged (index) version to match what will actually be committed
+    _tmpfile=$(mktemp)
+    git show :"$filepath" > "$_tmpfile" 2>/dev/null || true
+    scan_file "$filepath" "$_tmpfile"
+    rm -f "$_tmpfile"
+  else
+    scan_file "$filepath"
+  fi
 done

 # Log results
```
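
The staged-scope loop above writes the index version of each path to a temp file before scanning. The sketch below wraps that pattern in a hypothetical `scan_staged_blob` that takes the scanner as a callback. One deliberate difference from the commit: with `|| true`, a path that has no index entry is scanned as an empty temp file, whereas this sketch skips it. The temp file is removed even when the scanner returns non-zero.

```bash
#!/usr/bin/env bash
# scan_staged_blob (hypothetical): copy the index version of <path> into a
# temp file, call "<scanner> <path> <tmpfile>", then delete the temp file.
scan_staged_blob() {
  local path=$1 scanner=$2 tmp rc=0
  tmp=$(mktemp) || return 1
  # ":<path>" names the blob stored in the index (stage 0) for <path>
  if git show ":$path" > "$tmp" 2>/dev/null; then
    "$scanner" "$path" "$tmp" || rc=$?
  fi
  rm -f "$tmp"
  return "$rc"
}
```

Called as `scan_staged_blob "$filepath" scan_file`, it would slot into the loop above; when a file is staged and then edited again, only the staged text is scanned, which is what the commit message asks for.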
```diff
@@ -219,8 +239,8 @@ if [[ $FINDING_COUNT -gt 0 ]]; then
     fi
     FIRST=false

-    # Build JSON safely without requiring jq
-    FINDINGS_JSON+="{\"file\":\"$fpath\",\"line\":$fline,\"pattern\":\"$pname\",\"severity\":\"$psev\",\"match\":\"$redacted\"}"
+    # Build JSON safely without requiring jq; escape path and match values
+    FINDINGS_JSON+="{\"file\":\"$(json_escape "$fpath")\",\"line\":$fline,\"pattern\":\"$pname\",\"severity\":\"$psev\",\"match\":\"$(json_escape "$redacted")\"}"
   done
   FINDINGS_JSON+="]"

```

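The loop in this hunk turns tab-separated findings into a JSON array without `jq`. The sketch below isolates that step as a hypothetical `findings_to_json`, reusing the commit's `json_escape()` verbatim. The field order `file, line, pattern, severity, redacted` is an assumption, since the line that splits each record is outside the hunk.

```bash
#!/usr/bin/env bash
# json_escape is copied from the commit; findings_to_json is hypothetical.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Each argument is one finding, assumed to be
# file<TAB>line<TAB>pattern<TAB>severity<TAB>redacted-match.
findings_to_json() {
  local out="[" first=true rec fpath fline pname psev redacted
  for rec in "$@"; do
    IFS=$'\t' read -r fpath fline pname psev redacted <<< "$rec"
    [[ $first == true ]] || out+=","
    first=false
    # Only the free-form fields (path and match) are escaped; pattern names and
    # severities come from the script's fixed PATTERNS table
    out+="{\"file\":\"$(json_escape "$fpath")\",\"line\":$fline,\"pattern\":\"$pname\",\"severity\":\"$psev\",\"match\":\"$(json_escape "$redacted")\"}"
  done
  printf '%s\n' "$out]"
}
```

A path such as `src/a "b".py` now yields `"file":"src/a \"b\".py"`; without the escaping, the embedded quotes would end the JSON string early and the log line would no longer parse.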
```diff
@@ -231,8 +251,8 @@ if [[ $FINDING_COUNT -gt 0 ]]; then
     "$TIMESTAMP" "$MODE" "$SCOPE" "${#FILES[@]}" "$FINDING_COUNT" "$FINDINGS_JSON" >> "$LOG_FILE"

   if [[ "$MODE" == "block" ]]; then
-    echo "🚫 Session blocked resolve the findings above before committing."
-    echo " Set SCAN_MODE=warn to log without blocking, or add patterns to SECRETS_ALLOWLIST."
+    echo "🚫 Session blocked: resolve the findings above before committing."
+    echo " Set SCAN_MODE=warn to log without blocking, or add patterns to SECRETS_ALLOWLIST."
     exit 1
   else
     echo "💡 Review the findings above. Set SCAN_MODE=block to prevent commits with secrets."
```
