fix(scripts): encode residual JSON control chars as \uXXXX instead of stripping#1872
Open
pierluigilenoci wants to merge 1 commit intogithub:mainfrom
Open
fix(scripts): encode residual JSON control chars as \uXXXX instead of stripping#1872pierluigilenoci wants to merge 1 commit intogithub:mainfrom
pierluigilenoci wants to merge 1 commit intogithub:mainfrom
Conversation
…pping json_escape() was silently deleting control characters (U+0000-U+001F) that were not individually handled (\n, \t, \r, \b, \f). Per RFC 8259, these must be encoded as \uXXXX sequences to preserve data integrity. Replace the tr -d strip with a char-by-char loop that emits proper \uXXXX escapes for any remaining control characters.
Contributor
Author
|
@mnriem @dhilipkumars, please take a look. |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the Bash json_escape() fallback (used when jq is unavailable) to avoid silent data loss by encoding previously-unhandled JSON control characters (U+0000–U+001F) as \uXXXX sequences instead of stripping them.
Changes:
- Replaces
tr -dcontrol-character stripping with a per-character scan that emits\u%04xescapes for remaining control bytes. - Keeps existing short escapes (
\n,\t,\r,\b,\f) and quote/backslash escaping behavior unchanged.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+174
to
+181
| # Escape any remaining U+0000-U+001F control characters as \uXXXX. | ||
| # Only single-byte characters can be JSON control chars; multi-byte UTF-8 | ||
| # sequences have first-byte values >= 0xC0 and are never control characters. | ||
| local i char code | ||
| local out="" | ||
| for (( i=0; i<${#s}; i++ )); do | ||
| char="${s:$i:1}" | ||
| code=$(LC_ALL=C printf '%d' "'$char" 2>/dev/null || echo 256) |
Comment on lines
+174
to
+176
| # Escape any remaining U+0000-U+001F control characters as \uXXXX. | ||
| # Only single-byte characters can be JSON control chars; multi-byte UTF-8 | ||
| # sequences have first-byte values >= 0xC0 and are never control characters. |
Comment on lines
+178
to
+188
| local out="" | ||
| for (( i=0; i<${#s}; i++ )); do | ||
| char="${s:$i:1}" | ||
| code=$(LC_ALL=C printf '%d' "'$char" 2>/dev/null || echo 256) | ||
| if (( code >= 0 && code <= 31 )); then | ||
| out+=$(printf '\\u%04x' "$code") | ||
| else | ||
| out+="$char" | ||
| fi | ||
| done | ||
| printf '%s' "$out" |
mnriem
requested changes
Mar 17, 2026
Collaborator
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback. If not applicable please explain why
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
json_escape()incommon.shwas silently deleting control characters (U+0000–U+001F) not individually handled (\n,\t,\r,\b,\f) viatr -d, causing data loss\uXXXXsequencestr -dstrip with a char-by-char loop that emits proper\uXXXXescapes, preserving data integrityExample
Before:
"hello\x01world"→"helloworld"(SOH silently deleted)After:
"hello\x01world"→"hello\u0001world"(SOH properly escaped)Test plan
json_escapecorrectly encodes control characters like SOH (\x01), STX (\x02), etc. as\uXXXX\n,\t,\r,\b,\f) still use their short escape formscheck-prerequisites.sh --jsonandcreate-new-feature.sh --jsonto verify JSON output is valid