Skip to content

fix(config): preserve literal $ in config values#2363

Open
jluocsa wants to merge 1 commit into
microsoft:mainfrom
jluocsa:fix/config-template-substitute-dollar-sign
Open

fix(config): preserve literal $ in config values#2363
jluocsa wants to merge 1 commit into
microsoft:mainfrom
jluocsa:fix/config-template-substitute-dollar-sign

Conversation

@jluocsa
Copy link
Copy Markdown
Member

@jluocsa jluocsa commented May 23, 2026

Description

Fixes #2349.

_parse_env_variables in graphrag-common calls
string.Template(text).substitute(os.environ). substitute() treats
every $ character as a placeholder prefix and raises
ValueError: Invalid placeholder in string for any $ that is not
followed by a valid Python identifier or {ident} / $$ escape. Common
config values fall into that trap, most notably regex anchors like
file_pattern: ".*\\.md$".

Reproduction

# settings.yaml
input:
  type: markitdown
  file_pattern: ".*\\.md$"
$ graphrag index --root <dir>
ValueError: Invalid placeholder in string: line 50, col 25

Fix

Swap substitute() for safe_substitute(). safe_substitute() leaves
unrecognized $ sequences as literal text, which is the expected
behaviour for arbitrary string config values.

To keep the existing "missing env var raises ConfigParsingError"
contract (and the test that exercises it), the function now also
inspects Template.get_identifiers() (Python 3.11+, which is graphrag's
minimum) and raises ConfigParsingError up-front if any referenced
identifier has no corresponding entry in os.environ.

Net result:

Input Before After
file_pattern: ".*\\.md$" ValueError crash preserved literally
val: $$literal $literal $literal (unchanged)
val: $MY_VAR (set) substituted substituted
val: ${MISSING} ConfigParsingError ConfigParsingError (unchanged)

Tests

  • New unit test test_load_config_preserves_literal_dollar_sign
    covering bare $ in a regex-like value, $$ escape, and a mixed
    payload that combines a regex $ with a real env-var reference.
  • Existing test_load_config_validation (which expects
    ConfigParsingError for a missing env var) still passes — the
    get_identifiers() pre-check keeps that path intact.

Local run:

.venv\Scripts\python.exe -m pytest tests/unit/load_config -q
... 3 passed in 1.98s

ruff check and ruff format --check are clean on the changed files.

Notes

  • Changelog: added .semversioner/next-release/patch-…json per
    CONTRIBUTING.md.
  • Diff: +1 source file, +1 test, +1 semversioner entry. No behavioural
    change for any code path that was previously succeeding.

Switch _parse_env_variables from Template.substitute to Template.safe_substitute so that a bare `$` in a config value (for example the regex anchor in `file_pattern: '.*\\.md$'`) no longer crashes the loader with `ValueError: Invalid placeholder in string`.

Missing env-var references are still detected explicitly via `Template.get_identifiers` so callers continue to see `ConfigParsingError` when a referenced placeholder has no value -- existing behaviour and tests are preserved.

Adds a regression test `test_load_config_preserves_literal_dollar_sign` covering bare `\$` in `str` values, `\$\$` escape, and a mixed regex + env var payload.
@jluocsa jluocsa requested a review from a team as a code owner May 23, 2026 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: string.Template.substitute() crashes on $ in config values like regex file_pattern

1 participant