Skip to content

fix(streaming): handle duplicate-index list entries in accumulate_delta#3227

Open
rmotgi1227 wants to merge 2 commits into
openai:mainfrom
rmotgi1227:fix/accumulate-delta-duplicate-index
Open

fix(streaming): handle duplicate-index list entries in accumulate_delta#3227
rmotgi1227 wants to merge 2 commits into
openai:mainfrom
rmotgi1227:fix/accumulate-delta-duplicate-index

Conversation

@rmotgi1227
Copy link
Copy Markdown

There's a bug in accumulate_delta where if the very first streaming chunk contains a list value (like tool_calls), the function copies it wholesale into acc and returns early — completely skipping the index-based merge logic. That means if the first chunk happens to have two entries at the same index (which the OpenAI spec allows), they end up as two separate list items instead of being merged. Subsequent chunks then only ever touch index 0, leaving the duplicate entry orphaned and causing tool_calls[0].function.arguments to be missing a chunk and come out as invalid JSON.

The fix is straightforward: in the two early-return branches ("key not in acc" and "acc_value is None"), check whether delta_value is a list before doing the wholesale copy. If it is, initialize acc[key] to an empty list and let execution fall through to the existing per-index merge loop, the same path that all subsequent chunks already go through. Non-list values still take the original fast path.

  • I understand that this repository is auto-generated and my pull request may not be merged

The change is in src/openai/lib/streaming/_deltas.py only. The same early-return pattern exists in _assistants.py but that file has different context so I've left it for a separate follow-up if needed.

Fixes #3203. See also #3201 which covers a narrower version of the same bug.

When the very first streaming chunk for a key contains a list value (e.g. tool_calls), the early-return branches in accumulate_delta copied the raw list wholesale instead of routing it through the index-based merge path. This caused duplicate-index entries in that first chunk to be stored as separate list elements rather than merged, leaving orphaned entries that subsequent chunks could not reach.

Fix: when key is absent or acc_value is None and delta_value is a list, initialize acc[key] to [] and fall through to the existing list-merge logic so that every chunk — including the first — goes through proper index deduplication.

Fixes openai#3203, see also openai#3201.
@rmotgi1227 rmotgi1227 requested a review from a team as a code owner May 11, 2026 21:54
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be37dee88a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if is_list(delta_value):
# Initialize an empty list so that list-merge logic below
# handles duplicate-index entries even on the very first chunk.
acc[key] = []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fix first list chunk duplicate-index merging

When delta_value is a list for a key that is not yet in acc, this initializes acc_value to an empty list, but the existing list branch treats all(isinstance(... ) for x in acc_value) as true for an empty list and just extends the raw delta. As a result, the exact duplicate-index case this commit is meant to fix still leaves two entries at index 0 for inputs like {'tool_calls': [{'index': 0, ...}, {'index': 0, ...}]} instead of merging them, so later chunks continue from a malformed snapshot.

Useful? React with 👍 / 👎.

Python's all() returns True on an empty sequence, so initializing acc[key] = [] and then hitting the `all(isinstance(...) for x in acc_value)` check would evaluate to True and call extend() — dumping the entire raw list in wholesale, same bug as before. Adding `acc_value and` before the all() check means an empty list skips the extend shortcut and falls through to the per-index merge loop instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

accumulate_delta drops tool_call fragments when one chunk has multiple entries at the same index

2 participants