Add batch size parameter to prevent context leaking of translator notes/hints #1733

@thomasaull

Problem

This is a follow-up to #1728, since hints are currently not included in the request to a model at all (not when using the lingo.dev platform, I think).

Given the following example:

{
  // Short form for "Year", used to display a date format, example in `en`: "DD-MM-YYYY". Don’t translate for CJK (Chinese, Japanese, Korean) languages
  "DATE_FORMAT_SPECIFIER_YEAR": "Y",
  // Short form for "Month", used to display a date format, example in `en`: "DD-MM-YYYY"
  "DATE_FORMAT_SPECIFIER_MONTH": "M"
}

In my tests using openai/gpt-oss-120b (OpenAI-compatible provider, see #1729), there is some context leaking going on, even with a prompt like:

Translate the provided text from source: {source} to target: {target}. For each translation key, use only the hint with identical key. If no hint exists for a key, do not reuse any other hints

When translating to Korean, the result I get is:

{
  "DATE_FORMAT_SPECIFIER_YEAR": "Y",
  "DATE_FORMAT_SPECIFIER_MONTH": "M"
}

Even though only DATE_FORMAT_SPECIFIER_YEAR has the hint not to translate for CJK languages, this gets applied to DATE_FORMAT_SPECIFIER_MONTH as well.

Looking at the model's reasoning, it becomes clear that it is not able to follow the instruction to treat the hint for each translation completely individually. I tried many things but could not find a reliable solution.

The way out is probably to translate each string one by one in individual requests to the LLM, which prevents context leaking between translations completely. This of course has trade-offs (higher API cost, slower runs, …), but it reliably translates to:

{
  "DATE_FORMAT_SPECIFIER_YEAR": "Y",
  "DATE_FORMAT_SPECIFIER_MONTH": "월"
}

Solution

Add a batch size parameter to the CLI to control how many translation strings are sent to an LLM at once. A batch size of 1 would send each translation individually.
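
To make the proposal concrete, here is a rough sketch of what batching could look like. Everything in it is hypothetical: `Entry`, `toBatches`, `translateAll` and the `translateBatch` callback are names made up for illustration and are not taken from the lingo.dev CLI, and the callback only stands in for whatever actually sends a request to the model.

```typescript
// Hypothetical shape of one translation string with its optional hint.
type Entry = { key: string; value: string; hint?: string };

// Split items into consecutive batches of at most `batchSize` items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Stand-in for the real model call (assumption): takes one batch,
// returns the translations for that batch keyed by translation key.
type TranslateBatch = (batch: Entry[]) => Promise<Record<string, string>>;

// Send one request per batch and merge the results.
// The model only ever sees the hints of the entries in the current batch.
async function translateAll(
  entries: Entry[],
  batchSize: number,
  translateBatch: TranslateBatch,
): Promise<Record<string, string>> {
  const result: Record<string, string> = {};
  for (const batch of toBatches(entries, batchSize)) {
    Object.assign(result, await translateBatch(batch));
  }
  return result;
}

// The two strings from the example above.
const entries: Entry[] = [
  {
    key: "DATE_FORMAT_SPECIFIER_YEAR",
    value: "Y",
    hint: "Short form for \"Year\". Don't translate for CJK languages",
  },
  {
    key: "DATE_FORMAT_SPECIFIER_MONTH",
    value: "M",
    hint: "Short form for \"Month\"",
  },
];
```

With a batch size of 1, `toBatches(entries, 1)` yields two batches, so the hint for `DATE_FORMAT_SPECIFIER_YEAR` is never part of the request that translates `DATE_FORMAT_SPECIFIER_MONTH`. The number of requests is the number of strings divided by the batch size, rounded up, which is where the extra cost and time come from.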
