Description
Problem
This is a follow-up to #1728, since hints are currently not included in the request to a model at all (not when using the lingo.dev platform, I think).
Given the following example:
In my tests using openai/gpt-oss-120b (OpenAI-compatible provider, see #1729), context leaks between translations, even when using a prompt like:
Translate the provided text from source: {source} to target: {target}. For each translation key, use only the hint with identical key. If no hint exists for a key, do not reuse any other hints
When translating to Korean, the result I get is:
{
"DATE_FORMAT_SPECIFIER_YEAR": "Y",
"DATE_FORMAT_SPECIFIER_MONTH": "M"
}
Even though only DATE_FORMAT_SPECIFIER_YEAR has the hint not to translate for CJK languages, this gets applied to DATE_FORMAT_SPECIFIER_MONTH as well.
Looking at the reasoning, it becomes clear that the model is not able to follow the instruction to treat the hint for each translation completely individually. I tried many things but could not find a reliable solution.
The way out is probably to translate each string one-by-one in individual requests to the LLM, preventing context leaking between translations completely. This of course has other trade-offs (increased API cost, slower, …) but it reliably translates to:
{
"DATE_FORMAT_SPECIFIER_YEAR": "Y",
"DATE_FORMAT_SPECIFIER_MONTH": "월"
}
Solution
Add a batch size parameter to the CLI to control how many translation strings are sent to an LLM at once. A batch size of 1 would send each translation individually.
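A minimal sketch of what such batching could look like, to make the proposal concrete. Everything here is hypothetical and not taken from the lingo.dev codebase: the `Entry` type, the `toBatches` and `translateAll` helpers, and the `translateBatch` callback (which stands in for a single LLM request) are names made up for illustration.

```typescript
// Hypothetical shape of one translation entry; not the real lingo.dev type.
type Entry = { key: string; text: string; hint?: string };

// Split items into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Translate batch by batch. `translateBatch` is an assumed callback that
// performs exactly one LLM request and returns translations keyed by entry key.
async function translateAll(
  entries: Entry[],
  batchSize: number,
  translateBatch: (batch: Entry[]) => Promise<Record<string, string>>,
): Promise<Record<string, string>> {
  const result: Record<string, string> = {};
  for (const batch of toBatches(entries, batchSize)) {
    // Each request only sees the entries (and hints) of its own batch.
    Object.assign(result, await translateBatch(batch));
  }
  return result;
}
```

With a batch size of 1, each request would contain a single key and its hint, so a hint for one key cannot leak into another key's translation. Larger values trade that isolation for fewer requests.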
Visuals
Workarounds
No response
{
  // Short form for "Year", used to display a date format, example in `en`: "DD-MM-YYYY". Don’t translate for CJK (Chinese, Japan, Korean) languages
  "DATE_FORMAT_SPECIFIER_YEAR": "Y",
  // Short form for "Month", used to display a date format, example in `en`: "DD-MM-YYYY"
  "DATE_FORMAT_SPECIFIER_MONTH": "M"
}