Commit 1d32128

Replace step-0 overwrite with _state cache invalidation
The real issue is that UnslothService._state (a cached_property) may be initialized before the fork copies the checkpoint, in which case it caches the base model instead of the forked weights. Invalidating the cache after the fork ensures the trainer picks up the forked checkpoint on its next access. The step-0 overwrite was unnecessary: vLLM's start_openai_server already calls get_last_checkpoint_dir(), which finds the forked checkpoint at its original step number.
1 parent 7ee591e commit 1d32128

1 file changed

Lines changed: 9 additions & 7 deletions

src/art/local/backend.py

@@ -1434,16 +1434,18 @@ async def _experimental_fork_checkpoint(
 
         shutil.copytree(source_checkpoint_dir, dest_checkpoint_dir)
 
-        # Also overwrite the initial empty checkpoint at step 0 so vLLM
-        # loads the forked weights on startup (it uses @0 by default)
-        step0_dir = get_step_checkpoint_dir(dest_model_dir, 0)
-        if os.path.exists(step0_dir) and step0_dir != dest_checkpoint_dir:
+        # Invalidate the UnslothService _state cache so the trainer
+        # re-initializes with the forked checkpoint instead of the base model.
+        # _state is a cached_property that reads get_last_checkpoint_dir() on
+        # first access; if it was accessed before the fork, it cached the base
+        # model and will never pick up the forked weights.
+        service = await self._get_service(model)
+        if hasattr(service, "_state") and "_state" in service.__dict__:
+            del service.__dict__["_state"]
             if verbose:
                 print(
-                    f"Overwriting initial checkpoint at {step0_dir} with forked weights"
+                    "Invalidated UnslothService _state cache to pick up forked checkpoint"
                 )
-            shutil.rmtree(step0_dir)
-            shutil.copytree(dest_checkpoint_dir, step0_dir)
 
         if verbose:
             print(
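
The invalidation in this hunk relies on how `functools.cached_property` stores its result: the computed value is written into the instance `__dict__` under the attribute name, and removing that entry makes the next access recompute it. The sketch below illustrates that mechanism in isolation. It assumes `_state` is a standard-library `cached_property`; `FakeService`, `checkpoint`, `loads`, and `invalidate_state` are hypothetical names for illustration, not part of the ART codebase.

```python
from functools import cached_property


class FakeService:
    """Hypothetical stand-in for UnslothService; not the real class."""

    def __init__(self) -> None:
        self.checkpoint = "base"  # what a fresh _state would load
        self.loads = 0  # counts how often _state is computed

    @cached_property
    def _state(self) -> str:
        self.loads += 1
        return f"state loaded from {self.checkpoint}"


def invalidate_state(service: object) -> bool:
    """Drop a cached _state without triggering its computation.

    Returns True if a cached value was present and removed.
    """
    return vars(service).pop("_state", None) is not None


svc = FakeService()
first = svc._state           # first access caches the base-model state
svc.checkpoint = "forked"    # the fork lands after that first access
stale = svc._state           # still the cached base-model state
invalidated = invalidate_state(svc)
fresh = svc._state           # recomputed, now sees the forked checkpoint

# hasattr() goes through the descriptor, so on an instance that has never
# touched _state it runs the initializer just to answer the question.
untouched = FakeService()
hasattr(untouched, "_state")
```

One design note, offered tentatively: under the same assumption, the `hasattr(service, "_state")` half of the committed guard would itself compute and cache `_state` when it has not been accessed yet, only for the value to be deleted immediately. Checking `"_state" in service.__dict__` alone, or popping the key as in the sketch, would avoid that extra initialization. Whether this matters depends on how the real service object is constructed, which this commit does not show.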
