@kashif (Contributor) commented Jan 15, 2026

What does this PR do?

This pull request introduces enhancements to the DeepSpeed integration for model loading, specifically improving how weight conversions (such as renaming and merging/splitting of weights) are handled when loading state dictionaries into models using DeepSpeed ZeRO-3. The changes add a new utility for applying weight conversion logic before loading, and update the model loading pipeline to use this utility when needed.

Key changes:

Weight Conversion Utilities

  • Added a new function _apply_weight_conversions_to_state_dict in deepspeed.py to handle weight renaming and merging/splitting operations on state dicts before loading them into a model. This function supports both simple renaming and more complex conversions using WeightConverter and WeightRenaming objects.
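The rename-then-merge idea can be sketched without any DeepSpeed or Transformers dependency. This is a simplified illustration, not the PR's implementation: `apply_conversions`, the `(regex, replacement)` rule format, and the key names are hypothetical stand-ins for `WeightRenaming` and `WeightConverter`, and plain lists stand in for tensors.

```python
import re
from collections import defaultdict


def apply_conversions(state_dict, renames=(), merges=()):
    """Return a new dict with renames applied first, then per-index merges.

    renames: iterable of (regex, replacement) pairs applied to every key.
    merges:  iterable of (regex, target_template) pairs; each regex must
             define a named group ``idx`` giving the part's position.
    (Hypothetical rule format, chosen only for this illustration.)
    """
    # Step 1: simple renaming of checkpoint keys.
    renamed = {}
    for key, value in state_dict.items():
        for pattern, replacement in renames:
            key = re.sub(pattern, replacement, key)
        renamed[key] = value

    # Step 2: group keys that belong to the same merged target.
    result = {}
    groups = defaultdict(dict)  # target key -> {index: value}
    for key, value in renamed.items():
        for pattern, target_template in merges:
            match = re.fullmatch(pattern, key)
            if match:
                target_key = match.expand(target_template)
                groups[target_key][int(match["idx"])] = value
                break
        else:
            result[key] = value  # not part of any merge: keep as-is

    # Step 3: emit each merged entry with its parts ordered by index.
    for target_key, parts in groups.items():
        result[target_key] = [parts[i] for i in sorted(parts)]
    return result


# Hypothetical per-expert checkpoint keys merged into one stacked entry.
checkpoint = {
    "old.layers.0.mlp.experts.1.gate_proj.weight": [3, 4],
    "old.layers.0.mlp.experts.0.gate_proj.weight": [1, 2],
    "old.norm.weight": [5],
}
converted = apply_conversions(
    checkpoint,
    renames=[(r"^old\.", "model.")],
    merges=[(
        r"(?P<prefix>.*)\.experts\.(?P<idx>\d+)\.(?P<name>\w+)\.weight",
        r"\g<prefix>.experts.\g<name>",
    )],
)
```

With real tensors, the merge step would stack or concatenate the parts rather than collect them in a list.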

DeepSpeed Model Loading Logic

  • Updated _load_state_dict_into_zero3_model to accept an optional weight_mapping argument. If provided, it uses the new _apply_weight_conversions_to_state_dict function to preprocess the state dict before loading. It also stores the applied weight conversions on the model for later reference.
  • Modified the call to _load_state_dict_into_zero3_model in _load_pretrained_model (in modeling_utils.py) to pass the weight_mapping argument, ensuring that weight conversions are applied during DeepSpeed ZeRO-3 model loading.
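The control flow described above (preprocess only when a mapping is supplied, then remember what was applied) might look roughly like the following. This is a toy sketch under stated assumptions: `ToyModel`, `load_state_dict_with_mapping`, the dict-based mapping, and the `applied_weight_mapping` attribute are all hypothetical names, and no ZeRO-3 parameter gathering is modeled.

```python
class ToyModel:
    """Stand-in for a model: just a dict of named parameter slots."""

    def __init__(self, param_names):
        self.params = {name: None for name in param_names}


def load_state_dict_with_mapping(model, state_dict, weight_mapping=None):
    """Load ``state_dict`` into ``model``, converting keys first if asked.

    weight_mapping: optional dict of checkpoint key -> model key
    (a hypothetical simplification of the PR's conversion objects).
    """
    if weight_mapping is not None:
        # Preprocess the checkpoint before any loading happens.
        state_dict = {weight_mapping.get(k, k): v for k, v in state_dict.items()}
        # Keep a record of what was applied (attribute name is an assumption).
        model.applied_weight_mapping = dict(weight_mapping)

    missing = [k for k in model.params if k not in state_dict]
    unexpected = [k for k in state_dict if k not in model.params]
    for name in model.params:
        if name in state_dict:
            model.params[name] = state_dict[name]
    return missing, unexpected


model = ToyModel(["encoder.weight", "head.bias"])
checkpoint = {"enc.weight": 1.0, "head.bias": 2.0}
missing, unexpected = load_state_dict_with_mapping(
    model, checkpoint, weight_mapping={"enc.weight": "encoder.weight"}
)
```

Making the argument optional keeps the existing call path unchanged for checkpoints whose keys already match the model.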

Fixes #43257

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@kashif changed the title from "add weight_mapping to _load_state_dict_into_zero3_model" to "[DeepSpeed] add weight_mapping to _load_state_dict_into_zero3_model" on Jan 15, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kashif requested a review from ArthurZucker on January 15, 2026, 11:59
        new_state_dict[target_name] = param
    except Exception as e:
        # If conversion fails, log and skip (better than failing completely)
        logger.warning(f"Failed to convert {first_param_name}: {e}")
Contributor Author

Is the warning OK, or should this raise an error?

Contributor

Assert, for sure. How would it work if it's invalid?

Contributor Author

Thanks, changing it.
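For reference, a fail-fast variant of the pattern discussed in this thread could look like the sketch below. It is not the code that landed in the PR: `convert_strict` and the `convert_fn` callback are hypothetical, and the only point is that a failed conversion stops loading with a chained exception instead of logging a warning and dropping the weight.

```python
def convert_strict(state_dict, convert_fn):
    """Convert every entry, raising on the first failure.

    convert_fn: callable (name, param) -> (target_name, new_param);
    a hypothetical stand-in for the real conversion step.
    """
    new_state_dict = {}
    for name, param in state_dict.items():
        try:
            target_name, converted = convert_fn(name, param)
        except Exception as e:
            # Surface the failure: a silently skipped weight would leave the
            # corresponding parameter at its initialized value.
            raise RuntimeError(f"Failed to convert {name}") from e
        new_state_dict[target_name] = converted
    return new_state_dict


def add_prefix(name, param):
    """Example conversion: prefix every key with 'model.'."""
    return f"model.{name}", param


result = convert_strict({"norm.weight": [1.0]}, add_prefix)
```

Chaining with `from e` keeps the original traceback, so the offending parameter name and the underlying error are both visible.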

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43303&sha=f45026

Development

Successfully merging this pull request may close these issues.

Qwen3 MOE weights not converted when loading with accelerate + deepspeed

3 participants