Conversation

@mansnils commented Jan 14, 2026

Use exir.memory.alloc for CMSIS-NN scratch buffers. This is a good fit: the alloc node has a TensorSpec and gets memory planned, but adds no additional operator overhead.
Use the CMSIS-NN pybind wrapper to get the correct scratch buffer size.
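
A minimal sketch of the idea (not the code in this PR): the helper below is hypothetical, the ([shape], dtype) alloc-spec format is assumed to match what other ExecuTorch passes hand to exir.memory.alloc, and in practice scratch_bytes would come from the CMSIS-NN pybind wrapper (e.g. wrapping arm_convolve_wrapper_s8_get_buffer_size).

```python
import torch
from executorch.exir import memory


def add_scratch_buffer(
    graph_module: torch.fx.GraphModule,
    op_node: torch.fx.Node,
    scratch_bytes: int,
) -> None:
    """Attach a memory-planned CMSIS-NN scratch buffer to `op_node`.

    memory.alloc carries a TensorSpec and is handled by memory planning,
    so the scratch space can be reused like any other intermediate tensor
    and no extra operator runs at execution time.
    """
    with graph_module.graph.inserting_before(op_node):
        # Assumed spec format: (shape, dtype). A byte-sized int8 tensor is
        # used here to express the scratch size reported by the CMSIS-NN
        # wrapper; the real spec layout may differ in detail.
        scratch = graph_module.graph.call_function(
            memory.alloc, (([scratch_bytes], torch.int8),)
        )
    # Hand the planned buffer to the delegated op as an extra argument.
    op_node.args = op_node.args + (scratch,)
    graph_module.recompile()
```

Because the buffer is planned instead of served from the runtime temp allocator, equally-sized scratch buffers from different nodes can share the same planned region, which is what the numbers below show.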

Improves: #16041

Comparing memory consumption with and without the patch for the conv2d_x3 unit test case. Without the patch:

I [executorch:arm_executor_runner.cpp:842 log_mem_status()] model_pte_program_size: 10208 bytes.
I [executorch:arm_executor_runner.cpp:843 log_mem_status()] model_pte_loaded_size: 10208 bytes.
I [executorch:arm_executor_runner.cpp:848 log_mem_status()] input_file_allocator_used: 10976 / 62914560 free: 62903584 ( used: 0 % )
I [executorch:arm_executor_runner.cpp:860 log_mem_status()] method_allocator_used: 5433 / 62914560 free: 62909127 ( used: 0 % )
I [executorch:arm_executor_runner.cpp:867 log_mem_status()] method_allocator_planned: 2560 bytes
I [executorch:arm_executor_runner.cpp:871 log_mem_status()] method_allocator_loaded: 2857 bytes
I [executorch:arm_executor_runner.cpp:875 log_mem_status()] method_allocator_input: 16 bytes
I [executorch:arm_executor_runner.cpp:876 log_mem_status()] method_allocator_executor: 0 bytes
I [executorch:arm_executor_runner.cpp:879 log_mem_status()] temp_allocator: 2097152

With the patch:
I [executorch:arm_executor_runner.cpp:846 log_mem_status()] model_pte_program_size: 10336 bytes.
I [executorch:arm_executor_runner.cpp:847 log_mem_status()] model_pte_loaded_size: 10336 bytes.
I [executorch:arm_executor_runner.cpp:852 log_mem_status()] input_file_allocator_used: 11104 / 62914560 free: 62903456 ( used: 0 % )
I [executorch:arm_executor_runner.cpp:864 log_mem_status()] method_allocator_used: 6414 / 62914560 free: 62908146 ( used: 0 % )
I [executorch:arm_executor_runner.cpp:871 log_mem_status()] method_allocator_planned: 2560 bytes
I [executorch:arm_executor_runner.cpp:875 log_mem_status()] method_allocator_loaded: 3838 bytes
I [executorch:arm_executor_runner.cpp:879 log_mem_status()] method_allocator_input: 16 bytes
I [executorch:arm_executor_runner.cpp:880 log_mem_status()] method_allocator_executor: 0 bytes

Summary:

  • The large temp_allocator used for scratch is no longer needed with the patch, except for Linear/FC, which is left as-is since this is a PoC/draft PR anyway.
  • method_allocator_planned stays at 2560 bytes in both cases, which means the scratch buffers are 100% reused by memory planning.
  • The model size and method_allocator_loaded grow slightly (deltas below). This is explained by the larger number of planned objects describing the scratch reuse (more metadata). That metadata should stay roughly the same even if the scratch buffer sizes grow, so I think this is acceptable.
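
For reference, the deltas from the logs above: model_pte_program_size grows by 10336 - 10208 = 128 bytes and method_allocator_loaded by 3838 - 2857 = 981 bytes, while the 2097152-byte temp_allocator that served conv scratch without the patch is no longer needed.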

cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai


pytorch-bot bot commented Jan 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16580

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 1 Unrelated Failure

As of commit c197701 with merge base 8e8d97e:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mansnils added the partner: arm label on Jan 14, 2026
@meta-cla bot added the CLA Signed label on Jan 14, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
