Quantize moveaxis/movedim so they delegate to Ethos-U #20314
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20314
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 4 Unrelated Failures, 1 Unclassified Failure
As of commit 2a0bc8c with merge base a581673:
NEW FAILURES - The following jobs have failed:
UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@apullin has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108478011.
Summary:
The ARM PT2 quantizer's pass-through shared-qspec set in quantization_annotator.py
(_one_to_one_shared_input_qspec) covers permute/permute_copy/transpose/view/squeeze
etc., but omits aten.moveaxis/aten.movedim. A model that uses torch.moveaxis
therefore leaves those ops unquantized: the quantizer brackets each one with
dequantize -> moveaxis(float) -> quantize.
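A minimal sketch of why a pure data-movement op can share its input's quantization spec: moving axes only reorders elements, so quantizing before or after the move gives the same integers. This is a toy model on nested lists, not the ARM quantizer; `quantize`, `quantize_2d`, `moveaxis_2d`, and the scale/zero-point values are hypothetical names and numbers chosen for illustration.

```python
def quantize(value, scale, zero_point):
    """Toy affine int8 quantization of a single float."""
    q = round(value / scale) + zero_point
    return max(-128, min(127, q))


def quantize_2d(rows, scale, zero_point):
    return [[quantize(v, scale, zero_point) for v in row] for row in rows]


def moveaxis_2d(rows):
    # For a 2-D input, moveaxis(x, 0, 1) is a plain transpose.
    return [list(col) for col in zip(*rows)]


x = [[0.1, -0.5, 0.9], [1.2, 0.0, -1.0]]
scale, zero_point = 0.01, 0  # hypothetical per-tensor parameters

quantize_then_move = moveaxis_2d(quantize_2d(x, scale, zero_point))
move_then_quantize = quantize_2d(moveaxis_2d(x), scale, zero_point)

# Both orders agree, so the op can run directly on int8 data.
assert quantize_then_move == move_then_quantize
```

Because the two orders agree, the dequantize/quantize pair around the op carries no information and the op can stay in the integer domain.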
On lowering, moveaxis decomposes to a float permute_copy. The Ethos-U55
operator-support check (operator_support/ethos_u55_support.py) only delegates
permute_copy for int8/int16/int32, so it rejects the float one. Each rejected
permute is stranded on the host, splitting the model into many delegated
partitions (one NPU island per permute). This bloats the .pte with per-partition
delegate overhead and adds host round-trips at runtime.
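The splitting effect can be sketched with a toy linear graph. The code below is not the ExecuTorch partitioner: `is_delegated`, `count_partitions`, and `build_chain` are hypothetical helpers, the chain omits the surrounding quantize/dequantize nodes, and every op other than permute_copy is simply assumed to be supported.

```python
INT_DTYPES = {"int8", "int16", "int32"}


def is_delegated(op, dtype):
    # Models only the rule described above: permute_copy is accepted for
    # integer dtypes. All other ops are assumed supported for simplicity.
    if op == "permute_copy":
        return dtype in INT_DTYPES
    return True


def count_partitions(graph):
    """Count maximal runs of consecutive delegated ops in a linear graph."""
    partitions = 0
    in_partition = False
    for op, dtype in graph:
        if is_delegated(op, dtype):
            if not in_partition:
                partitions += 1
            in_partition = True
        else:
            in_partition = False
    return partitions


def build_chain(permute_dtype, blocks=4):
    """Alternate int8 conv blocks with permutes of the given dtype."""
    graph = []
    for i in range(blocks):
        graph.append(("conv2d", "int8"))
        if i < blocks - 1:
            graph.append(("permute_copy", permute_dtype))
    return graph


before = count_partitions(build_chain("float32"))
after = count_partitions(build_chain("int8"))
print(before, after)  # 4 1
```

In this toy chain, three float permutes cut the graph into four delegated islands, while the same chain with int8 permutes forms one.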
Add aten.moveaxis.int / aten.movedim.int to _one_to_one_shared_input_qspec
(guarded with getattr for torch-build variance, mirroring the existing
transpose.Dimname handling) so they share the input quantization spec exactly like
transpose/permute. They then stay int8, decompose to int8 permute_copy, and
delegate to the NPU -- eliminating the host float islands.
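The getattr guard might look roughly like the sketch below. The PR's actual code is not shown here, so this is an assumption about its shape: `optional_overloads` is a hypothetical helper, `fake_aten` is a stand-in for `torch.ops.aten` on an imagined build where `movedim` lacks an `int` overload, and strings stand in for the op overload objects.

```python
from types import SimpleNamespace

# Stand-in for torch.ops.aten (hypothetical build: movedim has no "int"
# overload). Real code would pass torch.ops.aten instead.
fake_aten = SimpleNamespace(
    moveaxis=SimpleNamespace(int="aten.moveaxis.int"),
    movedim=SimpleNamespace(),
)


def optional_overloads(namespace, names, overload="int"):
    """Return the overloads that exist, silently skipping missing ones."""
    found = []
    for name in names:
        packet = getattr(namespace, name, None)
        op = getattr(packet, overload, None)
        if op is not None:
            found.append(op)
    return found


# Stand-in for the existing pass-through set.
shared_qspec_ops = {"aten.permute.default", "aten.transpose.int"}
shared_qspec_ops.update(optional_overloads(fake_aten, ["moveaxis", "movedim"]))

assert "aten.moveaxis.int" in shared_qspec_ops
assert len(shared_qspec_ops) == 3  # the missing movedim overload is skipped
```

Guarding the lookup this way means a build without one of the overloads still imports the quantizer module cleanly; the op is just left out of the set.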
Impact: a quantized example ensemble (ConvNeXt-style blocks that
use torch.moveaxis) that previously lowered into 9 Ethos-U55 partitions now lowers
into a single delegate, with zero host permutes and a ~24% smaller .pte, without
any model changes. The fix generalizes to any model that uses moveaxis/movedim on
the Ethos-U backend.
Differential Revision: D108478011
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani