
RFC: Compute only in int32/long/float/double for portable ops to save size #9635

Closed
wants to merge 2 commits

Conversation

swolchok (Contributor)

Concern: what if we're running on some sort of 16-bit microcontroller where this is a pessimization?

[ghstack-poisoned]
pytorch-bot (bot) commented Mar 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9635

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 11 Pending

As of commit ac64f9e with merge base 811352d:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok added a commit that referenced this pull request Mar 25, 2025
… size

Concern: what if we're running on some sort of 16-bit microcontroller where this is a pessimization?
ghstack-source-id: 66b89b1bde83c9af2f2915243c0d1c3fea8d9dd3
ghstack-comment-id: 2752794343
Pull-Request-resolved: #9635
facebook-github-bot added the CLA Signed label Mar 25, 2025 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
swolchok (Contributor, Author)

> some sort of 16-bit microcontroller

I wasn't sure if I was making things up, so: https://developer.arm.com/Processors/Ethos-U55 is a real, present-day example.

swolchok (Contributor, Author)

Size impact: on my Mac, test/build_size_test.sh reports that size_test_all_ops is 1205136 bytes before this PR and 1105856 bytes after, a decrease of 99280 bytes (about 8%).

swolchok (Contributor, Author) commented Mar 26, 2025

This is known to break tests, I believe because it breaks SupportedTensorDtypes::SAME_AS_COMMON for the reasons outlined in #9613, hence the RFC status. The problem is fixable, but if there are directional concerns with this approach then I don't want to invest in fixing it.

[ghstack-poisoned]
swolchok added a commit that referenced this pull request Mar 26, 2025
… size

Concern: what if we're running on some sort of 16-bit microcontroller where this is a pessimization?
ghstack-source-id: a91380cd44aac19ce27d803605bca57e6de6e4ee
ghstack-comment-id: 2752794343
Pull-Request-resolved: #9635
swolchok (Contributor, Author) commented Mar 26, 2025

Per discussion with @manuelcandales: if we do this, then we need to cast through the "actual" compute type before casting to the output type so that we match ATen. Example:

>>> torch.ops.aten.mul(torch.tensor([100], dtype=torch.int8), torch.tensor([100], dtype=torch.int8), out=torch.zeros([1], dtype=torch.long))
tensor([16])

computing in int32 or int16 would cause this to yield 10000, not 16; casting through int8 would correct this.
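For illustration, a minimal standalone C++ sketch of this "cast through the actual compute type" behavior; the int32_t compute type and all names here are assumptions for the example, not code from this PR:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // Inputs are int8, and the out tensor is int64 (long), mirroring the
  // ATen example above.
  int8_t a = 100, b = 100;

  // Hypothetical portable-op path: do the multiply in the wider compute
  // type (int32_t here) so only one kernel instantiation is needed.
  int32_t wide = static_cast<int32_t>(a) * static_cast<int32_t>(b);  // 10000

  // Casting the wide result straight to the output type diverges from ATen.
  int64_t divergent = static_cast<int64_t>(wide);                    // 10000

  // Casting through the "actual" compute type (int8_t) first reproduces the
  // int8 wraparound that ATen applies, then widens to the output type.
  // (Well-defined modular conversion in C++20; the usual truncation on
  // two's-complement targets before that.)
  int64_t matches_aten =
      static_cast<int64_t>(static_cast<int8_t>(wide));               // 16

  std::cout << divergent << " vs " << matches_aten << "\n";
  return 0;
}
```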

swolchok (Contributor, Author)

This is a bad idea because smaller compute dtypes benefit from additional SIMD lanes.
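As a rough illustration of the SIMD-lane argument (the 128-bit register width is only an assumed example, e.g. NEON or SSE):

```cpp
#include <cstdint>

// With a fixed vector register width, narrower element types pack more lanes
// per instruction, so forcing int8/int16 math up to int32 can cut the
// theoretical elementwise throughput by 2-4x.
constexpr int kSimdBits = 128;  // assumed register width for illustration
constexpr int kLanesInt8 = kSimdBits / (8 * static_cast<int>(sizeof(int8_t)));    // 16 lanes
constexpr int kLanesInt32 = kSimdBits / (8 * static_cast<int>(sizeof(int32_t)));  // 4 lanes

static_assert(kLanesInt8 == 16 && kLanesInt32 == 4, "lane-count arithmetic");

int main() { return 0; }
```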

swolchok closed this Mar 26, 2025
// Gate above optimization off if we appear to be on some kind of 8-bit or
// 16-bit CPU, which would invalidate our assumption about 32-bit
// math being just as fast.
constexpr bool cpu_appears_to_be_at_least_32_bit = sizeof(void*) >= 4 && sizeof(int) >= 4;
Contributor

👍
