RFC: Compute only in int32/long/float/double for portable ops to save size #9635
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9635
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures, 11 Pending
As of commit ac64f9e with merge base 811352d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
… size
Concern: what if we're running on some sort of 16-bit microcontroller where this is a pessimization?
ghstack-source-id: 66b89b1bde83c9af2f2915243c0d1c3fea8d9dd3
ghstack-comment-id: 2752794343
Pull-Request-resolved: #9635
wasn't sure if I was making things up, so: https://developer.arm.com/Processors/Ethos-U55 is a real example from the present day
size impact: on my mac, test/build_size_test.sh reports that size_test_all_ops has size 1205136 before this PR and 1105856 after, a decrease of 99280 (about 8.2%)
This is known to break tests, I think because it breaks SupportedTensorDtypes::SAME_AS_COMMON for reasons outlined in #9613, hence it is RFC status. The problem is fixable, but if we have directional concerns with this then I don't want to invest in fixing it.
… size
Concern: what if we're running on some sort of 16-bit microcontroller where this is a pessimization?
ghstack-source-id: a91380cd44aac19ce27d803605bca57e6de6e4ee
ghstack-comment-id: 2752794343
Pull-Request-resolved: #9635
per discussion with @manuelcandales, if we do this then we need to cast through the "actual" compute type before casting to the output type so that we match ATen. example:
computing in int32 or int16 would cause this to yield 10000, not 16; casting through int8 would correct this.
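The example referred to above did not survive on this page. One reconstruction that matches the quoted numbers is multiplying two int8 values of 100 with a wider output dtype: 100 * 100 = 10000 in int32, and 10000 wrapped to 8 bits is 16. The following is a minimal sketch under that assumption; `mul_int8_wide_only` and `mul_int8_cast_through` are hypothetical names, not ExecuTorch functions.

```cpp
#include <cstdint>

// Hypothetical helper: do the math in int32 and cast straight to the
// output type. The int8 wraparound is lost.
template <typename Out>
Out mul_int8_wide_only(int8_t a, int8_t b) {
  const int32_t wide = static_cast<int32_t>(a) * static_cast<int32_t>(b);
  return static_cast<Out>(wide);
}

// Hypothetical helper: do the math in int32, cast through the "actual"
// compute type (int8), then cast to the output type.
template <typename Out>
Out mul_int8_cast_through(int8_t a, int8_t b) {
  const int32_t wide = static_cast<int32_t>(a) * static_cast<int32_t>(b);
  // Going through uint8_t makes the wrap to 8 bits well defined (modulo 256).
  // The uint8_t -> int8_t step is implementation-defined before C++20 for
  // values above 127, but wraps on mainstream compilers.
  const int8_t narrowed = static_cast<int8_t>(static_cast<uint8_t>(wide));
  return static_cast<Out>(narrowed);
}
```

With an int32 output, `mul_int8_wide_only<int32_t>(100, 100)` gives 10000, while `mul_int8_cast_through<int32_t>(100, 100)` gives 16, the value the comment says ATen produces.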
This is a bad idea because smaller compute dtypes benefit from additional SIMD lanes.
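To make the lane argument concrete, here is a rough sketch of how many elements fit in one vector register for a given element type. `lanes_per_vector` is a hypothetical helper, and the sketch assumes 8-bit bytes and a fixed register width; it says nothing about whether the portable ops actually get auto-vectorized.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of elements of type T that fit in one SIMD
// register of `vector_bits` bits (assumes 8 bits per byte).
template <typename T>
constexpr std::size_t lanes_per_vector(std::size_t vector_bits) {
  return vector_bits / (8 * sizeof(T));
}
```

For a 128-bit register this gives 16 lanes for int8_t and 4 lanes for int32_t, so widening int8 math to int32 would cut the per-instruction element count by 4x wherever vectorization applies.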
// Gate above optimization off if we appear to be on some kind of 8-bit or
// 16-bit CPU, which would invalidate our assumption about 32-bit
// math being just as fast.
constexpr bool cpu_appears_to_be_at_least_32_bit = sizeof(void*) >= 4 && sizeof(int) >= 4;
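As a rough illustration of how such a flag could gate the choice of compute type, the sketch below widens small integer types to int32_t only when the flag is true. `sketch_compute_t` is a hypothetical alias, not the PR's actual mechanism, and excluding `bool` here is an assumption made for the sketch. It covers integer types only; the float side of the RFC (computing reduced-precision floating types in float) is omitted.

```cpp
#include <cstdint>
#include <type_traits>

// Same condition as in the diff above.
constexpr bool cpu_appears_to_be_at_least_32_bit =
    sizeof(void*) >= 4 && sizeof(int) >= 4;

// Hypothetical alias: choose int32_t as the compute type for integer types
// narrower than 32 bits, but only when the CPU looks at least 32-bit.
// Otherwise keep T unchanged.
template <typename T>
using sketch_compute_t = std::conditional_t<
    cpu_appears_to_be_at_least_32_bit && std::is_integral_v<T> &&
        !std::is_same_v<T, bool> && (sizeof(T) < sizeof(int32_t)),
    int32_t,
    T>;

// Types that are already 32 bits or wider, or non-integral, are untouched.
static_assert(std::is_same_v<sketch_compute_t<int64_t>, int64_t>);
static_assert(std::is_same_v<sketch_compute_t<float>, float>);
```

On a target where the flag is false, `sketch_compute_t<int8_t>` stays `int8_t`, so the narrow instantiations are kept and the size saving is given up on that platform.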
👍