Hi @lucylq @cccclai, Q1: This discussion is clear and pretty much explains the feature. We just need to clarify the AOT usage, since it is not presented in this discussion. Q2: Also, just want to clarify that in Thanks!
🚀 The feature, motivation and pitch
This proposal details the design components shared between the ExecuTorch features ‘backend weight sharing’ and ‘data separation’.
At a high level, we introduce an opaque blob storage for backends to add and request data from. Ahead-of-time, backends place opaque blobs into the blob storage under a unique key. At runtime, backends request the opaque blobs back using those unique keys.
Motivation
RFC
AoT: NamedBlobStore
We first introduce the concept of the ‘NamedBlobStore’. This allows delegates to serialize bytes with keys. These bytes can be retrieved from the NamedDataMap at runtime (see section Runtime: NamedDataMap). Delegates can serialize information shared across methods or subgraphs into the NamedBlobStore, and retrieve them when initializing either method or subgraph.
For data that is saved to multiple external files, users can set a field ‘external’ to indicate the desired grouping, e.g. blob1 and blob2 in ‘external_file1’, and blob3 in ‘external_file2’.
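To make the keyed storage and the ‘external’ grouping concrete, here is a minimal Python sketch. It is not the ExecuTorch implementation: the class name NamedBlobStoreSketch, the add method, and its external parameter are hypothetical names chosen for illustration, and the duplicate-key policy (same bytes accepted, different bytes rejected) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Optional


class NamedBlobStoreSketch:
    """Hypothetical stand-in for the NamedBlobStore described above."""

    def __init__(self) -> None:
        # key -> bytes for blobs that stay inside the PTE file.
        self.pte_data: Dict[str, bytes] = {}
        # external file name -> {key -> bytes} for blobs stored outside the PTE.
        self.external_data: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def _find(self, key: str) -> Optional[bytes]:
        """Return the bytes already stored under key, wherever they live."""
        if key in self.pte_data:
            return self.pte_data[key]
        for blobs in self.external_data.values():
            if key in blobs:
                return blobs[key]
        return None

    def add(self, key: str, data: bytes, external: Optional[str] = None) -> None:
        """Store data under a unique key, optionally grouped into an external file."""
        existing = self._find(key)
        if existing is not None:
            # Assumed policy: re-adding identical bytes is a no-op,
            # while reusing a key for different bytes is an error.
            if existing != data:
                raise ValueError(f"key {key!r} already holds different data")
            return
        if external is None:
            self.pte_data[key] = data
        else:
            self.external_data[external][key] = data


store = NamedBlobStoreSketch()
store.add("blob1", b"\x01\x02", external="external_file1")
store.add("blob2", b"\x03\x04", external="external_file1")
store.add("blob3", b"\x05\x06", external="external_file2")
store.add("shared_scales", b"\x07")  # no 'external' tag: stays in the PTE
```

Keeping the grouping as a per-blob tag means a backend only decides which blobs belong together; the serializer decides how each group is written out.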
The NamedBlobStore is part of EdgeProgramManager, and is passed to ExecutorchProgramManager for serialization at to_executorch.
AoT: Preprocess
Preprocess
We provide the ‘NamedBlobStore’ to backends when processing their lowered graphs. While processing, backends can add to the NamedBlobStore with any data they wish to be shared.
Preprocess All
To further address backend weight sharing, we introduce a new API preprocess_all. The limitation with the current preprocess API is that backends can only process a single graph at a time. As a result, they have no information about the larger model and shared components with other graphs that are delegated to the same backend. The new preprocess_all API enables backends to process and lower all the delegated graphs from the model at once. This allows backends to identify the shared components (weights, tensors, etc.) from all the ExportedPrograms when producing their backend payloads. The blob storage service can then be used to serialize any shared components (weights, constant data, etc.) through the named blob store.
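The benefit of seeing all delegated graphs at once can be sketched as follows. This is an illustration under stated assumptions, not the proposed API: graphs are modeled as plain dicts of weight name to bytes rather than ExportedPrograms, the function name preprocess_all_sketch is hypothetical, and using a SHA-256 content hash as the blob key is one possible keying scheme that the RFC does not prescribe.

```python
import hashlib
from typing import Dict, Tuple


def preprocess_all_sketch(
    graphs: Dict[str, Dict[str, bytes]],
) -> Tuple[Dict[str, Dict[str, str]], Dict[str, bytes]]:
    """Process every delegated graph together and deduplicate shared weights.

    Returns (payloads, blobs): each payload maps a weight name to a blob key,
    and blobs maps each key to the bytes stored once in the named blob store.
    """
    blobs: Dict[str, bytes] = {}
    payloads: Dict[str, Dict[str, str]] = {}
    for graph_name, weights in graphs.items():
        refs: Dict[str, str] = {}
        for weight_name, data in weights.items():
            # Assumed keying scheme: identical bytes hash to the same key,
            # so a weight shared by several graphs is stored only once.
            key = hashlib.sha256(data).hexdigest()
            blobs.setdefault(key, data)
            refs[weight_name] = key
        payloads[graph_name] = refs
    return payloads, blobs


shared_weight = bytes(range(16))
graphs = {
    "prefill": {"linear.weight": shared_weight, "prefill.bias": b"\x01"},
    "decode": {"linear.weight": shared_weight},
}
payloads, blobs = preprocess_all_sketch(graphs)
```

With the single-graph preprocess API, each call would see only one of these dicts and could not tell that ‘linear.weight’ appears in both.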
Runtime: NamedDataMap
We define an interface called the NamedDataMap (NDM) that looks up data based on string keys. The NDM provides a view over ‘shared_delegate_data’ in the PTE file and ‘shared_external_data’ in the external data file.
The ExecuTorch-provided NDM will use a linear or binary search over the keys to avoid pulling in C++ libraries and increasing the core runtime binary size.
For the external data case, users can bring their own implementation, using e.g. std::unordered_map for faster lookup.
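The lookup strategy can be sketched as a binary search over keys kept in sorted order. The actual runtime is C++; Python is used here only to show the logic, and the class name SortedKeyDataMap and its get_data method are hypothetical rather than the NDM interface itself.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


class SortedKeyDataMap:
    """Hypothetical read-only map that finds blobs by binary search on keys."""

    def __init__(self, entries: List[Tuple[str, bytes]]) -> None:
        # Sort once at construction so every lookup is O(log n)
        # without needing a hash-map container.
        ordered = sorted(entries, key=lambda entry: entry[0])
        self._keys = [key for key, _ in ordered]
        self._values = [value for _, value in ordered]

    def get_data(self, key: str) -> Optional[bytes]:
        """Return the blob stored under key, or None if the key is absent."""
        index = bisect_left(self._keys, key)
        if index < len(self._keys) and self._keys[index] == key:
            return self._values[index]
        return None


ndm = SortedKeyDataMap(
    [("weights.layer1", b"\x0a"), ("bias.layer1", b"\x0b"), ("scales", b"\x0c")]
)
```

A user-supplied implementation for external data could swap the sorted arrays for a hash map and keep the same get_data shape.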
The NDM is passed to backend.init, and backends use it to retrieve data.
The NDM loads data upon request and provides it read-only. If a backend wants to mutate the data, it should copy the data, mutate the copy, and then free the original. Ideally, mutated data is stored in a backend-wide cache so subsequent methods can access it without invoking another load.
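The copy, mutate, and cache pattern described above might look like the following sketch. The names BackendCacheSketch and get_mutable are hypothetical, the load callable stands in for a request to the NDM, and the byte-doubling transform is a placeholder for whatever repacking a real backend would do.

```python
from typing import Callable, Dict


class BackendCacheSketch:
    """Hypothetical backend-wide cache of mutated copies of read-only blobs."""

    def __init__(self, load: Callable[[str], bytes]) -> None:
        self._load = load  # stands in for a read-only NDM lookup
        self._cache: Dict[str, bytearray] = {}
        self.load_count = 0  # counts how many loads were actually issued

    def get_mutable(
        self, key: str, transform: Callable[[bytearray], None]
    ) -> bytearray:
        """Return the transformed copy for key, loading at most once."""
        if key not in self._cache:
            original = self._load(key)  # read-only bytes
            self.load_count += 1
            working = bytearray(original)  # copy; the original is untouched
            transform(working)  # mutate the copy in place
            del original  # drop our reference to the original bytes
            self._cache[key] = working
        return self._cache[key]


def double_each_byte(buffer: bytearray) -> None:
    # Placeholder mutation standing in for backend-specific repacking.
    for i in range(len(buffer)):
        buffer[i] = (buffer[i] * 2) % 256


source = {"w": b"\x01\x02\x03"}
cache = BackendCacheSketch(lambda key: source[key])
first = cache.get_mutable("w", double_each_byte)   # e.g. method 'prefill'
second = cache.get_mutable("w", double_each_byte)  # e.g. method 'decode'
```

The second call returns the cached copy, so the second method neither reloads nor re-transforms the data.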
Delegate flow
User flow
User flow with shared data inside the PTE file is unchanged.
Example user-flow with data in an external file.
Schema Changes
Note that the runtime doesn’t depend on a specific data file format. The NamedDataMap can interface with data inside the PTE, any custom file format, or wrap around some separate service. As an initial example of an external file, you can check out FlatTensor; please note that it is still experimental and under development.
PTE File
We introduce new tables to the existing ‘program.fbs’ schema, for when shared data is stored inside the PTE.
This parallels the external data file schema below. If the NamedData and corresponding segments are removed and placed in an external file, the PTE file + External File should execute as expected.
External File
We introduce a new file schema for data-only files. The NamedBlobStore is serialized into this schema.
This parallels the PTE file schema changes. If the NamedData and corresponding segments are placed into the PTE file, the PTE file should execute as expected.
cc @mcr229