[RFC][3/3] Blob Storage Design #8187

lucylq · 2025-01-31T22:55:50Z

Collaborator

🚀 The feature, motivation and pitch

This proposal details the design components shared between the ExecuTorch features ‘backend weight sharing’ and ‘data separation’.

At a high level, we introduce an opaque blob storage for backends to add and request data from. Ahead-of-time, backends place opaque blobs into the blob storage under a unique key. At runtime, backends request the opaque blobs back using those unique keys.

Motivation

Provides a mechanism for backends to serialize and load shared data ([RFC][1/3] Enable Weight Sharing across a single Backend #8230)
- Ahead-of-time: Backends can share data in the PTE file, instead of duplicating shared data across processed blobs.
- Runtime: backends can load the shared data.
Provides an interface for enabling data separation.
- Ahead-of-time: backends can specify whether shared data is stored in an external file or not.
- Runtime: backends can load the shared data.

RFC

AoT: NamedBlobStore

We first introduce the concept of the ‘NamedBlobStore’. This allows delegates to serialize bytes with keys. These bytes can be retrieved from the NamedDataMap at runtime (see section Runtime: NamedDataMap). Delegates can serialize information shared across methods or subgraphs into the NamedBlobStore, and retrieve them when initializing either method or subgraph.

For data that is saved to multiple external files, users can set the ‘external’ field to indicate the desired grouping, e.g. blob1 and blob2 in ‘external_file1’, and blob3 in ‘external_file2’.

from typing import Optional

class NamedBlobStore:
  """
  NamedBlobStore manages the blobs that delegates want to share. Backends add bytes
  to the store under a unique key. These bytes can be retrieved at runtime using the
  same key with the NamedDataMap.
  """
  def add_named_blob(self, key: str, blob: bytes, alignment: int, external: Optional[str]) -> bool:
    """
    Adds a named blob to the NamedBlobStore.
    Args:
      key (str): key used to serialize bytes.
      blob (bytes): bytes being requested to be serialized.
      alignment (int): alignment for bytes to be serialized with.
      external (Optional[str]): the external filename that this blob is saved to.
    Returns:
      bool: True if the blob was successfully added, False if not.
    """

The NamedBlobStore is part of EdgeProgramManager, and is passed to ExecutorchProgramManager for serialization at to_executorch.
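
To make the interface above concrete, here is a minimal in-memory sketch of how such a store could behave. The class name `InMemoryNamedBlobStore`, the duplicate-key policy, the default argument values, and the `blobs_by_file` helper are all assumptions made for illustration; they are not part of the proposal or of the ExecuTorch implementation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class InMemoryNamedBlobStore:
    """Hypothetical in-memory stand-in for the proposed NamedBlobStore."""

    def __init__(self) -> None:
        # key -> (blob, alignment, external file name; None means "inside the PTE")
        self._entries: Dict[str, Tuple[bytes, int, Optional[str]]] = {}

    def add_named_blob(
        self,
        key: str,
        blob: bytes,
        alignment: int = 1,
        external: Optional[str] = None,
    ) -> bool:
        existing = self._entries.get(key)
        if existing is not None:
            # Assumed policy: re-adding identical bytes succeeds as a no-op,
            # while reusing a key for different bytes is rejected.
            return existing[0] == blob
        self._entries[key] = (blob, alignment, external)
        return True

    def blobs_by_file(self) -> Dict[Optional[str], List[str]]:
        """Group keys by destination file, in insertion order."""
        groups: Dict[Optional[str], List[str]] = defaultdict(list)
        for key, (_, _, external) in self._entries.items():
            groups[external].append(key)
        return dict(groups)


store = InMemoryNamedBlobStore()
store.add_named_blob("blob1", b"\x01\x02", alignment=16, external="external_file1")
store.add_named_blob("blob2", b"\x03\x04", alignment=16, external="external_file1")
store.add_named_blob("blob3", b"\x05\x06", alignment=16, external="external_file2")
print(store.blobs_by_file())
```

The grouping mirrors the blob1/blob2/blob3 example given earlier: a serializer could walk `blobs_by_file()` and emit one data file per group.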

AoT: Preprocess

Preprocess
We provide the ‘NamedBlobStore’ to backends when processing their lowered graphs. While processing, backends can add any data they wish to share to the NamedBlobStore.

class Backend(BackendDetails):
  @staticmethod
  def preprocess(
    edge_program,
    compile_specs,
    named_blob_store: NamedBlobStore,
  ) -> PreprocessResult:
    ...

Preprocess All

To further address backend weight sharing, we introduce a new API preprocess_all. The limitation with the current preprocess API is that backends can only process a single graph at a time. As a result, they have no information about the larger model and shared components with other graphs that are delegated to the same backend. The new preprocess_all API enables backends to process and lower all the delegated graphs from the model at once. This allows backends to identify the shared components (weights, tensors, etc.) from all the ExportedPrograms when producing their backend payloads. The blob storage service can then be used to serialize any shared components (weights, constant data, etc.) through the named blob store.

def preprocess_all(
    exported_programs: Dict[str, List[ExportedProgram]],
    named_blob_store: NamedBlobStore,
) -> Dict[str, List[PreprocessResult]]:
    """
    Args:
      exported_programs (Dict[str, List[ExportedProgram]]):
        Maps each method name to the list of partitions produced by the
        partitioner. If backend_id was specified instead of a partitioner,
        the list contains a single element: that method's ExportedProgram.
      named_blob_store (NamedBlobStore):
        Blob store that delegates can use to request bytes to be serialized.
        Backends serialize bytes with a string key. At runtime, they can use
        this same key to request the same bytes back.
    Returns:
      Dict[str, List[PreprocessResult]]:
        Must produce one PreprocessResult for every ExportedProgram in exported_programs.
        The PreprocessResult for method [str] and index [i] corresponds to the
        ExportedProgram at exported_programs[str][i].
    """

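As a rough illustration of what seeing all graphs at once enables, the sketch below deduplicates constants across methods by content hash and stores each unique blob once. Everything here is a stand-in: `FakeProgram` replaces ExportedProgram with a plain dict of constant name to bytes, a dict replaces the NamedBlobStore, and `dedup_constants` and the SHA-256 key scheme are hypothetical choices, not something the proposal prescribes.

```python
import hashlib
from typing import Dict, List

# Stand-in for an ExportedProgram: a mapping of constant name -> raw bytes.
FakeProgram = Dict[str, bytes]


def dedup_constants(
    programs: Dict[str, List[FakeProgram]],
    blob_store: Dict[str, bytes],
) -> Dict[str, List[Dict[str, str]]]:
    """Return, per method and partition, a map of constant name -> blob key.

    Each unique byte string is written to blob_store exactly once.
    """
    refs: Dict[str, List[Dict[str, str]]] = {}
    for method, partitions in programs.items():
        refs[method] = []
        for program in partitions:
            keys: Dict[str, str] = {}
            for name, data in program.items():
                key = hashlib.sha256(data).hexdigest()
                blob_store.setdefault(key, data)  # no-op if already stored
                keys[name] = key
            refs[method].append(keys)
    return refs


weights = bytes(range(8))
programs = {
    "prefill": [{"linear.weight": weights, "prefill.bias": b"\x01"}],
    "decode": [{"linear.weight": weights, "decode.bias": b"\x02"}],
}
blob_store: Dict[str, bytes] = {}
refs = dedup_constants(programs, blob_store)
print(len(blob_store))  # the shared weight is stored once
```

The result keeps one entry per input program at the same method name and index, matching the ordering requirement in the docstring above.
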
Runtime: NamedDataMap

We define an interface called the NamedDataMap (NDM) that looks up data based on string keys. The NDM provides a view over ‘shared_delegate_data’ in the PTE file and ‘shared_external_data’ in the external data file.

The ExecuTorch-provided NDM will use a linear or binary search over the keys to avoid pulling in C++ libraries and increasing the core runtime binary size.

For the external data case, users can bring their own implementation, using e.g. std::unordered_map for faster lookup.
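
The binary-search option can be sketched in a few lines. This is written in Python for brevity, although the runtime itself is C++. The class name `SortedKeyMap` is hypothetical, and returning `None` on a miss is a simplification of the error `Result` a real runtime would return.

```python
from bisect import bisect_left
from typing import Dict, List, Optional


class SortedKeyMap:
    """Hypothetical read-only key -> bytes map backed by two parallel lists."""

    def __init__(self, entries: Dict[str, bytes]) -> None:
        # Keys are sorted once up front so that lookups can use binary search.
        self._keys: List[str] = sorted(entries)
        self._values: List[bytes] = [entries[k] for k in self._keys]

    def get_data(self, key: str) -> Optional[bytes]:
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None  # simplification: a real runtime would return an error

    def get_num_keys(self) -> int:
        return len(self._keys)

    def get_key_at(self, index: int) -> str:
        return self._keys[index]


ndm = SortedKeyMap({"decode.bias": b"\x02", "linear.weight": b"\x00\x01"})
print(ndm.get_data("linear.weight"))
```

Sorting at serialization time would let the runtime search the stored key list directly, with no hash table and no extra allocation.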

// NamedDataMap interface.
class NamedDataMap {
 public:
  virtual ~NamedDataMap() = default;

  // Get data by key.
  virtual Result<FreeableBuffer> get_data(const char* key) const = 0;

  // Get number of keys.
  virtual int get_num_keys() const = 0;

  // Get key at index.
  virtual Result<const char*> get_key_at(int index) const = 0;
};

The NDM is passed to backend.init, and backends use it to retrieve data.

The NDM loads data upon request and provides it read-only. If a backend wants to mutate the data, it should copy the data, mutate the copy, and then free the original. Ideally, mutated data is stored in a backend-wide cache so subsequent methods can access it without invoking another load.

Delegate flow

// Sample implementation for a backend.

// A backend-specific data cache to store data shared within that backend.
// This is owned and implemented by the backend. The backend must implement its own locking.
backend::shared_data shared_data_cache;

---
Result<DelegateHandle*> init(
      BackendInitContext& context,
      FreeableBuffer* processed,
      ArrayRef<CompileSpec> compile_specs,
      const NamedDataMap* shared_data_map
) const override {
  ...
  // Resolve external data when we come across it in the preprocessed graph.
  // Note: backends should lock access to shared data, as multiple threads
  // could load models simultaneously.
  if (shared_data_cache.find(key) == shared_data_cache.end()) {
    Result<FreeableBuffer> data = shared_data_map->get_data(key);
    if (!data.ok()) {
      return data.error();
    }
    // Case 1: delegates that mutate data at runtime.
    if (mutate) {
      // Copy and mutate data.
      auto initialized_data = backend::initialize_data(data.get());
      // Add to shared_data_cache under the same key.
      shared_data_cache.insert(key, initialized_data);
      // Free the original data.
      data->Free();
    }
    // Case 2: delegates do not mutate data at runtime.
    else {
      shared_data_cache.insert(key, std::move(data.get()));
    }
  }
  ...
}
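
The locking requirement mentioned in the comments above can be sketched as a load-once cache. This is a Python illustration of the pattern only: `SharedDataCache`, `get_or_load`, and `fake_load` are hypothetical names, and a real backend would implement the equivalent in C++ around its own cache type.

```python
import threading
from typing import Callable, Dict, List


class SharedDataCache:
    """Hypothetical backend-wide cache that loads each key at most once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, bytes] = {}

    def get_or_load(self, key: str, load: Callable[[str], bytes]) -> bytes:
        # Holding the lock across the load keeps the example simple and
        # guarantees a single load per key, at the cost of serializing loads.
        with self._lock:
            if key not in self._data:
                self._data[key] = load(key)
            return self._data[key]


load_calls: List[str] = []


def fake_load(key: str) -> bytes:
    """Stand-in for NamedDataMap::get_data; records each call."""
    load_calls.append(key)
    return key.encode()


cache = SharedDataCache()
threads = [
    threading.Thread(target=cache.get_or_load, args=("weights", fake_load))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(load_calls))  # one load, even with four concurrent callers
```

A finer-grained design (for example one lock per key) would let unrelated blobs load in parallel; the single lock is used here only to keep the sketch short.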

User flow

The user flow with shared data inside the PTE file is unchanged.
An example user flow with data in an external file:

// Example ExecuTorch runtime flow (error checks on each Result omitted for brevity)

// Load program.
Result<FileDataLoader> program_loader = FileDataLoader::from(pte_file_path);
Result<Program> program = Program::load(&program_loader.get());

// Load shared data.
Result<FileDataLoader> data_loader = FileDataLoader::from(data_file_path);
Result<CustomNamedDataMap> custom_named_data_map = CustomNamedDataMap::load(&data_loader.get());

// Pass into method.
Result<Method> method = program->load_method(
    "forward",                    // method name
    memory_manager,               // memory manager
    nullptr,                      // event_tracer
    &custom_named_data_map.get()  // external data
);

Error err = method->execute();

Schema Changes

Note that the runtime doesn’t depend on a specific data file format. The NamedDataMap can interface with data inside the PTE, any custom file format, or wrap around some separate service. As an initial example of an external file, you can check out FlatTensor; please note that it is still experimental and under development.

PTE File
We introduce new tables to the existing ‘program.fbs’ schema, for when shared data is stored inside the PTE.
This parallels the external data file schema below. If the NamedData and corresponding segments are removed and placed in an external file, the PTE file + External File should execute as expected.

table NamedData {
  // The unique id of the data blob.
  key: string;

  // Program.segments index where the data for this NamedBlob is stored.
  segment_index: uint32;
}

table Program {
  ...
  segments: [DataSegment];
  ...
  named_data: [NamedData];
}

External File
We introduce a new file schema for data-only files. The NamedBlobStore is serialized into this schema.

This parallels the PTE file schema changes. If the NamedData and corresponding segments are placed into the PTE file, the PTE file should execute as expected.

table NamedData {
  // The unique id of the data blob.
  key: string;

  // FlatData.segments index where the data for this NamedBlob is stored.
  segment_index: uint32;
}

// FlatData is a flatbuffer-based format for storing and loading opaque data.
table FlatData {
  // Schema version.
  version: uint32;

  // List of blobs and references to their location. 
  named_data: [NamedData];

  // List of data segments that follow the FlatData file, sorted by
  // offset. Elements in this schema can refer to these segments by index.
  segments: [DataSegment];
}

root_type FlatData;
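
To show how the two tables fit together, the sketch below resolves a key to bytes by following `segment_index` into the segment list. It is a plain-Python model of the schema, not generated flatbuffer code. The `offset` and `size` fields on `DataSegment` and the `lookup` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NamedData:
    key: str
    segment_index: int


@dataclass
class DataSegment:
    offset: int  # assumed: byte offset into the segment area
    size: int    # assumed: length of the segment in bytes


def lookup(
    key: str,
    named_data: List[NamedData],
    segments: List[DataSegment],
    segment_area: bytes,
) -> Optional[bytes]:
    """Linear search over named_data, then slice the referenced segment."""
    for entry in named_data:
        if entry.key == key:
            seg = segments[entry.segment_index]
            return segment_area[seg.offset : seg.offset + seg.size]
    return None


segment_area = b"AAAABBBBBB"
segments = [DataSegment(offset=0, size=4), DataSegment(offset=4, size=6)]
named_data = [NamedData("w0", 0), NamedData("w1", 1)]
print(lookup("w1", named_data, segments, segment_area))
```

Because both the PTE schema and the external schema use the same key-to-segment indirection, the same lookup logic applies whichever file holds the segments.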

cc @mcr229

neuropilot-captain · 2025-02-07T03:35:01Z

Hi @lucylq @cccclai,
Thanks for the design! We think it is really helpful for enabling weight sharing on the MediaTek backend.

Q1: This discussion is clear and pretty much explains the feature. We just need to clarify the AOT usage, since it is not presented in this discussion.
Our original flow for lowering a model is to invoke exir.to_edge(aten_dialect) to obtain an edge_program. If we want to share weights between 2 dialects, do we pass a list/dict of dialects so that preprocess_all() gets both dialects? Can you provide an example to demonstrate this?

Q2: Also, we just want to clarify that in
def preprocess_all(exported_programs: Dict[str, List[ExportedProgram]], named_blob_store: NamedBlobStore)
the argument Dict[str, List[ExportedProgram]] maps the name of the dialect(?) to all the subgraphs that need to be compiled by the backend?

Thanks!

6 replies

neuropilot-captain Feb 7, 2025

Weight sharing between 2 dialects really means extracting the shared constant data between two compiled models (after the MediaTek backend compiles them). It will be achieved by using a MediaTek tool.
Here we would like to know: if we have 2 ExportedPrograms, how do users lower them so that preprocess_all() in the MediaTek backend gets them in the same pass?

Thanks for answering Q2, it is very clear!

lucylq Feb 7, 2025
Collaborator Author

Here we would like to know: if we have 2 ExportedPrograms, how do users lower them so that preprocess_all() in the MediaTek backend gets them in the same pass?

I think we would introduce a new to_backend overload that takes all the methods/exported programs. These will be partitioned if necessary and give us Dict[str, List[ExportedProgram]].

This is the argument to preprocess_all(), which is implemented by the backend. So the MediaTek backend will receive all subgraphs, and can process/lower/extract weights as it sees fit inside preprocess_all().

From the user perspective, I don't think there should be too much difference; just the new to_backend API.

Let me know if this answers your question!

lucylq Feb 7, 2025
Collaborator Author

@neuropilot-captain I may have misunderstood the question. Are you asking how to share weights if we create two separate PTE files with weights, e.g. prefill.pte, prefill.weights, decode.pte, decode.weights?

In this case, I think ExecuTorch can provide a tool to merge the .weights files.

mcr229 Feb 7, 2025
Collaborator

Q1: This discussion is clear and pretty much explains the feature. We just need to clarify the AOT usage, since it is not presented in this discussion. Our original flow for lowering a model is to invoke exir.to_edge(aten_dialect) to obtain an edge_program. If we want to share weights between 2 dialects, do we pass a list/dict of dialects so that preprocess_all() gets both dialects? Can you provide an example to demonstrate this?

I believe the flow here might be slightly different. One concept that was introduced to handle multiple methods is the EdgeProgramManager. Currently the EdgeProgramManager handles lowering through .to_backend APIs that only accept partitioners, but we will update this to also accept backend_id + compile_spec. I imagine the flow would look something like:

from torch.export import export
from executorch.exir import to_edge

prefill_aten = export(prefill_module, example_inputs)
decode_aten = export(decode_module, example_inputs)

edge_program_manager = to_edge(
    {
        "prefill": prefill_aten,
        "decode": decode_aten,
    }
)

# DelegationSpec and the dict form of to_backend are part of the proposed flow.
delegation_spec = DelegationSpec(
    "mediatek",
    [],  # compile specs that you might use
)
lowered_program_manager = edge_program_manager.to_backend(
    {
        "prefill": delegation_spec,
        "decode": delegation_spec,
    }
)
# Note that edge_program_manager.to_backend(delegation_spec) will do the same
# as above, as it lowers all graphs in the edge program manager.
# I just want to showcase that the above form is possible as well.

executorch_program = lowered_program_manager.to_executorch()

# serialize executorch_program

The proposed preprocess_all will essentially be given the dictionaries above, which are dict[str, ExportedProgram]. preprocess_all will produce 2 lowered payloads here (one for prefill and one for decode), and we are hoping that the NamedBlobStore passed to preprocess_all will provide a separate blob storage where backend authors can store data shared across those two payloads. This would produce 1 .pte file that has methods for prefill and decode (the shared data still lives in the .pte file and is accessible through the NamedBlobStore). As outlined by Lucy, we would also enable functionality for the NamedBlobStore so that this shared data can be emitted as a separate data file (.ptd).

Now, reading your comments, I understand that there might be a use case in which we want prefill and decode to actually be separate .pte files, which means the artifacts we generate would be two .pte files (prefill.pte and decode.pte) as well as a shared weight file (.ptd). Here I would suggest that we add some API functionality in the edge program manager to do something like:

executorch_program.save(
    {
        "prefill": prefill_file_path,
        "decode": decode_file_path,
    }
)

Let me know your thoughts. I'm happy to discuss further or clarify any points in my response that might be convoluted or hard to understand.

neuropilot-captain Feb 12, 2025

Hi @mcr229 , thank you for providing the sample code and detailed information. After discussing with our internal teams, we have concluded that this design is suitable for enabling weight sharing in the MediaTek backend. We can move forward with this feature design.
