-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Maps to documentation #856
Changes from all commits
e7dd9ff
27c473a
55c81dd
4f8cf8d
9a83482
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,196 @@ | ||
--- | ||
title: "Maps" | ||
manudhundi marked this conversation as resolved.
Show resolved
Hide resolved
|
||
--- | ||
|
||
# Overview | ||
|
||
There are multiple different implementations of key-value maps inside the framework, suited for different usecases. | ||
We will go over their differences and similarities, and how to choose which one to use. | ||
|
||
## Aptos Blockchain performance and gas cost considerations | ||
|
||
Aptos Blockchain state is kept in **storage slots**. | ||
Furthermore, transaction performance and gas cost is heavily influenced by how these **slots** are read and written. | ||
Breaking down the gas costs further, we have: | ||
1. Storage fee, which are determined by the number and size of **slots** (i.e., writing to a new **slot** incurs the highest storage fee, whereas deleting an existing **slot** provides the largest refund.) | ||
2. IO gas costs —generally much lower— which depend on the number and size of resources read and modified. | ||
3. execution gas costs are based on the computation needed, and are generally in the similar scale as io gas costs. | ||
|
||
Transactions that modify the same **slot** cannot be executed concurrently (with some exceptions, like aggregators and resources as a part of the same resource group), as they conflict with one another. | ||
|
||
One useful analogy is thinking about each **slot** being a file on a disk, | ||
then performance of smart contract would correlate well to a program that | ||
operates on files in the same way. | ||
|
||
## Different Map implementations | ||
|
||
| Implementation | Size Limit | Storage Structure | Key Features | | ||
|--------------------|------------|------------------|--------------| | ||
| `OrderedMap` | Bounded (fits in a single **slot**) | Stored entirely within the resource that contains it | Supports ordered access (front/back, prev/next), implemented as sorted vector, but operations are effectively O(log(n)) due to internal optimizations | | ||
| `Table` | Unbounded | Each (key, value) stored in a separate **slot** | Supports basic operations, like `add`, `remove`, `contains`, but **not iteration**, and **cannot be destroyed**; useful for large/unbounded keys/values and where high-concurrency is needed | | ||
| `TableWithLength` | Unbounded | same as `Table` | Variant of `Table`, with additional length tracking, which adds `length`, `empty`, and `destroy_empty` methods; Adding or removing elements **cannot** be done concurrently, modifying existing elements can. | | ||
| `BigOrderedMap` | Unbounded | Combines multiple keys into a single **slot**, initially stored within resource that contains it, and grows into multiple **slots** dynamically | Implemented as B+ tree; **opportunistically concurrent** for non-adjacent keys; supports ordered access (front/back, prev/next); configurable node capacities to balance storage and performance | | ||
|
||
Note: | ||
- `SimpleMap` has been deprecated, and replaced with `OrderedMap`. | ||
- `SmartTable` has been deprecated, and replaced with `BigOrderedMap`. | ||
|
||
#### Performance comparison | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe a perf table that highlights the message that creating a "slot" is costly ? That is, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I need to add appropriate tests to do that, so will leave this for some later PR |
||
|
||
We measured performance at small scale, measuring microseconds taken for a single pair of `insert` + `remove` operation, into a map of varied size. | ||
|
||
| num elements | OrderedMap | BigOrderedMap max_degree>10000 | BigOrderedMap max_degree=16 | | ||
|--------------|------------|---------------------------|-----------------------------| | ||
| 10 | 65 | 123 | 123 | | ||
| 100 | 85 | 146 | 455 | | ||
| 1000 | 105 | 168 | 567 | | ||
| 10000 | 142 | 210 | 656 | | ||
|
||
You can see that overhead of `BigOrderedMap` compared to `OrderedMap`, when both are in the single **slot**, is around 1.5-2x. | ||
So you can generally used `BigOrdredMap` when it is unknown if data will be too large to be stored in a single **slot**. | ||
|
||
## Common map operations: | ||
|
||
Most maps above support the same set of functions (for actual signatures and restrictions, check out the corresponding implementations): | ||
|
||
#### Creating Maps | ||
|
||
- `new<K, V>(): Self`: creates an empty map | ||
|
||
#### Destroying Maps | ||
|
||
- `destroy_empty<K, V>(self: Self<K, V>)`: Destroys an empty map. (**not** supported by `Table`) | ||
- `destroy<K, V>(self: Self<K, V>, dk: |K|, dv: |V|)`: Destroys a map with given functions that destroy correponding elements. (**not** supported by `Table` and `TableWithLength`) | ||
|
||
#### Managing Entries | ||
|
||
- `add<K, V>(self: &mut Self<K, V>, key: K, value: V)`: Adds a key-value pair to the map. | ||
- `remove<K, V>(self: &mut Self<K, V>, key: K): V`: Removes and returns the value associated with a key. | ||
- `upsert<K, V>(self: &mut Self<K, V>, key: K, value: V): Option<V>`: Inserts or updates a key-value pair. | ||
- `add_all<K, V>(self: &mut Self<K, V>, keys: vector<K>, values: vector<V>)`: Adds multiple key-value pairs to the map. (**not** supported by `Table` and `TableWithLength`) | ||
|
||
#### Retrieving Entries | ||
|
||
- `contains<K, V>(self: &Self<K, V>, key: &K): bool`: Checks whether key exists in the map. | ||
- `borrow<K, V>(self: &Self<K, V>, key: &K): &V`: Returns an immutable reference to the value associated with a key. | ||
- `borrow_mut<K: drop, V>(self: &mut Self<K, V>, key: K): &mut V`: Returns a mutable reference to the value associated with a key. | ||
(`BigOrderedMap` only allows `borrow_mut` when value type has a static constant size, due to modification being able to break it's invariants otherwise. Use `remove()` and `add()` combination instead) | ||
|
||
#### Order-dependant functions | ||
|
||
These set of functions are only implemented by `OrderedMap` and `BigOrderedMap`. | ||
|
||
- `borrow_front<K, V>(self: &Self<K, V>): (&K, &V)` | ||
- `borrow_back<K, V>(self: &Self<K, V>): (&K, &V)` | ||
- `pop_front<K, V>(self: &mut Self<K, V>): (K, V)` | ||
- `pop_back<K, V>(self: &mut Self<K, V>): (K, V)` | ||
- `prev_key<K: copy, V>(self: &Self<K, V>, key: &K): Option<K>` | ||
- `next_key<K: copy, V>(self: &Self<K, V>, key: &K): Option<K>` | ||
|
||
#### Utility Functions | ||
|
||
- `length<K, V>(self: &Self<K, V>): u64`: Returns the number of entries in the map. (not supported by `Table`) | ||
|
||
#### Traversal Functions | ||
|
||
These set of functions are not implemented by `Table` and `TableWithLength`. | ||
|
||
- `keys<K: copy, V>(self: &Self<K, V>): vector<K>` | ||
- `values<K, V: copy>(self: &Self<K, V>): vector<V>` | ||
- `to_vec_pair<K, V>(self: Self<K, V>): (vector<K>, vector<V>)` | ||
- `for_each_ref<K, V>(self: &Self<K, V>, f: |&K, &V|)` | ||
|
||
- `to_ordered_map<K, V>(self: &BigOrderedMap<K, V>): OrderedMap<K, V>`: Converts `BigOrderedMap` into `OrderedMap` | ||
|
||
## Example Usage | ||
|
||
### Creating and Using a OrderedMap | ||
|
||
```move filename="map_usage.move" | ||
module 0x42::map_usage { | ||
use aptos_framework::ordered_map; | ||
|
||
public entry fun main() { | ||
let map = ordeded_map::new<u64, u64>(); | ||
map.add(1, 100); | ||
map.add(2, 200); | ||
|
||
let length = map.length(); | ||
assert!(length == 2, 0); | ||
|
||
let value1 = map.borrow(&1); | ||
assert!(*value1 == 100, 0); | ||
|
||
let value2 = map.borrow(&2); | ||
assert!(*value2 == 200, 0); | ||
|
||
let removed_value = map.remove(&1); | ||
assert!(removed_value == 100, 0); | ||
|
||
map.destroy_empty(); | ||
} | ||
} | ||
``` | ||
|
||
## Additional details for `BigOrderedMap` | ||
|
||
Its current implementation is B+ tree, which is chosen as it is best suited for the onchain storage layout - where the majority of cost comes from loading and writing to storage items, and there is no partial read/write of them. | ||
|
||
Implementation has few characteristics that make it very versatile and useful across wide range of usecases: | ||
|
||
- When it has few elements, it stores all of them within the resource that contains it, providing comparable performance to OrderedMap itself, while then dynamically growing to multiple resources as more and more elements are added | ||
- It reduces amount of conflicts: modifications to a different part of the key-space can be generally done concurrently, and it provides knobs for tuning between concurrency and size | ||
- All operations have guaranteed upper-bounds on performance (how long they take, as well as how much execution and io gas they consume), allowing for safe usage across a variety of use cases. | ||
- One caveat, is refundable storage fee. By default, operation that requires map to grow to more resources needs to pay for storage fee for it. Implementation here has an option to pre-pay for storage slots, and to reuse them as elements are added/removed, allowing applications to achieve fully predictable overall gas charges, if needed. | ||
- If key/value is within the size limits map was configured with, inserts will never fail unpredictably, as map internally understands and manages maximal **slot** size limits. | ||
|
||
### `BigOrderedMap` structure | ||
|
||
`BigOrderedMap` is represented as a tree, where inner nodes split the "key-space" into separate ranges for each of it's children, and leaf nodes contain the actual key-value pairs. | ||
Internally it has `inner_max_degree` representing largest number of children an inner node can have, and `leaf_max_degree` representing largest number of key-value pairs leaf node can have. | ||
|
||
#### Creating `BigOrderedMap` | ||
|
||
Because it's layout affects what can be inserted and performance, there are a few ways to create and configure it: | ||
|
||
- `new<K, V>(): Self<K, V>`: Returns a new `BigOrderedMap` with the default configuration. Only allowed to be called with constant size types. For variable sized types, another constructor is needed, to explicitly select automatic or specific degree selection. | ||
- `new_with_type_size_hints<K, V>(avg_key_bytes: u64, max_key_bytes: u64, avg_value_bytes: u64, max_value_bytes: u64): Self<K, V>`: Returns a map that is configured to perform best when keys and values are of given `avg` sizes, and guarantees to fit elements up to given `max` sizes. | ||
- `new_with_config<K, V>(inner_max_degree: u16, leaf_max_degree: u16, reuse_slots: bool): Self<K, V>`: Returns a new `BigOrderedMap` with the provided max degree consts (the maximum # of children a node can have, both inner and leaf). If 0 is passed for either, then it is dynamically computed based on size of first key and value, and keys and values up to 100x times larger will be accepted. | ||
If non-0 is passed, sizes of all elements must respect (or their additions will be rejected): | ||
- `key_size * inner_max_degree <= MAX_NODE_BYTES` | ||
- `entry_size * leaf_max_degree <= MAX_NODE_BYTES` | ||
Comment on lines
+156
to
+161
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe make a header for these methods like you did above for Table methods |
||
|
||
`reuse_slots` means that removing elements from the map doesn't free the storage slots and returns the refund. | ||
Together with `allocate_spare_slots`, it allows to preallocate slots and have inserts have predictable gas costs. | ||
(otherwise, inserts that require map to add new nodes, cost significantly more, compared to the rest) | ||
|
||
## Source Code | ||
|
||
- [ordered_map.move](https://github.com/aptos-labs/aptos-core/blob/main/aptos-move/framework/aptos-framework/sources/datastructures/ordered_map.move) | ||
- [table.move](https://github.com/aptos-labs/aptos-core/blob/6f5872b567075fe3615e1363d35f89dc5eb45b0d/aptos-move/framework/aptos-stdlib/sources/table.move) | ||
- [table_with_length.move](https://github.com/aptos-labs/aptos-core/blob/6f5872b567075fe3615e1363d35f89dc5eb45b0d/aptos-move/framework/aptos-stdlib/sources/table.move) | ||
- [big_ordered_map.move](https://github.com/aptos-labs/aptos-core/blob/main/aptos-move/framework/aptos-framework/sources/datastructures/big_ordered_map.move) | ||
|
||
## Additional details of (deprecated) SmartTable | ||
|
||
The Smart Table is a scalable hash table implementation based on linear hashing. | ||
This data structure aims to optimize storage and performance by utilizing linear hashing, which splits one bucket at a time instead of doubling the number of buckets, thus avoiding unexpected gas costs. | ||
Unfortunatelly, it's implementation makes every addition/removal be a conflict, making such transactions fully sequential. | ||
The Smart Table uses the SipHash function for faster hash computations while tolerating collisions. Unfortunatelly, this also means that collisions are predictable, which means that if end users can control the keys being inserted, it can have large number of collisions in a single bucket. | ||
|
||
### SmartTable Structure | ||
|
||
The `SmartTable` struct is designed to handle dynamic data efficiently: | ||
|
||
- `buckets`: A table with a length that stores vectors of entries. | ||
- `num_buckets`: The current number of buckets. | ||
- `level`: The number of bits representing `num_buckets`. | ||
- `size`: The total number of items in the table. | ||
- `split_load_threshold`: The load threshold percentage that triggers bucket splits. | ||
- `target_bucket_size`: The target size of each bucket, which is not strictly enforced. | ||
|
||
### SmartTable usage examples | ||
|
||
- [Move Spiders Smart Table](https://movespiders.com/courses/modules/datastructures/lessonId/7) | ||
- [Move Spiders Querying Smart Table via FullNode APIs](https://movespiders.com/courses/modules/datastructures/lessonId/9) | ||
- [Move Spiders Querying Smart Table via View Function](https://movespiders.com/courses/modules/datastructures/lessonId/10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing smart table documentation altogether