reworked partitioned_iter.py example
dpdani committed May 7, 2024
1 parent ffae0a6 commit 9fcd93d
Showing 2 changed files with 35 additions and 3 deletions.
18 changes: 18 additions & 0 deletions README.md
@@ -174,6 +174,24 @@ The usage of `AtomicInt` provides correctness, regardless of the hashmap implemen
But using `AtomicDict` instead of `dict` improves performance, even without using handles: writes to distinct keys do
not generate contention.

### Pre-sized dictionary, with partitioned iterations

`AtomicDict` provides a few features that Python's `dict` lacks. The
[partitioned iteration example](./examples/atomic_dict/partitioned_iter.py) shows two of them:

1. pre-sizing, which avoids the expensive dynamic resizing of the hash table, and
2. partitioned iteration, which splits the elements among threads.

```text
Insertion into builtin dict took 36.81s
Builtin dict iter took 17.56s with 1 thread.
----------
Insertion took 17.17s
Partitioned iter took 8.80s with 1 threads.
Partitioned iter took 5.03s with 2 threads.
Partitioned iter took 3.92s with 3 threads.
```
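
As a rough reading of the sample output above, the relative speedups can be worked out directly from the printed timings. The sketch below only does that arithmetic; the variable and function names are made up for illustration, and the numbers come from a single run, so they should be read as indicative rather than as a benchmark.

```python
# Timings copied from the sample output above, in seconds.
dict_insert, atomic_insert = 36.81, 17.17
dict_iter = 17.56
partitioned_iter = {1: 8.80, 2: 5.03, 3: 3.92}  # threads -> seconds


def speedup(baseline: float, measured: float) -> float:
    """Return how many times faster `measured` is than `baseline`."""
    return baseline / measured


insert_speedup = speedup(dict_insert, atomic_insert)
print(f"Insertion: {insert_speedup:.2f}x faster than the builtin dict")

single_thread = partitioned_iter[1]
scaling = {}  # threads -> (speedup vs. 1 thread, parallel efficiency)
for threads, seconds in partitioned_iter.items():
    s = speedup(single_thread, seconds)
    scaling[threads] = (s, s / threads)
    print(
        f"{threads} thread(s): {s:.2f}x vs. 1 thread, "
        f"efficiency {s / threads:.0%}, "
        f"{speedup(dict_iter, seconds):.2f}x vs. builtin dict iteration"
    )
```

On these figures, insertion into the pre-sized `AtomicDict` is about 2.14x faster than insertion into the builtin `dict`, and partitioned iteration is about 1.75x faster with 2 threads and about 2.24x faster with 3 threads than with 1 thread.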

## AtomicRef

You can use an `AtomicRef` when you have a shared variable that points to an object, and you need to change the
20 changes: 17 additions & 3 deletions examples/atomic_dict/partitioned_iter.py
@@ -1,19 +1,33 @@
-import multiprocessing
 import threading
 import time

 import cereggii


 keys = 2**26
+py_d = {}
+
+started = time.time()
+for _ in range(keys):
+    py_d[_] = _
+print(f"Insertion into builtin dict took {time.time() - started:.2f}s")
+
+started = time.time()
+current = 0
+for k, v in py_d.items():
+    current = k + v
+print(f"Builtin dict iter took {time.time() - started:.2f}s with 1 thread.")
+print("----------")
+del py_d
+
 d = cereggii.AtomicDict(min_size=keys * 2)

 started = time.time()
 for _ in range(keys):
     d[_] = _
 print(f"Insertion took {time.time() - started:.2f}s")

-for n in range(1, multiprocessing.cpu_count() + 1):
+for n in range(1, 4):

     def iterator(i):
         current = 0
@@ -27,4 +41,4 @@ def iterator(i):
         t.start()
     for t in threads:
         t.join()
-    print(f"Fast iter took {time.time() - started:.2f}s with {n} threads.")
+    print(f"Partitioned iter took {time.time() - started:.2f}s with {n} threads.")
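
The idea behind partitioned iteration, where each thread visits its own disjoint share of the elements, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not cereggii's implementation or API: it uses a builtin `dict`, a simple strided split, and hypothetical names (`partition`, `partitions`, `this_partition`, `partitioned_sum`). It shows only how the work is divided, not the performance characteristics of `AtomicDict`.

```python
import threading


def partition(items, partitions, this_partition):
    """Return the strided share of `items` owned by one partition.

    Hypothetical helper: partition i gets items i, i + partitions, ...
    so the shares are disjoint and together cover every item.
    """
    return items[this_partition::partitions]


def partitioned_sum(data, n_threads):
    """Sum k + v over all entries, with one partition per thread."""
    items = list(data.items())
    totals = [0] * n_threads  # one slot per thread, so no slot is shared

    def worker(i):
        current = 0
        for k, v in partition(items, n_threads, i):
            current += k + v
        totals[i] = current

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)


data = {i: i for i in range(1000)}
result = partitioned_sum(data, 3)
print(result)
```

Giving every thread its own slot in `totals` keeps the workers from writing to the same location, so the sketch needs no lock; the partial results are combined only after all threads have joined.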
