@@ -37,6 +37,34 @@ Range-partitioning is not a silver bullet, meaning that enabling it is not alway
a link to the list of operations that have support for range-partitioning and practical advice on when one should
enable it: :doc:`operations that support range-partitioning </usage_guide/optimization_notes/range_partitioning_ops>`.

+ Dynamic-partitioning in Modin
+ """""""""""""""""""""""""""""
+
+ The Ray engine slows down when it runs a large number of small remote tasks at the same time. Ray Core recommends that users `avoid tiny tasks <avoid tiny task_>`_.
+ When a Modin DataFrame has a large number of partitions, some functions produce a large number of remote tasks, which can cause slowdowns.
+ To address this problem, Modin offers dynamic partitioning. This approach reduces the number of remote tasks
+ by combining multiple partitions into a single virtual partition and performing one common remote task on them.
+
+ Dynamic partitioning is typically used for operations that are fully or partially executed on all partitions separately.
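
The grouping idea can be illustrated with a minimal, Ray-free sketch in plain Python. It is not Modin's implementation: ``combine_into_virtual_partitions``, ``apply_per_virtual_partition`` and ``group_size`` are hypothetical names chosen for illustration, and ordinary lists stand in for partitions and remote tasks.

```python
from itertools import islice


def combine_into_virtual_partitions(partitions, group_size):
    """Group consecutive partitions into lists of at most `group_size` items."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    iterator = iter(partitions)
    virtual_partitions = []
    while True:
        group = list(islice(iterator, group_size))
        if not group:
            break
        virtual_partitions.append(group)
    return virtual_partitions


def apply_per_virtual_partition(func, virtual_partitions):
    """Run one "task" per virtual partition; each task maps `func` over its members."""
    return [[func(part) for part in group] for group in virtual_partitions]


# Twelve tiny stand-in "partitions", each a short list of numbers.
partitions = [[-i, i] for i in range(12)]

# Hypothetical group size of 4: twelve tiny tasks become three larger ones.
virtual = combine_into_virtual_partitions(partitions, group_size=4)
results = apply_per_virtual_partition(lambda part: [abs(x) for x in part], virtual)

print(len(partitions), "tasks without grouping")
print(len(virtual), "tasks with grouping")
```

Each combined task does more work than a tiny one, so the scheduling overhead saved has to outweigh the lost parallelism. This is why the benefit shrinks as individual partitions grow.
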
+
+ .. code-block:: python
+
+     import modin.pandas as pd
+     from modin.config import context
+
+     df = pd.DataFrame(...)
+
+     # Enable dynamic partitioning only for the operations inside this block.
+     with context(DynamicPartitioning=True):
+         df.abs()
+
60
+ Dynamic partitioning is also not always useful, and this approach is usually used for medium-sized DataFrames with a large number of columns.
61
+ If the number of columns is small, the number of partitions will be close to the number of CPUs, and Ray will not have this problem.
62
+ If the DataFrame has too many rows, this is also not a good case for using Dynamic-partitioning, since each task is no longer tiny and performing
63
+ the combined tasks carries more overhead than assigning them separately.
64
+
65
+ Unfortunately, the use of Dynamic-partitioning depends on various factors such as data size, number of CPUs, operations performed,
66
+ and it is up to the user to determine whether Dynamic-partitioning will give a boost in his case or not.
67
+
40
68
Understanding Modin's partitioning mechanism
""""""""""""""""""""""""""""""""""""""""""""
@@ -311,3 +339,4 @@ an inner join you may want to swap left and right DataFrames.
Note that the order of the result columns may differ between the first and second ``merge``.

.. _range-partitioning: https://www.techopedia.com/definition/31994/range-partitioning
+ .. _`avoid tiny task`: https://docs.ray.io/en/latest/ray-core/tips-for-first-time.html#tip-2-avoid-tiny-tasks