@@ -20,4 +20,78 @@ or turn them on in source code:
import modin.config as cfg
cfg.Engine.put('dask')
- cfg.StorageFormat.put('pandas')
+ cfg.StorageFormat.put('pandas')
+
+ Using Modin on Dask locally
+ ---------------------------
+
+ If you want to use a single node, just change the Modin engine to Dask and
+ continue working with the Modin DataFrame as if it were a pandas DataFrame.
+ You don't even have to initialize the Dask client, because Modin will do it
+ for you, or use the current one if it is already initialized:
+
+ .. code-block:: python
+
+     import modin.pandas as pd
+     import modin.config as modin_cfg
+
+     modin_cfg.Engine.put("dask")
+     df = pd.read_parquet("s3://my-bucket/big.parquet")
+
+ .. note:: In previous versions of Modin, you had to initialize Dask before importing Modin. As of Modin 0.9.0, this is no longer the case.
+
+ Using Modin on Dask Clusters
+ ----------------------------
+
+ If you want to use a cluster of many machines, no additional steps are needed.
+ Just initialize a Dask client on your cluster and use Modin as you would on a single node.
+ As long as the Dask client is initialized before any dataframes are created, Modin
+ will be able to connect to and use the Dask cluster.
+
+ .. code-block:: python
+
+     from distributed import Client
+     import modin.pandas as pd
+     import modin.config as modin_cfg
+
+     # Please define your cluster here
+     cluster = ...
+     client = Client(cluster)
+
+     modin_cfg.Engine.put("dask")
+     df = pd.read_parquet("s3://my-bucket/big.parquet")
+
+ For more ways to deploy and run Dask clusters, visit the `Deploying Dask Clusters page`_.
+
+ How Modin uses Dask
+ -------------------
+
+ Modin has a layered architecture, and the core abstraction for data manipulation
+ is the Modin DataFrame, which implements a novel algebra that enables Modin to
+ handle all of pandas (see Modin's documentation_ for more on the architecture).
+ Modin's internal dataframe object has a scheduling layer that is able to partition
+ and operate on data with Dask.
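+
+ Under the Dask engine, Modin executes operations on its pandas partitions
+ through the Dask client. The number of partitions along each axis is
+ configurable through ``modin.config``; a minimal sketch (the value ``16``
+ is an arbitrary example, not a recommendation):
+
+ .. code-block:: python
+
+     import modin.config as modin_cfg
+
+     # Split each axis of new Modin DataFrames into 16 partitions
+     modin_cfg.NPartitions.put(16)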
+
+ Conversion to and from Dask DataFrame
+ -------------------------------------
+
+ A Modin DataFrame can be converted to and from a Dask DataFrame with zero-copy partition conversion.
+ This allows you to take advantage of both the Dask and Modin libraries for maximum performance.
+
+ .. code-block:: python
+
+     import modin.pandas as pd
+     import modin.config as modin_cfg
+     from modin.pandas.io import to_dask, from_dask
+
+     modin_cfg.Engine.put("dask")
+     df = pd.read_parquet("s3://my-bucket/big.parquet")
+
+     # Convert a Modin DataFrame to a Dask DataFrame
+     dask_df = to_dask(df)
+
+     # Convert a Dask DataFrame back to a Modin DataFrame
+     modin_df = from_dask(dask_df)
+
+ .. _Deploying Dask Clusters page: https://docs.dask.org/en/stable/deploying.html
+ .. _documentation: https://modin.readthedocs.io/en/latest/development/architecture.html