@@ -25,58 +25,45 @@ or turn them on in source code:
25
25
Using Modin on Dask locally
26
26
---------------------------
27
27
28
- If you want to use a single node, just change the Modin Engine to Dask and
29
- continue working with the Modin Dataframe as if it were a Pandas Dataframe.
30
- You don't even have to initialize the Dask Client, because Modin will do it
31
- yourself or use the current one if it is already initialized:
28
+ If you want to run Modin on Dask locally on a single node, set the Modin engine to ``Dask`` and
29
+ continue working with a Modin DataFrame as if it were a pandas DataFrame.
30
+ You can either initialize a Dask client yourself, in which case Modin connects to the existing Dask cluster, or
31
+ allow Modin itself to initialize a Dask client.
32
32
33
33
.. code-block:: python
34
34
35
35
import modin.pandas as pd
36
36
import modin.config as modin_cfg
37
37
38
38
modin_cfg.Engine.put("dask")
39
- df = pd.read_parquet("s3://my-bucket/big.parquet")
39
+ df = pd.DataFrame(...)
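The first option described above, creating the Dask client yourself before using Modin, can be sketched as follows. This is a minimal illustration, not the documented recipe: the helper name ``run_with_existing_client``, the ``n_workers=2`` default, and the sample data are assumptions made for the example. Calling ``Client`` without an address starts a local cluster on the current machine.

```python
def run_with_existing_client(n_workers=2):
    """Start a local Dask client, then let Modin attach to it.

    Hypothetical helper for illustration; not part of Modin or Dask.
    """
    # Imports live inside the function so that defining it has no side effects.
    from distributed import Client
    import modin.config as modin_cfg
    import modin.pandas as pd

    # With no address given, Client() starts a local cluster on this machine.
    client = Client(n_workers=n_workers)
    try:
        modin_cfg.Engine.put("dask")
        # Modin is expected to pick up the client created above
        # instead of starting its own.
        df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
        return int(df["a"].sum())
    finally:
        # Closing the client also shuts down the local cluster it started.
        client.close()
```

When calling this helper from a script, place the call under ``if __name__ == "__main__":``, because Dask may start worker processes that re-import the module.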
40
40
41
- .. note :: In previous versions of Modin, you had to initialize Dask before importing Modin. As of Modin 0.9.0, This is no longer the case.
41
+ Using Modin on Dask in a Cluster
42
+ --------------------------------
42
43
43
- Using Modin on Dask Clusters
44
- ----------------------------
45
-
46
- If you want to use clusters of many machines, you don't need to do any additional steps.
47
- Just initialize a Dask Client on your cluster and use Modin as you would on a single node.
48
- As long as Dask Client is initialized before any dataframes are created, Modin
49
- will be able to connect to and use the Dask Cluster.
44
+ If you want to run Modin on Dask in a cluster, you should set up a Dask cluster and initialize a Dask client.
45
+ Once the Dask client is initialized, Modin will be able to connect to it and use the Dask cluster.
50
46
51
47
.. code-block:: python
52
48
53
49
from distributed import Client
54
50
import modin.pandas as pd
55
51
import modin.config as modin_cfg
56
52
57
- # Please define your cluster here
53
+ # Define your cluster here
58
54
cluster = ...
59
55
client = Client(cluster)
60
56
61
57
modin_cfg.Engine.put("dask")
62
- df = pd.read_parquet("s3://my-bucket/big.parquet")
63
-
64
- To get more ways to deploy and run Dask clusters, visit the `Deploying Dask Clusters page`_.
65
-
66
- How Modin uses Dask
67
- -------------------
58
+ df = pd.DataFrame(...)
68
59
69
- Modin has a layered architecture, and the core abstraction for data manipulation
70
- is the Modin Dataframe, which implements a novel algebra that enables Modin to
71
- handle all of pandas (see Modin's documentation_ for more on the architecture).
72
- Modin's internal dataframe object has a scheduling layer that is able to partition
73
- and operate on data with Dask.
60
+ To get more information on how to deploy and run a Dask cluster, visit the `Deploy Dask Clusters`_ page.
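As a sketch of the cluster workflow, the example below uses ``distributed.LocalCluster`` as a stand-in for a real multi-machine deployment. That substitution, the helper name ``run_on_cluster``, and the worker settings are assumptions for illustration only. With a real deployment you would pass your own cluster object or scheduler address to ``Client`` instead.

```python
def run_on_cluster(n_workers=2, threads_per_worker=1):
    """Connect a Dask client to an explicit cluster object, then use Modin.

    Hypothetical helper; LocalCluster only stands in for a real cluster.
    """
    from distributed import Client, LocalCluster
    import modin.config as modin_cfg
    import modin.pandas as pd

    # Both the cluster and the client are closed when the blocks exit.
    with LocalCluster(
        n_workers=n_workers, threads_per_worker=threads_per_worker
    ) as cluster:
        with Client(cluster):
            # The client must exist before any Modin DataFrame is created.
            modin_cfg.Engine.put("dask")
            df = pd.DataFrame({"x": list(range(10))})
            return df.shape
```

The ordering is the point of the sketch: the client is initialized first, and only then is the engine set and a DataFrame created.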
74
61
75
- Conversion to and from Modin from Dask Dataframe
76
- ------------------------------------------------
62
+ Conversion between Modin DataFrame and Dask DataFrame
63
+ -----------------------------------------------------
77
64
78
- Modin DataFrame can be converted to/from Dask Dataframe with no-copy partition conversion.
79
- This allows you to take advantage of both Dask and Modin libraries for maximum performance.
65
+ A Modin DataFrame can be converted to and from a Dask DataFrame with no-copy partition conversion.
66
+ This allows you to take advantage of both Modin and Dask libraries for maximum performance.
80
67
81
68
.. code-block:: python
82
69
@@ -85,13 +72,13 @@ This allows you to take advantage of both Dask and Modin libraries for maximum p
85
72
from modin.pandas.io import to_dask, from_dask
86
73
87
74
modin_cfg.Engine.put("dask")
88
- df = pd.read_parquet("s3://my-bucket/big.parquet")
75
+ df = pd.DataFrame(...)
89
76
90
- # Convert Modin to Dask Dataframe
77
+ # Convert Modin to Dask DataFrame
91
78
dask_df = to_dask(df)
92
79
93
- # Convert Dask to Modin Dataframe
80
+ # Convert Dask to Modin DataFrame
94
81
modin_df = from_dask(dask_df)
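A round trip that does some work on the Dask side might look like the sketch below. It relies on the ``to_dask`` and ``from_dask`` functions imported above and assumes Modin initializes the Dask engine on first use. The helper name ``round_trip`` and the sample data are made up for the example.

```python
def round_trip():
    """Convert Modin -> Dask, add a column with Dask, convert back to Modin.

    Hypothetical helper for illustration only.
    """
    import modin.config as modin_cfg
    import modin.pandas as pd
    from modin.pandas.io import from_dask, to_dask

    modin_cfg.Engine.put("dask")
    df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

    # Modin -> Dask: the result is a Dask DataFrame.
    dask_df = to_dask(df)

    # Use the Dask DataFrame API to derive a new column (evaluated lazily).
    dask_df = dask_df.assign(c=dask_df["a"] + dask_df["b"])

    # Dask -> Modin: continue with the pandas-like Modin API.
    modin_df = from_dask(dask_df)
    return list(modin_df.columns)
```

As with any code that starts Dask workers, call the helper under ``if __name__ == "__main__":`` when running it as a script.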
95
82
96
- .. _Deploying Dask Clusters page: https://docs.dask.org/en/stable/deploying.html
83
+ .. _Deploy Dask Clusters: https://docs.dask.org/en/stable/deploying.html
97
84
.. _documentation: https://modin.readthedocs.io/en/latest/development/architecture.html