Commit 2658c38

Address comments
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
1 parent 3fc6a7e commit 2658c38

1 file changed: +20 -33 lines changed

docs/development/using_pandas_on_dask.rst

@@ -25,58 +25,45 @@ or turn them on in source code:
 Using Modin on Dask locally
 ---------------------------
 
-If you want to use a single node, just change the Modin Engine to Dask and
-continue working with the Modin Dataframe as if it were a Pandas Dataframe.
-You don't even have to initialize the Dask Client, because Modin will do it
-yourself or use the current one if it is already initialized:
+If you want to run Modin on Dask locally using a single node, just set Modin engine to ``Dask`` and
+continue working with a Modin DataFrame as if it was a pandas DataFrame.
+You can either initialize a Dask client on your own and Modin connects to the existing Dask cluster or
+allow Modin itself to initialize a Dask client.
 
 .. code-block:: python
 
   import modin.pandas as pd
   import modin.config as modin_cfg
 
   modin_cfg.Engine.put("dask")
-  df = pd.read_parquet("s3://my-bucket/big.parquet")
+  df = pd.DataFrame(...)
 
-.. note:: In previous versions of Modin, you had to initialize Dask before importing Modin. As of Modin 0.9.0, This is no longer the case.
+Using Modin on Dask in a Cluster
+--------------------------------
 
-Using Modin on Dask Clusters
------------------------------
-
-If you want to use clusters of many machines, you don't need to do any additional steps.
-Just initialize a Dask Client on your cluster and use Modin as you would on a single node.
-As long as Dask Client is initialized before any dataframes are created, Modin
-will be able to connect to and use the Dask Cluster.
+If you want to run Modin on Dask in a cluster, you should set up a Dask cluster and initialize a Dask client.
+Once the Dask client is initialized, Modin will be able to connect to it and use the Dask cluster.
 
 .. code-block:: python
 
   from distributed import Client
   import modin.pandas as pd
   import modin.config as modin_cfg
 
-  # Please define your cluster here
+  # Define your cluster here
   cluster = ...
   client = Client(cluster)
 
   modin_cfg.Engine.put("dask")
-  df = pd.read_parquet("s3://my-bucket/big.parquet")
-
-To get more ways to deploy and run Dask clusters, visit the `Deploying Dask Clusters page`_.
-
-How Modin uses Dask
--------------------
+  df = pd.DataFrame(...)
 
-Modin has a layered architecture, and the core abstraction for data manipulation
-is the Modin Dataframe, which implements a novel algebra that enables Modin to
-handle all of pandas (see Modin's documentation_ for more on the architecture).
-Modin's internal dataframe object has a scheduling layer that is able to partition
-and operate on data with Dask.
+To get more information on how to deploy and run a Dask cluster, visit the `Deploy Dask Clusters`_ page.
 
-Conversion to and from Modin from Dask Dataframe
-------------------------------------------------
+Conversion between Modin DataFrame and Dask DataFrame
+-----------------------------------------------------
 
-Modin DataFrame can be converted to/from Dask Dataframe with no-copy partition conversion.
-This allows you to take advantage of both Dask and Modin libraries for maximum performance.
+Modin DataFrame can be converted to/from Dask DataFrame with no-copy partition conversion.
+This allows you to take advantage of both Modin and Dask libraries for maximum performance.
 
 .. code-block:: python
 
@@ -85,13 +72,13 @@ This allows you to take advantage of both Dask and Modin libraries for maximum p
   from modin.pandas.io import to_dask, from_dask
 
   modin_cfg.Engine.put("dask")
-  df = pd.read_parquet("s3://my-bucket/big.parquet")
+  df = pd.DataFrame(...)
 
-  # Convert Modin to Dask Dataframe
+  # Convert Modin to Dask DataFrame
   dask_df = to_dask(df)
 
-  # Convert Dask to Modin Dataframe
+  # Convert Dask to Modin DataFrame
   modin_df = from_dask(dask_df)
 
-.. _Deploying Dask Clusters page: https://docs.dask.org/en/stable/deploying.html
+.. _Deploy Dask Clusters: https://docs.dask.org/en/stable/deploying.html
 .. _documentation: https://modin.readthedocs.io/en/latest/development/architecture.html
