@@ -15,7 +15,7 @@ Modin handles all of this seamlessly and transparently.
It is possible to use a Jupyter notebook, but you will have to deploy a Jupyter server
on the remote cluster head node and connect to it.

- .. image:: ../../../img/modin_cluster.png
+ .. image:: ../../img/modin_cluster.png
:alt: Modin cluster
:align: center
@@ -29,7 +29,8 @@ First of all, install the necessary dependencies in your environment:
pip install boto3

The next step is to set up your AWS credentials. One can set ``AWS_ACCESS_KEY_ID``,
- ``AWS_SECRET_ACCESS_KEY`` and ``AWS_SESSION_TOKEN``(Optional) `AWS CLI environment variables`_ or
+ ``AWS_SECRET_ACCESS_KEY`` and ``AWS_SESSION_TOKEN`` (Optional)
+ (refer to `AWS CLI environment variables`_ for more details) or
just run the following command:

.. code-block:: bash
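For illustration, the environment-variable route mentioned above might look like the
following shell sketch (the values are placeholders, not real credentials):

.. code-block:: bash

   # Placeholder values -- substitute your own AWS keys.
   export AWS_ACCESS_KEY_ID="<your-access-key-id>"
   export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
   # Only needed when you use temporary credentials:
   export AWS_SESSION_TOKEN="<your-session-token>"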
@@ -77,7 +78,7 @@ Executing in a cluster environment
  - https://github.com/modin-project/modin/issues/6641.

Modin lets you instantly speed up your workflows with large data by scaling pandas
- on a cluster. In this tutorial, we will use a 12.5 GB `big_yellow.csv` file that was
+ on a cluster. In this tutorial, we will use a 12.5 GB ``big_yellow.csv`` file that was
created by concatenating a 200MB `NYC Taxi dataset`_ file 64 times. Preparing this
file is handled as part of our `Modin's Ray cluster setup config`_.
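If you want to reproduce such a file locally instead of relying on the cluster setup
config, the concatenation could look roughly like the sketch below; ``yellow_tripdata.csv``
is a hypothetical local copy of the ~200MB NYC Taxi file, not a name from the tutorial:

.. code-block:: bash

   # Rough sketch only -- the cluster setup config prepares big_yellow.csv for you.
   head -n 1 yellow_tripdata.csv > big_yellow.csv            # keep the CSV header once
   for i in $(seq 64); do
       tail -n +2 yellow_tripdata.csv >> big_yellow.csv      # append the data rows 64 times
   done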
@@ -89,7 +90,7 @@ To run any script in a remote cluster, you need to submit it to the Ray. In this
the script file is sent to the remote cluster head node and executed there.

In this tutorial, we provide the `exercise_5.py`_ script, which reads the data from the
- CSV file and executes such pandas operations as count, groupby and applymap.
+ CSV file and executes such pandas operations as count, groupby and map.
As a result of the script, you will see the size of the file being read and the execution
time of each function.
@@ -104,8 +105,8 @@ You can submit this script to the existing remote cluster by running the followi
ray submit modin-cluster.yaml exercise_5.py

- To download or upload files to the cluster head node, use `ray rsync_down` or `ray rsync_up`.
- It may help you if you want to use some other Python modules that should be available to
+ To download or upload files to the cluster head node, use ``ray rsync_down`` or ``ray rsync_up``.
+ It may help if you want to use some other Python modules that should be available to
execute your own script or download a result file after executing the script.

.. code-block:: bash
@@ -115,13 +116,14 @@ execute your own script or download a result file after executing the script.
# upload a file from the local machine to the cluster:
ray rsync_up modin-cluster.yaml '/local/path' '/path/on/cluster'

- Modin performance scales as the number of nodes and cores increases. The following
- chart shows the performance of the ``read_csv`` operation with different number of nodes,
- with improvements in performance as we increase the number of resources Modin can use.
+ Shutting down the cluster
+ --------------------------

- .. image:: ../../../../examples/tutorial/jupyter/img/modin_cluster_perf.png
- :alt: Cluster Performance
- :align: center
+ Now that we have finished the computation, we need to shut down the cluster with the ``ray down`` command.
+
+ .. code-block:: bash
+
+    ray down modin-cluster.yaml

.. _`Ray's autoscaler options`: https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html#cluster-config
.. _`Ray's cluster docs`: https://docs.ray.io/en/latest/cluster/getting-started.html
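Taken together, the commands used in this tutorial form a short cluster lifecycle. The
sketch below condenses them; the ``ray up`` step is assumed from the cluster-setup part of
the tutorial, and the rsync paths are placeholders:

.. code-block:: bash

   # Condensed lifecycle of the tutorial's Ray cluster (paths are placeholders).
   ray up modin-cluster.yaml                        # start the cluster from the config
   ray submit modin-cluster.yaml exercise_5.py      # run the script on the head node
   ray rsync_down modin-cluster.yaml '/path/on/cluster' '/local/path'   # fetch results
   ray down modin-cluster.yaml                      # shut the cluster down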