Commit e282fbf

Fix docs build
1 parent a590066 commit e282fbf

3 files changed, +12 -10 lines changed

docs/getting_started/using_modin/using_modin_cluster/using_modin_ray_cluster.rst

+4-4
@@ -12,7 +12,7 @@ local development and cluster execution. Users are not required to think about
 how many workers exist or how to distribute and partition their data;
 Modin handles all of this seamlessly and transparently.

-.. image:: ../../../examples/tutorial/jupyter/img/modin_cluster.png
+.. image:: ../../examples/tutorial/jupyter/img/modin_cluster.png
 :alt: Modin cluster
 :align: center
 :scale: 90%
@@ -37,7 +37,7 @@ just run the following command:
 Starting and connecting to the cluster
 --------------------------------------

-This example starts 1 head node (m5.24xlarge) and 7 worker nodes (m5.24xlarge), 768 total CPUs.
+This example starts 1 head node (m5.24xlarge) and 5 worker nodes (m5.24xlarge), 576 total CPUs.
 You can check the `Amazon EC2 pricing`_ .

 You can manually create AWS EC2 instances and configure them or just use the `Ray autoscaler` to
@@ -76,7 +76,7 @@ Executing on a cluster environment
 Modin lets you instantly speed up your workflows with a large data by scaling pandas
 on a cluster. In this tutorial, we will use a 12.5 GB `big_yellow.csv` file that was
 created by concatenating a 200MB `NYC Taxi dataset`_ file 64 times. Preparing this
-file was provided as part of our `Modin's cluster setup config`_.
+file was provided as part of our `Modin's Ray cluster setup config`_.

 If you want use another dataset in your own script, you should provide it to each of
 the cluster nodes in the same path. We recomnend doing this by customizing the
@@ -119,7 +119,7 @@ with improvements in performance as we increase the number of resources Modin ca
 .. _`Ray's autoscaler options`: https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html#cluster-config
 .. _`Ray's cluster docs`: https://docs.ray.io/en/latest/cluster/getting-started.html
 .. _`NYC Taxi dataset`: https://modin-datasets.intel.com/testing/yellow_tripdata_2015-01.csv
-.. _`Modin's cluster setup config`: https://github.com/modin-project/modin/blob/master/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/modin-cluster.yaml
+.. _`Modin's Ray cluster setup config`: https://github.com/modin-project/modin/blob/master/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/modin-cluster.yaml
 .. _`Amazon EC2 pricing`: https://aws.amazon.com/ec2/pricing/on-demand/
 .. _`exercise_5.py`: https://github.com/modin-project/modin/blob/master/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/exercise_5.py
 .. _`Ray client`: https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html
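The rst context above says `big_yellow.csv` is a 200MB `NYC Taxi dataset` file concatenated 64 times, and that the cluster setup config prepares it. The sketch below shows one way such a file could be built in Python; it is an illustration, not the command the config actually runs. `split_header` and `build_big_csv` are hypothetical helpers, and keeping a single header line (instead of repeating it in every copy) is an assumption. As a size check, 200 MiB x 64 = 12,800 MiB = 12.5 GiB, which matches the quoted 12.5 GB if both figures are read as binary units.

```python
from pathlib import Path


def split_header(data: bytes) -> tuple[bytes, bytes]:
    """Split CSV bytes into (header line, data rows).

    The rows are forced to end with a newline so that repeated copies
    never merge the last row of one copy with the first row of the next.
    """
    header, sep, body = data.partition(b"\n")
    if body and not body.endswith(b"\n"):
        body += b"\n"
    return header + sep, body


def build_big_csv(src: Path, dst: Path, times: int = 64) -> int:
    """Hypothetical helper: write src's header once, then its rows `times` times.

    Returns the size of the written file in bytes.
    """
    # Holds one copy of the source (~200 MB for the taxi file) in memory,
    # but streams the repeated rows to disk instead of building them in memory.
    header, body = split_header(src.read_bytes())
    with dst.open("wb") as out:
        out.write(header)
        for _ in range(times):
            out.write(body)
    return dst.stat().st_size
```

For example, `build_big_csv(Path("yellow_tripdata_2015-01.csv"), Path("big_yellow.csv"))` would write the 64-copy file locally; as the rst context notes, any dataset used on the cluster still has to be present on every node at the same path.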

examples/tutorial/jupyter/execution/pandas_on_ray/cluster/exercise_5.md

+7-5
@@ -10,7 +10,7 @@

 **NOTE**: This exercise has extra requirements. Read instructions carefully before attempting.

-**This exercise instructs users on how to start a 700+ core Ray cluster,
+**This exercise instructs users on how to start a 500+ core Ray cluster,
 and it is not shut down until the end of exercise. Read instructions carefully.**

 Often in practice we have a need to exceed the capabilities of a single machine.
@@ -40,7 +40,7 @@ aws configure

 ## Starting and connecting to the cluster

-This example starts 1 head node (m5.24xlarge) and 7 worker nodes (m5.24xlarge), 768 total CPUs.
+This example starts 1 head node (m5.24xlarge) and 5 worker nodes (m5.24xlarge), 576 total CPUs.

 Cost of this cluster can be found here: https://aws.amazon.com/ec2/pricing/on-demand/.

@@ -102,12 +102,14 @@ some other Python modules that should be available to execute your own script or

 ```bash
 # download a file from the cluster to the local computer:
-ray rsync_down cluster.yaml '/path/on/cluster' '/local/path'
+ray rsync_down modin-cluster.yaml '/path/on/cluster' '/local/path'
 # upload a file from the local computer to the cluster:
-ray rsync_up cluster.yaml '/local/path' '/path/on/cluster'
+ray rsync_up modin-cluster.yaml '/local/path' '/path/on/cluster'
 ```

-By running the script on clusters of different sizes, we can see how the CSV file reading time decreases as the number of nodes increases.
+Modin performance scales as the number of nodes and cores increases. The following chart shows
+the performance of the read_csv operation with different number of nodes, with improvements in
+performance as we increase the number of resources Modin can use.

 ![ClusterPerf](../../../img/modin_cluster_perf.png)

examples/tutorial/jupyter/execution/pandas_on_ray/cluster/exercise_5.py

+1-1
@@ -5,7 +5,7 @@

 ray.init(address="auto")
 cpu_count = ray.cluster_resources()["CPU"]
-assert cpu_count == 768, f"Expected 768 CPUs, but found {cpu_count}"
+assert cpu_count == 576, f"Expected 576 CPUs, but found {cpu_count}"

 file_size = os.path.getsize("big_yellow.csv")

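This commit updates the expected CPU count in three places: the rst guide, `exercise_5.md`, and the `assert` in `exercise_5.py`. One way to keep them from drifting apart is to derive the number from the node counts rather than hard-code it. The sketch below assumes 96 vCPUs per m5.24xlarge instance (AWS's published figure, consistent with the 576 = 6 x 96 and 768 = 8 x 96 totals in the diff) and assumes Ray reports one CPU per vCPU. `VCPUS_PER_INSTANCE`, `expected_cpus`, and `check_cpu_count` are hypothetical names, not part of Modin or Ray.

```python
# Hypothetical lookup table; 96 vCPUs per m5.24xlarge is an assumption taken
# from AWS's instance specs, not from Modin's cluster config.
VCPUS_PER_INSTANCE = {"m5.24xlarge": 96}


def expected_cpus(instance_type: str, head_nodes: int, worker_nodes: int) -> int:
    """Total CPUs Ray should report for a cluster of identical instances."""
    return VCPUS_PER_INSTANCE[instance_type] * (head_nodes + worker_nodes)


def check_cpu_count(found: float, expected: int) -> None:
    """Raise if the cluster does not report the expected number of CPUs.

    An explicit raise, unlike a bare `assert`, still runs under `python -O`.
    """
    if found != expected:
        raise RuntimeError(f"Expected {expected} CPUs, but found {found}")


# 1 head node + 5 worker nodes, as in the updated docs.
EXPECTED = expected_cpus("m5.24xlarge", head_nodes=1, worker_nodes=5)
```

In `exercise_5.py` this could replace the hard-coded assertion with `check_cpu_count(ray.cluster_resources()["CPU"], EXPECTED)` after `ray.init(address="auto")`. The worker count would still have to be kept in sync with the cluster YAML by hand unless it is read from that file.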