Fix aes after code review

Ubuntu · Ubuntu · commit d0377ba8df93 · 2024-02-19T10:40:06.000Z
diff --git a/docs/getting_started/using_modin/using_modin_cluster.rst b/docs/getting_started/using_modin/using_modin_cluster.rst
@@ -15,7 +15,7 @@ Modin handles all of this seamlessly and transparently.
    It is possible to use a Jupyter notebook, but you will have to deploy a Jupyter server 
    on the remote cluster head node and connect to it.
 
-.. image:: ../../../img/modin_cluster.png
+.. image:: ../../img/modin_cluster.png
    :alt: Modin cluster
    :align: center
 
@@ -29,7 +29,8 @@ First of all, install the necessary dependencies in your environment:
    pip install boto3
 
 The next step is to set up your AWS credentials. One can set ``AWS_ACCESS_KEY_ID``,
-``AWS_SECRET_ACCESS_KEY`` and ``AWS_SESSION_TOKEN``(Optional) `AWS CLI environment variables`_ or  
+``AWS_SECRET_ACCESS_KEY`` and ``AWS_SESSION_TOKEN`` (optional)
+(see `AWS CLI environment variables`_ for details) or
 just run the following command:
 
 .. code-block:: bash
@@ -77,7 +78,7 @@ Executing in a cluster environment
    - https://github.com/modin-project/modin/issues/6641.
 
 Modin lets you instantly speed up your workflows with large data by scaling pandas
-on a cluster. In this tutorial, we will use a 12.5 GB `big_yellow.csv` file that was
+on a cluster. In this tutorial, we will use a 12.5 GB ``big_yellow.csv`` file that was
 created by concatenating a 200MB `NYC Taxi dataset`_ file 64 times. Preparing this
 file was provided as part of our `Modin's Ray cluster setup config`_.
 
@@ -89,7 +90,7 @@ To run any script in a remote cluster, you need to submit it to the Ray. In this
 the script file is sent to the remote cluster head node and executed there.
 
 In this tutorial, we provide the `exercise_5.py`_ script, which reads the data from the
-CSV file and executes such pandas operations as count, groupby and applymap.
+CSV file and executes such pandas operations as count, groupby and map.
 As a result of the script, you will see the size of the file being read and the execution
 time of each function.
 
@@ -104,8 +105,8 @@ You can submit this script to the existing remote cluster by running the followi
 
    ray submit modin-cluster.yaml exercise_5.py
 
-To download or upload files to the cluster head node, use `ray rsync_down` or `ray rsync_up`.
-It may help you if you want to use some other Python modules that should be available to
+To download or upload files to the cluster head node, use ``ray rsync_down`` or ``ray rsync_up``.
+It may help if you want to use some other Python modules that should be available to
 execute your own script or download a result file after executing the script.
 
 .. code-block:: bash
@@ -115,13 +116,14 @@ execute your own script or download a result file after executing the script.
    # upload a file from the local machine to the cluster:
    ray rsync_up modin-cluster.yaml '/local/path' '/path/on/cluster'
 
-Modin performance scales as the number of nodes and cores increases. The following
-chart shows the performance of the ``read_csv`` operation with different number of nodes,
-with improvements in performance as we increase the number of resources Modin can use.
+Shutting down the cluster
+--------------------------
 
-.. image:: ../../../../examples/tutorial/jupyter/img/modin_cluster_perf.png
-   :alt: Cluster Performance
-   :align: center
+Now that we have finished the computation, we need to shut down the cluster with the ``ray down`` command.
+
+.. code-block:: bash
+
+   ray down modin-cluster.yaml
 
 .. _`Ray's autoscaler options`: https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html#cluster-config
 .. _`Ray's cluster docs`: https://docs.ray.io/en/latest/cluster/getting-started.html
diff --git a/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/README.md b/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/README.md
@@ -4,7 +4,7 @@
 <h1>Scale your pandas workflows on a Ray cluster</h1>
 </center>
 
-**NOTE**: Before completing the exercise, please read the full instructions in the 
+**NOTE**: Before starting the exercise, please read the full instructions in the 
 [Modin documentation](https://modin--6872.org.readthedocs.build/en/6872/getting_started/using_modin/using_modin_cluster.html).
 
 The basic steps to run the script on a remote Ray cluster are:
diff --git a/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/exercise_5.py b/examples/tutorial/jupyter/execution/pandas_on_ray/cluster/exercise_5.py
@@ -1,51 +1,20 @@
-import os
 import time
 import ray
+
 import modin.pandas as pd
-from modin.utils import execute
 
 ray.init(address="auto")
 cpu_count = ray.cluster_resources()["CPU"]
 assert cpu_count == 576, f"Expected 576 CPUs, but found {cpu_count}"
 
 file_path = "big_yellow.csv"
-file_size = os.path.getsize(file_path)
-
-
-# get human readable file size
-def sizeof_fmt(num, suffix="B"):
-    for unit in ("", "K", "M", "G", "T"):
-        if abs(num) < 1024.0:
-            return f"{num:3.1f}{unit}{suffix}"
-        num /= 1024.0
-    return f"{num:.1f}P{suffix}"
-
-
-print(f"File size is {sizeof_fmt(file_size)}")  # noqa: T201
 
 t0 = time.perf_counter()
-df = pd.read_csv(file_path, quoting=3)
-t1 = time.perf_counter()
-print(f"read_csv time is {(t1 - t0):.3f}")  # noqa: T201
-
-"""
-IMPORTANT:
-Some Dataframe functions are executed asynchronously, so to correctly measure execution time 
-we need to wait for the execution result. We use the special `execute` function for this, 
-but you should not use this function as it will slow down your script.
-"""
 
-t0 = time.perf_counter()
-execute(df.count())
-t1 = time.perf_counter()
-print(f"count time is {(t1 - t0):.3f}")  # noqa: T201
-
-t0 = time.perf_counter()
-execute(df.groupby("passenger_count").count())
-t1 = time.perf_counter()
-print(f"groupby time is {(t1 - t0):.3f}")  # noqa: T201
+df = pd.read_csv(file_path, quoting=3)
+df_count = df.count()
+df_groupby_count = df.groupby("passenger_count").count()
+df_map = df.map(str)
 
-t0 = time.perf_counter()
-execute(df.applymap(str))
 t1 = time.perf_counter()
-print(f"applymap time is {(t1 - t0):.3f}")  # noqa: T201
+print(f"Full script time is {(t1 - t0):.3f}")  # noqa: T201
diff --git a/examples/tutorial/jupyter/img/modin_cluster_perf.png b/examples/tutorial/jupyter/img/modin_cluster_perf.png
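
Note on timing: the revised `exercise_5.py` reports a single "Full script time" instead of one figure per operation. If per-step numbers are still wanted, a small helper built on `time.perf_counter` could collect them. The sketch below is not part of this commit; the names `timed` and `format_timings` are hypothetical, and the `sum(...)` call is only a stand-in for a DataFrame operation so the example runs without a cluster. As the removed docstring pointed out, some Modin operations execute asynchronously, so a real measurement would also need to wait for the result inside the timed block.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed time even if the block raises.
        results[label] = time.perf_counter() - start


def format_timings(results):
    """Render one '<label> time is <seconds>' line per recorded step."""
    return [f"{label} time is {seconds:.3f}" for label, seconds in results.items()]


timings = {}

# Stand-in workload (assumption): replace with e.g. a DataFrame operation.
with timed("sum", timings):
    total = sum(range(1_000_000))

for line in format_timings(timings):
    print(line)  # noqa: T201
```

Collecting the durations in a dictionary keeps the measurement separate from the reporting, so the same helper can wrap any number of steps.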