
doc: small improvements
nfurmento committed Sep 30, 2024
1 parent 8d512da commit 00257b6
Showing 4 changed files with 75 additions and 244 deletions.
51 changes: 17 additions & 34 deletions doc/doxygen/chapters/starpu_basics/scheduling.doxy
@@ -18,54 +18,37 @@

\section TaskSchedulingPolicy Task Scheduling Policies

The basics of the scheduling policy are as follows:

<ul>
<li>
The scheduler can schedule tasks (<c>push</c> operation) as soon as they become ready to run, i.e. when they are no longer waiting for any tags, data dependencies or task dependencies.
</li>
<li>
Workers pull tasks from the scheduler one by one (<c>pop</c> operation).
</li>
</ul>

This means that scheduling policies usually contain at least one queue of tasks to store them between the time they become available and the time a worker can grab them.

By default, StarPU uses the work-stealing scheduler \b lws. This is because it provides correct load balancing and locality even if the application codelets do not have performance models. Other non-modeled scheduling policies can be selected from the list below, thanks to the \ref STARPU_SCHED environment variable, for example <c>export STARPU_SCHED=dmda</c>. Use <c>help</c> to get the list of available schedulers.
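
The scheduling policy can also be selected programmatically when initializing StarPU, as in the following minimal sketch (assuming the rest of the default configuration is sufficient):

\code{.c}
/* Minimal sketch: select the dmda scheduler at initialization time
 * instead of relying on the STARPU_SCHED environment variable. */
struct starpu_conf conf;
starpu_conf_init(&conf);
conf.sched_policy_name = "dmda";
int ret = starpu_init(&conf);
STARPU_CHECK_RETURN_VALUE(ret, "starpu_init");
\endcode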

The starpu_sched_get_predefined_policies() function returns a NULL-terminated array of all predefined scheduling policies available in StarPU. The starpu_sched_get_sched_policy_in_ctx() and starpu_sched_get_sched_policy() functions return the scheduling policy of a task within a specific context or the default context, respectively.
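
For instance, the predefined policies can be listed programmatically. The following is only a minimal sketch, assuming that the starpu_sched_policy structure exposes the policy_name and policy_description fields:

\code{.c}
/* Minimal sketch: print the name and description of every predefined
 * scheduling policy known to this StarPU build. */
struct starpu_sched_policy **policies = starpu_sched_get_predefined_policies();
for (int i = 0; policies[i] != NULL; i++)
	fprintf(stderr, "%s: %s\n", policies[i]->policy_name, policies[i]->policy_description);
\endcode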

\subsection NonPerformanceModelingPolicies Non Performance Modelling Policies

- The <b>eager</b> scheduler uses a central task queue, from which all workers draw tasks to work on concurrently. However, this does not allow data prefetching, since the scheduling decision is made late. If a task has a non-zero priority, it is placed at the front of the queue.

- The <b>random</b> scheduler uses one queue per worker, and distributes tasks randomly according to the assumed overall performance of each worker.

- The <b>ws</b> (work stealing) scheduler uses one queue per worker, and by default schedules a task on the worker that released it. When a worker becomes idle, it steals a task from the most loaded worker.

- The <b>lws</b> (locality work stealing) scheduler uses one queue per worker, and by default schedules a task on the worker that released it. When a worker becomes idle, it steals a task from neighboring workers. It also takes priorities into account.

- The <b>prio</b> scheduler also uses a central task queue, but sorts tasks by the priority specified by the application.

- The <b>heteroprio</b> scheduler uses different priorities for the different processing units. This scheduler must be configured in order to work correctly and to deliver high performance, as described in the corresponding section.

\subsection DMTaskSchedulingPolicy Performance Model-Based Task Scheduling Policies

100 changes: 22 additions & 78 deletions doc/doxygen/chapters/starpu_basics/tasks.doxy
@@ -18,95 +18,44 @@

\section TaskGranularity Task Granularity

Similar to other runtimes, StarPU introduces some overhead in managing tasks. This overhead, while not always negligible, is mitigated by its intelligent scheduling and data management capabilities. The typical order of magnitude for this overhead is a few microseconds, which is significantly less than the inherent CUDA overhead. To ensure that this overhead remains insignificant, the work assigned to a task should be substantial enough.

Ideally, the length of tasks should be relatively large to effectively offset this overhead. It is advisable to consider the offline performance feedback, which provides insight into task lengths. Monitoring task lengths becomes critical when you are experiencing suboptimal performance.

To gauge the scalability potential based on task size, you can run the <c>tests/microbenchs/tasks_size_overhead.sh</c> script. It provides a visual representation of the speedup achievable with independent tasks of very small size.

This benchmark is installed in <c>$STARPU_PATH/lib/starpu/examples/</c>. It gives an idea of how long a task should be (in µs) for the StarPU overhead to be low enough to maintain efficiency. The script generates a graph showing the speedup trends for tasks of different sizes, correlated with the number of CPUs used.

For example, in the figure below, for 128 µs tasks (the red line), the StarPU overhead is low enough to guarantee a good speedup if the number of CPUs is not more than 36. But with the same number of CPUs, 64 µs tasks (the black line) cannot achieve a proper speedup. The number of CPUs must be reduced to about 17 to maintain efficiency.

\image html tasks_size_overhead.png
\image latex tasks_size_overhead.png "" width=\textwidth

To determine the task sizes your application is using, you can use <c>starpu_fxt_data_trace</c>, as explained in \ref DataTrace.

The choice of scheduler in StarPU also plays an important role. Different schedulers have different effects on the overall execution. For example, the \c dmda scheduler may require additional time to make decisions, while the \c eager scheduler tends to be more immediate in its decisions.

To evaluate the impact of the scheduler choice on your target machine, you can once again use the \c tasks_size_overhead.sh script. This script provides valuable insight into how different schedulers affect performance in relation to task size.

\section TaskSubmission Task Submission

To allow StarPU to effectively perform online optimizations, it is recommended to submit tasks asynchronously whenever possible. The goal is to maximize the level of asynchronous submission, allowing StarPU to have more flexibility in optimizing the scheduling process. Ideally, all tasks should be submitted asynchronously, and the use of functions like starpu_task_wait_for_all() or starpu_data_unregister() should be limited to waiting for task completion.

StarPU will then be able to rework the whole schedule, overlap computation with communication, manage local accelerator memory usage, etc. A simple example can be found in <c>examples/basic_examples/variable.c</c>.
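
As an illustration, the sketch below submits a whole batch of tasks asynchronously and synchronizes only once at the end. The codelet <c>cl</c>, the handles <c>handles[i]</c> and the task count <c>ntasks</c> are assumed to have been set up beforehand:

\code{.c}
/* Minimal sketch: submit all tasks asynchronously, then synchronize once. */
for (unsigned i = 0; i < ntasks; i++)
{
	int ret = starpu_task_insert(&cl, STARPU_RW, handles[i], 0);
	STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_insert");
}

/* A single synchronization point, once everything has been submitted. */
starpu_task_wait_for_all();
\endcode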

\section TaskPriorities Task Priorities

StarPU's default behavior is to consider tasks in the order in which they are submitted by the application. However, in scenarios where the application programmer has knowledge about certain tasks that should be prioritized due to their impact on performance (such as tasks whose output is critical to subsequent tasks), the starpu_task::priority field can be used to convey this information to StarPU's scheduling process.

An example can be found in <c>examples/heat/dw_factolu_tag.c</c>.
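
For instance, a task on the critical path can be given the highest priority. The following minimal sketch assumes a codelet <c>cl</c> and a data handle <c>handle</c> defined elsewhere:

\code{.c}
/* Minimal sketch: raise the priority of a task on the critical path. */
struct starpu_task *task = starpu_task_create();
task->cl = &cl;
task->handles[0] = handle;
task->priority = STARPU_MAX_PRIO;
starpu_task_submit(task);

/* The same effect with the task insertion helper. */
starpu_task_insert(&cl, STARPU_RW, handle, STARPU_PRIORITY, STARPU_MAX_PRIO, 0);
\endcode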

\section SettingManyDataHandlesForATask Setting Many Data Handles For a Task

The maximum number of data that a task can manage is set by the macro \ref STARPU_NMAXBUFS. This macro has a default value that can be changed with the \c configure option \ref enable-maxbuffers "--enable-maxbuffers".

However, if you have specific cases where you need tasks to manage more data than the maximum allowed, you can use the starpu_task::dyn_handles field when defining a task, along with the starpu_codelet::dyn_modes field when defining the corresponding codelet.

This dynamic handle mechanism allows tasks to handle additional data beyond the usual limit imposed by \ref STARPU_NMAXBUFS.

\code{.c}
enum starpu_data_access_mode modes[STARPU_NMAXBUFS+1] =
@@ -146,17 +95,12 @@ starpu_task_insert(&dummy_big_cl,
0);
\endcode

The whole code for this complex data interface is available in <c>examples/basic_examples/dynamic_handles.c</c>.

\section SettingVariableDataHandlesForATask Setting a Variable Number Of Data Handles For a Task

Normally, the number of data handles given to a task is set with starpu_codelet::nbuffers. However, this field can be set to \ref STARPU_VARIABLE_NBUFFERS, in which case starpu_task::nbuffers must be set, and starpu_task::modes (or starpu_task::dyn_modes, see \ref SettingManyDataHandlesForATask) should be used to specify the modes for the handles. Examples in <c>examples/basic_examples/dynamic_handles.c</c> show how to implement this.
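
A minimal sketch is given below; it assumes a CPU implementation <c>func</c> and two registered handles <c>handle_a</c> and <c>handle_b</c>:

\code{.c}
/* Minimal sketch: a codelet accepting a variable number of buffers. */
struct starpu_codelet var_cl =
{
	.cpu_funcs = { func },
	.nbuffers = STARPU_VARIABLE_NBUFFERS,
};

/* Submit a task using this codelet with exactly two handles. */
struct starpu_task *task = starpu_task_create();
task->cl = &var_cl;
task->nbuffers = 2;
task->handles[0] = handle_a;
task->handles[1] = handle_b;
task->modes[0] = STARPU_R;
task->modes[1] = STARPU_RW;
starpu_task_submit(task);
\endcode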

\section InsertTaskUtility Insert Task Utility

41 changes: 11 additions & 30 deletions doc/doxygen/chapters/starpu_installation/building.doxy
@@ -16,59 +16,40 @@

/*! \page BuildingAndInstallingStarPU Building and Installing StarPU

Depending on the level of customization required for the library installation, we offer several solutions.

<ol>
<li><b>Basic Installation or Evaluation:</b> If you just want to try out the library, evaluate its performance on simple cases, run examples, or use the latest stable version, we recommend the following options:
<ul>
<li>
For Linux Debian or Ubuntu distributions, consider using the latest StarPU Debian package (see \ref InstallingABinaryPackage).
</li>
<li>
For macOS, you can use Brew and follow the steps in \ref InstallingASourcePackage.
</li>
<li>
Use an already installed module on a cluster, as explained in \ref UsingModule.
</li>
</ul>
</li>
<li><b>Customization for Specific Needs:</b> If you intend to use StarPU but need modifications, such as switching to a different version (git branch), changing the default MPI, using a preferred compiler, or modifying the source code, consider these options:
<ul>
<li>
Guix or Spack may be useful, as these package managers allow dynamic changes during source-based builds. See \ref InstallingASourcePackage for details.
</li>
<li>
Alternatively, you can build directly from source using the library's native build system (Makefile, GNU autotools). Instructions can be found in \ref InstallingFromSource.
</li>
</ul>
</li>
<li>
<b>Experiment Reproducibility:</b> If your focus is on the reproducibility of experiments, we recommend using Guix. Refer to \ref InstallingASourcePackage for guidance.
</li>
</ol>

Whichever solution you choose, you can use the tool <c>bin/starpu_config</c> to view all the configuration parameters that were used during the StarPU installation.

Please refer to the documentation provided for the specific installation steps and details of each solution.

\section InstallingABinaryPackage Installing a Binary Package


