From 3c07f39f0a851c673cee08c6611f47d07ce404d0 Mon Sep 17 00:00:00 2001
From: LocNgu <loc.nguyen_dang_duc@tu-dresden.de>
Date: Fri, 20 Aug 2021 13:41:03 +0200
Subject: [PATCH] review checkpoint_restart.md

check spelling, formatting, links, lint
---
 .../jobs_and_resources/checkpoint_restart.md  | 75 ++++++++++---------
 1 file changed, 39 insertions(+), 36 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/checkpoint_restart.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/checkpoint_restart.md
index 7d47b7b94..19536fffd 100644
--- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/checkpoint_restart.md
+++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/checkpoint_restart.md
@@ -11,17 +11,16 @@ Espresso, STAR-CCM+, VASP
 In case your program does not natively support checkpointing, there are
 attempts at creating generic checkpoint/restart solutions that should
 work application-agnostic. One such project which we recommend is DMTCP:
-Distributed MultiThreaded CheckPointing
-(<http://dmtcp.sourceforge.net>).
+[Distributed MultiThreaded CheckPointing](<http://dmtcp.sourceforge.net>).
 
 It is available on ZIH systems after having loaded the "dmtcp" module:
 
 ```console
-module load DMTCP
+marie@login$ module load DMTCP
 ```
 
 While our batch system of choice, Slurm, also provides a checkpointing
-interface to the user, unfortunately it does not yet support DMTCP at
+interface to the user, unfortunately, it does not yet support DMTCP at
 this time. However, there are ongoing efforts of writing a Slurm plugin
 that hopefully will change this in the near future. We will update this
 documentation as soon as it becomes available.
@@ -51,21 +50,22 @@ parameters `--ib --rm` and put it between srun and your application
 call, e.g.:
 
 ```console
-srun dmtcp_launch --ib --rm ./my-mpi-application
+marie@login$ srun dmtcp_launch --ib --rm ./my-mpi-application
 ```
 
-`Note:` we have successfully tested checkpointing MPI applications with
-the latest `Intel MPI` (module: intelmpi/2018.0.128). While it might
-work with other MPI libraries, too, we have no experience in this
-regard, so you should always try it out before using it for your
-productive jobs.
+!!! note
+    We have successfully tested checkpointing MPI applications with
+    the latest `Intel MPI` (module: intelmpi/2018.0.128). While it might
+    work with other MPI libraries, too, we have no experience in this
+    regard, so you should always try it out before using it for your
+    production jobs.
 
 Then just substitute your usual `sbatch` call with `dmtcp_sbatch` and be
 sure to specify the `-t` and `-i` parameters (don't forget you need to
 have loaded the dmtcp module).
 
 ```console
-dmtcp_sbatch -t 2-00:00:00 -i 28000,800 my_batchfile.sh
+marie@login$ dmtcp_sbatch --time 2-00:00:00 --interval 28000,800 my_batchfile.sh
 ```
 
 With `-t|--time` you set the total runtime of your calculation overall
@@ -80,9 +80,9 @@ out the checkpoint files, separated from the interval time via comma
 In the above example, there will be 6 jobs each running 8 hours, so
 about 2 days in total.
 
-Hints:
+!!! hint "Hints"
 
--   If you see your first job running into the timelimit, that probably
+    -   If you see your first job running into the timelimit, that probably
     means the timeout for writing out checkpoint files does not suffice
     and should be increased. Our tests have shown that it takes
     approximately 5 minutes to write out the memory content of a fully
@@ -91,14 +91,13 @@ Hints:
     depending on how much memory your application uses. If your memory
     content is rather incompressible, it might be a good idea to disable
     the checkpoint file compression by setting: `export DMTCP_GZIP=0`
--   Note that all jobs the script deems necessary for your chosen
+    -   Note that all jobs the script deems necessary for your chosen
     timelimit/interval values are submitted right when first calling the
     script. If your applications take considerably less time than what
     you specified, some of the individual jobs will be unnecessary. As
     soon as one job does not find a checkpoint to resume from, it will
     cancel all subsequent jobs for you.
--   See `dmtcp_sbatch -h` for a list of available parameters and more
-    help
+    -   See `dmtcp_sbatch -h` for a list of available parameters and more help.
 
 What happens in your work directory?
 
@@ -140,18 +139,20 @@ can be useful if you wish to implement some sort of job chaining on your
 own. 1 In front of your program call, you have to add the wrapper
 script: `dmtcp_launch` **TODO check**
 
-```bash
-#/bin/bash 
-#SBATCH --time=00:01:00
-#SBATCH --cpus-per-task=8 
-#SBATCH --mem-per-cpu=1500
+???+ example
 
-source $DMTCP_ROOT/bin/bash start_coordinator -i 40 --exit-after-ckpt
+    ```bash
+    #!/bin/bash
+    #SBATCH --time=00:01:00
+    #SBATCH --cpus-per-task=8
+    #SBATCH --mem-per-cpu=1500
 
-dmtcp_launch ./my-application #for sequential/multithreaded applications
-#or: srun dmtcp_launch --ib --rm ./my-mpi-application #for MPI
-applications
-```
+    source $DMTCP_ROOT/bin/bash start_coordinator -i 40 --exit-after-ckpt
+
+    dmtcp_launch ./my-application #for sequential/multithreaded applications
+    #or, for MPI applications:
+    #srun dmtcp_launch --ib --rm ./my-mpi-application
+    ```
 
 This will create a checkpoint automatically after 40 seconds and then
 terminate your application and with it the job. If the job runs into its
@@ -166,16 +167,18 @@ original job. If you do not wish to create another checkpoint in your
 restarted run again, you can omit the `-i` and `--exit-after-ckpt`
 parameters this time. Afterwards, the application must be run using the
 restart script, specifying the host and port of the coordinator (they
-have been exported by the start_coordinator function). Example:
+have been exported by the start_coordinator function).
 
-```bash
-#/bin/bash 
-#SBATCH --time=00:01:00 
-#SBATCH --cpus-per-task=8
-#SBATCH --mem-per-cpu=1500
+???+ example
 
-source $DMTCP_ROOT/bin/bash start_coordinator -i 40 --exit-after-ckpt
+    ```bash
+    #!/bin/bash
+    #SBATCH --time=00:01:00
+    #SBATCH --cpus-per-task=8
+    #SBATCH --mem-per-cpu=1500
 
-./dmtcp_restart_script.sh -h $DMTCP_COORD_HOST -p
-$DMTCP_COORD_PORT
-```
+    source $DMTCP_ROOT/bin/bash start_coordinator -i 40 --exit-after-ckpt
+
+    ./dmtcp_restart_script.sh -h $DMTCP_COORD_HOST \
+        -p $DMTCP_COORD_PORT
+    ```
-- 
GitLab