Update experiments.md
Closed
requested to merge okhrin--tu-dresden.de/hpc-compendium:okhrin--tu-dresden.de-dev-experiments-patch-34483 into dev-experiments
1 unresolved thread
- Jan Frenzel authored
@@ -13,7 +13,7 @@ each device has a replica of the model and computes over different parts of the
@@ -45,11 +45,11 @@ with multiple GPUs.
@@ -61,8 +61,8 @@ In this case, we will go through an example with Multi Worker Mirrored Strategy.
@@ -74,7 +74,7 @@ In this case, the training job runs on worker 0, which is `10.1.10.58:12345`.
@@ -154,7 +154,7 @@ serve a larger model.
@@ -185,7 +185,7 @@ synchronize gradients and buffers.
@@ -201,13 +201,13 @@ Keep in mind that only one memory parameter (`--mem-per-cpu=<MB>` or `--mem=<MB>
@@ -237,7 +237,7 @@ marie@compute$ module spider Horovod # Check available modules
@@ -246,9 +246,9 @@ marie@alpha$ module load modenv/hiera GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5 Ho
@@ -279,9 +279,9 @@ print('Hello from:', hvd.rank())