From faed07d7bee4a5d3b80cde8f603125ac2cb49bc0 Mon Sep 17 00:00:00 2001
From: Emicia <veronika.scholz@tu-dresden.de>
Date: Fri, 1 Oct 2021 10:34:55 +0200
Subject: [PATCH] Fix spelling

---
 .../docs/software/distributed_training.md | 10 +++---
 doc.zih.tu-dresden.de/wordlist.aspell     | 31 +++++++++++++++++++
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/doc.zih.tu-dresden.de/docs/software/distributed_training.md b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
index 05c266d9e..128eee8c8 100644
--- a/doc.zih.tu-dresden.de/docs/software/distributed_training.md
+++ b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
@@ -13,7 +13,7 @@ each device has a replica of the model and computes over different parts of the
 
 2. model parallelism: models are distributed over multiple devices.
 
-In the folowing we will stick to the concept of data parallelism because it is a widely-used
+In the following we will stick to the concept of data parallelism because it is a widely-used
 technique.
 
 There are basically two strategies to train the scattered data throughout the devices:
@@ -183,7 +183,7 @@ synchronize gradients and buffers.
 The tutorial can be found [here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).
 
 To use distributed data parallelism on ZIH systems please make sure the `--ntasks-per-node`
-parameter is equal to the number of GPUs you useper node.
+parameter is equal to the number of GPUs you use per node.
 Also, it can be useful to increase `memory/cpu` parameters if you run larger models.
 Memory can be set up to:
 
@@ -277,13 +277,13 @@ In the example presented installation for TensorFlow.
 Adapt as required and refer to the Horovod documentation for details.
 
 ```bash
-HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITH_TENSORFLOW=1 pip install --no-cache-dir horovod\[tensorflow\] 
+HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_WITH_TENSORFLOW=1 pip install --no-cache-dir horovod\[tensorflow\]
 horovodrun --check-build
 ```
 
 
-If you want to use OpenMPI then specify `HOROVOD_GPU_ALLREDUCE=MPI`. 
-To have better performance it is recommended to use NCCL instead of OpenMPI. 
+If you want to use OpenMPI then specify `HOROVOD_GPU_ALLREDUCE=MPI`.
+To have better performance it is recommended to use NCCL instead of OpenMPI.
 
 ##### Verify that Horovod works
 
diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell
index 3bfbeea4f..9fcd006a1 100644
--- a/doc.zih.tu-dresden.de/wordlist.aspell
+++ b/doc.zih.tu-dresden.de/wordlist.aspell
@@ -1,9 +1,11 @@
 personal_ws-1.1 en 203
+ALLREDUCE
 Altix
 Amdahl's
 analytics
 anonymized
 APIs
+awk
 BeeGFS
 benchmarking
 BLAS
@@ -13,8 +15,13 @@ ccNUMA
 centauri
 citable
 conda
+config
+CONFIG
+cpu
 CPU
+cpus
 CPUs
+crossentropy
 CSV
 CUDA
 cuDNN
@@ -24,9 +31,13 @@ dataframes
 DataFrames
 datamover
 DataParallel
+dataset
+ddl
 DDP
 DDR
 DFG
+dir
+distr
 DistributedDataParallel
 DockerHub
 EasyBuild
@@ -47,22 +58,30 @@ GFLOPS
 gfortran
 GiB
 gnuplot
+gpu
 GPU
 GPUs
+gres
 hadoop
 haswell
 HDFS
+hiera
+horovod
 Horovod
+horovodrun
 hostname
 HPC
 HPL
+hvd
 hyperparameter
 hyperparameters
 icc
 icpc
 ifort
 ImageNet
+img
 Infiniband
+init
 inode
 Itanium
 jobqueue
@@ -80,11 +99,13 @@ lsf
 lustre
 Mathematica
 MEGWARE
+mem
 MiB
 MIMD
 Miniconda
 MKL
 MNIST
+modenv
 Montecito
 mountpoint
 mpi
@@ -99,7 +120,11 @@ multithreaded
 NCCL
 Neptun
 NFS
+nodelist
+NODELIST
 NRINGS
+ntasks
+NUM
 NUMA
 NUMAlink
 NumPy
@@ -134,10 +159,15 @@ PowerAI
 ppc
 PSOCK
 Pthreads
+pty
+PythonAnaconda
+pytorch
+PyTorch
 queue
 randint
 reachability
 README
+resnet
 Rmpi
 rome
 romeo
@@ -175,6 +205,7 @@ SUSE
 TBB
 TCP
 TensorBoard
+tensorflow
 TensorFlow
 TFLOPS
 Theano
-- 
GitLab