Update the distributed_training.md Pytorch section

1 file changed: +2 −2
@@ -159,8 +159,8 @@ Python. To work around this issue and gain performance benefits of parallelism,
 `torch.nn.DistributedDataParallel` is recommended. This involves little more code changes to set up,
 but further increases the performance of model training. The starting step is to initialize the
 process group by calling the `torch.distributed.init_process_group()` using the appropriate backend
-such as 'nccl', 'mpi' or 'gloo'. The use of 'nccl' as backend is recommended as it is currently the
-fastest backend when using GPUs.
+such as NCCL, MPI or Gloo. The use of NCCL as back end is recommended as it is currently the fastest
+back end when using GPUs.
 #### Using Multiple GPUs with PyTorch
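For context (not part of the diff above), a minimal sketch of the setup the changed lines describe. The `setup_and_wrap` helper name and the reliance on launcher-provided environment variables (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`, plus `MASTER_ADDR`/`MASTER_PORT`, as set by e.g. `torchrun`) are assumptions for illustration, not something the MR prescribes:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_and_wrap(model: torch.nn.Module) -> DDP:
    # Assumes a launcher such as torchrun exports RANK, WORLD_SIZE, LOCAL_RANK
    # and MASTER_ADDR/MASTER_PORT for the default env:// rendezvous.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])

    # NCCL is the recommended backend for GPU training; Gloo works on CPU.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

    # Pin this process to its local GPU and create its model replica there.
    torch.cuda.set_device(local_rank)
    model = model.to(local_rank)

    # DDP synchronises gradients across all processes during backward().
    return DDP(model, device_ids=[local_rank])
```

Each process would build the same model, call this once, and then train on its own shard of the data (for example via `torch.utils.data.distributed.DistributedSampler`).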