From 62b86f2baf334aeb40fbb5ea8837d6998ebec5fc Mon Sep 17 00:00:00 2001
From: Christoph Lehmann <christoph.lehmann@tu-dresden.de>
Date: Thu, 19 Aug 2021 13:59:35 +0200
Subject: [PATCH] distributed_training.md: added some content from old wiki as
 starting point and included some basic document structure

---
 .../docs/software/distributed_training.md | 68 +++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/doc.zih.tu-dresden.de/docs/software/distributed_training.md b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
index e69de29bb..3a95d0a08 100644
--- a/doc.zih.tu-dresden.de/docs/software/distributed_training.md
+++ b/doc.zih.tu-dresden.de/docs/software/distributed_training.md
@@ -0,0 +1,68 @@
+# Distributed TensorFlow
+
+TODO
+
+# Distributed PyTorch
+
+**Hint: just copied some old content as a starting point**
+
+## Using Multiple GPUs with PyTorch
+
+Effective use of GPUs is essential, and it implies using parallelism in your code and model.
+Data parallelism and model parallelism are effective instruments to improve the performance
+of your code when using GPUs.
+
+Data parallelism is a widely used technique. It replicates the same model on all GPUs, where
+each GPU consumes a different partition of the input data. This method is described
+[here](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html).
+
+The example below shows how to solve that problem by using model parallelism, which, in
+contrast to data parallelism, splits a single model onto different GPUs rather than
+replicating the entire model on each GPU. The high-level idea of model parallelism is to
+place different sub-networks of a model onto different devices. Since only part of the model
+operates on any individual device, a set of devices can collectively serve a larger model.
+
+It is recommended to use
+[DistributedDataParallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html)
+instead of `nn.DataParallel` for multi-GPU training, even if there is only a single node.
+See: Use `nn.parallel.DistributedDataParallel` instead of multiprocessing or `nn.DataParallel`.
+Check the [CUDA notes](https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead) and
+[Distributed Data Parallel](https://pytorch.org/docs/stable/notes/ddp.html#ddp).
+
+Examples:
+
+1\. The parallel model. The main aim of this model is to show how to effectively implement
+your neural network on several GPUs. It includes a comparison of different kinds of models
+and tips to improve the performance of your model. **Necessary** parameters for running this
+model are **2 GPUs** and 14 cores (56 threads).
+
+(example_PyTorch_parallel.zip)
+
+Remember that to use the [JupyterHub service](../access/jupyterhub.md) for PyTorch you need
+to create and activate a virtual environment (kernel) with the essential modules loaded.
+
+Run the example in the same way as the previous examples.
+
+## Distributed Data-Parallel
+
+[DistributedDataParallel](https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel)
+(DDP) implements data parallelism at the module level and can run across multiple machines.
+Applications using DDP should spawn multiple processes and create a single DDP instance per
+process. DDP uses collective communications in the
+[torch.distributed](https://pytorch.org/tutorials/intermediate/dist_tuto.html)
+package to synchronize gradients and buffers.
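+
+The following is a minimal sketch of this pattern for a single node, assuming one process
+per GPU. The `MASTER_ADDR`/`MASTER_PORT` values and the toy model are placeholders, not a
+recommended configuration:
+
+```python
+import os
+
+import torch
+import torch.distributed as dist
+import torch.multiprocessing as mp
+import torch.nn as nn
+from torch.nn.parallel import DistributedDataParallel as DDP
+
+
+def worker(rank, world_size):
+    # Rendezvous of all processes; address and port are placeholders for a single-node run.
+    os.environ["MASTER_ADDR"] = "localhost"
+    os.environ["MASTER_PORT"] = "29500"
+    dist.init_process_group("nccl", rank=rank, world_size=world_size)
+    torch.cuda.set_device(rank)
+
+    # One DDP instance per process, each process driving one GPU.
+    model = nn.Linear(10, 1).to(rank)
+    ddp_model = DDP(model, device_ids=[rank])
+
+    loss_fn = nn.MSELoss()
+    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.001)
+
+    # One dummy training step; DDP synchronizes gradients across processes during backward().
+    outputs = ddp_model(torch.randn(20, 10).to(rank))
+    labels = torch.randn(20, 1).to(rank)
+    loss_fn(outputs, labels).backward()
+    optimizer.step()
+
+    dist.destroy_process_group()
+
+
+if __name__ == "__main__":
+    world_size = torch.cuda.device_count()
+    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
+```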
+
+The full tutorial can be found [here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).
+
+To use distributed data parallelism on Taurus, set the `--ntasks-per-node` parameter to the
+number of GPUs you use per node. It can also be useful to increase the memory and CPU
+parameters if you run larger models. Memory can be set up to:
+
+- `--mem=250000` and `--cpus-per-task=7` for the **ml** partition,
+- `--mem=60000` and `--cpus-per-task=6` for the **gpu2** partition.
+
+Keep in mind that only one memory parameter (`--mem-per-cpu=<MB>` or `--mem=<MB>`) can be
+specified.
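+
+When each GPU is driven by its own Slurm task (`--ntasks-per-node` set to the number of GPUs
+per node), the DDP process group can be initialized from Slurm's environment variables instead
+of spawning processes manually. The following is a rough sketch of this approach; it assumes
+`MASTER_ADDR` and `MASTER_PORT` are exported in the job script and uses a toy model for
+illustration only:
+
+```python
+import os
+
+import torch
+import torch.distributed as dist
+import torch.nn as nn
+from torch.nn.parallel import DistributedDataParallel as DDP
+
+# Slurm starts one task per process: SLURM_PROCID is the global task id,
+# SLURM_NTASKS the total number of tasks, SLURM_LOCALID the task id on this node.
+rank = int(os.environ["SLURM_PROCID"])
+world_size = int(os.environ["SLURM_NTASKS"])
+local_rank = int(os.environ["SLURM_LOCALID"])
+
+# MASTER_ADDR and MASTER_PORT are assumed to be exported in the batch script.
+dist.init_process_group("nccl", rank=rank, world_size=world_size)
+
+# Bind this process to its GPU and wrap the model in a single DDP instance.
+torch.cuda.set_device(local_rank)
+model = nn.Linear(10, 1).to(local_rank)
+ddp_model = DDP(model, device_ids=[local_rank])
+
+# ... usual training loop with ddp_model ...
+
+dist.destroy_process_group()
+```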