diff --git a/doc.zih.tu-dresden.de/docs/access/jupyterhub.md b/doc.zih.tu-dresden.de/docs/access/jupyterhub.md index dcdd9363c8d406d7227b97abce91ad67298e9a67..b6b0f25d3963da0529f26274a3daf4bdfcb0bbe0 100644 --- a/doc.zih.tu-dresden.de/docs/access/jupyterhub.md +++ b/doc.zih.tu-dresden.de/docs/access/jupyterhub.md @@ -41,7 +41,7 @@ settings. You can: - modify batch system parameters to your needs ([more about batch system Slurm](../jobs_and_resources/slurm.md)) - assign your session to a project or reservation -- load modules from the [module system](../software/runtime_environment.md) +- load modules from the [module system](../software/modules.md) - choose a different standard environment (in preparation for future software updates or testing additional features) @@ -189,7 +189,7 @@ Here is a short list of some included software: \* generic = all partitions except ml -\*\* R is loaded from the [module system](../software/runtime_environment.md) +\*\* R is loaded from the [module system](../software/modules.md) ### Creating and Using a Custom Environment diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md index 1b395644aa972113ac887c764c9a651f56826093..218bd3d4b186efcd583c3fb6c092b4e0dbad3180 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/hardware_overview.md @@ -12,7 +12,7 @@ users and the ZIH. - Login-Nodes (`tauruslogin[3-6].hrsk.tu-dresden.de`) - each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 each with 12 cores - @ 2.50GHz, MultiThreading Disabled, 64 GB RAM, 128 GB SSD local disk + @ 2.50GHz, Multithreading Disabled, 64 GB RAM, 128 GB SSD local disk - IPs: 141.30.73.\[102-105\] - Transfer-Nodes (`taurusexport3/4.hrsk.tu-dresden.de`, DNS Alias `taurusexport.hrsk.tu-dresden.de`) @@ -25,7 +25,7 @@ users and the ZIH. 
- 32 nodes, each with - 8 x NVIDIA A100-SXM4 - - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, MultiThreading disabled + - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, Multithreading disabled - 1 TB RAM - 3.5 TB local memory at NVMe device at `/tmp` - Hostnames: `taurusi[8001-8034]` @@ -35,7 +35,7 @@ users and the ZIH. ## Island 7 - AMD Rome CPUs - 192 nodes, each with - - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0GHz, MultiThreading + - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0GHz, Multithreading enabled, - 512 GB RAM - 200 GB /tmp on local SSD local disk @@ -66,7 +66,7 @@ For machine learning, we have 32 IBM AC922 nodes installed with this configurati ## Island 4 to 6 - Intel Haswell CPUs - 1456 nodes, each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) - @ 2.50GHz, MultiThreading disabled, 128 GB SSD local disk + @ 2.50GHz, Multithreading disabled, 128 GB SSD local disk - Hostname: `taurusi4[001-232]`, `taurusi5[001-612]`, `taurusi6[001-612]` - Varying amounts of main memory (selected automatically by the batch @@ -87,7 +87,7 @@ For machine learning, we have 32 IBM AC922 nodes installed with this configurati ### Extension of Island 4 with Broadwell CPUs * 32 nodes, each witch 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz - (**14 cores**), MultiThreading disabled, 64 GB RAM, 256 GB SSD local disk + (**14 cores**), Multithreading disabled, 64 GB RAM, 256 GB SSD local disk * from the users' perspective: Broadwell is like Haswell * Hostname: `taurusi[4233-4264]` * Slurm partition `broadwell` @@ -95,7 +95,7 @@ For machine learning, we have 32 IBM AC922 nodes installed with this configurati ## Island 2 Phase 2 - Intel Haswell CPUs + NVIDIA K80 GPUs * 64 nodes, each with 2x Intel(R) Xeon(R) CPU E5-E5-2680 v3 (12 cores) - @ 2.50GHz, MultiThreading Disabled, 64 GB RAM (2.67 GB per core), + @ 2.50GHz, Multithreading Disabled, 64 GB RAM (2.67 GB per core), 128 GB SSD local disk, 4x NVIDIA Tesla K80 (12 GB GDDR RAM) GPUs * Hostname: `taurusi2[045-108]` * Slurm Partition `gpu` @@ 
-104,7 +104,7 @@ For machine learning, we have 32 IBM AC922 nodes installed with this configurati ## SMP Nodes - up to 2 TB RAM - 5 Nodes each with 4x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @ - 2.20GHz, MultiThreading Disabled, 2 TB RAM + 2.20GHz, Multithreading Disabled, 2 TB RAM - Hostname: `taurussmp[3-7]` - Slurm partition `smp2` @@ -116,7 +116,7 @@ For machine learning, we have 32 IBM AC922 nodes installed with this configurati ## Island 2 Phase 1 - Intel Sandybridge CPUs + NVIDIA K20x GPUs - 44 nodes, each with 2x Intel(R) Xeon(R) CPU E5-2450 (8 cores) @ - 2.10GHz, MultiThreading Disabled, 48 GB RAM (3 GB per core), 128 GB + 2.10GHz, Multithreading Disabled, 48 GB RAM (3 GB per core), 128 GB SSD local disk, 2x NVIDIA Tesla K20x (6 GB GDDR RAM) GPUs - Hostname: `taurusi2[001-044]` - Slurm partition `gpu1` diff --git a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md index 396657db06766eaab6f8694ca4bed4f8014cf7f4..2af016d0188ae4f926b45e7b8fdc14b039e8baa3 100644 --- a/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md +++ b/doc.zih.tu-dresden.de/docs/jobs_and_resources/slurm_examples.md @@ -39,7 +39,7 @@ For MPI-parallel jobs one typically allocates one core per task that has to be s There are different MPI libraries on ZIH systems for the different micro archtitectures. Thus, you have to compile the binaries specifically for the target architecture and partition. Please refer to the sections [building software](../software/building_software.md) and - [module environments](../software/runtime_environment.md#module-environments) for detailed + [module environments](../software/modules.md#module-environments) for detailed information. !!! 
example "Job file for MPI application" diff --git a/doc.zih.tu-dresden.de/docs/software/fem_software.md b/doc.zih.tu-dresden.de/docs/software/fem_software.md index aa8917ad3c59fbd1c60a349e9030ff830f904234..3be2314889bfe45f9554fb499c4d757337bef33d 100644 --- a/doc.zih.tu-dresden.de/docs/software/fem_software.md +++ b/doc.zih.tu-dresden.de/docs/software/fem_software.md @@ -11,7 +11,7 @@ marie@login$ module load ANSYS/<version> ``` - The section [runtime environment](runtime_environment.md) provides a comprehensive overview + The section [modules](modules.md) provides a comprehensive overview on the module system and relevant commands. ## Abaqus diff --git a/doc.zih.tu-dresden.de/docs/software/mathematics.md b/doc.zih.tu-dresden.de/docs/software/mathematics.md index 9629e76b77cd8779a993c6c1f3bc5b0fe68d1140..21aab2856a7b9582c3f6b8d5453d7ea2f8b6895b 100644 --- a/doc.zih.tu-dresden.de/docs/software/mathematics.md +++ b/doc.zih.tu-dresden.de/docs/software/mathematics.md @@ -119,7 +119,7 @@ zih$ module load MATLAB ``` (then you will get the most recent Matlab version. -[Refer to the modules section for details.](../software/runtime_environment.md#modules)) +[Refer to the modules section for details.](../software/modules.md#modules)) ### Interactive diff --git a/doc.zih.tu-dresden.de/docs/software/modules.md b/doc.zih.tu-dresden.de/docs/software/modules.md index 8f5a0ae2c4792fd92c458dc89033b2058a1e22de..58f200d25f01d52385626776b53c93f38e999397 100644 --- a/doc.zih.tu-dresden.de/docs/software/modules.md +++ b/doc.zih.tu-dresden.de/docs/software/modules.md @@ -1,199 +1,326 @@ # Modules -Usage of software on HPC systems is managed by a **modules system**. A module is a user interface -that provides utilities for the dynamic modification of a user's environment (e.g., *PATH*, -*LD_LIBRARY_PATH* etc.) to access the compilers, loader, libraries, and utilities.
With the help -of modules, users can smoothly switch between different versions of installed software packages -and libraries. +Usage of software on HPC systems is managed by a **modules system**. -For all applications, tools, libraries etc. the correct environment can be easily set by the command +!!! note "Module" -``` -module load -``` + A module is a user interface that provides utilities for the dynamic modification of a user's + environment, e.g. prepending paths to: -e.g: `module load MATLAB`. If several versions are installed they can be chosen like: `module load -MATLAB/2019b`. + * `PATH` + * `LD_LIBRARY_PATH` + * `MANPATH` + * and more -A list of all modules shows by command + to help you access compilers, loaders, libraries, and utilities. -``` -module available -#or -module avail -#or -ml av + By using modules, you can smoothly switch between different versions of + installed software packages and libraries. -``` +## Module Commands -Other important commands are: +Using modules is straightforward. The following table lists the basic commands.
-```Bash -module help #show all module options -module list #list all user-installed modules -module purge #remove all user-installed modules -module spider #search for modules across all environments, can take a parameter -module load <modname> #load module modname -module rm <modname> #unload module modname -module switch <mod> <mod2> #unload module mod1; load module mod2 -``` +| Command | Description | +|:------------------------------|:-----------------------------------------------------------------| +| `module help` | Show all module options | +| `module list` | List active modules in the user environment | +| `module purge` | Remove all loaded modules from the user environment | +| `module avail [modname]` | List all available modules (or only those matching `modname`) | +| `module spider [modname]` | Search for modules across all environments | +| `module load <modname>` | Load module `modname` into the user environment | +| `module unload <modname>` | Remove module `modname` from the user environment | +| `module switch <mod1> <mod2>` | Replace module `mod1` with module `mod2` | -Module files are ordered by their topic on Taurus. By default, with `module available` you will see -all available module files and topics. If you just wish to see the installed versions of a certain -module, you can use `module av <softwarename>` and all available versions of the exact software will -be displayed. +Module files are ordered by their topic on ZIH systems. By default, with `module avail` you will +see all topics and their available module files. If you just wish to see the installed versions of a +certain module, you can use `module avail softwarename` and it will display the available versions of +`softwarename` only. -## Module environments +### Examples -On Taurus, there exist different module environments, each containing a set of software modules. -They are activated via the meta module modenv which has different versions, one of which is loaded -by default.
You can switch between them by simply loading the desired modenv-version, e.g.: +???+ example "Finding available software" + This example illustrates the usage of the command `module avail` to search for available Matlab + installations. + + ```console + marie@compute$ module avail matlab + + ------------------------------ /sw/modules/scs5/math ------------------------------ + MATLAB/2017a MATLAB/2018b MATLAB/2020a + MATLAB/2018a MATLAB/2019b MATLAB/2021a (D) + + Where: + D: Default Module + + Use "module spider" to find all possible modules. + Use "module keyword key1 key2 ..." to search for all possible modules + matching any of the "keys". + ``` + +???+ example "Loading and removing modules" + + A particular module or several modules are loaded into your environment using the `module load` + command. The counterpart to remove a module or several modules is `module unload`. + + ```console + marie@compute$ module load Python/3.8.6 + Module Python/3.8.6-GCCcore-10.2.0 and 11 dependencies loaded. + ``` + +???+ example "Removing all modules" + + To remove all loaded modules from your environment with a single command, invoke + + ```console + marie@compute$ module purge + The following modules were not unloaded: + (Use "module --force purge" to unload all): + + 1) modenv/scs5 + Module Python/3.8.6-GCCcore-10.2.0 and 11 dependencies unloaded. + ``` + +### Front-End ml + +There is a front end for the `module` command, `ml`, which saves you some typing.
+ Any module command can be given after `ml`: + +| ml Command | module Command | +|:------------------|:------------------------------------------| +| `ml` | `module list` | +| `ml foo bar` | `module load foo bar` | +| `ml -foo -bar baz`| `module unload foo bar; module load baz` | +| `ml purge` | `module purge` | +| `ml show foo` | `module show foo` | + +???+ example "Usage of front-end ml" + + ```console + marie@compute$ ml +Python/3.8.6 + Module Python/3.8.6-GCCcore-10.2.0 and 11 dependencies loaded. + marie@compute$ ml + + Currently Loaded Modules: + 1) modenv/scs5 (S) 5) bzip2/1.0.8-GCCcore-10.2.0 9) SQLite/3.33.0-GCCcore-10.2.0 13) Python/3.8.6-GCCcore-10.2.0 + 2) GCCcore/10.2.0 6) ncurses/6.2-GCCcore-10.2.0 10) XZ/5.2.5-GCCcore-10.2.0 + 3) zlib/1.2.11-GCCcore-10.2.0 7) libreadline/8.0-GCCcore-10.2.0 11) GMP/6.2.0-GCCcore-10.2.0 + 4) binutils/2.35-GCCcore-10.2.0 8) Tcl/8.6.10-GCCcore-10.2.0 12) libffi/3.3-GCCcore-10.2.0 + + Where: + S: Module is Sticky, requires --force to unload or purge + + marie@compute$ ml -Python/3.8.6 +ANSYS/2020R2 + Module Python/3.8.6-GCCcore-10.2.0 and 11 dependencies unloaded. + Module ANSYS/2020R2 loaded. + ``` + +## Module Environments + +On ZIH systems, there exist different **module environments**, each containing a set of software modules. +They are activated via the meta module `modenv`, which has different versions, one of which is loaded +by default. You can switch between them by simply loading the desired `modenv` version, e.g.
+ +```console +marie@compute$ module load modenv/ml ``` -module load modenv/ml -``` -| modenv/scs5 | SCS5 software | default | -| | | | -| modenv/ml | HPC-DA software (for use on the "ml" partition) | | -| modenv/hiera | WIP hierarchical module tree | | -| modenv/classic | Manually built pre-SCS5 (AE4.0) software | default | -| | | | - -The old modules (pre-SCS5) are still available after loading the corresponding `modenv` version -(classic), however, due to changes in the libraries of the operating system, it is not guaranteed -that they still work under SCS5. Please don't use modenv/classic if you do not absolutely have to. -Most software is available under modenv/scs5, too, just be aware of the possibly different spelling -(case-sensitivity). - -The command `module spider <modname>` allows searching for specific software in all modenv -environments. It will also display information on how to load a found module when giving a precise +### modenv/scs5 (default) + +* SCS5 software +* usually optimized for Intel processors (Partitions: `haswell`, `broadwell`, `gpu2`, `julia`) + +### modenv/ml + +* data analytics software (for use on the partition `ml`) +* necessary to run most software on the partition `ml` +(The instruction set [Power ISA](https://en.wikipedia.org/wiki/Power_ISA#Power_ISA_v.3.0) +differs from the usual x86 instruction set. +Thus, binaries built for the other modenvs cannot run there.) + +### modenv/hiera + +* uses a hierarchical module load scheme +* optimized software for AMD processors (Partitions: `romeo`, `alpha`) + +### modenv/classic + +* deprecated, old software that is no longer curated +* may break due to library inconsistencies with the operating system +* please don't use software from this modenv unless you absolutely have to + +### Searching for Software + +The command `module spider <modname>` allows searching for specific software across all modenv +environments.
It will also display information on how to load a particular module when you pass a precise module (with version) as the parameter. -## Per-architecture builds +??? example + + ```console + marie@login$ module spider p7zip + + ---------------------------------------------------------------------------------------------------------------------------------------------------------- + p7zip: + ---------------------------------------------------------------------------------------------------------------------------------------------------------- + Description: + p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip) for Unix. 7-Zip is a file archiver with highest compression ratio. + + Versions: + p7zip/9.38.1 + p7zip/17.03-GCCcore-10.2.0 + p7zip/17.03 + + ---------------------------------------------------------------------------------------------------------------------------------------------------------- + For detailed information about a specific "p7zip" package (including how to load the modules) use the module's full name. + For example: + $ module spider p7zip/17.03 + ---------------------------------------------------------------------------------------------------------------------------------------------------------- + ``` + +## Per-Architecture Builds Since we have a heterogeneous cluster, we do individual builds of some of the software for each architecture present. This ensures that, no matter what partition the software runs on, a build -optimized for the host architecture is used automatically. This is achieved by having -'/sw/installed' symlinked to different directories on the compute nodes. +optimized for the host architecture is used automatically. +For that purpose, the system path `/sw/installed` on the compute nodes is a symbolic link +pointing to a different directory depending on the node's architecture. However, not every module will be available for each node type or partition.
Especially when introducing new hardware to the cluster, we do not want to rebuild all of the older module versions and in some cases cannot fall-back to a more generic build either. That's why we provide the script: `ml_arch_avail` that displays the availability of modules for the different node architectures. +### Example Invocation of ml_arch_avail + +```console +marie@compute$ ml_arch_avail TensorFlow/2.4.1 +TensorFlow/2.4.1: haswell, rome +TensorFlow/2.4.1: haswell, rome ``` -ml_arch_avail CP2K -Example output: +The command shows all modules that match `TensorFlow/2.4.1` and their respective availability. +Note that this will not work for meta-modules that do not have an installation directory +(like some toolchain modules). -#CP2K/6.1-foss-2019a: haswell, rome -#CP2K/5.1-intel-2018a: haswell -#CP2K/6.1-foss-2019a-spglib: haswell, rome -#CP2K/6.1-intel-2018a: haswell -#CP2K/6.1-intel-2018a-spglib: haswell -``` +## Advanced Usage -The command shows all modules that match on CP2K, and their respective availability. Note that this -will not work for meta-modules that do not have an installation directory (like some toolchain -modules). +For writing your own module files, please have a look at the [guide for writing project and private module files](private_modules.md). -## Project and User Private Modules +## Troubleshooting -Private module files allow you to load your own installed software packages into your environment -and to handle different versions without getting into conflicts. Private modules can be setup for a -single user as well as all users of project group. The workflow and settings for user private module -files is described in the following. The [settings for project private -modules](#project-private-modules) differ only in details. +### When I log in, the wrong modules are loaded by default -The command +Reset your currently loaded modules with `module purge` +(or `module --force purge` if you also want to unload your basic `modenv` module).
+Then run `module save` to overwrite the +list of modules you load by default when logging in. -``` -module use <path_to_module_files> -``` +### I can't load the module TensorFlow -adds directory by user choice to the list of module directories that are searched by the `module` -command. Within directory `../privatemodules` user can add directories for every software user wish -to install and add also in this directory a module file for every version user have installed. -Further information about modules can be found [here](http://modules.sourceforge.net/). +Check the dependencies by, e.g., calling `module spider TensorFlow/2.4.1`. +It will list the modules that need to be loaded +before the TensorFlow module can be loaded. -This is an example of work a private module file: +??? example "Loading the dependencies" -- create a directory in your home directory: + ```console + marie@compute$ module load TensorFlow/2.4.1 + Lmod has detected the following error: These module(s) exist but + cannot be loaded as requested: "TensorFlow/2.4.1" + Try: "module spider TensorFlow/2.4.1" to see how to load the + module(s). -``` -cd -mkdir privatemodules && cd privatemodules -mkdir testsoftware && cd testsoftware -``` -- add the directory in the list of module directories: + marie@compute$ module spider TensorFlow/2.4.1 -``` -module use $HOME/privatemodules -``` + ---------------------------------------------------------------------------------- + TensorFlow: TensorFlow/2.4.1 + ---------------------------------------------------------------------------------- + Description: + An open-source software library for Machine Intelligence -- create a file with the name `1.0` with a test software in the `testsoftware` directory (use e.g.
echo, emacs, etc): -``` -#%Module###################################################################### -## -## testsoftware modulefile -## -proc ModulesHelp { } { - puts stderr "Loads testsoftware" -} - -set version 1.0 -set arch x86_64 -set path /home/<user>/opt/testsoftware/$version/$arch/ - -prepend-path PATH $path/bin -prepend-path LD_LIBRARY_PATH $path/lib - -if [ module-info mode load ] { - puts stderr "Load testsoftware version $version" -} -``` + You will need to load all module(s) on any one of the lines below before the "TensorFlow/2.4.1" module is available to load. -- check the availability of the module with `ml av`, the output should look like this: + modenv/hiera GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5 + This extension is provided by the following modules. To access the extension you must load one of the following modules. Note that any module names in parentheses show the module location in the software hierarchy. -``` ---------------------- /home/masterman/privatemodules --------------------- - testsoftware/1.0 -``` -- load the test module with `module load testsoftware`, the output: + TensorFlow/2.4.1 (modenv/hiera GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5) -``` -Load testsoftware version 1.0 -Module testsoftware/1.0 loaded. -``` -### Project Private Modules + This module provides the following extensions: -Private module files allow you to load project- or group-wide installed software into your -environment and to handle different versions without getting into conflicts.
+ absl-py/0.10.0 (E), astunparse/1.6.3 (E), cachetools/4.2.0 (E), dill/0.3.3 (E), gast/0.3.3 (E), google-auth-oauthlib/0.4.2 (E), google-auth/1.24.0 (E), google-pasta/0.2.0 (E), grpcio/1.32.0 (E), gviz-api/1.9.0 (E), h5py/2.10.0 (E), Keras-Preprocessing/1.1.2 (E), Markdown/3.3.3 (E), oauthlib/3.1.0 (E), opt-einsum/3.3.0 (E), portpicker/1.3.1 (E), pyasn1-modules/0.2.8 (E), requests-oauthlib/1.3.0 (E), rsa/4.7 (E), tblib/1.7.0 (E), tensorboard-plugin-profile/2.4.0 (E), tensorboard-plugin-wit/1.8.0 (E), tensorboard/2.4.1 (E), tensorflow-estimator/2.4.0 (E), TensorFlow/2.4.1 (E), termcolor/1.1.0 (E), Werkzeug/1.0.1 (E), wrapt/1.12.1 (E) -The module files have to be stored in your global projects directory -`/projects/p_projectname/privatemodules`. An example of a module file can be found in the section -above. To use a project-wide module file you have to add the path to the module file to the module -environment with the command + Help: + Description + =========== + An open-source software library for Machine Intelligence -``` -module use /projects/p_projectname/privatemodules -``` -After that, the modules are available in your module environment and you can load the modules with -the `module load` command. + More information + ================ + - Homepage: https://www.tensorflow.org/ + + + Included extensions + =================== + absl-py-0.10.0, astunparse-1.6.3, cachetools-4.2.0, dill-0.3.3, gast-0.3.3, + google-auth-1.24.0, google-auth-oauthlib-0.4.2, google-pasta-0.2.0, + grpcio-1.32.0, gviz-api-1.9.0, h5py-2.10.0, Keras-Preprocessing-1.1.2, + Markdown-3.3.3, oauthlib-3.1.0, opt-einsum-3.3.0, portpicker-1.3.1, + pyasn1-modules-0.2.8, requests-oauthlib-1.3.0, rsa-4.7, tblib-1.7.0, + tensorboard-2.4.1, tensorboard-plugin-profile-2.4.0, tensorboard-plugin- + wit-1.8.0, TensorFlow-2.4.1, tensorflow-estimator-2.4.0, termcolor-1.1.0, + Werkzeug-1.0.1, wrapt-1.12.1 + + + Names marked by a trailing (E) are extensions provided by another module. 
+ + + + marie@compute$ ml +modenv/hiera +GCC/10.2.0 +CUDA/11.1.1 +OpenMPI/4.0.5 +TensorFlow/2.4.1 + + The following have been reloaded with a version change: + 1) GCC/7.3.0-2.30 => GCC/10.2.0 3) binutils/2.30-GCCcore-7.3.0 => binutils/2.35 + 2) GCCcore/7.3.0 => GCCcore/10.2.0 4) modenv/scs5 => modenv/hiera -## Using Private Modules and Programs in the $HOME Directory + Module GCCcore/7.3.0, binutils/2.30-GCCcore-7.3.0, GCC/7.3.0-2.30, GCC/7.3.0-2.30 and 3 dependencies unloaded. + Module GCCcore/7.3.0, GCC/7.3.0-2.30, GCC/10.2.0, CUDA/11.1.1, OpenMPI/4.0.5, TensorFlow/2.4.1 and 50 dependencies loaded. + marie@compute$ module list -An automated backup system provides security for the HOME-directories on the cluster on a daily -basis. This is the reason why we urge users to store (large) temporary data (like checkpoint files) -on the /scratch -Filesystem or at local scratch disks. + Currently Loaded Modules: + 1) modenv/hiera (S) 28) Tcl/8.6.10 + 2) GCCcore/10.2.0 29) SQLite/3.33.0 + 3) zlib/1.2.11 30) GMP/6.2.0 + 4) binutils/2.35 31) libffi/3.3 + 5) GCC/10.2.0 32) Python/3.8.6 + 6) CUDAcore/11.1.1 33) pybind11/2.6.0 + 7) CUDA/11.1.1 34) SciPy-bundle/2020.11 + 8) numactl/2.0.13 35) Szip/2.1.1 + 9) XZ/5.2.5 36) HDF5/1.10.7 + 10) libxml2/2.9.10 37) cURL/7.72.0 + 11) libpciaccess/0.16 38) double-conversion/3.1.5 + 12) hwloc/2.2.0 39) flatbuffers/1.12.0 + 13) libevent/2.1.12 40) giflib/5.2.1 + 14) Check/0.15.2 41) ICU/67.1 + 15) GDRCopy/2.1-CUDA-11.1.1 42) JsonCpp/1.9.4 + 16) UCX/1.9.0-CUDA-11.1.1 43) NASM/2.15.05 + 17) libfabric/1.11.0 44) libjpeg-turbo/2.0.5 + 18) PMIx/3.1.5 45) LMDB/0.9.24 + 19) OpenMPI/4.0.5 46) nsync/1.24.0 + 20) OpenBLAS/0.3.12 47) PCRE/8.44 + 21) FFTW/3.3.8 48) protobuf/3.14.0 + 22) ScaLAPACK/2.1.0 49) protobuf-python/3.14.0 + 23) cuDNN/8.0.4.30-CUDA-11.1.1 50) flatbuffers-python/1.12 + 24) NCCL/2.8.3-CUDA-11.1.1 51) typing-extensions/3.7.4.3 + 25) bzip2/1.0.8 52) libpng/1.6.37 + 26) ncurses/6.2 53) snappy/1.1.8 + 27) libreadline/8.0
54) TensorFlow/2.4.1 -**Please note**: We have set `ulimit -c 0` as a default to prevent users from filling the disk with -the dump of a crashed program. bash -users can use `ulimit -Sc unlimited` to enable the debugging -via analyzing the core file (limit coredumpsize unlimited for tcsh). + Where: + S: Module is Sticky, requires --force to unload or purge + ``` diff --git a/doc.zih.tu-dresden.de/docs/software/private_modules.md b/doc.zih.tu-dresden.de/docs/software/private_modules.md new file mode 100644 index 0000000000000000000000000000000000000000..4b79463f05988afd689b5fa18bddc758c16dfaa7 --- /dev/null +++ b/doc.zih.tu-dresden.de/docs/software/private_modules.md @@ -0,0 +1,105 @@ +# Project and User Private Modules + +Private module files allow you to load your own installed software packages into your environment +and to handle different versions without getting into conflicts. Private modules can be set up for a +single user as well as for all users of a project group. The workflow and settings for user private module +files are described in the following. The [settings for project private +modules](#project-private-modules) differ only in details. + +In order to use your own module files, use the command +`module use <path_to_module_files>`. It will add the path to the list of module directories +that are searched by Lmod (i.e., the `module` command). You may use a directory `privatemodules` +within your home or project directory to set up your own module files. + +Please see the [Environment Modules open source project's web page](http://modules.sourceforge.net/) +for further information on writing module files. + +## 1. Create Directories + +```console +marie@compute$ cd $HOME +marie@compute$ mkdir --verbose --parents privatemodules/testsoftware +marie@compute$ cd privatemodules/testsoftware +``` + +(creates the directory `privatemodules/testsoftware` in your home directory and changes into it) + +## 2.
Notify lmod + +```console +marie@compute$ module use $HOME/privatemodules +``` + +(add the directory in the list of module directories) + +## 3. Create Modulefile + +Create a file with the name `1.0` with a +test software in the `testsoftware` directory you created earlier +(using your favorite editor) and paste the following text into it: + +``` +#%Module###################################################################### +## +## testsoftware modulefile +## +proc ModulesHelp { } { + puts stderr "Loads testsoftware" +} + +set version 1.0 +set arch x86_64 +set path /home/<user>/opt/testsoftware/$version/$arch/ + +prepend-path PATH $path/bin +prepend-path LD_LIBRARY_PATH $path/lib + +if [ module-info mode load ] { + puts stderr "Load testsoftware version $version" +} +``` + +## 4. Check lmod + +Check the availability of the module with `ml av`, the output should look like this: + +``` +--------------------- /home/masterman/privatemodules --------------------- + testsoftware/1.0 +``` + +## 5. Load Module + +Load the test module with `module load testsoftware`, the output should look like this: + +```console +Load testsoftware version 1.0 +Module testsoftware/1.0 loaded. +``` + +## Project Private Modules + +Private module files allow you to load project- or group-wide installed software into your +environment and to handle different versions without getting into conflicts. + +The module files have to be stored in your global projects directory +`/projects/p_projectname/privatemodules`. An example of a module file can be found in the section +above. To use a project-wide module file you have to add the path to the module file to the module +environment with the command + +```console +marie@compute$ module use /projects/p_projectname/privatemodules +``` + +After that, the modules are available in your module environment and you can load the modules with +the `module load` command. 
+ +## Using Private Modules and Programs in the $HOME Directory + +An automated backup system backs up the home directories on the cluster on a daily +basis. This is why we urge users to store (large) temporary data (like checkpoint files) +on the `/scratch` filesystem or on local scratch disks. + +**Please note**: We have set `ulimit -c 0` as a default to prevent users from filling the disk with +the dumps of crashed programs. `bash` users can use `ulimit -Sc unlimited` to enable core dumps +and debug by analyzing the core file. diff --git a/doc.zih.tu-dresden.de/docs/software/runtime_environment.md b/doc.zih.tu-dresden.de/docs/software/runtime_environment.md deleted file mode 100644 index 1bca8daa7cfa08f3b58b19e5608c2e333b9055f9..0000000000000000000000000000000000000000 --- a/doc.zih.tu-dresden.de/docs/software/runtime_environment.md +++ /dev/null @@ -1,181 +0,0 @@ -# Runtime Environment - -Make sure you know how to work with a Linux system. Documentations and tutorials can be easily -found on the internet or in your library. - -## Modules - -To allow the user to switch between different versions of installed programs and libraries we use a -*module concept*. A module is a user interface that provides utilities for the dynamic modification -of a user's environment, i.e., users do not have to manually modify their environment variables ( -`PATH` , `LD_LIBRARY_PATH`, ...) to access the compilers, loader, libraries, and utilities. - -For all applications, tools, libraries etc. the correct environment can be easily set by e.g. -`module load Mathematica`. If several versions are installed they can be chosen like `module load -MATLAB/2019b`. A list of all modules shows `module avail`.
Other important commands are: - -| Command | Description | -|:------------------------------|:-----------------------------------------------------------------| -| `module help` | show all module options | -| `module list` | list all user-installed modules | -| `module purge` | remove all user-installed modules | -| `module avail` | list all available modules | -| `module spider` | search for modules across all environments, can take a parameter | -| `module load <modname>` | load module `modname` | -| `module unload <modname>` | unloads module `modname` | -| `module switch <mod1> <mod2>` | unload module `mod1` ; load module `mod2` | - -Module files are ordered by their topic on our HPC systems. By default, with `module av` you will -see all available module files and topics. If you just wish to see the installed versions of a -certain module, you can use `module av softwarename` and it will display the available versions of -`softwarename` only. - -### Lmod: An Alternative Module Implementation - -Historically, the module command on our HPC systems has been provided by the rather dated -*Environment Modules* software which was first introduced in 1991. As of late 2016, we also offer -the new and improved [LMOD](https://www.tacc.utexas.edu/research-development/tacc-projects/lmod) as -an alternative. It has a handful of advantages over the old Modules implementation: - -- all modulefiles are cached, which especially speeds up tab - completion with bash -- sane version ordering (9.0 \< 10.0) -- advanced version requirement functions (atleast, between, latest) -- auto-swapping of modules (if a different version was already loaded) -- save/auto-restore of loaded module sets (module save) -- multiple language support -- properties, hooks, ... -- depends_on() function for automatic dependency resolution with - reference counting - -### Module Environments - -On Taurus, there exist different module environments, each containing a set of software modules. 
-They are activated via the meta module **modenv** which has different versions, one of which is -loaded by default. You can switch between them by simply loading the desired modenv-version, e.g.: - -```Bash -module load modenv/ml -``` - -| | | | -|--------------|------------------------------------------------------------------------|---------| -| modenv/scs5 | SCS5 software | default | -| modenv/ml | HPC-DA software (for use on the "ml" partition) | | -| modenv/hiera | Hierarchical module tree (for use on the "romeo" and "gpu3" partition) | | - -The old modules (pre-SCS5) are still available after loading **modenv**/**classic**, however, due to -changes in the libraries of the operating system, it is not guaranteed that they still work under -SCS5. Please don't use modenv/classic if you do not absolutely have to. Most software is available -under modenv/scs5, too, just be aware of the possibly different spelling (case-sensitivity). - -You can use `module spider \<modname>` to search for a specific -software in all modenv environments. It will also display information on -how to load a found module when giving a precise module (with version) -as the parameter. - -Also see the information under [SCS5 software](../software/scs5_software.md). - -### Per-Architecture Builds - -Since we have a heterogenous cluster, we do individual builds of some of the software for each -architecture present. This ensures that, no matter what partition the software runs on, a build -optimized for the host architecture is used automatically. This is achieved by having -`/sw/installed` symlinked to different directories on the compute nodes. - -However, not every module will be available for each node type or partition. Especially when -introducing new hardware to the cluster, we do not want to rebuild all of the older module versions -and in some cases cannot fall-back to a more generic build either. 
That's why we provide the script: -`ml_arch_avail` that displays the availability of modules for the different node architectures. - -E.g.: - -```Bash -$ ml_arch_avail CP2K -CP2K/6.1-foss-2019a: haswell, rome -CP2K/5.1-intel-2018a: sandy, haswell -CP2K/6.1-foss-2019a-spglib: haswell, rome -CP2K/6.1-intel-2018a: sandy, haswell -CP2K/6.1-intel-2018a-spglib: haswell -``` - -shows all modules that match on CP2K, and their respective availability. Note that this will not -work for meta-modules that do not have an installation directory (like some toolchain modules). - -### Private User Module Files - -Private module files allow you to load your own installed software into your environment and to -handle different versions without getting into conflicts. - -You only have to call `module use <path to your module files>`, which adds your directory to the -list of module directories that are searched by the `module` command. Within the privatemodules -directory you can add directories for each software you wish to install and add - also in this -directory - a module file for each version you have installed. Further information about modules can -be found at <https://lmod.readthedocs.io> . 
- -**todo** quite old - -This is an example of a private module file: - -```Bash -dolescha@venus:~/module use $HOME/privatemodules - -dolescha@venus:~/privatemodules> ls -null testsoftware - -dolescha@venus:~/privatemodules/testsoftware> ls -1.0 - -dolescha@venus:~> module av -------------------------------- /work/home0/dolescha/privatemodules --------------------------- -null testsoftware/1.0 - -dolescha@venus:~> module load testsoftware -Load testsoftware version 1.0 - -dolescha@venus:~/privatemodules/testsoftware> cat 1.0 -#%Module###################################################################### -## -## testsoftware modulefile -## -proc ModulesHelp { } { - puts stderr "Loads testsoftware" -} - -set version 1.0 -set arch x86_64 -set path /home/<user>/opt/testsoftware/$version/$arch/ - -prepend-path PATH $path/bin -prepend-path LD_LIBRARY_PATH $path/lib - -if [ module-info mode load ] { - puts stderr "Load testsoftware version $version" -} -``` - -### Private Project Module Files - -Private module files allow you to load your group-wide installed software into your environment and -to handle different versions without getting into conflicts. - -The module files have to be stored in your global projects directory, e.g. -`/projects/p_projectname/privatemodules`. An example for a module file can be found in the section -above. - -To use a project-wide module file you have to add the path to the module file to the module -environment with following command `module use /projects/p_projectname/privatemodules`. - -After that, the modules are available in your module environment and you -can load the modules with `module load` . - -## Misc - -An automated [backup](../data_lifecycle/file_systems.md#backup-and-snapshots-of-the-file-system) -system provides security for the HOME-directories on `Taurus` and `Venus` on a daily basis. 
This is -the reason why we urge our users to store (large) temporary data (like checkpoint files) on the -/scratch -Filesystem or at local scratch disks. - -`Please note`: We have set `ulimit -c 0` as a default to prevent you from filling the disk with the -dump of a crashed program. `bash` -users can use `ulimit -Sc unlimited` to enable the debugging via -analyzing the core file (limit coredumpsize unlimited for tcsh). diff --git a/doc.zih.tu-dresden.de/docs/software/scs5_software.md b/doc.zih.tu-dresden.de/docs/software/scs5_software.md index f1606236729c0354e5129b71c5c93c14325cb097..b5a1bef60d20cdc9989c8db82f766d31a96d3cdc 100644 --- a/doc.zih.tu-dresden.de/docs/software/scs5_software.md +++ b/doc.zih.tu-dresden.de/docs/software/scs5_software.md @@ -21,7 +21,7 @@ remove it and accept the new one after comparing its fingerprint with those list ## Using Software Modules Starting with SCS5, we only provide -[Lmod](../software/runtime_environment.md#lmod-an-alternative-module-implementation) as the +[Lmod](../software/modules.md#lmod-an-alternative-module-implementation) as the environment module tool of choice. 
As usual, you can get a list of the available software modules via: @@ -38,7 +38,7 @@ There is a special module that is always loaded (sticky) called | | | | |----------------|-------------------------------------------------|---------| | modenv/scs5 | SCS5 software | default | -| modenv/ml | HPC-DA software (for use on the "ml" partition) | | +| modenv/ml | software for data analytics (partition ml) | | | modenv/classic | Manually built pre-SCS5 (AE4.0) software | hidden | The old modules (pre-SCS5) are still available after loading the diff --git a/doc.zih.tu-dresden.de/mkdocs.yml b/doc.zih.tu-dresden.de/mkdocs.yml index 9dd6e9a3a37fb7c59d5756b2a7970b102f81c283..9c30346df2e9f98d5aef67229cd45902ea3bbf42 100644 --- a/doc.zih.tu-dresden.de/mkdocs.yml +++ b/doc.zih.tu-dresden.de/mkdocs.yml @@ -24,7 +24,7 @@ nav: - Overview: software/overview.md - Environment: - Modules: software/modules.md - - Runtime Environment: software/runtime_environment.md + - Private Modulefiles: software/private_modules.md - Custom EasyBuild Modules: software/custom_easy_build_environment.md - Python Virtual Environments: software/python_virtual_environments.md - Containers: diff --git a/doc.zih.tu-dresden.de/wordlist.aspell b/doc.zih.tu-dresden.de/wordlist.aspell index dadfc9dd82834cd0bbdfb7e485ca21262a69cd8a..52ab5e8401892789922402dcf4a9d186ca67ca9b 100644 --- a/doc.zih.tu-dresden.de/wordlist.aspell +++ b/doc.zih.tu-dresden.de/wordlist.aspell @@ -1,4 +1,4 @@ -personal_ws-1.1 en 203 +personal_ws-1.1 en 203 Abaqus ALLREDUCE Altix @@ -55,7 +55,6 @@ ddl DDP DDR DFG -dir distr DistributedDataParallel DMTCP @@ -135,6 +134,7 @@ init inode IOPS IPs +ISA Itanium jobqueue jpg @@ -150,8 +150,9 @@ LAPACK lapply Leichtbau LINPACK -linter Linter +linter +lmod LoadLeveler localhost lsf @@ -171,6 +172,8 @@ mkdocs MKL MNIST modenv +modenvs +modulefile Montecito mountpoint mpi @@ -186,7 +189,6 @@ multiphysics Multiphysics multithreaded Multithreading -MultiThreading NAMD natively nbsp @@ -215,14 +217,14 @@ 
OpenBLAS OpenCL OpenGL OpenMP -openmpi OpenMPI +openmpi OpenSSH Opteron OTF overfitting -pandarallel Pandarallel +pandarallel PAPI parallelization parallelize @@ -240,8 +242,8 @@ PMI png PowerAI ppc -pre Pre +pre Preload preloaded preloading @@ -271,8 +273,8 @@ RSS RStudio Rsync runnable -runtime Runtime +runtime sacct salloc Sandybridge @@ -320,8 +322,8 @@ TensorFlow TFLOPS Theano tmp -todo ToDo +todo toolchain toolchains torchvision @@ -358,6 +360,6 @@ XLC XLF Xming yaml -zih ZIH +zih ZIH's