From 4c68c65d76ebde9f27a4fcb3ed3ff0c1d78847f4 Mon Sep 17 00:00:00 2001 From: Martin Schroschk <martin.schroschk@tu-dresden.de> Date: Tue, 15 Jun 2021 14:09:51 +0200 Subject: [PATCH] Add auto transfered pages from old documentation ... serve as starting point for new content --- twiki2md/root/Access/ProjectRequestForm.md | 78 ++ twiki2md/root/AnnouncementOfQuotas.md | 62 ++ twiki2md/root/Applications/Bioinformatics.md | 59 ++ twiki2md/root/Applications/CFD.md | 90 ++ .../CustomEasyBuildEnvironment.md | 133 +++ twiki2md/root/Applications/DeepLearning.md | 372 +++++++ twiki2md/root/Applications/Electrodynamics.md | 17 + twiki2md/root/Applications/FEMSoftware.md | 226 ++++ twiki2md/root/Applications/Mathematics.md | 214 ++++ .../root/Applications/NanoscaleSimulations.md | 216 ++++ .../root/Applications/SoftwareModulesList.md | 962 ++++++++++++++++++ twiki2md/root/Applications/VirtualDesktops.md | 89 ++ twiki2md/root/Applications/Visualization.md | 219 ++++ twiki2md/root/BatchSystems/LoadLeveler.md | 415 ++++++++ twiki2md/root/BatchSystems/PlatformLSF.md | 309 ++++++ twiki2md/root/BatchSystems/WindowsBatch.md | 69 ++ twiki2md/root/Cloud/BeeGFS.md | 146 +++ .../DesktopCloudVisualization.md | 54 + .../Compendium.DataManagement/WorkSpaces.md | 257 +++++ .../root/Compendium.HPCDA/AlphaCentauri.md | 206 ++++ twiki2md/root/Compendium.HPCDA/PowerAI.md | 82 ++ twiki2md/root/Compendium.HPCDA/WarmArchive.md | 27 + .../root/Compendium.SystemTaurus/RomeNodes.md | 99 ++ .../root/Compendium.WebHome/BatchSystems.md | 64 ++ twiki2md/root/Compendium/CheckpointRestart.md | 164 +++ .../SingularityExampleDefinitions.md | 108 ++ .../root/Container/SingularityRecipeHints.md | 75 ++ twiki2md/root/Container/VMTools.md | 142 +++ twiki2md/root/DataManagement/DataMover.md | 85 ++ twiki2md/root/DataManagement/ExportNodes.md | 134 +++ twiki2md/root/DataManagement/FileSystems.md | 259 +++++ .../DataManagement/IntermediateArchive.md | 42 + twiki2md/root/DataMover/Phase2Migration.md | 53 + .../DebuggingTools/MPIUsageErrorDetection.md | 81 ++ twiki2md/root/HPCDA/Dask.md | 136 +++ twiki2md/root/HPCDA/DataAnalyticsWithR.md | 423 ++++++++ twiki2md/root/HPCDA/GetStartedWithHPCDA.md | 405 ++++++++ twiki2md/root/HPCDA/NvmeStorage.md | 16 + twiki2md/root/HPCDA/Power9.md | 11 + twiki2md/root/HPCDA/PyTorch.md | 332 ++++++ twiki2md/root/HPCDA/Python.md | 279 +++++ twiki2md/root/HPCDA/TensorFlow.md | 277 +++++ .../root/HPCDA/TensorFlowContainerOnHPCDA.md | 85 ++ .../root/HPCDA/TensorFlowOnJupyterNotebook.md | 286 ++++++ twiki2md/root/Hardware/HardwareAltix.md | 91 ++ twiki2md/root/Hardware/HardwareAtlas.md | 47 + twiki2md/root/Hardware/HardwareDeimos.md | 42 + twiki2md/root/Hardware/HardwarePhobos.md | 38 + twiki2md/root/Hardware/HardwareTitan.md | 53 + twiki2md/root/Hardware/HardwareVenus.md | 22 + twiki2md/root/HardwareTaurus/SDFlex.md | 42 + twiki2md/root/HardwareTriton.md | 48 + twiki2md/root/Introduction.md | 18 + .../root/JupyterHub/JupyterHubForTeaching.md | 161 +++ twiki2md/root/Login/SSHMitPutty.md | 79 ++ twiki2md/root/Login/SecurityRestrictions.md | 31 + twiki2md/root/PerformanceTools/IOTrack.md | 27 + twiki2md/root/PerformanceTools/PapiLibrary.md | 55 + twiki2md/root/PerformanceTools/PerfTools.md | 236 +++++ twiki2md/root/PerformanceTools/ScoreP.md | 136 +++ twiki2md/root/PerformanceTools/Vampir.md | 186 ++++ twiki2md/root/PerformanceTools/VampirTrace.md | 103 ++ .../Slurm/BindingAndDistributionOfTasks.md | 230 +++++ .../root/SoftwareDevelopment/Compilers.md | 121 +++ .../root/SoftwareDevelopment/Debuggers.md | 247 +++++ 
.../SoftwareDevelopment/DebuggingTools.md | 20 + .../root/SoftwareDevelopment/Libraries.md | 100 ++ .../root/SoftwareDevelopment/Miscellaneous.md | 50 + .../SoftwareDevelopment/PerformanceTools.md | 15 + twiki2md/root/SystemAtlas.md | 126 +++ .../root/SystemTaurus/EnergyMeasurement.md | 310 ++++++ twiki2md/root/SystemTaurus/HardwareTaurus.md | 110 ++ twiki2md/root/SystemTaurus/KnlNodes.md | 55 + .../SystemTaurus/RunningNxGpuAppsInOneJob.md | 85 ++ twiki2md/root/SystemTaurus/Slurm.md | 570 +++++++++++ .../root/SystemVenus/RamDiskDocumentation.md | 59 ++ twiki2md/root/TensorFlow/Keras.md | 245 +++++ twiki2md/root/TermsOfUse.md | 54 + .../root/Unicore_access/UNICORERestAPI.md | 20 + twiki2md/root/VenusOpen.md | 9 + .../WebCreateNewTopic/ProjectManagement.md | 118 +++ .../root/WebCreateNewTopic/SlurmExamples.md | 1 + twiki2md/root/WebCreateNewTopic/WebVNC.md | 91 ++ twiki2md/root/WebHome.md | 47 + twiki2md/root/WebHome/Access.md | 138 +++ twiki2md/root/WebHome/Accessibility.md | 54 + twiki2md/root/WebHome/Applications.md | 52 + ...orks:ApacheSparkApacheFlinkApacheHadoop.md | 194 ++++ twiki2md/root/WebHome/BuildingSoftware.md | 42 + twiki2md/root/WebHome/CXFSEndOfSupport.md | 46 + twiki2md/root/WebHome/Cloud.md | 113 ++ twiki2md/root/WebHome/Container.md | 280 +++++ twiki2md/root/WebHome/DataManagement.md | 16 + twiki2md/root/WebHome/FurtherDocumentation.md | 81 ++ twiki2md/root/WebHome/GPUProgramming.md | 52 + twiki2md/root/WebHome/HPCDA.md | 79 ++ .../root/WebHome/HPCStorageConcept2019.md | 67 ++ twiki2md/root/WebHome/Hardware.md | 17 + twiki2md/root/WebHome/Impressum.md | 14 + twiki2md/root/WebHome/JupyterHub.md | 374 +++++++ twiki2md/root/WebHome/Login.md | 81 ++ twiki2md/root/WebHome/MachineLearning.md | 59 ++ twiki2md/root/WebHome/MigrateToAtlas.md | 115 +++ twiki2md/root/WebHome/NoIBJobs.md | 48 + .../root/WebHome/PreservationResearchData.md | 112 ++ twiki2md/root/WebHome/RuntimeEnvironment.md | 200 ++++ twiki2md/root/WebHome/SCS5Software.md | 161 +++ twiki2md/root/WebHome/Slurmfeatures.md | 40 + twiki2md/root/WebHome/SoftwareDevelopment.md | 59 ++ twiki2md/root/WebHome/StepByStepTaurus.md | 10 + twiki2md/root/WebHome/SystemAltix.md | 74 ++ twiki2md/root/WebHome/SystemTaurus.md | 216 ++++ twiki2md/root/WebHome/SystemVenus.md | 86 ++ twiki2md/root/WebHome/TaurusII.md | 31 + twiki2md/root/WebHome/Test.md | 2 + .../root/WebHome/TypicalProjectSchedule.md | 546 ++++++++++ 116 files changed, 15415 insertions(+) create mode 100644 twiki2md/root/Access/ProjectRequestForm.md create mode 100644 twiki2md/root/AnnouncementOfQuotas.md create mode 100644 twiki2md/root/Applications/Bioinformatics.md create mode 100644 twiki2md/root/Applications/CFD.md create mode 100644 twiki2md/root/Applications/CustomEasyBuildEnvironment.md create mode 100644 twiki2md/root/Applications/DeepLearning.md create mode 100644 twiki2md/root/Applications/Electrodynamics.md create mode 100644 twiki2md/root/Applications/FEMSoftware.md create mode 100644 twiki2md/root/Applications/Mathematics.md create mode 100644 twiki2md/root/Applications/NanoscaleSimulations.md create mode 100644 twiki2md/root/Applications/SoftwareModulesList.md create mode 100644 twiki2md/root/Applications/VirtualDesktops.md create mode 100644 twiki2md/root/Applications/Visualization.md create mode 100644 twiki2md/root/BatchSystems/LoadLeveler.md create mode 100644 twiki2md/root/BatchSystems/PlatformLSF.md create mode 100644 twiki2md/root/BatchSystems/WindowsBatch.md create mode 100644 twiki2md/root/Cloud/BeeGFS.md create mode 100644 
twiki2md/root/Compendium.Applications/DesktopCloudVisualization.md create mode 100644 twiki2md/root/Compendium.DataManagement/WorkSpaces.md create mode 100644 twiki2md/root/Compendium.HPCDA/AlphaCentauri.md create mode 100644 twiki2md/root/Compendium.HPCDA/PowerAI.md create mode 100644 twiki2md/root/Compendium.HPCDA/WarmArchive.md create mode 100644 twiki2md/root/Compendium.SystemTaurus/RomeNodes.md create mode 100644 twiki2md/root/Compendium.WebHome/BatchSystems.md create mode 100644 twiki2md/root/Compendium/CheckpointRestart.md create mode 100644 twiki2md/root/Container/SingularityExampleDefinitions.md create mode 100644 twiki2md/root/Container/SingularityRecipeHints.md create mode 100644 twiki2md/root/Container/VMTools.md create mode 100644 twiki2md/root/DataManagement/DataMover.md create mode 100644 twiki2md/root/DataManagement/ExportNodes.md create mode 100644 twiki2md/root/DataManagement/FileSystems.md create mode 100644 twiki2md/root/DataManagement/IntermediateArchive.md create mode 100644 twiki2md/root/DataMover/Phase2Migration.md create mode 100644 twiki2md/root/DebuggingTools/MPIUsageErrorDetection.md create mode 100644 twiki2md/root/HPCDA/Dask.md create mode 100644 twiki2md/root/HPCDA/DataAnalyticsWithR.md create mode 100644 twiki2md/root/HPCDA/GetStartedWithHPCDA.md create mode 100644 twiki2md/root/HPCDA/NvmeStorage.md create mode 100644 twiki2md/root/HPCDA/Power9.md create mode 100644 twiki2md/root/HPCDA/PyTorch.md create mode 100644 twiki2md/root/HPCDA/Python.md create mode 100644 twiki2md/root/HPCDA/TensorFlow.md create mode 100644 twiki2md/root/HPCDA/TensorFlowContainerOnHPCDA.md create mode 100644 twiki2md/root/HPCDA/TensorFlowOnJupyterNotebook.md create mode 100644 twiki2md/root/Hardware/HardwareAltix.md create mode 100644 twiki2md/root/Hardware/HardwareAtlas.md create mode 100644 twiki2md/root/Hardware/HardwareDeimos.md create mode 100644 twiki2md/root/Hardware/HardwarePhobos.md create mode 100644 twiki2md/root/Hardware/HardwareTitan.md create mode 100644 twiki2md/root/Hardware/HardwareVenus.md create mode 100644 twiki2md/root/HardwareTaurus/SDFlex.md create mode 100644 twiki2md/root/HardwareTriton.md create mode 100644 twiki2md/root/Introduction.md create mode 100644 twiki2md/root/JupyterHub/JupyterHubForTeaching.md create mode 100644 twiki2md/root/Login/SSHMitPutty.md create mode 100644 twiki2md/root/Login/SecurityRestrictions.md create mode 100644 twiki2md/root/PerformanceTools/IOTrack.md create mode 100644 twiki2md/root/PerformanceTools/PapiLibrary.md create mode 100644 twiki2md/root/PerformanceTools/PerfTools.md create mode 100644 twiki2md/root/PerformanceTools/ScoreP.md create mode 100644 twiki2md/root/PerformanceTools/Vampir.md create mode 100644 twiki2md/root/PerformanceTools/VampirTrace.md create mode 100644 twiki2md/root/Slurm/BindingAndDistributionOfTasks.md create mode 100644 twiki2md/root/SoftwareDevelopment/Compilers.md create mode 100644 twiki2md/root/SoftwareDevelopment/Debuggers.md create mode 100644 twiki2md/root/SoftwareDevelopment/DebuggingTools.md create mode 100644 twiki2md/root/SoftwareDevelopment/Libraries.md create mode 100644 twiki2md/root/SoftwareDevelopment/Miscellaneous.md create mode 100644 twiki2md/root/SoftwareDevelopment/PerformanceTools.md create mode 100644 twiki2md/root/SystemAtlas.md create mode 100644 twiki2md/root/SystemTaurus/EnergyMeasurement.md create mode 100644 twiki2md/root/SystemTaurus/HardwareTaurus.md create mode 100644 twiki2md/root/SystemTaurus/KnlNodes.md create mode 100644 
twiki2md/root/SystemTaurus/RunningNxGpuAppsInOneJob.md create mode 100644 twiki2md/root/SystemTaurus/Slurm.md create mode 100644 twiki2md/root/SystemVenus/RamDiskDocumentation.md create mode 100644 twiki2md/root/TensorFlow/Keras.md create mode 100644 twiki2md/root/TermsOfUse.md create mode 100644 twiki2md/root/Unicore_access/UNICORERestAPI.md create mode 100644 twiki2md/root/VenusOpen.md create mode 100644 twiki2md/root/WebCreateNewTopic/ProjectManagement.md create mode 100644 twiki2md/root/WebCreateNewTopic/SlurmExamples.md create mode 100644 twiki2md/root/WebCreateNewTopic/WebVNC.md create mode 100644 twiki2md/root/WebHome.md create mode 100644 twiki2md/root/WebHome/Access.md create mode 100644 twiki2md/root/WebHome/Accessibility.md create mode 100644 twiki2md/root/WebHome/Applications.md create mode 100644 twiki2md/root/WebHome/BigDataFrameworks:ApacheSparkApacheFlinkApacheHadoop.md create mode 100644 twiki2md/root/WebHome/BuildingSoftware.md create mode 100644 twiki2md/root/WebHome/CXFSEndOfSupport.md create mode 100644 twiki2md/root/WebHome/Cloud.md create mode 100644 twiki2md/root/WebHome/Container.md create mode 100644 twiki2md/root/WebHome/DataManagement.md create mode 100644 twiki2md/root/WebHome/FurtherDocumentation.md create mode 100644 twiki2md/root/WebHome/GPUProgramming.md create mode 100644 twiki2md/root/WebHome/HPCDA.md create mode 100644 twiki2md/root/WebHome/HPCStorageConcept2019.md create mode 100644 twiki2md/root/WebHome/Hardware.md create mode 100644 twiki2md/root/WebHome/Impressum.md create mode 100644 twiki2md/root/WebHome/JupyterHub.md create mode 100644 twiki2md/root/WebHome/Login.md create mode 100644 twiki2md/root/WebHome/MachineLearning.md create mode 100644 twiki2md/root/WebHome/MigrateToAtlas.md create mode 100644 twiki2md/root/WebHome/NoIBJobs.md create mode 100644 twiki2md/root/WebHome/PreservationResearchData.md create mode 100644 twiki2md/root/WebHome/RuntimeEnvironment.md create mode 100644 twiki2md/root/WebHome/SCS5Software.md create mode 100644 twiki2md/root/WebHome/Slurmfeatures.md create mode 100644 twiki2md/root/WebHome/SoftwareDevelopment.md create mode 100644 twiki2md/root/WebHome/StepByStepTaurus.md create mode 100644 twiki2md/root/WebHome/SystemAltix.md create mode 100644 twiki2md/root/WebHome/SystemTaurus.md create mode 100644 twiki2md/root/WebHome/SystemVenus.md create mode 100644 twiki2md/root/WebHome/TaurusII.md create mode 100644 twiki2md/root/WebHome/Test.md create mode 100644 twiki2md/root/WebHome/TypicalProjectSchedule.md diff --git a/twiki2md/root/Access/ProjectRequestForm.md b/twiki2md/root/Access/ProjectRequestForm.md new file mode 100644 index 000000000..d7cdc7cac --- /dev/null +++ b/twiki2md/root/Access/ProjectRequestForm.md @@ -0,0 +1,78 @@ +# Project Request Form + +## first step (requester) + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="picture 2: personal information" width="170" zoom="on +">%ATTACHURL%/request_step1_b.png</span> <span class="twiki-macro IMAGE" +type="frame" align="right" caption="picture 1: login screen" width="170" +zoom="on +">%ATTACHURL%/request_step1_b.png</span> + +The first step is asking for the personal informations of the requester. +**That's you**, not the leader of this project! \<br />If you have an +ZIH-Login, you can use it \<sup>\[Pic 1\]\</sup>. If not, you have to +fill in the whole informations \<sup>\[Pic.:2\]\</sup>. 
<span +class="twiki-macro IMAGE">clear</span> + +## second step (project details) + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="picture 3: project details" width="170" zoom="on +">%ATTACHURL%/request_step2_details.png</span> This Step is asking for +general project Details.\<br />Any project have: + +- a title, at least 20 characters long +- a valid duration + - Projects starts at the first of a month and ends on the last day + of a month. So you are not able to send on the second of a month + a project request which start in this month. + - The approval is for a maximum of one year. Be carfull: a + duratoin from "May, 2013" till "May 2014" has 13 month. +- a selected science, according to the DFG: + <http://www.dfg.de/dfg_profil/gremien/fachkollegien/faecher/index.jsp> +- a sponsorship +- a kind of request +- a project leader/manager + - The leader of this project should hold a professorship + (university) or is the head of the research group. + - If you are this Person, leave this fields free. + +<span class="twiki-macro IMAGE">clear</span> + +## third step (hardware) + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="picture 4: hardware" width="170" zoom="on +">%ATTACHURL%/request_step3_machines.png</span> This step inquire the +required hardware. You can find the specifications [here](Hardware). +\<br />For your guidance: + +- gpu => taurus +- many main memory => venus +- other machines => you know it and don't need this guidance + +<span class="twiki-macro IMAGE">clear</span> + +## fourth step (software) + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="picture 5: software" width="170" zoom="on +">%ATTACHURL%/request_step4_software.png</span> Any information you will +give us in this step, helps us to make a rough estimate, if you are able +to realize your project. For Example: you need matlab. Matlab is only +available on Taurus. <span class="twiki-macro IMAGE">clear</span> + +## fifth step (project description) + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="picture 6: project description" width="170" zoom="on +">%ATTACHURL%/request_step5_description.png</span> <span +class="twiki-macro IMAGE">clear</span> + +## sixth step (summary) + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="picture 8: summary" width="170" zoom="on +">%ATTACHURL%/request_step6.png</span> <span +class="twiki-macro IMAGE">clear</span> diff --git a/twiki2md/root/AnnouncementOfQuotas.md b/twiki2md/root/AnnouncementOfQuotas.md new file mode 100644 index 000000000..60131f4ec --- /dev/null +++ b/twiki2md/root/AnnouncementOfQuotas.md @@ -0,0 +1,62 @@ +# Quotas for the home file system + +The quotas of the home file system are meant to help the users to keep +in touch with their data. Especially in HPC, millions of temporary files +can be created within hours. We have identified this as a main reason +for performance degradation of the HOME file system. To stay in +operation with out HPC systems we regrettably have to fall back to this +unpopular technique. + +Based on a balance between the allotted disk space and the usage over +the time, reasonable quotas (mostly above current used space) for the +projects have been defined. The will be activated by the end of April +2012. + +If a project exceeds its quota (total size OR total number of files) it +cannot submit jobs into the batch system. Running jobs are not affected. 
+The following commands can be used for monitoring: + +- `quota -s -g` shows the file system usage of all groups the user is + a member of. +- `showquota` displays a more convenient output. Use `showquota -h` to + read about its usage. It is not yet available on all machines but we + are working on it. + +**Please mark:** We have no quotas for the single accounts, but for the +project as a whole. There is no feasible way to get the contribution of +a single user to a project's disk usage. + +### Alternatives + +In case a project is above its limits, please + +- remove core dumps, temporary data, +- talk with your colleagues to identify the hotspots, +- check your workflow and use /fastfs for temporary files, +- *systematically* handle your important data: + - for later use (weeks...months) at the HPC systems, build tar + archives with meaningful names or IDs and store them in the [DMF + system](#AnchorDataMigration). Avoid using this system + (`/hpc_fastfs`) for files \< 1 MB! + - refer to the hints for [long term preservation for research + data](PreservationResearchData). + +### No Alternatives + +The current situation is this: + +- `/home` provides about 50 TB of disk space for all systems. Rapidly + changing files (temporary data) decrease the size of usable disk + space since we keep all files in multiple snapshots for 26 weeks. If + the *number* of files comes into the range of a million the backup + has problems handling them. +- The work file system for the clusters is `/fastfs`. Here, we have 60 + TB disk space (without backup). This is the file system of choice + for temporary data. +- About 180 projects have to share our resources, so it makes no sense + at all to simply move the data from `/home` to `/fastfs` or to + `/hpc_fastfs`. + +In case of problems don't hesitate to ask for support. + +Ulf Markwardt, Claudia Schmidt diff --git a/twiki2md/root/Applications/Bioinformatics.md b/twiki2md/root/Applications/Bioinformatics.md new file mode 100644 index 000000000..06d212944 --- /dev/null +++ b/twiki2md/root/Applications/Bioinformatics.md @@ -0,0 +1,59 @@ +# Bioinformatics + +| | | +|-----------------------------------|-------------------------------------------| +| | **module** | +| **[Infernal](#Infernal)** | infernal | +| **[OpenProspect](#OpenProspect)** | openprospect, openprospect/885-mpi | +| **[Phylip](#Phylip)** | phylip | +| **[PhyML](#PhyML)** | phyml/2.4.4, phyml/2.4.5-mpi, phyml/3.0.0 | + +## Infernal + +Infernal ("INFERence of RNA ALignment") is for searching DNA sequence +databases for RNA structure and sequence similarities. It is an +implementation of a special case of profile stochastic context-free +grammars called covariance models (CMs). A CM is like a sequence +profile, but it scores a combination of sequence consensus and RNA +secondary structure consensus, so in many cases, it is more capable of +identifying RNA homologs that conserve their secondary structure more +than their primary sequence. Documentations can be found at [Infernal +homepage](http://infernal.janelia.org) + +A parallel version is available. It can be used at Deimos like: + + bsub -n 4 -e %J_err.txt -a openmpi mpirun.lsf cmsearch --mpi --fil-no-hmm --fil-no-qdb 12smito.cm NC_003179.fas + +## OpenProspect + +The idea of threading is to use an existing protein structure to model +the structure of a new amino acid sequence. OpenProspect is an Open +Source Protein structure threading program. You can even generate your +own Protein Templates to thread sequences against. 
Once a sequence has been aligned to a library of templates, there are
tools to analyze the features of the alignments and help you pick out
the best one.

Documentation can be found at the [OpenProspect
homepage](http://openprospect.sourceforge.net/index.html).

#Phylip

## Phylip

This is a free package of programs for inferring phylogenies and
carrying out certain related tasks. At present it contains 31 programs,
which apply different algorithms to different kinds of data.
Documentation can be found at the [Phylip
homepage](http://cmgm.stanford.edu/phylip).

#PhyML

## PhyML

A simple, fast, and accurate algorithm to estimate large phylogenies by
maximum likelihood.

Documentation can be found at the [PhyML
homepage](http://atgc.lirmm.fr/phyml).

-- Main.UlfMarkwardt - 2009-09-24
diff --git a/twiki2md/root/Applications/CFD.md b/twiki2md/root/Applications/CFD.md new file mode 100644 index 000000000..1d822f378 --- /dev/null +++ b/twiki2md/root/Applications/CFD.md @@ -0,0 +1,90 @@

# Computational Fluid Dynamics (CFD)

|               | **Taurus** | **Venus** | **Module** |
|---------------|------------|-----------|------------|
| **OpenFOAM**  | x          |           | openfoam   |
| **CFX**       | x          | x         | ansys      |
| **Fluent**    | x          | x         | ansys      |
| **ICEM CFD**  | x          | x         | ansys      |
| **STAR-CCM+** | x          |           | star       |

## OpenFOAM

The OpenFOAM (Open Field Operation and Manipulation) CFD Toolbox can
simulate anything from complex fluid flows involving chemical
reactions, turbulence and heat transfer, to solid dynamics,
electromagnetics and the pricing of financial options. OpenFOAM is
produced by [OpenCFD Ltd](http://www.opencfd.co.uk/openfoam/) and is
freely available and open source, licensed under the GNU General Public
Licence.

The command `module spider OpenFOAM` provides the list of installed
OpenFOAM versions. In order to use OpenFOAM, it is mandatory to set the
environment by sourcing the `bashrc` (for users running bash or ksh) or
`cshrc` (for users running tcsh or csh) provided by OpenFOAM:

    module load OpenFOAM/VERSION
    source $FOAM_BASH
    # source $FOAM_CSH

Example OpenFOAM job script:

    #!/bin/bash
    #SBATCH --time=12:00:00                       # walltime
    #SBATCH --ntasks=60                           # number of processor cores (i.e. tasks)
    #SBATCH --mem-per-cpu=500M                    # memory per CPU core
    #SBATCH -J "Test"                             # job name
    #SBATCH --mail-user=mustermann@tu-dresden.de  # email address (only tu-dresden)
    #SBATCH --mail-type=ALL

    OUTFILE="Output"

    module load OpenFOAM
    source $FOAM_BASH

    cd /scratch/<YOURUSERNAME>                    # work directory in /scratch...!
    srun pimpleFoam -parallel > "$OUTFILE"

## Ansys CFX

Ansys CFX is a powerful finite-volume-based program package for
modeling general fluid flow in complex geometries. The main components
of the CFX package are the flow solver cfx5solve, the geometry and mesh
generator cfx5pre, and the post-processor cfx5post.

Example CFX job script:

    #!/bin/bash
    #SBATCH --time=12:00                       # walltime
    #SBATCH --ntasks=4                         # number of processor cores (i.e. tasks)
    #SBATCH --mem-per-cpu=1900M                # memory per CPU core
    #SBATCH --mail-user=.......@tu-dresden.de  # email address (only tu-dresden)
    #SBATCH --mail-type=ALL

    module load ANSYS
    cd /scratch/<YOURUSERNAME>                 # work directory in /scratch...!
    cfx-parallel.sh -double -def StaticMixer.def

## Ansys Fluent

Fluent needs the hostnames and can be run in parallel like this:

    #!/bin/bash
    #SBATCH --time=12:00                       # walltime
    #SBATCH --ntasks=4                         # number of processor cores (i.e. tasks)
    #SBATCH --mem-per-cpu=1900M                # memory per CPU core
    #SBATCH --mail-user=.......@tu-dresden.de  # email address (only tu-dresden)
    #SBATCH --mail-type=ALL
    module load ANSYS

    nodeset -e $SLURM_JOB_NODELIST | xargs -n1 > hostsfile_job_$SLURM_JOBID.txt

    fluent 2ddp -t$SLURM_NTASKS -g -mpi=intel -pinfiniband -cnf=hostsfile_job_$SLURM_JOBID.txt < input.in

To use Fluent interactively, please try:

    module load ANSYS/19.2
    srun -N 1 --cpus-per-task=4 --time=1:00:00 --pty --x11=first bash
    fluent &

## STAR-CCM+

Note: you have to use your own license in order to run STAR-CCM+ on
Taurus, so you have to specify the parameters -licpath and -podkey, see
the example below.

Our installation provides a script `create_rankfile -f CCM` that
generates a hostlist from the SLURM job environment which can be passed
to starccm+, enabling it to run across multiple nodes.

    #!/bin/bash
    #SBATCH --time=12:00                       # walltime
    #SBATCH --ntasks=32                        # number of processor cores (i.e. tasks)
    #SBATCH --mem-per-cpu=2500M                # memory per CPU core
    #SBATCH --mail-user=.......@tu-dresden.de  # email address (only tu-dresden)
    #SBATCH --mail-type=ALL

    module load STAR-CCM+

    LICPATH="port@host"
    PODKEY="your podkey"
    INPUT_FILE="your_simulation.sim"

    starccm+ -collab -rsh ssh -cpubind off -np $SLURM_NTASKS -on $(/sw/taurus/tools/slurmtools/default/bin/create_rankfile -f CCM) -batch -power -licpath $LICPATH -podkey $PODKEY $INPUT_FILE
diff --git a/twiki2md/root/Applications/CustomEasyBuildEnvironment.md b/twiki2md/root/Applications/CustomEasyBuildEnvironment.md new file mode 100644 index 000000000..d482d89a4 --- /dev/null +++ b/twiki2md/root/Applications/CustomEasyBuildEnvironment.md @@ -0,0 +1,133 @@

# EasyBuild

Sometimes the [modules installed in the cluster](SoftwareModulesList)
are not enough for your purposes and you need some other software or a
different version of a software package.

For most commonly used software, chances are high that there is already
a *recipe* that EasyBuild provides, which you can use. But what is
EasyBuild?

[EasyBuild](https://easybuilders.github.io/easybuild/) is the software
used to build and install software on, and create modules for, Taurus.

The aim of this page is to introduce users to working with EasyBuild
and to utilizing it to create modules.

**Prerequisites:** [access](Login) to the Taurus system and basic
knowledge about Linux, [Taurus](SystemTaurus) and the [modules
system](RuntimeEnvironment) on Taurus.
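Before building anything yourself, it can be worth checking whether
EasyBuild itself is already provided as a module. This is only a small
sketch using the standard Lmod query command; the versions it lists
will differ over time:

    # list all provided EasyBuild versions (Lmod query; output changes over time)
    module spider EasyBuild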
+ +\<span style="font-size: 1em;">EasyBuild uses a configuration file +called recipe or "EasyConfig", which contains all the information about +how to obtain and build the software:\</span> + +- Name +- Version +- Toolchain (think: Compiler + some more) +- Download URL +- Buildsystem (e.g. configure && make or cmake && make) +- Config parameters +- Tests to ensure a successful build + +The "Buildsystem" part is implemented in so-called "EasyBlocks" and +contains the common workflow. Sometimes those are specialized to +encapsulate behaviour specific to multiple/all versions of the software. +\<span style="font-size: 1em;">Everything is written in Python, which +gives authors a great deal of flexibility.\</span> + +## Set up a custom module environment and build your own modules + +Installation of the new software (or version) does not require any +specific credentials. + +\<br />Prerequisites: 1 An existing EasyConfig 1 a place to put your +modules. \<span style="font-size: 1em;">Step by step guide:\</span> + +1\. Create a \<a href="WorkSpaces" target="\_blank">workspace\</a> where +you'll install your modules. You need a place where your modules will be +placed. This needs to be done only once : + + ws_allocate -F scratch EasyBuild 50 # + +2\. Allocate nodes. You can do this with interactive jobs (see the +example below) and/or put commands in a batch file and source it. The +latter is recommended for non-interactive jobs, using the command sbatch +in place of srun. For the sake of illustration, we use an interactive +job as an example. The node parameters depend, to some extent, on the +architecture you want to use. ML nodes for the Power9 and others for the +x86. We will use Haswell nodes. + + srun -p haswell -N 1 -c 4 --time=08:00:00 --pty /bin/bash + +\*Using EasyBuild on the login nodes is not allowed\* + +3\. Load EasyBuild module. + + module load EasyBuild + +\<br />4. Specify Workspace. The rest of the guide is based on it. +Please create an environment variable called \`WORKSPACE\` with the +location of your Workspace: + + WORKSPACE=<location_of_your_workspace> # For example: WORKSPACE=/scratch/ws/anpo879a-EasyBuild + +5\. Load the correct modenv according to your current or target +architecture: \`ml modenv/scs5\` for x86 (default) or \`modenv/ml\` for +Power9 (ml partition). Load EasyBuild module + + ml modenv/scs5 + module load EasyBuild + +6\. Set up your environment: + + export EASYBUILD_ALLOW_LOADED_MODULES=EasyBuild,modenv/scs5 + export EASYBUILD_DETECT_LOADED_MODULES=unload + export EASYBUILD_BUILDPATH="/tmp/${USER}-EasyBuild${SLURM_JOB_ID:-}" + export EASYBUILD_SOURCEPATH="${WORKSPACE}/sources" + export EASYBUILD_INSTALLPATH="${WORKSPACE}/easybuild-$(basename $(readlink -f /sw/installed))" + export EASYBUILD_INSTALLPATH_MODULES="${EASYBUILD_INSTALLPATH}/modules" + module use "${EASYBUILD_INSTALLPATH_MODULES}/all" + export LMOD_IGNORE_CACHE=1 + +7\. \<span style="font-size: 13px;">Now search for an existing +EasyConfig: \</span> + + eb --search TensorFlow + +\<span style="font-size: 13px;">8. Build the EasyConfig and its +dependencies\</span> + + eb TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4.eb -r + +\<span style="font-size: 13px;">After this is done (may take A LONG +time), you can load it just like any other module.\</span> + +9\. 
To use your custom-built modules, you only need to rerun steps 4, 5
and 6 and execute the usual:

    module load <name_of_your_module> # For example module load TensorFlow-1.8.0-fosscuda-2018a-Python-3.6.4

The key is the `module use` command, which brings your modules into
scope so that `module load` can find them, and the LMOD_IGNORE_CACHE
line, which makes LMod pick up the custom modules instead of searching
the system cache.

## Troubleshooting

When building your EasyConfig fails, first check the log file mentioned
in the output and scroll to the bottom to see what went wrong.

It might also be helpful to inspect the build environment EasyBuild
uses. For that you can run `eb myEC.eb --dump-env-script`, which
creates a sourceable .env file with `module load` and `export` commands
that show what EasyBuild does before running, e.g., the configure step.

It might also be helpful to use `export LMOD_IGNORE_CACHE=0`.
diff --git a/twiki2md/root/Applications/DeepLearning.md b/twiki2md/root/Applications/DeepLearning.md new file mode 100644 index 000000000..3d2874d07 --- /dev/null +++ b/twiki2md/root/Applications/DeepLearning.md @@ -0,0 +1,372 @@

# Deep learning

**Prerequisites**: To work with deep learning tools you obviously need
[access](Login) to the Taurus system and basic knowledge about Python
and the SLURM manager.

**Aim** of this page is to introduce users to working with deep
learning software on both the ml environment and the scs5 environment
of the Taurus system.

## Deep Learning Software

Please refer to our [List of Modules](SoftwareModulesList) page for a
daily-updated list of the respective software versions that are
currently installed.

## TensorFlow

[TensorFlow](https://www.tensorflow.org/guide/) is a free end-to-end
open-source software library for dataflow and differentiable
programming across a range of tasks.

TensorFlow is available in both main partitions, [ml environment and
scs5 environment](RuntimeEnvironment#Module_Environments), under the
module name "TensorFlow". However, for machine learning and deep
learning purposes, we recommend using the ml partition
([HPC-DA](HPCDA)). For example:

    module load TensorFlow

There are numerous possibilities for working with
[TensorFlow](TensorFlow) on Taurus. For all examples on this page, the
default scs5 partition is used. Generally, the easiest way is to use
the [module system](RuntimeEnvironment#Module_Environments) and a
Python virtual environment (test case). However, in some cases you may
need a directly installed TensorFlow stable or nightly release. For
this purpose, use [EasyBuild](CustomEasyBuildEnvironment),
[containers](TensorFlowContainerOnHPCDA), and see [the
example](https://www.tensorflow.org/install/pip). For examples of using
TensorFlow on the ml partition with the module system, see the
[TensorFlow page for HPC-DA](TensorFlow).

Note: If you are going to use a manually installed TensorFlow release,
we recommend using only stable versions.
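As a quick sanity check, the loaded module can be verified from inside
a small interactive GPU job. The following is only a sketch: the
partition name `gpu2` and the module names follow the examples used
elsewhere on this page, and the exact output depends on the installed
TensorFlow version.

    # allocate one GPU interactively (partition name as used in the examples on this page)
    srun -p gpu2 --gres=gpu:1 --mem=4000 --time=00:10:00 --pty bash -l

    # inside the job: load the environment and check that TensorFlow sees the GPU
    module purge
    module load modenv/scs5
    module load TensorFlow
    python -c "import tensorflow as tf; print(tf.__version__)"
    # for the TensorFlow 1.x releases referenced on this page:
    python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"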
## Keras

[Keras](https://keras.io/) is a high-level neural network API, written
in Python and capable of running on top of
[TensorFlow](https://github.com/tensorflow/tensorflow). Keras is
available in both environments, [ml environment and scs5
environment](RuntimeEnvironment#Module_Environments), under the module
name "Keras".

For all examples on this page, the default scs5 partition is used.
There are numerous possibilities for working with
[TensorFlow](TensorFlow) and Keras on Taurus. Generally, the easiest
way is to use the [module
system](RuntimeEnvironment#Module_Environments) and a Python virtual
environment (test case); see the TensorFlow part above. For examples of
using Keras on the ml partition with the module system, see the [Keras
page for HPC-DA](Keras).

Keras uses TensorFlow as its backend. As mentioned in the Keras
documentation, Keras is also capable of running on a Theano backend.
However, since Theano has been abandoned by its developers, we do not
recommend using Theano anymore. If you wish to use the Theano backend,
you need to install it manually. To use the TensorFlow backend, please
do not forget to load the corresponding TensorFlow module; TensorFlow
should be loaded automatically as a dependency.

### Test case: Keras with TensorFlow on MNIST data

Go to a directory on Taurus, get Keras for the examples and go to the
examples:

    git clone https://github.com/fchollet/keras.git
    cd keras/examples/

If you do not specify a Keras backend, then TensorFlow is used as the
default.

Job file (schedule the job with sbatch, check the status with
`squeue -u <username>`):

    #!/bin/bash
    #SBATCH --gres=gpu:1                      # 1 - using one gpu, 2 - for using 2 gpus
    #SBATCH --mem=8000
    #SBATCH -p gpu2                           # select the type of nodes (options: haswell, smp, sandy, west, gpu, ml) K80 GPUs on Haswell nodes
    #SBATCH --time=00:30:00
    #SBATCH -o HLR_<name_of_your_script>.out  # save output under HLR_${SLURMJOBID}.out
    #SBATCH -e HLR_<name_of_your_script>.err  # save error messages under HLR_${SLURMJOBID}.err

    module purge                              # purge if you already have modules loaded
    module load modenv/scs5                   # load scs5 environment
    module load Keras                         # load Keras module
    module load TensorFlow                    # load TensorFlow module

    # if you see 'broken pipe' errors (might happen in interactive sessions after the second srun command), uncomment the line below
    # module load h5py

    python mnist_cnn.py

Keep in mind that you need to put the bash script in the same folder as
the executable file or specify the path.

Example output:

    x_train shape: (60000, 28, 28, 1)
    60000 train samples
    10000 test samples
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/12

    128/60000 [..............................] - ETA: 12:08 - loss: 2.3064 - acc: 0.0781
    256/60000 [..............................] - ETA: 7:04 - loss: 2.2613 - acc: 0.1523
    384/60000 [..............................] - ETA: 5:22 - loss: 2.2195 - acc: 0.2005

    ...
+ + 60000/60000 [==============================] - 128s 2ms/step - loss: 0.0296 - acc: 0.9905 - val_loss: 0.0268 - val_acc: 0.9911 + Test loss: 0.02677746053306255 + Test accuracy: 0.9911 + +## Datasets + +There are many different datasets designed for research purposes. If you +would like to download some of them, first of all, keep in mind that +many machine learning libraries have direct access to public datasets +without downloading it (for example [TensorFlow +Datasets](https://www.tensorflow.org/datasets)\<span style="font-size: +1em; color: #444444;">). \</span> + +\<span style="font-size: 1em; color: #444444;">If you still need to +download some datasets, first of all, be careful with the size of the +datasets which you would like to download (some of them have a size of +few Terabytes). Don't download what you really not need to use! Use +login nodes only for downloading small files (hundreds of the +megabytes). For downloading huge files use \</span>\<a href="DataMover" +target="\_blank">Datamover\</a>\<span style="font-size: 1em; color: +#444444;">. For example, you can use command **\<span +class="WYSIWYG_TT">dtwget \</span>**(it is an analogue of the general +wget command). This command submits a job to the data transfer machines. +If you need to download or allocate massive files (more than one +terabyte) please contact the support before.\</span> + +### The ImageNet dataset + +\<span style="font-size: 1em;">The \</span> [ **ImageNet** +](http://www.image-net.org/)\<span style="font-size: 1em;">project is a +large visual database designed for use in visual object recognition +software research. In order to save space in the file system by avoiding +to have multiple duplicates of this lying around, we have put a copy of +the ImageNet database (ILSVRC2012 and ILSVR2017) under\</span>** +/scratch/imagenet**\<span style="font-size: 1em;"> which you can use +without having to download it again. For the future, the Imagenet +dataset will be available in **/warm_archive.**ILSVR2017 also includes a +dataset for recognition objects from a video. Please respect the +corresponding \</span> [Terms of +Use](http://image-net.org/download-faq)\<span style="font-size: +1em;">.\</span> + +## Jupyter notebook + +Jupyter notebooks are a great way for interactive computing in your web +browser. Jupyter allows working with data cleaning and transformation, +numerical simulation, statistical modelling, data visualization and of +course with machine learning. + +There are two general options on how to work Jupyter notebooks using +HPC: remote jupyter server and jupyterhub. + +These sections show how to run and set up a remote jupyter server within +a sbatch GPU job and which modules and packages you need for that. + +\<span style="font-size: 1em; color: #444444;">%RED%Note:<span +class="twiki-macro ENDCOLOR"></span> On Taurus, there is a \</span>\<a +href="JupyterHub" target="\_self">jupyterhub\</a>\<span +style="font-size: 1em; color: #444444;">, where you do not need the +manual server setup described below and can simply run your Jupyter +notebook on HPC nodes. Keep in mind that with Jupyterhub you can't work +with some special instruments. However general data analytics tools are +available.\</span> + +The remote Jupyter server is able to offer more freedom with settings +and approaches. 
+ +Note: Jupyterhub is could be under construction + +### Preparation phase (optional) + +\<span style="font-size: 1em;">On Taurus, start an interactive session +for setting up the environment:\</span> + + srun --pty -n 1 --cpus-per-task=2 --time=2:00:00 --mem-per-cpu=2500 --x11=first bash -l -i + +Create a new subdirectory in your home, e.g. Jupyter + + mkdir Jupyter + cd Jupyter + +There are two ways how to run Anaconda. The easiest way is to load the +Anaconda module. The second one is to download Anaconda in your home +directory. + +1\. Load Anaconda module (recommended): + + module load modenv/scs5 + module load Anaconda3 + +2\. Download latest Anaconda release (see example below) and change the +rights to make it an executable script and run the installation script: + + wget https://repo.continuum.io/archive/Anaconda3-2019.03-Linux-x86_64.sh + chmod 744 Anaconda3-2019.03-Linux-x86_64.sh + ./Anaconda3-2019.03-Linux-x86_64.sh + + (during installation you have to confirm the licence agreement) + +\<span style="font-size: 1em;">Next step will install the anaconda +environment into the home directory (/home/userxx/anaconda3). Create a +new anaconda environment with the name "jnb".\</span> + + conda create --name jnb + +### Set environmental variables on Taurus + +\<span style="font-size: 1em;">In shell activate previously created +python environment (you can deactivate it also manually) and Install +jupyter packages for this python environment:\</span> + + source activate jnb + conda install jupyter + +\<span style="font-size: 1em;">If you need to adjust the config, you +should create the template. Generate config files for jupyter notebook +server:\</span> + + jupyter notebook --generate-config + +Find a path of the configuration file, usually in the home under +.jupyter directory, e.g.\<br +/>/home//.jupyter/jupyter_notebook_config.py + +\<br />Set a password (choose easy one for testing), which is needed +later on to log into the server in browser session: + + jupyter notebook password + Enter password: + Verify password: + +you will get a message like that: + + [NotebookPasswordApp] Wrote *hashed password* to /home/<zih_user>/.jupyter/jupyter_notebook_config.json + +I order to create an SSL certificate for https connections, you can +create a self-signed certificate: + + openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mykey.key -out mycert.pem + +fill in the form with decent values + +Possible entries for your jupyter config +(\_.jupyter/jupyter_notebook*config.py*). 
Uncomment below lines: + + c.NotebookApp.certfile = u'<path-to-cert>/mycert.pem' + c.NotebookApp.keyfile = u'<path-to-cert>/mykey.key' + + # set ip to '*' otherwise server is bound to localhost only + c.NotebookApp.ip = '*' + c.NotebookApp.open_browser = False + + # copy hashed password from the jupyter_notebook_config.json + c.NotebookApp.password = u'<your hashed password here>' + c.NotebookApp.port = 9999 + c.NotebookApp.allow_remote_access = True + +Note: \<path-to-cert> - path to key and certificate files, for example: +('/home/\<username>/mycert.pem') + +### SLURM job file to run the jupyter server on Taurus with GPU (1x K80) (also works on K20) + + #!/bin/bash -l + #SBATCH --gres=gpu:1 # request GPU + #SBATCH --partition=gpu2 # use GPU partition + #SBATCH --output=notebok_output.txt + #SBATCH --nodes=1 + #SBATCH --ntasks=1 + #SBATCH --time=02:30:00 + #SBATCH --mem=4000M + #SBATCH -J "jupyter-notebook" # job-name + #SBATCH -A <name_of_your_project> + + unset XDG_RUNTIME_DIR # might be required when interactive instead of sbatch to avoid 'Permission denied error' + srun jupyter notebook + +Start the script above (e.g. with the name jnotebook) with sbatch +command: + + sbatch jnotebook.slurm + +If you have a question about sbatch script see the article about \<a +href="Slurm" target="\_blank">SLURM\</a> + +Check by the command: '\<span>tail notebook_output.txt'\</span> the +status and the **token** of the server. It should look like this: + + https://(taurusi2092.taurus.hrsk.tu-dresden.de or 127.0.0.1):9999/ + +\<span style="font-size: 1em;">You can see the \</span>**server node's +hostname**\<span style="font-size: 1em;">by the command: +'\</span>\<span>squeue -u \<username>'\</span>\<span style="font-size: +1em;">.\</span> + +\<span style="color: #222222; font-size: 1.231em;">Remote connect to the +server\</span> + +There are two options on how to connect to the server: + +\<span style="font-size: 1em;">1. You can create an ssh tunnel if you +have problems with the solution above.\</span> \<span style="font-size: +1em;">Open the other terminal and configure ssh tunnel: \</span>\<span +style="font-size: 1em;">(look up connection values in the output file of +slurm job, e.g.)\</span> (recommended): + + node=taurusi2092 #see the name of the node with squeue -u <your_login> + localport=8887 #local port on your computer + remoteport=9999 #pay attention on the value. It should be the same value as value in the notebook_output.txt + ssh -fNL ${localport}:${node}:${remoteport} <zih_user>@taurus.hrsk.tu-dresden.de #configure of the ssh tunnel for connection to your remote server + pgrep -f "ssh -fNL ${localport}" #verify that tunnel is alive + +\<span style="font-size: 1em;">2. On your client (local machine) you now +can connect to the server. You need to know the\</span>** node's +hostname**\<span style="font-size: 1em;">, the \</span> **port** \<span +style="font-size: 1em;"> of the server and the \</span> **token** \<span +style="font-size: 1em;"> to login (see paragraph above).\</span> + +You can connect directly if you know the IP address (just ping the +node's hostname while logged on Taurus). + + #comand on remote terminal + taurusi2092$> host taurusi2092 + # copy IP address from output + # paste IP to your browser or call on local terminal e.g. + local$> firefox https://<IP>:<PORT> # https important to use SSL cert + +To login into the jupyter notebook site, you have to enter the +**token**. (<https://localhost:8887>). 
Now you can create and execute +notebooks on Taurus with GPU support. + +%RED%Note:<span class="twiki-macro ENDCOLOR"></span> If you would like +to use \<a href="JupyterHub" target="\_self">jupyterhub\</a> after using +a remote manually configurated jupyter server (example above) you need +to change the name of the configuration file +(/home//.jupyter/jupyter_notebook_config.py) to any other. + +### F.A.Q + +Q: - I have an error to connect to the Jupyter server (e.g. "open +failed: administratively prohibited: open failed") + +A: - Check the settings of your \<span style="font-size: 1em;">jupyter +config file. Is it all necessary lines uncommented, the right path to +cert and key files, right hashed password from .json file? Check is the +used local port \<a +href="<https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers>" +target="\_blank">available\</a>? Check local settings e.g. +(/etc/ssh/sshd_config, /etc/hosts)\</span> + +Q: I have an error during the start of the interactive session (e.g. +PMI2_Init failed to initialize. Return code: 1) + +A: Probably you need to provide --mpi=none to avoid ompi errors (). +\<span style="font-size: 1em;">srun --mpi=none --reservation \<...> -A +\<...> -t 90 --mem=4000 --gres=gpu:1 --partition=gpu2-interactive --pty +bash -l\</span> diff --git a/twiki2md/root/Applications/Electrodynamics.md b/twiki2md/root/Applications/Electrodynamics.md new file mode 100644 index 000000000..c7b99613d --- /dev/null +++ b/twiki2md/root/Applications/Electrodynamics.md @@ -0,0 +1,17 @@ +# Electromagnetic Field Simulation + +The following applications are installed at ZIH: + +| | | | +|----------|----------|------------| +| | **Mars** | **Deimos** | +| **HFSS** | | 11.0.2 | + +## HFSS + +[HFSS](http://www.ansoft.com/products/hf/hfss/) is the industry-standard +simulation tool for 3D full-wave electromagnetic field simulation. HFSS +provides E- and H-fields, currents, S-parameters and near and far +radiated field results. + +-- Main.mark - 2010-01-06 diff --git a/twiki2md/root/Applications/FEMSoftware.md b/twiki2md/root/Applications/FEMSoftware.md new file mode 100644 index 000000000..05c059ccd --- /dev/null +++ b/twiki2md/root/Applications/FEMSoftware.md @@ -0,0 +1,226 @@ +# FEM Software + +For an up-to-date list of the installed software versions on our +cluster, please refer to [SoftwareModulesList](SoftwareModulesList). + +## Abaqus + +[ABAQUS](http://www.hks.com) is a general-purpose finite-element program +designed for advanced linear and nonlinear engineering analysis +applications with facilities for linking-in user developed material +models, elements, friction laws, etc. + +Eike Dohmen (from Inst.f. Leichtbau und Kunststofftechnik) sent us the +attached description of his ABAQUS calculations. Please try to adapt +your calculations in that way.\<br />Eike is normally a Windows-User and +his description contains also some hints for basic Unix commands. ( +[ABAQUS-SLURM.pdf](%ATTACHURL%/ABAQUS-SLURM.pdf) - only in German) + +Please note: Abaqus calculations should be started with a batch script. +Please read the information about the [Batch System](BatchSystems) +SLURM. + +The detailed Abaqus documentation can be found at +<http://doc.zih.tu-dresden.de/abaqus> (only accessible from within the +TU Dresden campus net). + +\*Example - Thanks to Benjamin Groeger, Inst. f. Leichtbau und +Kunststofftechnik) \* + +1\. 
Prepare an Abaqus input file (here, the example input from Benjamin):

[Rot-modell-BenjaminGroeger.inp](%ATTACHURL%/Rot-modell-BenjaminGroeger.inp)

2\. Prepare a batch script on Taurus like this:

```
#!/bin/bash

### Thanks to Benjamin Groeger, Institut fuer Leichtbau und Kunststofftechnik, 38748
### runs on taurus and needs ca. 20 sec with 4 cpus
### generates files:
### yyyy.com
### yyyy.dat
### yyyy.msg
### yyyy.odb
### yyyy.prt
### yyyy.sim
### yyyy.sta

#SBATCH --nodes=1              ### with >1 node abaqus needs a nodelist
#SBATCH --ntasks-per-node=4
#SBATCH --mem=500              ### memory (sum)
#SBATCH --time=00:04:00
### give a name, whatever you want
#SBATCH --job-name=yyyy
### you get emails when the job has finished or failed
### set your correct email
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=xxxxx.yyyyyy@mailbox.tu-dresden.de
### set your project
#SBATCH -A p_xxxxxxx

### Abaqus has its own MPI
unset SLURM_GTIDS

### load and start
module load ABAQUS/2019
abaqus interactive input=Rot-modell-BenjaminGroeger.inp job=yyyy cpus=4 mp_mode=mpi
```

3\. Start the batch script (the name of our script is
"batch-Rot-modell-BenjaminGroeger"):

```
sbatch batch-Rot-modell-BenjaminGroeger   ---> you will get a job number = JobID (for example 3130522)
```

4\. Check the status of the job:

```
squeue -u your_login   --> in column "ST" (status) you will find R=Running or P=Pending (waiting for resources)
```

## ANSYS

ANSYS is a general-purpose finite-element program for engineering
analysis and includes preprocessing, solution, and post-processing
functions. It is used in a wide range of disciplines for solutions to
mechanical, thermal, and electronic problems. [ANSYS and ANSYS
CFX](http://www.ansys.com) used to be separate packages in the past and
are now combined.

ANSYS, like all other installed software, is organized in so-called
[modules](RuntimeEnvironment). To list the available versions and load
a particular ANSYS version, type

```
module avail ANSYS
...
module load ANSYS/VERSION
```

In general, HPC systems are not designed for interactive GUI work. Even
so, it is possible to start an ANSYS workbench on Taurus (login nodes)
interactively for short tasks. The second and recommended way is to use
batch files. Both modes are documented in the following.

### Using Workbench Interactively

For quick tasks, the ANSYS workbench can be invoked interactively on
the login nodes of Taurus. X11 forwarding needs to be enabled when
establishing the SSH connection. For OpenSSH this option is '-X', and
it is advisable to compress all data via '-C'.

```
# Connect to taurus, e.g. ssh -CX
module load ANSYS/VERSION
runwb2
```

If more time is needed, a CPU has to be allocated like this (see topic
[batch systems](BatchSystems) for further information):

```
module load ANSYS/VERSION
srun -t 00:30:00 --x11=first [SLURM_OPTIONS] --pty bash
runwb2
```

**Note:** The software NICE Desktop Cloud Visualization (DCV) enables
remote access to OpenGL 3D applications running on Taurus using its
GPUs (cf. [virtual desktops](Compendium.VirtualDesktops)). Using ANSYS
together with DCV works as follows:

- Follow the instructions within [virtual
  desktops](Compendium.VirtualDesktops)
- `module load ANSYS`
- `unset SLURM_GTIDS`
- Note the hints w.r.t.
GPU support on dcv side +- \<pre>runwb2\</pre> + +### Using Workbench in Batch Mode + +The ANSYS workbench (runwb2) can also be used in a batch script to start +calculations (the solver, not GUI) from a workbench project into the +background. To do so, you have to specify the -B parameter (for batch +mode), -F for your project file, and can then either add differerent +commands via -E parameters directly, or specify a workbench script file +containing commands via -R. + +**NOTE:** Since the MPI library that ANSYS uses internally (Platform +MPI) has some problems integrating seamlessly with SLURM, you have to +unset the enviroment variable SLURM_GTIDS in your job environment before +running workbench. An example batch script could look like this: + + #!/bin/bash + #SBATCH --time=0:30:00 + #SBATCH --nodes=1 + #SBATCH --ntasks=2 + #SBATCH --mem-per-cpu=1000M + + + unset SLURM_GTIDS # Odd, but necessary! + + module load ANSYS/VERSION + + runwb2 -B -F Workbench_Taurus.wbpj -E 'Project.Update' -E 'Save(Overwrite=True)' + #or, if you wish to use a workbench replay file, replace the -E parameters with: -R mysteps.wbjn + +### Running Workbench in Parallel + +Unfortunately, the number of CPU cores you wish to use cannot simply be +given as a command line parameter to your runwb2 call. Instead, you have +to enter it into an XML file in your home. This setting will then be +used for all your runwb2 jobs. While it is also possible to edit this +setting via the Mechanical GUI, experience shows that this can be +problematic via X-Forwarding and we only managed to use the GUI properly +via [DCV](DesktopCloudVisualization), so we recommend you simply edit +the XML file directly with a text editor of your choice. It is located +under: + +*$HOME/.mw/Application Data/Ansys/v181/SolveHandlers.xml* + +(mind the space in there.) You might have to adjust the ANSYS Version +(v181) in the path. In this file, you can find the parameter + + <MaxNumberProcessors>2</MaxNumberProcessors> + +that you can simply change to something like 16 oder 24. For now, you +should stay within single-node boundaries, because multi-node +calculations require additional parameters. The number you choose should +match your used --cpus-per-task parameter in your sbatch script. + +## COMSOL Multiphysics + +" [COMSOL Multiphysics](http://www.comsol.com) (formerly FEMLAB) is a +finite element analysis, solver and Simulation software package for +various physics and engineering applications, especially coupled +phenomena, or multiphysics." +[\[http://en.wikipedia.org/wiki/COMSOL_Multiphysics Wikipedia]([http://en.wikipedia.org/wiki/COMSOL_Multiphysics Wikipedia) +\] + +Comsol may be used remotely on ZIH machines or locally on the desktop, +using ZIH license server. + +For using Comsol on ZIH machines, the following operating modes (see +Comsol manual) are recommended: + +- Interactive Client Server Mode + +In this mode Comsol runs as server process on the ZIH machine and as +client process on your local workstation. The client process needs a +dummy license for installation, but no license for normal work. Using +this mode is almost undistinguishable from working with a local +installation. It works well with Windows clients. For this operation +mode to work, you must build an SSH tunnel through the firewall of ZIH. +For further information, see the Comsol manual. + +Example for starting the server process (4 cores, 10 GB RAM, max. 
+
+    module load COMSOL
+    srun -c4 -t 8:00 --mem-per-cpu=2500 comsol -np 4 server
+
+- Interactive Job via the Batch System SLURM
+
+<!-- -->
+
+    module load COMSOL
+    srun -n1 -c4 --mem-per-cpu=2500 -t 8:00 --pty --x11=first comsol -np 4
+
+You should also check that the rendering under Options -> Preferences
+-> Graphics and Plot Windows is set to software rendering - then it
+should be possible to work from within the campus network.
+
+- Background Job via the Batch System SLURM
+
+<!-- -->
+
+    #!/bin/bash
+    #SBATCH --time=24:00:00
+    #SBATCH --nodes=2
+    #SBATCH --ntasks-per-node=2
+    #SBATCH --cpus-per-task=12
+    #SBATCH --mem-per-cpu=2500
+
+    module load COMSOL
+    srun comsol -mpi=intel batch -inputfile ./MyInputFile.mph
+
+Submit via: `sbatch <filename>`
+
+## LS-DYNA
+
+Both the shared memory version and the distributed memory version (mpp)
+are installed on all machines.
+
+To run the MPI version on Taurus or Venus you need a batch file (submit
+with `sbatch <filename>`) like:
+
+    #!/bin/bash
+    #SBATCH --time=01:00:00       # walltime
+    #SBATCH --ntasks=16           # number of processor cores (i.e. tasks)
+    #SBATCH --mem-per-cpu=1900M   # memory per CPU core
+
+    module load ls-dyna
+    srun mpp-dyna i=neon_refined01_30ms.k memory=120000000
diff --git a/twiki2md/root/Applications/Mathematics.md b/twiki2md/root/Applications/Mathematics.md
new file mode 100644
index 000000000..6848ea079
--- /dev/null
+++ b/twiki2md/root/Applications/Mathematics.md
@@ -0,0 +1,214 @@
+# Mathematics Applications
+
+The following applications are installed at ZIH:
+
+|                 | **Venus** | **Triton** | **Taurus** | **module**  |
+|-----------------|-----------|------------|------------|-------------|
+| **Mathematica** |           |            | x          | Mathematica |
+| **Matlab**      |           | x          | x          | MATLAB      |
+| **Octave**      |           |            |            |             |
+| **R**           |           | x          | x          | r           |
+
+*Please do not run expensive interactive sessions on the login nodes.
+Instead, use* `srun --pty ...` *to let the batch system place it on a
+compute node.*
+
+## Mathematica
+
+Mathematica is a general computing environment, organizing many
+algorithmic, visualization, and user interface capabilities within a
+document-like user interface paradigm.
+
+To use the graphical frontend remotely, one has to add the Mathematica
+fonts to the local font manager.
+
+For a Linux workstation:
+
+    scp -r taurus.hrsk.tu-dresden.de:/sw/global/applications/mathematica/10.0/SystemFiles/Fonts/Type1/ ~/.fonts
+    xset fp+ ~/.fonts/Type1
+
+For a Windows workstation:
+You have to add the additional Mathematica fonts on your local PC. At
+the end of this webpage you can find an archive with these fonts (.zip).
+
+If you use **Xming** as X server on your PC (see also our information
+about remote access from Windows to Linux):
+
+1. Create a new folder "Mathematica" in the directory "fonts" of the
+   installation directory of Xming (usually: C:\\Programme\\Xming\\fonts\\).
+2. Extract the fonts archive into this new directory "Mathematica".
+   As a result you should have the two directories "DBF" and "Type1".
+3. Add the path to these font files to the file "font-dirs".
+   You can find it in C:\\Programme\\Xming\\.
+
+    # font-dirs
+    # comma-separated list of directories to add to the default font path
+    # defaults are built-ins, misc, TTF, Type1, 75dpi, 100dpi
+    # also allows entries on individual lines
+    C:\Programme\Xming\fonts\dejavu,C:\Programme\Xming\fonts\cyrillic
+    C:\Programme\Xming\fonts\Mathematica\DBF
+    C:\Programme\Xming\fonts\Mathematica\Type1
+    C:\WINDOWS\Fonts
+
+**Mathematica and SLURM:** Please use the batch system SLURM for running
+calculations. This is a small example of a batch script that you should
+prepare and start with the command `sbatch <scriptname>`. The file
+"mathtest.m" is your input script, which contains the calculation
+statements for Mathematica. The file "mathtest.output" will collect the
+results of your calculation.
+
+    #!/bin/bash
+    #SBATCH --output=mathtest.out
+    #SBATCH --error=mathtest.err
+    #SBATCH --time=00:05:00
+    #SBATCH --ntasks=1
+
+    module load Mathematica
+    math -run < mathtest.m > mathtest.output
+
+(see also
+<https://rcc.uchicago.edu/docs/software/environments/mathematica/index.html>)
+
+**NOTE:** Mathematica licenses are limited. There are two types,
+MathKernel and SubMathKernel licenses. Every sequential job you start
+will consume a MathKernel license, of which we only have 39. We do have,
+however, 312 SubMathKernel licenses, so please don't start many
+sequential jobs but try to parallelize your calculation, utilizing
+multiple SubMathKernel licenses per job, in order to achieve a more
+reasonable license usage.
+
+## Matlab
+
+MATLAB is a numerical computing environment and programming language.
+Created by The MathWorks, MATLAB allows easy matrix manipulation,
+plotting of functions and data, implementation of algorithms, creation
+of user interfaces, and interfacing with programs in other languages.
+Although it specializes in numerical computing, an optional toolbox
+interfaces with the Maple symbolic engine, allowing it to be part of a
+full computer algebra system.
+
+Running MATLAB via the batch system could look like this (for 456 MB RAM
+per core and 12 cores reserved). Please adapt this to your needs!
+
+- SLURM (Taurus, Venus):
+
+<!-- -->
+
+    module load MATLAB
+    srun -t 8:00 -c 12 --mem-per-cpu=456 --pty --x11=first bash
+    matlab
+
+With the following command you can see a list of the installed software,
+including the different versions of MATLAB:
+
+    module avail
+
+Please choose one of these, then load the chosen software with the
+command:
+
+    module load MATLAB/version
+
+Or use:
+
+    module load MATLAB
+
+(then you will get the most recent Matlab version. [Refer to the modules
+section for details.](RuntimeEnvironment#Modules))
+
+### Matlab interactive
+
+- If an X server is running and you are logged in on the HPC systems,
+  you should allocate a CPU for your work with the command
+  `srun --pty --x11=first bash`
+- Now you can call "matlab" (you have 8 hours to work with the
+  Matlab GUI)
+
+### Matlab with a script
+
+- You have to start the Matlab calculation as a batch job via the
+  command
+
+<!-- -->
+
+    srun --pty matlab -nodisplay -r basename_of_your_matlab_script
+    # NOTE: you must omit the file extension ".m" here, because -r expects a Matlab command or function call, not a file name.
+
+**NOTE:** While running your calculations as a script this way is
+possible, it is generally frowned upon, because you are occupying Matlab
+licenses for the entire duration of your calculation when doing so.
+Since the available licenses are limited, it is highly recommended that
+you first compile your script via the Matlab Compiler (mcc) before
+running it for a longer period of time on our systems. That way, you
+only need to check out a license during compile time (which is
+relatively short) and can run as many instances of your calculation as
+you'd like, since it does not need a license during runtime when
+compiled to a binary.
+
+You can find detailed documentation on the Matlab compiler at
+MathWorks: <https://de.mathworks.com/help/compiler/>
+
+### Using the Matlab compiler (mcc)
+
+- Compile your .m script to a binary:
+  `mcc -m name_of_your_matlab_script.m -o compiled_executable -R -nodisplay -R -nosplash`
+
+This will also generate a wrapper script called
+run_compiled_executable.sh which sets the required library path
+environment variables in order to make this work. It expects the path to
+the Matlab installation as an argument; you can use the environment
+variable $EBROOTMATLAB as set by the module file for that.
+
+- Then run the binary via the wrapper script in a job (just a simple
+  example, you should be using an [sbatch
+  script](Compendium.Slurm#Job_Submission) for that):
+  `srun ./run_compiled_executable.sh $EBROOTMATLAB`
+
+### Matlab parallel (with the 'local' configuration)
+
+- If you want to run your code in parallel, please request as many
+  cores as you need!
+- Start a batch job with the number N of processes
+- Example for N = 4: `srun -c 4 --pty --x11=first bash`
+- Run Matlab with the GUI, the CLI, or a script
+- Inside, use `matlabpool open 4` to start parallel processing
+
+<!-- -->
+
+- Example for a 1000\*1000 matrix multiplication
+
+<!-- -->
+
+    R = distributed.rand(1000);
+    D = R * R
+
+- To close the parallel pool:
+
+<!-- -->
+
+    matlabpool close
+
+### Matlab parallel (with parfor)
+
+- Start a batch job with the number N of processes (e.g. N = 12)
+- Inside, use `matlabpool open N` or `matlabpool(N)` to start parallel
+  processing. It will use the 'local' configuration by default.
+- Use 'parfor' for a parallel loop, where the **independent** loop
+  iterations are processed by N threads
+- Example:
+
+<!-- -->
+
+    parfor i = 1:3
+      c(:,i) = eig(rand(1000));
+    end
+
+- See also `help parfor`
+
+## Octave
+
+GNU Octave is a high-level language, primarily intended for numerical
+computations. It provides a convenient command line interface for
+solving linear and nonlinear problems numerically, and for performing
+other numerical experiments using a language that is mostly compatible
+with Matlab. It may also be used as a batch-oriented language.
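+
+Running Octave in batch mode works analogously to the Mathematica and
+Matlab examples above. The following is only a minimal sketch: it
+assumes that an Octave module is available on the target machine (the
+table at the top of this page does not list one) and that your input
+script is called "mytest.m" - both names are purely illustrative.
+
+    #!/bin/bash
+    #SBATCH --output=octavetest.out
+    #SBATCH --time=00:05:00
+    #SBATCH --ntasks=1
+
+    # load an Octave module if one is installed (check with: module avail)
+    module load Octave
+    # run the input script without the GUI and collect the results
+    octave --no-gui mytest.m > mytest.output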
+
+- [Mathematica-Fonts.zip](%ATTACHURL%/Mathematica-Fonts.zip):
+  Mathematica fonts
diff --git a/twiki2md/root/Applications/NanoscaleSimulations.md b/twiki2md/root/Applications/NanoscaleSimulations.md
new file mode 100644
index 000000000..004cc0566
--- /dev/null
+++ b/twiki2md/root/Applications/NanoscaleSimulations.md
@@ -0,0 +1,216 @@
+# Nanoscale Modeling Tools
+
+|                           | **Taurus**                        | **module** |
+|---------------------------|-----------------------------------|------------|
+| **[ABINIT](#ABINIT)**     | 7.21, 8.2.3, 8.6.3                | abinit     |
+| **[CP2K](#CP2K)**         | 2.3, 2.4, 2.6, 3.0, 4.1, 5.1, 6.1 | cp2k       |
+| **[CPMD](#CPMD)**         |                                   | \-         |
+| **[Gamess US](#gamess)**  | 2013                              | gamess     |
+| **[Gaussian](#Gaussian)** | g03e01, g09, g09b01, g16          | gaussian   |
+| **[GROMACS](#GROMACS)**   | 4.6.5, 4.6.7, 5.1, 5.1.4, 2018.2  | gromacs    |
+| **[LAMMPS](#lammps)**     | 2014, 2015, 2016, 2018            | lammps     |
+| **[NAMD](#NAMD)**         | 2.10, 2.12                        | namd       |
+| **[ORCA](#ORCA)**         | 3.0.3, 4.0, 4.0.1                 | orca       |
+| **[Siesta](#Siesta)**     | 3.2, 4.0, 4.1                     | siesta     |
+| **[VASP](#VASP)**         | 5.3, 5.4.1, 5.4.4                 | vasp       |
+
+## NAMD
+
+[NAMD](http://www.ks.uiuc.edu/Research/namd) is a parallel molecular
+dynamics code designed for high-performance simulation of large
+biomolecular systems.
+
+The current version in modenv/scs5 can be started with srun as usual.
+
+Note that the old version from modenv/classic does not use MPI but
+rather uses InfiniBand directly. Therefore, you cannot use srun/mpirun
+to spawn the processes but have to use the supplied "charmrun" command
+instead. Also, since charmrun is batch system agnostic, it has no
+possibility of knowing which nodes are reserved for its use, so if you
+want it to run on more than one node, you have to create a hostlist file
+and feed it to charmrun via the parameter "++nodelist". Otherwise, all
+processes will be launched on the same node (localhost) and the other
+nodes remain unused.
+
+You can use the following snippet in your batch file to create a
+hostlist file:
+
+    export NODELISTFILE="/tmp/slurm.nodelist.$SLURM_JOB_ID"
+    for LINE in `scontrol show hostname $SLURM_JOB_NODELIST` ; do
+      echo "host $LINE" >> $NODELISTFILE ;
+    done
+
+    # launch NAMD processes. Note that the environment variable $SLURM_NTASKS is only available if you have
+    # used the -n|--ntasks parameter. Otherwise, you have to specify the number of processes manually, e.g. +p64
+    charmrun +p$SLURM_NTASKS ++nodelist $NODELISTFILE $NAMD inputfile.namd
+
+    # clean up afterwards:
+    test -f $NODELISTFILE && rm -f $NODELISTFILE
+
+The current version 2.7b1 of NAMD runs much faster than 2.6, especially
+on the SGI Altix. Since the parallel performance strongly depends on the
+size of the given problem, one cannot give general advice on the optimum
+number of CPUs to use. (Please check this by running NAMD with your
+molecules and just a few time steps.)
+
+Any published work which utilizes NAMD shall include the following
+reference: *James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart,
+Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel,
+Laxmikant Kale, and Klaus Schulten. Scalable molecular dynamics with
+NAMD.
+Journal of Computational Chemistry, 26:1781-1802, 2005.*
+
+Electronic documents will include a direct link to the official NAMD
+page at <http://www.ks.uiuc.edu/Research/namd/>
+
+## Gaussian
+
+Starting from the basic laws of quantum mechanics,
+[Gaussian](http://www.gaussian.com) predicts the energies, molecular
+structures, and vibrational frequencies of molecular systems, along with
+numerous molecular properties derived from these basic computation
+types. It can be used to study molecules and reactions under a wide
+range of conditions, including both stable species and compounds which
+are difficult or impossible to observe experimentally, such as
+short-lived intermediates and transition structures.
+
+With `module load gaussian` (or `gaussian/g09`) a number of environment
+variables are set according to the needs of Gaussian. Please set the
+directory for temporary data (GAUSS_SCRDIR) manually to somewhere below
+/scratch (you get the path when you have generated a workspace for your
+calculation).
+
+This is a small example, kindly provided by Arno Schneeweis (Inst. für
+Angewandte Physik). You need a batch file, for example called
+"mybatch.sh", with the following content:
+
+    #!/bin/bash
+    #SBATCH --nodes=1
+    #SBATCH --ntasks-per-node=4   # this number of CPUs has to match the %nproc in the input file
+    #SBATCH --mem=4000
+    #SBATCH --time=00:10:00       # hh:mm:ss
+    #SBATCH --mail-type=END,FAIL
+    #SBATCH --mail-user=vorname.nachname@tu-dresden.de
+    #SBATCH -A ...your_projectname...
+
+    #### make access to Gaussian 16 available
+    module load modenv/classic
+    module load gaussian/g16_avx2
+    export GAUSS_SCRDIR=...path_to_the_Workspace_that_you_generated_before...
+    g16 < my_input.com > my_output.out
+
+*As an example, the input for Gaussian could be this my_input.com:*
+
+    %mem=4GB
+    %nproc=4
+    #P B3LYP/6-31G* opt
+
+    Toluol
+
+    0 1
+    C 1.108640 0.464239 -0.122043
+    C 1.643340 -0.780361 0.210457
+    C 0.794940 -1.850561 0.494257
+    C -0.588060 -1.676061 0.445657
+    C -1.122760 -0.431461 0.113257
+    C -0.274360 0.638739 -0.170643
+    C -0.848171 1.974558 -0.527484
+    H 1.777668 1.308198 -0.345947
+    H 2.734028 -0.917929 0.248871
+    H 1.216572 -2.832148 0.756392
+    H -1.257085 -2.520043 0.669489
+    H -2.213449 -0.293864 0.074993
+    H -1.959605 1.917127 -0.513867
+    H -0.507352 2.733596 0.211754
+    H -0.504347 2.265972 -1.545144
+
+You have to start the job with the command `sbatch mybatch.sh`.
+
+## <a name="gamess"></a>GAMESS US
+
+GAMESS is an ab-initio quantum mechanics program, which provides many
+methods for computation of the properties of molecular systems using
+standard quantum chemical methods. For a detailed description, please
+look at the [GAMESS home
+page](http://www.msg.ameslab.gov/GAMESS/GAMESS.html).
+
+For runs with Slurm, please use a script like this:
+
+    #!/bin/bash
+    #SBATCH -t 120
+    #SBATCH -n 8
+    #SBATCH --ntasks-per-node=2
+    # you have to make sure that an even number of tasks runs on each node !!
+    #SBATCH --mem-per-cpu=1900
+    module load gamess
+    rungms.slurm cTT_M_025.inp /scratch/mark/gamess
+    # the third parameter is the location of the scratch directory
+
+*GAMESS should be cited as:* M.W.Schmidt, K.K.Baldridge, J.A.Boatz,
+S.T.Elbert, M.S.Gordon, J.H.Jensen, S.Koseki, N.Matsunaga, K.A.Nguyen,
+S.J.Su, T.L.Windus, M.Dupuis, J.A.Montgomery, J.Comput.Chem. 14,
+1347-1363(1993).
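+
+Submitting the GAMESS script above and pointing its scratch location to
+a freshly allocated workspace could look as follows. This is only a
+sketch: the workspace name "gamess_run" and the file name
+"gamess.sbatch" are assumptions, not part of the original example, and
+the scratch path inside the script has to be adjusted accordingly (see
+the workspace documentation for details on ws_allocate).
+
+    # allocate a scratch workspace for 30 days; ws_allocate prints the path of the new workspace
+    export GAMESS_SCRATCH=$(ws_allocate gamess_run 30)
+
+    # submit the batch script from above (saved here as gamess.sbatch)
+    sbatch gamess.sbatch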
+
+## <a name="lammps"></a>LAMMPS
+
+[LAMMPS](http://lammps.sandia.gov) is a classical molecular dynamics
+code that models an ensemble of particles in a liquid, solid, or gaseous
+state. It can model atomic, polymeric, biological, metallic, granular,
+and coarse-grained systems using a variety of force fields and boundary
+conditions. For examples of LAMMPS simulations, documentation, and more,
+visit the [LAMMPS site](http://lammps.sandia.gov).
+
+## ABINIT
+
+[ABINIT](http://www.abinit.org) is a package whose main program allows
+one to find the total energy, charge density and electronic structure of
+systems made of electrons and nuclei (molecules and periodic solids)
+within Density Functional Theory (DFT), using pseudopotentials and a
+planewave basis. ABINIT also includes options to optimize the geometry
+according to the DFT forces and stresses, to perform molecular dynamics
+simulations using these forces, or to generate dynamical matrices, Born
+effective charges, and dielectric tensors. Excited states can be
+computed within Time-Dependent Density Functional Theory (for molecules)
+or within Many-Body Perturbation Theory (the GW approximation).
+
+## CP2K
+
+[CP2K](http://cp2k.berlios.de/) performs atomistic and molecular
+simulations of solid state, liquid, molecular and biological systems. It
+provides a general framework for different methods such as density
+functional theory (DFT) using a mixed Gaussian and plane waves approach
+(GPW), and classical pair and many-body potentials.
+
+## CPMD
+
+The CPMD code is a plane wave/pseudopotential implementation of Density
+Functional Theory, particularly designed for ab-initio molecular
+dynamics. For examples and documentation see the [CPMD
+homepage](http://www.cpmd.org).
+
+## GROMACS
+
+GROMACS is a versatile package to perform molecular dynamics, i.e.
+simulate the Newtonian equations of motion for systems with hundreds to
+millions of particles. It is primarily designed for biochemical
+molecules like proteins, lipids and nucleic acids that have a lot of
+complicated bonded interactions, but since GROMACS is extremely fast at
+calculating the nonbonded interactions (which usually dominate
+simulations), many groups are also using it for research on
+non-biological systems, e.g. polymers.
+
+For documentation see the [GROMACS homepage](http://www.gromacs.org/).
+
+## ORCA
+
+ORCA is a flexible, efficient and easy-to-use general-purpose tool for
+quantum chemistry with specific emphasis on spectroscopic properties of
+open-shell molecules. It features a wide variety of standard quantum
+chemical methods ranging from semiempirical methods to DFT to single-
+and multireference correlated ab initio methods. It can also treat
+environmental and relativistic effects.
+
+To run ORCA jobs in parallel, you have to specify the number of
+processes in your input file (here, for example, 16 processes):
+
+    %pal nprocs 16 end
+
+Note that ORCA does the MPI process spawning itself, so you may not use
+"srun" to launch it in your batch file. Just set --ntasks to the same
+number as in your input file and call the "orca" executable directly.
+For parallel runs, it must be called with the full path:
+
+    #!/bin/bash
+    #SBATCH --ntasks=16
+    #SBATCH --nodes=1
+    #SBATCH --mem-per-cpu=2000M
+
+    $ORCA_ROOT/orca example.inp
+
+## Siesta
+
+Siesta (Spanish Initiative for Electronic Simulations with Thousands of
+Atoms) is both a method and its computer program implementation to
+perform electronic structure calculations and ab initio molecular
+dynamics simulations of molecules and solids. <http://www.uam.es/siesta>
+
+In any paper or other academic publication containing results wholly or
+partially derived from the results of use of the SIESTA package, the
+following papers must be cited in the normal manner:
+
+1. "Self-consistent order-N density-functional calculations for very
+   large systems", P. Ordejon, E. Artacho and J. M. Soler, Phys. Rev. B
+   (Rapid Comm.) 53, R10441-10443 (1996).
+2. "The SIESTA method for ab initio order-N materials simulation",
+   J. M. Soler, E. Artacho, J. D. Gale, A. Garcia, J. Junquera,
+   P. Ordejon, and D. Sanchez-Portal, J. Phys.: Condens. Matt. 14,
+   2745-2779 (2002).
+
+## VASP
+
+"VAMP/VASP is a package for performing ab-initio quantum-mechanical
+molecular dynamics (MD) using pseudopotentials and a plane wave basis
+set." [[Official Site](http://cms.mpi.univie.ac.at/vasp/)].
+It is installed on mars. If you are interested in using VASP on ZIH
+machines, please contact [Dr. Ulf
+Markwardt](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/wir_ueber_uns/mitarbeiter/markwardt).
diff --git a/twiki2md/root/Applications/SoftwareModulesList.md b/twiki2md/root/Applications/SoftwareModulesList.md
new file mode 100644
index 000000000..70feb25fc
--- /dev/null
+++ b/twiki2md/root/Applications/SoftwareModulesList.md
@@ -0,0 +1,962 @@
+### SCS5 Environment
+
+<span class="twiki-macro TABLE">headerrows= 1"</span> <span
+class="twiki-macro EDITTABLE"
+format="| text, 30, Software | text, 40, Kategorie | text, 30, Letzte Änderung | text, 30, SGI-UV | text, 30, Taurus | "
+changerows="on"></span>
+
+| Software | Category | Last change | Venus | Taurus |
+|:---------|:---------|:------------|:------|:-------|
+| ABAQUS | cae | 2020-04-30 | \- | 2019 |
+| ABINIT | chem | 2019-07-12 | \- | 8.6.3\<br />8.10.3 |
+| ACE | lib | 2018-11-22 | \- | 6.5.1 |
+| ACTC | lib | 2018-11-22 | \- | 1.1 |
+| AFNI | bio | 2018-11-22 | \- | 20180521 |
+| AMDLibM | perf | 2019-08-20 | \- | 3.4.0 |
+| AMDuProf | perf | 2020-11-26 | \- | 3.3.462\<br />3.1.35 |
+| ANSA | cae | 2021-04-08 | \- | 20.1.4 |
+| ANSYS | tools | 2020-10-23 | \- | 2020R2\<br />19.5\<br />19.2 |
+| ANTLR | tools | 2020-09-29 | \- | 2.7.7 |
+| ANTs | data | 2020-10-06 | \- | 2.3.4 |
+| APR | tools | 2019-11-08 | \- | 1.7.0\<br />1.6.3 |
+| APR-util | tools | 2019-11-08 | \- | 1.6.1 |
+| ASE | chem | 2020-09-29 | \- | 3.19.0\<br />3.18.1\<br />3.16.2 |
+| ATK | vis | 2020-09-02 | \- | 2.34.1\<br />2.28.1 |
+| Advisor | perf | 2018-11-22 | \- | 2018 |
+| Anaconda3 | lang | 2019-10-17 | \- | 2019.03 |
+| AnsysEM | phys | 2018-11-22 | \- | 19.0 |
+| Arrow | data | 2020-12-01 | \- | 0.16.0\<br />0.14.1 |
+| AtomPAW | chem | 2019-09-11 | \- | 4.1.0.6 |
+| Autoconf | devel | 2021-02-17 | \- | 2.69 |
+| Automake | devel | 2021-02-17 | \- | 1.16.2\<br />1.16.1\<br />1.15.1\<br
/>1.15 | +| Autotools | devel | 2021-02-17 | \- | 20200321\<br />20180311\<br />20170619\<br />20150215 | +| Bazel | devel | 2021-03-19 | \- | 3.7.2\<br />3.7.1\<br />3.4.1\<br />0.29.1\<br />0.26.1\<br />0.20.0\<br />0.16.0\<br />0.12.0 | +| BigDataFrameworkConfigure | devel | 2019-09-16 | \- | 0.0.2\<br />0.0.1 | +| Bison | lang | 2020-12-11 | \- | 3.7.1\<br />3.5.3\<br />3.3.2\<br />3.0.5\<br />3.0.4\<br />2.7 | +| Blitz++ | lib | 2019-04-09 | \- | 0.10 | +| Boost | devel | 2021-04-09 | \- | 1.74.0\<br />1.72.0\<br />1.71.0\<br />1.70.0\<br />1.69.0\<br />1.68.0\<br />1.67.0\<br />1.66.0\<br />1.61.0\<br />1.55.0 | +| Boost.Python | lib | 2019-02-26 | \- | 1.66.0 | +| CDO | data | 2020-10-06 | \- | 1.9.8\<br />1.9.4 | +| CFITSIO | lib | 2019-06-03 | \- | 3.45 | +| CGAL | numlib | 2020-08-20 | \- | 4.14.3\<br />4.14.1\<br />4.11.1 | +| CGNS | cae | 2020-10-09 | \- | 3.3.1 | +| CMake | devel | 2021-02-17 | \- | 3.9.5\<br />3.9.1\<br />3.8.0\<br />3.18.4\<br />3.16.4\<br />3.15.3\<br />3.14.5\<br />3.13.3\<br />3.12.1\<br />3.11.4\<br />3.10.2\<br />3.10.1\<br />3.10.0 | +| COMSOL | phys | 2021-02-10 | \- | 5.6\<br />5.5\<br />5.4 | +| CP2K | chem | 2019-12-12 | \- | 6.1\<br />5.1 | +| CUDA | system | 2021-02-17 | \- | 9.2.88\<br />9.1.85\<br />11.1.1\<br />11.0.2\<br />10.1.243\<br />10.0.130 | +| CUDAcore | system | 2021-02-17 | \- | 11.1.1\<br />11.0.2 | +| Check | lib | 2021-02-17 | \- | 0.15.2 | +| Clang | compiler | 2020-12-09 | \- | 9.0.1\<br />5.0.0 | +| ClustalW2 | bio | 2018-11-22 | \- | 2.1 | +| CubeGUI | perf | 2021-01-29 | \- | 4.4.4\<br />4.4 | +| CubeLib | perf | 2020-10-12 | \- | 4.4.4\<br />4.4 | +| CubeW | perf | 2018-11-22 | \- | 4.4 | +| CubeWriter | perf | 2020-10-12 | \- | 4.4.3 | +| Cython | lang | 2018-11-22 | \- | 0.28.5\<br />0.28.2 | +| DAMASK | phys | 2020-12-10 | \- | 3.0.0\<br />2.0.3-2992\<br />2.0.3 | +| DASH | lib | 2018-11-22 | \- | dash | +| DB | tools | 2021-02-17 | \- | 18.1.40 | +| DBus | devel | 2020-08-11 | \- | 1.13.8\<br />1.13.6\<br />1.13.12 | +| DFTB+ | phys | 2021-04-12 | \- | 19.1\<br />18.2\<br />18.1 | +| DMTCP | tools | 2018-12-06 | \- | 2.5.2\<br />2.5.1 | +| DOLFIN | math | 2020-12-21 | \- | 2019.1.0\<br />2018.1.0.post1\<br />2017.2.0 | +| Delft3D | geo | 2020-01-29 | \- | 6.03 | +| Devel-NYTProf | perf | 2019-08-30 | \- | 6.06 | +| Doxygen | devel | 2021-03-03 | \- | 1.8.20\<br />1.8.17\<br />1.8.16\<br />1.8.15\<br />1.8.14\<br />1.8.13 | +| Dyninst | tools | 2019-01-21 | \- | 9.3.2 | +| ELPA | math | 2021-01-05 | \- | 2019.11.001\<br />2018.11.001\<br />2016.11.001.pre | +| ELSI | math | 2020-02-27 | \- | 2.5.0 | +| EMBOSS | bio | 2019-02-11 | \- | 6.6.0 | +| ESPResSo | phys | 2019-11-19 | \- | 4.1.1 | +| ETSF_IO | lib | 2020-10-15 | \- | 1.0.4 | +| EasyBuild | tools | 2021-03-08 | \- | 4.3.3\<br />4.3.2\<br />4.3.1\<br />4.3.0\<br />4.2.2\<br />4.2.0\<br />4.1.2\<br />4.1.1\<br />4.1.0\<br />4.0.1\<br />3.9.4\<br />3.9.3\<br />3.9.2\<br />3.9.1\<br />3.8.1\<br />3.8.0\<br />3.7.1\<br />3.7.0\<br />3.6.2\<br />3.6.1 | +| Eigen | math | 2021-02-17 | \- | 3.3.8\<br />3.3.7\<br />3.3.4 | +| Emacs | tools | 2020-02-20 | \- | 25.3 | +| ErlangOTP | lang | 2019-03-13 | \- | 21.3-no | +| FFC | math | 2019-02-26 | \- | 2018.1.0 | +| FFTW | numlib | 2021-02-17 | \- | 3.3.8\<br />3.3.7\<br />3.3.4 | +| FFmpeg | vis | 2020-08-11 | \- | 4.2.2\<br />4.2.1\<br />4.1.3\<br />4.1\<br />3.4.2 | +| FIAT | math | 2019-02-26 | \- | 2018.1.0 | +| FLANN | lib | 2019-08-13 | \- | 1.8.4 | +| FLTK | vis | 2019-11-08 | \- | 1.3.5\<br />1.3.4 | +| FSL | bio | 
2020-10-30 | \- | 6.0.2\<br />5.0.11 | +| Flink | devel | 2019-09-16 | \- | 1.9.0\<br />1.8.1 | +| FoX | lib | 2018-11-22 | \- | 4.1.2 | +| FreeSurfer | bio | 2020-07-22 | \- | 7.1.1\<br />6.0.0\<br />5.3.0\<br />5.1.0 | +| FriBidi | vis | 2020-08-11 | \- | 1.0.9\<br />1.0.5\<br />1.0.1 | +| GAMS | math | 2018-11-28 | \- | 25.1.3 | +| GCC | compiler | 2020-12-11 | \- | 9.3.0\<br />9.1.0-2.32\<br />8.3.0\<br />8.2.0-2.31.1\<br />8.2.0-2.30\<br />7.3.0-2.30\<br />6.4.0-2.28\<br />10.2.0 | +| GCCcore | compiler | 2020-12-11 | \- | 9.3.0\<br />9.1.0\<br />8.3.0\<br />8.2.0\<br />7.3.0\<br />6.4.0\<br />6.3.0\<br />10.2.0 | +| GCL | compiler | 2018-11-22 | \- | 2.6.12 | +| GDAL | data | 2020-10-06 | \- | 3.0.2\<br />3.0.0\<br />2.2.3 | +| GDB | debugger | 2020-08-11 | \- | 9.1\<br />8.1 | +| GDRCopy | lib | 2021-02-17 | \- | 2.1 | +| GEOS | math | 2020-10-06 | \- | 3.8.0\<br />3.7.2\<br />3.6.2 | +| GL2PS | vis | 2019-11-08 | \- | 1.4.0 | +| GLM | lib | 2018-12-11 | \- | 0.9.9.0 | +| GLPK | tools | 2020-08-28 | \- | 4.65 | +| GLib | vis | 2020-08-11 | \- | 2.64.1\<br />2.62.0\<br />2.60.1\<br />2.54.3 | +| GLibmm | vis | 2020-09-24 | \- | 2.49.7 | +| GMP | math | 2021-02-17 | \- | 6.2.0\<br />6.1.2 | +| GObject-Introspection | devel | 2020-09-02 | \- | 1.63.1\<br />1.60.1\<br />1.54.1 | +| GPAW | chem | 2020-01-15 | \- | 19.8.1 | +| GPAW-setups | chem | 2020-01-15 | \- | 0.9.20000 | +| GPI2 | base | 2019-05-29 | \- | next-27-05-19\<br />1.3.0 | +| GROMACS | bio | 2020-10-12 | \- | 2020\<br />2019.4\<br />2018.2 | +| GSL | numlib | 2020-08-11 | \- | 2.6\<br />2.5\<br />2.4 | +| GTK+ | vis | 2020-01-13 | \- | 2.24.32 | +| Gdk-Pixbuf | vis | 2020-09-02 | \- | 2.38.2\<br />2.36.12\<br />2.36.11 | +| Ghostscript | tools | 2020-08-28 | \- | 9.52\<br />9.50\<br />9.27\<br />9.23\<br />9.22 | +| GitPython | lib | 2020-04-16 | \- | 3.1.1\<br />3.0.3 | +| GlobalArrays | lib | 2020-10-12 | \- | 5.7 | +| Go | compiler | 2019-06-26 | \- | 1.12 | +| GraphicsMagick | vis | 2019-11-08 | \- | 1.3.33\<br />1.3.31\<br />1.3.28 | +| Guile | lang | 2020-08-11 | \- | 1.8.8 | +| Gurobi | math | 2020-12-01 | \- | 9.1.0\<br />9.0.1\<br />8.0.1 | +| HDF | data | 2018-11-22 | \- | 4.2.13 | +| HDF5 | data | 2021-03-17 | \- | 1.10.7\<br />1.10.6\<br />1.10.5\<br />1.10.2\<br />1.10.1 | +| HDFView | vis | 2019-02-20 | \- | 2.14 | +| Hadoop | devel | 2019-08-29 | \- | 2.7.7 | +| HarfBuzz | vis | 2020-09-02 | \- | 2.6.4\<br />2.4.0\<br />2.2.0\<br />1.7.5 | +| Horovod | tools | 2020-01-24 | \- | 0.18.2 | +| Hyperopt | lib | 2020-02-19 | \- | 0.2.2\<br />0.1.1 | +| Hypre | numlib | 2020-12-08 | \- | 2.18.2\<br />2.14.0 | +| ICU | lib | 2021-03-17 | \- | 67.1\<br />66.1\<br />64.2\<br />61.1\<br />56.1 | +| IPython | tools | 2019-11-19 | \- | 7.7.0\<br />6.4.0\<br />6.3.1 | +| ImageMagick | vis | 2020-08-28 | \- | 7.0.9-5\<br />7.0.8-46\<br />7.0.8-11\<br />7.0.7-39\<br />7.0.10-1 | +| Inspector | tools | 2019-01-22 | \- | 2019\<br />2018 | +| Ipopt | lib | 2018-11-22 | \- | 3.12.11 | +| JUnit | devel | 2019-02-13 | \- | 4.12 | +| JasPer | vis | 2020-08-11 | \- | 2.0.14\<br />1.900.1 | +| Java | lang | 2020-12-02 | \- | 14.0.2\<br />11.0.2\<br />1.8.0-162\<br />1.8.0-161 | +| JsonCpp | lib | 2021-03-17 | \- | 1.9.4\<br />1.9.3 | +| Julia | lang | 2020-07-14 | \- | 1.4.2\<br />1.1.1\<br />1.0.2 | +| Keras | math | 2020-03-20 | \- | 2.3.1\<br />2.2.4\<br />2.2.0 | +| LAME | data | 2020-08-11 | \- | 3.100 | +| LAMMPS | chem | 2020-08-11 | \- | 7Aug19\<br />3Mar2020\<br />20180316\<br />12Dec2018 | +| LLVM | compiler | 2020-09-25 | 
\- | 9.0.1\<br />9.0.0\<br />8.0.1\<br />7.0.1\<br />6.0.0\<br />5.0.1 | +| LMDB | lib | 2021-03-17 | \- | 0.9.24 | +| LS-DYNA | cae | 2020-12-11 | \- | DEV-81069\<br />12.0.0\<br />11.1.0\<br />11.0.0\<br />10.1.0 | +| LS-Opt | cae | 2019-08-20 | \- | 6.0.0\<br />5.2.1 | +| LS-PrePost | cae | 2019-05-03 | \- | 4.6\<br />4.5\<br />4.3 | +| LibTIFF | lib | 2021-02-17 | \- | 4.1.0\<br />4.0.9\<br />4.0.10 | +| LibUUID | lib | 2019-11-08 | \- | 1.0.3 | +| Libint | chem | 2019-10-14 | \- | 1.1.6 | +| LittleCMS | vis | 2020-08-28 | \- | 2.9 | +| M4 | devel | 2020-12-11 | \- | 1.4.18\<br />1.4.17 | +| MATIO | lib | 2018-12-11 | \- | 1.5.12 | +| MATLAB | math | 2021-03-22 | \- | 2021a\<br />2020a\<br />2019b\<br />2018b\<br />2018a\<br />2017a | +| MDAnalysis | phys | 2020-04-21 | \- | 0.20.1 | +| METIS | math | 2020-08-20 | \- | 5.1.0 | +| MPFR | math | 2020-08-20 | \- | 4.0.2\<br />4.0.1 | +| MUMPS | math | 2020-12-08 | \- | 5.2.1\<br />5.1.2 | +| MUST | perf | 2019-01-25 | \- | 1.6.0-rc3 | +| Mako | devel | 2020-08-11 | \- | 1.1.2\<br />1.1.0\<br />1.0.8\<br />1.0.7 | +| Mathematica | math | 2018-11-22 | \- | 11.3.0\<br />11.2.0 | +| Maven | devel | 2020-04-29 | \- | 3.6.3 | +| Maxima | math | 2018-11-22 | \- | 5.42.1 | +| Mercurial | tools | 2018-11-22 | \- | 4.6.1 | +| Mesa | vis | 2020-08-11 | \- | 20.0.2\<br />19.1.7\<br />19.0.1\<br />18.1.1\<br />17.3.6 | +| Meson | tools | 2021-02-17 | \- | 0.55.3\<br />0.53.2\<br />0.51.2\<br />0.50.0 | +| Mesquite | math | 2020-02-12 | \- | 2.3.0 | +| Miniconda2 | lang | 2018-11-22 | \- | 4.5.11 | +| Miniconda3 | lang | 2019-10-17 | \- | 4.5.4 | +| MongoDB | data | 2019-07-29 | \- | 4.0.3 | +| NAMD | chem | 2018-11-22 | \- | 2.12 | +| NASM | lang | 2021-02-17 | \- | 2.15.05\<br />2.14.02\<br />2.13.03\<br />2.13.01 | +| NCCL | lib | 2021-03-17 | \- | 2.8.3\<br />2.4.8\<br />2.4.2\<br />2.3.7 | +| NCO | tools | 2020-09-29 | \- | 4.9.3 | +| NFFT | lib | 2019-08-09 | \- | 3.5.1 | +| NLTK | data | 2018-12-05 | \- | 3.4 | +| NLopt | numlib | 2020-08-28 | \- | 2.6.1\<br />2.4.2 | +| NSPR | lib | 2020-08-11 | \- | 4.25\<br />4.21\<br />4.20 | +| NSS | lib | 2020-08-11 | \- | 3.51\<br />3.45\<br />3.42.1\<br />3.39 | +| NWChem | chem | 2018-11-22 | \- | 6.8.revision47\<br />6.6.revision27746 | +| Nektar++ | math | 2020-02-19 | \- | 5.0.0 | +| NetLogo | math | 2018-11-22 | \- | 6.0.4-64 | +| Ninja | tools | 2021-02-17 | \- | 1.9.0\<br />1.10.1\<br />1.10.0 | +| OPARI2 | perf | 2020-10-12 | \- | 2.0.5\<br />2.0.3 | +| ORCA | chem | 2020-03-04 | \- | 4.2.1\<br />4.1.1 | +| OTF2 | perf | 2020-10-12 | \- | 2.2\<br />2.1.1 | +| Octave | math | 2019-11-08 | \- | 5.1.0 | +| Octopus | chem | 2020-10-15 | \- | 8.4\<br />10.1 | +| OpenBLAS | numlib | 2021-02-17 | \- | 0.3.9\<br />0.3.7\<br />0.3.5\<br />0.3.12\<br />0.3.1\<br />0.2.20 | +| OpenBabel | chem | 2018-11-22 | \- | 2.4.1 | +| OpenCV | vis | 2020-01-13 | \- | 4.0.1\<br />3.4.1 | +| OpenFOAM | cae | 2020-09-23 | \- | v2006\<br />v1912\<br />v1806\<br />8\<br />7\<br />6\<br />5.0\<br />4.1\<br />2.3.1 | +| OpenFOAM-Extend | cae | 2020-02-12 | \- | 4.0 | +| OpenMPI | mpi | 2021-02-17 | \- | 4.0.5\<br />4.0.4\<br />4.0.3\<br />4.0.1\<br />3.1.6\<br />3.1.4\<br />3.1.3\<br />3.1.2\<br />3.1.1\<br />2.1.5\<br />2.1.2\<br />1.10.7 | +| OpenMX | phys | 2020-12-08 | \- | 3.9.2 | +| OpenMolcas | chem | 2020-10-12 | \- | 19.11\<br />18.09 | +| OpenPGM | system | 2019-11-05 | \- | 5.2.122 | +| OpenSSL | system | 2019-11-08 | \- | 1.1.1b\<br />1.0.2l | +| PAPI | perf | 2020-10-12 | \- | 6.0.0\<br />5.7.0\<br />5.6.0 | +| 
PCL | vis | 2019-08-13 | \- | 1.9.1 | +| PCRE | devel | 2021-03-17 | \- | 8.44\<br />8.43\<br />8.41 | +| PCRE2 | devel | 2020-08-11 | \- | 10.34\<br />10.33\<br />10.32 | +| PDFCrop | tools | 2018-11-22 | \- | 0.4b | +| PDT | perf | 2020-10-12 | \- | 3.25.1\<br />3.25 | +| PETSc | numlib | 2020-12-08 | \- | 3.9.4\<br />3.9.3\<br />3.8.3\<br />3.7.7\<br />3.13.3\<br />3.12.4\<br />3.11.0\<br />3.10.5 | +| PFFT | numlib | 2020-10-15 | \- | 1.0.8 | +| PGI | compiler | 2020-02-07 | \- | 19.4\<br />19.10\<br />18.7\<br />18.4\<br />18.10\<br />17.7\<br />17.10 | +| PLUMED | chem | 2020-08-11 | \- | 2.6.0\<br />2.5.1\<br />2.4.0 | +| PLY | lib | 2019-02-26 | \- | 3.11 | +| PMIx | lib | 2021-02-24 | \- | 3.1.5\<br />3.1.1 | +| PROJ | lib | 2020-10-06 | \- | 6.2.1\<br />6.0.0\<br />5.0.0 | +| Pandoc | tools | 2019-10-14 | \- | 2.5 | +| Pango | vis | 2020-09-02 | \- | 1.44.7\<br />1.43.0\<br />1.42.4\<br />1.41.1 | +| ParMETIS | math | 2020-02-12 | \- | 4.0.3 | +| ParMGridGen | math | 2020-02-12 | \- | 1.0 | +| ParaView | vis | 2020-11-12 | \- | 5.9.0-RC1\<br />5.8.0\<br />5.7.0\<br />5.6.2\<br />5.5.2\<br />5.4.1 | +| Perl | lang | 2021-02-17 | \- | 5.32.0\<br />5.30.2\<br />5.30.0\<br />5.28.1\<br />5.28.0\<br />5.26.1 | +| Pillow | vis | 2021-02-17 | \- | 8.0.1\<br />7.0.0\<br />6.2.1\<br />5.0.0 | +| Pillow-SIMD | vis | 2018-11-22 | \- | 5.0.0 | +| PnetCDF | data | 2019-03-06 | \- | 1.9.0 | +| PyQt5 | vis | 2018-11-22 | \- | 5.10.1 | +| PyTorch | devel | 2021-01-04 | \- | 1.6.0\<br />0.3.1 | +| PyYAML | lib | 2020-04-16 | \- | 5.1.2\<br />5.1\<br />3.13\<br />3.12 | +| Python | lang | 2021-02-17 | \- | 3.8.6\<br />3.8.2\<br />3.7.4\<br />3.7.2\<br />3.6.6\<br />3.6.4\<br />2.7.18\<br />2.7.16\<br />2.7.15\<br />2.7.14 | +| Qhull | math | 2019-11-08 | \- | 2019.1\<br />2015.2 | +| Qt | devel | 2018-11-22 | \- | 4.8.7 | +| Qt5 | devel | 2020-08-11 | \- | 5.9.3\<br />5.14.1\<br />5.13.1\<br />5.12.3\<br />5.10.1 | +| QuantumESPRESSO | chem | 2021-01-05 | \- | 6.6\<br />6.5\<br />6.4.1\<br />6.3\<br />6.2 | +| Qwt | lib | 2020-09-24 | \- | 6.1.4 | +| R | lang | 2020-10-06 | \- | 4.0.0\<br />3.6.2\<br />3.6.0\<br />3.5.1\<br />3.4.4 | +| RDFlib | lib | 2020-09-29 | \- | 4.2.2 | +| RELION | bio | 2019-02-27 | \- | 3.0\<br />2.1 | +| ROOT | data | 2019-06-03 | \- | 6.14.06 | +| Ruby | lang | 2019-11-14 | \- | 2.6.3\<br />2.6.1 | +| SCOTCH | math | 2020-12-08 | \- | 6.0.9\<br />6.0.6\<br />6.0.5\<br />6.0.4\<br />5.1.12b | +| SCons | devel | 2020-01-24 | \- | 3.1.1\<br />3.0.5\<br />3.0.1 | +| SHARC | chem | 2019-01-07 | \- | 2.0 | +| SIONlib | lib | 2020-10-12 | \- | 1.7.6\<br />1.7.4\<br />1.7.2 | +| SIP | lang | 2018-11-22 | \- | 4.19.8\<br />4.19.12 | +| SLEPc | numlib | 2020-02-27 | \- | 3.9.2\<br />3.12.2 | +| SPM | math | 2018-11-22 | \- | 12-r7219 | +| SQLite | devel | 2021-02-17 | \- | 3.33.0\<br />3.31.1\<br />3.29.0\<br />3.27.2\<br />3.26.0\<br />3.24.0\<br />3.21.0\<br />3.20.1 | +| STAR-CCM+ | cae | 2021-03-19 | \- | 15.06.008\<br />15.04.010-R8\<br />15.02.007\<br />14.04.011\<br />14.02.012\<br />13.06.012-R8\<br />13.04.011\<br />13.02.013-R8 | +| SUNDIALS | math | 2018-11-22 | \- | 2.7.0 | +| SWASH | phys | 2020-03-18 | \- | 6.01\<br />5.01 | +| SWIG | devel | 2020-10-06 | \- | 4.0.1\<br />3.0.12 | +| ScaFaCoS | math | 2020-08-11 | \- | 1.0.1 | +| ScaLAPACK | numlib | 2021-02-17 | \- | 2.1.0\<br />2.0.2 | +| Scalasca | perf | 2021-02-02 | \- | 2.5\<br />2.4 | +| SciPy-bundle | lang | 2021-02-17 | \- | 2020.11\<br />2020.03\<br />2019.10\<br />2019.03 | +| Score-P | perf | 2020-10-19 
| \- | 6.0\<br />4.0 | +| Serf | tools | 2019-11-08 | \- | 1.3.9 | +| Siesta | phys | 2020-02-27 | \- | 4.1-b4\<br />4.1-b3\<br />4.1 | +| Six | lib | 2018-11-22 | \- | 1.11.0 | +| Spark | devel | 2020-10-22 | \- | 3.0.1\<br />2.4.4\<br />2.4.3 | +| Subversion | tools | 2019-11-08 | \- | 1.9.7\<br />1.12.0 | +| SuiteSparse | numlib | 2020-09-02 | \- | 5.7.1\<br />5.6.0\<br />5.4.0\<br />5.1.2 | +| SuperLU | numlib | 2019-01-15 | \- | 5.2.1 | +| SuperLU_DIST | numlib | 2019-01-16 | \- | 6.1.0 | +| SuperLU_MT | numlib | 2019-01-16 | \- | 3.1 | +| Szip | tools | 2021-03-03 | \- | 2.1.1 | +| Tcl | lang | 2021-02-17 | \- | 8.6.9\<br />8.6.8\<br />8.6.7\<br />8.6.10 | +| TensorFlow | lib | 2021-03-15 | \- | 2.4.1\<br />2.3.1\<br />2.1.0\<br />2.0.0\<br />1.8.0\<br />1.15.0\<br />1.10.0 | +| Tk | vis | 2021-02-17 | \- | 8.6.9\<br />8.6.8\<br />8.6.10 | +| Tkinter | lang | 2021-02-17 | \- | 3.8.6\<br />3.8.2\<br />3.7.4\<br />3.7.2\<br />3.6.6\<br />3.6.4\<br />2.7.15\<br />2.7.14 | +| TotalView | debugger | 2020-09-24 | \- | 8.14.1-8 | +| Trilinos | numlib | 2019-02-26 | \- | 12.12.1 | +| UCX | lib | 2021-02-24 | \- | 1.9.0\<br />1.8.0\<br />1.5.1 | +| UDUNITS | phys | 2020-08-28 | \- | 2.2.26 | +| UFL | cae | 2019-02-26 | \- | 2018.1.0 | +| UnZip | tools | 2021-02-17 | \- | 6.0 | +| VASP | phys | 2020-05-19 | \- | 5.4.4 | +| VMD | vis | 2018-11-22 | \- | 1.9.3 | +| VSEARCH | bio | 2018-11-22 | \- | 2.8.4 | +| VTK | vis | 2020-10-06 | \- | 8.2.0\<br />8.1.1\<br />8.1.0\<br />5.10.1 | +| VTune | tools | 2020-04-28 | \- | 2020\<br />2019\<br />2018 | +| Valgrind | debugger | 2019-11-05 | \- | 3.14.0\<br />3.13.0 | +| Vampir | tools | 2021-01-13 | \- | unstable\<br />9.9.0\<br />9.8.0\<br />9.7.1\<br />9.7.0\<br />9.6.1\<br />9.5.0\<br />9.11\<br />9.10.0 | +| Voro++ | math | 2020-08-11 | \- | 0.4.6 | +| WRF | geo | 2018-12-12 | \- | 3.8.1 | +| Wannier90 | chem | 2020-12-14 | \- | 2.1.0\<br />2.0.1.1\<br />1.2 | +| X11 | vis | 2021-02-17 | \- | 20201008\<br />20200222\<br />20190717\<br />20190311\<br />20180604\<br />20180131 | +| XML-Parser | data | 2018-11-22 | \- | 2.44-01 | +| XZ | tools | 2021-02-17 | \- | 5.2.5\<br />5.2.4\<br />5.2.3\<br />5.2.2 | +| YAXT | tools | 2020-10-06 | \- | 0.6.2\<br />0.6.0 | +| Yasm | lang | 2020-08-11 | \- | 1.3.0 | +| ZeroMQ | devel | 2019-11-05 | \- | 4.3.2\<br />4.2.5 | +| Zip | tools | 2021-03-17 | \- | 3.0 | +| ace | lib | 2018-11-22 | \- | 6.5.0 | +| ant | devel | 2020-09-02 | \- | 1.10.7\<br />1.10.1 | +| archspec | tools | 2020-08-11 | \- | 0.1.0 | +| arpack-ng | numlib | 2019-11-08 | \- | 3.7.0\<br />3.6.1\<br />3.5.0 | +| asciidoc | base | 2018-11-22 | \- | 8.6.9 | +| at-spi2-atk | vis | 2020-09-02 | \- | 2.34.1 | +| at-spi2-core | vis | 2020-09-02 | \- | 2.34.0 | +| auto_ml | lang | 2019-10-29 | \- | 2.9.9 | +| basemap | vis | 2019-04-02 | \- | 1.0.7 | +| binutils | tools | 2020-12-11 | \- | 2.35\<br />2.34\<br />2.32\<br />2.31.1\<br />2.30\<br />2.28\<br />2.27\<br />2.26 | +| bzip2 | tools | 2021-02-17 | \- | 1.0.8\<br />1.0.6 | +| cURL | tools | 2021-02-17 | \- | 7.72.0\<br />7.69.1\<br />7.66.0\<br />7.63.0\<br />7.60.0\<br />7.58.0 | +| cairo | vis | 2020-08-28 | \- | 1.16.0\<br />1.14.12 | +| cftime | data | 2019-07-17 | \- | 1.0.1 | +| chrpath | tools | 2018-11-22 | \- | 0.16 | +| ctags | devel | 2018-11-22 | \- | 5.8 | +| cuDNN | numlib | 2021-03-17 | \- | 8.0.4.30\<br />7.6.4.38\<br />7.4.2.24\<br />7.1.4.18\<br />7.1.4\<br />7.0.5 | +| ddt | tools | 2021-04-12 | \- | 20.2.1\<br />20.0.1\<br />18.2.2 | +| dftd3-lib | chem | 2021-02-17 | \- | 
0.9 | +| dill | data | 2019-10-29 | \- | 0.3.1.1\<br />0.3.1 | +| double-conversion | lib | 2021-03-17 | \- | 3.1.5\<br />3.1.4 | +| ecCodes | tools | 2020-10-06 | \- | 2.8.2\<br />2.17.0 | +| expat | tools | 2021-02-17 | \- | 2.2.9\<br />2.2.7\<br />2.2.6\<br />2.2.5 | +| flair | vis | 2019-01-25 | \- | 2.3-0 | +| flair-geoviewer | vis | 2019-01-25 | \- | 2.3-0 | +| flatbuffers | devel | 2021-03-17 | \- | 1.12.0 | +| flatbuffers-python | devel | 2021-03-19 | \- | 1.12 | +| flex | lang | 2020-12-11 | \- | 2.6.4\<br />2.6.3\<br />2.6.0\<br />2.5.39 | +| fontconfig | vis | 2021-02-17 | \- | 2.13.92\<br />2.13.1\<br />2.13.0\<br />2.12.6 | +| foss | toolchain | 2021-02-17 | \- | 2020b\<br />2020a\<br />2019b\<br />2019a\<br />2018b\<br />2018a | +| fosscuda | toolchain | 2021-02-17 | \- | 2020b\<br />2020a\<br />2019b\<br />2019a\<br />2018b | +| freeglut | lib | 2019-11-08 | \- | 3.0.0 | +| freetype | vis | 2021-02-17 | \- | 2.9.1\<br />2.9\<br />2.10.3\<br />2.10.1 | +| future | lib | 2018-11-22 | \- | 0.16.0 | +| gc | lib | 2020-08-11 | \- | 7.6.12\<br />7.6.10\<br />7.6.0 | +| gcccuda | toolchain | 2021-02-17 | \- | 2020b\<br />2020a\<br />2019b\<br />2019a\<br />2018b | +| gettext | tools | 2021-02-17 | \- | 0.21\<br />0.20.1\<br />0.19.8.1\<br />0.19.8 | +| gflags | devel | 2020-09-29 | \- | 2.2.2 | +| giflib | lib | 2021-03-17 | \- | 5.2.1 | +| git | tools | 2021-03-17 | \- | 2.28.0\<br />2.23.0\<br />2.21.0\<br />2.19.1\<br />2.18.0\<br />2.16.1 | +| git-cola | tools | 2018-11-22 | \- | 3.2 | +| git-lfs | tools | 2019-06-26 | \- | 2.7.2 | +| glew | devel | 2020-09-23 | \- | 2.1.0 | +| glog | devel | 2020-09-29 | \- | 0.4.0 | +| gmsh | vis | 2019-11-26 | \- | 4.4.1 | +| gnuplot | vis | 2019-11-08 | \- | 5.2.6\<br />5.2.5\<br />5.2.4\<br />5.2.2 | +| golf | toolchain | 2018-11-22 | \- | 2018a | +| gomkl | toolchain | 2019-07-17 | \- | 2019a | +| gompi | toolchain | 2021-02-17 | \- | 2020b\<br />2020a\<br />2019b\<br />2019a\<br />2018b\<br />2018a | +| gompic | toolchain | 2021-02-17 | \- | 2020b\<br />2020a\<br />2019b\<br />2019a\<br />2018b | +| gperf | devel | 2021-02-17 | \- | 3.1 | +| gperftools | tools | 2018-11-22 | \- | 2.7 | +| grib_api | data | 2018-11-22 | \- | 1.27.0 | +| gzip | tools | 2020-08-11 | \- | 1.9\<br />1.8\<br />1.10 | +| h5py | data | 2020-08-11 | \- | 2.9.0\<br />2.8.0\<br />2.7.1\<br />2.10.0 | +| help2man | tools | 2020-12-11 | \- | 1.47.8\<br />1.47.6\<br />1.47.4\<br />1.47.16\<br />1.47.12\<br />1.47.10 | +| hwloc | system | 2021-02-17 | \- | 2.2.0\<br />2.0.3\<br />1.11.8\<br />1.11.12\<br />1.11.11\<br />1.11.10 | +| hypothesis | tools | 2021-02-17 | \- | 5.41.2\<br />4.44.2 | +| icc | compiler | 2019-10-11 | \- | 2019.1.144\<br />2019.0.117\<br />2018.3.222\<br />2018.1.163 | +| iccifort | toolchain | 2020-09-21 | \- | 2020.2.254\<br />2020.1.217\<br />2019.5.281\<br />2019.1.144\<br />2019.0.117\<br />2018.3.222\<br />2018.1.163 | +| ifort | compiler | 2019-10-11 | \- | 2019.1.144\<br />2019.0.117\<br />2018.3.222\<br />2018.1.163 | +| iimpi | toolchain | 2020-08-03 | \- | 2020a\<br />2019b\<br />2019a\<br />2018b\<br />2018a | +| imkl | numlib | 2021-01-05 | \- | 2020.1.217\<br />2019.5.281\<br />2019.1.144\<br />2018.3.222\<br />2018.1.163 | +| impi | mpi | 2020-08-13 | \- | 2019.7.217\<br />2018.5.288\<br />2018.4.274\<br />2018.3.222\<br />2018.1.163 | +| intel | toolchain | 2020-08-03 | \- | 2020a\<br />2019b\<br />2019a\<br />2018b\<br />2018a | +| intltool | devel | 2021-02-17 | \- | 0.51.0 | +| iomkl | toolchain | 2021-01-05 | \- | 2020a\<br 
/>2019a\<br />2018a | +| iompi | toolchain | 2021-01-05 | \- | 2020a\<br />2019a\<br />2018a | +| itac | tools | 2018-11-22 | \- | 2018.3.022 | +| kim-api | chem | 2020-08-11 | \- | 2.1.3 | +| libGLU | vis | 2020-08-11 | \- | 9.0.1\<br />9.0.0 | +| libGridXC | chem | 2020-02-27 | \- | 0.8.5 | +| libPSML | data | 2020-02-27 | \- | 1.1.8 | +| libarchive | tools | 2021-02-17 | \- | 3.4.3 | +| libcerf | math | 2019-11-08 | \- | 1.7\<br />1.5\<br />1.11 | +| libcint | lib | 2019-01-09 | \- | 3.0.14 | +| libdap | lib | 2020-09-29 | \- | 3.20.6 | +| libdrm | lib | 2020-08-11 | \- | 2.4.99\<br />2.4.97\<br />2.4.92\<br />2.4.91\<br />2.4.100 | +| libelf | devel | 2020-12-11 | \- | 0.8.13 | +| libepoxy | lib | 2020-09-02 | \- | 1.5.4 | +| libevent | lib | 2021-02-24 | \- | 2.1.8\<br />2.1.12\<br />2.1.11 | +| libfabric | lib | 2021-02-17 | \- | 1.11.0 | +| libffi | lib | 2021-02-17 | \- | 3.3\<br />3.2.1 | +| libgd | lib | 2019-11-08 | \- | 2.2.5 | +| libgeotiff | lib | 2020-10-06 | \- | 1.5.1\<br />1.4.2 | +| libglvnd | lib | 2020-08-11 | \- | 1.2.0 | +| libharu | lib | 2018-11-22 | \- | 2.3.0 | +| libiconv | lib | 2021-03-03 | \- | 1.16 | +| libjpeg-turbo | lib | 2021-02-17 | \- | 2.0.5\<br />2.0.4\<br />2.0.3\<br />2.0.2\<br />2.0.0\<br />1.5.3\<br />1.5.2 | +| libmatheval | lib | 2020-08-11 | \- | 1.1.11 | +| libpciaccess | system | 2021-02-17 | \- | 0.16\<br />0.14 | +| libpng | lib | 2021-02-17 | \- | 1.6.37\<br />1.6.36\<br />1.6.34\<br />1.6.32 | +| libreadline | lib | 2021-02-17 | \- | 8.0\<br />7.0 | +| libsigc++ | devel | 2020-09-24 | \- | 2.10.1 | +| libsndfile | lib | 2020-08-28 | \- | 1.0.28 | +| libsodium | lib | 2019-11-05 | \- | 1.0.17\<br />1.0.16 | +| libssh2 | tools | 2018-11-22 | \- | 1.8.0 | +| libtirpc | lib | 2020-09-29 | \- | 1.2.6 | +| libtool | lib | 2021-02-17 | \- | 2.4.6 | +| libunistring | lib | 2020-08-11 | \- | 0.9.7\<br />0.9.10 | +| libunwind | lib | 2020-08-11 | \- | 1.3.1\<br />1.2.1 | +| libvdwxc | chem | 2020-01-15 | \- | 0.4.0 | +| libxc | chem | 2021-01-05 | \- | 4.3.4\<br />4.2.3\<br />3.0.1 | +| libxml++ | lib | 2020-09-24 | \- | 2.40.1 | +| libxml2 | lib | 2021-02-17 | \- | 2.9.9\<br />2.9.8\<br />2.9.7\<br />2.9.4\<br />2.9.10 | +| libxslt | lib | 2020-10-27 | \- | 1.1.34\<br />1.1.33\<br />1.1.32 | +| libxsmm | math | 2019-10-14 | \- | 1.8.3\<br />1.10 | +| libyaml | lib | 2020-04-16 | \- | 0.2.2\<br />0.2.1\<br />0.1.7 | +| likwid | tools | 2020-10-14 | \- | 5.0.1 | +| lo2s | perf | 2020-01-27 | \- | 1.3.0\<br />1.2.2\<br />1.1.1\<br />1.0.2\<br />1.0.1 | +| log4cxx | lang | 2020-02-18 | \- | 0.10.0 | +| lpsolve | math | 2018-11-22 | \- | 5.5.2.5 | +| lz4 | lib | 2020-08-11 | \- | 1.9.2\<br />1.9.1 | +| magma | math | 2021-01-04 | \- | 2.5.4\<br />2.3.0 | +| makedepend | devel | 2018-11-22 | \- | 1.0.5 | +| matplotlib | vis | 2021-02-17 | \- | 3.3.3\<br />3.2.1\<br />3.1.1\<br />3.0.3\<br />3.0.0\<br />2.1.2 | +| mkl-dnn | lib | 2018-11-22 | \- | 0.13 | +| molmod | math | 2020-08-11 | \- | 1.4.5 | +| motif | vis | 2018-11-22 | \- | 2.3.8 | +| ncdf4 | math | 2020-10-06 | \- | 1.17 | +| ncurses | devel | 2021-02-17 | \- | 6.2\<br />6.1\<br />6.0 | +| netCDF | data | 2021-03-03 | \- | 4.7.4\<br />4.7.1\<br />4.6.2\<br />4.6.1\<br />4.6.0 | +| netCDF-Fortran | data | 2021-03-04 | \- | 4.5.3\<br />4.5.2\<br />4.4.5\<br />4.4.4 | +| netcdf4-python | data | 2019-07-17 | \- | 1.4.3 | +| nettle | lib | 2020-01-24 | \- | 3.5.1\<br />3.4.1\<br />3.4 | +| nextstrain | bio | 2020-07-20 | \- | 2.0.0.post1 | +| nfft | math | 2018-11-28 | \- | 3.3.2DLR | +| nsync | 
devel | 2021-03-17 | \- | 1.24.0 | +| numactl | tools | 2021-02-17 | \- | 2.0.13\<br />2.0.12\<br />2.0.11 | +| numba | lang | 2020-09-25 | \- | 0.47.0 | +| numeca | cae | 2018-11-22 | \- | all | +| nvidia-nsight | tools | 2020-01-21 | \- | 2019.3.1 | +| p7zip | tools | 2018-11-22 | \- | 9.38.1 | +| parallel | tools | 2020-02-25 | \- | 20190922\<br />20190622\<br />20180822 | +| patchelf | tools | 2019-08-09 | \- | 0.9 | +| petsc4py | tools | 2019-02-26 | \- | 3.9.1 | +| pigz | tools | 2018-11-22 | \- | 2.4 | +| pixman | vis | 2020-08-28 | \- | 0.38.4\<br />0.38.0\<br />0.34.0 | +| pkg-config | devel | 2021-02-17 | \- | 0.29.2 | +| pkgconfig | devel | 2021-03-17 | \- | 1.5.1\<br />1.3.1 | +| pocl | lib | 2020-02-19 | \- | 1.4 | +| pompi | toolchain | 2018-11-22 | \- | 2018.04 | +| protobuf | devel | 2021-03-17 | \- | 3.6.1.2\<br />3.6.1\<br />3.14.0\<br />3.10.0 | +| protobuf-python | devel | 2021-03-17 | \- | 3.14.0\<br />3.10.0 | +| pybind11 | lib | 2021-02-17 | \- | 2.6.0\<br />2.4.3\<br />2.2.4 | +| pyscf | chem | 2019-11-13 | \- | 1.6.1\<br />1.6.0 | +| pytecplot | data | 2020-04-06 | \- | 1.0.0 | +| pytest | tools | 2019-02-26 | \- | 3.8.0 | +| qrupdate | numlib | 2019-11-08 | \- | 1.1.2 | +| re2c | tools | 2020-08-11 | \- | 1.3\<br />1.2.1 | +| rgdal | geo | 2019-11-27 | \- | 1.4-4 | +| rstudio | lang | 2020-02-18 | \- | 1.2.5001\<br />1.2.1335\<br />1.1.456 | +| scikit-learn | data | 2020-09-24 | \- | 0.21.3 | +| scorep_plugin_fileparser | perf | 2018-11-22 | \- | 1.3.1 | +| sf | lib | 2020-10-06 | \- | 0.9-5 | +| slepc4py | tools | 2019-02-26 | \- | 3.9.0 | +| snakemake | tools | 2020-04-16 | \- | 5.7.1\<br />5.14.0 | +| snappy | lib | 2021-03-17 | \- | 1.1.8\<br />1.1.7 | +| source-highlight | tools | 2018-11-22 | \- | 3.1.8 | +| spacy | lang | 2018-12-05 | \- | 2.0.18 | +| spglib | chem | 2019-12-12 | \- | 1.14.1 | +| tbb | lib | 2020-08-11 | \- | 2020.1\<br />2019-U4\<br />2018-U5 | +| tcsh | tools | 2018-11-22 | \- | 6.20.00 | +| tecplot360ex | vis | 2020-07-20 | \- | 2019r1 | +| texinfo | devel | 2020-08-11 | \- | 6.7\<br />6.6\<br />6.5 | +| tmux | tools | 2021-04-09 | \- | 3.1c\<br />2.3 | +| torchvision | vis | 2018-11-22 | \- | 0.2.1 | +| tqdm | lib | 2020-09-29 | \- | 4.41.1 | +| typing-extensions | devel | 2021-03-19 | \- | 3.7.4.3 | +| utf8proc | lib | 2019-11-08 | \- | 2.3.0 | +| util-linux | tools | 2021-02-17 | \- | 2.36\<br />2.35\<br />2.34\<br />2.33\<br />2.32\<br />2.31.1 | +| wheel | tools | 2018-11-22 | \- | 0.31.1\<br />0.31.0 | +| x264 | vis | 2020-08-11 | \- | 20191217\<br />20190925\<br />20190413\<br />20181203\<br />20180128 | +| x265 | vis | 2020-08-11 | \- | 3.3\<br />3.2\<br />3.0\<br />2.9\<br />2.6 | +| xbitmaps | devel | 2018-11-22 | \- | 1.1.1 | +| xmlf90 | data | 2020-02-27 | \- | 1.5.4 | +| xorg-macros | devel | 2021-02-17 | \- | 1.19.2\<br />1.19.1 | +| xprop | vis | 2019-11-08 | \- | 1.2.4\<br />1.2.3\<br />1.2.2 | +| xproto | devel | 2018-11-22 | \- | 7.0.31 | +| yaff | chem | 2020-08-11 | \- | 1.6.0 | +| zlib | lib | 2020-12-11 | \- | 1.2.8\<br />1.2.11 | +| zsh | tools | 2021-01-06 | \- | 5.8 | +| zstd | lib | 2020-08-11 | \- | 1.4.4 | + +### Classic Environment + +<span class="twiki-macro TABLE">headerrows= 1"</span> <span +class="twiki-macro EDITTABLE" +format="| text, 30, Software | text, 40, Kategorie | text, 30, Letzte Änderung | text, 30, SGI-UV | text, 30, Taurus | " +changerows="on"></span> + +| Software | Category | Last change | Venus | Taurus | 
+|:-------------------------|:-------------|:------------|:-----------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| AVS-Express | applications | 2015-06-26 | \- | mpepst8.2\<br />8.2 | +| FPLO | applications | 2018-04-13 | \- | 18.00-53\<br />14.00-49 | +| VirtualGL | tools | 2013-10-02 | \- | | +| abaqus | applications | 2018-04-19 | 2017\<br />2016 | *2018* \<br /> 6.9-EF1\<br />6.13\<br />6.12\<br />2017\<br />2016 | +| abinit | applications | 2013-11-21 | 7.2.1 | 7.2.1 | +| ace | lib | 2018-11-22 | \- | 6.3.3 | +| adolc | libraries | 2014-07-24 | *2.5.0* \<br /> 2.4.1 | *2.5.0* \<br /> 2.4.1 | +| afni | applications | 2014-07-01 | \- | 2011-12-21-1014 | +| amber | applications | 2017-10-06 | \- | 15 | +| ansys | applications | 2018-09-04 | 18.0\<br />17.1\<br />16.1 | 19.0\<br />18.2\<br />18.1\<br />18.0\<br />17.2\<br />17.1\<br />17.0\<br />16.1 | +| ansysem | applications | 2017-07-20 | \- | 16.0 | +| asm | tools | 2017-03-17 | \- | 5.2 | +| autoconf | tools | 2013-10-30 | 2.69 | 2.69 | +| automake | tools | 2014-09-11 | 1.14\<br />1.12.2\<br />1.12 | 1.14\<br />1.12.2 | +| autotools | tools | 2017-02-01 | \- | default\<br />2015 | +| bazel | compilers | 2017-07-13 | \- | 0.5.2 | +| bison | libraries | 2015-03-11 | \- | 3.0.4 | +| blcr | tools | 2016-03-02 | \- | | +| boost | libraries | 2019-03-29 | *1.49* \<br /> 1.69.0\<br />1.54\<br />1.51.0 | *1.54.0* \<br /> 1.66.0\<br />1.65.1\<br />1.65.0\<br />1.64.0\<br />1.63.0\<br />1.62.0\<br />1.61.0\<br />1.60.0\<br />1.59.0\<br />1.58.0\<br />1.57.0\<br />1.56.0\<br />1.55.0\<br />1.49 | +| bowtie | applications | 2013-01-16 | 0.12.8 | \- | +| bullxmpi | libraries | 2016-10-14 | \- | *1.2.8.4* \<br /> 1.2.9.2 | +| casita | tools | 2017-06-08 | \- | 1.9 | +| cdo | tools | 2013-04-08 | 1.6.0 | \- | +| ceph | libraries | 2017-01-13 | \- | 11.1 | +| cereal | libraries | 2016-12-07 | \- | 1.2.1 | +| cg | libraries | 2015-09-18 | \- | 3.1 | +| cgal | libraries | 2018-03-08 | \- | 4.11.1 | +| clFFT | libraries | 2017-07-12 | \- | 2.12.2dev\<br />2.12.2\<br />2.12.1\<br />2.12.0\<br />2.10.0 | +| clang | compilers | 2018-08-20 | \- | 4.0.0 | +| cmake | tools | 2019-04-03 | 3.3.1\<br />3.11.4\<br />2.8.2\<br />2.8.12.2\<br />2.8.11 | *3.10.1* \<br /> 3.9.0\<br />3.6.2\<br />3.3.1\<br />2.8.2\<br />2.8.12.2\<br />2.8.11 | +| collectl | tools | 2017-12-05 | 4.2.0\<br />4.1.2\<br />3.6.7\<br />3.5.1 | 4.2.0\<br />4.1.2\<br />3.6.7\<br />3.5.1 | +| comsol | applications | 2018-04-20 | \- | *5.3a* \<br /> 5.3 | +| conn | libraries | 2017-07-12 | \- | 17f | +| cp2k | applications | 2017-12-05 | 2.5 | *5.1* \<br /> r16298\<br />r15503\<br />r14075\<br />r13178\<br />3.0\<br />2.6.2\<br />2.6\<br />2.4\<br />2.3\<br />130326 | +| cpufrequtils | libraries | 2017-02-16 | \- | gcc5.3.0 | +| ctool | libraries | 2013-04-17 | 2.12 | 2.12 | +| cube | tools | 2018-05-31 | *4.3* | *4.3* \<br /> 4.4 | +| cuda | libraries | 2018-06-07 | \- | *9.2.88* \<br /> 9.1.85\<br />9.0.176\<br />8.0.61\<br />8.0.44\<br />7.5.18\<br />7.0.28 | +| curl | libraries | 2013-01-18 | 7.28.1 | \- | +| cusp | libraries | 2014-04-22 | \- | 0.4.0\<br />0.3.1 | +| cython | libraries | 2016-08-26 | \- | 0.24.1\<br />0.24\<br />0.19.2 | +| dalton | applications 
| 2016-04-07 | \- | 2016.0 | +| darshan | tools | 2017-09-02 | \- | darshan-3.1.4 | +| dash | libraries | 2019-02-14 | \- | dash | +| dataheap | libraries | 2017-01-12 | \- | 1.2\<br />1.1 | +| ddt | tools | 2021-04-12 | 4.2\<br />4.0\<br />3.2.1 | *18.0.1* \<br /> 6.0.5\<br />6.0 | +| dftb+ | applications | 2017-06-26 | \- | mpi\<br />1.3\<br />1.2.2 | +| dmtcp | tools | 2017-10-18 | \- | *2.5.1-ib* \<br /> ib-id | +| doxygen | tools | 2016-01-27 | \- | 1.8.11\<br />1.7.4 | +| dune | libraries | 2014-05-28 | \- | 2.2.1 | +| dyninst | libraries | 2016-02-16 | 8.1.1 | 8.2.1\<br />8.1.1 | +| eigen | libraries | 2017-03-23 | \- | 3.3.3\<br />3.2.0 | +| eirods | tools | 2013-12-11 | \- | 3.1 | +| eman2 | applications | 2017-06-21 | \- | 2.2 | +| ensight | applications | 2015-07-13 | \- | 10.1.5a\<br />10.0 | +| extrae | applications | 2017-01-03 | \- | 3.4.1 | +| fftw | libraries | 2017-03-18 | \- | 3.3.6pl1\<br />3.3.5\<br />3.3.4 | +| firestarter | applications | 2016-04-20 | \- | 1.4 | +| flex | lang | 2020-12-11 | \- | 2.5.39 | +| fme | applications | 2017-04-21 | \- | 2017 | +| freecad | applications | 2014-05-12 | \- | *0.14* \<br /> 0.13 | +| freeglut | lib | 2019-11-08 | 2.8.1 | 2.8.1 | +| freesurfer | applications | 2015-12-04 | \- | *5.3.0* \<br /> 5.1.0 | +| fsl | libraries | 2018-01-17 | \- | 5.0.5\<br />5.0.4\<br />5.0.10 | +| ga | libraries | 2015-06-22 | \- | 5.2 | +| gamess | applications | 2014-12-11 | \- | 2013 | +| gams | applications | 2017-07-10 | \- | 24.8\<br />24.3.3 | +| gaussian | applications | 2017-06-01 | g16\<br />g09d01\<br />g09b01\<br />g09\<br />g03 | *g16* \<br /> g09d01\<br />g09b01\<br />g09 | +| gautomatch | applications | 2017-06-21 | \- | 0.53 | +| gcc | compilers | 2019-04-02 | 8.3.0\<br />7.1.0\<br />6.3.0\<br />5.5.0\<br />4.9.3\<br />4.9.1\<br />4.8.2\<br />4.8.0\<br />4.7.1 | *7.1.0* \<br /> 6.3.0\<br />6.2.0\<br />5.5.0\<br />5.3.0\<br />5.2.0\<br />5.1.0\<br />4.9.3\<br />4.9.1\<br />4.8.2\<br />4.8.0\<br />4.7.1\<br />4.6.2 | +| gcl | compilers | 2017-03-13 | \- | 2.6.12 | +| gcovr | tools | 2015-02-12 | *3.2* | *3.2* | +| gctf | applications | 2017-06-21 | \- | 0.50 | +| gdb | tools | 2017-08-03 | 7.9.1 | 7.5 | +| gdk | tools | 2015-12-14 | \- | 352 | +| geany | tools | 2014-05-12 | \- | 1.24.1 | +| ghc | compilers | 2015-01-16 | 7.6.3 | 7.6.3 | +| git | tools | 2021-03-17 | *1.8.3.1* \<br /> 2.17.1\<br />1.7.7\<br />1.7.4.1\<br />1.7.3.2 | *2.15.1* \<br /> 2.7.3\<br />1.9.0 | +| glib | libraries | 2016-10-27 | \- | 2.50.1\<br />2.44.1 | +| gmap | tools | 2018-07-13 | \- | 2018-07-04 | +| gmock | tools | 2013-10-17 | \- | *1.6.0* | +| gnuplot | vis | 2019-11-08 | 4.6.1\<br />4.4.0 | 4.6.1\<br />4.4.0 | +| gpaw | applications | 2016-02-19 | \- | 0.11.0 | +| gperftools | tools | 2018-11-22 | \- | gperftools-2.6.1.lua | +| gpi2 | libraries | 2018-04-25 | \- | *git* \<br /> 1.3.0\<br />1.2.2\<br />1.1.0 | +| gpi2-mpi | libraries | 2015-03-30 | \- | *1.1.1* | +| gpudevkit | libraries | 2016-07-25 | \- | 352-79 | +| grads | applications | 2014-08-05 | 2.0.2 | \- | +| grid | tools | 2014-12-09 | \- | 2012 | +| gromacs | applications | 2018-01-22 | \- | *5.1.3* \<br /> 5.1.4\<br />5.1.1\<br />5.1\<br />4.6.7\<br />4.5.5\<br />3.3.3 | +| gsl | libraries | 2015-01-28 | \- | 1.16 | +| gulp | applications | 2015-10-09 | \- | 4.3 | +| gurobi | applications | 2017-06-28 | 7.0.2\<br />6.0.4 | 7.0.2\<br />6.0.4 | +| h5utils | tools | 2014-08-18 | \- | 1.12.1 | +| haskell-platform | tools | 2015-01-16 | 2013.2.0.0 | 2013.2.0.0 | +| hdeem | libraries | 2018-01-22 
| \- | *deprecated* \<br /> 2.2.20ms\<br />2.2.2\<br />2.2.19ms\<br />2.2.16ms\<br />2.2.15ms\<br />2.2.13ms\<br />2.1.9ms\<br />2.1.5\<br />2.1.4\<br />2.1.10ms | +| hdf5 | libraries | 2018-02-09 | 1.8.14\<br />1.8.10 | hdfview\<br />1.8.19\<br />1.8.18\<br />1.8.16\<br />1.8.15\<br />1.8.14\<br />1.8.10\<br />1.6.5\<br />1.10.1 | +| hip | libraries | 2018-08-27 | \- | git | +| hoomd-blue | applications | 2016-07-29 | \- | 2.0.1 | +| hpc-x | libraries | 2017-04-10 | \- | 1.8.0 | +| hpctoolkit | tools | 2013-05-28 | 5.3.2 | \- | +| hpx | libraries | 2017-09-15 | \- | hpx | +| htop | tools | 2016-11-04 | 1.0.2 | 1.0.2 | +| hwloc | system | 2021-02-17 | \- | 1.11.8\<br />1.11.6 | +| hyperdex | tools | 2015-08-20 | \- | default\<br />1.8.1\<br />1.7.1 | +| hyperopt | libraries | 2018-03-19 | \- | 0.1 | +| imagemagick | applications | 2015-03-18 | \- | 6.9.0 | +| intel | toolchain | 2020-08-03 | *2013* \<br /> 2017.2.174\<br />2016.2.181\<br />2016.1.150\<br />2015.3.187\<br />2015.2.164\<br />2013-sp1\<br />11.1.069 | *2018.1.163* \<br /> 2018.0.128\<br />2017.4.196\<br />2017.2.174\<br />2017.1.132\<br />2017.0.020\<br />2016.2.181\<br />2016.1.150\<br />2015.3.187\<br />2015.2.164\<br />2015.1.133\<br />2013-sp1\<br />2013\<br />12.1\<br />11.1.069 | +| intelmpi | libraries | 2017-11-21 | \- | *2018.1.163* \<br /> 5.1.3.181\<br />5.1.2.150\<br />5.0.3.048\<br />5.0.1.035\<br />2018.0.128\<br />2017.3.196\<br />2017.2.174\<br />2017.1.132\<br />2017.0.098\<br />2013 | +| iotop | tools | 2013-07-16 | \- | 0.5 | +| iotrack | tools | 2013-07-16 | \- | 0.5 | +| java | tools | 2015-11-17 | jre1.6.0-21\<br />jdk1.8.0-66\<br />jdk1.7.0-25\<br />jdk1.7.0-03 | jdk1.8.0-66\<br />jdk1.7.0-25 | +| julia | compilers | 2018-05-15 | \- | *0.6.2* \<br /> 0.4.6\<br />0.4.1 | +| knime | applications | 2017-03-20 | \- | 3.3.1\<br />3.1.0\<br />2.11.3-24\<br />2.11.3 | +| lammps | applications | 2017-08-31 | 2014sep\<br />2014jun\<br />2013feb | *2016jul* \<br /> 2017aug\<br />2016may\<br />2015aug\<br />2014sep\<br />2014feb\<br />2013feb\<br />2013aug | +| lbfgsb | libraries | 2013-08-02 | *3.0* \<br /> 2.1 | *3.0* \<br /> 2.1 | +| libnbc | libraries | 2014-05-28 | \- | 1.1.1 | +| libssh2 | tools | 2018-11-22 | \- | 1.8.0 | +| libsvm | tools | 2015-11-20 | \- | 3.20 | +| libtool | lib | 2021-02-17 | \- | 2.4.2 | +| libunwind | lib | 2020-09-25 | \- | 1.1 | +| libxc | chem | 2021-01-05 | \- | 3.0.0\<br />2.2.2 | +| liggghts | applications | 2014-05-28 | \- | 2.3.8\<br />2.3.2 | +| llview | tools | 2015-01-28 | \- | | +| llvm | compilers | 2018-04-20 | \- | *4.0.0* \<br /> ykt\<br />3.9.1\<br />3.7\<br />3.4\<br />3.3.1 | +| lo2s | perf | 2020-01-27 | \- | 2018-02-13\<br />2017-12-06\<br />2017-08-07 | +| ls-dyna | applications | 2017-12-05 | *9.0.1* \<br /> 971\<br />7.0 | *10.0.0* \<br /> dev-121559\<br />971\<br />9.0.1\<br />8.1\<br />7.1.2\<br />7.1.1\<br />7.0\<br />6.0 | +| ls-dyna-usermat | applications | 2016-08-25 | \- | 9.0.1-s\<br />9.0.1-d\<br />7.1.2-d\<br />7.1.1-s\<br />7.1.1-d | +| ls-prepost | applications | 2016-08-22 | \- | *4.3* | +| lumerical | applications | 2016-06-01 | \- | fdtd-8.11.422 | +| m4 | tools | 2013-10-30 | \- | 1.4.16 | +| m4ri | libraries | 2017-03-27 | \- | 20140914 | +| make | tools | 2018-02-21 | \- | 4.2 | +| map | tools | 2016-11-22 | \- | 6.0.5 | +| mathematica | applications | 2015-10-16 | \- | *10.0* \<br /> 8.0 | +| matlab | applications | 2019-02-26 | deprecated.lua\<br />2017a.lua\<br />2016b.lua\<br />2015b.lua\<br />2014a.lua\<br />2013a.lua\<br />2012a.lua\<br 
/>2010b.lua\<br />2010a.lua | \- | +| maxima | applications | 2017-03-15 | \- | 5.39.0 | +| med | libraries | 2017-09-27 | \- | 3.2.0 | +| meep | applications | 2015-04-23 | \- | 1.3\<br />1.2.1 | +| mercurial | tools | 2014-10-22 | 3.1.2 | 3.1.2 | +| metis | libraries | 2013-12-17 | 5.1.0 | 5.1.0\<br />4.0.3 | +| mkl | libraries | 2017-05-10 | 2013 | 2017\<br />2015\<br />2013 | +| modenv | environment | 2020-03-25 | \- | scs5.lua\<br />ml.lua\<br />hiera.lua\<br />classic.lua | +| mongodb | applications | 2018-03-19 | \- | 3.6.3 | +| motioncor2 | applications | 2017-06-21 | \- | 01-30-2017 | +| mpb | applications | 2014-08-19 | \- | 1.4.2 | +| mpi4py | libraries | 2016-05-02 | 1.3.1 | 2.0.0\<br />1.3.1 | +| mpirt | libraries | 2017-11-21 | \- | *2018.1.163* \<br /> 5.1.3.181\<br />5.1.2.150\<br />5.0.3.048\<br />5.0.1.035\<br />2018.0.128\<br />2017.3.196\<br />2017.2.174\<br />2017.1.132\<br />2017.0.098\<br />2013 | +| mumps | libraries | 2017-05-11 | \- | 5.1.1 | +| must | tools | 2018-02-01 | \- | 1.5.0\<br />1.4.0 | +| mvapich2 | libraries | 2017-05-23 | \- | 2.2 | +| mysql | tools | 2013-12-06 | \- | 6.0.11 | +| namd | applications | 2015-09-08 | \- | *2.10* \<br /> 2.9 | +| nco | tools | 2013-08-01 | 4.3.0 | \- | +| nedit | tools | 2013-04-30 | 5.6\<br />5.5 | 5.6\<br />5.5 | +| netbeans | applications | 2018-03-07 | \- | 8.2 | +| netcdf | libraries | 2018-02-09 | 4.1.3 | 4.6.0\<br />4.4.0\<br />4.3.3.1\<br />4.1.3 | +| netlogo | applications | 2017-08-08 | \- | 6.0.1\<br />5.3.0\<br />5.2.0 | +| nsys | tools | 2018-09-11 | \- | 2018.1.1.36\<br />2018.0.1.173 | +| numeca | cae | 2018-11-22 | \- | all | +| nwchem | applications | 2016-02-15 | 6.3 | *6.6* \<br /> custom\<br />6.5patched\<br />6.5\<br />6.3.r2\<br />6.3 | +| octave | applications | 2018-03-23 | \- | 3.8.1 | +| octopus | applications | 2017-03-07 | \- | 6.0 | +| openbabel | applications | 2014-03-07 | 2.3.2 | 2.3.2 | +| opencl | libraries | 2015-07-13 | \- | 1.2-4.4.0.117 | +| openems | applications | 2017-08-01 | \- | 0.0.35 | +| openfoam | applications | 2020-10-15 | \- | *2.3.0* \<br /> v1712\<br />v1706\<br />5.0\<br />4.0\<br />2.4.0\<br />2.3.1\<br />2.2.2 | +| openmpi | libraries | 2018-02-01 | \- | *1.10.2* \<br /> 3.0.0\<br />2.1.1\<br />2.1.0\<br />1.8.8\<br />1.10.4\<br />1.10.3 | +| opentelemac | applications | 2017-09-29 | \- | v7p2r3\<br />v7p1r1 | +| oprofile | tools | 2013-06-05 | 0.9.8 | \- | +| orca | applications | 2017-07-27 | \- | 4.0.1\<br />4.0.0.2\<br />3.0.3 | +| otf2 | libraries | 2018-02-12 | \- | *2.0* \<br /> 2.1\<br />1.4\<br />1.3 | +| papi | libraries | 2017-11-06 | 5.1.0 | 5.5.1\<br />5.4.3\<br />5.4.1 | +| parallel | tools | 2020-02-25 | \- | 20170222 | +| paraview | applications | 2016-03-03 | \- | *4.1.0* \<br /> 4.0.1 | +| parmetis | libraries | 2018-01-12 | \- | 4.0.3 | +| pasha | applications | 2013-11-14 | 1.0.9 | \- | +| pathscale | compilers | 2016-03-04 | \- | enzo-6.0.858\<br />enzo-6.0.749 | +| pdt | tools | 2015-09-18 | 3.18.1 | 3.18.1 | +| perf | tools | 2016-06-14 | | | +| perl | applications | 2015-01-29 | 5.20.1\<br />5.12.1 | 5.20.1\<br />5.12.1 | +| petsc | libraries | 2018-03-19 | *3.3-p6* \<br /> 3.1-p8 | *3.3-p6* \<br /> 3.8.3-64bit\<br />3.8.3\<br />3.4.4\<br />3.4.3\<br />3.3-p7-64bit\<br />3.3-p7\<br />3.2-p7\<br />3.1-p8-p\<br />3.1-p8 | +| pgi | compilers | 2018-09-07 | 14.9\<br />14.7\<br />14.6\<br />14.3\<br />13.4 | *18.3* \<br /> 17.7\<br />17.4\<br />17.1\<br />16.9\<br />16.5\<br />16.4\<br />16.10\<br />16.1\<br />15.9\<br />14.9 | +| pigz | tools | 
2018-11-22 | \- | 2.3.4 | +| prope-env | tools | 2017-05-02 | \- | *.1.0* | +| protobuf | devel | 2021-03-17 | \- | 3.5.0\<br />3.2.0 | +| pycuda | libraries | 2016-10-28 | \- | *2016.1.2* \<br /> 2013.1.1\<br />2012.1 | +| pyslurm | libraries | 2017-11-09 | \- | 16.05.8 | +| python | libraries | 2018-01-17 | 3.6\<br />3.3.0\<br />2.7.5\<br />2.7 | *3.6* \<br /> intelpython3\<br />intelpython2\<br />3.5.2\<br />3.4.3\<br />3.3.0\<br />3.1.2\<br />2.7.6\<br />2.7.5\<br />2.7 | +| q-chem | applications | 2016-12-12 | \- | 4.4 | +| qt | libraries | 2016-10-26 | \- | *4.8.1* \<br /> 5.4.1\<br />4.8.7 | +| quantum_espresso | applications | 2016-09-13 | *5.0.3* \<br /> 5.0.2 | 5.3.0\<br />5.1.2\<br />5.0.3 | +| r | applications | 2016-02-18 | \- | 3.2.1\<br />2.15.3 | +| ramdisk | tools | 2016-07-21 | 1.0 | \- | +| read-nvml-clocks-pci | tools | 2018-02-22 | \- | 1.0 | +| readex | tools | 2018-06-13 | \- | pre\<br />beta-1806\<br />beta-1805\<br />beta-1804\<br />beta\<br />alpha | +| redis | tools | 2016-06-28 | \- | 3.2.1 | +| relion | applications | 2017-06-21 | \- | 2.1 | +| repoclient | applications | 2017-01-18 | \- | 1.4.1 | +| ripgrep | tools | 2017-02-17 | \- | 0.3.2 | +| robinhood | tools | 2017-05-04 | \- | 2.4.3 | +| root | applications | 2015-02-27 | \- | 6.02.05 | +| rstudio | lang | 2020-02-18 | \- | 0.98.1103 | +| ruby | tools | 2014-07-21 | \- | 2.1.2 | +| samrai | libraries | 2016-04-22 | \- | 3.10.0 | +| samtools | tools | 2018-07-13 | 0.1.18 | 1.8 | +| scafes | libraries | 2017-05-05 | *2.3.0* \<br /> 2.2.0\<br />2.0.0\<br />1.0.0 | *2.3.0* \<br /> 2.2.0\<br />2.1.0\<br />2.0.1\<br />2.0.0\<br />1.0.0 | +| scala | compilers | 2015-06-22 | \- | 2.11.4\<br />2.10.4 | +| scalapack | libraries | 2013-11-21 | 2.0.2 | \- | +| scalasca | tools | 2018-02-06 | \- | *2.3.1* | +| scons | tools | 2015-11-19 | 2.3.4 | 2.4.1\<br />2.3.4 | +| scorep | tools | 2018-10-01 | 1.3.0 | *3.0* \<br /> try\<br />trunk\<br />ompt\<br />java\<br />dev-io\<br />3.1 | +| scorep-apapi | libraries | 2018-01-09 | \- | gcc-2018-01-09 | +| scorep-cpu-energy | libraries | 2017-01-20 | \- | r217\<br />r211\<br />r117\<br />2017-01-20\<br />2016-04-07 | +| scorep-cpu-id | libraries | 2014-08-13 | \- | r117 | +| scorep-dataheap | libraries | 2015-07-28 | \- | *2015-07-28* \<br /> r191\<br />r122 | +| scorep-dev | tools | 2017-07-19 | \- | *05* | +| scorep-hdeem | libraries | 2018-06-19 | \- | *2016-12-20* \<br /> sync\<br />2017-12-08a\<br />2017-12-08.lua\<br />2016-11-21 | +| scorep-plugin-x86-energy | libraries | 2018-06-19 | \- | xmpi\<br />intelmpi\<br />2017-09-06\<br />2017-09-05 | +| scorep-printmetrics | libraries | 2018-02-26 | \- | 2018-02-26 | +| scorep-uncore | libraries | 2018-06-21 | \- | *2018-01-24* \<br /> 2016-03-29 | +| scorep_plugin_x86_energy | libraries | 2018-07-04 | \- | intel-2018\<br />gcc-7.1.0\<br />2017-07-14 | +| scout | compilers | 2015-06-22 | 1.6.0 | 1.6.0 | +| sed | tools | 2018-02-21 | \- | 4.4 | +| sftp | tools | 2014-04-10 | \- | 6.6 | +| shifter | tools | 2016-06-09 | \- | 16.04.0pre1 | +| siesta | applications | 2017-05-29 | 3.1-pl20 | 4.0\<br />3.2-pl4 | +| singularity | tools | 2019-02-13 | | ff69c5f3 | +| sionlib | tools | 2017-06-29 | \- | 1.6.1\<br />1.5.5 | +| siox | libraries | 2016-10-27 | \- | 2016-10-27\<br />2016-10-26 | +| spm | applications | 2014-07-09 | \- | 8-r4667 | +| spm12 | libraries | 2017-07-12 | \- | r6906 | +| spparks | applications | 2016-06-30 | \- | 2016feb | +| sqlite3 | libraries | 2016-07-27 | 3.8.2 | 3.8.10 | +| sra-tools | tools | 2018-07-19 
| \- | 2.9.1 | +| stack | tools | 2016-06-23 | \- | 1.1.2 | +| star | applications | 2017-10-25 | \- | *12.06* \<br /> 9.06\<br />12.04\<br />12.02\<br />11.02\<br />10.04 | +| subread | tools | 2018-07-13 | \- | 1.6.2 | +| suitesparse | libraries | 2017-08-25 | \- | *4.5.4* \<br /> 4.2.1 | +| superlu | libraries | 2017-08-25 | \- | *5.2.1* | +| superlu_dist | libraries | 2017-03-21 | \- | 5.1.3 | +| superlu_mt | libraries | 2017-03-21 | \- | 3.1 | +| svn | tools | 2016-03-16 | 1.8.11\<br />1.7.3 | *1.9.3* \<br /> 1.8.8\<br />1.8.11 | +| swig | tools | 2016-04-08 | 2.0 | 3.0.8\<br />2.0 | +| swipl | tools | 2017-02-28 | \- | 7.4.0-rc2 | +| tcl | applications | 2017-08-07 | \- | 8.6.6 | +| tcltk | applications | 2015-02-06 | \- | 8.4.20 | +| tecplot360 | applications | 2018-05-22 | 2015\<br />2013 | 2018r1\<br />2017r2\<br />2017r1\<br />2016r2\<br />2015r2\<br />2015\<br />2013\<br />2010 | +| tesseract | libraries | 2016-06-28 | \- | 3.04 | +| texinfo | devel | 2020-08-11 | \- | 5.2 | +| theodore | libraries | 2016-05-27 | \- | 1.3 | +| tiff | libraries | 2013-09-16 | 3.9.2 | 3.9.2 | +| tinker | applications | 2014-03-04 | \- | 6.3 | +| tmux | tools | 2021-04-09 | \- | 2.2 | +| totalview | tools | 2017-11-03 | *2017.2.11* \<br /> 8.9.2-0\<br />8.8.0-1\<br />8.13.0-0\<br />8.11.0-3 | *2017.2.11* \<br /> 8.9.2-0\<br />8.8.0-1\<br />8.13.0-0\<br />8.11.0-3 | +| trilinos | applications | 2016-04-13 | \- | 12.6.1 | +| trinityrnaseq | applications | 2013-05-16 | r2013-02-25 | \- | +| turbomole | applications | 2017-01-10 | 7.1\<br />6.6\<br />6.5 | 7.1\<br />6.6\<br />6.5 | +| valgrind | tools | 2017-08-08 | 3.8.1 | *3.10.1* \<br /> r15216\<br />3.8.1\<br />3.13.0 | +| vampir | tools | 2019-03-04 | *9.6.1* \<br /> 9.7\<br />9.5.0\<br />9.4.0\<br />9.3.0\<br />8.5.0 | *9.5.0* \<br /> 9.4.0\<br />9.3.0\<br />9.3\<br />9.2.0\<br />9.1.0\<br />9.0.0\<br />8.5.0\<br />8.4.1\<br />8.3.0 | +| vampirlive | tools | 2018-02-27 | \- | | +| vampirtrace | tools | 2016-03-29 | *5.14.4* | *5.14.4* | +| vampirtrace-plugins | libraries | 2014-08-06 | \- | x86\<br />power-1.1\<br />power-1.0\<br />apapi | +| vasp | applications | 2017-11-08 | *5.3* \<br /> 5.2 | 5.4.4\<br />5.4.1\<br />5.3 | +| visit | applications | 2016-12-13 | \- | 2.4.2\<br />2.12.0 | +| vmd | applications | 2016-12-15 | \- | 1.9.3 | +| vt_dataheap | libraries | 2014-08-14 | \- | r190 | +| vtk | libraries | 2016-08-16 | 5.10.1 | 5.10.1 | +| wannier90 | libraries | 2016-04-21 | 1.2 | 2.0.1\<br />1.2 | +| wget | tools | 2015-05-12 | 1.16.3 | 1.16.3 | +| wxwidgets | libraries | 2017-03-15 | \- | 3.0.2 | +| yade | applications | 2014-05-22 | \- | | +| zlib | lib | 2020-12-11 | \- | 1.2.8 | + +### ML Environment + +<span class="twiki-macro TABLE">headerrows= 1"</span> <span +class="twiki-macro EDITTABLE" +format="| text, 30, Software | text, 40, Kategorie | text, 30, Letzte Änderung | text, 30, SGI-UV | text, 30, Taurus | " +changerows="on"></span> + +| Software | Category | Last change | Venus | Taurus | +|:--------------------------|:----------|:------------|:------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| ASE | chem | 2020-09-29 | \- | 3.19.0 | +| ATK | vis | 2020-10-12 | \- | 2.34.1\<br />2.28.1 | +| Anaconda3 | lang | 2019-11-13 | \- | 2019.07\<br />2019.03 | +| Arrow | data | 2020-11-30 | \- | 0.16.0\<br />0.14.1 | +| Autoconf | devel | 
2020-10-14 | \- | 2.69 | +| Automake | devel | 2020-10-14 | \- | 1.16.1\<br />1.15.1 | +| Autotools | devel | 2020-10-14 | \- | 20180311\<br />20170619 | +| Bazel | devel | 2021-03-22 | \- | 3.7.1\<br />3.4.1\<br />2.0.0\<br />1.1.0\<br />0.29.1\<br />0.26.1\<br />0.25.2\<br />0.20.0\<br />0.18.0 | +| BigDataFrameworkConfigure | devel | 2019-09-16 | \- | 0.0.2\<br />0.0.1 | +| Bison | lang | 2020-10-14 | \- | 3.5.3\<br />3.3.2\<br />3.0.5\<br />3.0.4 | +| Boost | devel | 2020-11-30 | \- | 1.71.0\<br />1.70.0\<br />1.69.0\<br />1.67.0\<br />1.66.0 | +| CMake | devel | 2020-11-02 | \- | 3.9.5\<br />3.9.1\<br />3.16.4\<br />3.15.3\<br />3.13.3\<br />3.12.1\<br />3.11.4\<br />3.10.2 | +| CUDA | system | 2020-10-14 | \- | 9.2.88\<br />11.0.2\<br />10.1.243\<br />10.1.105 | +| CUDAcore | system | 2020-10-14 | \- | 11.0.2 | +| Check | lib | 2020-10-14 | \- | 0.15.2 | +| Clang | compiler | 2020-02-20 | \- | 9.0.1 | +| CubeLib | perf | 2020-07-23 | \- | 4.4.4\<br />4.4 | +| CubeW | perf | 2019-02-04 | \- | 4.4 | +| CubeWriter | perf | 2020-07-23 | \- | 4.4.3 | +| DBus | devel | 2019-09-11 | \- | 1.13.8 | +| Devel-NYTProf | perf | 2019-08-30 | \- | 6.06 | +| Doxygen | devel | 2019-07-17 | \- | 1.8.14\<br />1.8.13 | +| EasyBuild | tools | 2021-03-08 | \- | 4.3.3\<br />4.3.2\<br />4.3.1\<br />4.3.0\<br />4.2.2\<br />4.2.0\<br />4.1.2\<br />4.1.1\<br />4.1.0\<br />4.0.1\<br />3.9.4\<br />3.9.3\<br />3.9.2\<br />3.9.1\<br />3.8.1\<br />3.8.0\<br />3.7.1\<br />3.7.0\<br />3.6.2\<br />3.6.1 | +| Eigen | math | 2020-11-02 | \- | 3.3.7 | +| FFTW | numlib | 2020-10-14 | \- | 3.3.8\<br />3.3.7\<br />3.3.6 | +| FFmpeg | vis | 2020-09-29 | \- | 4.2.1\<br />4.1 | +| Flink | devel | 2019-09-16 | \- | 1.9.0\<br />1.8.1 | +| FriBidi | lang | 2020-09-29 | \- | 1.0.5 | +| GCC | compiler | 2020-10-14 | \- | 9.3.0\<br />8.3.0\<br />8.2.0-2.31.1\<br />7.3.0-2.30\<br />6.4.0-2.28 | +| GCCcore | compiler | 2020-10-14 | \- | 9.3.0\<br />8.3.0\<br />8.2.0\<br />7.3.0\<br />6.4.0 | +| GDAL | data | 2019-08-19 | \- | 2.2.3 | +| GDRCopy | lib | 2020-10-14 | \- | 2.1 | +| GEOS | math | 2019-08-19 | \- | 3.6.2 | +| GLib | vis | 2020-02-19 | \- | 2.62.0\<br />2.60.1\<br />2.54.3 | +| GMP | math | 2020-11-02 | \- | 6.2.0\<br />6.1.2 | +| GObject-Introspection | devel | 2020-10-12 | \- | 1.63.1\<br />1.54.1 | +| GSL | numlib | 2020-02-19 | \- | 2.6\<br />2.5 | +| GTK+ | vis | 2019-02-15 | \- | 2.24.32 | +| Gdk-Pixbuf | vis | 2019-02-15 | \- | 2.36.12 | +| Ghostscript | tools | 2020-02-19 | \- | 9.50\<br />9.27 | +| HDF5 | data | 2020-02-19 | \- | 1.10.5\<br />1.10.2\<br />1.10.1 | +| Hadoop | devel | 2019-09-16 | \- | 2.7.7 | +| HarfBuzz | vis | 2019-02-15 | \- | 2.2.0 | +| Horovod | tools | 2020-08-04 | \- | 0.19.5\<br />0.18.2 | +| Hyperopt | lib | 2020-02-19 | \- | 0.2.2 | +| ICU | lib | 2020-02-19 | \- | 64.2\<br />61.1\<br />56.1 | +| ImageMagick | vis | 2020-02-20 | \- | 7.0.9-5\<br />7.0.8-46 | +| JUnit | devel | 2020-01-21 | \- | 4.12 | +| JasPer | vis | 2020-02-19 | \- | 2.0.14 | +| Java | lang | 2020-02-26 | \- | 11.0.6\<br />1.8.0-162\<br />1.8-191-b26 | +| JsonCpp | lib | 2020-10-30 | \- | 1.9.3 | +| Keras | math | 2019-06-28 | \- | 2.2.4 | +| LAME | data | 2020-06-24 | \- | 3.100 | +| LLVM | compiler | 2020-09-24 | \- | 9.0.0\<br />8.0.1\<br />7.0.1\<br />6.0.0\<br />5.0.1 | +| LMDB | lib | 2020-10-30 | \- | 0.9.24 | +| LibTIFF | lib | 2020-02-19 | \- | 4.0.9\<br />4.0.10 | +| LibUUID | lib | 2019-08-02 | \- | 1.0.3 | +| LittleCMS | vis | 2020-02-20 | \- | 2.9 | +| M4 | devel | 2020-10-14 | \- | 1.4.18\<br />1.4.17 | 
+| MPFR | math | 2020-06-24 | \- | 4.0.2 | +| Mako | devel | 2020-02-19 | \- | 1.1.0\<br />1.0.8\<br />1.0.7 | +| Mesa | vis | 2020-02-19 | \- | 19.1.7\<br />19.0.1\<br />18.1.1\<br />17.3.6 | +| Meson | tools | 2020-11-03 | \- | 0.55.1\<br />0.51.2\<br />0.50.0 | +| MongoDB | data | 2019-08-05 | \- | 4.0.3 | +| NASM | lang | 2020-02-19 | \- | 2.14.02\<br />2.13.03 | +| NCCL | lib | 2020-02-18 | \- | 2.4.8\<br />2.4.2\<br />2.3.7 | +| NLopt | numlib | 2020-02-19 | \- | 2.6.1\<br />2.4.2 | +| NSPR | lib | 2019-09-11 | \- | 4.21 | +| NSS | lib | 2019-09-11 | \- | 3.42.1 | +| Ninja | tools | 2020-11-03 | \- | 1.9.0\<br />1.10.0 | +| OPARI2 | perf | 2020-07-23 | \- | 2.0.5\<br />2.0.3 | +| OTF2 | perf | 2020-07-23 | \- | 2.2\<br />2.1.1 | +| OpenBLAS | numlib | 2020-10-14 | \- | 0.3.9\<br />0.3.7\<br />0.3.5\<br />0.3.1\<br />0.2.20 | +| OpenCV | vis | 2019-02-21 | \- | 4.0.1 | +| OpenMPI | mpi | 2021-02-10 | \- | 4.0.3\<br />3.1.4\<br />3.1.3\<br />3.1.1 | +| OpenPGM | system | 2019-09-11 | \- | 5.2.122 | +| PAPI | perf | 2020-07-23 | \- | 6.0.0\<br />5.6.0 | +| PCRE | devel | 2020-02-19 | \- | 8.43\<br />8.41 | +| PCRE2 | devel | 2019-09-11 | \- | 10.33 | +| PDT | perf | 2020-07-23 | \- | 3.25 | +| PGI | compiler | 2019-05-14 | \- | 19.4 | +| PMIx | lib | 2020-10-14 | \- | 3.1.5 | +| PROJ | lib | 2019-08-19 | \- | 5.0.0 | +| Pango | vis | 2019-02-15 | \- | 1.42.4 | +| Perl | lang | 2020-10-14 | \- | 5.30.2\<br />5.30.0\<br />5.28.1\<br />5.28.0\<br />5.26.1 | +| Pillow | vis | 2020-06-24 | \- | 6.2.1 | +| PowerAI | data | 2019-12-10 | \- | 1.7.0.a0\<br />1.6.1 | +| PyTorch | devel | 2020-09-29 | \- | 1.6.0\<br />1.3.1\<br />1.1.0 | +| PyTorch-Geometric | devel | 2020-09-29 | \- | 1.6.1 | +| PyYAML | lib | 2020-02-18 | \- | 5.1.2\<br />3.13 | +| Python | lang | 2020-11-02 | \- | 3.8.2\<br />3.7.4\<br />3.7.2\<br />3.6.6\<br />3.6.4\<br />2.7.16\<br />2.7.15\<br />2.7.14 | +| PythonAnaconda | lang | 2019-12-10 | \- | 3.7\<br />3.6 | +| Qt5 | devel | 2019-09-12 | \- | 5.12.3 | +| R | lang | 2020-08-20 | \- | 3.6.2\<br />3.6.0\<br />3.4.4 | +| RDFlib | lib | 2020-09-29 | \- | 4.2.2 | +| SCons | devel | 2019-09-11 | \- | 3.0.5 | +| SIONlib | lib | 2020-07-23 | \- | 1.7.6 | +| SQLite | devel | 2020-11-02 | \- | 3.31.1\<br />3.29.0\<br />3.27.2\<br />3.24.0\<br />3.21.0\<br />3.20.1 | +| SWIG | devel | 2020-10-30 | \- | 4.0.1\<br />3.0.12 | +| ScaLAPACK | numlib | 2020-10-14 | \- | 2.1.0\<br />2.0.2 | +| SciPy-bundle | lang | 2020-11-02 | \- | 2020.03\<br />2019.10 | +| Score-P | perf | 2020-07-23 | \- | 6.0\<br />4.1 | +| Six | lib | 2019-02-05 | \- | 1.11.0 | +| Spark | devel | 2020-09-29 | \- | 3.0.1\<br />2.4.4\<br />2.4.3 | +| SpectrumMPI | mpi | 2019-01-14 | \- | system | +| Szip | tools | 2020-02-18 | \- | 2.1.1 | +| Tcl | lang | 2020-11-02 | \- | 8.6.9\<br />8.6.8\<br />8.6.7\<br />8.6.10 | +| TensorFlow | lib | 2020-10-30 | \- | 2.3.1\<br />2.2.0\<br />2.1.0\<br />2.0.0\<br />1.15.0\<br />1.14.0 | +| Tk | vis | 2020-02-19 | \- | 8.6.9\<br />8.6.8 | +| Tkinter | lang | 2020-09-23 | \- | 3.7.4\<br />3.6.6 | +| UCX | lib | 2020-10-14 | \- | 1.8.0 | +| UDUNITS | phys | 2020-02-19 | \- | 2.2.26 | +| UnZip | tools | 2020-11-02 | \- | 6.0 | +| Vampir | perf | 2020-11-30 | \- | 9.9.0\<br />9.8.0\<br />9.7.1\<br />9.11\<br />9.10.0 | +| X11 | vis | 2020-11-03 | \- | 20200222\<br />20190717\<br />20190311\<br />20180604\<br />20180131 | +| XML-Parser | data | 2019-02-15 | \- | 2.44-01 | +| XZ | tools | 2020-10-14 | \- | 5.2.5\<br />5.2.4\<br />5.2.3 | +| Yasm | lang | 2020-09-29 | \- | 1.3.0 | +| 
ZeroMQ | devel | 2019-09-11 | \- | 4.3.2 | +| Zip | tools | 2020-07-30 | \- | 3.0 | +| ant | devel | 2020-10-12 | \- | 1.10.7\<br />1.10.1 | +| binutils | tools | 2020-10-14 | \- | 2.34\<br />2.32\<br />2.31.1\<br />2.30\<br />2.28 | +| bokeh | tools | 2020-09-29 | \- | 1.4.0 | +| bzip2 | tools | 2020-11-02 | \- | 1.0.8\<br />1.0.6 | +| cURL | tools | 2020-11-02 | \- | 7.69.1\<br />7.66.0\<br />7.63.0\<br />7.60.0\<br />7.58.0 | +| cairo | vis | 2020-02-19 | \- | 1.16.0\<br />1.14.12 | +| cftime | data | 2019-07-17 | \- | 1.0.1 | +| cuDNN | numlib | 2020-02-18 | \- | 7.6.4.38\<br />7.4.2.24\<br />7.1.4.18 | +| dask | data | 2020-09-29 | \- | 2.8.0 | +| dill | data | 2019-10-29 | \- | 0.3.1.1 | +| double-conversion | lib | 2020-08-13 | \- | 3.1.4 | +| expat | tools | 2020-10-14 | \- | 2.2.9\<br />2.2.7\<br />2.2.6\<br />2.2.5 | +| flatbuffers | devel | 2020-10-30 | \- | 1.12.0 | +| flatbuffers-python | devel | 2021-04-10 | \- | 1.12 | +| flex | lang | 2020-10-14 | \- | 2.6.4\<br />2.6.3 | +| fontconfig | vis | 2020-11-03 | \- | 2.13.92\<br />2.13.1\<br />2.13.0\<br />2.12.6 | +| fosscuda | toolchain | 2020-10-14 | \- | 2020a\<br />2019b\<br />2019a\<br />2018b | +| freetype | vis | 2020-11-03 | \- | 2.9.1\<br />2.9\<br />2.10.1 | +| future | lib | 2019-02-05 | \- | 0.16.0 | +| gcccuda | toolchain | 2020-10-14 | \- | 2020a\<br />2019b\<br />2019a\<br />2018b | +| gettext | tools | 2020-11-03 | \- | 0.20.1\<br />0.19.8.1 | +| gflags | devel | 2020-06-24 | \- | 2.2.2 | +| giflib | lib | 2020-10-30 | \- | 5.2.1 | +| git | tools | 2020-02-18 | \- | 2.23.0\<br />2.18.0 | +| glog | devel | 2020-06-24 | \- | 0.4.0 | +| golf | toolchain | 2019-01-14 | \- | 2018a | +| gompic | toolchain | 2020-10-14 | \- | 2020a\<br />2019b\<br />2019a\<br />2018b | +| gperf | devel | 2020-11-03 | \- | 3.1 | +| gsmpi | toolchain | 2019-01-14 | \- | 2018a | +| gsolf | toolchain | 2019-01-14 | \- | 2018a | +| h5py | data | 2020-07-30 | \- | 2.8.0\<br />2.10.0 | +| help2man | tools | 2020-10-14 | \- | 1.47.8\<br />1.47.6\<br />1.47.4\<br />1.47.12 | +| hwloc | system | 2020-10-14 | \- | 2.2.0\<br />2.0.3\<br />1.11.12\<br />1.11.11\<br />1.11.10 | +| hypothesis | tools | 2020-06-24 | \- | 4.44.2 | +| intltool | devel | 2020-11-03 | \- | 0.51.0 | +| libGLU | vis | 2020-02-19 | \- | 9.0.1\<br />9.0.0 | +| libdrm | lib | 2020-02-19 | \- | 2.4.99\<br />2.4.97\<br />2.4.92\<br />2.4.91 | +| libevent | lib | 2020-10-14 | \- | 2.1.8\<br />2.1.11 | +| libfabric | lib | 2021-02-10 | \- | 1.11.0 | +| libffi | lib | 2020-11-02 | \- | 3.3\<br />3.2.1 | +| libgeotiff | lib | 2019-08-19 | \- | 1.4.2 | +| libjpeg-turbo | lib | 2020-02-19 | \- | 2.0.3\<br />2.0.2\<br />2.0.0\<br />1.5.3 | +| libpciaccess | system | 2020-10-14 | \- | 0.16\<br />0.14 | +| libpng | lib | 2020-11-03 | \- | 1.6.37\<br />1.6.36\<br />1.6.34 | +| libreadline | lib | 2020-10-14 | \- | 8.0\<br />7.0 | +| libsndfile | lib | 2020-02-19 | \- | 1.0.28 | +| libsodium | lib | 2019-09-11 | \- | 1.0.17 | +| libtool | lib | 2020-10-14 | \- | 2.4.6 | +| libunwind | lib | 2020-09-25 | \- | 1.3.1\<br />1.2.1 | +| libxml2 | lib | 2020-10-14 | \- | 2.9.9\<br />2.9.8\<br />2.9.7\<br />2.9.4\<br />2.9.10 | +| libxslt | lib | 2020-10-27 | \- | 1.1.34\<br />1.1.33\<br />1.1.32 | +| libyaml | lib | 2020-01-24 | \- | 0.2.2\<br />0.2.1 | +| magma | math | 2020-09-29 | \- | 2.5.1 | +| matplotlib | vis | 2020-09-23 | \- | 3.1.1\<br />3.0.3 | +| ncurses | devel | 2020-10-14 | \- | 6.2\<br />6.1\<br />6.0 | +| netCDF | data | 2019-07-17 | \- | 4.6.1\<br />4.6.0 | +| netcdf4-python | 
data | 2019-07-17 | \- | 1.4.3 | +| nettle | lib | 2020-02-19 | \- | 3.5.1\<br />3.4.1\<br />3.4 | +| nsync | devel | 2020-10-30 | \- | 1.24.0 | +| numactl | tools | 2020-10-14 | \- | 2.0.13\<br />2.0.12\<br />2.0.11 | +| numba | lang | 2020-09-25 | \- | 0.47.0 | +| pixman | vis | 2020-02-19 | \- | 0.38.4\<br />0.38.0\<br />0.34.0 | +| pkg-config | devel | 2020-10-14 | \- | 0.29.2 | +| pkgconfig | devel | 2020-02-18 | \- | 1.5.1\<br />1.3.1 | +| pocl | lib | 2020-04-22 | \- | 1.4 | +| protobuf | devel | 2020-01-24 | \- | 3.6.1.2\<br />3.6.1\<br />3.10.0 | +| protobuf-python | devel | 2020-10-30 | \- | 3.10.0 | +| pybind11 | lib | 2020-11-02 | \- | 2.4.3 | +| re2c | tools | 2019-09-11 | \- | 1.1.1 | +| rstudio | lang | 2020-01-21 | \- | 1.2.5001 | +| scikit-image | vis | 2020-09-29 | \- | 0.16.2 | +| scikit-learn | data | 2020-09-24 | \- | 0.21.3 | +| snappy | lib | 2020-10-30 | \- | 1.1.7 | +| spleeter | tools | 2020-10-05 | \- | 1.5.4 | +| torchvision | vis | 2021-03-11 | \- | 0.7.0 | +| tqdm | lib | 2020-09-29 | \- | 4.41.1 | +| typing-extensions | devel | 2021-04-10 | \- | 3.7.4.3 | +| util-linux | tools | 2020-11-03 | \- | 2.35\<br />2.34\<br />2.33\<br />2.32.1\<br />2.32\<br />2.31.1 | +| wheel | tools | 2019-01-30 | \- | 0.31.1 | +| x264 | vis | 2020-06-24 | \- | 20190925\<br />20181203 | +| x265 | vis | 2020-09-29 | \- | 3.2\<br />3.0 | +| xorg-macros | devel | 2020-10-14 | \- | 1.19.2 | +| zlib | lib | 2020-10-14 | \- | 1.2.11 | +| zsh | tools | 2021-01-06 | \- | 5.8 | diff --git a/twiki2md/root/Applications/VirtualDesktops.md b/twiki2md/root/Applications/VirtualDesktops.md new file mode 100644 index 000000000..c9b7f89b0 --- /dev/null +++ b/twiki2md/root/Applications/VirtualDesktops.md @@ -0,0 +1,89 @@ +# Virtual desktops + +Use WebVNC or NICE DCV to run GUI applications on HPC resources. + +<span class="twiki-macro TABLE" columnwidths="10%,45%,45%"></span> + +| | | | +|----------------|-------------------------------------------------------|-------------------------------------------------| +| | **WebVNC** | **NICE DCV** | +| **use case** | all GUI applications that do \<u>not need\</u> OpenGL | only GUI applications that \<u>need\</u> OpenGL | +| **partitions** | all\* (except partitions with GPUs (gpu2, hpdlf, ml) | dcv | + +## Launch a virtual desktop + +<span class="twiki-macro TABLE" columnwidths="10%,45%,45%"></span> \| +**step 1** \| Navigate to \<a href="<https://taurus.hrsk.tu-dresden.de>" +target="\_blank"><https://taurus.hrsk.tu-dresden.de>\</a>. There is our +[JupyterHub](Compendium.JupyterHub) instance. \|\| \| **step 2** \| +Click on the "advanced" tab and choose a preset: \|\| + +| | | | +|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------| +| ^ | **WebVNC** | **DCV** | +| **step 3** | Optional: Finetune your session with the available SLURM job parameters or assign a certain project or reservation. Then save your settings in a new preset for future use. | | +| **step 4** | Click on "Spawn". JupyterHub starts now a SLURM job for you. If everything is ready the JupyterLab interface will appear to you. | | +| **step 5"** | Click on the **button "WebVNC"** to start a virtual desktop. | Click on the \*button "NICE DCV"to start a virtual desktop. | +| ^ | The virtual desktop starts in a new tab or window. 
| | + +### Demonstration + +\<video controls="" width="320" style="border: 1px solid black">\<source +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/VirtualDesktops/start-virtual-desktop-dcv.mp4>" +type="video/mp4">\<source +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/VirtualDesktops/start-virtual-desktop-dcv.webm>" +type="video/webm">\</video> + +### Using the quickstart feature + +JupyterHub can start a job automatically if the URL contains certain +parameters. + +<span class="twiki-macro TABLE" columnwidths="10%,45%,45%"></span> + +| | | | +|----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| examples | \<a href="<https://taurus.hrsk.tu-dresden.de/jupyter/hub/spawn#/>\~(partition\~'interactive\~cpuspertask\~'2\~mempercpu\~'2583)" target="\_blank" style="font-size: 1.5em">WebVNC\</a> | \<a href="<https://taurus.hrsk.tu-dresden.de/jupyter/hub/spawn#/>\~(partition\~'dcv\~cpuspertask\~'6\~gres\~'gpu\*3a1\~mempercpu\~'2583)" target="\_blank" style="font-size: 1.5em">NICE DCV\</a> | +| details about the examples | `interactive` partition, 2 CPUs with 2583 MB RAM per core, no GPU | `dcv` partition, 6 CPUs with 2583 MB RAM per core, 1 GPU | +| link creator | Use the spawn form to set your preferred options. The browser URL will be updated with the corresponding parameters. | | + +If you close the browser tabs or windows or log out from your local +machine, you are able to open the virtual desktop later again - as long +as the session runs. But please remember that a SLURM job is running in +the background which has a certain timelimit. + +## Reconnecting to a session + +In order to reconnect to an active instance of WebVNC, simply repeat the +steps required to start a session, beginning - if required - with the +login, then clicking "My server", then by pressing the "+" sign on the +upper left corner. Provided your server is still running and you simply +closed the window or logged out without stopping your server, you will +find your WebVNC desktop the way you left it. + +## Terminate a remote session + +<span class="twiki-macro TABLE" columnwidths="10%,90%"></span> \| **step +1** \| Close the VNC viewer tab or window. \| \| **step 2** \| Click on +File \> Log Out in the JupyterLab main menu. Now you get redirected to +the JupyterLab control panel. If you don't have your JupyterLab tab or +window anymore, navigate directly to \<a +href="<https://taurus.hrsk.tu-dresden.de/jupyter/hub/home>" +target="\_blank"><https://taurus.hrsk.tu-dresden.de/jupyter/hub/home>\</a>. +\| \| **step 3** \| Click on "Stop My Server". This cancels the SLURM +job and terminates your session. \| + +### Demonstration + +\<video controls="" width="320" style="border: 1px solid black">\<source +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/VirtualDesktops/terminate-virtual-desktop-dcv.mp4>" +type="video/mp4">\<source +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/VirtualDesktops/terminate-virtual-desktop-dcv.webm>" +type="video/webm">\</video> + +**Remark:** This does not work if you click on the "Logout"-Btn in your +virtual desktop. 
+Instead, this will just close your DCV session or cause a black screen in your WebVNC window, without a possibility to recover the virtual desktop in the same Jupyter session. The only solution for now is to terminate the whole Jupyter session and start a new one as described above.
diff --git a/twiki2md/root/Applications/Visualization.md b/twiki2md/root/Applications/Visualization.md
new file mode 100644
index 000000000..31c83ce5f
--- /dev/null
+++ b/twiki2md/root/Applications/Visualization.md
@@ -0,0 +1,219 @@
+# Visualization
+
+## ParaView
+
+[ParaView](https://paraview.org) is an open-source, multi-platform data analysis and visualization application. It is available on Taurus under the `ParaView` [modules](RuntimeEnvironment#Modules):
+
+    taurus$ module avail ParaView
+
+    ParaView/5.4.1-foss-2018b-mpi (D)    ParaView/5.5.2-intel-2018a-mpi                ParaView/5.7.0-osmesa
+    ParaView/5.4.1-intel-2018a-mpi       ParaView/5.6.2-foss-2019b-Python-3.7.4-mpi    ParaView/5.7.0
+
+The ParaView package comprises different tools which are designed to meet interactive, batch and in-situ workflows.
+
+## Batch Mode - PvBatch
+
+ParaView can run in batch mode, i.e., execute a Python script without opening the ParaView GUI. This way, common visualization tasks can be automated. There are two Python interfaces: *pvpython* and *pvbatch*. *pvbatch* only accepts commands from input scripts, and it runs in parallel if it was built using MPI.
+
+ParaView is shipped with a prebuilt MPI library, and **pvbatch has to be invoked using this very `mpiexec`** command. Make sure not to use `srun` or `mpiexec` from another MPI module, e.g., check which `mpiexec` is in the path:
+
+    taurus$ module load ParaView/5.7.0-osmesa
+    taurus$ which mpiexec
+    /sw/installed/ParaView/5.7.0-osmesa/bin/mpiexec
+
+The resources for the MPI processes have to be allocated via the Slurm option `-c NUM` (not `-n`, as it usually would be for MPI processes). It might be valuable in terms of runtime to bind/pin the MPI processes to hardware. A convenient option is `-bind-to core`. All other options can be obtained by `mpiexec -bind-to -help` or from [wiki.mpich.org](https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Process-core_Binding).
+
+Jobfile:
+
+    #!/bin/bash
+
+    #SBATCH -N 1
+    #SBATCH -c 12
+    #SBATCH --time=01:00:00
+
+    # Make sure to only use ParaView
+    module purge
+    module load ParaView/5.7.0-osmesa
+
+    pvbatch --mpi --force-offscreen-rendering pvbatch-script.py
+
+Interactive allocation via `salloc`:
+
+    taurus$ salloc -N 1 -c 16 --time=01:00:00 bash
+    salloc: Pending job allocation 336202
+    salloc: job 336202 queued and waiting for resources
+    salloc: job 336202 has been allocated resources
+    salloc: Granted job allocation 336202
+    salloc: Waiting for resource configuration
+    salloc: Nodes taurusi6605 are ready for job
+
+    # Make sure to only use ParaView
+    taurus$ module purge
+    taurus$ module load ParaView/5.7.0-osmesa
+
+    # Go to working directory, e.g. a workspace
+    taurus$ cd /path/to/workspace
+
+    # Execute pvbatch using 16 MPI processes in parallel on the allocated resources
+    taurus$ pvbatch --mpi --force-offscreen-rendering pvbatch-script.py
+
+### Using GPUs
+
+ParaView's `pvbatch` can render offscreen through the Native Platform Interface (EGL) on the graphics cards (GPUs) specified by the device index. For that, use the modules suffixed with *-egl*, e.g. `ParaView/5.9.0-RC1-egl-mpi-Python-3.8`, and pass the option `--egl-device-index=$CUDA_VISIBLE_DEVICES`.
+
+Jobfile:
+
+    #!/bin/bash
+
+    #SBATCH -N 1
+    #SBATCH -c 12
+    #SBATCH --gres=gpu:2
+    #SBATCH --partition=gpu2
+    #SBATCH --time=01:00:00
+
+    # Make sure to only use ParaView
+    module purge
+    module load ParaView/5.9.0-RC1-egl-mpi-Python-3.8
+
+    mpiexec -n $SLURM_CPUS_PER_TASK -bind-to core pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering pvbatch-script.py
+    # or
+    pvbatch --mpi --egl-device-index=$CUDA_VISIBLE_DEVICES --force-offscreen-rendering pvbatch-script.py
+
+## Interactive Mode
+
+There are different ways of using ParaView on the cluster:
+
+- GUI via NICE DCV on a GPU node
+- Client-/Server mode with MPI-parallel off-screen rendering
+- GUI via X forwarding
+
+#### Using the GUI via NICE DCV on a GPU node
+
+This option provides hardware-accelerated OpenGL and might provide the best performance and smoothest handling. First, you need to open a DCV session, so please follow the instructions under [virtual desktops](Compendium.VirtualDesktops). Start a terminal (right-click on desktop -> Terminal) in your virtual desktop session, then load the ParaView module as usual and start the GUI:
+
+    taurus$ module load ParaView/5.7.0
+    paraview
+
+Since your DCV session already runs inside a job, i.e., it has been scheduled to a compute node, no `srun` command is necessary here.
+
+#### Using Client-/Server mode with MPI-parallel offscreen rendering
+
+ParaView has a built-in client-server architecture, where you run the GUI locally on your desktop and connect to a ParaView server instance (the so-called `pvserver`) on the cluster. The pvserver performs the computationally intensive rendering. Note that **your client must be of the same version as the server**.
+
+The pvserver can be run in parallel using MPI, but it will only do CPU rendering via MESA. For this, you need to load the *osmesa*-suffixed version of the ParaView module, which supports offscreen rendering. Then, start the `pvserver` via `srun` in parallel using multiple MPI processes:
+
+    taurus$ module load ParaView/5.7.0-osmesa
+    taurus$ srun -N1 -n8 --mem-per-cpu=2500 -p interactive --pty pvserver --force-offscreen-rendering
+    srun: job 2744818 queued and waiting for resources
+    srun: job 2744818 has been allocated resources
+    Waiting for client...
+    Connection URL: cs://taurusi6612.taurus.hrsk.tu-dresden.de:11111
+    Accepting connection(s): taurusi6612.taurus.hrsk.tu-dresden.de:11111
+
+If the default port 11111 is already in use, an alternative port can be specified via `-sp=port`. Once the resources are allocated, the pvserver is started in parallel and the connection information is printed.
+
+This output contains the node name which your job and server run on. However, since the node names of the cluster are not present in the public domain name system (only cluster-internally), you cannot just use this line as-is for connection with your client.
+You first have to resolve the name to an IP address on the cluster: Suffix the node name with **-mn** to get the management network (ethernet) address, and pass it to a lookup tool like *host* in another SSH session:
+
+    taurus$ host taurusi6605-mn
+    taurusi6605-mn.taurus.hrsk.tu-dresden.de has address 172.24.140.229
+
+The SSH tunnel has to be created from the user's localhost. The following example creates a forward SSH tunnel to localhost on port 22222 (or whatever port is preferred):
+
+    localhost$ ssh -L 22222:172.24.140.229:11111 user@taurus.hrsk.tu-dresden.de
+
+The final step is to start ParaView locally on your own machine and add the connection:
+
+- File → Connect...
+- Add Server
+  - Name: localhost tunnel
+  - Server Type: Client / Server
+  - Host: localhost
+  - Port: 22222
+- Configure
+  - Startup Type: Manual
+  - → Save
+- → Connect
+
+A successful connection is indicated by a "client connected" message on the `pvserver` process terminal, and within ParaView's Pipeline Browser (instead of it saying "builtin"). You are now connected to the pvserver running on a Taurus node and can open files from the cluster's filesystems.
+
+##### Caveats
+
+Connecting to the compute nodes will only work when you are **inside the TUD campus network**, because otherwise the private networks 172.24.\* will not be routed. That's why you either need to use [VPN](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn), or, when coming via the ZIH login gateway (`login1.zih.tu-dresden.de`), use an SSH tunnel. For the example IP address from above, this could look like the following:
+
+    # Replace "user" with your login name, of course:
+    ssh -f -N -L11111:172.24.140.229:11111 user@login1.zih.tu-dresden.de
+
+This line opens the port 11111 locally and tunnels it via `login1` to the `pvserver` running on the Taurus node. Note that you then must instruct your local ParaView client to connect to host `localhost` instead. The recommendation, though, is to use VPN, which makes this extra step unnecessary.
+
+#### Using the GUI via X forwarding (not recommended)
+
+Even the developers, Kitware, say that X forwarding is not supported at all by ParaView, as it requires OpenGL extensions that are not supported by X forwarding. It might still be usable for very small examples, but the user experience will not be good. Also, you have to make sure your X forwarding connection provides OpenGL rendering support. Furthermore, especially in newer versions of ParaView, you might have to set the environment variable MESA_GL_VERSION_OVERRIDE=3.2 to fool it into thinking your provided GL rendering version is higher than what it actually is. Example:
+
+    # 1st, connect to Taurus using X forwarding (-X).
+    # It is a good idea to also enable compression for such connections (-C):
+    ssh -XC taurus.hrsk.tu-dresden.de
+
+    # 2nd, load the ParaView module and override the GL version (if necessary):
+    module load ParaView/5.7.0
+    export MESA_GL_VERSION_OVERRIDE=3.2
+
+    # 3rd, start the ParaView GUI inside an interactive job.
+    # Don't forget the --x11 parameter for X forwarding:
+    srun -n1 -c1 -p interactive --mem-per-cpu=2500 --pty --x11=first paraview
diff --git a/twiki2md/root/BatchSystems/LoadLeveler.md b/twiki2md/root/BatchSystems/LoadLeveler.md
new file mode 100644
index 000000000..1fd54a807
--- /dev/null
+++ b/twiki2md/root/BatchSystems/LoadLeveler.md
@@ -0,0 +1,415 @@
+# LoadLeveler - IBM Tivoli Workload Scheduler
+
+## Job Submission
+
+To submit a job to LoadLeveler, a job file needs to be created first. This job file can then be passed to the command: `llsubmit [llsubmit_options] <job_file>`
+
+### Job File Examples
+
+#### Serial Batch Jobs
+
+An example job file may look like this:
+
+    #@ job_name = my_job
+    #@ output = $(job_name).$(jobid).out
+    #@ error = $(job_name).$(jobid).err
+    #@ class = short
+    #@ group = triton-ww | triton-ipf | triton-ism | triton-et
+    #@ wall_clock_limit = 00:30:00
+    #@ resources = ConsumableMemory(1 gb)
+    #@ environment = COPY_ALL
+    #@ notification = complete
+    #@ notify_user = your_email@address
+    #@ queue
+
+    ./my_serial_program
+
+This example requests a serial job with a runtime of 30 minutes and an overall memory requirement of 1 GByte. There are four groups available; don't forget to choose the one matching group. When the job completes, a mail will be sent which includes details about resource usage.
+
+#### MPI Parallel Batch Jobs
+
+An example job file may look like this:
+
+    #@ job_name = my_job
+    #@ output = $(job_name).$(jobid).out
+    #@ error = $(job_name).$(jobid).err
+    #@ job_type = parallel
+    #@ node = 2
+    #@ tasks_per_node = 8
+    #@ class = short
+    #@ group = triton-ww | triton-ipf | triton-ism | triton-et
+    #@ wall_clock_limit = 00:30:00
+    #@ resources = ConsumableMemory(1 gb)
+    #@ environment = COPY_ALL
+    #@ notification = complete
+    #@ notify_user = your_email@address
+    #@ queue
+
+    mpirun -x OMP_NUM_THREADS=1 -x LD_LIBRARY_PATH -np 16 ./my_mpi_program
+
+This example requests a parallel job with 16 processes (2 nodes, 8 tasks per node), a runtime of 30 minutes, a memory requirement of 1 GByte per task and therefore an overall memory requirement of 8 GByte per node. Please keep in mind that each node on Triton only provides 45 GByte. The choice of the correct group is also important and necessary. The `-x` option of `mpirun` exports the specified environment variables to all MPI processes.
+
+- `OMP_NUM_THREADS=1`: If you are using libraries like MKL, which are multithreaded, you should always set the number of threads explicitly so that the nodes are not overloaded. Otherwise you will experience heavy performance problems.
+- `LD_LIBRARY_PATH`: If your program is linked with shared libraries (like MKL) which are not standard system libraries, you must export this variable to the MPI processes.
+
+When the job completes, a mail will be sent which includes details about resource usage.
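+The job file itself does not start anything; it still has to be handed to LoadLeveler and can then be monitored with the commands described further below. A minimal sketch of that workflow, assuming the MPI example above was saved as `my_mpi_job.ll` (the file name is arbitrary):
+
+    # Load the appropriate MPI module first (see the note below), then submit the job file:
+    llsubmit my_mpi_job.ll
+
+    # Check the state of your own jobs, or ask why a specific job has not started yet:
+    llq -u $USER
+    llq -s <job-id>
+
+    # Cancel the job if it is no longer needed:
+    llcancel <job-id>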
+ +Before submitting MPI jobs, ensure that the appropriate MPI module is +loaded, e.g issue: + + # module load openmpi + +#### Hybrid MPI+OpenMP Parallel Batch Jobs + +An example job file may look like this: + + #@ job_name = my_job + #@ output = $(job_name).$(jobid).out + #@ error = $(job_name).$(jobid).err + #@ job_type = parallel + #@ node = 4 + #@ tasks_per_node = 8 + #@ class = short + #@ group = triton-ww | triton-ipf | triton-ism | triton-et + #@ wall_clock_limit = 00:30:00 + #@ resources = ConsumableMemory(1 gb) + #@ environment = COPY_ALL + #@ notification = complete + #@ notify_user = your_email@adress + #@ queue + + mpirun -x OMP_NUM_THREADS=8 -x LD_LIBRARY_PATH -np 4 --bynode ./my_hybrid_program + +This example requests a parallel job with 32 processes (4 nodes, 8 tasks +per node), a runtime of 30 minutes, 1GByte memory requirement per task +and therefore a overall memory requirement of 8GByte per node. Please +keep in mind that each node on Triton only provides 45GByte. The choice +of the correct group is also important and necessary. The mpirun command +starts 4 MPI processes (`--bynode` forces one process per node). +`OMP_NUM_THREADS` is set to 8, so that 8 threads are started per MPI +rank. When the job completes, a mail will be sent which includes details +about resource usage. + +### Job File Keywords + +| Keyword | Valid values | Description | +|:-------------------|:------------------------------------------------|:-------------------------------------------------------------------------------------| +| `notification` | `always`, `error`, `start`, `never`, `complete` | When to write notification email. | +| `notify_user` | valid email adress | Notification email adress. | +| `output` | file name | File for stdout of the job. | +| `error` | file name | File for stderr of the job. | +| `job_type` | `parallel`, `serial` | Job type, default is `serial`. | +| `node` | `1` - `64` | Number of nodes requested (parallel jobs only). | +| `tasks_per_node` | `1` - `8` | Number of processors per node requested (parallel jobs only). | +| `class` | see `llclass` | Job queue. | +| `group` | triton-ww, triton-ipf, triton-ism, triton-et | choose matching group | +| `wall_clock_limit` | HH:MM:SS | Run time limit of the job. | +| `resources` | `name(count)` ... `name(count)` | Specifies quantities of the consumable resources consumed by each task of a job step | + +Further Information: +\[\[http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl35.admin.doc/am2ug_jobkey.html\]\[Full +description of keywords\]\]. + +### Submit a Job without a Job File + +Submission of a job without a job file can be done by the command: +`llsub [llsub_options] <command>` + +This command is not part of the IBM Loadleveler software but was +developed at ZIH. + +The job file will be created in background by means of the command line +options. Afterwards, the job file will be passed to the command +`llsubmit` which submit the job to LoadLeveler (see above). + +Important options are: + +| Option | Default | Description | +|:----------------------|:---------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `-J <name>` | `llsub` | Specifies the name of the job. 
You can name the job using any combination of letters, numbers, or both. The job name only appears in the long reports of the llq, llstatus, and llsummary commands. | +| `-n` | `1` | Specifies the total number of tasks of a parallel job you want to run on all available nodes. | +| `-T` | not specified | Specifies the maximum number of OpenMP threads to use per process by setting the environment variable OMP_NUM_THREADS to number. | +| `--o, -oo <filename>` | `<jobname>.<hostname>.<jobid>.out` | Specifies the name of the file to use as standard output (stdout) when your job step runs. | +| `-e, -oe <filename>` | `<jobname>.<hostname>.<jobid>.err` | Specifies the name of the file to use as standard error (stderr) when your job step runs. | +| `-I` | not specified | Submits an interactive job and sends the job's standard output (or standard error) to the terminal. | +| `-q <name>` | non-interactive: `short` interactive(n`1): =interactive` interactive(n>1): `interactive_par` | Specifies the name of a job class defined locally in your cluster. You can use the llclass command to find out information on job classes. | +| `-x` | not specified | Puts the node running your job into exclusive execution mode. In exclusive execution mode, your job runs by itself on a node. It is dispatched only to a node with no other jobs running, and LoadLeveler does not send any other jobs to the node until the job completes. | +| `-hosts <number>` | automatically | Specifies the number of nodes requested by a job step. This option is equal to the bsub option -R "span\[hosts=number\]". | +| `-ptile <number>` | automatically | Specifies the number of nodes requested by a job step. This option is equal to the bsub option -R "span\[ptile=number\]". | +| `-mem <size>` | not specified | Specifies the requirement of memory which the job needs on a single node. The memory requirement is specified in MB. This option is equal to the bsub option -R "rusage\[mem=size\]". | + +The option `-H` prints the list of all available command line options. + +Here is an example for an MPI Job: + + llsub -T 1 -n 16 -e err.txt -o out.txt mpirun -x LD_LIBRARY_PATH -np 16 ./my_program + +### Interactive Jobs + +Interactive Jobs can be submitted by the command: +`llsub -I -q <interactive> <command>` + +### Loadleveler Runtime Environment Variables + +Loadleveler Runtime Variables give you some information within the job +script, for example: + + #@ job_name = my_job + #@ output = $(job_name).$(jobid).out + #@ error = $(job_name).$(jobid).err + #@ job_type = parallel + #@ node = 2 + #@ tasks_per_node = 8 + #@ class = short + #@ wall_clock_limit = 00:30:00 + #@ resources = ConsumableMemory(1 gb) + #@ environment = COPY_ALL + #@ notification = complete + #@ notify_user = your_email@adress + #@ queue + + echo $LOADL_PROCESSOR_LIST + echo $LOADL_STEP_ID + echo $LOADL_JOB_NAME + mpirun -np 16 ./my_mpi_program + +Further Information: +\[\[http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl35.admin.doc/am2ug_envvars.html\]\[Full +description of variables\]\]. + +## Job Queues + +The `llclass` command provides information about each queue. Example +output: + + Name MaxJobCPU MaxProcCPU Free Max Description + d+hh:mm:ss d+hh:mm:ss Slots Slots + --------------- -------------- -------------- ----- ----- --------------------- + interactive undefined undefined 32 32 interactive, exclusive shared nodes, max. 
12h runtime + triton_ism undefined undefined 8 80 exclusive, serial + parallel queue, nodes shared, unlimited runtime + openend undefined undefined 272 384 serial + parallel queue, nodes shared, unlimited runtime + long undefined undefined 272 384 serial + parallel queue, nodes shared, max. 7 days runtime + medium undefined undefined 272 384 serial + parallel queue, nodes shared, max. 3 days runtime + short undefined undefined 272 384 serial + parallel queue, nodes shared, max. 4 hours runtime + +## Job Monitoring + +### All Jobs in the Queue + + # llq + +#### All of One's Own Jobs + + # llq -u username + +### Details About Why A Job Has Not Yet Started + + # llq -s job-id + +The key information is located at the end of the output, and will look +similar to the following: + + ==================== EVALUATIONS FOR JOB STEP l1f1n01.4604.0 ==================== + The class of this job step is "workq". + Total number of available initiators of this class on all machines in the cluster: 0 + Minimum number of initiators of this class required by job step: 4 + The number of available initiators of this class is not sufficient for this job step. + Not enough resources to start now. + Not enough resources for this step as backfill. + +Or it will tell you the **estimated start** time: + + ==================== EVALUATIONS FOR JOB STEP l1f1n01.8207.0 ==================== + The class of this job step is "checkpt". + Total number of available initiators of this class on all machines in the cluster: 8 + Minimum number of initiators of this class required by job step: 32 + The number of available initiators of this class is not sufficient for this job step. + Not enough resources to start now. + This step is top-dog. + Considered at: Fri Jul 13 12:12:04 2007 + Will start by: Tue Jul 17 18:10:32 2007 + +### Generate a long listing rather than the standard one + + # llq -l job-id + +This command will give you detailed job information. + +### Job Status States + +| | | | +|------------------|-----|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Canceled | CA | The job has been canceled as by the llcancel command. | +| Completed | C | The job has completed. | +| Complete Pending | CP | The job is completed. Some tasks are finished. | +| Deferred | D | The job will not be assigned until a specified date. The start date may have been specified by the user in the Job Command file or it may have been set by LoadLeveler because a parallel job could not obtain enough machines to run the job. | +| Idle | I | The job is being considered to run on a machine though no machine has been selected yet. | +| NotQueued | NQ | The job is not being considered to run. A job may enter this state due to an error in the command file or because LoadLeveler can not obtain information that it needs to act on the request. | +| Not Run | NR | The job will never run because a stated dependency in the Job Command file evaluated to be false. | +| Pending | P | The job is in the process of starting on one or more machines. The request to start the job has been sent but has not yet been acknowledged. | +| Rejected | X | The job did not start because there was a mismatch or requirements for your job and the resources on the target machine or because the user does not have a valid ID on the target machine. 
| +| Reject Pending | XP | The job is in the process of being rejected. | +| Removed | RM | The job was canceled by either LoadLeveler or the owner of the job. | +| Remove Pending | RP | The job is in the process of being removed. | +| Running | R | The job is running. | +| Starting | ST | The job is starting. | +| Submission Error | SX | The job can not start due to a submission error. Please notify the Bluedawg administration team if you encounter this error. | +| System Hold | S | The job has been put in hold by a system administrator. | +| System User Hold | HS | Both the user and a system administrator has put the job on hold. | +| Terminated | TX | The job was terminated, presumably by means beyond LoadLeveler's control. Please notify the Bluedawg administration team if you encounter this error. | +| User Hold | H | The job has been put on hold by the owner. | +| Vacated | V | The started job did not complete. The job will be scheduled again provided that the job may be rescheduled. | +| Vacate Pending | VP | The job is in the process of vacating. | + +## Cancel a Job + +### A Particular Job + + # llcancel job-id + +### All of One's Jobs + + # llcancel -u username + +## Job History and Usage Summaries + +On each cluster, there exists a file that contains the history of all +jobs run under LoadLeveler. This file is +**/var/loadl/archive/history.archive**, and may be queried using the +**llsummary** command. + +An example of usage would be as follows: + + # llsummary -u estrabd /var/loadl/archive/history.archive + +And the output would look something like: + + Name Jobs Steps Job Cpu Starter Cpu Leverage + estrabd 118 128 07:55:57 00:00:45 634.6 + TOTAL 118 128 07:55:57 00:00:45 634.6 + Class Jobs Steps Job Cpu Starter Cpu Leverage + checkpt 13 23 03:09:32 00:00:18 631.8 + interactive 105 105 04:46:24 00:00:26 660.9 + TOTAL 118 128 07:55:57 00:00:45 634.6 + Group Jobs Steps Job Cpu Starter Cpu Leverage + No_Group 118 128 07:55:57 00:00:45 634.6 + TOTAL 118 128 07:55:57 00:00:45 634.6 + Account Jobs Steps Job Cpu Starter Cpu Leverage + NONE 118 128 07:55:57 00:00:45 634.6 + TOTAL 118 128 07:55:57 00:00:45 634.6 + +The **llsummary** tool has a lot of options, which are discussed in its +man pages. 
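+If you just want to archive such a report or compare it with your currently queued jobs, the documented commands can be combined with standard shell redirection; a small sketch (the output file name is arbitrary):
+
+    # Summarize your own historic usage and keep the report for later reference:
+    llsummary -u $USER /var/loadl/archive/history.archive > my_loadleveler_usage.txt
+
+    # Compare with what is currently queued or running under your account:
+    llq -u $USER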
+ +## Check status of each node + + # llstatus + +And the output would look something like: + + root@triton[0]:~# llstatus + Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys + n01 Avail 0 0 Idle 0 0.00 2403 AMD64 Linux2 + n02 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n03 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n04 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n05 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n06 Avail 0 0 Idle 0 0.71 9999 AMD64 Linux2 + n07 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n08 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n09 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n10 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n11 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n12 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n13 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n14 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n15 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n16 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n17 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n18 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n19 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n20 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n21 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n22 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n23 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n24 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n25 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n26 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n27 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n28 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n29 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n30 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n31 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n32 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n33 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n34 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n35 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n36 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n37 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n38 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n39 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n40 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n41 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n42 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n43 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n44 Avail 0 0 Idle 0 0.01 9999 AMD64 Linux2 + n45 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n46 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n47 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n48 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n49 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n50 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n51 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n52 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n53 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n54 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n55 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n56 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n57 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n58 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n59 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n60 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n61 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n62 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n63 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + n64 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2 + triton Avail 0 0 Idle 0 0.00 585 AMD64 Linux2 + + AMD64/Linux2 65 machines 0 jobs 0 running tasks + Total Machines 65 machines 0 jobs 0 running tasks + + The Central Manager is defined on triton + + The BACKFILL scheduler is in use + + All machines on the machine_list are present. 
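+
+On a larger or busier system it can help to filter this listing, e.g.
+to keep only the header line and the nodes whose `Startd` state is not
+`Idle` (a simple sketch with standard Unix tools, based on the column
+layout shown above):
+
+    # llstatus | awk 'NR == 1 || $5 != "Idle"'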
+ +Detailed status information for a specific node: + + # llstatus -l n54 + +Further information: +\[\[http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl.doc/llbooks.html\]\[IBM +Documentation (see version 3.5)\]\] + +-- Main.mark - 2010-06-01 diff --git a/twiki2md/root/BatchSystems/PlatformLSF.md b/twiki2md/root/BatchSystems/PlatformLSF.md new file mode 100644 index 000000000..56a86433a --- /dev/null +++ b/twiki2md/root/BatchSystems/PlatformLSF.md @@ -0,0 +1,309 @@ +# Platform LSF + +**`%RED%This Page is deprecated! The current bachsystem on Taurus and Venus is [[Compendium.Slurm][Slurm]]!%ENDCOLOR%`** + + The HRSK-I systems are operated +with the batch system LSF running on *Mars*, *Atlas* resp.. + +## Job Submission + +The job submission can be done with the command: +`bsub [bsub_options] <job>` + +Some options of `bsub` are shown in the following table: + +| bsub option | Description | +|:-------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| -n \<N> | set number of processors (cores) to N(default=1) | +| -W \<hh:mm> | set maximum wall clock time to \<hh:mm> | +| -J \<name> | assigns the specified name to the job | +| -eo \<errfile> | writes the standard error output of the job to the specified file (overwriting) | +| -o \<outfile> | appends the standard output of the job to the specified file | +| -R span\[hosts=1\] | use only one SMP node (automatically set by the batch system) | +| -R span\[ptile=2\] | run 2 tasks per node | +| -x | disable other jobs to share the node ( Atlas ). | +| -m | specify hosts to run on ( [see below](#HostList)) | +| -M \<M> | specify per-process (per-core) memory limit (in MB), the job's memory limit is derived from that number (N proc \* M MB); see examples and [Attn. #2](#AttentionNo2) below | +| -P \<project> | specifiy project | + +You can use the `%J` -macro to merge the job ID into names. + +It might be more convenient to put the options directly in a job file +which you can submit using + + bsub < my_jobfile + +The following example job file shows how you can make use of it: + + #!/bin/bash + #BSUB -J my_job # the job's name + #BSUB -W 4:00 # max. wall clock time 4h + #BSUB -R "span[hosts=1]" # run on a single node + #BSUB -n 4 # number of processors + #BSUB -M 500 # 500MB per core memory limit + #BSUB -o out.%J # output file + #BSUB -u name@tu-dresden.de # email address; works ONLY with @tu-dresden.de + + echo Starting Program + cd $HOME/work + a.out # e.g. an OpenMP program + echo Finished Program + +**Understanding memory limits** The option -M to bsub defines how much +memory may be consumed by a single process of the job. The job memory +limit is computed taking this value times the number of processes +requested (-n). Therefore, having -M 600 and -n 4 results in a job +memory limit of 2400 MB. If any one of your processes consumes more than +600 MB memory OR if all processes belonging to this job consume more +than 2400 MB of memory in sum, then the job will be killed by LSF. + +- For serial programs, the given limit is the same for the process and + the whole job, e.g. 500 MB + +<!-- --> + + bsub -W 1:00 -n 1 -M 500 myprog + +- For MPI-parallel programs, the job memory limit is N processes \* + memory limit, e.g. 
32\*800 MB = 25600 MB + +<!-- --> + + bsub -W 8:00 -n 32 -M 800 mympiprog + +- For OpenMP-parallel programs, the same applies as with MPI-parallel + programs, e.g. 8\*2000 MB = 16000 MB + +<!-- --> + + bsub -W 4:00 -n 8 -M 2000 myompprog + +LSF sets the user environment according to the environment at the time +of submission. + +Based on the given information the job scheduler puts your job into the +appropriate queue. These queues are subject to permanent changes. You +can check the current situation using the command `bqueues -l` . There +are a couple of rules and restrictions to balance the system loads. One +idea behind them is to prevent users from occupying the machines +unfairly. An indicator for the priority of a job placement in a queue is +therefore the ratio between used and granted CPU time for a certain +period. + +`Attention`: If you do not give the maximum runtime of your program, the +default runtime for the specified queue is taken. This is way below the +maximal possible runtime (see table [below](#JobQueues)). + +#AttentionNo2 `Attention #2`: Some systems enforce a limit on how much +memory each process and your job as a whole may allocate. If your job or +any of its processes exceed this limit (N proc.\*limit for the job), +your job will be killed. If memory limiting is in place, there also +exists a default limit which will be applied to your job if you do not +specify one. Please find the limits along with the description of the +machines' [queues](#JobQueues) below. + +#InteractiveJobs + +### Interactive Jobs + +Interactive activities like editing, compiling etc. are normally limited +to the boot CPU set ( *Mars* ) or to the master nodes ( *Atlas* ). For +the development and testing sometimes a larger number of CPUs and more +CPU time may be needed. Please do not use the interactive queue for +extensive production runs! + +Use the bsub options `-Is` for an interactive and, additionally on +*Atlas*, `-XF` for an X11 job like: + + bsub -Is -XF matlab + +or for an interactive job with a bash use + + bsub -Is -n 2 -W <hh:mm> -P <project> bash + +You can check the current usage of the system with the command `bhosts` +to estimate the time to schedule. + +#ParallelJobs + +### Parallel Jobs + +For submitting parallel jobs, a few rules have to be understood and +followed. In general they depend on the type of parallelization and the +architecture. + +#OpenMPJobs + +#### OpenMP Jobs + +An SMP-parallel job can only run within a node (or a partition), so it +is necessary to include the option `-R "span[hosts=1]"` . The maximum +number of processors for an SMP-parallel program is 506 on a large Altix +partition, and 64 on \<tt>*Atlas*\</tt> . A simple example of a job file +for an OpenMP job can be found above (section [3.4](#LSF-OpenMP)). + +[Further information on pinning +threads.](RuntimeEnvironment#Placing_Threads_or_Processes_on) + +#MpiJobs + +#### MPI Jobs + +There are major differences for submitting MPI-parallel jobs on the +systems at ZIH. Please refer to the HPC systems's section. It is +essential to use the same modules at compile- and run-time. + +### Array Jobs + +Array jobs can be used to create a sequence of jobs that share the same +executable and resource requirements, but have different input files, to +be submitted, controlled, and monitored as a single unit. + +After the job array is submitted, LSF independently schedules and +dispatches the individual jobs. 
Each job submitted from a job array +shares the same job ID as the job array and are uniquely referenced +using an array index. The dimension and structure of a job array is +defined when the job array is created. + +Here is an example how an array job can looks like: + + #!/bin/bash + + #BSUB -W 00:10 + #BSUB -n 1 + #BSUB -J "myTask[1-100:2]" # create job array with 50 tasks + #BSUB -o logs/out.%J.%I # appends the standard output of the job to the specified file that + # contains the job information (%J) and the task information (%I) + #BSUB -e logs/err.%J.%I # appends the error output of the job to the specified file that + # contains the job information (%J) and the task information (%I) + + echo "Hello Job $LSB_JOBID Task $LSB_JOBINDEX" + +Alternatively, you can use the following single command line to submit +an array job: + + bsub -n 1 -W 00:10 -J "myTask[1-100:2]" -o "logs/out.%J.%I" -e "logs/err.%J.%I" "echo Hello Job \$LSB_JOBID Task \$LSB_JOBINDEX" + +For further details please read the LSF manual. + +### Chain Jobs + +You can use chain jobs to create dependencies between jobs. This is +often the case if a job relies on the result of one or more preceding +jobs. Chain jobs can also be used if the runtime limit of the batch +queues is not sufficient for your job. + +To create dependencies between jobs you have to use the option `-w`. +Since `-w` relies on the job id or the job name it is advisable to use +the option `-J` to create a user specified name for a single job. For +detailed information see the man pages of bsub with `man bsub`. + +Here is an example how a chain job can looks like: + + #!/bin/bash + + #job parameters + time="4:00" + mem="rusage[mem=2000] span[host=1]" + n="8" + + #iteration parameters + start=1 + end=10 + i=$start + + #create chain job with 10 jobs + while [ "$i" -lt "`expr $end + 1`" ] + do + if [ "$i" -eq "$start" ];then + #create jobname + JOBNAME="${USER}_job_$i" + bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" <job> + else + #create jobname + OJOBNAME=$JOBNAME + JOBNAME="${USER}_job_$i" + #only start a job if the preceding job has the status done + bsub -n "$n" -W "$time" -R "$mem" -J "$JOBNAME" -w "done($OJOBNAME)" <job> + fi + i=`expr $i + 1` + done + +#JobQueues + +## Job Queues + +With the command `bqueues [-l <queue name>]` you can get information +about available queues. With `bqueues -l` you get a detailed listing of +the queue properties. + +`Attention`: The queue `interactive` is the only one to accept +interactive jobs! + +## Job Monitoring + +You can check the current usage of the system with the command `bhosts` +to estimate the time to schedule. Or to get an overview on *Atlas*, +lsfview shows the current usage of the system. + +The command `bhosts` shows the load on the hosts. + +For a more convenient overview the command `lsfshowjobs` displays +information on the LSF status like this: + + You have 1 running job using 64 cores + You have 1 pending job + +and the command `lsfnodestat` displays the node and core status of +machine like this: + +# ------------------------------------------- + +nodes available: 714/714 nodes damaged: 0 + +# ------------------------------------------- + +jobs running: 1797 \| cores closed (exclusive jobs): 94 jobs wait: 3361 +\| cores closed by ADMIN: 129 jobs suspend: 0 \| cores working: 2068 +jobs damaged: 0 \| + +# ------------------------------------------- + +normal working cores: 2556 cores free for jobs: 265 \</pre> + +The command `bjobs` allows to monitor your running jobs. 
It has the +following options: + +| bjobs option | Description | +|:--------------|:----------------------------------------------------------------------------------------------------------------------------------| +| `-r` | Displays running jobs. | +| `-s` | Displays suspended jobs, together with the suspending reason that caused each job to become suspended. | +| `-p` | Displays pending jobs, together with the pending reasons that caused each job not to be dispatched during the last dispatch turn. | +| `-a` | Displays information on jobs in all states, including finished jobs that finished recently. | +| `-l [job_id]` | Displays detailed information for each job or for a particular job. | + +## Checking the progress of your jobs + +If you run code that regularily emits status or progress messages, using +the command + +`watch -n10 tail -n2 '*out'` + +in your `$HOME/.lsbatch` directory is a very handy way to keep yourself +informed. Note that this only works if you did not use the `-o` option +of `bsub`, If you used `-o`, replace `*out` with the list of file names +you passed to this very option. + +#HostList + +## Host List + +The `bsub` option `-m` can be used to specify a list of hosts for +execution. This is especially useful for memory intensive computations. + +### Altix + +Jupiter, saturn, and uranus have 4 GB RAM per core, mars only 1GB. So it +makes sense to specify '-m "jupiter saturn uranus". + +\</noautolink> diff --git a/twiki2md/root/BatchSystems/WindowsBatch.md b/twiki2md/root/BatchSystems/WindowsBatch.md new file mode 100644 index 000000000..085879892 --- /dev/null +++ b/twiki2md/root/BatchSystems/WindowsBatch.md @@ -0,0 +1,69 @@ +## Batch System on the Windows HPC Server + +The graphical user interface to the Windows batch system is the HPC Job +Manager. You can find this resource under Start -> Programs -> Microsoft +HPC Pack -> HPC Job Manager. + +### Job Submission + +To create a new job click at one of the job dialog items in the Actions +submenu Job Submission and specify your job requirements in the job +dialog. + +It is advisable to give your job an unique name for distinguishing +reasons during the job monitoring phase. + +#### Job Types + +- **Job**: The Job is the convenient way to create a batch job where + you want to specify all job requirements in detail. It is also + possible to create a job that consists of multiple task. If you have + dependencies between your tasks, e.g. task b should only be started + if task a has finished you can specify these dependencies in the + submenu item task list in the job dialog. + +<!-- --> + +- **Single-Task Job**: The Single-Task Job is the easiest way to + create a batch job that consists only of one task. In addition you + can specify the number of cores that should be used, the working + directory and job input and output files. + +<!-- --> + +- **Parametric Sweep Job**: The Parametric Sweep Job allows the user + to create a sweep job where the tasks only differ in one input + parameter. For this parameter the user can specify the start, end + and the increment value. With this description the job will consists + of (end-start)/increment individual task, which will be placed on + all free cores of the cluster. 
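+
+As a quick sanity check for the size of such a sweep (a worked
+illustration, independent of any particular cluster):
+
+    start=1, end=100, increment=2
+    -> tasks are created for the parameter values 1, 3, 5, ..., 99  (50 tasks)
+
+So the number of tasks is (end-start)/increment, rounded down, plus
+one.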
+ +#### Working Directories + +- `C:\htitan\<LoginName>` +- `C:\titan\HOME_(TITAN)\<LoginName>` +- `\\titan\hpcms-files\HOME_(TITAN)\<LoginName>` +- `\\hpcms\hpcms-files\HOME_(TITAN)\<LoginName>` +- `Z:\HOME_(TITAN)\<LoginName>` (only available at login node) + +### Job Queues + +The queues are named job templates and can be chosen in the job +submission dialog. + +#### Titan + +At the moment there are two queues both without a runtime and/or core +limitation. + +| Batch Queue | Admitted Users | Available CPUs | Default Runtime | Max. Runtime | +|:---------------|:-----------------|:------------------|:----------------|:-------------| +| `Default` | `all` | `min. 1, max. 64` | `none` | `none` | +| `VampirServer` | `selected users` | `min. 1, max. 72` | `none` | `none` | + +### Job Monitoring + +The status of the jobs is visible via the Job Management submenu. Via +the context menu more detailed information of a job is available. + +- Set DENYTOPICVIEW = WikiGuest diff --git a/twiki2md/root/Cloud/BeeGFS.md b/twiki2md/root/Cloud/BeeGFS.md new file mode 100644 index 000000000..131e4e6d9 --- /dev/null +++ b/twiki2md/root/Cloud/BeeGFS.md @@ -0,0 +1,146 @@ +# BeeGFS file system + +%RED%Note: This page is under construction. %ENDCOLOR%%RED%The pipeline +will be changed soon%ENDCOLOR% + +**Prerequisites:** To work with Tensorflow you obviously need \<a +href="Login" target="\_blank">access\</a> for the Taurus system and +basic knowledge about Linux, mounting, SLURM system. + +**Aim** \<span style="font-size: 1em;"> of this page is to introduce +users how to start working with the BeeGFS file\</span>\<span +style="font-size: 1em;"> system - a high-performance parallel file +system.\</span> + +## Mount point + +Understanding of mounting and the concept of the mount point is +important for using file systems and object storage. A mount point is a +directory (typically an empty one) in the currently accessible file +system on which an additional file system is mounted (i.e., logically +attached). \<span style="font-size: 1em;">The default mount points for a +system are the directories in which file systems will be automatically +mounted unless told by the user to do otherwise. \</span>\<span +style="font-size: 1em;">All partitions are attached to the system via a +mount point. The mount point defines the place of a particular data set +in the file system. Usually, all partitions are connected through the +root partition. On this partition, which is indicated with the slash +(/), directories are created. \</span> + +## BeeGFS introduction + +\<span style="font-size: 1em;"> [BeeGFS](https://www.beegfs.io/content/) +is the parallel cluster file system. \</span>\<span style="font-size: +1em;">BeeGFS spreads data \</span>\<span style="font-size: 1em;">across +multiple \</span>\<span style="font-size: 1em;">servers to aggregate +\</span>\<span style="font-size: 1em;">capacity and \</span>\<span +style="font-size: 1em;">performance of all \</span>\<span +style="font-size: 1em;">servers to provide a highly scalable shared +network file system with striped file contents. This is made possible by +the separation of metadata and file contents. \</span> + +BeeGFS is fast, flexible, and easy to manage storage if for your issue +filesystem plays an important role use BeeGFS. 
It addresses everyone, +who needs large and/or fast file storage + +## Create BeeGFS file system + +To reserve nodes for creating BeeGFS file system you need to create a +[batch](Slurm) job + + #!/bin/bash + #SBATCH -p nvme + #SBATCH -N 4 + #SBATCH --exclusive + #SBATCH --time=1-00:00:00 + #SBATCH --beegfs-create=yes + + srun sleep 1d # sleep for one day + + ## when finished writing, submit with: sbatch <script_name> + +Example output with job id: + + Submitted batch job 11047414 #Job id n.1 + +Check the status of the job with 'squeue -u \<username>' + +## Mount BeeGFS file system + +You can mount BeeGFS file system on the ML partition (ppc64 +architecture) or on the Haswell [partition](SystemTaurus) (x86_64 +architecture) + +### Mount BeeGFS file system on the ML + +Job submission can be done with the command (use job id (n.1) from batch +job used for creating BeeGFS system): + + srun -p ml --beegfs-mount=yes --beegfs-jobid=11047414 --pty bash #Job submission on ml nodes + +Example output: + + srun: job 11054579 queued and waiting for resources #Job id n.2 + srun: job 11054579 has been allocated resources + +### Mount BeeGFS file system on the Haswell nodes (x86_64) + +Job submission can be done with the command (use job id (n.1) from batch +job used for creating BeeGFS system): + + srun --constrain=DA --beegfs-mount=yes --beegfs-jobid=11047414 --pty bash #Job submission on the Haswell nodes + +Example output: + + srun: job 11054580 queued and waiting for resources #Job id n.2 + srun: job 11054580 has been allocated resources + +## Working with BeeGFS files for both types of nodes + +Show contents of the previously created file, for example, +beegfs_11054579 (where 11054579 - job id **n.2** of srun job): + + cat .beegfs_11054579 + +Note: don't forget to go over to your home directory where the file +located + +Example output: + + #!/bin/bash + + export BEEGFS_USER_DIR="/mnt/beegfs/<your_id>_<name_of_your_job>/<your_id>" + export BEEGFS_PROJECT_DIR="/mnt/beegfs/<your_id>_<name_of_your_job>/<name of your project>" + +Execute the content of the file: + + source .beegfs_11054579 + +Show content of user's BeeGFS directory with the command: + + ls -la ${BEEGFS_USER_DIR} + +Example output: + + total 0 + drwx--S--- 2 <username> swtest 6 21. Jun 10:54 . + drwxr-xr-x 4 root root 36 21. Jun 10:54 .. + +Show content of the user's project BeeGFS directory with the command: + + ls -la ${BEEGFS_PROJECT_DIR} + +Example output: + + total 0 + drwxrws--T 2 root swtest 6 21. Jun 10:54 . + drwxr-xr-x 4 root root 36 21. Jun 10:54 .. + +Note: If you want to mount the BeeGFS file system on an x86 instead of +an ML (power) node, you can either choose the partition "interactive" or +the partition "haswell64", but for the partition "haswell64" you have to +add the parameter "--exclude=taurusi\[4001-4104,5001- 5612\]" to your +job. This is necessary because the BeeGFS client is only installed on +the 6000 island. + +#### F.A.Q. diff --git a/twiki2md/root/Compendium.Applications/DesktopCloudVisualization.md b/twiki2md/root/Compendium.Applications/DesktopCloudVisualization.md new file mode 100644 index 000000000..0ca6b2206 --- /dev/null +++ b/twiki2md/root/Compendium.Applications/DesktopCloudVisualization.md @@ -0,0 +1,54 @@ +# Desktop Cloud Visualization (DCV) + +NICE DCV enables remote accessing OpenGL-3D-applications running on the +server (taurus) using the server's GPUs. If you don't need GL +acceleration, you might also want to try our [WebVNC](WebVNC) solution. 
+ +Note that with the 2017 version (and later), while there is still a +separate client available, it is not necessary anymore. You can also use +your (WebGL-capable) browser to connect to the DCV server. A +standalone-client, which might still be a bit more performant, can be +downloaded from <https://www.nice-software.com/download/nice-dcv-2017> + + + +### Access with JupyterHub + +**Check out the [new documentation about virtual +desktops](Compendium.VirtualDesktops).** + +Click here, to start a session on our JupyterHub: +[https://taurus.hrsk.tu-dresden.de/jupyter/hub/spawn#/\~(partition\~'dcv\~cpuspertask\~'6\~gres\~'gpu\*3a1\~mempercpu\~'2583\~environment\~'production)](https://taurus.hrsk.tu-dresden.de/jupyter/hub/spawn#/~(partition~'dcv~cpuspertask~'6~gres~'gpu*3a1~mempercpu~'2583~environment~'test))\<br +/> This link starts your session on the dcv partition +(taurusi210\[7-8\]) with a GPU, 6 CPU cores and 2583 MB memory per core. +Optionally you can modify many different SLURM parameters. For this +follow the general [JupyterHub](Compendium.JupyterHub) documentation. + +Your browser now should load into the JupyterLab application which looks +like this:\<br /> \<br /> \<a +href="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/DesktopCloudVisualization/jupyterlab-and-dcv.png>"> +\<img alt="" src="%ATTACHURL%/jupyterlab-and-dcv.png" style="border: 1px +solid #888;" width="400" /> \</a> + +Click on the `DCV` button. A new tab with the DCV client will be opened. + +### Notes on GPU Support: + +- Check GPU support via: + +``` +glxinfo +name of display: :1 +display: :1 screen: 0 +direct rendering: Yes # <--- This line! +... +``` + +If direct rendering is not set to yes, please contact +<hpcsupport@zih.tu-dresden.de> + +- Expand LD_LIBRARY_PATH: + +``` +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/nvidia/ +``` diff --git a/twiki2md/root/Compendium.DataManagement/WorkSpaces.md b/twiki2md/root/Compendium.DataManagement/WorkSpaces.md new file mode 100644 index 000000000..7d922ca51 --- /dev/null +++ b/twiki2md/root/Compendium.DataManagement/WorkSpaces.md @@ -0,0 +1,257 @@ +# Workspaces + + + +Storage systems come with different flavours in terms of + +- size +- streaming bandwidth +- IOPS rate + +With a limited price one cannot have all in one. That is the reason why +our fast parallel file systems have restrictions wrt. age of files (see +[TermsOfUse](TermsOfUse)). The mechanism of workspaces enables users to +better manage the data life cycle of their HPC data. Workspaces are +primarily login-related. The tool concept of "workspaces" is common in a +large number of HPC centers. The idea is to request for a workspace +directory in a certain storage system - connected with an expiry date. +After a grace period the data is deleted automatically. The **maximum** +lifetime of a workspace depends on the storage system and is listed +below: + +- ssd: 1 day default, 30 days maximum, +- beegfs_global0: 1 day default, 30 days maximum, +- scratch: 1 day default, 100 days maximum, +- warm_archive: 1 day default, 1 year maximum. + +All workspaces can be extended twice (update: 10 times in scratch now). +There is no problem to use the fastest file systems we have, but keep +track on your data and move it to a cheaper system once you have done +your computations. 
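+
+A minimal sketch of such a clean-up step at the end of a computation
+(the paths are placeholders; creating, listing and releasing workspaces
+is described in the following sections):
+
+    # copy final results from the fast (expensive) file system to a capacity file system
+    rsync -av /scratch/ws/<user>-<workspace>/results/ /warm_archive/ws/<user>-<archive_ws>/
+    # afterwards, release the no longer needed scratch workspace
+    ws_release -F scratch <workspace>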
+
+## Workspace commands
+
+To list the file systems that are available for workspaces, use:
+
+    mark@tauruslogin6:~> ws_find -l
+    Available filesystems:
+    scratch
+    warm_archive
+    ssd
+    beegfs_global0
+
+To create a workspace, specify a unique name and its lifetime (in days)
+like this:
+
+    mark@tauruslogin6:~> ws_allocate -F scratch SPECint 50
+    Info: creating workspace.
+    /scratch/ws/mark-SPECint
+    remaining extensions  : 10
+    remaining time in days: 50
+
+**Important:** You can (and should) also add your email address and a
+relative date for notification:
+
+    mark@tauruslogin6:~> ws_allocate -F scratch -r 7 -m name.lastname@tu-dresden.de SPECint 50
+
+The full usage message of `ws_allocate`:
+
+    ws_allocate: [options] workspace_name duration
+    Options:
+      -h [ --help ]               produce help message
+      -V [ --version ]            show version
+      -d [ --duration ] arg (=1)  duration in days
+      -n [ --name ] arg           workspace name
+      -F [ --filesystem ] arg     filesystem
+      -r [ --reminder ] arg       reminder to be sent n days before expiration
+      -m [ --mailaddress ] arg    mailaddress to send reminder to (works only with tu-dresden.de addresses)
+      -x [ --extension ]          extend workspace
+      -u [ --username ] arg       username
+      -g [ --group ]              group workspace
+      -c [ --comment ] arg        comment
+
+The maximum duration depends on the storage system:
+
+| Storage system (use with parameter -F) | Duration | Remarks                                                                                |
+|:---------------------------------------|:---------|:---------------------------------------------------------------------------------------|
+| ssd                                     | 30 days  | High-IOPS file system (/lustre/ssd) on SSDs.                                             |
+| beegfs                                  | 30 days  | High-IOPS file system (/beegfs/global0) on NVMes.                                        |
+| scratch                                 | 100 days | Scratch file system (/scratch) with high streaming bandwidth, based on spinning disks.   |
+| warm_archive                            | 1 year   | Capacity file system based on spinning disks.                                            |
+
+A workspace can be extended twice (in `scratch` currently up to ten
+times, see above). With `ws_extend`, a *new* duration for the workspace
+is set (*not cumulative*):
+
+    mark@tauruslogin6:~> ws_extend -F scratch SPECint 100
+    Info: extending workspace.
+    /scratch/ws/mark-SPECint
+    remaining extensions  : 1
+    remaining time in days: 100
+
+For email notification, you can either use the option `-m` in the
+`ws_allocate` command line or use `ws_send_ical` to get an entry in
+your calendar. This works only with tu-dresden.de addresses; please
+configure email redirection if you want to use another address.
+
+    mark@tauruslogin6:~> ws_send_ical -m ulf.markwardt@tu-dresden.de -F scratch SPECint
+
+You can easily get an overview of your currently used workspaces with
+**`ws_list`**.
+ + mark@tauruslogin6:~> ws_list + id: benchmark_storage + workspace directory : /warm_archive/ws/mark-benchmark_storage + remaining time : 364 days 23 hours + creation time : Thu Jul 4 13:40:31 2019 + expiration date : Fri Jul 3 13:40:30 2020 + filesystem name : warm_archive + available extensions : 2 + id: SPECint + workspace directory : /scratch/ws/mark-SPECint + remaining time : 99 days 23 hours + creation time : Thu Jul 4 13:36:51 2019 + expiration date : Sat Oct 12 13:36:51 2019 + filesystem name : scratch + available extensions : 1 + +With\<span> **\<span>ws_release -F \<file system> \<workspace +name>\</span>**\</span>, you can delete your workspace. + +### Restoring expired workspaces + +**At expiration time** (or when you manually release your workspace), +your workspace will be moved to a special, hidden directory. For a month +(in \_warm*archive*: 2 months), you can still restore your data into a +valid workspace. For that, use + + mark@tauruslogin6:~> ws_restore -l -F scratch + +to get a list of your expired workspaces, and then restore them like +that into an existing, active workspace **newws**: + + mark@tauruslogin6:~> ws_restore -F scratch myuser-myws-1234567 newws + +**NOTE**: the expired workspace has to be specified using the full name +as listed by `ws_restore -l`, including username prefix and timestamp +suffix (otherwise, it cannot be uniquely identified). \<br />The target +workspace, on the other hand, must be given with just its short name as +listed by `ws_list`, without the username prefix. + +### Linking workspaces in home + +It might be valuable to have links to personal workspaces within a +certain directory, e.g., the user home directory. The command +\`ws_register DIR\` will create and manage links to all personal +workspaces within in the directory \`DIR\`. Calling this command will do +the following: + +- The directory \`DIR\` will be created if necessary +- Links to all personal workspaces will be managed: + - Creates links to all available workspaces if not already present + - Removes links to released workspaces \<p> \</p> \<p> \</p> \<p> + \</p> \<p> \</p> \<p> \</p> \<p> \</p> \<p> \</p> \<p> \</p> + \<p> \</p> \<p> \</p> \<p> \</p> \<p> \</p> \<p> \</p> \<p> + \</p> \<p> \</p> + +**Remark:** An automatic update of the workspace links can be invoked by +putting the command \`ws_register DIR\` in the user's personal shell +configuration file (e.g., .bashrc, .zshrc). + +## How to Use Workspaces + +We see three typical use cases for the use of workspaces: + +### Per-Job-Storage + +A batch job needs a directory for temporary data. This can be deleted +afterwards. + +Here an example for the use with Gaussian: + + #!/bin/bash + #SBATCH --partition=haswell + #SBATCH --time=96:00:00 + #SBATCH --nodes=1 + #SBATCH --ntasks=1 + #SBATCH --cpus-per-task=24 + + module load modenv/classic + module load gaussian + + COMPUTE_DIR=gaussian_$SLURM_JOB_ID + export GAUSS_SCRDIR=$(ws_allocate -F ssd $COMPUTE_DIR 7) + echo $GAUSS_SCRDIR + + srun g16 inputfile.gjf logfile.log + + test -d $GAUSS_SCRDIR && rm -rf $GAUSS_SCRDIR/* + ws_release -F ssd $COMPUTE_DIR + +In a similar manner, other jobs can make use of temporary workspaces. + +### Data for a Campaign + +For a series of calculations that works on the same data, you could +allocate a workspace in the scratch for e.g. 100 days: + + mark@tauruslogin6:~> ws_allocate -F scratch my_scratchdata 100 + Info: creating workspace. 
+ /scratch/ws/mark-my_scratchdata + remaining extensions : 2 + remaining time in days: 99 + +If you want to share it with your project group, set the correct access +attributes, eg. + + mark@tauruslogin6:~> chmod g+wrx /scratch/ws/mark-my_scratchdata + +And verify it with: + + mark@tauruslogin6:~> ls -la /scratch/ws/mark-my_scratchdata <br />total 8<br />drwxrwx--- 2 mark hpcsupport 4096 Jul 10 09:03 .<br />drwxr-xr-x 5 operator adm 4096 Jul 10 09:01 .. + +### Mid-Term Storage + +For data that seldomly changes but consumes a lot of space, the warm +archive can be used. \<br />Note that this is **mounted read-only**on +the compute nodes, so you cannot use it as a work directory for your +jobs! + + mark@tauruslogin6:~> ws_allocate -F warm_archive my_inputdata 365 + /warm_archive/ws/mark-my_inputdata + remaining extensions : 2 + remaining time in days: 365 + +**Attention:** The warm archive is not built for billions of files. +There is a quota active of 100.000 files per group. Maybe you might want +to tar your data. To see your active quota use: + + mark@tauruslogin6:~> qinfo quota /warm_archive/ws/ + Consuming Entity Type Limit Current Usage + GROUP: hpcsupport LOGICAL_DISK_SPACE 100 TB 51 GB (0%) + GROUP: hpcsupport FILE_COUNT 100000 4 (0%) + GROUP: swtest LOGICAL_DISK_SPACE 100 TB 5 GB (0%) + GROUP: swtest FILE_COUNT 100000 38459 (38%) + TENANT: 8a2373d6-7aaf-4df3-86f5-a201281afdbb LOGICAL_DISK_SPACE 5 PB 1 TB (0%) + +Note that the workspaces reside under the mountpoint `/warm_archive/ws/` +and not \<span>/warm_archive\</span>anymore. + +### Troubleshooting + +If you are getting the error: + + Error: could not create workspace directory! + +you should check the \<span>"locale" \</span>setting of your ssh client. +Some clients (e.g. the one from MacOSX) set values that are not valid on +Taurus. You should overwrite LC_CTYPE and set it to a valid locale value +like: + + export LC_CTYPE=de_DE.UTF-8 + +A list of valid locales can be retrieved via \<br /> + + locale -a + +Please use only UTF8 (or plain) settings. Avoid "iso" codepages! diff --git a/twiki2md/root/Compendium.HPCDA/AlphaCentauri.md b/twiki2md/root/Compendium.HPCDA/AlphaCentauri.md new file mode 100644 index 000000000..ac354b51e --- /dev/null +++ b/twiki2md/root/Compendium.HPCDA/AlphaCentauri.md @@ -0,0 +1,206 @@ +# Alpha Centauri - Multi-GPU cluster with NVIDIA A100 + +The sub-cluster "AlphaCentauri" had been installed for AI-related +computations (ScaDS.AI). + + + +## Hardware + +- 34 nodes, each with + - 8 x NVIDIA A100-SXM4 (40 GB RAM) + - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, MultiThreading + enabled + - 1 TB RAM + - 3.5 TB /tmp local NVMe device +- Hostnames: taurusi\[8001-8034\] +- SLURM partition **`alpha`** + +## Hints for the usage + +These nodes of the cluster can be used like other "normal" GPU nodes +(ml, gpu2). + +<span class="twiki-macro RED"></span> **Attention:** <span +class="twiki-macro ENDCOLOR"></span> These GPUs may only be used with +**CUDA 11** or later. Earlier versions do not recognize the new hardware +properly or cannot fully utilize it. Make sure the software you are +using is built against this library. + +## Typical tasks + +\<span style="font-size: 1em;">Machine learning frameworks as TensorFlow +and PyTorch are industry standards now. 
The example of work with PyTorch +on the new AlphaCentauri sub-cluster is illustrated below in brief +examples.\</span> + +There are three main options on how to work with Tensorflow and PyTorch +on the Alpha Centauri cluster: **1.** **Modules,** **2.** **Virtual** +**Environments (manual software installation)**, **3. \<a +href="<https://taurus.hrsk.tu-dresden.de/>" +target="\_blank">Jyupterhub\</a> 4. \<a href="Container" +target="\_blank">Containers\</a>.** \<br />\<br /> + +### 1. Modules + +\<span +style`"font-size: 1em;">The easiest way is using the </span><a href="RuntimeEnvironment#Module_Environments" target="_blank">Modules system</a><span style="font-size: 1em;"> and Python virtual environment. Modules are a way to use frameworks, compilers, loader, libraries, and utilities. The software environment for the </span> =alpha` +\<span style="font-size: 1em;"> partition is available under the name +**hiera** \</span> : + + module load modenv/hiera + +Machine learning frameworks **PyTorch** and **TensorFlow**available for +**alpha** partition as modules with CUDA11, GCC 10 and OpenMPI4: + + module load modenv/hiera GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5 PyTorch/1.7.1 + module load modenv/hiera GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5 TensorFlow/2.4.1 + +%RED%Hint<span class="twiki-macro ENDCOLOR"></span>: To check the +available modules for the **hiera** software environment, use the +command: + + module available + +To show all the dependencies you need to load for the core module, use +the command: + + module spider <name_of_the_module> + +### 2. Virtual environments + +It is necessary to use virtual environments for your work with Python. A +virtual environment is a cooperatively isolated runtime environment. +There are two main options to use virtual environments: venv (standard +Python tool) and + +1.** Vitualenv** is a standard Python tool to create isolated Python +environments. It is the %RED%preferred<span +class="twiki-macro ENDCOLOR"></span> interface for managing +installations and virtual environments on Taurus and part of the Python +modules. + +2\. **\<a +href="<https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment>" +target="\_blank">Conda\</a>** is an alternative method for managing +installations and virtual environments on Taurus. Conda is an +open-source package management system and environment management system +from Anaconda. The conda manager is included in all versions of Anaconda +and Miniconda. + +**%RED%Note%ENDCOLOR%**: There are two sub-partitions of the alpha +partition: alpha and alpha-interactive. Please use alpha-interactive for +the interactive jobs and alpha for the batch jobs. + +Examples with conda and venv will be presented below. Also, there is an +example of an interactive job for the AlphaCentauri sub-cluster using +the `alpha-interactive` partition: + + srun -p alpha-interactive -N 1 -n 1 --gres=gpu:1 --time=01:00:00 --pty bash # Job submission in alpha nodes with 1 gpu on 1 node.<br /><br />mkdir conda-virtual-environments #create a folder, please use Workspaces! 
<br />cd conda-virtual-environments #go to folder<br />which python #check which python are you using<br />ml modenv/hiera<br />ml Miniconda3<br />which python #check which python are you using now<br />conda create -n conda-testenv python=3.8 #create virtual environment with the name conda-testenv and Python version 3.8 <br />conda activate conda-testenv #activate conda-testenv virtual environment <br />conda deactivate #Leave the virtual environment + +New software for data analytics is emerging faster than we can install +it. If you urgently need a certain version we advise you to manually +install it (the machine learning frameworks and required packages) in +your virtual environment (or use a \<a +href="<https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/Container>" +target="\_blank">container\</a>). + +The **Virtualenv** example: + + srun -p alpha-interactive -N 1 -n 1 --gres=gpu:1 --time=01:00:00 --pty bash #Job submission in alpha nodes with 1 gpu on 1 node. + + mkdir python-environments && cd "$_" # Optional: Create folder. Please use Workspaces!<br /><br />module load modenv/hiera modenv/hiera GCC/10.2.0 CUDA/11.1.1 OpenMPI/4.0.5 Python/3.8.6 #Changing the environment and load necessary modules + which python #Check which python are you using + virtualenv --system-site-packages python-environments/envtest #Create virtual environment + source python-environments/envtest/bin/activate #Activate virtual environment. Example output: (envtest) bash-4.2$ + +Example of using **Conda** with a Pytorch and P\<span style="font-size: +1em;">illow installation: \</span> + + conda activate conda-testenv<br />conda install pytorch torchvision cudatoolkit=11.1 -c pytorch -c nvidia<br />conda install -c anaconda pillow + +Verify installation for the **Venv** example: + + python #Start python + from time import gmtime, strftime + print(strftime("%Y-%m-%d %H:%M:%S", gmtime())) #Example output: 2019-11-18 13:54:16<br /><br />deactivate #Leave the virtual environment + +Verify installation for the **Conda** example: + + python #Start python + import torch + torch.version.__version__ #Example output: 1.8.1 + +There is an example of the batch script for the typical usage of the +Alpha Centauri cluster: + + #!/bin/bash + #SBATCH --mem=40GB # specify the needed memory. Same amount memory as on the GPU + #SBATCH -p alpha # specify Alpha-Centauri partition + #SBATCH --gres=gpu:1 # use 1 GPU per node (i.e. use one GPU per task) + #SBATCH --nodes=1 # request 1 node + #SBATCH --time=00:15:00 # runs for 15 minutes + #SBATCH -c 2 # how many cores per task allocated + #SBATCH -o HLR_name_your_script.out # save output message under HLR_${SLURMJOBID}.out + #SBATCH -e HLR_name_your_script.err # save error messages under HLR_${SLURMJOBID}.err + + module load modenv/hiera + eval "$(conda shell.bash hook)" + conda activate conda-testenv && python machine_learning_example.py + + ## when finished writing, submit with: sbatch <script_name> For example: sbatch machine_learning_script.sh + +The Alpha Centauri sub-cluster has the NVIDIA A100-SXM4 with 40 GB RAM +each. Thus It is prudent to have the same memory on the host (cpu). The +number of cores is free for the users to define, at the moment. + +### 3. JupyterHub + +There is \<a href="JupyterHub" target="\_self">jupyterhub\</a> on +Taurus, where you can simply run your Jupyter notebook on Alpha-Centauri +sub-cluster. Also, for more specific cases you can run a manually +created remote jupyter server. 
You can find the manual server setup \<a +href="DeepLearning" target="\_blank">here.\</a> However, the simplest +option for beginners is using JupyterHub. + +JupyterHub is available here: \<a +href="<https://taurus.hrsk.tu-dresden.de/jupyter>" +target="\_top"><https://taurus.hrsk.tu-dresden.de/jupyter>\</a> + +\<a +href`"https://taurus.hrsk.tu-dresden.de/jupyter" target="_top"></a>After logging, you can start a new session and configure it. There are simple and advanced forms to set up your session. The =alpha` +partition is available in advanced form. You have to choose the +\<span>alpha\</span> partition in the partition field. The resource +recommendations to allocate are the same as described above for the +batch script example (not confuse `--mem-per-cpu` with `--mem` +parameter). + +### 4. Containers + +On Taurus \<a +href`"https://sylabs.io/" target="_blank">Singularity</a> used as a standard container solution. It can be run on the =alpha` +partition as well. Singularity enables users to have full control of +their environment. Detailed information about containers can be found +[here](Container). + +Nvidia +[NGC](https://developer.nvidia.com/blog/how-to-run-ngc-deep-learning-containers-with-singularity/) +containers can be used as an effective solution for machine learning +related tasks. (Downloading containers requires registration). +Nvidia-prepared containers with software solutions for specific +scientific problems can simplify the deployment of deep learning +workloads on HPC. NGC containers have shown consistent performance +compared to directly run code. + +## \<span style="font-size: 1em;">Examples\</span> + +There is a test example of a deep learning task that could be used for +the test. For the correct work, Pytorch and Pillow package should be +installed in your virtual environment (how it was shown above in the +interactive job example) + +- [example_pytorch_image_recognition.zip](%ATTACHURL%/example_pytorch_image_recognition.zip): + example_pytorch_image_recognition.zip + +\<div id="gtx-trans" style="position: absolute; left: 8px; top: +1248.47px;"> \</div> diff --git a/twiki2md/root/Compendium.HPCDA/PowerAI.md b/twiki2md/root/Compendium.HPCDA/PowerAI.md new file mode 100644 index 000000000..7a03aa31e --- /dev/null +++ b/twiki2md/root/Compendium.HPCDA/PowerAI.md @@ -0,0 +1,82 @@ +# PowerAI Documentation Links + +There are different documentation sources for users to learn more about +the PowerAI Framework for Machine Learning. In the following the links +are valid for PowerAI version 1.5.4 + +## General Overview: + +- \<a + href="<https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.5.3/welcome/welcome.htm>" + target="\_blank" title="Landing Page">Landing Page\</a> (note that + you can select differnet PowerAI versions with the drop down menu + "Change Product or version") +- \<a + href="<https://developer.ibm.com/linuxonpower/deep-learning-powerai/>" + target="\_blank" title="PowerAI Developer Portal">PowerAI Developer + Portal \</a>(Some Use Cases and examples) +- \<a + href="<https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.5.4/navigation/pai_software_pkgs.html>" + target="\_blank" title="Included Software Packages">Included + Software Packages\</a> (note that you can select different PowerAI + versions with the drop down menu "Change Product or version") + +## Specific User Howtos. 
Getting started with...:
+
+-   [PowerAI](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted.htm)
+-   [Caffe](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_caffe.html)
+-   [TensorFlow](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_tensorflow.html?view=kc)
+-   [TensorFlow Probability](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_tensorflow_prob.html?view=kc)
+    This release of PowerAI includes TensorFlow Probability, a library
+    for probabilistic reasoning and statistical analysis in TensorFlow.
+-   [TensorBoard](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_tensorboard.html?view=kc)
+-   [Snap ML](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_snapml.html)
+    This release of PowerAI includes Snap Machine Learning (Snap ML), a
+    library for training generalized linear models. It is being
+    developed at IBM with the vision to remove training time as a
+    bottleneck for machine learning applications. Snap ML supports many
+    classical machine learning models, scales gracefully to data sets
+    with billions of examples or features, and offers distributed
+    training, GPU acceleration, and support for sparse data structures.
+-   [PyTorch](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_pytorch.html)
+    This release of PowerAI includes the community development preview
+    of PyTorch 1.0 (rc1). PowerAI's PyTorch includes support for IBM's
+    Distributed Deep Learning (DDL) and Large Model Support (LMS).
+-   [Caffe2 and ONNX](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_caffe2ONNX.html)
+    This release of PowerAI includes a Technology Preview of Caffe2 and
+    ONNX. Caffe2 is a companion to PyTorch: PyTorch is great for
+    experimentation and rapid development, while Caffe2 is aimed at
+    production environments. ONNX (Open Neural Network Exchange)
+    provides support for moving models between those frameworks.
+-   [Distributed Deep Learning](https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_getstarted_ddl.html?view=kc)
+    (DDL). Works on up to 4 TaurusML worker nodes. (Larger models with
+    more nodes are possible with PowerAI Enterprise.)
+
+## PowerAI Container
+
+We have converted the official Docker container to Singularity.
Here is +a documentation about the Docker base container, including a table with +the individual software versions of the packages installed within the +container: + +- \<a href="<https://hub.docker.com/r/ibmcom/powerai/>" + target="\_blank">PowerAI Docker Container Docu\</a> diff --git a/twiki2md/root/Compendium.HPCDA/WarmArchive.md b/twiki2md/root/Compendium.HPCDA/WarmArchive.md new file mode 100644 index 000000000..b76327ea7 --- /dev/null +++ b/twiki2md/root/Compendium.HPCDA/WarmArchive.md @@ -0,0 +1,27 @@ +# **Warm Archive** + +<span class="twiki-macro RED"></span> **This page is under construction. +The functionality is not there, yet.** <span +class="twiki-macro ENDCOLOR"></span> + +The warm archive is intended a storage space for the duration of a +running HPC-DA project. It can NOT substitute a long-term archive. It +consists of 20 storage nodes with a net capacity of 10 PB. Within Taurus +(including the HPC-DA nodes), the management software "Quobyte" enables +access via + +- native quobyte client - read-only from compute nodes, read-write + from login and nvme nodes +- S3 - read-write from all nodes, +- Cinder (from OpenStack cluster). + +For external access, you can use: + +- S3 to \<bucket>.s3.taurusexport.hrsk.tu-dresden.de +- or normal file transfer via our taurusexport nodes (see + DataManagement). + +An HPC-DA project can apply for storage space in the warm archive. This +is limited in capacity and duration. + +-- Main.UlfMarkwardt - 2019-05-14 diff --git a/twiki2md/root/Compendium.SystemTaurus/RomeNodes.md b/twiki2md/root/Compendium.SystemTaurus/RomeNodes.md new file mode 100644 index 000000000..cf1bf428c --- /dev/null +++ b/twiki2md/root/Compendium.SystemTaurus/RomeNodes.md @@ -0,0 +1,99 @@ +# AMD EPYC Nodes (Zen 2, Codename "Rome") + +The nodes **taurusi\[7001-7192\]** are each equipped 2x AMD EPYC 7702 +64-Core processors, so there is a total of 128 physical cores in each +node. SMT is also active, so in total, 256 logical cores are available +per node. + +Each node brings 512 GB of main memory, so you can request roughly +1972MB per logical core (using --mem-per-cpu). Note that you will always +get the memory for the logical core sibling too, even if you do not +intend to use SMT (SLURM_HINT=nomultithread which is the default). + +You can use them by specifying partition romeo: **-p romeo** + +**Note:** If you are running a job here with only ONE process (maybe +multiple cores), please explicitely set the option `-n 1` ! + +Be aware that software built with Intel compilers and `-x*` optimization +flags will not run on those AMD processors! That's why most older +modules built with intel toolchains are not availabe on **romeo**. + +We provide the script: **ml_arch_avail** that you can use to check if a +certain module is available on rome architecture. + +## Example, running CP2K on Rome + +First, check what CP2K modules are available in general: + + $ ml spider CP2K + #or: + $ ml avail CP2K/ + +You will see that there are several different CP2K versions avail, built +with different toolchains. Now let's assume you have to decided you want +to run CP2K version 6 at least, so to check if those modules are built +for rome, use: + + $ ml_arch_avail CP2K/6 + CP2K/6.1-foss-2019a: haswell, rome + CP2K/6.1-foss-2019a-spglib: haswell, rome + CP2K/6.1-intel-2018a: sandy, haswell + CP2K/6.1-intel-2018a-spglib: haswell + +There you will see that only the modules built with **foss** toolchain +are available on architecture "rome", not the ones built with **intel**. 
+So you can load e.g.: + + $ ml CP2K/6.1-foss-2019a + +Then, when writing your batch script, you have to specify the **romeo** +partition. Also, if e.g. you wanted to use an entire ROME node (no SMT) +and fill it with MPI ranks, it could look like this: + + #!/bin/bash + #SBATCH --partition=romeo + #SBATCH --ntasks-per-node=128 + #SBATCH --nodes=1 + #SBATCH --mem-per-cpu=1972 + + srun cp2k.popt input.inp + +## Using the Intel toolchain on Rome + +Currently, we have only newer toolchains starting at `intel/2019b` +installed for the Rome nodes. Even though they have AMD CPUs, you can +still use the Intel compilers on there and they don't even create +bad-performaning code. When using the MKL up to version 2019, though, +you should set the following environment variable to make sure that AVX2 +is used: + + export MKL_DEBUG_CPU_TYPE=5 + +Without it, the MKL does a CPUID check and disables AVX2/FMA on +non-Intel CPUs, leading to much worse performance. **NOTE:** in version +2020, Intel has removed this environment variable and added separate Zen +codepaths to the library. However, they are still incomplete and do not +cover every BLAS function. Also, the Intel AVX2 codepaths still seem to +provide somewhat better performance, so a new workaround would be to +overwrite the `mkl_serv_intel_cpu_true` symbol with a custom function: + + int mkl_serv_intel_cpu_true() { + return 1; + } + +and preloading this in a library: + + gcc -shared -fPIC -o libfakeintel.so fakeintel.c + export LD_PRELOAD=libfakeintel.so + +As for compiler optimization flags, `-xHOST` does not seem to produce +best-performing code in every case on Rome. You might want to try +`-mavx2 -fma` instead. + +### Intel MPI + +We have seen only half the theoretical peak bandwidth via Infiniband +between two nodes, whereas OpenMPI got close to the peak bandwidth, so +you might want to avoid using Intel MPI on romeo if your application +heavily relies on MPI communication until this issue is resolved. diff --git a/twiki2md/root/Compendium.WebHome/BatchSystems.md b/twiki2md/root/Compendium.WebHome/BatchSystems.md new file mode 100644 index 000000000..137dc5c76 --- /dev/null +++ b/twiki2md/root/Compendium.WebHome/BatchSystems.md @@ -0,0 +1,64 @@ +# Batch Systems + +Applications on an HPC system can not be run on the login node. They +have to be submitted to compute nodes with dedicated resources for user +jobs. Normally a job can be submitted with these data: + +- number of CPU cores, +- requested CPU cores have to belong on one node (OpenMP programs) or + can distributed (MPI), +- memory per process, +- maximum wall clock time (after reaching this limit the process is + killed automatically), +- files for redirection of output and error messages, +- executable and command line parameters. + +Depending on the batch system the syntax differs slightly: + +- [Slurm](Slurm) (taurus, venus) + +If you are confused by the different batch systems, you may want to +enjoy this [batch system commands translation +table](http://slurm.schedmd.com/rosetta.pdf). + +\<u>Comment\</u> + +Please keep in mind that for a large runtime a computation may not reach +its end. Try to create shorter runs (4...8 hours) and use checkpointing. +Here is an extreme example from literature for the waste of large +computing resources due to missing checkpoints: + +*Earth was a supercomputer constructed to find the question to the +answer to the Life, the Universe, and Everything by a race of +hyper-intelligent pan-dimensional beings. 
Unfortunately 10 million years +later, and five minutes before the program had run to completion, the +Earth was destroyed by Vogons.* (Adams, D. The Hitchhikers Guide Through +the Galaxy) + +# Exclusive Reservation of Hardware + +If you need for some special reasons, e.g., for benchmarking, a project +or paper deadline, parts of our machines exclusively, we offer the +opportunity to request and reserve these parts for your project. + +Please send your request **7 working days** before the reservation +should start (as that's our maximum time limit for jobs and it is +therefore not guaranteed that resources are available on shorter notice) +with the following information to the [HPC +support](mailto:hpcsupport@zih.tu-dresden.de?subject=Request%20for%20a%20exclusive%20reservation%20of%20hardware&body=Dear%20HPC%20support%2C%0A%0AI%20have%20the%20following%20request%20for%20a%20exclusive%20reservation%20of%20hardware%3A%0A%0AProject%3A%0AReservation%20owner%3A%0ASystem%3A%0AHardware%20requirements%3A%0ATime%20window%3A%20%3C%5Byear%5D%3Amonth%3Aday%3Ahour%3Aminute%20-%20%5Byear%5D%3Amonth%3Aday%3Ahour%3Aminute%3E%0AReason%3A): + +- `Project:` *\<Which project will be credited for the reservation?>* +- `Reservation owner:` *\<Who should be able to run jobs on the + reservation? I.e., name of an individual user or a group of users + within the specified project.>* +- `System:` *\<Which machine should be used?>* +- `Hardware requirements:` *\<How many nodes and cores do you need? Do + you have special requirements, e.g., minimum on main memory, + equipped with a graphic card, special placement within the network + topology?>* +- `Time window:` *\<Begin and end of the reservation in the form + year:month:dayThour:minute:second e.g.: 2020-05-21T09:00:00>* +- `Reason:` *\<Reason for the reservation.>* + +`Please note` that your project CPU hour budget will be credited for the +reserved hardware even if you don't use it. diff --git a/twiki2md/root/Compendium/CheckpointRestart.md b/twiki2md/root/Compendium/CheckpointRestart.md new file mode 100644 index 000000000..a2f66d74e --- /dev/null +++ b/twiki2md/root/Compendium/CheckpointRestart.md @@ -0,0 +1,164 @@ +# Checkpoint/Restart + +If you wish to do checkpointing, your first step should always be to +check if your application already has such capabilities built-in, as +that is the most stable and safe way of doing it. Applications that are +known to have some sort of **native checkpointing** include: + +Abaqus, Amber, Gaussian, GROMACS, LAMMPS, NAMD, NWChem, Quantum +Espresso, STAR-CCM+, VASP + +In case your program does not natively support checkpointing, there are +attempts at creating generic checkpoint/restart solutions that should +work application-agnostic. One such project which we recommend is DMTCP: +Distributed MultiThreaded CheckPointing +(<http://dmtcp.sourceforge.net>). + +It is available on Taurus after having loaded the "dmtcp" module: + + module load DMTCP + +While our batch system of choice, SLURM, also provides a checkpointing +interface to the user, unfortunately it does not yet support DMTCP at +this time. However, there are ongoing efforts of writing a SLURM plugin +that hopefully will change this in the near future. We will update this +documentation as soon as it becomes available. + +In order to help with setting up checkpointing for your jobs, we have +written a few scripts that make it easier to utilize DMTCP together with +SLURM on our cluster. 
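+
+As a quick orientation, here is a minimal sketch of the workflow that
+the following sections describe in detail (the application name, time
+limit and checkpoint interval are placeholders):
+
+    module load DMTCP
+
+    # inside your batch script, prefix the application call:
+    #   srun dmtcp_launch --ib --rm ./my-mpi-application
+
+    # submit with the wrapper instead of plain sbatch:
+    dmtcp_sbatch -t 2-00:00:00 -i 28000,800 my_batchfile.sh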
+ +### Using our [chain job](Slurm#Chain_Jobs) script + +For long-running jobs that you wish to split into multiple shorter jobs, +thereby enabling the job scheduler to fill the cluster much more +efficiently and also providing some level of fault-tolerance, we have +written a script that automatically creates a number of jobs for your +desired runtime and adds the checkpoint/restart bits transparently to +your batch script. You just have to specify the targeted total runtime +of your calculation and the interval in which you wish to do +checkpoints. The latter (plus the time it takes to write the checkpoint) +will then be the runtime of the individual jobs. This should be targeted +at below 24 hours in order to be able to run on all of our [haswell64 +partitions](SystemTaurus#Run_45time_Limits), for increased +fault-tolerance it can be chosen even shorter. + +To use it, first add a `dmtcp_launch` before your application call in +your batch script. In case of MPI applications, you have to add the +parameters "--ib --rm" and put it between srun and your application +call, e.g.: + + srun dmtcp_launch --ib --rm ./my-mpi-application + +`Note:` we have successfully tested checkpointing MPI applications with +the latest `Intel MPI` (module: intelmpi/2018.0.128). While it might +work with other MPI libraries, too, we have no experience in this +regard, so you should always try it out before using it for your +productive jobs. + +Then just substitute your usual `sbatch` call with `dmtcp_sbatch` and be +sure to specify the `-t` and `-i` parameters (don't forget you need to +have loaded the dmtcp module). + + dmtcp_sbatch -t 2-00:00:00 -i 28000,800 my_batchfile.sh + +With `-t|--time` you set the total runtime of your calculation over all +jobs. This will be replaced in the batch script in order to shorten your +individual jobs. + +The parameter `-i|--interval` sets the time in seconds for your +checkpoint intervals. It can optionally include a timeout for writing +out the checkpoint files, separated from the interval time via comma +(defaults to 10 minutes). + +In the above example, there will be 6 jobs each running 8 hours, so +about 2 days in total. + +Hints: + +- If you see your first job running into the timelimit, that probably + means the timeout for writing out checkpoint files does not suffice + and should be increased. Our tests have shown that it takes + approximately 5 minutes to write out the memory content of a fully + utilized 64GB haswell node, so you should choose at least 10 minutes + there (better err on the side of caution). Your mileage may vary, + depending on how much memory your application uses. If your memory + content is rather incompressible, it might be a good idea to disable + the checkpoint file compression by setting: `export DMTCP_GZIP=0` +- Note that all jobs the script deems necessary for your chosen + timelimit/interval values are submitted right when first calling the + script. If your applications takes considerably less time than what + you specified, some of the individual jobs will be unnecessary. As + soon as one job does not find a checkpoint to resume from, it will + cancel all subsequent jobs for you. +- See `dmtcp_sbatch -h` for a list of available parameters and more + help + +What happens in your work directory? 
+ +- The script will create subdirectories named `ckpt_<jobid>` for each + individual job it puts into the queue +- It will also create modified versions of your batch script, one for + the first job (`ckpt_launch.job`), one for the middle parts + (`ckpt_rstr.job`) and one for the final job (`cpkt_rstr_last.job`) +- Inside the `ckpt_*` directories you will also find a file + (`job_ids`) containing all job ids that are related to this job + chain + +If you wish to restart manually from one of your checkpoints (e.g., if +something went wrong in your later jobs or the jobs vanished from the +queue for some reason), you have to call `dmtcp_sbatch` with the +`-r|--resume` parameter, specifying a cpkt\_\* directory to resume from. +Then it will use the same parameters as in the initial run of this job +chain. If you wish to adjust the timelimit, for instance because you +realized that your original limit was too short, just use the +`-t|--time` parameter again on resume. + +### Using DMTCP manually + +If for some reason our automatic chain job script is not suitable to +your use-case, you could also just use DMTCP on its own. In the +following we will give you step-by-step instructions on how to +checkpoint your job manually: 1 Load the dmtcp module: +`module load dmtcp` 1 DMTCP usually runs an additional process that +manages the creation of checkpoints and such, the so called +`coordinator`. It must be started in your batch script before the actual +start of your application. To help you with this process, we have +created a bash function called `start_coordinator` that is available +after sourcing `$DMTCP_ROOT/bin/bash` in your script. The coordinator +can take a handful of parameters, see `man dmtcp_coordinator`. Via `-i` +you can specify an interval (in seconds) in which checkpoint files are +to be created automatically. With `--exit-after-ckpt` the application +will be terminated after the first checkpoint has been created, which +can be useful if you wish to implement some sort of job chaining on your +own. 1 In front of your program call, you have to add the wrapper +script: `dmtcp_launch` \<verbatim>#/bin/bash #SBATCH --time=00:01:00 +#SBATCH --cpus-per-task=8 #SBATCH --mem-per-cpu=1500 + +source $DMTCP_ROOT/bin/bash start_coordinator -i 40 --exit-after-ckpt + +dmtcp_launch ./my-application #for sequential/multithreaded applications +#or: srun dmtcp_launch --ib --rm ./my-mpi-application #for MPI +applications\</verbatim> + +This will create a checkpoint automatically after 40 seconds and then +terminate your application and with it the job. If the job runs into its +timelimit (here: 60 seconds), the time to write out the checkpoint was +probably not long enough. If all went well, you should find cpkt\* files +in your work directory together with a script called +./dmtcp_restart_script.sh that can be used to resume from the +checkpoint. 1 To restart your application, you need another batch file +(similiar to the one above) where once again you first have to start the +DMTCP coordinator. The requested resources should match those of your +original job. If you do not wish to create another checkpoint in your +restarted run again, you can omit the -i and --exit-after-ckpt +parameters this time. Afterwards, the application must be run using the +restart script, specifying the host and port of the coordinator (they +have been exported by the start_coordinator function). 
Example: +\<verbatim>#/bin/bash #SBATCH --time=00:01:00 #SBATCH --cpus-per-task=8 +#SBATCH --mem-per-cpu=1500 + +source $DMTCP_ROOT/bin/bash start_coordinator -i 40 --exit-after-ckpt + +./dmtcp_restart_script.sh -h $DMTCP_COORD_HOST -p +$DMTCP_COORD_PORT\</verbatim> diff --git a/twiki2md/root/Container/SingularityExampleDefinitions.md b/twiki2md/root/Container/SingularityExampleDefinitions.md new file mode 100644 index 000000000..429396b61 --- /dev/null +++ b/twiki2md/root/Container/SingularityExampleDefinitions.md @@ -0,0 +1,108 @@ +# Singularity Example Definitions + +## Basic example + +A usual workflow to create Singularity Definition consists of the +following steps: + +- Start from base image +- Install dependencies + - Package manager + - Other sources +- Build & Install own binaries +- Provide entrypoints & metadata + +An example doing all this: + + Bootstrap: docker + From: alpine + + %post + . /.singularity.d/env/10-docker*.sh + + apk add g++ gcc make wget cmake + + wget https://github.com/fmtlib/fmt/archive/5.3.0.tar.gz + tar -xf 5.3.0.tar.gz + mkdir build && cd build + cmake ../fmt-5.3.0 -DFMT_TEST=OFF + make -j$(nproc) install + cd .. + rm -r fmt-5.3.0* + + cat hello.cpp + #include <fmt/format.h> + + int main(int argc, char** argv){ + if(argc == 1) fmt::print("No arguments passed!\n"); + else fmt::print("Hello {}!\n", argv[1]); + } + EOF + + g++ hello.cpp -o hello -lfmt + mv hello /usr/bin/hello + + %runscript + hello "$@" + + %labels + Author Alexander Grund + Version 1.0.0 + + %help + Display a greeting using the fmt library + + Usage: + ./hello + +## CUDA + CuDNN + OpenMPI + +- Chosen CUDA version depends on installed driver of host +- OpenMPI needs PMI for SLURM integration +- OpenMPI needs CUDA for GPU copy-support +- OpenMPI needs ibverbs libs for Infiniband +- openmpi-mca-params.conf required to avoid warnings on fork (OK on + taurus) +- Environment variables SLURM_VERSION, OPENMPI_VERSION can be set to + choose different version when building the container + +<!-- --> + + Bootstrap: docker + From: nvidia/cuda-ppc64le:10.1-cudnn7-devel-ubuntu18.04 + + %labels + Author ZIH + Requires CUDA driver 418.39+. + + %post + . /.singularity.d/env/10-docker*.sh + + apt-get update + apt-get install -y cuda-compat-10.1 + apt-get install -y libibverbs-dev ibverbs-utils + # Install basic development tools + apt-get install -y gcc g++ make wget python + apt-get autoremove; apt-get clean + + cd /tmp + + : ${SLURM_VERSION:=17-02-11-1} + wget https://github.com/SchedMD/slurm/archive/slurm-${SLURM_VERSION}.tar.gz + tar -xf slurm-${SLURM_VERSION}.tar.gz + cd slurm-slurm-${SLURM_VERSION} + ./configure --prefix=/usr/ --sysconfdir=/etc/slurm --localstatedir=/var --disable-debug + make -C contribs/pmi2 -j$(nproc) install + cd .. + rm -rf slurm-* + + : ${OPENMPI_VERSION:=3.1.4} + wget https://download.open-mpi.org/release/open-mpi/v${OPENMPI_VERSION%.*}/openmpi-${OPENMPI_VERSION}.tar.gz + tar -xf openmpi-${OPENMPI_VERSION}.tar.gz + cd openmpi-${OPENMPI_VERSION}/ + ./configure --prefix=/usr/ --with-pmi --with-verbs --with-cuda + make -j$(nproc) install + echo "mpi_warn_on_fork = 0" >> /usr/etc/openmpi-mca-params.conf + echo "btl_openib_warn_default_gid_prefix = 0" >> /usr/etc/openmpi-mca-params.conf + cd .. 
+ rm -rf openmpi-* diff --git a/twiki2md/root/Container/SingularityRecipeHints.md b/twiki2md/root/Container/SingularityRecipeHints.md new file mode 100644 index 000000000..016007eb4 --- /dev/null +++ b/twiki2md/root/Container/SingularityRecipeHints.md @@ -0,0 +1,75 @@ +# Singularity Recipe Hints + +## Index + +[GUI (X11) applications](#X11) + +------------------------------------------------------------------------ + +### \<a name="X11">\</a>GUI (X11) applications + +Running GUI applications inside a singularity container is possible out +of the box. Check the following definition: + + Bootstrap: docker + From: centos:7 + + %post + yum install -y xeyes + +This image may be run with + + singularity exec xeyes.sif xeyes. + +This works because all the magic is done by singularity already like +setting $DISPLAY to the outside display and mounting $HOME so +$HOME/.Xauthority (X11 authentification cookie) is found. When you are +using \`--contain\` or \`--no-home\` you have to set that cookie +yourself or mount/copy it inside the container. Similar for +\`--cleanenv\` you have to set $DISPLAY e.g. via + + export SINGULARITY_DISPLAY=$DISPLAY + +When you run a container as root (via \`sudo\`) you may need to allow +root for your local display port: \<verbatim>xhost ++local:root\</verbatim> + +#### Hardware acceleration + +If you want hardware acceleration you **may** need +[VirtualGL](https://virtualgl.org). An example definition file is as +follows: + + Bootstrap: docker + From: centos:7 + + %post + yum install -y glx-utils # for glxgears example app + + yum install -y curl + VIRTUALGL_VERSION=2.6.2 # Replace by required (e.g. latest) version + + curl -sSL https://downloads.sourceforge.net/project/virtualgl/"${VIRTUALGL_VERSION}"/VirtualGL-"${VIRTUALGL_VERSION}".x86_64.rpm -o VirtualGL-"${VIRTUALGL_VERSION}".x86_64.rpm + yum install -y VirtualGL*.rpm + /opt/VirtualGL/bin/vglserver_config -config +s +f -t + rm VirtualGL-*.rpm + + # Install video drivers AFTER VirtualGL to avoid them being overwritten + yum install -y mesa-dri-drivers # for e.g. intel integrated GPU drivers. Replace by your driver + +You can now run the application with vglrun: + + singularity exec vgl.sif vglrun glxgears + +**Attention:**Using VirtualGL may not be required at all and could even +decrease the performance. To check install e.g. glxgears as above and +your graphics driver (or use the VirtualGL image from above) and disable +vsync: + + vblank_mode=0 singularity exec vgl.sif glxgears + +Compare the FPS output with the glxgears prefixed by vglrun (see above) +to see which produces more FPS (or runs at all). + +**NVIDIA GPUs** need the \`--nv\` parameter for the singularity command: +\`singularity exec --nv vgl.sif glxgears\` diff --git a/twiki2md/root/Container/VMTools.md b/twiki2md/root/Container/VMTools.md new file mode 100644 index 000000000..bb50c571a --- /dev/null +++ b/twiki2md/root/Container/VMTools.md @@ -0,0 +1,142 @@ +# Singularity on Power9 / ml partition + +Building Singularity containers from a recipe on Taurus is normally not +possible due to the requirement of root (administrator) rights, see +[Containers](Containers). For obvious reasons users on Taurus cannot be +granted root permissions. + +The solution is to build your container on your local Linux machine by +executing something like + + sudo singularity build myContainer.sif myDefinition.def + +Then you can copy the resulting myContainer.sif to Taurus and execute it +there. 
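+
+For example, a sketch of that copy-and-run step (the workspace path is
+a placeholder; see the data transfer documentation for details on the
+export nodes):
+
+    # on your local machine: copy the image to a workspace on Taurus
+    scp myContainer.sif <zih-user>@taurusexport.hrsk.tu-dresden.de:/scratch/ws/<your-workspace>/
+
+    # on Taurus, inside a job allocation:
+    singularity exec /scratch/ws/<your-workspace>/myContainer.sif <your-command>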
+ +This does **not** work on the ml partition as it uses the Power9 +architecture which your laptop likely doesn't. + +For this we provide a Virtual Machine (VM) on the ml partition which +allows users to gain root permissions in an isolated environment. The +workflow to use this manually is described at [another page](Cloud) but +is quite cumbersome. + +To make this easier two programs are provided: `buildSingularityImage` +and `startInVM` which do what they say. The latter is for more advanced +use cases so you should be fine using *buildSingularityImage*, see the +following section. + +**IMPORTANT:** You need to have your default SSH key without a password +for the scripts to work as entering a password through the scripts is +not supported. + +**The recommended workflow** is to create and test a definition file +locally. You usually start from a base Docker container. Those typically +exist for different architectures but with a common name (e.g. +'ubuntu:18.04'). Singularity automatically uses the correct Docker +container for your current architecture when building. So in most cases +you can write your definition file, build it and test it locally, then +move it to Taurus and build it on Power9 without any further changes. +However, sometimes Docker containers for different architectures have +different suffixes, in which case you'd need to change that when moving +to Taurus. + +## Building a Singularity container in a job + +To build a singularity container on Taurus simply run: + + buildSingularityImage --arch=power9 myContainer.sif myDefinition.def + +This command will submit a batch job and immediately return. Note that +while "power9" is currently the only supported architecture, the +parameter is still required. If you want it to block while the image is +built and see live output, use the parameter `--interactive`: + + buildSingularityImage --arch=power9 --interactive myContainer.sif myDefinition.def + +There are more options available which can be shown by running +`buildSingularityImage --help`. All have reasonable defaults.The most +important ones are: + +- `--time <time>`: Set a higher job time if the default time is not + enough to build your image and your job is cancelled before + completing. The format is the same as for SLURM. +- `--tmp-size=<size in GB>`: Set a size used for the temporary + location of the Singularity container. Basically the size of the + extracted container. +- `--output=<file>`: Path to a file used for (log) output generated + while building your container. +- Various singularity options are passed through. E.g. + `--notest, --force, --update`. See, e.g., `singularity --help` for + details. + +For **advanced users** it is also possible to manually request a job +with a VM (`srun -p ml --cloud=kvm ...`) and then use this script to +build a Singularity container from within the job. In this case the +`--arch` and other SLURM related parameters are not required. The +advantage of using this script is that it automates the waiting for the +VM and mounting of host directories into it (can also be done with +`startInVM`) and creates a temporary directory usable with Singularity +inside the VM controlled by the `--tmp-size` parameter. + +## Filesystem + +**Read here if you have problems like "File not found".** + +As the build starts in a VM you may not have access to all your files. +It is usually bad practice to refer to local files from inside a +definition file anyway as this reduces reproducibility. 
However common +directories are available by default. For others, care must be taken. In +short: + +- /home/$USER, /scratch/$USER are available and should be used +- /scratch/\<group> also works for all groups the users is in +- /projects/\<group> similar, but is read-only! So don't use this to + store your generated container directly, but rather move it here + afterwards +- /tmp is the VM local temporary directory. All files put here will be + lost! + +If the current directory is inside (or equal to) one of the above +(except /tmp), then relative paths for container and definition work as +the script changes to the VM equivalent of the current directory. +Otherwise you need to use absolute paths. Using `~` in place of `$HOME` +does work too. + +Under the hood, the filesystem of Taurus is mounted via SSHFS at +/host_data, so if you need any other files they can be found there. + +There is also a new SSH key named "kvm" which is created by the scripts +and authorized inside the VM to allow for password-less access to SSHFS. +This is stored at `~/.ssh/kvm` and regenerated if it does not exist. It +is also added to `~/.ssh/authorized_keys`. Note that removing the key +file does not remove it from `authorized_keys`, so remove it manually if +you need to. It can be easily identified by the comment on the key. +However, removing this key is **NOT** recommended, as it needs to be +re-generated on every script run. + +## Starting a Job in a VM + +Especially when developing a Singularity definition file it might be +useful to get a shell directly on a VM. To do so simply run: + + startInVM --arch=power9 + +This will execute an `srun` command with the `--cloud=kvm` parameter, +wait till the VM is ready, mount all folders (just like +`buildSingularityImage`, see the Filesystem section above) and come back +with a bash inside the VM. Inside that you are root, so you can directly +execute `singularity build` commands. + +As usual more options can be shown by running `startInVM --help`, the +most important one being `--time`. + +There are 2 special use cases for this script: 1 Execute an arbitrary +command inside the VM instead of getting a bash by appending the command +to the script. Example: \<pre>startInVM --arch=power9 singularity build +\~/myContainer.sif \~/myDefinition.def\</pre> 1 Use the script in a job +manually allocated via srun/sbatch. This will work the same as when +running outside a job but will **not** start a new job. This is useful +for using it inside batch scripts, when you already have an allocation +or need special arguments for the job system. Again you can run an +arbitrary command by passing it to the script. diff --git a/twiki2md/root/DataManagement/DataMover.md b/twiki2md/root/DataManagement/DataMover.md new file mode 100644 index 000000000..3479ed5ca --- /dev/null +++ b/twiki2md/root/DataManagement/DataMover.md @@ -0,0 +1,85 @@ +## Transferring files between HPC systems + +We provide a special data transfer machine providing the global file +systems of each ZIH HPC system. This machine is not accessible through +SSH as it is dedicated to data transfers. To move or copy files from one +file system to another file system you have to use the following +commands: + +- **dtcp**, **dtls, dtmv**, **dtrm, dtrsync**, **dttar** + +These commands submit a job to the data transfer machines performing the +selected command. Except the following options their syntax is the same +than the shell command without **dt** prefix (cp, ls, mv, rm, rsync, +tar). 
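+
+For example, a recursive copy between two file systems submits a data
+transfer job just like a normal `cp` would copy locally (the workspace
+paths are placeholders):
+
+    dtcp -r /scratch/ws/<my-workspace>/results /warm_archive/ws/<my-workspace>/results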
+ +Additional options: + +| | | +|-------------------|-------------------------------------------------------------------------------| +| --account=ACCOUNT | Assign data transfer job to specified account. | +| --blocking | Do not return until the data transfer job is complete. (default for **dtls**) | +| --time=TIME | Job time limit (default 18h). | + +- **dtinfo**, **dtqueue**, **dtq**, **dtcancel** + +**dtinfo** shows information about the nodes of the data transfer +machine (like sinfo). **dtqueue** and **dtq** shows all the data +transfer jobs that belong to you (like squeue -u $USER). **dtcancel** +signals data transfer jobs (like scancel). + +To identify the mount points of the different HPC file systems on the +data transfer machine, please use **dtinfo**. It shows an output like +this (attention, the mount points can change without an update on this +web page) : + +| HPC system | Local directory | Directory on data transfer machine | +|:-------------------|:-----------------|:-----------------------------------| +| Taurus, Venus | /scratch/ws | /scratch/ws | +| | /ssd/ws | /ssd/ws | +| | /warm_archive/ws | /warm_archive/ws | +| | /home | /home | +| | /projects | /projects | +| **Archive** | | /archiv | +| **Group Storages** | | /grp/\<group storage> | + +### How to copy your data from an old scratch (Atlas, Triton, Venus) to our new scratch (Taurus) + +You can use our tool called Datamover to copy your data from A to B. + + dtcp -r /scratch/<project or user>/<directory> /projects/<project or user>/<directory> # or + dtrsync -a /scratch/<project or user>/<directory> /lustre/ssd/<project or user>/<directory> + +Options for dtrsync: + + -a, --archive archive mode; equals -rlptgoD (no -H,-A,-X) + + -r, --recursive recurse into directorie + -l, --links copy symlinks as symlinks + -p, --perms preserve permissions + -t, --times preserve modification times + -g, --group preserve group + -o, --owner preserve owner (super-user only) + -D same as --devices --specials + +Example: + + dtcp -r /scratch/rotscher/results /luste/ssd/rotscher/ # or + new: dtrsync -a /scratch/rotscher/results /home/rotscher/results + +### Examples on how to use data transfer commands: + +Copying data from Taurus' /scratch to Taurus' /projects + + % dtcp -r /scratch/jurenz/results/ /home/jurenz/ + +Moving data from Venus' /sratch to Taurus' /luste/ssd + + % dtmv /scratch/jurenz/results/ /lustre/ssd/jurenz/results + +TGZ data from Taurus' /scratch to the Archive + + % dttar -czf /archiv/jurenz/taurus_results_20140523.tgz /scratch/jurenz/results + +**%RED%Note:<span class="twiki-macro ENDCOLOR"></span>**Please do not +generate files in the archive much larger that 500 GB. diff --git a/twiki2md/root/DataManagement/ExportNodes.md b/twiki2md/root/DataManagement/ExportNodes.md new file mode 100644 index 000000000..276ba7562 --- /dev/null +++ b/twiki2md/root/DataManagement/ExportNodes.md @@ -0,0 +1,134 @@ +# Move data to/from ZIH's file systems + + + +## Export Nodes + +To copy large data to/from the HPC machines, the Taurus export nodes +should be used. While it is possible to transfer small files directly +via the login nodes, they are not intended to be used that way and there +exists a CPU time limit on the login nodes, killing each process that +takes up too much CPU time, which also affects file-copy processes if +the copied files are very large. The export nodes have a better uplink +(10GBit/s) and are generally the preferred way to transfer your data. 
+Note that you cannot log in via ssh to the export nodes, but only use +scp, rsync or sftp on them. + +They are reachable under the hostname: +**taurusexport.hrsk.tu-dresden.de** (or +taurusexport3.hrsk.tu-dresden.de, taurusexport4.hrsk.tu-dresden.de). + +## Access from Linux machine + +There are three possibilities to exchange data between your local +machine (lm) and the hpc machines (hm), which are explained in the +following abstract in more detail. + +### SCP + +Type following commands in the terminal when you are in the directory of +the local machine. + +#### Copy data from lm to hm + + # Copy file + scp <file> <zih-user>@<machine>:<target-location> + # Copy directory + scp -r <directory> <zih-user>@<machine>:<target-location> + +#### Copy data from hm to lm + + # Copy file + scp <zih-user>@<machine>:<file> <target-location> + # Copy directory + scp -r <zih-user>@<machine>:<directory> <target-location> + +Example: + + scp helloworld.txt mustermann@taurusexport.hrsk.tu-dresden.de:~/. + +Additional information: <http://www.computerhope.com/unix/scp.htm> + +### SFTP + +Is a virtual command line, which you could access with the following +line: + + # Enter virtual command line + sftp <zih-user>@<machine> + # Exit virtual command line + sftp> exit + # or + sftp> <Ctrl+D> + +After that you have access to the filesystem on the hpc machine and you +can use the same commands as on your local machine, e.g. ls, cd, pwd and +many more. If you would access to your local machine from this virtual +command line, then you have to put the letter l (local machine) before +the command, e.g. lls, lcd or lpwd. + +#### Copy data from lm to hm + + # Copy file + sftp> put <file> + # Copy directory + sftp> put -r <directory> + +#### Copy data from hm to lm + + # Copy file + sftp> get <file> + # Copy directory + sftp> get -r <directory> + +Example: + + sftp> get helloworld.txt + +Additional information: <http://www.computerhope.com/unix/sftp.htm> + +### RSYNC + +Type following commands in the terminal when you are in the directory of +the local machine. + +#### Copy data from lm to hm + + # Copy file + rsync <file> <zih-user>@<machine>:<target-location> + # Copy directory + rsync -r <directory> <zih-user>@<machine>:<target-location> + +#### Copy data from hm to lm + + # Copy file + rsync <zih-user>@<machine>:<file> <target-location> + # Copy directory + rsync -r <zih-user>@<machine>:<directory> <target-location> + +Example: + + rsync helloworld.txt mustermann@taurusexport.hrsk.tu-dresden.de:~/. + +Additional information: <http://www.computerhope.com/unix/rsync.htm> + +## Access from Windows machine + +First you have to install WinSCP. ( +<http://winscp.net/eng/download.php>) + +Then you have to execute the WinSCP application and configure some +option as described below. + +<span class="twiki-macro IMAGE" size="600">WinSCP_001_new.PNG</span> + +<span class="twiki-macro IMAGE" size="600">WinSCP_002_new.PNG</span> + +<span class="twiki-macro IMAGE" size="600">WinSCP_003_new.PNG</span> + +<span class="twiki-macro IMAGE" size="600">WinSCP_004_new.PNG</span> + +After your connection succeeded, you can copy files from your local +machine to the hpc machine and the other way around. 
+ +<span class="twiki-macro IMAGE" size="600">WinSCP_005_new.PNG</span> diff --git a/twiki2md/root/DataManagement/FileSystems.md b/twiki2md/root/DataManagement/FileSystems.md new file mode 100644 index 000000000..4a96c0aa4 --- /dev/null +++ b/twiki2md/root/DataManagement/FileSystems.md @@ -0,0 +1,259 @@ +File systems + + + +## Permanent file systems + +### Global /home file system + +Each user has 50 GB in his /home directory independent of the granted +capacity for the project. Hints for the usage of the global home +directory: + +- If you need distinct `.bashrc` files for each machine, you should + create separate files for them, named `.bashrc_<machine_name>` +- If you use various machines frequently, it might be useful to set + the environment variable HISTFILE in `.bashrc_deimos` and + `.bashrc_mars` to `$HOME/.bash_history_<machine_name>`. Setting + HISTSIZE and HISTFILESIZE to 10000 helps as well. +- Further, you may use private module files to simplify the process of + loading the right installation directories, see [private + modules](#AnchorPrivateModule). + +### Global /projects file system + +For project data, we have a global project directory, that allows better +collaboration between the members of an HPC project. However, for +compute nodes /projects is mounted as read-only, because it is not a +filesystem for parallel I/O. See below and also check the [HPC +introduction](%PUBURL%/Compendium/WebHome/HPC-Introduction.pdf) for more +details. + +#AnchorBackup + +### Backup and snapshots of the file system + +- Backup is **only** available in the `/home` and the `/projects` file + systems! +- Files are backed up using snapshots of the NFS server and can be + restored by the user +- A changed file can always be recovered as it was at the time of the + snapshot +- Snapshots are taken: + - from Monday through Saturday between 06:00 and 18:00 every two + hours and kept for one day (7 snapshots) + - from Monday through Saturday at 23:30 and kept for two weeks (12 + snapshots) + - every Sunday st 23:45 and kept for 26 weeks +- to restore a previous version of a file: + - go into the directory of the file you want to restore + - run `cd .snapshot` (this subdirectory exists in every directory + on the /home file system although it is not visible with + `ls -a`) + - in the .snapshot-directory are all available snapshots listed + - just `cd` into the directory of the point in time you wish to + restore and copy the file you wish to restore to where you want + it + - \*Attention\* The .snapshot directory is not only hidden from + normal view (`ls -a`), it is also embedded in a different + directory structure. An \<span class="WYSIWYG_TT">ls + ../..\</span>will not list the directory where you came from. + Thus, we recommend to copy the file from the location where it + originally resided: \<pre>% pwd /home/username/directory_a % cp + .snapshot/timestamp/lostfile lostfile.backup \</pre> +- /home and /projects/ are definitely NOT made as a work directory: + since all files are kept in the snapshots and in the backup tapes + over a long time, they + - senseless fill the disks and + - prevent the backup process by their sheer number and volume from + working efficiently. + +#AnchorQuota + +### Group quotas for the file system + +The quotas of the home file system are meant to help the users to keep +in touch with their data. Especially in HPC, it happens that millions of +temporary files are created within hours. This is the main reason for +performance degradation of the file system. 
If a project exceeds its +quota (total size OR total number of files) it cannot submit jobs into +the batch system. The following commands can be used for monitoring: + +- `showquota` shows your projects' usage of the file system. +- `quota -s -f /home` shows the user's usage of the file system. + +In case a project is above it's limits please... + +- remove core dumps, temporary data +- talk with your colleagues to identify the hotspots, +- check your workflow and use /tmp or the scratch file systems for + temporary files +- *systematically*handle your important data: + - For later use (weeks...months) at the HPC systems, build tar + archives with meaningful names or IDs and store e.g. them in an + [archive](IntermediateArchive). + - refer to the hints for [long term preservation for research + data](PreservationResearchData). + +## Work directories + +| File system | Usable directory | Capacity | Availability | Backup | Remarks | +|:------------|:------------------|:---------|:-------------|:-------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Lustre` | `/scratch/` | 4 PB | global | No | Only accessible via [workspaces](WorkSpaces). Not made for billions of files! | +| `Lustre` | `/lustre/ssd` | 40 TB | global | No | Only accessible via [workspaces](WorkSpaces). For small I/O operations | +| `BeeGFS` | `/beegfs/global0` | 232 TB | global | No | Only accessible via [workspaces](WorkSpaces). Fastest available file system, only for large parallel applications running with millions of small I/O operations | +| `ext4` | `/tmp` | 95.0 GB | local | No | is cleaned up after the job automatically | + +### Large files in /scratch + +The data containers in Lustre are called object storage targets (OST). +The capacity of one OST is about 21 TB. All files are striped over a +certain number of these OSTs. For small and medium files, the default +number is 2. As soon as a file grows above \~1 TB it makes sense to +spread it over a higher number of OSTs, eg. 16. Once the file system is +used \> 75%, the average space per OST is only 5 GB. So, it is essential +to split your larger files so that the chunks can be saved! + +Lets assume you have a dierctory where you tar your results, eg. +`/scratch/mark/tar` . Now, simply set the stripe count to a higher +number in this directory with: + + lfs setstripe -c 20 /scratch/ws/mark-stripe20/tar + +%RED%Note:<span class="twiki-macro ENDCOLOR"></span> This does not +affect existing files. But all files that **will be created** in this +directory will be distributed over 20 OSTs. + +## Warm archive + +## + +## Recommendations for file system usage + +To work as efficient as possible, consider the following points + +- Save source code etc. in `/home` or /projects/... +- Store checkpoints and other temporary data in `/scratch/ws/...` +- Compilation in `/dev/shm` or `/tmp` + +Getting high I/O-bandwitdh + +- Use many clients +- Use many processes (writing in the same file at the same time is + possible) +- Use large I/O transfer blocks + +## Cheat Sheet for debugging file system issues + +Every Taurus-User should normaly be able to perform the following +commands to get some intel about theire data. + +### General + +For the first view, you can easily use the "df-command". + + df + +Alternativly you can use the "findmnt"-command, which is also able to +perform an "df" by adding the "-D"-parameter. 
+ + findmnt -D + +Optional you can use the "-t"-parameter to specify the fs-type or the +"-o"-parameter to alter the output. + +We do **not recommend** the usage of the "du"-command for this purpose. +It is able to cause issues for other users, while reading data from the +filesystem. + +### Lustre file system + +These commands work for /scratch and /ssd. + +#### Listing disk usages per OST and MDT + + lfs quota -h -u username /path/to/my/data + +It is possible to display the usage on each OST by adding the +"-v"-parameter. + +#### Listing space usage per OST and MDT + + lfs df -h /path/to/my/data + +#### Listing inode usage for an specific path + + lfs df -i /path/to/my/data + +#### Listing OSTs + + lfs osts /path/to/my/data + +#### View striping information + + lfs getstripe myfile + lfs getstripe -d mydirectory + +The "-d"-parameter will also display striping for all files in the +directory + +### BeeGFS + +Commands to work with the BeeGFS file system. + +#### Capacity and file system health + +View storage and inode capacity and utilization for metadata and storage +targets. + + beegfs-df -p /beegfs/global0 + +The "-p" parameter needs to be the mountpoint of the file system and is +mandatory. + +List storage and inode capacity, reachability and consistency +information of each storage target. + + beegfs-ctl --listtargets --nodetype=storage --spaceinfo --longnodes --state --mount=/beegfs/global0 + +To check the capacity of the metadata server just toggle the +"--nodetype" argument. + + beegfs-ctl --listtargets --nodetype=meta --spaceinfo --longnodes --state --mount=/beegfs/global0 + +#### Striping + +View the stripe information of a given file on the file system and shows +on which storage target the file is stored. + + beegfs-ctl --getentryinfo /beegfs/global0/my-workspace/myfile --mount=/beegfs/global0 + +Set the stripe pattern for an directory. In BeeGFS the stripe pattern +will be inherited form a directory to its children. + + beegfs-ctl --setpattern --chunksize=1m --numtargets=16 /beegfs/global0/my-workspace/ --mount=/beegfs/global0 + +This will set the stripe pattern for "/beegfs/global0/path/to/mydir/" to +a chunksize of 1M distributed over 16 storage targets. + +Find files located on certain server or targets. The following command +searches all files that are stored on the storage targets with id 4 or +30 und my-workspace directory. + + beegfs-ctl --find /beegfs/global0/my-workspace/ --targetid=4 --targetid=30 --mount=/beegfs/global0 + +#### Network + +View the network addresses of the file system servers. + + beegfs-ctl --listnodes --nodetype=meta --nicdetails --mount=/beegfs/global0 + beegfs-ctl --listnodes --nodetype=storage --nicdetails --mount=/beegfs/global0 + beegfs-ctl --listnodes --nodetype=client --nicdetails --mount=/beegfs/global0 + +Display connections the client is actually using + + beegfs-net + +Display possible connectivity of the services + + beegfs-check-servers -p /beegfs/global0 diff --git a/twiki2md/root/DataManagement/IntermediateArchive.md b/twiki2md/root/DataManagement/IntermediateArchive.md new file mode 100644 index 000000000..d70817e01 --- /dev/null +++ b/twiki2md/root/DataManagement/IntermediateArchive.md @@ -0,0 +1,42 @@ +# Intermediate Archive + +With the "Intermediate Archive", ZIH is closing the gap between a normal +disk-based file system and [Longterm Archive](PreservationResearchData). +The Intermediate Archive is a hierarchical file system with disks for +buffering and tapes for storing research data. 
+ +Its intended use is the storage of research data for a maximal duration +of 3 years. For storing the data after exceeding this time, the user has +to supply essential metadata and migrate the files to the [Longterm +Archive](PreservationResearchData). Until then, she/he has to keep track +of her/his files. + +Some more information: + +- Maximum file size in the archive is 500 GB (split up your files, see + [Datamover](DataMover) ) +- Data will be stored in two copies on tape. +- The bandwidth to this data is very limited. Hence, this file system + must not be used directly as input or output for HPC jobs. + +## How to access the "Intermediate Archive" + +For storing and restoring your data in/from the "Intermediate Archive" +you can use the tool [Datamover](DataMover). To use the +[Datamover](DataMover) you have to login to Taurus +(taurus.hrsk.tu-dresden.de). + +### Store data + + dtcp -r /<directory> /archiv/<project or user>/<directory> # or + dtrsync -av /<directory> /archiv/<project or user>/<directory> + +### Restore data + + dtcp -r /archiv/<project or user>/<directory> /<directory> # or + dtrsync -av /archiv/<project or user>/<directory> /<directory> + +### Examples + + dtcp -r /scratch/rotscher/results /archiv/rotscher/ # or + dtrsync -av /scratch/rotscher/results /archiv/rotscher/results diff --git a/twiki2md/root/DataMover/Phase2Migration.md b/twiki2md/root/DataMover/Phase2Migration.md new file mode 100644 index 000000000..7491426ca --- /dev/null +++ b/twiki2md/root/DataMover/Phase2Migration.md @@ -0,0 +1,53 @@ + + +### How to copy your data from an old scratch (Atlas, Venus, Taurus I) to our new scratch (Taurus II) + +Currently there is only Taurus (I) scracht mountet on Taurus (II). To +move files from Venus/Atlas to Taurus (II) you have to do an +intermediate step over Taurus (I) + +#### How to copy data from Atlas/Venus scratch to scratch of Taurus I (first step) + +First you have to login to Taurus I. + + ssh <username>@tauruslogin[1-2].hrsk.tu-dresden.de + +After your are logged in, you can use our tool called Datamover to copy +your data from A to B. + + dtcp -r /atlas_scratch/<project or user>/<directory> /scratch/<project or user>/<directory> + + e.g. file: dtcp -r /atlas_scratch/rotscher/file.txt /scratch/rotscher/ + e.g. directory: dtcp -r /atlas_scratch/rotscher/directory /scratch/rotscher/ + +#### How to copy data from scratch of Taurus I to scratch of Taurus II (second step) + +First you have to login to Taurus II. + + ssh <username>@tauruslogin[3-5].hrsk.tu-dresden.de + +After your are logged in, you can use our tool called Datamover to copy +your data from A to B. + + dtcp -r /phase1_scratch/<project or user>/<directory> /scratch/<project or user>/<directory> + + e.g. file: dtcp -r /phase1_scratch/rotscher/file.txt /scratch/rotscher/ + e.g. 
directory: dtcp -r /phase1_scratch/rotscher/directory /scratch/rotscher/ + +### Examples on how to use data transfer commands: + +#### Copying data from Atlas' /scratch to Taurus' /scratch + + % dtcp -r /atlas_scratch/jurenz/results /taurus_scratch/jurenz/ + +#### Moving data from Venus' /scratch to Taurus' /scratch + + % dtmv /venus_scratch/jurenz/results/ /taurus_scratch/jurenz/venus_results + +#### TGZ data from Taurus' /scratch to the Archive + + % dttar -czf /archiv/jurenz/taurus_results_20140523.tgz /taurus_scratch/jurenz/results + +- Set DENYTOPICVIEW = WikiGuest + +-- Main.MatthiasKraeusslein - 2015-08-20 diff --git a/twiki2md/root/DebuggingTools/MPIUsageErrorDetection.md b/twiki2md/root/DebuggingTools/MPIUsageErrorDetection.md new file mode 100644 index 000000000..2b72f35df --- /dev/null +++ b/twiki2md/root/DebuggingTools/MPIUsageErrorDetection.md @@ -0,0 +1,81 @@ +# Introduction + +MPI as the de-facto standard for parallel applications of the the +massage passing paradigm offers more than one hundred different API +calls with complex restrictions. As a result, developing applications +with this interface is error prone and often time consuming. Some usage +errors of MPI may only manifest on some platforms or some application +runs, which further complicates the detection of these errors. Thus, +special debugging tools for MPI applications exist that automatically +check whether an application conforms to the MPI standard and whether +its MPI calls are safe. At ZIH, we maintain and support MUST for this +task, though different types of these tools exist (see last section). + +# MUST + +MUST checks if your application conforms to the MPI standard and will +issue warnings if there are errors or non-portable constructs. You can +apply MUST without modifying your source code, though we suggest to add +the debugging flag "-g" during compilation. + +- [MUST introduction slides](%ATTACHURL%/parallel_debugging_must.pdf) + +## Setup and Modules + +You need to load a module file in order to use MUST. Each MUST +installation uses a specific combination of a compiler and an MPI +library, make sure to use a combination that fits your needs. Right now +we only provide a single combination on each system, contact us if you +need further combinations. You can query for the available modules with: + + module avail must + +You can load a MUST module as follows: + + module load must + +Besides loading a MUST module, no further changes are needed during +compilation and linking. + +## Running with MUST + +In order to run with MUST you need to replace the mpirun/mpiexec command +with mustrun: + + mustrun -np <NPROC> ./a.out + +Besides replacing the mpiexec command you need to be aware that **MUST +always allocates an extra process**. I.e. if you issue a "mustrun -np 4 +./a.out" then MUST will start 5 processes instead. This is usually not +critical, however in batch jobs **make sure to allocate space for this +extra task**. + +Finally, MUST assumes that your application may crash at any time. To +still gather correctness results under this assumption is extremely +expensive in terms of performance overheads. Thus, if your application +does not crashs, you should add an "--must:nocrash" to the mustrun +command to make MUST aware of this knowledge. Overhead is drastically +reduced with this switch. + +## Result Files + +After running your application with MUST you will have its output in the +working directory of your application. The output is named +"MUST_Output.html". 
Open this files in a browser to anlyze the results. +The HTML file is color coded: Entries in green represent notes and +useful information. Entries in yellow represent warnings, and entries in +red represent errors. + +# Other MPI Correctness Tools + +Besides MUST, there exist further MPI correctness tools, these are: + +- Marmot (predecessor of MUST) +- MPI checking library of the Intel Trace Collector +- ISP (From Utah) +- Umpire (predecessor of MUST) + +ISP provides a more thorough deadlock detection as it investigates +alternative execution paths, however its overhead is drastically higher +as a result. Contact our support if you have a specific use cases that +needs one of these tools. diff --git a/twiki2md/root/HPCDA/Dask.md b/twiki2md/root/HPCDA/Dask.md new file mode 100644 index 000000000..2c38ebf22 --- /dev/null +++ b/twiki2md/root/HPCDA/Dask.md @@ -0,0 +1,136 @@ +# Dask + +\<span style="font-size: 1em;"> **Dask** is an open-source library for +parallel computing. Dask is a flexible library for parallel computing in +Python.\</span> + +Dask natively scales Python. It\<span style="font-size: 1em;"> provides +advanced parallelism for analytics, enabling performance at scale for +some of the popular tools. For instance: Dask arrays scale Numpy +workflows, Dask dataframes scale Pandas workflows, Dask-ML scales +machine learning APIs like Scikit-Learn and XGBoost\</span> + +Dask is composed of two parts: + +- Dynamic task scheduling optimized for computation and interactive + computational workloads. +- Big Data collections like parallel arrays, data frames, and lists + that extend common interfaces like NumPy, Pandas, or Python + iterators to larger-than-memory or distributed environments. These + parallel collections run on top of dynamic task schedulers. + +Dask supports several user interfaces: + +High-Level: + +- Arrays: Parallel NumPy +- Bags: Parallel lists +- DataFrames: Parallel Pandas +- Machine Learning : Parallel Scikit-Learn +- Others from external projects, like XArray + +Low-Level: + +- Delayed: Parallel function evaluation +- Futures: Real-time parallel function evaluation + +## Installation + +### installation using Conda + +Dask is installed by default in +[Anaconda](https://www.anaconda.com/download/). To install/update Dask +on a Taurus with using the [conda](https://www.anaconda.com/download/) +follow the example: + + srun -p ml -N 1 -n 1 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash #Job submission in ml nodes with allocating: 1 node, 1 gpu per node, 4 hours + +Create a conda virtual environment. We would recommend using a +workspace. See the example (use `--prefix` flag to specify the +directory)\<br />\<span style="font-size: 1em;">Note: You could work +with simple examples in your home directory (where you are loading by +default). 
However, in accordance with the \</span>\<a +href="HPCStorageConcept2019" target="\_blank">storage concept\</a>\<span +style="font-size: 1em;">,\</span>** please use \<a href="WorkSpaces" +target="\_blank">workspaces\</a> for your study and work projects.** + + conda create --prefix /scratch/ws/0/aabc1234-Workproject/conda-virtual-environment/dask-test python=3.6 + +By default, conda will locate the environment in your home directory: + + conda create -n dask-test python=3.6 + +Activate the virtual environment, install Dask and verify the +installation: + + ml modenv/ml + ml PythonAnaconda/3.6 + conda activate /scratch/ws/0/aabc1234-Workproject/conda-virtual-environment/dask-test python=3.6 + which python + which conda + conda install dask + python + + from dask.distributed import Client, progress + client = Client(n_workers=4, threads_per_worker=1) + client + +### installation using Pip + +You can install everything required for most common uses of Dask +(arrays, dataframes, etc) + + srun -p ml -N 1 -n 1 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash + + cd /scratch/ws/0/aabc1234-Workproject/python-virtual-environment/dask-test + + ml modenv/ml + module load PythonAnaconda/3.6 + which python + + python3 -m venv --system-site-packages dask-test + source dask-test/bin/activate + python -m pip install "dask[complete]" + + python + from dask.distributed import Client, progress + client = Client(n_workers=4, threads_per_worker=1) + client + +Distributed scheduler + +? + +## Run Dask on Taurus + +\<span style="font-size: 1em;">The preferred and simplest way to run +Dask on HPC systems today both for new, experienced users or +administrator is to use \</span> +[dask-jobqueue](https://jobqueue.dask.org/)\<span style="font-size: +1em;">.\</span> + +You can install dask-jobqueue with `pip <span>or</span>` `conda` + +Installation with Pip + + srun -p haswell -N 1 -n 1 -c 4 --mem-per-cpu=2583 --time=01:00:00 --pty bash + +\<verbatim>cd +/scratch/ws/0/aabc1234-Workproject/python-virtual-environment/dask-test +ml modenv/ml module load PythonAnaconda/3.6 which python + +source dask-test/bin/activate pip install dask-jobqueue --upgrade # +Install everything from last released version\</verbatim> + +Installation with Conda + + srun -p haswell -N 1 -n 1 -c 4 --mem-per-cpu=2583 --time=01:00:00 --pty bash + +\<verbatim>ml modenv/ml module load PythonAnaconda/3.6 source +dask-test/bin/activate + +conda install dask-jobqueue -c conda-forge\</verbatim> + +-- Main.AndreiPolitov - 2020-08-26 + +**\<br />** diff --git a/twiki2md/root/HPCDA/DataAnalyticsWithR.md b/twiki2md/root/HPCDA/DataAnalyticsWithR.md new file mode 100644 index 000000000..0acc0ba2f --- /dev/null +++ b/twiki2md/root/HPCDA/DataAnalyticsWithR.md @@ -0,0 +1,423 @@ +# R for data analytics + + + +[ **R** ](https://www.r-project.org/about.html)\<span style="font-size: +1em;"> is a programming language and environment for statistical +computing and graphics. R provides a wide variety of statistical (linear +and nonlinear modelling, classical statistical tests, time-series +analysis, classification, etc) and graphical techniques. R is an +integrated suite of software facilities for data manipulation, +calculation and graphing.\</span> + +R possesses an extensive catalogue of statistical and graphical methods. +It includes machine learning algorithms, linear regression, time series, +statistical inference. 
+ +**Aim** of this page is to introduce users on how to start working with +the R language on Taurus in general as well as on the HPC-DA system.\<br +/>**Prerequisites:** To work with the R on Taurus you obviously need +access for the Taurus system and basic knowledge about programming and +[SLURM](Slurm) system. + +For general information on using the HPC-DA system, see the [Get started +with HPC-DA system](GetStartedWithHPCDA) page. + +You can also find the information you need on the HPC-Introduction and +HPC-DA-Introduction presentation slides. + +\<br />\<span style="font-size: 1em;">We recommend using +\</span>**Haswell**\<span style="font-size: 1em;">and/or\</span> ** +[Romeo](RomeNodes)**\<span style="font-size: 1em;">partitions to work +with R. Please use the ml partition only if you need GPUs! \</span> + +## R console + +This is a quickstart example. The `srun` command is used to submit a +real-time execution job designed for interactive use with output +monitoring. Please check [the page](Slurm) for details. The R language +available for both types of Taurus nodes/architectures x86 (scs5 +software environment) and Power9 (ml software environment). + +Haswell partition: + + srun --partition=haswell --ntasks=1 --nodes=1 --cpus-per-task=4 --mem-per-cpu=2583 --time=01:00:00 --pty bash #job submission in haswell nodes with allocating: 1 task per node, 1 node, 4 CPUs per task with 2583 mb per CPU(core) on 1 hour + + module load modenv/scs5 #Ensure that you are using the scs5 partition. Example output: The following have been reloaded with a version change: 1) modenv/ml => modenv/scs5 + module available R/3.6 #Check all availble modules with R version 3.6. You could use also "ml av R" but it gives huge output. + module load R #Load default R module Example output: Module R/3.6.0-foss 2019a and 56 dependencies loaded. + which R #Checking of current version of R + R #Start R console + +Here are the parameters of the job with all the details to show you the +correct and optimal way to do it. Please allocate the job with respect +to [hardware specification](HardwareTaurus)! Besides, it should be noted +that the value of the \<span>--mem-per-cpu\</span> parameter is +different for the different partitions. it is important to respect \<a +href="SystemTaurus#Memory_Limits" target="\_blank">memory limits\</a>. +Please note that the default limit is 300 MB per cpu. + +However, using srun directly on the shell will lead to blocking and +launch an interactive job. Apart from short test runs, it is +**recommended to launch your jobs into the background by using batch +jobs**. For that, you can conveniently place the parameters directly +into the job file which can be submitted using +`sbatch [options] <job file><span>. </span>`\<span style="font-size: +1em;">The examples could be found [here](GetStartedWithHPCDA) or +[here](Slurm). Furthermore, you could work with simple examples in your +home directory but according to \<a href="HPCStorageConcept2019" +target="\_blank">storage concept\</a>** please use \<a href="WorkSpaces" +target="\_blank">workspaces\</a> for your study and work +projects!**\</span> + +It is also possible to run Rscript directly (after loading the module): + + Rscript /path/to/script/your_script.R param1 param2 #run Rscript directly. For instance: Rscript /scratch/ws/mastermann-study_project/da_script.r + +## R with Jupyter notebook + +In addition to using interactive srun jobs and batch jobs, there is +another way to work with the **R** on Taurus. 
JipyterHub is a quick and +easy way to work with jupyter notebooks on Taurus. See\<span +style="font-size: 1em;"> the \<a href="JupyterHub" +target="\_blank">JupyterHub page\</a> for detailed instructions.\</span> + +The [production environment](JupyterHub#Standard_environments) of +JupyterHub contains R as a module for all partitions. R could be run in +the Notebook or Console for [JupyterLab](JupyterHub#JupyterLab). + +## RStudio + +\<a href="<https://rstudio.com/>" target="\_blank">RStudio\</a> is an +integrated development environment (IDE) for R. It includes a console, +syntax-highlighting editor that supports direct code execution, as well +as tools for plotting, history, debugging and workspace management. +RStudio is also available for both Taurus x86 (scs5) and Power9 (ml) +nodes/architectures. + +The best option to run RStudio is to use JupyterHub. RStudio will work +in a browser. It is currently available in the **test** environment on +both x86 (**scs5**) and Power9 (**ml**) architectures/partitions. It can +be started similarly as a new kernel from \<a +href="JupyterHub#JupyterLab" target="\_blank">JupyterLab\</a> launcher. +See the picture below. + +\<img alt="environments.png" height="70" +src="%ATTACHURL%/environments.png" title="environments.png" width="300" +/> + +\<img alt="Launcher.png" height="205" src="%ATTACHURL%/Launcher.png" +title="Launcher.png" width="195" /> + +Please keep in mind that it is not currently recommended to use the +interactive x11 job with the desktop version of Rstudio, as described, +for example, [here](Slurm#Interactive_X11_47GUI_Jobs) or in introduction +HPC-DA slides. This method is unstable. + +## Install packages in R + +By default, user-installed packages are stored in the +\<span>/$HOME/R\</span>/ folder inside a subfolder depending on the +architecture (on Taurus: x86 or PowerPC). Install packages using the +shell: + + srun -p haswell -N 1 -n 1 -c 4 --mem-per-cpu=2583 --time=01:00:00 --pty bash #job submission to the haswell nodes with allocating: 1 task per node, 1 node, 4 CPUs per task with 2583 mb per CPU(core) in 1 hour + module purge + module load modenv/scs5 #Changing the environment. Example output: The following have been reloaded with a version change: 1) modenv/ml => modenv/scs5 + + module load R #Load R module Example output: Module R/3.6.0-foss-2019a and 56 dependencies loaded. + which R #Checking of current version of R + R #Start of R console + install.packages("package_name") #For instance: install.packages("ggplot2") + +Note that to allocate the job the slurm parameters are used with +different (short) notations, but with the same values as in the previous +example. + +## Deep Learning with R + +This chapter will briefly describe working with **ml partition** (Power9 +architecture). This means that it will focus on the work with the GPUs, +and the main scenarios will be explained. + +\*Important: Please use the ml partition if you need GPUs\* \<span +style="font-size: 1em;"> Otherwise using the x86 partitions (e.g +Haswell) would most likely be more beneficial. \</span> + +### R Interface to Tensorflow + +The ["Tensorflow" R package](https://tensorflow.rstudio.com/) provides R +users access to the Tensorflow toolset. +[TensorFlow](https://www.tensorflow.org/) is an open-source software +library for numerical computation using data flow graphs. 
+ + srun -p ml -N 1 -n 1 -c 7 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash + + module purge #clear modules + ml modenv/ml #load ml environment + ml TensorFlow + ml R + + which python + mkdir python-virtual-environments #Create folder. Please use Workspaces! + cd python-virtual-environments #Go to folder + python3 -m venv --system-site-packages R-TensorFlow #create python virtual environment + source R-TensorFlow/bin/activate #activate environment + module list + which R + +Please allocate the job with respect to [hardware +specification](HardwareTaurus)! Note that the ML nodes have 4way-SMT, so +for every physical core allocated, you will always get 4\*1443mb +=5772mb. + +To configure "reticulate" R library to point to the Python executable in +your virtual environment, create a file \<span style="font-size: +1em;">nam\</span>\<span style="font-size: 1em;">ed .Rprofile in your +project directory (e.g. R-TensorFlow) with the following +contents:\</span> + + Sys.setenv(RETICULATE_PYTHON = "/sw/installed/Anaconda3/2019.03/bin/python") #assign the output of the 'which python' to the RETICULATE_PYTHON + +Let's start R, install some libraries and evaluate the result + + R + install.packages("reticulate") + library(reticulate) + reticulate::py_config() + install.packages("tensorflow") + library(tensorflow) + tf$constant("Hellow Tensorflow") #In the output 'Tesla V100-SXM2-32GB' should be mentioned + +Please find the example of the code in the +[attachment](%ATTACHURL%/TensorflowMNIST.R?t=1597837603). The example +shows the use of the Tensorflow package with the R for the +classification problem related to the MNIST dataset.\<br />\<span +style="font-size: 1em;">As an alternative to the TensorFlow rTorch could +be used. \</span>\<a +href="<https://cran.r-project.org/web/packages/rTorch/index.html>" +target="\_blank">rTorch\</a>\<span style="font-size: 1em;"> is an 'R' +implementation and interface for the \<a href="<https://pytorch.org/>" +target="\_blank">PyTorch\</a> Machine Learning framework\</span>\<span +style="font-size: 1em;">.\</span> + +## Parallel computing with R + +Generally, the R code is serial. However, many computations in R can be +made faster by the use of parallel computations. Taurus allows a vast +number of options for parallel computations. Large amounts of data +and/or use of complex models are indications of the use of parallelism. + +### General information about the R parallelism + +There are various techniques and packages in R that allow +parallelization. This chapter concentrates on most general methods and +examples. The Information here is Taurus-specific. The \<a +href="<https://www.rdocumentation.org/packages/parallel/versions/3.6.2>" +target="\_blank">parallel package\</a> will be used for the purpose of +the chapter. + +%RED%Note:<span class="twiki-macro ENDCOLOR"></span> Please do not +install or update R packages related to parallelism it could lead to +conflict with other pre-installed packages. + +### \<span style="font-size: 1em;">Basic lapply-based parallelism \</span> + +**`lapply()`** function is a part of base R. lapply is useful for +performing operations on list-objects. Roughly speaking, lapply is a +vectorisation of the source code but it could be used for +parallelization. To use more than one core with lapply-style +parallelism, you have to use some type of networking so that each node +can communicate with each other and shuffle the relevant data around. 
+The simple example of using the "pure" lapply parallelism could be found +as the [attachment](%ATTACHURL%/lapply.R). + +### Shared-memory parallelism + +The `parallel` library includes the `mclapply()` function which is a +shared memory version of lapply. The "mc" stands for "multicore". This +function distributes the `lapply` tasks across multiple CPU cores to be +executed in parallel. + +This is a simple option for parallelisation. It doesn't require much +effort to rewrite the serial code to use mclapply function. Check out an +[example](%ATTACHURL%/multicore.R). The cons of using shared-memory +parallelism approach that it is limited by the number of cores(cpus) on +a single node. + +<span class="twiki-macro RED"></span> **Important:** <span +class="twiki-macro ENDCOLOR"></span> Please allocate the job with +respect to [hardware specification](HardwareTaurus). The current maximum +number of processors (read as cores) for an SMP-parallel program on +Taurus is 56 (smp2 partition), for the Haswell partition, it is a 24. +The large SMP system (Julia) is coming soon with a total number of 896 +nodes. + +Submitting a multicore R job to SLURM is very similar to [Submitting an +OpenMP Job](Slurm#Binding_and_Distribution_of_Tasks) since both are +running multicore jobs on a **single** node. Below is an example: + + #!/bin/bash + #SBATCH --nodes=1 + #SBATCH --tasks-per-node=1 + #SBATCH --cpus-per-task=16 + #SBATCH --time=00:10:00 + #SBATCH -o test_Rmpi.out + #SBATCH -e test_Rmpi.err + + module purge + module load modenv/scs5 + module load R + + R CMD BATCH Rcode.R + +Examples of R scripts with the shared-memory parallelism can be found as +an [attachment](%ATTACHURL%/multicore.R) on the bottom of the page. + +### Distributed memory parallelism + +To use this option we need to start by setting up a cluster, a +collection of workers that will do the job in parallel. There are three +main options for it: MPI cluster, PSOCK cluster and FORK cluster. We use +\<span>makeCluster {parallel}\</span> function to create a set of copies +of **R** running in parallel and communicating over sockets, the type of +the cluster could be specified by the \<span>TYPE \</span>variable. + +#### MPI cluster + +This way of the R parallelism uses the +[Rmpi](http://cran.r-project.org/web/packages/Rmpi/index.html) package +and the [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface) +(Message Passing Interface) as a "backend" for its parallel operations. +Parallel R codes submitting a multinode MPI R job to SLURM is very +similar to \<a href="Slurm#Binding_and_Distribution_of_Tasks" +target="\_blank">submitting an MPI Job\</a> since both are running +multicore jobs on multiple nodes. Below is an example of running R +script with the Rmpi on Taurus: + + #!/bin/bash + #SBATCH --partition=haswell #specify the partition + #SBATCH --ntasks=16 #This parameter determines how many processes will be spawned. Please use >= 8. + #SBATCH --cpus-per-task=1 + #SBATCH --time=00:10:00 + #SBATCH -o test_Rmpi.out + #SBATCH -e test_Rmpi.err + + module purge + module load modenv/scs5 + module load R + + mpirun -n 1 R CMD BATCH Rmpi.R #specify the absolute path to the R script, like: /scratch/ws/max1234-Work/R/Rmpi.R + + # when finished writing, submit with sbatch <script_name> + +\<span class="WYSIWYG_TT"> **-ntasks** \</span> SLURM option is the best +and simplest way to run your application with MPI. The number of nodes +required to complete this number of tasks will then be selected. Each +MPI rank is assigned 1 core(CPU). 
+
+However, in some specific cases, you can specify the number of nodes and
+the number of necessary tasks per node:
+
+    #!/bin/bash
+    #SBATCH --nodes=2
+    #SBATCH --tasks-per-node=16
+    #SBATCH --cpus-per-task=1
+    module purge
+    module load modenv/scs5
+    module load R
+
+    time mpirun -quiet -np 1 R CMD BATCH --no-save --no-restore Rmpi_c.R #'time' reports how long your script took to complete
+
+The job script above illustrates the binding of an MPI job. Use the
+[example](%ATTACHURL%/Rmpi_c.R) from the attachment, in which 32 global
+ranks are distributed over 2 nodes with 16 cores (CPUs) each. Each MPI
+rank has 1 core assigned to it.
+
+To use Rmpi and MPI please use one of these partitions: **Haswell**,
+**Broadwell** or **Rome**.
+
+%RED%Important:<span class="twiki-macro ENDCOLOR"></span> Please
+allocate the required number of nodes and cores according to the
+hardware specification: 1 Haswell node: 2 x Intel Xeon (12 cores each);
+1 Broadwell node: 2 x Intel Xeon (14 cores each); 1 Rome node: 2 x AMD
+EPYC (64 cores each). Please also check the
+[hardware specification](HardwareTaurus) (number of nodes etc.). The
+`sinfo` command gives you a quick overview of the status of the
+partitions.
+
+Please use the `mpirun` command to run the Rmpi script. It is a wrapper
+that enables the communication between processes running on different
+machines. We recommend always using `-np 1` (the number of MPI processes
+to launch), because Rmpi spawns its additional worker processes
+dynamically.
+
+Examples of R scripts with the Rmpi can be found as attachments at the
+bottom of the page.
+
+#### PSOCK cluster
+
+The `type="PSOCK"` cluster uses TCP sockets to transfer data between
+nodes. PSOCK is the default on *all* systems; in particular, if your
+parallel code should also run on Windows, PSOCK is the method to use.
+The advantage of this method is that it does not require external
+libraries such as Rmpi. On the other hand, TCP sockets are relatively
+[slow](http://glennklockwood.blogspot.com/2013/06/whats-killing-cloud-interconnect.html).
+Creating a PSOCK cluster is similar to launching an MPI cluster, but
+instead of simply saying how many parallel workers you want, you have to
+manually specify the number of nodes according to the hardware
+specification and the parameters of your job. An example of the code can
+be found as an [attachment](%ATTACHURL%/RPSOCK.R?t=1597043002).
+
+#### FORK cluster
+
+The `type="FORK"` cluster behaves exactly like the `mclapply` function
+discussed in the previous section. Like `mclapply`, it can only use the
+cores available on a single node, but it does not require clustered data
+export since all cores use the same memory. You may find it more
+convenient to use a FORK cluster with `parLapply` than `mclapply` if you
+anticipate using the same code across multicore *and* multinode systems.
+
+### Other parallel options
+
+There are numerous other parallel options for R. For general use, we
+recommend the options listed above.
However, the +alternatives should be mentioned: + +- \<span> + [foreach](https://cran.r-project.org/web/packages/foreach/index.html) + \</span>package. It is functionally equivalent to the [lapply-based + parallelism](https://www.glennklockwood.com/data-intensive/r/lapply-parallelism.html) + discussed before but based on the for-loop; +- [future](https://cran.r-project.org/web/packages/future/index.html) + package. The purpose of this package is to provide a lightweight and + unified Future API for sequential and parallel processing of R + expression via futures; +- [Poor-man's + parallelism](https://www.glennklockwood.com/data-intensive/r/alternative-parallelism.html#6-1-poor-man-s-parallelism) + (simple data parallelism). It is the simplest, but not an elegant + way to parallelize R code. It runs several copies of the same R + script where's each read different sectors of the input data; +- \<a + href="<https://www.glennklockwood.com/data-intensive/r/alternative-parallelism.html#6-2-hands-off-parallelism>" + target="\_blank">Hands-off (OpenMP) method\</a>. R has + [OpenMP](https://www.openmp.org/resources/) support. Thus using + OpenMP is a simple method where you don't need to know a much about + the parallelism options in your code. Please be careful and don't + mix this technique with other methods! + +-- Main.AndreiPolitov - 2020-05-18 + +- [TensorflowMNIST.R](%ATTACHURL%/TensorflowMNIST.R?t=1597837603)\<span + style="font-size: 13px;">: TensorflowMNIST.R\</span> +- [lapply.R](%ATTACHURL%/lapply.R)\<span style="font-size: 13px;">: + lapply.R\</span> +- [multicore.R](%ATTACHURL%/multicore.R)\<span style="font-size: + 13px;">: multicore.R\</span> +- [Rmpi.R](%ATTACHURL%/Rmpi.R)\<span style="font-size: 13px;">: + Rmpi.R\</span> +- [Rmpi_c.R](%ATTACHURL%/Rmpi_c.R)\<span style="font-size: 13px;">: + Rmpi_c.R\</span> +- [RPSOCK.R](%ATTACHURL%/RPSOCK.R)\<span style="font-size: 13px;">: + RPSOCK.R\</span> + +\<div id="gtx-trans" style="position: absolute; left: 35px; top: +5011.8px;"> \</div> diff --git a/twiki2md/root/HPCDA/GetStartedWithHPCDA.md b/twiki2md/root/HPCDA/GetStartedWithHPCDA.md new file mode 100644 index 000000000..5afddc4b3 --- /dev/null +++ b/twiki2md/root/HPCDA/GetStartedWithHPCDA.md @@ -0,0 +1,405 @@ +# Get started with HPC-DA + + + +\<span style="font-size: 1em;">HPC-DA (High-Performance Computing and +Data Analytics) is a part of TU-Dresden general purpose HPC cluster +(Taurus). HPC-DA is the best\</span>** option**\<span style="font-size: +1em;"> for \</span>**Machine learning, Deep learning**\<span +style="font-size: 1em;">applications and tasks connected with the big +data.\</span> + +**This is an introduction of how to run machine learning applications on +the HPC-DA system.** + +The main **aim** of this guide is to help users who have started working +with Taurus and focused on working with Machine learning frameworks such +as TensorFlow or Pytorch. 
**Prerequisites:** \<span style="font-size: +1em;"> To work with HPC-DA, you need \</span>\<a href="Login" +target="\_blank">access\</a>\<span style="font-size: 1em;"> for the +Taurus system and preferably have basic knowledge about High-Performance +computers and Python.\</span> + +\<span style="font-size: 1em;">Disclaimer: This guide provides the main +steps on the way of using Taurus, for details please follow links in the +text.\</span> + +You can also find the information you need on the +[HPC-Introduction](%ATTACHURL%/HPC-Introduction.pdf?t=1585216700) and +[HPC-DA-Introduction](%ATTACHURL%/HPC-DA-Introduction.pdf?t=1585162693) +presentation slides. + +## Why should I use HPC-DA? The architecture and feature of the HPC-DA + +HPC-DA built on the base of +[Power9](https://www.ibm.com/it-infrastructure/power/power9) +architecture from IBM. HPC-DA created from [AC922 IBM +servers](https://www.ibm.com/ie-en/marketplace/power-systems-ac922), +which was created for AI challenges, analytics and working with, Machine +learning, data-intensive workloads, deep-learning frameworks and +accelerated databases. POWER9 is the processor with state-of-the-art I/O +subsystem technology, including next-generation NVIDIA NVLink, PCIe Gen4 +and OpenCAPI. [Here](Power9) you could find a detailed specification of +the TU Dresden HPC-DA system. + +The main feature of the Power9 architecture (ppc64le) is the ability to +work the [ **NVIDIA Tesla V100** +](https://www.nvidia.com/en-gb/data-center/tesla-v100/)GPU with +**NV-Link** support. NV-Link technology allows increasing a total +bandwidth of 300 gigabytes per second (GB/sec) - 10X the bandwidth of +PCIe Gen 3. The bandwidth is a crucial factor for deep learning and +machine learning applications. + +Note: the Power9 architecture not so common as an x86 architecture. This +means you are not so flexible with choosing applications for your +projects. Even so, the main tools and applications are available. See +available modules here. \<br />**Please use the ml partition if you need +GPUs!** Otherwise using the x86 partitions (e.g Haswell) most likely +would be more beneficial. + +## Login + +### SSH Access + +\<span style="font-size: 1em; color: #444444;">The recommended way to +connect to the HPC login servers directly via ssh:\</span> + + ssh <zih-login>@taurus.hrsk.tu-dresden.de + +Please put this command in the terminal and replace \<zih-login> with +your login that you received during the access procedure. Accept the +host verifying and enter your password. + +T\<span style="font-size: 1em;">his method requires two conditions: +Linux OS, workstation within the campus network. For other options and +details check the \</span>\<a href="Login" target="\_blank">Login +page\</a>\<span style="font-size: 1em;">.\</span> + +## Data management + +### Workspaces + +As soon as you have access to HPC-DA you have to manage your data. The +main method of working with data on Taurus is using Workspaces. \<span +style="font-size: 1em;">You could work with simple examples in your home +directory (where you are loading by default). 
However, in accordance
+with the [storage concept](HPCStorageConcept2019), **please use
+[workspaces](WorkSpaces) for your study and work projects.**
+
+You should create your workspace with a similar command:
+
+    ws_allocate -F scratch Machine_learning_project 50 #allocate a workspace in the scratch directory for 50 days
+
+The command prints the path of the newly allocated workspace in scratch.
+Use it to store the main data of your project.
+
+For different purposes, you should use different storage systems. To
+work as efficiently as possible, consider the following points:
+
+- Save source code etc. in **`/home`** or **`/projects/...`**
+- Store checkpoints and other massive but temporary data with
+  workspaces in: **`/scratch/ws/...`**
+- For data that seldom changes but consumes a lot of space, use
+  mid-term storage with workspaces: **`/warm_archive/...`**
+- For large parallel applications where using the fastest file system
+  is a necessity, use workspaces in: **`/lustre/ssd/...`**
+- Compile in **`/dev/shm`** or **`/tmp`**
+
+### Data moving
+
+#### Moving data to/from the HPC machines
+
+To copy data to/from the HPC machines, the Taurus [export
+nodes](ExportNodes) should be used. They are the preferred way to
+transfer your data. There are three possibilities for exchanging data
+between your local machine (lm) and the HPC machines (hm): **SCP,
+RSYNC, SFTP**.
+
+Type the following commands in a terminal on your local machine. The
+examples below use the **`scp`** command.
+
+#### Copy data from lm to hm
+
+    scp <file> <zih-user>@taurusexport.hrsk.tu-dresden.de:<target-location> #Copy a file from your local machine. For example: scp helloworld.txt mustermann@taurusexport.hrsk.tu-dresden.de:/scratch/ws/mustermann-Machine_learning_project/
+
+    scp -r <directory> <zih-user>@taurusexport.hrsk.tu-dresden.de:<target-location> #Copy a directory from your local machine.
+
+#### Copy data from hm to lm
+
+    scp <zih-user>@taurusexport.hrsk.tu-dresden.de:<file> <target-location> #Copy a file. For example: scp mustermann@taurusexport.hrsk.tu-dresden.de:/scratch/ws/mustermann-Machine_learning_project/helloworld.txt /home/mustermann/Downloads
+
+    scp -r <zih-user>@taurusexport.hrsk.tu-dresden.de:<directory> <target-location> #Copy a directory
+
+#### Moving data inside the HPC machines. Datamover
+
+The best way to transfer data inside Taurus is the
+[datamover](DataMover). It is a special data transfer machine that
+provides the global file systems of each ZIH HPC system. Datamover
+provides the best transfer speed. To load, move, copy etc. files from
+one file system to another file system, you have to use commands with
+the **dt** prefix, such as:
+
+**`dtcp, dtwget, dtmv, dtrm, dtrsync, dttar, dtls`**
+
+These commands submit a job to the data transfer machines that executes
+the selected command. Except for the 'dt' prefix, their syntax is the
+same as the corresponding shell command without the 'dt'.
+
+    dtcp -r /scratch/ws/<name_of_your_workspace>/results /lustre/ssd/ws/<name_of_your_workspace> #Copy from a workspace in scratch to ssd.
+    dtwget https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz #Download the CIFAR-100 archive.
+
+## Batch systems: SLURM
+
+After logging in and preparing your data, the next logical step is to
+start your job. For this purpose, SLURM is used. Slurm (Simple Linux
+Utility for Resource Management) is an open-source job scheduler that
+allocates compute resources on clusters for queued, user-defined jobs.
+By default, after logging in you are on the login nodes. As their name
+suggests, these nodes are not meant for computations: applications on an
+HPC system can not be run there! They have to be submitted to compute
+nodes (ml nodes for HPC-DA) with dedicated resources for user jobs.
+
+Job submission can be done with the command `srun [options] <command>`.
+
+This is a simple example which you could use to get started. The `srun`
+command is used to submit a job for real-time execution, designed for
+interactive use with monitoring of the output. For details please check
+[the page](Slurm).
+
+    srun -p ml -N 1 --gres=gpu:1 --time=01:00:00 --pty --mem-per-cpu=8000 bash #Job submission on ml nodes, allocating: 1 node, 1 gpu per node, with 8000 MB per CPU for 1 hour.
+
+However, using srun directly on the shell will block the shell and
+launch an interactive job. Apart from short test runs, it is
+**recommended to launch your jobs in the background by using batch
+jobs**. For that, you can conveniently put the parameters directly into
+the job file which you can submit using `sbatch [options] <job file>`.
+
+This is an example of an sbatch file to run your application:
+
+    #!/bin/bash
+    #SBATCH --mem=8GB                     # specify the needed memory
+    #SBATCH -p ml                         # specify ml partition
+    #SBATCH --gres=gpu:1                  # use 1 GPU per node (i.e. use one GPU per task)
+    #SBATCH --nodes=1                     # request 1 node
+    #SBATCH --time=00:15:00               # runs for 15 minutes
+    #SBATCH -c 1                          # how many cores per task allocated
+    #SBATCH -o HLR_name_your_script.out   # save output messages in this file
+    #SBATCH -e HLR_name_your_script.err   # save error messages in this file
+
+    module load modenv/ml
+    module load TensorFlow
+
+    python machine_learning_example.py
+
+    ## when finished writing, submit with: sbatch <script_name> For example: sbatch machine_learning_script.slurm
+
+The `machine_learning_example.py` contains a simple ML application based
+on the MNIST model to test your sbatch file. It can be found as an
+[attachment](%ATTACHURL%/machine_learning_example.py) at the bottom of
+the page.
+
+## Start your application
+
+As stated before, HPC-DA was created for deep learning and machine
+learning applications. Machine learning frameworks such as TensorFlow
+and PyTorch are industry standards now.
+
+There are three main options on how to work with Tensorflow and PyTorch:
+**1. Modules, 2. JupyterNotebook, 3. Containers**
+
+**1.** **Modules**
+
+The easiest way is using the
+[Modules system](RuntimeEnvironment#Module_Environments) and a Python
+virtual environment. Modules are a way to use frameworks, compilers,
+loaders, libraries, and utilities. The module system is a user interface
+that provides utilities for the dynamic modification of a user's
+environment without manual modifications. You can use them for `srun`,
+batch jobs (`sbatch`) and JupyterHub.
+
+A virtual environment is a cooperatively isolated runtime environment
+that allows Python users and applications to install and update Python
+distribution packages without interfering with the behaviour of other
+Python applications running on the same system.
At its core, the main +purpose of Python virtual environments is to create an isolated +environment for Python projects. + +**Vitualenv (venv)** is a standard Python tool to create isolated Python +environments. We recommend using venv to work with Tensorflow and +Pytorch on Taurus. It has been integrated into the standard library +under the \<a href="<https://docs.python.org/3/library/venv.html>" +target="\_blank">venv module\</a>. However, if you have reasons +(previously created environments etc) you could easily use conda. The +conda is the second way to use a virtual environment on the Taurus. \<a +href="<https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>" +target="\_blank">Conda\</a> is an open-source package management system +and environment management system from the Anaconda. + +As was written in the previous chapter, to start the application (using +modules) and to run the job exist two main options: + +- The **\<span class="WYSIWYG_TT">srun\</span> command:** + +<!-- --> + + srun -p ml -N 1 -n 1 -c 2 --gres=gpu:1 --time=01:00:00 --pty --mem-per-cpu=8000 bash #job submission in ml nodes with allocating: 1 node, 1 task per node, 2 CPUs per task, 1 gpu per node, with 8000 mb on 1 hour. + + module load modenv/ml #example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml + + mkdir python-virtual-environments #create folder for your environments + cd python-virtual-environments #go to folder + module load TensorFlow #load TensorFlow module to use python. Example output: Module Module TensorFlow/2.1.0-fosscuda-2019b-Python-3.7.4 and 31 dependencies loaded. + which python #check which python are you using + python3 -m venv --system-site-packages env #create virtual environment "env" which inheriting with global site packages + source env/bin/activate #activate virtual environment "env". Example output: (env) bash-4.2$ + +The inscription (env) at the beginning of each line represents that now +you are in the virtual environment. + +Now you can check the working capacity of the current environment. + + python # start python + import tensorflow as tf + print(tf.__version__) # example output: 2.1.0 + +- The second and main option is using batch jobs (**`sbatch`**). It is + used to submit a job script for later execution. Consequently, it is + **recommended to launch your jobs into the background by using batch + jobs**. To launch your machine learning application as well to srun + job you need to use modules. See the previous chapter with the + sbatch file example. + +Versions: TensorFlow 1.14, 1.15, 2.0, 2.1; PyTorch 1.1, 1.3 are +available. (25.02.20) + +Note: However in case of using sbatch files to send your job you usually +don't need a virtual environment. + +**2. JupyterNotebook** + +The Jupyter Notebook is an open-source web application that allows you +to create documents containing live code, equations, visualizations, and +narrative text. Jupyter notebook allows working with TensorFlow on +Taurus with GUI (graphic user interface) in a **web browser** and the +opportunity to see intermediate results step by step of your work. This +can be useful for users who dont have huge experience with HPC or Linux. + +There is \<a href="JupyterHub" target="\_self">jupyterhub\</a> on +Taurus, where you can simply run your Jupyter notebook on HPC nodes. +Also, for more specific cases you can run a manually created remote +jupyter server. 
You can find the manual server setup \<a +href="DeepLearning" target="\_blank">here.\</a> However, the simplest +option for beginners is using JupyterHub. + +JupyterHub is available here: \<a +href="<https://taurus.hrsk.tu-dresden.de/jupyter>" +target="\_top"><https://taurus.hrsk.tu-dresden.de/jupyter>\</a> + +After logging, you can start a new session and\<span style="font-size: +1em;"> configure it. There are simple and advanced forms to set up your +session. On the simple form, you have to choose the "IBM Power +(ppc64le)" architecture. You can select the required number of CPUs and +GPUs. For the acquaintance with the system through the examples below +the recommended amount of CPUs and 1 GPU will be enough. \</span>\<span +style="font-size: 1em;">With the advanced form, \</span>\<span +style="font-size: 1em;">you can use the configuration with 1 GPU and 7 +CPUs. To access for all your workspaces use " / " in the workspace +scope. Please check\</span>\<span style="font-size: 1em;"> updates and +details \</span>\<a href="JupyterHub" target="\_blank">here\</a>\<span +style="font-size: 1em;">.\</span> + +\<span style="font-size: 1em;">Several Tensorflow and PyTorch examples +for the Jupyter notebook have been prepared based on some simple tasks +and models which will give you an understanding of how to work with ML +frameworks and JupyterHub. It could be found as the \</span> +[attachment](%ATTACHURL%/machine_learning_example.py)\<span +style="font-size: 1em;"> in the bottom of the page. A detailed +explanation and examples for TensorFlow can be found \</span>\<a +href="TensorFlowOnJupyterNotebook" title="EXAMPLES AND RUNNING THE +MODEL">here\</a>\<span style="font-size: 1em;">. For the Pytorch - +\</span> [here](PyTorch)\<span style="font-size: 1em;">. \</span>\<span +style="font-size: 1em;">Usage information about the environments for the +JupyterHub could be found \</span> [here](JupyterHub)\<span +style="font-size: 1em;"> in the chapter 'Creating and using your own +environment'.\</span> + +Versions: TensorFlow 1.14, 1.15, 2.0, 2.1; PyTorch 1.1, 1.3 are +available. (25.02.20) + +**3.** **Containers** + +Some machine learning tasks such as benchmarking require using +containers. A container is a standard unit of software that packages up +code and all its dependencies so the application runs quickly and +reliably from one computing environment to another. \<span +style="font-size: 1em;">Using containers gives you more flexibility +working with modules and software but at the same time requires more +effort.\</span> + +On Taurus \<a href="<https://sylabs.io/>" +target="\_blank">Singularity\</a> used as a standard container solution. +Singularity enables users to have full control of their environment. +This means that **you dont have to ask an HPC support to install +anything for you - you can put it in a Singularity container and +run!**As opposed to Docker (the beat-known container solution), +Singularity is much more suited to being used in an HPC environment and +more efficient in many cases. Docker containers also can easily be used +by Singularity from the [DockerHub](https://hub.docker.com) for +instance. Also, some containers are available in \<a +href="<https://singularity-hub.org/>" +target="\_blank">SingularityHub\</a>. + +\<span style="font-size: 1em;">The simplest option to start working with +containers on HPC-DA is i\</span>\<span style="font-size: 1em;">mporting +from Docker or SingularityHub container with TensorFlow. 
It does +\</span> **not require root privileges** \<span style="font-size: 1em;"> +and so works on Taurus directly\</span>\<span style="font-size: 1em;">: +\</span> + + srun -p ml -N 1 --gres=gpu:1 --time=02:00:00 --pty --mem-per-cpu=8000 bash #allocating resourses from ml nodes to start the job to create a container.<br />singularity build my-ML-container.sif docker://ibmcom/tensorflow-ppc64le #create a container from the DockerHub with the last TensorFlow version<br />singularity run --nv my-ML-container.sif #run my-ML-container.sif container with support of the Nvidia's GPU. You could also entertain with your container by commands: singularity shell, singularity exec + +There are two sources for containers for Power9 architecture with +Tensorflow and PyTorch on the board: \<span style="font-size: 1em;"> +[Tensorflow-ppc64le](https://hub.docker.com/r/ibmcom/tensorflow-ppc64le) +- \</span>\<span style="font-size: 1em;">Community-supported ppc64le +docker container for TensorFlow. \</span>\<a +href="<https://hub.docker.com/r/ibmcom/powerai/>" +target="\_blank">PowerAI container\</a> - \<span style="font-size: +1em;">Official Docker container with Tensorflow, PyTorch and many other +packages. Heavy container. It requires a lot of space. Could be found on +Taurus.\</span> + +Note: You could find other versions of software in the container on the +"tag" tab on the docker web page of the container. + +To use not a pure Tensorflow, PyTorch but also with some Python packages +you have to use the definition file to create the container +(bootstrapping). For details please see the [Container](Container) page +from our wiki. Bootstrapping **has required root privileges** and +Virtual Machine (VM) should be used! There are two main options on how +to work with VM on Taurus: [VM tools](VMTools) - automotive algorithms +for using virtual machines; [Manual method](Cloud) - it requires more +operations but gives you more flexibility and reliability. 
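+
+As a quick sanity check after the container has been built and started
+with `--nv` (as in the example above), you can run a few lines of Python
+inside it to confirm that TensorFlow sees the GPU. This is only a
+sketch: the exact API depends on the TensorFlow version shipped in the
+container, and the file name `check_gpu.py` is just a placeholder.
+
+    # check_gpu.py - hypothetical helper script; run it inside the container,
+    # e.g. with: singularity exec --nv my-ML-container.sif python check_gpu.py
+    import tensorflow as tf
+    from tensorflow.python.client import device_lib   # available in TF 1.x and 2.x
+
+    print("TensorFlow version:", tf.__version__)
+    # list the GPUs TensorFlow can see; a Tesla V100 entry should show up on the ml nodes
+    gpus = [d.physical_device_desc for d in device_lib.list_local_devices()
+            if d.device_type == "GPU"]
+    print(gpus if gpus else "No GPU visible - check that --nv was used")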
+ +-- Main.AndreiPolitov - 2020-02-05 + +- [machine_learning_example.py](%ATTACHURL%/machine_learning_example.py): + machine_learning_example.py +- [example_TensofFlow_MNIST.zip](%ATTACHURL%/example_TensofFlow_MNIST.zip): + example_TensofFlow_MNIST.zip +- [example_Pytorch_MNIST.zip](%ATTACHURL%/example_Pytorch_MNIST.zip): + example_Pytorch_MNIST.zip +- [example_Pytorch_image_recognition.zip](%ATTACHURL%/example_Pytorch_image_recognition.zip): + example_Pytorch_image_recognition.zip +- [example_TensorFlow_Automobileset.zip](%ATTACHURL%/example_TensorFlow_Automobileset.zip): + example_TensorFlow_Automobileset.zip +- [HPC-Introduction.pdf](%ATTACHURL%/HPC-Introduction.pdf): + HPC-Introduction.pdf +- [HPC-DA-Introduction.pdf](%ATTACHURL%/HPC-DA-Introduction.pdf): + HPC-DA-Introduction.pdf + +\<div id="gtx-trans" style="position: absolute; left: -5px; top: +526.833px;"> \</div> diff --git a/twiki2md/root/HPCDA/NvmeStorage.md b/twiki2md/root/HPCDA/NvmeStorage.md new file mode 100644 index 000000000..565904df9 --- /dev/null +++ b/twiki2md/root/HPCDA/NvmeStorage.md @@ -0,0 +1,16 @@ +# NVMe Storage + +## Hardware\<img align="right" alt="nvme.png" src="%ATTACHURL%/nvme.png" title="nvme.png" width="150" /> + +90 NVMe storage nodes, each with + +- 8x Intel NVMe Datacenter SSD P4610, 3.2 TB +- 3.2 GB/s (8x 3.2 =25.6 GB/s) +- 2 Infiniband EDR links, Mellanox MT27800, ConnectX-5, PCIe x16, 100 + Gbit/s +- 2 sockets Intel Xeon E5-2620 v4 (16 cores, 2.10GHz) +- 64 GB RAM + +NVMe cards can saturate the HCAs + +-- Main.UlfMarkwardt - 2019-05-07 diff --git a/twiki2md/root/HPCDA/Power9.md b/twiki2md/root/HPCDA/Power9.md new file mode 100644 index 000000000..2fb4ad4b2 --- /dev/null +++ b/twiki2md/root/HPCDA/Power9.md @@ -0,0 +1,11 @@ +# IBM Power9 Nodes for Machine Learning + +For machine learning, we have 32 IBM AC922 nodes installed with this +configuration: + +- 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores) +- 256 GB RAM DDR4 2666MHz +- 6x NVIDIA VOLTA V100 with 32GB HBM2 +- NVLINK bandwidth 150 GB/s between GPUs and host + +-- Main.UlfMarkwardt - 2019-05-07 diff --git a/twiki2md/root/HPCDA/PyTorch.md b/twiki2md/root/HPCDA/PyTorch.md new file mode 100644 index 000000000..e9dee25d2 --- /dev/null +++ b/twiki2md/root/HPCDA/PyTorch.md @@ -0,0 +1,332 @@ +# Pytorch for Data Analytics + + + +\<a href="<https://pytorch.org/>" target="\_blank">PyTorch\</a>\<span +style="font-size: 1em;"> is an open-source machine learning framework. +It is an optimized tensor library for deep learning using GPUs and CPUs. +PyTorch is a machine learning tool developed by Facebooks AI division to +process large-scale object detection, segmentation, classification, etc. +\</span>\<span style="font-size: 1em;">PyTorch provides a core data +structure, the tensor, a \</span>\<span style="font-size: +1em;">multi-dimensional array that shares many similarities +\</span>\<span style="font-size: 1em;">with Numpy arrays. PyTorch also +consumed Caffe2 for its backend and added support of ONNX.\</span> + +**Prerequisites:** To work with PyTorch you obviously need \<a +href="Login" target="\_blank">access\</a> for the Taurus system and +basic knowledge about Python, Numpy and SLURM system. + +\<span style="font-size: 1em;">**Aim**of this page is to introduce users +on how to start working with PyTorch on the \</span>\<a href="HPCDA" +target="\_self">HPC-DA\</a>\<span style="font-size: 1em;"> system - part +of the TU Dresden HPC system.\</span> + +There are numerous different possibilities of how to work with PyTorch +on Taurus. 
Here we will consider two main methods. + +1\. The first option is using Jupyter notebook with HPC-DA nodes. The +easiest way is by using [Jupyterhub](JupyterHub). It is a recommended +way for beginners in PyTorch and users who are just starting their work +with Taurus. + +2\. The second way is using the \<a +href="RuntimeEnvironment#Module_Environments" target="\_blank">Modules +system\</a> and Python or conda virtual environment. See [the Python +page](Python) for the HPC-DA system. + +Note: The information on working with the PyTorch using Containers could +be found [here](TensorFlowContainerOnHPCDA). + +## Get started with PyTorch + +### Virtual environment + +For working with PyTorch and python packages using virtual environments +(kernels) is necessary. + +Creating and using your kernel (environment) has the benefit that you +can install your preferred python packages and use them in your +notebooks. + +A virtual environment is a cooperatively isolated runtime environment +that allows Python users and applications to install and upgrade Python +distribution packages without interfering with the behaviour of other +Python applications running on the same system. So the [Virtual +environment](https://docs.python.org/3/glossary.html#term-virtual-environment) +is a self-contained directory tree that contains a Python installation +for a particular version of Python, plus several additional packages. At +its core, the main purpose of Python virtual environments is to create +an isolated environment for Python projects. Python virtual environment +is the main method to work with Deep Learning software as PyTorch on the +\<a href="HPCDA" target="\_self">HPC-DA\</a> system. + +### Conda and Virtualenv + +There are two methods of how to work with virtual environments on +Taurus: + +1.** Vitualenv (venv)** is a standard Python tool to create isolated +Python environments. In general, It is the preferred interface for +managing installations and virtual environments on Taurus. It has been +integrated into the standard library under the \<a +href="<https://docs.python.org/3/library/venv.html>" target="\_top">venv +module\</a>. We recommend using **venv** to work with Python packages +and Tensorflow on Taurus. + +2\. The** conda** command is the interface for managing installations +and virtual environments on Taurus. The **conda** is a tool for managing +and deploying applications, environments and packages. Conda is an +open-source package management system and environment management system +from Anaconda. The conda manager is included in all versions of Anaconda +and Miniconda.\<br />%RED%**Important note!**<span +class="twiki-macro ENDCOLOR"></span> Due to the use of Anaconda to +create PyTorch modules for the ml partition, it is recommended to use +the conda environment for working with the PyTorch to avoid conflicts +over the sources of your packages (pip or conda). + +**Note:** \<span style="font-size: 1em;"> Keep in mind that you +**cannot** \</span>\<span style="font-size: 1em;">use conda for working +with the virtual environments previously created with Vitualenv tool and +vice versa\</span> + +This example shows how to install and start working with PyTorch (with +using module system) + + srun -p ml -N 1 -n 1 -c 2 --gres=gpu:1 --time=01:00:00 --pty --mem-per-cpu=5772 bash #Job submission in ml nodes with 1 gpu on 1 node with 2 CPU and with 5772 mb for each cpu. + module load modenv/ml #Changing the environment. 
Example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml + mkdir python-virtual-environments #Create folder + cd python-virtual-environments #Go to folder + module load PythonAnaconda/3.6 #Load Anaconda with Python. Example output: Module Module PythonAnaconda/3.6 loaded. + which python #Check which python are you using + python3 -m venv --system-site-packages envtest #Create virtual environment + source envtest/bin/activate #Activate virtual environment. Example output: (envtest) bash-4.2$ + module load PyTorch #Load PyTorch module. Example output: Module PyTorch/1.1.0-PythonAnaconda-3.6 loaded. + python #Start python + import torch + torch.version.__version__ #Example output: 1.1.0 + +\<span style`"font-size: 1em;">Keep in mind that using </span> ==srun=` +\<span style="font-size: 1em;"> directly on the shell will lead to +blocking and launch an interactive job. Apart from short test runs, it +is \</span> **recommended to launch your jobs into the background by +using batch jobs** \<span style="font-size: 1em;">. For that, you can +conveniently put the parameters directly into the job file which you can +submit using \</span>`<span> *sbatch [options] <job file>* </span>.` + +## Running the model and examples + +Below are examples of Jupyter notebooks with PyTorch models which you +can run on ml nodes of HPC-DA. + +There are two ways how to work with the Jupyter notebook on HPC-DA +system. \<span style="font-size: 1em;">You can use a \</span> [remote +Jupyter server](DeepLearning)\<span style="font-size: 1em;"> or \<a +href="JupyterHub" target="\_blank">Jupyterhub\</a>. Jupyterhub is a +simple and recommended way to use PyTorch. We are using Jupyterhub for +our examples. \</span> + +Prepared examples of PyTorch models give you an understanding of how to +work with Jupyterhub and PyTorch models. It can be useful and +instructive to start your acquaintance with PyTorch and HPC-DA system +from these simple examples. + +\<span style="font-size: 1em;">JupyterHub is available here: \</span>\<a +href="<https://taurus.hrsk.tu-dresden.de/jupyter>" +target="\_top"><https://taurus.hrsk.tu-dresden.de/jupyter>\</a> + +After login, you can start a new session by clicking on the button. + +%RED%Note:<span class="twiki-macro ENDCOLOR"></span> Detailed guide +(\<span style="font-size: 1em;">with pictures and instructions) +\</span>\<span style="font-size: 1em;">how to run the Jupyterhub you +could find on [the page](JupyterHub). \</span> + +\<span style="font-size: 1em;">Please choose the "IBM Power (ppc64le)" +architecture. \</span>\<span style="font-size: 1em;">You need to +download an example (prepared as jupyter notebook file) that already +contains all you need for the start of the work. Please put the file +into your previously created virtual environment in your working +directory or use the kernel for your notebook (\</span> [see Jupyterhub +page](JupyterHub)\<span style="font-size: 1em;">).\</span> + +%RED%Note<span class="twiki-macro ENDCOLOR"></span>: You could work with +simple examples in your home directory but according to \<a +href="HPCStorageConcept2019" target="\_blank">storage concept\</a>** +please use \<a href="WorkSpaces" target="\_blank">workspaces\</a> for +your study and work projects**. For this reason, you have to use +advanced options of Jupyterhub and put "/" in "Workspace scope" field. 
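+
+Before running the examples, it can be useful to verify that PyTorch
+actually sees the GPU of your allocation. The following check is only a
+small sketch (it assumes the environment from the example above, with
+the PyTorch module loaded); you can run it at the `python` prompt or in
+a notebook cell:
+
+    import torch
+
+    print(torch.__version__)                   # version provided by the loaded module
+    print(torch.cuda.is_available())           # should print True on the ml nodes
+    if torch.cuda.is_available():
+        print(torch.cuda.get_device_name(0))   # e.g. 'Tesla V100-SXM2-32GB'
+        x = torch.rand(3, 3).cuda()            # place a small tensor on the GPU
+        print(x.device)                        # should report 'cuda:0'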
+ +\<span style="font-size: 1em;">To download the first example (from the +list below) into your previously created virtual environment you could +use the following command:\</span> + + ws_list #list of your workspaces + cd <name_of_your_workspace> #go to workspace + + wget https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/PyTorch/example_MNIST_Pytorch.zip + unzip example_MNIST_Pytorch.zip + +Also, you could use kernels for all notebooks, not only for them which +placed in your virtual environment. See the \<a href="JupyterHub" +target="\_blank">jupyterhub\</a> page. + +Examples: + +1\. Simple MNIST model. The MNIST database is a large database of +handwritten digits that is commonly used for \<a +href="<https://en.wikipedia.org/wiki/Training_set>" title="Training +set">t\</a>raining various image processing systems. PyTorch allows us +to import and download the MNIST dataset directly from the Torchvision - +package consists of datasets, model architectures and transformations. +The model contains a neural network with sequential architecture and +typical modules for this kind of models. Recommended parameters for +running this model are 1 GPU and 7 cores (28 thread) + +[https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/PyTorch/example_MNIST_Pytorch.zip](%ATTACHURL%/example_MNIST_Pytorch.zip) + +#### Running the model + +Open \<a href="<https://taurus.hrsk.tu-dresden.de/jupyter>" +target="\_blank">JupyterHub\</a> and follow instructions above. + +\<span style="font-size: 1em;">In Jupyterhub documents are organized +with tabs and a very versatile split-screen feature. On the left side of +the screen, you can open your file. Use 'File-Open from Path' to go to +your workspace (e.g. /scratch/ws/\<username-name_of_your_ws>). You could +run each cell separately step by step and analyze the result of each +step. Default command for running one cell Shift+Enter'. Also, you could +run all cells with the command 'run all cells' in the 'Run' Tab.\</span> + +## Components and advantages of the PyTorch + +### Pre-trained networks + +\<span style="font-size: 1em;">The PyTorch gives you an opportunity to +use pre-trained models and networks for your purposes (as a TensorFlow +for instance) especially for computer vision and image recognition. As +you know computer vision is one of the fields that have been most +impacted by the advent of deep learning.\</span> + +We will use a network trained on ImageNet, taken from the TorchVision +project, which contains a few of the best performing neural network +architectures for computer vision, such as AlexNet, one of the early +breakthrough networks for image recognition, and ResNet, which won the +ImageNet classification, detection, and localization competitions, in +2015. \<a href="<https://github.com/pytorch/vision>" +target="\_blank">TorchVision\</a> also has easy access to datasets like +ImageNet and other utilities for getting up to speed with computer +vision applications in PyTorch. The pre-defined models can be found in +torchvision.models. + +%RED%Important note:<span class="twiki-macro ENDCOLOR"></span> For the +ml nodes only the Torchvision 0.2.2. is available (10.11.20). The last +updates from IBM include only Torchvision 0.4.1 CPU version. Be careful +some features from modern versions of Torchvision are not available in +the 0.2.2 (e.g. some kinds of `transforms`). Always check the version +with: `print(torchvision.__version__)` + +Examples: + +\<span style="font-size: 1em;">1. Image recognition example. 
This +PyTorch script is using Resnet to single image classification. +Recommended parameters for running this model are 1 GPU and 7 cores (28 +thread).\</span> + +[https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/PyTorch/example_Pytorch_image_recognition.zip](%ATTACHURL%/example_Pytorch_image_recognition.zip) + +Remember that for using [JupyterHub service](JupyterHub) for PyTorch you +need to create and activate a virtual environment (kernel) with loaded +essential modules (see "envtest" environment form the virtual +environment example). + +\<span style="font-size: 1em;">Run the example in the same way as the +previous example (MNIST model).\</span> + +### Using Multiple GPUs with PyTorch + +Effective use of GPUs is essential, and it implies using parallelism in +your code and model. \<span style="font-size: 1em;">Data Parallelism and +model parallelism are effective instruments to improve the performance +of your code in case of GPU using. \</span> + +\<span style="font-size: 1em;">The data parallelism\</span>\<span +style="font-size: 1em;"> is a widely-used technique. It replicates the +same model to all GPUs, where each GPU consumes a different partition of +the input data. You could see this method \<a +href="<https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html>" +target="\_blank">here\</a>.\</span> + +The example below shows how to solve that problem by using model +parallel, which, in contrast to data parallelism, splits a single model +onto different GPUs, rather than replicating the entire model on each +GPU. \<span style="font-size: 1em;">The high-level idea of model +parallel is to place different sub-networks of a model onto different +devices\</span>\<span style="font-size: 1em;">. As the only part of a +model operates on any individual device, a set of devices can +collectively serve a larger model.\</span> + +It is recommended to use \<a +href="<https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel>" +title="torch.nn.parallel.DistributedDataParallel">`DistributedDataParallel`\</a>, +instead of this class, to do multi-GPU training, even if there is only a +single node. See: [Use nn.parallel.DistributedDataParallel instead of +multiprocessing or +nn.DataParallel](https://pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead) +and [Distributed Data +Parallel](https://pytorch.org/docs/stable/notes/ddp.html#ddp). + +Examples: + +1\. The parallel model. The main aim of this model to show the way how +to effectively implement your neural network on several GPUs. It +includes a comparison of different kinds of models and tips to improve +the performance of your model. **Necessary** parameters for running this +model are **2 GPU** and 14 cores (56 thread). + +[https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/PyTorch/example_PyTorch_parallel.zip](%ATTACHURL%/example_PyTorch_parallel.zip?t=1572619180) + +Remember that for using [JupyterHub service](JupyterHub) for PyTorch you +need to create and activate a virtual environment (kernel) with loaded +essential modules. + +Run the example in the same way as the previous examples. + +#### Distributed data-parallel + +[DistributedDataParallel](https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel) +(DDP) implements data parallelism at the module level which can run +across multiple machines. Applications using DDP should spawn multiple +processes and create a single DDP instance per process. 
DDP uses +collective communications in the +[torch.distributed](https://pytorch.org/tutorials/intermediate/dist_tuto.html) +package to synchronize gradients and buffers. + +The tutorial could be found +[here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html). + +To use distributed data parallelisation on Taurus please use following +parameters: `--ntasks-per-node` -parameter to the number of GPUs you use +per node. Also, it could be useful to increase `memomy/cpu` parameters +if you run larger models. Memory can be set up to: + +\<span>--mem=250000\</span> and \<span>--cpus-per-task=7 \</span>for the +**ml** partition. + +--mem=60000 and --cpus-per-task=6 for the **gpu2** partition. + +Keep in mind that only one memory parameter (`--mem-per-cpu` =\<MB> or +\<span>--mem=\</span>\<MB>**)**can be specified + +## F.A.Q + +-- Main.AndreiPolitov - 2019-08-12 + +- [example_MNIST_Pytorch.zip](%ATTACHURL%/example_MNIST_Pytorch.zip): + example_MNIST_Pytorch.zip +- [example_Pytorch_image_recognition.zip](%ATTACHURL%/example_Pytorch_image_recognition.zip): + example_Pytorch_image_recognition.zip +- [example_PyTorch_parallel.zip](%ATTACHURL%/example_PyTorch_parallel.zip): + example_PyTorch_parallel.zip \<div id="gtx-anchor" style="position: + absolute; visibility: hidden; left: 46.4875px; top: 3470.69px; + width: 353.613px; height: 14.4px;"> \</div> diff --git a/twiki2md/root/HPCDA/Python.md b/twiki2md/root/HPCDA/Python.md new file mode 100644 index 000000000..7e1d4743e --- /dev/null +++ b/twiki2md/root/HPCDA/Python.md @@ -0,0 +1,279 @@ +# Python for Data Analytics + + + +Python is a high-level interpreted language widely used in research and +science. Using HPC allows you to work with python quicker and more +effective. Taurus allows working with a lot of available packages and +libraries which give more useful functionalities and allow use all +features of Python and to avoid minuses. + +**Prerequisites:** To work with Python, you obviously need \<a +href="Login" target="\_blank">access\</a> for the Taurus system and +basic knowledge about Python, SLURM system. + +**Aim** of this page is to introduce users on how to start working with +Python on the \<a href="HPCDA" target="\_self">HPC-DA\</a> system - part +of the TU Dresden HPC system. + +\<span style="font-size: 1em;">There are three main options on how to +work with Keras and Tensorflow on the HPC-DA: 1. Modules; 2. \</span> +[JupyterNotebook](JupyterHub)\<span style="font-size: 1em;">; 3. +\</span> [Containers](TensorFlowContainerOnHPCDA)\<span +style="font-size: 1em;">. The main way is using the \</span> [Modules +system](RuntimeEnvironment#Module_Environments)\<span style="font-size: +1em;"> and Python virtual environment.\</span> + +You could work with simple examples in your home directory but according +to \<a href="HPCStorageConcept2019" target="\_blank">the storage +concept\</a>** please use \<a href="WorkSpaces" +target="\_blank">workspaces\</a> for your study and work projects**. + +## Virtual environment + +There are two methods of how to work with virtual environments on +Taurus: + +1.** Vitualenv** is a standard Python tool to create isolated Python +environments. It is the preferred interface for managing installations +and virtual environments on Taurus and part of the Python modules. + +2\. **Conda** is an alternative method for managing installations and +virtual environments on Taurus. Conda is an open-source package +management system and environment management system from Anaconda. 
The +conda manager is included in all versions of Anaconda and Miniconda. + +**Note:** Keep in mind that you **cannot** use virtualenv for working +with the virtual environments previously created with conda tool and +vice versa! Prefer virtualenv whenever possible. + +\<span style="font-size: 1em;">This example shows how to start working +with \</span> **Virtualenv** \<span style="font-size: 1em;"> and Python +virtual environment (using the module system) \</span> + + srun -p ml -N 1 -n 1 -c 7 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash #Job submission in ml nodes with 1 gpu on 1 node. + + mkdir python-environments # Optional: Create folder. Please use Workspaces!<br /><br />module load modenv/ml #Changing the environment. Example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml<br />ml av Python #Check the available modules with Python + module load Python #Load default Python. Example output: Module Python/3.7.4-GCCcore-8.3.0 with 7 dependencies loaded + which python #Check which python are you using + virtualenv --system-site-packages python-environments/envtest #Create virtual environment + source python-environments/envtest/bin/activate #Activate virtual environment. Example output: (envtest) bash-4.2$ + python #Start python + from time import gmtime, strftime + print(strftime("%Y-%m-%d %H:%M:%S", gmtime())) #Example output: 2019-11-18 13:54:16<br /><br />deactivate # Leave the virtual environment + +The \<a href="<https://virtualenv.pypa.io/en/latest/>" title="Creation +of virtual environments.">virtualenv\</a> Python module (Python 3) +provides support for creating virtual environments with their own site +directories, optionally isolated from system site directories. Each +virtual environment has its own Python binary (which matches the version +of the binary that was used to create this environment) and can have its +own independent set of installed Python packages in its site +directories. This allows you to manage separate package installations +for different projects. It essentially allows us to create a virtual +isolated Python installation and install packages into that virtual +installation. When you switch projects, you can simply create a new +virtual environment and not have to worry about breaking the packages +installed in other environments. + +In your virtual environment, you can use packages from the [Complete +List of Modules](SoftwareModulesList) or if you didn't find what you +need you can install required packages with the command: \<span>pip +install\</span>. With the command \<span>pip freeze\</span>, you can see +a list of all installed packages and their versions. + +This example shows how to start working with **Conda** and virtual +environment (with using module system) + + srun -p ml -N 1 -n 1 -c 7 --mem-per-cpu=5772 --gres=gpu:1 --time=04:00:00 --pty bash # Job submission in ml nodes with 1 gpu on 1 node. 
+ + module load modenv/ml + mkdir conda-virtual-environments #create a folder + cd conda-virtual-environments #go to folder + which python #check which python are you using + module load PythonAnaconda/3.6 #load Anaconda module + which python #check which python are you using now + + conda create -n conda-testenv python=3.6 #create virtual environment with the name conda-testenv and Python version 3.6 + conda activate conda-testenv #activate conda-testenv virtual environment + + conda deactivate #Leave the virtual environment + +\<span style="font-size: 1em;">You can control where a conda environment +lives by providing a path to a target directory when creating the +environment. For example, the following command will create a new +environment in a workspace located in '\<span>scratch\</span>'\</span> + + conda create --prefix /scratch/ws/<name_of_your_workspace>/conda-virtual-environment/<name_of_your_environment> + +%RED%Please pay attention<span class="twiki-macro ENDCOLOR"></span>, +using srun directly on the shell will lead to blocking and launch an +interactive job. Apart from short test runs, it is **recommended to +launch your jobs into the background by using \<a href="Slurm" +target="\_blank">batch jobs\</a>**. For that, you can conveniently put +the parameters directly into the job file which you can submit using +`sbatch [options] <job file>.` + +\<span style="color: #222222; font-size: 1.385em;">Jupyter +Notebooks\</span> + +Jupyter notebooks are a great way for interactive computing in your web +browser. Jupyter allows working with data cleaning and transformation, +numerical simulation, statistical modelling, data visualization and of +course with machine learning. + +There are two general options on how to work Jupyter notebooks using +HPC. + +\<span style="font-size: 1em;">On Taurus, there is \</span>\<a +href="JupyterHub" target="\_self">jupyterhub\</a>\<span +style="font-size: 1em;">, where you can simply run your Jupyter notebook +on HPC nodes. Also, you can run a remote jupyter server within a sbatch +GPU job and with the modules and packages you need. The manual server +setup you can find \</span>\<a href="DeepLearning" +target="\_blank">here.\</a> + +\<span style="font-size: 1em;">With Jupyterhub you can work with general +data analytics tools. This is the recommended way to start working with +the Taurus. However, some special instruments could not be available on +the Jupyterhub. \</span>\<span style="font-size: 1em;">Keep in mind that +the remote Jupyter server can offer more freedom with settings and +approaches.\</span> + +## MPI for Python + +Message Passing Interface (MPI) is a standardized and portable +message-passing standard designed to function on a wide variety of +parallel computing architectures. The Message Passing Interface (MPI) is +a library specification that allows HPC to pass information between its +various nodes and clusters. MPI designed to provide access to advanced +parallel hardware for end-users, library writers and tool developers. + +#### Why use MPI? + +MPI provides a powerful, efficient and portable way to express parallel +programs. \<span style="font-size: 1em;">Among many parallel +computational models, message-passing has proven to be an effective +one.\</span> + +### Parallel Python with mpi4py + +Mpi4py(MPI for Python) package provides bindings of the MPI standard for +the python programming language, allowing any Python program to exploit +multiple processors. + +#### Why use mpi4py? + +Mpi4py based on MPI-2 C++ bindings. 
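As a brief illustration of this message-passing style, the following hedged sketch exchanges a NumPy array between two ranks. The array size, the tag, and the way the script is launched (e.g. `srun -n 2 python <script>.py` inside an allocation, or `mpirun -np 2 python <script>.py`) are assumptions for demonstration only.

    # Minimal mpi4py sketch: exchange a NumPy array between two ranks.
    # Requires at least two MPI ranks to run.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        data = np.arange(10, dtype='d')     # buffer to send
        comm.Send(data, dest=1, tag=42)     # buffer-based send, no pickling of the array
        print("rank 0 sent", data)
    elif rank == 1:
        data = np.empty(10, dtype='d')      # pre-allocated receive buffer
        comm.Recv(data, source=0, tag=42)
        print("rank 1 received", data)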
mpi4py supports almost all MPI calls. This implementation is popular on Linux clusters and in the SciPy community. Operations are primarily methods of communicator objects. It supports communication of picklable Python objects and, as in the sketch above, provides optimized communication of NumPy arrays.

mpi4py is included as an extension of the SciPy-bundle modules on Taurus.

Please check the [software module list](SoftwareModulesList) for the current modules. Whether mpi4py is included in a module can be checked with the `module whatis <name_of_the_module>` command, which displays short information about the module and its included extensions.

Moreover, it is possible to install mpi4py in your local conda environment:

    srun -p ml --time=04:00:00 -n 1 --pty --mem-per-cpu=8000 bash #allocate resources
    module load modenv/ml
    module load PythonAnaconda/3.6 #load module to use conda
    conda create --prefix=<location_for_your_environment> python=3.6 anaconda #create conda virtual environment

    conda activate <location_for_your_environment> #activate your virtual environment

    conda install -c conda-forge mpi4py #install mpi4py

    python #start python

    from mpi4py import MPI #verify your mpi4py
    comm = MPI.COMM_WORLD
    print("%d of %d" % (comm.Get_rank(), comm.Get_size()))

### Horovod

[Horovod](https://github.com/horovod/horovod) is an open-source distributed training framework for TensorFlow, Keras and PyTorch. It is meant to make it easy to develop distributed deep learning projects and to speed them up.

#### Why use Horovod?

Horovod allows you to take a single-GPU TensorFlow or PyTorch program and train it on many GPUs [faster](https://eng.uber.com/horovod/). In some cases, the MPI model is much more straightforward and requires far fewer code changes than, for instance, TensorFlow's own distributed code with parameter servers. Horovod uses MPI and NCCL, which in some cases gives better results than pure TensorFlow or PyTorch.

#### Horovod as a module

Horovod is available as a module with **TensorFlow** or **PyTorch** for **all** module environments. Please check the [software module list](SoftwareModulesList) for the current version of the software. Horovod can be loaded like other software on Taurus:

    ml av Horovod #Check the available Horovod modules
    module load Horovod #Load the module

#### Horovod installation

However, if it is necessary to use Horovod with **PyTorch** or to use another version of Horovod, it is possible to install it manually. To install Horovod you need to create a virtual environment and load the dependencies (e.g. MPI). Installing PyTorch from source can take a few hours and is not recommended.

**Note:** You could work with simple examples in your home directory, but **please use [workspaces](WorkSpaces) for your study and work projects** (see the [storage concept](HPCStorageConcept2019)).
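For orientation, the following hedged sketch shows how a PyTorch training script typically uses Horovod once it is available (via the module above or a manual installation). The model, learning rate, loop and training data are placeholders, not part of Horovod or of this documentation.

    # Typical Horovod additions to a PyTorch training script (sketch only).
    # The model and the random data are placeholders for your own objects.
    import torch
    import horovod.torch as hvd

    hvd.init()                                      # initialize Horovod
    torch.cuda.set_device(hvd.local_rank())         # pin each process to one GPU

    model = torch.nn.Linear(10, 1).cuda()           # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Wrap the optimizer so gradients are averaged across all processes,
    # and make sure every process starts from the same initial state.
    optimizer = hvd.DistributedOptimizer(optimizer,
                                         named_parameters=model.named_parameters())
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for _ in range(5):                              # toy loop on random data
        optimizer.zero_grad()
        loss = model(torch.randn(32, 10).cuda()).sum()
        loss.backward()
        optimizer.step()

Such a script is then started with one process per GPU, e.g. via `horovodrun -np <N> python <script>.py` or from within an `srun`/`sbatch` allocation. The environment for a manual installation is prepared as follows.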
+ +Setup: + + srun -N 1 --ntasks-per-node=6 -p ml --time=08:00:00 --pty bash #allocate a Slurm job allocation, which is a set of resources (nodes)<br /><br />module load modenv/ml #Load dependencies by using modules <br />module load OpenMPI/3.1.4-gcccuda-2018b<br />module load Python/3.6.6-fosscuda-2018b<br />module load cuDNN/7.1.4.18-fosscuda-2018b<br />module load CMake/3.11.4-GCCcore-7.3.0<br /><br />virtualenv --system-site-packages <location_for_your_environment> #create virtual environment<br /><br />source <location_for_your_environment>/bin/activate #activate virtual environment + +Or when you need to use conda: \<br /> + + srun -N 1 --ntasks-per-node=6 -p ml --time=08:00:00 --pty bash #allocate a Slurm job allocation, which is a set of resources (nodes)<br /><br />module load modenv/ml #Load dependencies by using modules <br />module load OpenMPI/3.1.4-gcccuda-2018b<br />module load PythonAnaconda/3.6<br />module load cuDNN/7.1.4.18-fosscuda-2018b<br />module load CMake/3.11.4-GCCcore-7.3.0<br /><br />conda create --prefix=<location_for_your_environment> python=3.6 anaconda #create virtual environment<br /><br />conda activate <location_for_your_environment> #activate virtual environment + +Install Pytorch (not recommended)\<br /> + + cd /tmp<br />git clone https://github.com/pytorch/pytorch #clone Pytorch from the source<br />cd pytorch #go to folder<br />git checkout v1.7.1 #Checkout version (example: 1.7.1)<br />git submodule update --init #Update dependencies<br />python setup.py install #install it with python<br /><br />cd - + +##### Install Horovod for Pytorch with python and pip + +In the example presented installation for the Pytorch without +TensorFlow. Adapt as required and refer to the horovod documentation for +details. + + HOROVOD_GPU_ALLREDUCE=MPI HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 pip install --no-cache-dir horovod + +##### Verify that Horovod works + + python #start python + import torch #import pytorch + import horovod.torch as hvd #import horovod + hvd.init() #initialize horovod + hvd.size() + hvd.rank() + print('Hello from:', hvd.rank()) + +##### Horovod with NCCL + +If you want to use NCCL instead of MPI you can specify that in the +install command after loading the NCCL module: + + module load NCCL/2.3.7-fosscuda-2018b<br />HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 pip install --no-cache-dir horovod + +\<div id="gtx-trans" style="position: absolute; left: 386px; top: +2567.99px;"> \</div> diff --git a/twiki2md/root/HPCDA/TensorFlow.md b/twiki2md/root/HPCDA/TensorFlow.md new file mode 100644 index 000000000..7dca1c9cd --- /dev/null +++ b/twiki2md/root/HPCDA/TensorFlow.md @@ -0,0 +1,277 @@ +# TensorFlow + + + +\<span style="color: #222222; font-size: 1.385em;">Introduction\</span> + +This is an introduction of how to start working with TensorFlow and run +machine learning applications on the [HPC-DA](HPCDA) system of Taurus. + +\<span style="font-size: 1em;">On the machine learning nodes (machine +learning partition), you can use the tools from\</span> [IBM Power +AI](PowerAI)\<span style="font-size: 1em;"> or the other +modules.\</span> \<span style="font-size: 1em;">PowerAI is an enterprise +software distribution that combines popular open-source deep learning +frameworks, efficient AI development tools (Tensorflow, Caffe, etc). 
For this page and its examples, [PowerAI version 1.5.4](https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.5.4/navigation/pai_software_pkgs.html) was used.

[TensorFlow](https://www.tensorflow.org/guide/) is a free, end-to-end open-source software library for dataflow and differentiable programming across many tasks. It is a symbolic math library, used primarily for machine learning applications. It has a comprehensive, flexible ecosystem of tools, libraries and community resources. It is available on Taurus along with other common machine learning packages like Pillow, SciPy and NumPy.

**Prerequisites:** To work with TensorFlow on Taurus, you need [access](Login) to the Taurus system and basic knowledge about Python and the SLURM system.

**Aim** of this page is to introduce users on how to start working with TensorFlow on the [HPC-DA](HPCDA) system - part of the TU Dresden HPC system.

There are three main options on how to work with TensorFlow on the HPC-DA: **1. Modules, 2. Jupyter Notebook, 3. Containers**. The main way is using the [module system](RuntimeEnvironment#Module_Environments) and a Python virtual environment. Please see the next chapters and the [Python page](Python) for the HPC-DA system.

Information about the Jupyter notebook and **JupyterHub** can be found [here](JupyterHub). The use of containers is described [here](TensorFlowContainerOnHPCDA).

On Taurus, there exist different module environments, each containing a set of software modules. The default is `modenv/scs5`, which is already loaded. However, for the HPC-DA system using the `ml` partition you need to use `modenv/ml`. To find out which module environment is currently loaded, use `ml list`. You can change the module environment with the command:

    module load modenv/ml

The machine learning partition is based on the PowerPC architecture (ppc64le, Power9 processors), which means that software built for x86_64 will not work on this partition, so you most likely cannot use your already locally installed packages on Taurus. Also, users need to use the modules which are specially made for the ml partition (from `modenv/ml`) and not those for the rest of Taurus (e.g. from `modenv/scs5`).

Each node on the ml partition has 6 Tesla V100 GPUs, with 176 parallel threads on 44 cores per node (simultaneous multithreading (SMT) enabled) and 256 GB RAM. The specification can be found [here](Power9).

**Note:** Users should not reserve more than 28 threads per GPU device, so that other users on the same node still have enough CPUs left for their computations.

## Get started with TensorFlow

This example shows how to install and start working with TensorFlow (using the module system) and a Python virtual environment. Please check the next chapter for details about the virtual environment.

    srun -p ml --gres=gpu:1 -n 1 -c 7 --pty --mem-per-cpu=8000 bash #Job submission on ml nodes with 1 GPU on 1 node with 8000 MB per CPU.

    module load modenv/ml #example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml

    mkdir python-environments #create folder
    module load TensorFlow #load TensorFlow module. Example output: Module TensorFlow/1.10.0-PythonAnaconda-3.6 and 1 dependency loaded.
    which python #check which python you are using
    virtualenv --system-site-packages python-environments/env #create virtual environment "env" which inherits the global site packages
    source python-environments/env/bin/activate #activate virtual environment "env". Example output: (env) bash-4.2$
    python #start python
    import tensorflow as tf
    print(tf.VERSION) #example output: 1.10.0

Keep in mind that using **srun** directly on the shell will block and launch an interactive job. Apart from short test runs, it is recommended to launch your jobs in the background by using batch jobs: `sbatch [options] <job file>`. An example will be presented later on this page.

As a TensorFlow example, we will use a [simple MNIST model](https://www.tensorflow.org/tutorials). Even though this example is in Python, the information here still applies to other tools.

The ml partition offers very powerful GPUs. Do not assume that more power automatically means faster computation. The GPU is only one part of a typical machine learning application. Do not forget that the input data first needs to be loaded and in most cases even rescaled or augmented. If you do not specify that you want to use more than the default single worker (= one CPU thread), it is very likely that your GPU computes faster than it receives the input data. You may therefore not be any faster than on other GPU partitions. You can solve this by using multithreading when loading your input data. The [fit_generator](https://keras.io/models/sequential/#fit_generator) method supports multiprocessing: just set `use_multiprocessing` to `True`, [request more threads](Slurm#Job_Submission) from SLURM and set the `workers` amount accordingly.

The example below, a Python script with a [simple MNIST model](https://www.tensorflow.org/tutorials), illustrates the use of the TF-Keras API from TensorFlow. [Keras](https://www.tensorflow.org/guide/keras) is TensorFlow's high-level API.

**You can read in detail how to work with Keras on Taurus [here](Keras).**

    import tensorflow as tf
    # Load and prepare the MNIST dataset. Convert the samples from integers to floating-point numbers:
    mnist = tf.keras.datasets.mnist

    (x_train, y_train),(x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Build the tf.keras model by stacking layers.
Select an optimizer and loss function used for training + model = tf.keras.models.Sequential([ + tf.keras.layers.Flatten(input_shape=(28, 28)), + tf.keras.layers.Dense(512, activation=tf.nn.relu), + tf.keras.layers.Dropout(0.2), + tf.keras.layers.Dense(10, activation=tf.nn.softmax) + ]) + model.compile(optimizer='adam', + loss='sparse_categorical_crossentropy', + metrics=['accuracy']) + + # Train and evaluate model + model.fit(x_train, y_train, epochs=5) + model.evaluate(x_test, y_test) + +The example can train an image classifier with \~98% accuracy based on +this dataset. + +## Python virtual environment + +A virtual environment is a cooperatively isolated runtime environment +that allows Python users and applications to install and update Python +distribution packages without interfering with the behaviour of other +Python applications running on the same system. At its core, the main +purpose of Python virtual environments is to create an isolated +environment for Python projects. + +**Vitualenv**is a standard Python tool to create isolated Python +environments and part of the Python installation/module. We recommend +using virtualenv to work with Tensorflow and Pytorch on Taurus.\<br +/>However, if you have reasons (previously created environments etc) you +can also use conda which is the second way to use a virtual environment +on the Taurus. \<a +href="<https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>" +target="\_blank">Conda\</a> is an open-source package management system +and environment management system. Note that using conda means that +working with other modules from taurus will be harder or impossible. +Hence it is highly recommended to use virtualenv. + +## Running the sbatch script on ML modules (modenv/ml) and SCS5 modules (modenv/scs5) + +Generally, for machine learning purposes the ml partition is used but +for some special issues, the other partitions can be useful also. The +following sbatch script can execute the above Python script both on ml +partition or gpu2 partition.\<br /> When not using the +TensorFlow-Anaconda modules you may need some additional modules that +are not included (e.g. when using the TensorFlow module from modenv/scs5 +on gpu2).\<br />If you have a question about the sbatch script see the +article about \<a href="Slurm" target="\_blank">SLURM\</a>. Keep in mind +that you need to put the executable file (machine_learning_example.py) +with python code to the same folder as the bash script file +\<script_name>.sh (see below) or specify the path. + + #!/bin/bash + #SBATCH --mem=8GB # specify the needed memory + #SBATCH -p ml # specify ml partition or gpu2 partition + #SBATCH --gres=gpu:1 # use 1 GPU per node (i.e. 
use one GPU per task) + #SBATCH --nodes=1 # request 1 node + #SBATCH --time=00:10:00 # runs for 10 minutes + #SBATCH -c 7 # how many cores per task allocated + #SBATCH -o HLR_<name_your_script>.out # save output message under HLR_${SLURMJOBID}.out + #SBATCH -e HLR_<name_your_script>.err # save error messages under HLR_${SLURMJOBID}.err + + if [ "$SLURM_JOB_PARTITION" == "ml" ]; then + module load modenv/ml + module load TensorFlow/2.0.0-PythonAnaconda-3.7 + else + module load modenv/scs5 + module load TensorFlow/2.0.0-fosscuda-2019b-Python-3.7.4 + module load Pillow/6.2.1-GCCcore-8.3.0 # Optional + module load h5py/2.10.0-fosscuda-2019b-Python-3.7.4 # Optional + fi + + python machine_learning_example.py + + ## when finished writing, submit with: sbatch <script_name> + +Output results and errors file can be seen in the same folder in the +corresponding files after the end of the job. Part of the example +output: + + 1600/10000 [===>..........................] - ETA: 0s + 3168/10000 [========>.....................] - ETA: 0s + 4736/10000 [=============>................] - ETA: 0s + 6304/10000 [=================>............] - ETA: 0s + 7872/10000 [======================>.......] - ETA: 0s + 9440/10000 [===========================>..] - ETA: 0s + 10000/10000 [==============================] - 0s 38us/step + +## TensorFlow 2 + +[TensorFlow +2.0](https://blog.tensorflow.org/2019/09/tensorflow-20-is-now-available.html) +is a significant milestone for TensorFlow and the community. There are +multiple important changes for users. TensorFlow 2.0 removes redundant +APIs, makes APIs more consistent (Unified RNNs, Unified Optimizers), and +better integrates with the Python runtime with Eager execution. Also, +TensorFlow 2.0 offers many performance improvements on GPUs. + +There are a number of TensorFlow 2 modules for both ml and scs5 modenvs +on Taurus. Please check\<a href="SoftwareModulesList" target="\_blank"> +the software modules list\</a> for the information about available +modules or use + + module spider TensorFlow + +%RED%Note:<span class="twiki-macro ENDCOLOR"></span> Tensorflow 2 will +be loaded by default when loading the Tensorflow module without +specifying the version. + +\<span style="font-size: 1em;">TensorFlow 2.0 includes many API changes, +such as reordering arguments, renaming symbols, and changing default +values for parameters. Thus in some cases, it makes code written for the +TensorFlow 1 not compatible with TensorFlow 2. However, If you are using +the high-level APIs (tf.keras) there may be little or no action you need +to take to make your code fully TensorFlow 2.0 \<a +href="<https://www.tensorflow.org/guide/migrate>" +target="\_blank">compatible\</a>. It is still possible to run 1.X code, +unmodified ( [except for +contrib](https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md)), +in TensorFlow 2.0:\</span> + + import tensorflow.compat.v1 as tf + tf.disable_v2_behavior() #instead of "import tensorflow as tf" + +To make the transition to TF 2.0 as seamless as possible, the TensorFlow +team has created the +[`tf_upgrade_v2`](https://www.tensorflow.org/guide/upgrade) utility to +help transition legacy code to the new API. + +## FAQ: + +Q: Which module environment should I use? modenv/ml, modenv/scs5, +modenv/hiera + +A: On the ml partition use modenv/ml, on rome and gpu3 use modenv/hiera, +else stay with the default of modenv/scs5. + +Q: How to change the module environment and know more about modules? 
+ +A: +[https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/RuntimeEnvironment#Modules](RuntimeEnvironment#Modules) diff --git a/twiki2md/root/HPCDA/TensorFlowContainerOnHPCDA.md b/twiki2md/root/HPCDA/TensorFlowContainerOnHPCDA.md new file mode 100644 index 000000000..b1cf0b39a --- /dev/null +++ b/twiki2md/root/HPCDA/TensorFlowContainerOnHPCDA.md @@ -0,0 +1,85 @@ +# Container on HPC-DA (TensorFlow, PyTorch) + +<span class="twiki-macro RED"></span> **Note: This page is under +construction** <span class="twiki-macro ENDCOLOR"></span> + +\<span style="font-size: 1em;">A container is a standard unit of +software that packages up code and all its dependencies so the +application runs quickly and reliably from one computing environment to +another.\</span> + +**Prerequisites:** To work with Tensorflow, you need \<a href="Login" +target="\_blank">access\</a> for the Taurus system and basic knowledge +about containers, Linux systems. + +**Aim** of this page is to introduce users on how to use Machine +Learning Frameworks such as TensorFlow or PyTorch on the \<a +href="HPCDA" target="\_self">HPC-DA\</a> system - part of the TU Dresden +HPC system. + +Using a container is one of the options to use Machine learning +workflows on Taurus. Using containers gives you more flexibility working +with modules and software but at the same time required more effort. + +\<span style="font-size: 1em;">On Taurus \</span>\<a +href="<https://sylabs.io/>" target="\_blank">Singularity\</a>\<span +style="font-size: 1em;"> used as a standard container solution. +Singularity enables users to have full control of their environment. +Singularity containers can be used to package entire scientific +workflows, software and libraries, and even data. This means that +\</span>**you dont have to ask an HPC support to install anything for +you - you can put it in a Singularity container and run!**\<span +style="font-size: 1em;">As opposed to Docker (the most famous container +solution), Singularity is much more suited to being used in an HPC +environment and more efficient in many cases. Docker containers also can +easily be used in Singularity.\</span> + +Future information is relevant for the HPC-DA system (ML partition) +based on Power9 architecture. + +In some cases using Singularity requires a Linux machine with root +privileges, the same architecture and a compatible kernel. For many +reasons, users on Taurus cannot be granted root permissions. A solution +is a Virtual Machine (VM) on the ml partition which allows users to gain +root permissions in an isolated environment. There are two main options +on how to work with VM on Taurus: + +1\. [VM tools](VMTools). Automative algorithms for using virtual +machines; + +2\. [Manual method](Cloud). It required more operations but gives you +more flexibility and reliability. + +Short algorithm to run the virtual machine manually: + + srun -p ml -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash<br />cat ~/.cloud_$SLURM_JOB_ID #Example output: ssh root@192.168.0.1<br />ssh root@192.168.0.1 #Copy and paste output from the previous command <br />./mount_host_data.sh + +with VMtools: + +VMtools contains two main programs: +**\<span>buildSingularityImage\</span>** and +**\<span>startInVM.\</span>** + +Main options on how to create a container on ML nodes: + +1\. Create a container from the definition + +1.1 Create a Singularity definition from the Dockerfile. + +\<span style="font-size: 1em;">2. 
Importing container from the \</span> +[DockerHub](https://hub.docker.com/search?q=ppc64le&type=image&page=1)\<span +style="font-size: 1em;"> or \</span> +[SingularityHub](https://singularity-hub.org/) + +Two main sources for the Tensorflow containers for the Power9 +architecture: + +<https://hub.docker.com/r/ibmcom/tensorflow-ppc64le> + +<https://hub.docker.com/r/ibmcom/powerai> + +Pytorch: + +<https://hub.docker.com/r/ibmcom/powerai> + +-- Main.AndreiPolitov - 2020-01-03 diff --git a/twiki2md/root/HPCDA/TensorFlowOnJupyterNotebook.md b/twiki2md/root/HPCDA/TensorFlowOnJupyterNotebook.md new file mode 100644 index 000000000..7beed6fb7 --- /dev/null +++ b/twiki2md/root/HPCDA/TensorFlowOnJupyterNotebook.md @@ -0,0 +1,286 @@ +# TENSORFLOW ON JUPYTER NOTEBOOK + +%RED%Note: This page is under construction<span +class="twiki-macro ENDCOLOR"></span> + + + +Disclaimer: This page dedicates a specific question. For more general +questions please check the JupyterHub webpage. + +The Jupyter Notebook is an open-source web application that allows you +to create documents that contain live code, equations, visualizations, +and narrative text. \<span style="font-size: 1em;">Jupyter notebook +allows working with TensorFlow on Taurus with GUI (graphic user +interface) and the opportunity to see intermediate results step by step +of your work. This can be useful for users who dont have huge experience +with HPC or Linux. \</span> + +**Prerequisites:** To work with Tensorflow and jupyter notebook you need +\<a href="Login" target="\_blank">access\</a> for the Taurus system and +basic knowledge about Python, SLURM system and the Jupyter notebook. + +\<span style="font-size: 1em;"> **This page aims** to introduce users on +how to start working with TensorFlow on the \</span>\<a href="HPCDA" +target="\_self">HPC-DA\</a>\<span style="font-size: 1em;"> system - part +of the TU Dresden HPC system with a graphical interface.\</span> + +## Get started with Jupyter notebook + +Jupyter notebooks are a great way for interactive computing in your web +browser. Jupyter allows working with data cleaning and transformation, +numerical simulation, statistical modelling, data visualization and of +course with machine learning. + +\<span style="font-size: 1em;">There are two general options on how to +work Jupyter notebooks using HPC. \</span> + +- \<span style="font-size: 1em;">There is \</span>**\<a + href="JupyterHub" target="\_self">jupyterhub\</a>** on Taurus, where + you can simply run your Jupyter notebook on HPC nodes. JupyterHub is + available here: \<a + href="<https://taurus.hrsk.tu-dresden.de/jupyter>" + target="\_top"><https://taurus.hrsk.tu-dresden.de/jupyter>\</a> +- For more specific cases you can run a manually created **remote + jupyter server.** \<span style="font-size: 1em;"> You can find the + manual server setup \</span>\<a href="DeepLearning" + target="\_blank">here.\</a> + +\<span style="font-size: 13px;">Keep in mind that with Jupyterhub you +can't work with some special instruments. However general data analytics +tools are available. 
Still and all, the simplest option for beginners is +using JupyterHub.\</span> + +## Virtual environment + +\<span style="font-size: 1em;">For working with TensorFlow and python +packages using virtual environments (kernels) is necessary.\</span> + +Interactive code interpreters that are used by Jupyter Notebooks are +called kernels.\<br />Creating and using your kernel (environment) has +the benefit that you can install your preferred python packages and use +them in your notebooks. + +A virtual environment is a cooperatively isolated runtime environment +that allows Python users and applications to install and upgrade Python +distribution packages without interfering with the behaviour of other +Python applications running on the same system. So the [Virtual +environment](https://docs.python.org/3/glossary.html#term-virtual-environment) +is a self-contained directory tree that contains a Python installation +for a particular version of Python, plus several additional packages. At +its core, the main purpose of Python virtual environments is to create +an isolated environment for Python projects. \<span style="font-size: +1em;">Python virtual environment is the main method to work with Deep +Learning software as TensorFlow on the \</span>\<a href="HPCDA" +target="\_self">HPC-DA\</a>\<span style="font-size: 1em;"> +system.\</span> + +### Conda and Virtualenv + +There are two methods of how to work with virtual environments on +Taurus.\<br />**Vitualenv (venv)**\<span style="font-size: 1em;"> is a +standard Python tool to create isolated Python environments. We +recommend using venv to work with Tensorflow and Pytorch on Taurus. It +has been integrated into the standard library under the \</span>\<a +href="<https://docs.python.org/3/library/venv.html>" +target="\_blank">venv module\</a>\<span style="font-size: 1em;">. +However, if you have reasons (previously created environments etc) you +could easily use conda. The conda is the second way to use a virtual +environment on the Taurus. \</span>\<a +href="<https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>" +target="\_blank">Conda\</a>\<span style="font-size: 1em;"> is an +open-source package management system and environment management system +from the Anaconda.\</span> + +**Note:** Keep in mind that you **can not** use conda for working with +the virtual environments previously created with Vitualenv tool and vice +versa! + +This example shows how to start working with environments and prepare +environment (kernel) for working with Jupyter server + + srun -p ml --gres=gpu:1 -n 1 --pty --mem-per-cpu=8000 bash #Job submission in ml nodes with 1 gpu on 1 node with 8000 mb. + + module load modenv/ml #example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml + + mkdir python-virtual-environments #create folder for your environments + cd python-virtual-environments #go to folder + module load TensorFlow #load TensorFlow module. Example output: Module TensorFlow/1.10.0-PythonAnaconda-3.6 and 1 dependency loaded. + which python #check which python are you using + python3 -m venv --system-site-packages env #create virtual environment "env" which inheriting with global site packages + source env/bin/activate #Activate virtual environment "env". Example output: (env) bash-4.2$ + module load TensorFlow #load TensorFlow module in the virtual environment + +The inscription (env) at the beginning of each line represents that now +you are in the virtual environment. 
+ +Now you can check the working capacity of the current environment. + + python #start python + import tensorflow as tf + print(tf.VERSION) #example output: 1.14.0 + +### Install Ipykernel + +Ipykernel is an interactive Python shell and a Jupyter kernel to work +with Python code in Jupyter notebooks. \<span style="font-size: +1em;">The IPython kernel is the Python execution backend for Jupyter. +\</span>\<span style="font-size: 1em;">The Jupyter Notebook +automatically ensures that the IPython kernel is available.\</span> + + (env) bash-4.2$ pip install ipykernel #example output: Collecting ipykernel + ... + #example output: Successfully installed ... ipykernel-5.1.0 ipython-7.5.0 ... + + (env) bash-4.2$ python -m ipykernel install --user --name env --display-name="env" + + #example output: Installed kernelspec my-kernel in .../.local/share/jupyter/kernels/env + [install now additional packages for your notebooks] + +Deactivate the virtual environment + + (env) bash-4.2$ deactivate + +So now you have a virtual environment with included TensorFlow module. +You can use this workflow for your purposes particularly for the simple +running of your jupyter notebook with Tensorflow code. + +## Examples and running the model + +Below are brief explanations examples of Jupyter notebooks with +Tensorflow models which you can run on ml nodes of HPC-DA. Prepared +examples of TensorFlow models give you an understanding of how to work +with jupyterhub and tensorflow models. It can be useful and instructive +to start your acquaintance with Tensorflow and HPC-DA system from these +simple examples. + +You can use a [remote Jupyter server](DeepLearning) or \<a +href="JupyterHub" target="\_blank">Jupyterhub\</a>. For simplicity, we +will recommend using Jupyterhub for our examples. + +JupyterHub is available here: \<a +href="<https://taurus.hrsk.tu-dresden.de/jupyter>" +target="\_top"><https://taurus.hrsk.tu-dresden.de/jupyter>\</a> + +Please check updates and details \<a href="JupyterHub" +target="\_blank">JupyterHub page\</a>. However, the general pipeline can +be briefly explained as follows. + +After logging, you can start a new session and configure it. There are +simple and advanced forms to set up your session. On the simple form, +you have to choose the "IBM Power (ppc64le)" architecture. You can +select the required number of CPUs and GPUs. For the acquaintance with +the system through the examples below the recommended amount of CPUs and +1 GPU will be enough. With the advanced form, you can use the +configuration with 1 GPU and 7 CPUs. To access for all your workspaces +use " / " in the workspace scope. + +You need to download the file with a jupyter notebook that already +contains all you need for the start of the work. Please put the file +into your previously created virtual environment in your working +directory or use the kernel for your notebook. + +\<span style="font-size: 1em;">Note: You could work with simple examples +in your home directory but according to \</span>\<a +href="HPCStorageConcept2019" target="\_blank">New storage +concept\</a>**\<span style="font-size: 1em;"> please use \</span>\<a +href="WorkSpaces" target="\_blank">workspaces\</a>**\<span +style="font-size: 1em;">** for your study and work projects**. 
For this +reason, you have to use advanced options and put "/" in "Workspace +scope" field.\</span> + +To download the first example (from the list below) into your previously +created virtual environment you could use the following command: + + ws_list + cd <name_of_your_workspace> #go to workspace + + wget https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/TensorFlowOnJupyterNotebook/Mnistmodel.zip + unzip Example_TensorFlow_Automobileset.zip + +\<span style="font-size: 1em;">Also, you could use kernels for all +notebooks, not only for them which placed in your virtual environment. +See the \</span>\<a href="JupyterHub" +target="\_blank">jupyterhub\</a>\<span style="font-size: 1em;"> +page.\</span> + +Examples: + +1\. Simple MNIST model. The MNIST database is a large database of +handwritten digits that is commonly used for \<a +href="<https://en.wikipedia.org/wiki/Training_set>" title="Training +set">t\</a>raining various image processing systems. This model +illustrates using TF-Keras API. \<a +href="<https://www.tensorflow.org/guide/keras>" +target="\_top">Keras\</a> is TensorFlow's high-level API. Tensorflow and +Keras allow us to import and download the MNIST dataset directly from +their API. Recommended parameters for running this model is 1 GPU and 7 +cores (28 thread) + +[https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/TensorFlowOnJupyterNotebook/Mnistmodel.zip](%ATTACHURL%/Mnistmodel.zip) + +#### + +\<span style="color: #222222; font-size: 1.154em;">Running the +model\</span> + +\<span style="font-size: 1em;">Documents are organized with tabs and a +very versatile split-screen feature. On the left side of the screen, you +can open your file. Use 'File-Open from Path' to go to your workspace +(e.g. /scratch/ws/\<username-name_of_your_ws>). You could run each cell +separately step by step and analyze the result of each step. Default +command for running one cell Shift+Enter'. Also, you could run all cells +with the command 'run all cells' how presented on the picture +below\</span> + +\<img alt="Screenshot_from_2019-09-03_15-20-16.png" height="250" +src="%ATTACHURL%/Screenshot_from_2019-09-03_15-20-16.png" +title="Screenshot_from_2019-09-03_15-20-16.png" width="436" /> + +#### + +#### Additional advanced models + +1\. A simple regression model uses [Automobile +dataset](https://archive.ics.uci.edu/ml/datasets/Automobile). In a +regression problem, we aim to predict the output of a continuous value, +in this case, we try to predict fuel efficiency. This is the simple +model created to present how to work with a jupyter notebook for the +TensorFlow models. Recommended parameters for running this model is 1 +GPU and 7 cores (28 thread) + +[https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/TensorFlowOnJupyterNotebook/Example_TensorFlow_Automobileset.zip](%ATTACHURL%/Example_TensorFlow_Automobileset.zip) + +2\. The regression model uses the +[dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data) +with meteorological data from the Beijing airport and the US embassy. +The data set contains almost 50 thousand on instances and therefore +needs more computational effort. Recommended parameters for running this +model is 1 GPU and 7 cores (28 threads) + +[https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/TensorFlowOnJupyterNotebook/Example_TensorFlow_Meteo_airport.zip](%ATTACHURL%/Example_TensorFlow_Meteo_airport.zip) + +**Note**: All examples created only for study purposes. 
The main aim is +to introduce users of the HPC-DA system of TU-Dresden with TensorFlow +and Jupyter notebook. Examples do not pretend to completeness or +science's significance. Feel free to improve the models and use them for +your study. + +-- Main.AndreiPolitov - 2019-08-27 + +- [Mnistmodel.zip](%ATTACHURL%/Mnistmodel.zip): Mnistmodel.zip +- [Example_TensorFlow_Automobileset.zip](%ATTACHURL%/Example_TensorFlow_Automobileset.zip): + Example_TensorFlow_Automobileset.zip +- [Example_TensorFlow_Meteo_airport.zip](%ATTACHURL%/Example_TensorFlow_Meteo_airport.zip): + Example_TensorFlow_Meteo_airport.zip +- [Example_TensorFlow_3D_road_network.zip](%ATTACHURL%/Example_TensorFlow_3D_road_network.zip): + Example_TensorFlow_3D_road_network.zip \<div style="visibility: + visible; left: -318px; top: 2579px; opacity: 1;"> \</div> \<div + style="visibility: visible; left: 73px; top: 3248px; opacity: 1;"> + \</div> + +\<div style="visibility: visible; left: 46px; top: 3464px; opacity: 1;"> +\</div> \<div id="gtx-anchor" style="position: absolute; visibility: +hidden; left: 236.9px; top: 1861.9px; width: 65.8px; height: 14.4px;"> +\</div> \<div style="visibility: visible; left: -289px; top: 1886px; +opacity: 1;"> \</div> diff --git a/twiki2md/root/Hardware/HardwareAltix.md b/twiki2md/root/Hardware/HardwareAltix.md new file mode 100644 index 000000000..7e163a400 --- /dev/null +++ b/twiki2md/root/Hardware/HardwareAltix.md @@ -0,0 +1,91 @@ + + +# HPC Component SGI Altix + +The SGI Altix 4700 is a shared memory system with dual core Intel +Itanium 2 CPUs (Montecito) operated by the Linux operating system SuSE +SLES 10 with a 2.6 kernel. Currently, the following Altix partitions are +installed at ZIH: + +\|\*Name \*\|\*Total Cores \*\|**Compute Cores**\|**Memory per Core**\| +\| Mars \|384 \|348 \|1 GB\| \|Jupiter \|512 \|506 \|4 GB\| \|Saturn +\|512 \|506 \|4 GB\| \|Uranus \|512 \|506 \|4 GB\| \|Neptun \|128 \|128 +\|1 GB\| + +\<P> The jobs for these partitions (except \<TT>Neptun\</TT>) are +scheduled by the [Platform LSF](Platform LSF) batch system running on +`mars.hrsk.tu-dresden.de`. The actual placement of a submitted job may +depend on factors like memory size, number of processors, time limit. + +## Filesystems All partitions share the same CXFS filesystems `/work` and `/fastfs`. ... [more information](FileSystems) + +## ccNuma Architecture + +The SGI Altix has a ccNUMA architecture, which stands for Cache Coherent +Non-Uniform Memory Access. It can be considered as a SM-MIMD (*shared +memory - multiple instruction multiple data*) machine. The SGI ccNuma +system has the following properties: + +- Memory is physically distributed but logically shared +- Memory is kept coherent automatically by hardware. +- Coherent memory: memory is always valid (caches hold copies) +- Granularity is L3 cacheline (128 B) +- Bandwidth of NumaLink4 is 6.4 GB/s + +The ccNuma is a compromise between a distributed memory system and a +flat symmetric multi processing machine (SMP). Altough the memory is +shared, the access properties are not the same. + +## Compute Module + +The basic compute module of an Altix system is shown below. 
+ +| | +|---------------------------------------------------------------------------------------------------------------------------------------------------------------| +| \<img src="%ATTACHURLPATH%/altix_brick_web.png" alt="altix_brick_web.png" width='312' height='192' />\<CAPTION ALIGN="BOTTOM">Altix compute blade \</CAPTION> | + +It consists of one dual core Intel Itanium 2 "Montecito" processor, the +local memory of 4 GB (2 GB on `Mars`), and the communication component, +the so-called SHUB. All resources are shared by both cores. They have a +common front side bus, so that accumulated memory bandwidth for both is +not higher than for just one core. + +The SHUB connects local and remote ressources. Via the SHUB and NUMAlink +all CPUs can access remote memory in the whole system. Naturally, the +fastest access provides local memory. There are some hints and commands +that may help you to get optimal memory allocation and process placement +). Four of these blades are grouped together with a NUMA router in a +compute brick. All bricks are connected with NUMAlink4 in a +"fat-tree"-topology. + +| | +|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| \<img src="%ATTACHURLPATH%/memory_access_web.png" alt="memory_access_web.png" width='450' />\<CAPTION align="bottom">Remote memory access via SHUBs and NUMAlink \</CAPTION> | + +## CPU + +The current SGI Altix is based on the dual core Intel Itanium 2 +processor (codename "Montecito"). One core has the following basic +properties: + +| | | +|-------------------------------------|----------------------------| +| clock rate | 1.6 GHz | +| integer units | 6 | +| floating point units (multiply-add) | 2 | +| peak performance | 6.4 GFLOPS | +| L1 cache | 2 x 16 kB, 1 clock latency | +| L2 cache | 256 kB, 5 clock latency | +| L3 cache | 9 MB, 12 clock latency | +| front side bus | 128 bit x 200 MHz | + +The theoretical peak performance of all Altix partitions is hence about +13.1 TFLOPS. + +The processor has hardware support for efficient software pipelining. +For many scientific applications it provides a high sustained +performance exceeding the performance of RISC CPUs with similar peak +performance. On the down side is the fact that the compiler has to +explicitely discover and exploit the parallelism in the application. + +<span class="twiki-macro COMMENT"></span> diff --git a/twiki2md/root/Hardware/HardwareAtlas.md b/twiki2md/root/Hardware/HardwareAtlas.md new file mode 100644 index 000000000..ec8061e72 --- /dev/null +++ b/twiki2md/root/Hardware/HardwareAtlas.md @@ -0,0 +1,47 @@ + + +# MEGWARE PC-Farm Atlas + +The PC farm `Atlas` is a heterogenous cluster based on multicore chips +AMD Opteron 6274 ("Bulldozer"). The nodes are operated by the Linux +operating system SuSE SLES 11 with a 2.6 kernel. Currently, the +following hardware is installed: + +\|CPUs \|AMD Opteron 6274 \| \|number of cores \|5120 \| \|th. peak +performance\| 45 TFlops\| \|compute nodes \| 4-way nodes *Saxonid* with +64 cores\| \|nodes with 64 GB RAM \| 48 \| \|nodes with 128 GB RAM \| 12 +\| \|nodes with 512 GB RAM \| 8 \| + +\<P> + +Mars and Deimos users: Please read the [migration +hints](MigrateToAtlas). + +All nodes share the HOME and `/fastfs/` [file system](FileSystems) with +our other HPC systems. Each node has 180 GB local disk space for scratch +mounted on `/tmp` . 
The jobs for the compute nodes are scheduled by the +[Platform LSF](Platform LSF) batch system from the login nodes +`atlas.hrsk.tu-dresden.de` . + +A QDR Infiniband interconnect provides the communication and I/O +infrastructure for low latency / high throughput data traffic. + +Users with a login on the [SGI Altix](HardwareAltix) can access their +home directory via NFS below the mount point `/hpc_work`. + +## CPU AMD Opteron 6274 + +\| Clock rate \| 2.2 GHz\| \| cores \| 16 \| \| L1 data cache \| 16 KB +per core \| \| L1 instruction cache \| 64 KB shared in a *module* (i.e. +2 cores) \| \| L2 cache \| 2 MB per module\| \| L3 cache \| 12 MB total, +6 MB shared between 4 modules = 8 cores\| \| FP units \| 1 per module +(supports fused multiply-add)\| \| th. peak performance\| 8.8 GFlops per +core (w/o turbo) \| + +The CPU belongs to the x86_64 family. Since it is fully capable of +running x86-code, one should compare the performances of the 32 and 64 +bit versions of the same code. + +For more architectural details, see the [AMD Bulldozer block +diagram](http://upload.wikimedia.org/wikipedia/commons/e/ec/AMD_Bulldozer_block_diagram_%288_core_CPU%29.PNG) +and [topology of Atlas compute nodes](%ATTACHURL%/Atlas_Knoten.pdf). diff --git a/twiki2md/root/Hardware/HardwareDeimos.md b/twiki2md/root/Hardware/HardwareDeimos.md new file mode 100644 index 000000000..643fab9f4 --- /dev/null +++ b/twiki2md/root/Hardware/HardwareDeimos.md @@ -0,0 +1,42 @@ + + +# Linux Networx PC-Farm Deimos + +The PC farm `Deimos` is a heterogenous cluster based on dual core AMD +Opteron CPUs. The nodes are operated by the Linux operating system SuSE +SLES 10 with a 2.6 kernel. Currently, the following hardware is +installed: + +\|CPUs \|AMD Opteron X85 dual core \| \|RAM per core \|2 GB \| \|Number +of cores \|2584 \| \|total peak performance \|13.4 TFLOPS \| \|single +chip nodes \|384 \| \|dual nodes \|230 \| \|quad nodes \|88 \| \|quad +nodes (32 GB RAM) \|24 \| + +\<P> All nodes share a 68 TB [file +system](RuntimeEnvironment#Filesystem) on DDN hardware. Each node has +per core 40 GB local disk space for scratch mounted on `/tmp` . The jobs +for the compute nodes are scheduled by the [Platform LSF](Platform LSF) +batch system from the login nodes `deimos.hrsk.tu-dresden.de` . + +Two separate Infiniband networks (10 Gb/s) with low cascading switches +provide the communication and I/O infrastructure for low latency / high +throughput data traffic. An additional gigabit Ethernet network is used +for control and service purposes. + +Users with a login on the [SGI Altix](HardwareAltix) can access their +home directory via NFS below the mount point `/hpc_work`. + +## CPU + +The cluster is based on dual-core AMD Opteron X85 processor. One core +has the following basic properties: + +\|clock rate \|2.6 GHz \| \|floating point units \|2 \| \|peak +performance \|5.2 GFLOPS \| \|L1 cache \|2x64 kB \| \|L2 cache \|1 MB \| +\|memory bus \|128 bit x 200 MHz \| + +The CPU belongs to the x86_64 family. Since it is fully capable of +running x86-code, one should compare the performances of the 32 and 64 +bit versions of the same code. 
+ +<span class="twiki-macro COMMENT"></span> diff --git a/twiki2md/root/Hardware/HardwarePhobos.md b/twiki2md/root/Hardware/HardwarePhobos.md new file mode 100644 index 000000000..3221dc590 --- /dev/null +++ b/twiki2md/root/Hardware/HardwarePhobos.md @@ -0,0 +1,38 @@ + + +# Linux Networx PC-Cluster Phobos + +------- **Phobos was shut down on 1 November 2010.** ------- + +`Phobos` is a cluster based on AMD Opteron CPUs. The nodes are operated +by the Linux operating system SuSE SLES 9 with a 2.6 kernel. Currently, +the following hardware is installed: + +\|CPUs \|AMD Opteron 248 (single core) \| \|total peak performance +\|563.2 GFLOPS \| \|Number of nodes \|64 compute + 1 master \| \|CPUs +per node \|2 \| \|RAM per node \|4 GB \| + +\<P> All nodes share a 4.4 TB SAN [file system](FileSystems). Each node +has additional local disk space mounted on `/scratch`. The jobs for the +compute nodes are scheduled by a [Platform LSF](Platform LSF) batch +system running on the login node `phobos.hrsk.tu-dresden.de`. + +Two separate Infiniband networks (10 Gb/s) with low cascading switches +provide the infrastructure for low latency / high throughput data +traffic. An additional GB/Ethernetwork is used for control and service +purposes. + +## CPU + +`Phobos` is based on single-core AMD Opteron 248 processor. It has the +following basic properties: + +\|clock rate \|2.2 GHz \| \|floating point units \|2 \| \|peak +performance \|4.4 GFLOPS \| \|L1 cache \|2x64 kB \| \|L2 cache \|1 MB \| +\|memory bus \|128 bit x 200 MHz \| + +The CPU belongs to the x86_64 family. Although it is fully capable of +running x86-code, one should always try to use 64-bit programs due to +their potentially higher performance. + +<span class="twiki-macro COMMENT"></span> diff --git a/twiki2md/root/Hardware/HardwareTitan.md b/twiki2md/root/Hardware/HardwareTitan.md new file mode 100644 index 000000000..4388cdbc8 --- /dev/null +++ b/twiki2md/root/Hardware/HardwareTitan.md @@ -0,0 +1,53 @@ + + +# Windows HPC Server 2008 - Cluster Titan + +The Dell Blade Server `Titan` is a homogenous cluster based on quad core +Intel Xeon CPUs. The cluster consists of one management and 8 compute +nodes, which are connected via a gigabit Ethernet network. The +connection to the cluster is only available over a terminal server +(titants1.hpcms.zih.tu-dresden.de, 141.30.63.227) via the remote desktop +protocol. + +The nodes are operated by the Windows operating system Microsoft HPC +Server 2008. Currently, the following hardware is installed: + +\* Compute Node: \|CPUs \|Intel Xeon E5440 Quad-Core \| \|RAM per core +\|2 GB \| \|Number of cores \|64 \| \|total peak performance \|724,48 +GFLOPS \| + +\* Management Node: + +\|CPUs \|Intel Xeon E5410 Quad-Core \| \|RAM per core \|2 GB \| \|Number +of cores \|8 \| + +\<P> The management node shares 1.2 TB disk space via NTFS over all +nodes. Each node has a local disk of 120 GB. The jobs for the compute +nodes are scheduled by the Microsoft scheduler, which is a part of the +Microsoft HPC Pack, from the management node. The job submission can be +done via the graphical user interface Microsoft HPC Job Manager. + +Two separate gigabit Ethernet networks are available for communication +and I/O infrastructure. + +## CPU + +The cluster is based on quad core Intel Xeon E5440 processor. 
One core +has the following basic properties: + +\|clock rate \|2.83 GHz \| \|floating point units \|2 \| \|peak +performance \|11.26 GFLOPS \| \|L1 cache \|32 KB I + 32KB on chip per +core \| \|L2 cache \|12 MB I+D on chip per chip, 6MB shared/ 2 cores \| +\|FSB \|1333 MHz \| + +The management node is based on a quad core Intel Xeon E5410 processor. +One core has the following basic properties: + +\|clock rate \|2.33 GHz \| \|floating point units \|2 \| \|peak +performance \|9.32 GFLOPS \| \|L1 cache \|32 KB I + 32KB on chip per +core \| \|L2 cache \|12 MB I+D on chip per chip, 6MB shared/ 2 cores \| +\|FSB \|1333 MHz \| + +The CPU belongs to the x86_64 family. Since it is fully capable of +running x86-code, one should compare the performances of the 32 and 64 +bit versions of the same code. diff --git a/twiki2md/root/Hardware/HardwareVenus.md b/twiki2md/root/Hardware/HardwareVenus.md new file mode 100644 index 000000000..00c6046dc --- /dev/null +++ b/twiki2md/root/Hardware/HardwareVenus.md @@ -0,0 +1,22 @@ +# SGI UV2000 (venus) + +The SGI UV2000 is a shared memory system based on Intel Sandy Bridge +processors. It is operated by the Linux operating system SLES 11 SP 3 +with a kernel version 3.x. + +| | | +|----------------------------|-------| +| Number of CPU sockets | 64 | +| Physical cores per sockets | 8 | +| Total number of cores | 512 | +| Total memory | 8 TiB | + +From our experience, most parallel applications benefit from using the +additional hardware hyperthreads. + +## Filesystems + +Venus uses the same HOME file system as all our other HPC installations. +For computations, please use `/scratch`. + +... [More information on file systems](FileSystems) diff --git a/twiki2md/root/HardwareTaurus/SDFlex.md b/twiki2md/root/HardwareTaurus/SDFlex.md new file mode 100644 index 000000000..c5db16537 --- /dev/null +++ b/twiki2md/root/HardwareTaurus/SDFlex.md @@ -0,0 +1,42 @@ +# Large shared-memory node - HPE Superdome Flex + +- Hostname: taurussmp8 +- Access to all shared file systems +- SLURM partition `julia` +- 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz (28 cores) +- 48 TB RAM (usable: 47 TB - one TB is used for cache coherence + protocols) +- 370 TB of fast NVME storage available at `/nvme/<projectname>` + +### Local temporary NVMe storage + +There are 370 TB of NVMe devices installed. For immediate access for all +projects, a volume of 87 TB of fast NVMe storage is available at +/nvme/1/\<projectname>. For testing, we have set a quota of 100 GB per +project on this NVMe storage.This is + +With a more detailled proposal on how this unique system (large shared +memory + NVMe storage) can speed up their computations, a project's +quota can be increased or dedicated volumes of up to the full capacity +can be set up. + +## Hints for usage + +- granularity should be a socket (28 cores) +- can be used for OpenMP applications with large memory demands +- To use OpenMPI it is necessary to export the following environment + variables, so that OpenMPI uses shared memory instead of Infiniband + for message transport. \<pre>export OMPI_MCA_pml=ob1;   export + OMPI_MCA_mtl=^mxm\</pre> +- Use `I_MPI_FABRICS=shm` so that Intel MPI doesn't even consider + using InfiniBand devices itself, but only shared-memory instead + +## Open for Testing + +- At the moment we have set a quota of 100 GB per project on this NVMe + storage. 
As soon as the first projects come up with proposals how + this unique system (large shared memory + NVMe storage) can speed up + their computations, we will gladly increase this limit, for selected + projects. +- Test users might have to clean-up their /nvme storage within 4 weeks + to make room for large projects. diff --git a/twiki2md/root/HardwareTriton.md b/twiki2md/root/HardwareTriton.md new file mode 100644 index 000000000..ce88271b9 --- /dev/null +++ b/twiki2md/root/HardwareTriton.md @@ -0,0 +1,48 @@ +# Hardware + +## IBM-iDataPlex + +is a cluster based on quadcore Intel Xeon CPUs. The nodes are operated +by the Linux operating system SuSE SLES 11. Currently, the following +hardware is installed: + +\|CPUs \|Intel quadcore E5530 \| \|RAM per core \|6 GB \| \|Number of +cores \|512 \| \|total peak performance \|4.9 TFLOPS \| \|dual nodes +\|64 \| + +The jobs for the compute nodes are scheduled by the +[LoadLeveler](LoadLeveler) batch system from the login node +triton.hrsk.tu-dresden.de . + +## CPU + +The cluster is based on dual-core Intel Xeon E5530 processor. One core +has the following basic properties: + +\|clock rate \|2.4 GHz \| \|Cores \|4 \| \|Threads \|8 \| \|Intel Smart +Cache \|8MB \| \|Intel QPI Speed \|5.86 GT/s \| \|Max TDP \|80 W \| + +# Software + +| Compilers | Version | +|:--------------------------------|---------------:| +| Intel (C, C++, Fortran) | 11.1.069 | +| GNU | 4.3.2 | +| **Libraries** | | +| MKL | 11.0.069 | +| IPP | 11.0.069 | +| TBB | 11.0.069 | +| FFTW | 2.1.5, 3.2.2 | +| hypre | 2.6.0b | +| **Applications** | | +| Ansys | 12.1 | +| Comsol | 3.4, 3.5, 3.5a | +| CP2K | 2010may | +| Gaussian | g09 | +| GnuPlot | 4.4.0 | +| Gromacs | 4.0.7 | +| LAMMPS | 2010may | +| NAMD | 2.7b1 | +| QuantumEspresso | 4.1.3 | +| **Tools** | | +| [Totalview Debugger](Debuggers) | 8.8 | diff --git a/twiki2md/root/Introduction.md b/twiki2md/root/Introduction.md new file mode 100644 index 000000000..ae6de5f86 --- /dev/null +++ b/twiki2md/root/Introduction.md @@ -0,0 +1,18 @@ +# Introduction + +The Center for Information Services and High Performance Computing (ZIH) +is a central scientific unit of TU Dresden with a strong competence in +parallel computing and software tools. We have a strong commitment to +support *real users*, collaborating to create new algorithms, +applications and to tackle the problems that need to be solved to create +new scientific insight with computational methods. Our compute complex +"Hochleistungs-Rechner-/-Speicher-Komplex" (HRSK) is focused on +data-intensive computing. High scalability, big memory and fast +I/O-systems are the outstanding properties of this project, aside from +the significant performance increase. The infrastructure is provided not +only to TU Dresden but to all universities and public research +institutes in Saxony. + +\<img alt="" +src="<http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/hpc/bilder/hpc_hardware07>" +title="HRSK overview" /> diff --git a/twiki2md/root/JupyterHub/JupyterHubForTeaching.md b/twiki2md/root/JupyterHub/JupyterHubForTeaching.md new file mode 100644 index 000000000..51cf46276 --- /dev/null +++ b/twiki2md/root/JupyterHub/JupyterHubForTeaching.md @@ -0,0 +1,161 @@ +# JupyterHub for Teaching + +On this page we want to introduce to you some useful features if you +want to use JupyterHub for teaching. + +<span class="twiki-macro RED"></span> **PLEASE UNDERSTAND:** <span +class="twiki-macro ENDCOLOR"></span> JupyterHub uses compute resources +from the HPC system Taurus. 
Please be aware of the following notes: + +- The HPC system operates at a lower availability level than your + usual Enterprise Cloud VM. There can always be downtimes, e.g. of + the file systems or the batch system. +- Scheduled downtimes are announced by email. Please plan your courses + accordingly. +- Access to HPC resources is handled through projects. See your course + as a project. Projects need to be registered beforehand (more info + on the page [Access](Compendium.Access)). +- Don't forget to [add your + users](ProjectManagement#manage_project_members_40dis_45_47enable_41) + (eg. students or tutors) to your project. +- It might be a good idea to [request a + reservation](Slurm#Reservations) of part of the compute resources + for your project/course to avoid unnecessary waiting times in the + batch system queue. + + + +## Clone a repository with a link + +This feature bases on +[nbgitpuller](https://github.com/jupyterhub/nbgitpuller) ( +[documentation](https://jupyterhub.github.io/nbgitpuller/)) + +This extension for jupyter notebooks can clone every public git +repository into the users work directory. It's offering a quick way to +distribute notebooks and other material to your students. + +\<a href="%ATTACHURL%/gitpull_progress.png">\<img alt="Git pull progress +screen" width="475" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHubForTeaching/gitpull_progress.png>" +style="border: 1px solid #888;" title="Git pull progress screen"/>\</a> + +A sharable link for this feature looks like this: + +<https://taurus.hrsk.tu-dresden.de/jupyter/hub/user-redirect/git-pull?repo=https://github.com/jdwittenauer/ipython-notebooks&urlpath=/tree/ipython-notebooks/notebooks/language/Intro.ipynb> +\<a href="%ATTACHURL%/url-git-pull.png?t=1604588695">\<img alt="URL with +git-pull parameters" width="100%" style="max-width: 2717px" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHubForTeaching/url-git-pull.png?t=1604588695>" +style="border: 1px solid #888;" title="URL with git-pull +parameters"/>\</a> + +\<span +style`"font-size: 1em;">This example would clone the repository </span> [[https://github.com/jdwittenauer/ipython-notebooks][github.com/jdwittenauer/ipython-notebooks]]<span style="font-size: 1em;"> and afterwards open the </span> =Intro.ipynb` +\<span style="font-size: 1em;"> notebook in the given path.\</span> + +The following parameters are available: + +<table> +<thead> +<tr class="header"> +<th style="text-align: left;">parameter</th> +<th style="text-align: left;">info</th> +</tr> +</thead> +<tbody> +<tr class="odd"> +<td style="text-align: left;">repo</td> +<td style="text-align: left;">path to git repository</td> +</tr> +<tr class="even"> +<td style="text-align: left;">branch</td> +<td style="text-align: left;">branch in the repository to pull from<br /> +default: <code>master</code></td> +</tr> +<tr class="odd"> +<td style="text-align: left;">urlpath</td> +<td style="text-align: left;">URL to redirect the user to a certain file<br /> +<a href="https://jupyterhub.github.io/nbgitpuller/topic/url-options.html#urlpath">more info</a></td> +</tr> +<tr class="even"> +<td style="text-align: left;">depth</td> +<td style="text-align: left;">clone only a certain amount of latest commits<br /> +not recommended</td> +</tr> +</tbody> +</table> + +This [link +generator](https://jupyterhub.github.io/nbgitpuller/link?hub=https://taurus.hrsk.tu-dresden.de/jupyter/) +might help creating those links + +## Spawner options passthrough with URL params + +The spawn 
form now offers a quick start mode by passing url +parameters. +An example: The following link would create a jupyter notebook session +on the `interactive` partition with the `test` environment being loaded: + + https://taurus.hrsk.tu-dresden.de/jupyter/hub/spawn#/~(partition~'interactive~environment~'test) + +/ \<a href="%ATTACHURL%/url-quick-start.png?t=1604586059">\<img alt="URL +with quickstart parameters" width="100%" style="max-width: 800px" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHubForTeaching/url-quick-start.png?t=1604586059>" +style="border: 1px solid #888;" title="URL with quickstart +parameters"/>\</a> + +\<span style="font-size: 1em;">Every parameter of the advanced form can +be set with this parameter. If the parameter is not mentioned, the +default value will be loaded.\</span> + +| parameter | default value | +|:----------------|:-----------------------------------------| +| partition | default | +| nodes | 1 | +| ntasks | 1 | +| cpuspertask | 1 | +| gres | *empty* (no generic resources) | +| mempercpu | 1000 | +| runtime | 8:00:00 | +| reservation | *empty* (use no reservation) | +| project | *empty* (use default project) | +| modules | *empty* (do not load additional modules) | +| environment | production | +| launch | JupyterLab | +| workspace_scope | *empty* (home directory) | + +You can use the advanced form to generate a url for the settings you +want. The address bar contains the encoded parameters starting with +`#/`. + +### Combination of quickstart and git-pull feature + +You can combine both features in a single link: + + https://taurus.hrsk.tu-dresden.de/jupyter/hub/user-redirect/git-pull?repo=https://github.com/jdwittenauer/ipython-notebooks&urlpath=/tree/ipython-notebooks/notebooks/language/Intro.ipynb#/~(partition~'interactive~environment~'test) + +/ \<a +href="%ATTACHURL%/url-git-pull-and-quick-start.png?t=1604588695">\<img +alt="URL with git-pull and quickstart parameters" width="100%" +style="max-width: 3332px" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHubForTeaching/url-git-pull-and-quick-start.png?t=1604588695>" +style="border: 1px solid #888;" title="URL with git-pull and quickstart +parameters"/>\</a> + +## Open a notebook automatically with a single link + +With the following link you will be redirected to a certain file in your +home directory. The file needs to exist, otherwise a 404 error will be +thrown. 
+ +<https://taurus.hrsk.tu-dresden.de/jupyter/user-redirect/notebooks/demo.ipynb> +\<a href="%ATTACHURL%/url-user-redirect.png">\<img alt="URL with +git-pull and quickstart parameters" width="100%" style="max-width: +700px" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHubForTeaching/url-user-redirect.png?t=1604587961>" +style="border: 1px solid #888;" title="URL with git-pull and quickstart +parameters"/>\</a> + +\<span +style`"font-size: 1em;">This link would redirect to </span> =https://taurus.hrsk.tu-dresden.de/jupyter/user/{login}/notebooks/demo.ipynb` +\<span style="font-size: 1em;">.\</span> diff --git a/twiki2md/root/Login/SSHMitPutty.md b/twiki2md/root/Login/SSHMitPutty.md new file mode 100644 index 000000000..59ab28be4 --- /dev/null +++ b/twiki2md/root/Login/SSHMitPutty.md @@ -0,0 +1,79 @@ +\<br /> + +## Prerequisites for Access to a Linux Cluster From a Windows Workstation + +To work at an HPC system at ZIH you need + +- a program that provides you a command shell (like \<a + href="<http://www.chiark.greenend.org.uk/%7Esgtatham/putty/download.html>" + target="\_top">"putty"\</a> or \<a + href="<http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/datennetz_dienste/secure_shell/>" + target="\_top">"Secure Shell ssh3.2"\</a>; both free)\<br /> (The + putty.exe is only to download at the desctop. (No installation)) + +and if you would like to use graphical software from the HPC system + +- an X-Server (like \<a + href="<http://www.straightrunning.com/XmingNotes/>" + target="\_top">X-Ming\</a> or \<a + href="<http://www.cygwin.com/cygwin/>" target="\_top">CygWin32\</a>) + +at your local PC. Here, you can find installation descriptions for the X +servers: \<a +href="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/Login/install-Xming.pdf>" +target="\_top">X-Ming Installation\</a>, \<a +href="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/Login/cygwin_doku_de.pdf>" +target="\_top">CygWin Installation\</a>. \<br /> Please note: You have +also to install additional fonts for X-Ming at your PC. (also to find at +\<a href="<http://www.straightrunning.com/XmingNotes/>" +target="\_top">this website\</a>) If you would like transfer files +between your PC and an HPC machine, you should also have + +- \<a href="<http://winscp.net/eng/docs/lang:de>" + target="\_top">WinSCP\</a> (an SCP program is also included in the + "Secure Shell ssh3.2" software; see above) + +installed at your PC.\<br /> We advice putty + Xming (+ WinSCP). \<br +/>Please note: If you use software with OpenGL (like abaqus), please +install "Xming-mesa" instead of "Xmin". + +After installation you have to start always at first the X-server. At +the bottom right corner you will get an new icon (a black X for X-Ming). +Now you can start putty.exe. A window will appear where you have to give +the name of the computer and you have to switch ON the "X11 forwarding". +(please look at the figures) + +\<img alt="" src="%PUBURL%/Compendium/Login/putty1.jpg" title="putty: +name of HPC-machine" width="300" /> \<img alt="" +src="%PUBURL%/Compendium/Login/putty2.jpg" title="putty: switch on X11" +width="300" /> \<br /> + +Now you can "Open" the connection. You will get a window from the remote +machine, where you can put your linux commands. If you would like to use +commercial software, please follow the next instructions about the +modules. 
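If you prefer a command-line client instead of the putty GUI, the same X11-forwarded login can be sketched with OpenSSH (available e.g. in the CygWin environment mentioned above); the login name below is a placeholder for your ZIH account and the hostname is the Taurus login used elsewhere in this documentation:

    ssh -X <zih-login>@taurus.hrsk.tu-dresden.de

The `-X` option corresponds to switching on "X11 forwarding" in putty.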
+ +## \<a name="Copy_Files_From_the_HRSK_Machines_to_Your_Local_Machine">\</a> Copy Files From the HRSK Machines to Your Local Machine + +Take the following steps if your Workstation has a Windows operating +system. You need putty (see above) and your favorite SCP program, in +this example WinSCP. + +- Make a connection to login1.zih.tu-dresden.de\<br /> \<img + alt="tunnel1.png" src="%PUBURL%/Compendium/Login/tunnel1.png" + width="300" /> +- Setup SSH tunnel (data from your machine port 1222 will be directed + to deimos port 22)\<br /> \<img alt="tunnel2.png" + src="%PUBURL%/Compendium/Login/tunnel2.png" width="300" /> +- After clicking on the "Add" button, the tunnel should look like + that\<br /> \<img alt="tunnel3.png" + src="%PUBURL%/Compendium/Login/tunnel3.png" width="300" /> +- Click "Open" and enter your login and password (upon successful + login, the tunnel will exist)\<br /> \<img alt="tunnel4.png" + src="%PUBURL%/Compendium/Login/tunnel4.png" width="300" /> +- Put the putty window in the background (leave it running) and open + WinSCP (or your favorite SCP program), connect to localhost:1222\<br + /> \<img alt="tunnel5.png" + src="%PUBURL%/Compendium/Login/tunnel5.png" width="300" /> +- After hitting "Login" and entering your username/password, you can + access your files on deimos. diff --git a/twiki2md/root/Login/SecurityRestrictions.md b/twiki2md/root/Login/SecurityRestrictions.md new file mode 100644 index 000000000..53d678203 --- /dev/null +++ b/twiki2md/root/Login/SecurityRestrictions.md @@ -0,0 +1,31 @@ +# Security Restrictions on Taurus + +As a result of the security incident the German HPC sites in Gau +Alliance are now adjusting their measurements to prevent infection and +spreading of the malware. + +The most important items for HPC systems at ZIH are: + +- All users (who haven't done so recently) have to [change their ZIH + password](https://selfservice.zih.tu-dresden.de/l/index.php/pswd/change_zih_password). + **Login to Taurus is denied with an old password.** +- All old (private and public) keys have been moved away. +- All public ssh keys for Taurus have to be re-generated \<br /> + - using only the ED25519 algorithm (`ssh-keygen -t ed25519`) + - **passphrase for the private key must not be empty** +- Ideally, there should be no private key on Taurus except for local + use. Keys to other systems must be passphrase-protected! +- **ssh to Taurus** is only possible from inside TU Dresden Campus + (login\[1,2\].zih.tu-dresden.de will be blacklisted). Users from + outside can use VPN (see + [here](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn)). +- **ssh from Taurus** is only possible inside TU Dresden Campus. + (Direct ssh access to other computing centers was the spreading + vector of the recent incident.) + +Data transfer is possible via the taurusexport nodes. We are working on +a bandwidth-friendly solution. + +We understand that all this will change convenient workflows. If the +measurements would render your work on Taurus completely impossible, +please contact the HPC support. diff --git a/twiki2md/root/PerformanceTools/IOTrack.md b/twiki2md/root/PerformanceTools/IOTrack.md new file mode 100644 index 000000000..f20334c8e --- /dev/null +++ b/twiki2md/root/PerformanceTools/IOTrack.md @@ -0,0 +1,27 @@ +# Introduction + +IOTrack is a small tool developed at ZIH that tracks the I/O requests of +all processes and dumps a statistic per process at the end of the +program run. 
+ +# How it works + +On taurus load the module via + + module load iotrack + +Then, instead of running your normal command, put "iotrack" in front of +it. So, + + python xyz.py arg1 arg2 + +changes to: + + iotrack python xyz.py arg1 arg2 + +# Technical Details + +The functionality is implemented in a library that is preloaded via +LD_PRELOAD. Thus, this will not work for static binaries. + +-- Main.MichaelKluge - 2013-07-16 diff --git a/twiki2md/root/PerformanceTools/PapiLibrary.md b/twiki2md/root/PerformanceTools/PapiLibrary.md new file mode 100644 index 000000000..5516c7e31 --- /dev/null +++ b/twiki2md/root/PerformanceTools/PapiLibrary.md @@ -0,0 +1,55 @@ +# PAPI Library + +Related work: [PAPI +documentation](http://icl.cs.utk.edu/projects/papi/wiki/Main_Page), +[Intel 64 and IA-32 Architectures Software Developers Manual (Per +thread/per core +PMCs)](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-system-programming-manual-325384.pdf) + +Additional sources for **Sandy Bridge** Processors: [Intel Xeon +Processor E5-2600 Product Family Uncore Performance Monitoring Guide +(Uncore +PMCs)](http://www.intel.com/content/dam/www/public/us/en/documents/design-guides/xeon-e5-2600-uncore-guide.pdf) + +Additional sources for **Haswell** Processors: [Intel Xeon Processor +E5-2600 v3 Product Family Uncore Performance Monitoring Guide (Uncore +PMCs) - Download +link](http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v3-uncore-performance-monitoring.html) + +## Introduction + +PAPI enables users and developers to monitor how their code performs on +a specific architecture. To do so, they can register events that are +counted by the hardware in performance monitoring counters (PMCs). These +counters relate to a specific hardware unit, for example a processor +core. Intel Processors used on taurus support eight PMCs per processor +core. As the partitions on taurus are run with HyperThreading Technology +(HTT) enabled, each CPU can use four of these. In addition to the **four +core PMCs**, Intel processors also support **a number of uncore PMCs** +for non-core resources. (see the uncore manuals listed in top of this +documentation). + +## Usage + +[Score-P](ScoreP) supports per-core PMCs. To include uncore PMCs into +Score-P traces use the software module **scorep-uncore/2016-03-29**on +the Haswell partition. If you do so, disable profiling to include the +uncore measurements. This metric plugin is available at +[github](https://github.com/score-p/scorep_plugin_uncore/). + +If you want to use PAPI directly in your software, load the latest papi +module, which establishes the environment variables **PAPI_INC**, +**PAPI_LIB**, and **PAPI_ROOT**. Have a look at the [PAPI +documentation](http://icl.cs.utk.edu/projects/papi/wiki/Main_Page) for +details on the usage. + +## Related Software + +[Score-P](ScoreP) + +[Linux Perf Tools](PerfTools) + +If you just need a short summary of your job, you might want to have a +look at [perf stat](PerfTools) + +-- Main.UlfMarkwardt - 2012-10-09 diff --git a/twiki2md/root/PerformanceTools/PerfTools.md b/twiki2md/root/PerformanceTools/PerfTools.md new file mode 100644 index 000000000..1a9fd9851 --- /dev/null +++ b/twiki2md/root/PerformanceTools/PerfTools.md @@ -0,0 +1,236 @@ +(This page is under construction) + +# Introduction + +perf consists of two parts: the kernel space implementation and the +userland tools. This wiki entry focusses on the latter. 
These tools are +installed on taurus, and others and provides support for sampling +applications and reading performance counters. + +# Installation + +On taurus load the module via + + module load perf/r31 + +# Configuration + +Admins can change the behaviour of the perf tools kernel part via the +following interfaces + +| | | +|---------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------| +| File Name | Description | +| /proc/sys/kernel/perf_event_max_sample_rate | describes the maximal sample rate for perf record and native access. This is used to limit the performance influence of sampling. | +| /proc/sys/kernel/perf_event_mlock_kb | defines the number of pages that can be used for sampling via perf record or the native interface | +| /proc/sys/kernel/perf_event_paranoid | defines access rights: | +| | -1 - Not paranoid at all | +| | 0 - Disallow raw tracepoint access for unpriv | +| | 1 - Disallow cpu events for unpriv | +| | 2 - Disallow kernel profiling for unpriv | +| /proc/sys/kernel/kptr_restrict | Defines whether the kernel address maps are restricted | + +# perf stat + +`perf stat` provides a general performance statistic for a program. You +can attach to a running (own) process, monitor a new process or monitor +the whole system. The latter is only available for root user, as the +performance data can provide hints on the internals of the application. + +## For users + +Run `perf stat <Your application>`. This will provide you with a general +overview on some counters. + + Performance counter stats for 'ls':= + 2,524235 task-clock # 0,352 CPUs utilized + 15 context-switches # 0,006 M/sec + 0 CPU-migrations # 0,000 M/sec + 292 page-faults # 0,116 M/sec + 6.431.241 cycles # 2,548 GHz + 3.537.620 stalled-cycles-frontend # 55,01% frontend cycles idle + 2.634.293 stalled-cycles-backend # 40,96% backend cycles idle + 6.157.440 instructions # 0,96 insns per cycle + # 0,57 stalled cycles per insn + 1.248.527 branches # 494,616 M/sec + 34.044 branch-misses # 2,73% of all branches + 0,007167707 seconds time elapsed + +- Generally speaking **task clock** tells you how parallel your job + has been/how many cpus were used. +- **[Context switches](http://en.wikipedia.org/wiki/Context_switch)** + are an information about how the scheduler treated the application. + Also interrupts cause context switches. Lower is better. +- **CPU migrations** are an information on whether the scheduler moved + the application between cores. Lower is better. Please pin your + programs to CPUs to avoid migrations. This can be done with + environment variables for OpenMP and MPI, with `likwid-pin`, + `numactl` and `taskset`. +- **[Page faults](http://en.wikipedia.org/wiki/Page_fault)** describe + how well the Translation Lookaside Buffers fit for the program. + Lower is better. +- **Cycles** tells you how many CPU cycles have been spent in + executing the program. The normalized value tells you the actual + average frequency of the CPU(s) running the application. +- **stalled-cycles-...** tell you how well the processor can execute + your code. Every stall cycle is a waste of CPU time and energy. The + reason for such stalls can be numerous. It can be wrong branch + predictions, cache misses, occupation of CPU resources by long + running instructions and so on. If these stall cycles are to high + you might want to review your code. 
+- The normalized **instructions** number tells you how well your code + is running. More is better. Current x86 CPUs can run 3 to 5 + instructions per cycle, depending on the instruction mix. A count of + less then 1 is not favorable. In such a case you might want to + review your code. +- **branches** and **branch-misses** tell you how many jumps and loops + are performed in your code. Correctly + [predicted](http://en.wikipedia.org/wiki/Branch_prediction) branches + should not hurt your performance, **branch-misses** on the other + hand hurt your performance very badly and lead to stall cycles. +- Other events can be passed with the `-e` flag. For a full list of + predefined events run `perf list` +- PAPI runs on top of the same infrastructure as perf stat, so you + might want to use their meaningful event names. Otherwise you can + use raw events, listed in the processor manuals. ( + [Intel](http://download.intel.com/products/processor/manual/325384.pdf), + [AMD](http://support.amd.com/us/Processor_TechDocs/42300_15h_Mod_10h-1Fh_BKDG.pdf)) + +## For admins + +Administrators can run a system wide performance statistic, e.g., with +`perf stat -a sleep 1` which measures the performance counters for the +whole computing node over one second.\<span style="font-size: 1em;"> +\</span> + +# perf record + +`perf record` provides the possibility to sample an application or a +system. You can find performance issues and hot parts of your code. By +default perf record samples your program at a 4000 Hz. It records CPU, +Instruction Pointer and, if you specify it, the call chain. If your code +runs long (or often) enough, you can find hot spots in your application +and external libraries. Use **perf report** to evaluate the result. You +should have debug symbols available, otherwise you won't be able to see +the name of the functions that are responsible for your load. You can +pass one or multiple events to define the **sampling event**. \<br /> +**What is a sampling event?** \<br /> Sampling reads values at a +specific sampling frequency. This frequency is usually static and given +in Hz, so you have for example 4000 events per second and a sampling +frequency of 4000 Hz and a sampling rate of 250 microseconds. With the +sampling event, the concept of a static sampling frequency in time is +somewhat redefined. Instead of a constant factor in time (sampling rate) +you define a constant factor in events. So instead of a sampling rate of +250 microseconds, you have a sampling rate of 10,000 floating point +operations. \<br /> **Why would you need sampling events?** \<br /> +Passing an event allows you to find the functions that produce cache +misses, floating point operations, ... Again, you can use events defined +in `perf list` and raw events. \<br />\<br /> Use the `-g` flag to +receive a call graph. + +## For users + +Just run `perf record ./myapp` or attach to a running process. + +### Using perf with MPI + +Perf can also be used to record data for indivdual MPI processes. This +requires a wrapper script (perfwrapper) with the following content. Also +make sure that the wrapper script is executable (chmod +x). + + #!/bin/bash + <span style="font-size: 1em;">perf record -o perf.data.$SLURM_JOB_ID.$SLURM_PROCID $@</span> + +To start the MPI program type \<span>srun ./perfwrapper ./myapp +\</span>on your command line. The result will be n independent perf.data +files that can be analyzed individually with perf report. 
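A cleaned-up sketch of the wrapper script mentioned above (the file name `perfwrapper` is only an example; it relies on the Slurm variables `SLURM_JOB_ID` and `SLURM_PROCID`):

    #!/bin/bash
    # record one perf.data file per MPI rank, then run the actual command
    perf record -o perf.data.$SLURM_JOB_ID.$SLURM_PROCID "$@"

Make the script executable with `chmod +x perfwrapper` and launch it with `srun ./perfwrapper ./myapp`; every rank then writes its own perf.data file that can be inspected separately with `perf report`.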
+ +## For admins + +This tool is very effective, if you want to help users find performance +problems and hot-spots in their code but also helps to find OS daemons +that disturb such applications. You would start `perf record -a -g` to +monitor the whole node. + +# perf report + +perf report is a command line UI for evaluating the results from perf +record. It creates something like a profile from the recorded samplings. +These profiles show you what the most used have been. If you added a +callchain, it also gives you a callchain profile.\<br /> \*Disclaimer: +Sampling is not an appropriate way to gain exact numbers. So this is +merely a rough overview and not guaranteed to be absolutely +correct.\*\<span style="font-size: 1em;"> \</span> + +## On taurus + +On taurus, users are not allowed to see the kernel functions. If you +have multiple events defined, then the first thing you select in +`perf report` is the type of event. Press right + + Available samples + 96 cycles + 11 cache-misse + +**Hint: The more samples you have, the more exact is the profile. 96 or +11 samples is not enough by far.** I repeat the measurement and set +`-F 50000` to increase the sampling frequency. **Hint: The higher the +frequency, the higher the influence on the measurement.** If youd'd +select cycles, you would get such a screen: + + Events: 96 cycles + + 49,13% test_gcc_perf test_gcc_perf [.] main.omp_fn.0 + + 34,48% test_gcc_perf test_gcc_perf [.] + + 6,92% test_gcc_perf test_gcc_perf [.] omp_get_thread_num@plt + + 5,20% test_gcc_perf libgomp.so.1.0.0 [.] omp_get_thread_num + + 2,25% test_gcc_perf test_gcc_perf [.] main.omp_fn.1 + + 2,02% test_gcc_perf [kernel.kallsyms] [k] 0xffffffff8102e9ea + +Increased sample frequency: + + Events: 7K cycles + + 42,61% test_gcc_perf test_gcc_perf [.] p + + 40,28% test_gcc_perf test_gcc_perf [.] main.omp_fn.0 + + 6,07% test_gcc_perf test_gcc_perf [.] omp_get_thread_num@plt + + 5,95% test_gcc_perf libgomp.so.1.0.0 [.] omp_get_thread_num + + 4,14% test_gcc_perf test_gcc_perf [.] main.omp_fn.1 + + 0,69% test_gcc_perf [kernel.kallsyms] [k] 0xffffffff8102e9ea + + 0,04% test_gcc_perf ld-2.12.so [.] check_match.12442 + + 0,03% test_gcc_perf libc-2.12.so [.] printf + + 0,03% test_gcc_perf libc-2.12.so [.] vfprintf + + 0,03% test_gcc_perf libc-2.12.so [.] __strchrnul + + 0,03% test_gcc_perf libc-2.12.so [.] _dl_addr + + 0,02% test_gcc_perf ld-2.12.so [.] do_lookup_x + + 0,01% test_gcc_perf libc-2.12.so [.] _int_malloc + + 0,01% test_gcc_perf libc-2.12.so [.] free + + 0,01% test_gcc_perf libc-2.12.so [.] __sigprocmask + + 0,01% test_gcc_perf libgomp.so.1.0.0 [.] 0x87de + + 0,01% test_gcc_perf libc-2.12.so [.] __sleep + + 0,01% test_gcc_perf ld-2.12.so [.] _dl_check_map_versions + + 0,01% test_gcc_perf ld-2.12.so [.] local_strdup + + 0,00% test_gcc_perf libc-2.12.so [.] __execvpe + +Now you select the most often sampled function and zoom into it by +pressing right. If debug symbols are not available, perf report will +show which assembly instruction is hit most often when sampling. If +debug symbols are available, it will also show you the source code lines +for these assembly instructions. You can also go back and check which +instruction caused the cache misses or whatever event you were passing +to perf record. + +# perf script + +If you need a trace of the sampled data, you can use perf script +command, which by default prints all samples to stdout. You can use +various interfaces (e.g., python) to process such a trace. 
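As a small sketch, the per-rank files produced by the MPI wrapper above could be dumped as text (and piped into your own tooling) like this; the file name only follows the naming scheme of that example:

    # print the first recorded samples of one rank in text form
    perf script -i perf.data.<jobid>.<rank> | head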
+ +# perf top + +perf top is only available for admins, as long as the paranoid flag is +not changed (see configuration). + +It behaves like the top command, but gives you not only an overview of +the processes and the time they are consuming but also on the functions +that are processed by these. + +-- Main.RobertSchoene - 2013-04-29 diff --git a/twiki2md/root/PerformanceTools/ScoreP.md b/twiki2md/root/PerformanceTools/ScoreP.md new file mode 100644 index 000000000..5a32563b7 --- /dev/null +++ b/twiki2md/root/PerformanceTools/ScoreP.md @@ -0,0 +1,136 @@ +# Score-P + +The Score-P measurement infrastructure is a highly scalable and +easy-to-use tool suite for profiling, event tracing, and online analysis +of HPC applications.\<br />Currently, it works with the analysis tools +[Vampir](Vampir), Scalasca, Periscope, and Tau.\<br />Score-P supports +lots of features e.g. + +- MPI, SHMEM, OpenMP, pthreads, and hybrid programs +- Manual source code instrumentation +- Monitoring of CUDA applications +- Recording hardware counter by using PAPI library +- Function filtering and grouping + +Only the basic usage is shown in this Wiki. For a comprehensive Score-P +user manual refer to the [Score-P website](http://www.score-p.org). + +Before using Score-P, set up the correct environment with + + module load scorep + +To make measurements with Score-P, the user's application program needs +to be instrumented, i.e., at specific important points (\`\`events'') +Score-P measurement calls have to be activated. By default, Score-P +handles this automatically. In order to enable instrumentation of +function calls, MPI as well as OpenMP events, the user only needs to +prepend the Score-P wrapper to the usual compiler and linker commands. +Following wrappers exist: + +The following sections show some examples depending on the +parallelization type of the program. + +## Serial programs + +| | | +|----------------------|------------------------------------| +| original | ifort a.f90 b.f90 -o myprog | +| with instrumentation | scorep ifort a.f90 b.f90 -o myprog | + +This will instrument user functions (if supported by the compiler) and +link the Score-P library. + +## MPI parallel programs + +If your MPI implementation uses MPI compilers, Score-P will detect MPI +parallelization automatically: + +| | | +|----------------------|-------------------------------| +| original | mpicc hello.c -o hello | +| with instrumentation | scorep mpicc hello.c -o hello | + +MPI implementations without own compilers (as on the Altix) require the +user to link the MPI library manually. Even in this case, Score-P will +detect MPI parallelization automatically: + +| | | +|----------------------|-----------------------------------| +| original | icc hello.c -o hello -lmpi | +| with instrumentation | scorep icc hello.c -o hello -lmpi | + +However, if Score-P falis to detect MPI parallelization automatically +you can manually select MPI instrumentation: + +| | | +|----------------------|---------------------------------------------| +| original | icc hello.c -o hello -lmpi | +| with instrumentation | scorep --mpp=mpi icc hello.c -o hello -lmpi | + +If you want to instrument MPI events only (creates less overhead and +smaller trace files) use the option --nocompiler to disable automatic +instrumentation of user functions. 
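Putting the pieces together, a minimal sketch for an MPI code could look like this (source and binary names are placeholders, and `--mpp=mpi` may be omitted where the MPI compiler wrapper is detected automatically, see above):

    module load scorep
    scorep --mpp=mpi mpicc hello.c -o hello
    srun -n 4 ./hello    # writes an experiment directory into the current working directory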
+ +## OpenMP parallel programs + +When Score-P detects OpenMP flags on the command line, OPARI2 is invoked +for automatic source code instrumentation of OpenMP events: + +| | | +|----------------------|---------------------------------| +| original | ifort -openmp pi.f -o pi | +| with instrumentation | scorep ifort -openmp pi.f -o pi | + +## Hybrid MPI/OpenMP parallel programs + +With a combination of the above mentioned approaches, hybrid +applications can be instrumented: + +| | | +|----------------------|--------------------------------------------| +| original | mpif90 -openmp hybrid.F90 -o hybrid | +| with instrumentation | scorep mpif90 -openmp hybrid.F90 -o hybrid | + +## Score-P instrumenter option overview + +| Type of instrumentation | Instrumenter switch | Default value | Runtime measurement control | +|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----------------------:|:-----------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +| MPI | --mpp=mpi | (auto) | (see Sec. [Selection of MPI Groups](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#mpi_groups) ) | +| SHMEM | --mpp=shmem | (auto) | | +| OpenMP | --thread=omp | (auto) | | +| Pthread | --thread=pthread | (auto) | | +| Compiler (see Sec. [Automatic Compiler Instrumentation](https://silc.zih.tu-dresden.de/scorep-current/html/instrumentation.html#compiler_instrumentation) ) | --compiler/--nocompiler | enabled | Filtering (see Sec. [Filtering](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#filtering) ) | +| PDT instrumentation (see Sec. [Source-Code Instrumentation Using PDT](https://silc.zih.tu-dresden.de/scorep-current/html/instrumentation.html#tau_instrumentation) ) | --pdt/--nopdt | disabled | Filtering (see Sec. [Filtering](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#filtering) ) | +| POMP2 user regions (see Sec. [Semi-Automatic Instrumentation of POMP2 User Regions](https://silc.zih.tu-dresden.de/scorep-current/html/instrumentation.html#pomp_instrumentation) ) | --pomp/--nopomp | depends on OpenMP usage | Filtering (see Sec. [Filtering](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#filtering) ) | +| Manual (see Sec. [Manual Region Instrumentation](https://silc.zih.tu-dresden.de/scorep-current/html/instrumentation.html#manual_instrumentation) ) | --user/--nouser | disabled | Filtering (see Sec. [Filtering](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#filtering) )\<br /> and\<br /> selective recording (see Sec. [Selective Recording](https://silc.zih.tu-dresden.de/scorep-current/html/measurement.html#selective_recording) ) | + +## Application Measurement + +After the application run, you will find an experiment directory in your +current working directory, which contains all recorded data. + +In general, you can record a profile and/or a event trace. 
Whether a +profile and/or a trace is recorded, is specified by the environment +variables \<span> +`[[https://silc.zih.tu-dresden.de/scorep-current/html/scorepmeasurementconfig.html#SCOREP_ENABLE_PROFILING][SCOREP_ENABLE_PROFILING]]` +\</span> and \<span> +`[[https://silc.zih.tu-dresden.de/scorep-current/html/scorepmeasurementconfig.html#SCOREP_ENABLE_TRACING][SCOREP_ENABLE_TRACING]]` +\</span>. If the value of this variables is zero or false, +profiling/tracing is disabled. Otherwise Score-P will record a profile +and/or trace. By default, profiling is enabled and tracing is disabled. +For more information please see [the list of Score-P measurement +configuration +variables](https://silc.zih.tu-dresden.de/scorep-current/html/scorepmeasurementconfig.html). + +You may start with a profiling run, because of its lower space +requirements. According to profiling results, you may configure the +trace buffer limits, filtering or selective recording for recording +traces. + +Score-P allows to configure several parameters via environment +variables. After the measurement run you can find a \<span>scorep.cfg +\</span>file in your experiment directory which contains the +configuration of the measurement run. If you had not set configuration +values explicitly, the file will contain the default values. + +-- Main.RonnyTschueter - 2014-09-11 diff --git a/twiki2md/root/PerformanceTools/Vampir.md b/twiki2md/root/PerformanceTools/Vampir.md new file mode 100644 index 000000000..2d681ac83 --- /dev/null +++ b/twiki2md/root/PerformanceTools/Vampir.md @@ -0,0 +1,186 @@ +# Vampir + +Contents: + +1 [Introduction](#VampirIntro) 1 [Starting Vampir](#VampirUsage) 1 +[Using VampirServer](#VampirServerUsage) 1 [Advanced +usage](#VampirAdvanced) 1 [Manual server +startup](#VampirManualServerStartup) 1 [Port +forwarding](#VampirPortForwarding) 1 [Nightly builds +(unstable)](#VampirServerUnstable) + +#VampirIntro + +## Introduction + +Vampir is a graphical analysis framework that provides a large set of +different chart representations of event based performance data +generated through program instrumentation. These graphical displays, +including state diagrams, statistics, and timelines, can be used by +developers to obtain a better understanding of their parallel program +inner working and to subsequently optimize it. Vampir allows to focus on +appropriate levels of detail, which allows the detection and explanation +of various performance bottlenecks such as load imbalances and +communication deficiencies. [Follow this link for further +information](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampir). + +A growing number of performance monitoring environments like +[VampirTrace](Compendium.VampirTrace), Score-P, TAU or KOJAK can produce +trace files that are readable by Vampir. The tool supports trace files +in Open Trace Format (OTF, OTF2) that is developed by ZIH and its +partners and is especially designed for massively parallel programs. + +\<img alt="" src="%ATTACHURLPATH%/vampir-framework.png" title="Vampir +Framework" /> + +#VampirUsage + +## Starting Vampir + +Prior to using Vampir you need to set up the correct environment on one +the HPC systems with: + + module load vampir + +For members of TU Dresden the Vampir tool is also available as +[download](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampir/vampir_download_tu) +for installation on your personal computer. + +Make sure, that compressed display forwarding (e.g. 
+`ssh -XC taurus.hrsk.tu-dresden.de`) is enabled. Start the GUI by typing + + vampir + +on your command line or by double-clicking the Vampir icon on your +personal computer. + +Please consult the [Vampir user +manual](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampir/dateien/Vampir-User-Manual.pdf) +for a tutorial on using the tool. + +#VampirServerUsage + +## Using VampirServer + +VampirServer provides additional scalable analysis capabilities to the +Vampir GUI mentioned above. To use VampirServer on the HPC resources of +TU Dresden proceed as follows: start the Vampir GUI as described above +and use the *Open Remote* dialog with the parameters indicated in the +following figure to start and connect a VampirServer instance running on +taurus.hrsk.tu-dresden.de. Make sure to fill in your personal ZIH login +name. + +\<img alt="" src="%ATTACHURLPATH%/vampir_open_remote_dialog.png" +title="Vampir Open Remote Dialog" /> + +Click on the Connect button and wait until the connection is +established. Enter your password when requested. Depending on the +available resources on the target system, this setup can take some time. +Please be patient and take a look at available resources beforehand. + +#VampirAdvanced + +## Advanced Usage + +#VampirManualServerStartup + +### Manual Server Startup + +VampirServer is a parallel MPI program, which can also be started +manually by typing: + + vampirserver start + +Above automatically allocates its resources via the respective batch +system. Use + + vampirserver start mpi + +or + + vampirserver start srun + +if you want to start vampirserver without batch allocation or inside an +interactive allocation. The latter is needed whenever you manually take +care of the resource allocation by yourself. + +After scheduling this job the server prints out the port number it is +serving on, like `Listen port: 30088`. + +Connecting to the most recently started server can be achieved by +entering `auto-detect` as *Setup name* in the *Open Remote* dialog of +Vampir. + +\<img alt="" +src="%ATTACHURLPATH%/vampir_open_remote_dialog_auto_start.png" +title="Vampir Open Remote Dialog" /> + +Please make sure you stop VampirServer after finishing your work with +the front-end or with + + vampirserver stop + +Type + + vampirserver help + +for further information. The [user manual of +VampirServer](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampir/dateien/VampirServer-User-Manual.pdf) +can be found at *installation directory* /doc/vampirserver-manual.pdf. +Type + + which vampirserver + +to find the revision dependent *installation directory*. + +#VampirPortForwarding + +### Port Forwarding + +VampirServer listens to a given socket port. It is possible to forward +this port (SSH tunnel) to a remote machine. This procedure is not +recommended and not needed at ZIH. However, the following example shows +the tunneling to a VampirServer on a compute node at Taurus. The same +procedure works on Venus. + +Start VampirServer on Taurus and wait for its scheduling: + + vampirserver start + +and wait for scheduling + + Launching VampirServer... + Submitting slurm 30 minutes job (this might take a while)... + salloc: Granted job allocation 2753510 + VampirServer 8.1.0 (r8451) + Licensed to ZIH, TU Dresden + Running 4 analysis processes... 
(abort with vampirserver stop 594) + VampirServer listens on: taurusi1253:30055 + +Open a second console on your local desktop and create an ssh tunnel to +the compute node with: + + ssh -L 30000:taurusi1253:30055 taurus.hrsk.tu-dresden.de + +Now, the port 30000 on your desktop is connected to the VampirServer +port 30055 at the compute node taurusi1253 of Taurus. Finally, start +your local Vampir client and establish a remote connection to +`localhost`, port 30000 as described in the manual. + +Remark: Please substitute the ports given in this example with +appropriate numbers and available ports. + +#VampirServerUnstable + +### Nightly builds (unstable) + +Expert users who subscribed to the development program can test new, +unstable tool features. The corresponding Vampir and VampirServer +software releases are provided as nightly builds. Unstable versions of +VampirServer are also installed on the HPC systems. The most recent +version can be launched/connected by entering `unstable` as *Setup name* +in the *Open Remote* dialog of Vampir. + +\<img alt="" +src="%ATTACHURLPATH%/vampir_open_remote_dialog_unstable.png" +title="Connecting to unstable VampirServer" /> diff --git a/twiki2md/root/PerformanceTools/VampirTrace.md b/twiki2md/root/PerformanceTools/VampirTrace.md new file mode 100644 index 000000000..eee845e9c --- /dev/null +++ b/twiki2md/root/PerformanceTools/VampirTrace.md @@ -0,0 +1,103 @@ +# VampirTrace + +VampirTrace is a performance monitoring tool, that produces tracefiles +during a program run. These tracefiles can be analyzed and visualized by +the tool [Vampir](Compendium.Vampir). Vampir Supports lots of features +e.g. + +- MPI, OpenMP, pthreads, and hybrid programs +- Manual source code instrumentation +- Recording hardware counter by using PAPI library +- Memory allocation tracing +- I/O tracing +- Function filtering and grouping + +Only the basic usage is shown in this Wiki. For a comprehensive +VampirTrace user manual refer to the [VampirTrace +Website](http://www.tu-dresden.de/zih/vampirtrace). + +Before using VampirTrace, set up the correct environment with + + module load vampirtrace + +To make measurements with VampirTrace, the user's application program +needs to be instrumented, i.e., at specific important points +(\`\`events'') VampirTrace measurement calls have to be activated. By +default, VampirTrace handles this automatically. In order to enable +instrumentation of function calls, MPI as well as OpenMP events, the +user only needs to replace the compiler and linker commands with +VampirTrace's wrappers. Following wrappers exist: + +| | | +|----------------------|-----------------------------| +| Programming Language | VampirTrace Wrapper Command | +| C | `vtcc` | +| C++ | `vtcxx` | +| Fortran 77 | `vtf77` | +| Fortran 90 | `vtf90` | + +The following sections show some examples depending on the +parallelization type of the program. + +## Serial programs + +Compiling serial code is the default behavior of the wrappers. Simply +replace the compiler by VampirTrace's wrapper: + +| | | +|----------------------|-------------------------------| +| original | `ifort a.f90 b.f90 -o myprog` | +| with instrumentation | `vtf90 a.f90 b.f90 -o myprog` | + +This will instrument user functions (if supported by compiler) and link +the VampirTrace library. 
+ +## MPI parallel programs + +If your MPI implementation uses MPI compilers (this is the case on +Deimos), you need to tell VampirTrace's wrapper to use this compiler +instead of the serial one: + +| | | +|----------------------|--------------------------------------| +| original | `mpicc hello.c -o hello` | +| with instrumentation | `vtcc -vt:cc mpicc hello.c -o hello` | + +MPI implementations without own compilers (as on the Altix) require the +user to link the MPI library manually. In this case, you simply replace +the compiler by VampirTrace's compiler wrapper: + +| | | +|----------------------|-------------------------------| +| original | `icc hello.c -o hello -lmpi` | +| with instrumentation | `vtcc hello.c -o hello -lmpi` | + +If you want to instrument MPI events only (creates smaller trace files +and less overhead) use the option `-vt:inst manual` to disable automatic +instrumentation of user functions. + +## OpenMP parallel programs + +When VampirTrace detects OpenMP flags on the command line, OPARI is +invoked for automatic source code instrumentation of OpenMP events: + +| | | +|----------------------|----------------------------| +| original | `ifort -openmp pi.f -o pi` | +| with instrumentation | `vtf77 -openmp pi.f -o pi` | + +## Hybrid MPI/OpenMP parallel programs + +With a combination of the above mentioned approaches, hybrid +applications can be instrumented: + +| | | +|----------------------|-----------------------------------------------------| +| original | `mpif90 -openmp hybrid.F90 -o hybrid` | +| with instrumentation | `vtf90 -vt:f90 mpif90 -openmp hybrid.F90 -o hybrid` | + +By default, running a VampirTrace instrumented application should result +in a tracefile in the current working directory where the application +was executed. + +-- Main.jurenz - 2009-12-17 diff --git a/twiki2md/root/Slurm/BindingAndDistributionOfTasks.md b/twiki2md/root/Slurm/BindingAndDistributionOfTasks.md new file mode 100644 index 000000000..6fae22d63 --- /dev/null +++ b/twiki2md/root/Slurm/BindingAndDistributionOfTasks.md @@ -0,0 +1,230 @@ + + +# Binding and Distribution of Tasks + +## General + +To specify a pattern the commands --cpu_bind=\<cores \| sockets> and +--distribution=\<block \| cyclic> are needed. cpu_bind defines the +resolution in which the tasks will be allocated. While --distribution +determinates the order in which the tasks will be allocated to the cpus. +Keep in mind that the allocation pattern also depends on your +specification. + + #!/bin/bash + #SBATCH --nodes=2 # request 2 nodes + #SBATCH --cpus-per-task=4 # use 4 cores per task + #SBATCH --tasks-per-node=4 # allocate 4 tasks per node - 2 per socket + + srun --ntasks 8 --cpus-per-task 4 --cpu_bind=cores --distribution=block:block ./application + +In the following sections there are some selected examples of the +combinations between --cpu_bind and --distribution for different job +types. + +## MPI Strategies + +### Default Binding and Dsitribution Pattern + +The default binding uses --cpu_bind=cores in combination with +--distribution=block:cyclic. The default (as well as block:cyclic) +allocation method will fill up one node after another, while filling +socket one and two in alternation. Resulting in only even ranks on the +first socket of each node and odd on each second socket of each node. 
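Written out explicitly, the default therefore corresponds to the following `srun` line (same job geometry as in the general example above):

    srun --ntasks 8 --cpus-per-task 4 --cpu_bind=cores --distribution=block:cyclic ./application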
\<img alt="Default binding and distribution pattern" src="data:;base64,..." />
+[base64-encoded diagram omitted: default block:cyclic distribution of 32 tasks over 2 nodes]" +/>
+
+    #!/bin/bash
+    #SBATCH --nodes=2
+    #SBATCH --tasks-per-node=16
+    #SBATCH --cpus-per-task=1
+
+    srun --ntasks 32 ./application
+
+### Core Bound
+
+Note: With this option the tasks will be bound to a core for the entire
+runtime of your application.
+
+#### Distribution: block:block
+
+This method allocates the tasks linearly to the cores.
+
+\<img alt="block:block distribution of tasks to cores"
+src="[base64-encoded diagram omitted]"
OGFgz5jPeRnILkxVGSeK9Gaknye+1xTMrGidSETK1oXzlgPued1LXiI9boWRLEya0G0j+Ra8JXM+TqJ1pEoVmYdeRjXa056nrPnnPQQ6zUnPcR6zUkP19FrTnqIlanP/jE4j+KH4P23ecQSS+xQxQ6VYHyuiCWW2KGN1fCPTQAAAIRolQAAAIRolQAAAIRolQAAAIT64W3dGN4CeRsdhrdgfFs3hjfqFUR4WzcAAMCACOiuEgAAwPDGXSUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAAAhWiUAAACh/wGLggH7ga71+AAAAABJRU5ErkJggg==" +/> + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=16 + #SBATCH --cpus-per-task=1 + + srun --ntasks 32 --cpu_bind=cores --distribution=block:block ./application + +#### Distribution: cyclic:cyclic + +--distribution=cyclic:cyclic will allocate your tasks to the cores in a +round robin approach. It starts with the first socket of the first node, +then the first socket of the second node until one task is placed on +every first socket of every node. After that it will place a task on +every second socket of every node and so on. + +\<img alt="" +src="<data:;base64,iVBORw0KGgoAAAANSUhEUgAAAw4AAADeCAIAAAAb9sCoAAAABmJLR0QA/wD/AP+gvaeTAAAfCElEQVR4nO3de1BU5/348bOIoLIgyHJREGRRMORWY4yKWtuYapO0SQNe4piq6aiRJqKSxugMURNnmkQnyTh2am1MmjBOIYnRtDNpxk4QdTTJpFZjNAYvEMQLLBDchQWW6/n+cab748fuczi7y8JZeL/+kt3zec5zPs+Hxw9nl8Ugy7IEAAAAd4IGegIAAAD6RasEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgFOxLsMFg6Kt5AAg4siwP9BQ8wH4FDGW+7FfcVQIAABDy6a6SIrB+sgTgu8C9Q8N+BQw1vu9X3FUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUaun72s58ZDIYvvvjC+Uh8fPwnn3yifYRvvvnGaDRqP76goCAzMzMsLCw+Pt6DiQIY8vp/v9q4cWNGRsaoUaOSkpI2bdrU1tbmwXQxuNAqDWnR0dEvvPBCv53OZDJt2LBh+/bt/XZGAINGP+9Xdrt93759169fLyoqKioq2rZtW7+dGnpDqzSkrVq1qqys7OOPP3Z9qqqqatGiRbGxsYmJic8991xzc7Py+PXr1xcsWBAZGXnXXXedOnXKeXxDQ0NOTs748eNjYmKefPLJuro61zEfeeSRxYsXjx8/3k+XA2AQ6+f96u23354zZ050dHRmZubTTz/dPRxDDa3SkGY0Grdv375ly5b29vYeT2VnZw8fPrysrOz06dNnzpzJy8tTHl+0aFFiYmJ1dfW//vWvv/zlL87jly1bZrFYzp49W1lZOXr06JUrV/bbVQAYCgZwvzp58uTUqVP79GoQUGQf+D4CBtDcuXN37NjR3t4+efLkPXv2yLIcFxd3+PBhWZZLS0slSaqpqVGOLC4uHjFiRGdnZ2lpqcFgqK+vVx4vKCgICwuTZbm8vNxgMDiPt9lsBoPBarW6PW9hYWFcXJy/rw5+FYjf+4E4ZzgN1H4ly/LWrVtTUlLq6ur8eoHwH9+/94P7uzWDzgQHB7/22murV69evny588EbN26EhYXFxMQoX5rNZofDUVdXd+PGjejo6KioKOXxSZMmKf+oqKgwGAzTpk1zjjB69OibN2+OHj26v64DwODX//vVK6+8cuDAgZKSkujoaH9dFXSPVgnS448//sYbb7z22mvORxITE5uammpra5Xdp6KiIjQ01GQyJSQkWK3W1tbW0NBQSZKqq6uV45OSkgwGw7lz5+iNAPhVf+5XmzdvPnTo0PHjxxMTE/12QQgAvFcJkiRJu3bt2r17d2Njo/JlWlrajBkz8vLy7Ha7xWLJz89fsWJFUFDQ5MmTp0yZ8tZbb0mS1Nraunv3buX41NTU+fPnr1q1qqqqSpKk2tragwcPup6ls7PT4XAo7zNwOBytra39dHkABpH+2a9yc3MPHTp05MgRk8nkcDj4sIChjFYJkiRJ06dPf/TRR52/NmIwGA4ePNjc3JySkjJlypR77rnnzTffVJ766KOPiouL77vvvgcffPDBBx90jlBYWDhu3LjMzMzw8PAZM2acPHnS9Sxvv/32yJEjly9fbrFYRo4cyQ1tAF7oh/3KarXu2bPnypUrZrN55MiRI0eOzMjI6J+rgw4ZnO948ibYYJAkyZcRAASiQPzeD8Q5A/Cd79/73FUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQCh7oCQBA/ykvLx/oKQAIMAZZlr0PNhgkSfJlBACBKBC/95U5AxiafNmv+uCuEhsQAP0zm80DPQUAAakP7ioBGJoC664SAHjHp1YJAABgcOM34AAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIR8+ghKPldpKPDu4ySojaEgsD5qhJocCtivIOLLfsVdJQAAAKE++MMmgfWTJbTz/SctamOwCtyfwqnJwYr9CiK+1wZ3lQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIRolQAAAIQGf6t08eLFX//61yaTadSoUZMnT37xxRe9GGTy5MmffPKJxo
N/8pOfFBUVuX2qoKAgMzMzLCwsPj7ei2mgb+mqNjZu3JiRkTFq1KikpKRNmza1tbV5MRkEOl3VJPuVruiqNobafjXIW6Wurq5f/vKX48aNO3/+fF1dXVFRkdlsHsD5mEymDRs2bN++fQDnAIXeasNut+/bt+/69etFRUVFRUXbtm0bwMlgQOitJtmv9ENvtTHk9ivZB76P4G/Xr1+XJOnixYuuT926dWvhwoUxMTEJCQnPPvtsU1OT8vjt27dzcnKSkpLCw8OnTJlSWloqy3J6evrhw4eVZ+fOnbt8+fK2tjabzbZ27drExESTybRkyZLa2lpZlp977rnhw4ebTKbk5OTly5e7nVVhYWFcXJy/rrnv+LK+1IZ3taHYunXrnDlz+v6a+47+19eV/uesz5pkv9IDfdaGYijsV4P8rtK4cePS0tLWrl37wQcfVFZWdn8qOzt7+PDhZWVlp0+fPnPmTF5envL40qVLr1279uWXX1qt1vfffz88PNwZcu3atVmzZs2ePfv9998fPnz4smXLLBbL2bNnKysrR48evXLlSkmS9uzZk5GRsWfPnoqKivfff78frxWe0XNtnDx5curUqX1/zdA3PdckBpaea2NI7FcD26n1A4vFsnnz5vvuuy84OHjixImFhYWyLJeWlkqSVFNToxxTXFw8YsSIzs7OsrIySZJu3rzZY5D09PSXXnopMTFx3759yiPl5eUGg8E5gs1mMxgMVqtVluV7771XOYsIP6XphA5rQ5blrVu3pqSk1NXV9eGV9rmAWN8eAmLOOqxJ9iud0GFtyENmvxr8rZJTY2PjG2+8ERQU9O23337++edhYWHOp3744QdJkiwWS3Fx8ahRo1xj09PT4+Lipk+f7nA4lEeOHj0aFBSU3E1kZOR3330ns/X4HNv/9FMbL7/8stlsrqio6NPr63uBtb6KwJqzfmqS/Upv9FMbQ2e/GuQvwHVnNBrz8vJGjBjx7bffJiYmNjU11dbWKk9VVFSEhoYqL8o2NzdXVVW5hu/evTsmJuaxxx5rbm6WJCkpKclgMJw7d67if27fvp2RkSFJUlDQEMrq4KCT2ti8efOBAweOHz+enJzsh6tEINFJTUKHdFIbQ2q/GuTfJNXV1S+88MLZs2ebmprq6+tfffXV9vb2adOmpaWlzZgxIy8vz263WyyW/Pz8FStWBAUFpaamzp8/f82aNVVVVbIsX7hwwVlqoaGhhw4dioiIePjhhxsbG5UjV61apRxQW1t78OBB5cj4+PhLly65nU9nZ6fD4Whvb5ckyeFwtLa29ksa4IbeaiM3N/fQoUNHjhwxmUwOh2PQ//ItXOmtJtmv9ENvtTHk9quBvanlbzabbfXq1ZMmTRo5cmRkZOSsWbM+/fRT5akbN25kZWWZTKaxY8fm5OTY7Xbl8fr6+tWrVyckJISHh993332XLl2Su/3WQEdHx29/+9sHHnigvr7earXm5uZOmDDBaDSazeb169crIxw7dmzSpEmRkZHZ2dk95rN3797uye9+41SHfFlfasOj2rh9+3aPb8zU1NT+y4Xn9L++rvQ/Z13VpMx+pSe6qo0huF8ZnKN4wWAwKKf3egTomS/rS20MboG4voE4Z2jHfgUR39d3kL8ABwAA4AtaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAAKFg34cwGAy+D4JBidqA3lCTEKE2IMJdJQAAACGDLMsDPQcAAACd4q4SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAkE+f1s1nmw4F3n3yFrUxFATWp7JRk0MB+xVEfNmvuKsEAAAg1Ad/Ay6wfrKEdr7/pEVtDFaB+1M4NTlYsV9BxPfa4K4SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACA0KBtlU6dOvXoo4+OGTMmLCzs7rvvzs/Pb2pq6ofzdnR05ObmjhkzJiIiYtmyZQ0NDW4PMxqNhm5CQ0NbW1v7YXpD1kDVg8ViWbx4sclkioyMXLBgwaVLl9weVlBQkJmZGRYWFh8f3/3xlStXdq+ToqKifpgz+h/7Fbpjv9Kbwdkq/fOf/5w3b96999775Zdf1tTUHDhwoKam5ty5c1piZVlub2/3+tQvv/zykSNHTp8+ffXq1WvXrq1du9btYRaLpfF/srKynnjiidDQUK9PCnUDWA85OTlWq/Xy5cs3b94cO3bsokWL3B5mMpk2bNiwfft216fy8vKcpbJw4UKvZwLdYr9Cd+xXeiT7wPcR/KGzszMxMTEvL6/H411dXbIs37p1a+HChTExMQkJCc8++2xTU5PybHp6en5+/uzZs9PS0kpKSmw229q1axMTE00m05IlS2pra5XD3nzzzeTk5NGjR48dO3bHjh2uZ4+NjX333XeVf5eUlAQHB9++fVtltrW1taGhoUePHvXxqv3Bl/XVT20MbD2kpqbu379f+XdJSUlQUFBHR4doqoWFhXFxcd0fWbFixYsvvujtpfuRftZXO33Omf2qr7BfsV+J9EG3M7Cn9wel+z579qzbZ2fOnLl06dKGhoaqqqqZM2c+88wzyuPp6el33XVXXV2d8uWvfvWrJ554ora2trm5ec2aNY8++qgsy5cuXTIajVeuXJFl2Wq1/ve//+0xeFVVVfdTK3ezT506pTLbXbt2TZo0yYfL9aPBsfUMYD3Isrxp06Z58+ZZLBabzfbUU09lZWWpTNXt1jN27NjExMSpU6e+/vrrbW1tnifAL/Szvtrpc87sV32F/Yr9SoRWyY3PP/9ckqSamhrXp0pLS7s/VVxcPGLEiM7OTlmW09PT//SnPymPl5eXGwwG52E2m81gMFit1rKyspEjR3744YcNDQ1uT3358mVJksrLy52PBAUFffbZZyqzTUtL27Vrl+dX2R8Gx9YzgPWgHDx37lwlG3fccUdlZaXKVF23niNHjnzxxRdXrlw5ePBgQkKC68+aA0U/66udPufMftVX2K+Ux9mvXPm+voPwvUoxMTGSJN28edP1qRs3boSFhSkHSJJkNpsdDkddXZ3y5bhx45R/VFRUGAyGadOmTZgwYcKECffcc8/o0aNv3rxpNpsLCgr+/Oc/x8fH//SnPz1+/HiP8cPDwyVJstlsypeNjY1dXV0RERHvvfee851u3Y8vKSmpqKhYuXJlX107XA1gPciy/NBDD5nN5vr6ervdvnjx4tmzZzc1NYnqwdX8+fNnzpw5ceLE7Ozs119//cCBA76kAjrEfoXu2K90amA7NX9QXut9/vnnezze1dXVoysvKSkJDQ11duWHDx9WHr969eqwYcOsVqvoFM3NzX/84x+joqKU14+7i42N/dvf/qb8+9ixY+qv/S9ZsuTJJ5/07PL6kS/rq5/aGMB6qK2tlVxe4Pjqq69E47j+lNbdhx9+OGbMGLVL7Uf6WV/t9Dln9qu+wn6lPM5+5aoPup2BPb2f/
OMf/xgxYsRLL71UVlbmcDguXLiQk5Nz6tSprq6uGTNmPPXUU42NjdXV1bNmzVqzZo0S0r3UZFl++OGHFy5ceOvWLVmWa2pqPvroI1mWv//+++LiYofDIcvy22+/HRsb67r15Ofnp6enl5eXWyyWOXPmLF26VDTJmpqakJAQfb5BUjE4th55QOshOTl59erVNputpaXllVdeMRqN9fX1rjPs6OhoaWkpKCiIi4traWlRxuzs7Ny/f39FRYXVaj127FhqaqrzrQkDTlfrq5Fu58x+1SfYr5wjsF/1QKskdPLkyYcffjgyMnLUqFF33333q6++qvyywI0bN7Kyskwm09ixY3Nycux2u3J8j1KzWq25ubkTJkwwGo1ms3n9+vWyLJ85c+aBBx6IiIiIioqaPn36iRMnXM/b1ta2bt26yMhIo9G4dOlSm80mmuHOnTt1+wZJxaDZeuSBq4dz587Nnz8/KioqIiJi5syZov9p9u7d2/1eb1hYmCzLnZ2dDz30UHR0dEhIiNls3rJlS3Nzc59nxjt6W18t9Dxn9ivfsV85w9mvevB9fQ3OUbygvHLpywjQM1/Wl9oY3AJxfQNxztCO/Qoivq/vIHxbNwAAQF+hVQIAABCiVQIAABCiVQIAABCiVQIAABCiVQIAABCiVQIAABCiVQIAABCiVQIAABAK9n2IXv/aMIYsagN6Q01ChNqACHeVAAAAhHz6G3AAAACDG3eVAAAAhGiVAAAAhGiVAAAAhGiVAAAAhGiVAAAAhGiVAAAAhGiVAAAAhHz6tG4+23Qo8O6Tt6iNoSCwPpWNmhwK2K8g4st+xV0lAAAAoT74G3C+dPHE6j/WF4F4vcRqjw1EgZhnYrXH+iIQr5dY7bG+4K4SAACAEK0SAACAEK0SAACAkF9apY6Ojtzc3DFjxkRERCxbtqyhocGLEaZMmWIwGKqrq7VHWSyWxYsXm0ymyMjIBQsWXLp0Sf34goKCzMzMsLCw+Pj47o9v3LgxIyNj1KhRSUlJmzZtamtr0x4rSdK///3v6dOnjxgxIiYmZtOmTa6xovG15E19bup5E8V6mjdfaMmtil5z251ojbTkWWV9pd7yLIrVkmdRfrTkTeWYXvOWn5+fkpISGhoaHR392GOPXb16VXuuAp36WqtbuXKloZuioiLtsTdu3MjOzo6Ojh4zZszvf//71tZW7+YpWjstsUajsfv8Q0NDXachqisteRPFasmbKNbTvPlCS25FtOS2O1E+teRZdIyWPItiteRZtEZa8iaK1ZI30fi+fC/3QvaBaIT8/Py0tLSysjKLxTJr1qylS5dqj1Xs2LFj3rx5kiRVVVVpj33iiSd+8Ytf/Pjjj3a7fcWKFXfffbd67KeffvrBBx/s3LkzLi6u+zGrVq06ceJEXV3dqVOnxo8fv3nzZu2xxcXFRqPxr3/9a3V1dWVl5YkTJ1xjReOL8qYlVpQ3LbGivPlSIaJY9fmrx4pyK4oVrZGWPItiFep5FsVqybMoP1pqUnSMlpr86quvysrKGhoaysvLH3/88czMTO25ChSiOauvtXrsihUr8vLyGv+nvb1de+wDDzzw5JNP2my2W7duzZgxY/369eqxonmK1k5LrN1ud04+KytryZIlrrGiuhKNqSVWlDctsaK8+WO/EuVWS6wot6JYUT615Fl0jJY8i2K15Fm0RlpqUhSrpSZF42vJlXf80irFxsa+++67yr9LSkqCg4Nv376tMVaW5e+++y41NfXrr7+WPGyVUlNT9+/f7zxvUFBQR0dHr7GFhYUqW+TWrVvnzJmjPTYzM/PFF1/UPufu44vypiVWFuRNS6wob/7YelTm32usKLfqsa5rpD3PbmtDY55dYz3Nsyg/6jXpeoxHNdnW1paTk/PII48oX3pak3qmPmf1fUAUu2LFCi9qUpblmzdvSpJUWlqqfHn48GGj0dja2tprrMo8e6ydR7G1tbWhoaFHjx5VmbPsriZdx9QSK8pbr7EqefPrftUjtx7F9siteqxojbTk2fUY7XnuEetFnt3uV73WpEqslpp0uy7aa1K7vn8Brrq6uqamZsqUKcqXU6dO7ejouHjxosbwzs7O3/3ud2+99VZ4eLinp87Ozi4sLKypqWloaHjnnXd+85vfDBs2zNNBejh58uTUqVM1HuxwOL766qvOzs477rgjKipq3rx53377rcbxvchb97l5mrfusf7Im6dz6JUXuXUrgOpTlB8teXMeoz1vBQUF8fHx4eHh58+f//vf/y75nKshoqCgYPz48ffff//OnTvb29s1Rjm3bye73e7R6zs95tBj7Tz13nvvJSUl/fznP1c/zKPvWfVYj/LmjO3bvGnRb7n1k36rT9f11Z43t3Wlnjff18UzvvRZbke4fPmyJEnl5eX/rx0LCvrss8+0xMqyvGvXrkWLFsmy/P3330se3lWy2Wxz585Vnr3jjjsqKyu1xKr8pLV169aUlJS6ujqNsVVVVZIkpaSkXLhwwW63b9iwISEhwW63i+bcfXyVvPUaK4vzpiVWlDdfKqTX2B5z6DVWJbfqsT3WyKM8u9aG9jy7xnqUZ1F+eq3JHsdor8nm5uZbt26dOHFiypQpq1at8jRX+qc+Z+/uKh05cuSLL764cuXKwYMHExIS8vLytMfef//9zhc4Zs6cKUnSl19+2Wus23m6rp32WEVaWtquXbvU5+y2JjX+BN8jVpQ3LbGivPlpv3KbW42xih65VY/t27tK2vPsGutRnl1rQ2NNuo1VqNekyrr4465S37dKytZ89uxZ5UvlfaCnTp3SEnvlypVx48ZVV1fLnrdKXV1d06ZNe/rpp+vr6+12+7Zt25KSkrz4r9Tp5ZdfNpvNFRUV2mMbGxslSdq5c6fyZUtLy7Bhw44fP+42tsf4KnnrNVYlb73GquTNT1uP6xy0xKrkVj3WbTurMc89Yj3Kc49Yj/Isyo+WmuxxjEc1qThx4oTBYGhqavIoV/qnPmfvWqXuDhw4EBsbqz322rVrWVlZcXFxKSkp27ZtkyTp8uXLvcaqz9O5dh7FHj16NCQkpLa2VuW8oprU8t+S+vd797xpiRXlzX/7laJ7brXHuuZWPbZvW6Xu1PPsGqs9z+rrq16TolgtNek6vuhafN+v+v4FuPj4+NjY2G+++Ub58syZM8HBwRkZGVpiT548WVdXd+edd5pMJqWNvfPOO9955x0tsT/++ON//vOf3NzcqKiosLCw559/vrKy8sKFC95dxebNmw8cOHD8+PHk5GTtUUajceLEic4PBlX5hFDX8bXnzTVWe95cY/s2b1r4O7fq9F+fovxoyZvrMd7lbdiwYcOGDfMlV0NQSEhIR0eH9uOTkpI+/vjj6urq8vLyxMTEhISEiRMn+j4NZe08Ctm3b19WVpbJZBId4N33rMZYlby5jfVT3rTwR277jZ/qU0ttiPKmEutR3rxYF4/50meJRsjPz09PTy8vL7dYLHPmzNH+G3BNTU3X/+fYsWOSJJ05c0bLnSFFcnLy6tWrbTZbS0vLK6+8YjQa6+vrVWI7OjpaWloKCgri4uJa
WlocDofy+Lp16yZNmlReXt7S0tLS0uJ8r6WW2DfffNNsNl+6dKmlpeUPf/jD+PHjXbtp0fiivPUaq5I3LecV5c2XChHFiuagJVaUW1GsaI205NltrMY8i86rJc+i/GipSdExvdZkW1vbq6++WlpaarVav/766/vvvz87O1t7rgKFaM6i9eo1trOzc//+/RUVFVar9dixY6mpqc8884z2854+ffqHH36oq6s7dOhQTEzMe++9px7rdp4qa6elJmVZrqmpCQkJ6fGmYy11JRqz11iVvGk5ryhvfb5fqeS211iF29yKYkX51JJnt8dozLNofC15drtGGmtS5f8C9ZpUGV9Lrrzjl1apra1t3bp1kZGRRqNx6dKlNptNe6yTF+9VOnfu3Pz586OioiIiImbOnNnrbxzs3btX6iYsLEyW5du3b0v/v9TUVI2xsix3dXVt3bo1Li4uIiLiwQcfPH/+fI9YlfFFedMSK8qbllhR3nwpL7exWuavcl5RbkWxojXqNc8qsU4qL8CJYnvNsyg/WmpS5Zhea7K9vf2xxx6Li4sLCQmZMGHCxo0bnTnRkqtAIZpzr2stiu3s7HzooYeio6NDQkLMZvOWLVuam5u1n3f37t2xsbHDhw/PyMgoKCjodc5u56mydhrreefOnZMmTRKdV6WuRGP2GquSNy3nFeXNl5p0G6uS215jFW5zK4oV5bPXPIuO0ZJnlfF7zbNojbTUpPr/Beo1qTK+llx5x+AcxQuB+2fziCWW2IGKHSiBmCtiiSV2YGMV/GETAAAAIVolAAAAIVolAAAAIVolAAAAoT54WzcGN1/eRofBLRDf1o3Bjf0KIrytGwAAwC98uqsEAAAwuHFXCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQOj/AItyAftZS8fsAAAAAElFTkSuQmCC>" +/> + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=16 + #SBATCH --cpus-per-task=1 + + srun --ntasks 32 --cpu_bind=cores --distribution=cyclic:cyclic + +#### Distribution: cyclic:block + +The cyclic:block distribution will allocate the tasks of your job in +alternation on node level, starting with first node filling the sockets +linearly. + +\<img alt="" +src="<data:;base64,iVBORw0KGgoAAAANSUhEUgAAAw4AAADeCAIAAAAb9sCoAAAABmJLR0QA/wD/AP+gvaeTAAAe3klEQVR4nO3de3BU9f3/8bMhJEA2ISGbCyQkZAMJxlsREQhSWrFQtdWacJHBAnZASdUIsSLORECZqQqjDkOnlIpWM0wTFcF2xjp0DAEG1LEURFEDmBjCJdkkht1kk2yu5/fHme5vv9l8dj+7m8vZ5Pn4i5w973M+57Wf/fDO2WUxqKqqAAAAoC8hQz0AAAAA/aJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEKJVAgAAEAoNpNhgMPTXOAAEHVVVh3oIPmC9AkayQNYr7ioBAAAIBXRXSRNcv1kCCFzw3qFhvQJGmsDXK+4qAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqAQAACNEqjVw/+9nPDAbDp59+6tySmJj44Ycfyh/hyy+/NBqN8vsXFRVlZ2dHREQkJib6MFAAI97gr1cbN27MysoaN25cSkrKpk2bOjo6fBguhhdapREtNjb2mWeeGbTTmUymDRs2bNu2bdDOCGDYGOT1ym6379279/LlyyUlJSUlJVu3bh20U0NvaJVGtLVr11ZUVHzwwQfuD9XU1CxdujQ+Pj45OfmJJ55obW3Vtl++fHnx4sXR0dE33XTTyZMnnfs3NTXl5eVNnjw5Li7uoYceamhocD/mvffeu2zZssmTJw/Q5QAYxgZ5vXrjjTfmz58fGxubnZ39yCOPuJZjpKFVGtGMRuO2bduee+65zs7OXg/l5uaOHj26oqLi1KlTp0+fLigo0LYvXbo0OTm5trb2X//611/+8hfn/itXrrRYLGfOnKmurh4/fvyaNWsG7SoAjARDuF6dOHFi5syZ/Xo1CCpqAAI/AobQggULtm/f3tnZOX369N27d6uqmpCQcOjQIVVVy8vLFUWpq6vT9iwtLR0zZkx3d3d5ebnBYGhsbNS2FxUVRUREqKpaWVlpMBic+9tsNoPBYLVa+zxvcXFxQkLCQF8dBlQwvvaDccxwGqr1SlXVLVu2pKWlNTQ0DOgFYuAE/toPHezWDDoTGhr68ssvr1u3btWqVc6NV65ciYiIiIuL0340m80Oh6OhoeHKlSuxsbExMTHa9mnTpml/qKqqMhgMs2bNch5h/PjxV69eHT9+/GBdB4Dhb/DXqxdffHH//v1lZWWxsbEDdVXQPVolKA888MCrr7768ssvO7ckJye3tLTU19drq09VVVV4eLjJZEpKSrJare3t7eHh4Yqi1NbWavunpKQYDIazZ8/SGwEYUIO5Xm3evPngwYPHjh1LTk4esAtCEOCzSlAURdm5c+euXbuam5u1HzMyMubMmVNQUGC32y0WS2Fh4erVq0NCQqZPnz5jxozXX39dUZT29vZdu3Zp+6enpy9atGjt2rU1NTWKotTX1x84cMD9LN3d3Q6HQ/ucgcPhaG9vH6TLAzCMDM56lZ+ff/DgwcOHD5tMJofDwZcFjGS0SlAURZk9e/Z9993n/GcjBoPhwIEDra2taWlpM2bMuOWWW1577TXtoffff7+0tPS2226766677rrrLucRiouLJ02alJ2dHRkZOWfOnBMnTrif5Y033hg7duyqVassFsvYsWO5oQ3AD4OwXlmt1t27d1+8eNFsNo8dO3bs2LFZWVmDc3XQIYPzE0/+FBsMiqIEcgQAwSgYX/vBOGYAgQv8tc9dJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFaJQAAACFa
3KypI/wubNm/fv33/s2LHU1FT5qhMnTjQ0NNx4440mk0lrRW+88cY333xTptZoNE6dOtX5hZ4+fbPnjz/++J///Cc/Pz8mJiYiIuLpp5+urq4+d+6c/BE0wZhbIAYnc/dM5HMW5SmTs/s+8jm718pn5V7r3/wcNWrUqFGjAp+TI42Wmx+FYWFhXV1dvlbt3bs3JyfHZDL5VJWSkvLBBx/U1tZWVlYmJycnJSVNnTrV11P3r0HOLRADmrl/a7h8rShnr7UecvZQ6zWrPmv9mJ9+zx8fBNJniY5QWFiYmZlZWVlpsVjmz5/v07+Ae/LJJ6dNm1ZZWdnW1tbW1ub+eUNRbUtLy+X/OXr0qKIop0+fln8T7bXXXjObzefPn29ra/vDH/4wefJk+d8OU1NT161bZ7PZ2traXnzxRaPR2NjY6KG2q6urra2tqKgoISGhra3N4XBo20W5ydSKcvNa6yG3QGaIzJhFmcvUijJ3rRVlIpOzqFYm5z73kcxZdHyZrES1XrPq6Oh46aWXysvLrVbrF198cfvtt+fm5spnFSxEYxbNMa+1HnLzWtvd3b1v376qqiqr1Xr06NH09PTHHntMfsyqqtbV1YWFhfX5gW7PtadOnfrhhx8aGhoOHjwYFxf39ttve64V5SPa7rXWQ25eaz3kNtDrlSrIXKZWlLnM61cm5z5rJXPus1YyZw9/X3vNSlTrNSsP1yWTlX8GpFXq6Oh48skno6OjjUbjihUrbDabZO3169eV/ys9PV3+vE6+vgGnqmpPT8+WLVsSEhKioqLuuuuur7/+Wr727NmzixYtiomJiYqKmjt3rtd/jbJnzx7Xa4yIiNC2i3LzWushN5nzinILZHrJnFeUuUytKHNnrYdMvOYsqpXJWWYOi3L2UOs1Kw+1XrPq7Oy8//77ExISwsLCpkyZsnHjRmcmMnMyWIjG7PV1Iar1kJvX2u7u7rvvvjs2NjYsLMxsNj/33HOtra3yY1ZVdceOHdOmTevzIc+1u3btio+PHz16dFZWVlFRkddaUT6i7V5rPeTmtdZDboHMSZnrVQWZy9SKMnfWenj9es1ZVCuTs6hWJmfPa53nrDzUes3Kw3XJzEn/GJxH8UPw/rd51FJL7VDVDpVgzIpaaqkd2loN/7EJAACAEK0SAACAEK0SAACAEK0SAACAUD98rBvDWyAfo8PwFowf68bwxnoFET7WDQAAMCACuqsEAAAwvHFXCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQOj/AVpnAfsg0n+oAAAAAElFTkSuQmCC>" +/> + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=16 + #SBATCH --cpus-per-task=1 + + srun --ntasks 32 --cpu_bind=cores --distribution=cyclic:block ./application + +### Socket Bound + +Note: The general distribution onto the nodes and sockets stays the +same. The mayor difference between socket and cpu bound lies within the +ability of the tasks to "jump" from one core to another inside a socket +while executing the application. These jumps can slow down the execution +time of your application. + +#### Default Distribution + +The default distribution uses --cpu_bind=sockets with +--distribution=block:cyclic. The default allocation method (as well as +block:cyclic) will fill up one node after another, while filling socket +one and two in alternation. Resulting in only even ranks on the first +socket of each node and odd on each second socket of each node. 
+ +\<img alt="" +src="data:;base64,iVBORw0KGgoAAAANSUhEUgAAAvoAAADyCAIAAACzsfbGAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3daXQUVdrA8Wq27AukIQECCQkQCAoCIov44iAHFJARCJtggsqWIyLiCOggghsoisOAo4zoSE6cZGQTj8twDmGZAdzZiSAkhCVASITurJ2EpN4PNdMnk+7qrqR64+b/+5RU31t1763nPjypNB2DLMsSAACAuJp5ewAAAADuRbkDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAE10JPZ4PB4KpxALjtyLLs4SuSc4CmTE/O4ekOAAAQnK6nOwrP/4QHwLu8+5SFnAM0NfpzDk93AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3mq7777/fYDAcOnTIeiQqKurzzz/XfoajR48GBwdrb5+WljZkyJCgoKCoqKgGDBSAEDyfc5599tnExMTAwMDOnTsvXry4qqqqAcOFWCh3mrSIiIjnn3/eY5czGo0LFy5csWKFx64IwKd4OOeUlpZu3Ljx0qVLmZmZmZmZL7/8sscuDV9DudOkzZo1KycnZ9u2bbYvXb16ddKkSe3atYuOjp4/f355ebly/NKlS6NGjQoPD7/jjjsOHjxobV9cXJyamtqpU6e2bdtOnTq1qKjI9pyjR4+ePHlyp06d3DQdAD7Owznnww8/vO+++yIiIoYMGfL444/X7Y6mhnKnSQsODl6xYsULL7xQXV1d76WJEye2bNkyJyfnp59+Onz48KJFi5TjkyZNio6Ovnbt2tdff/3BBx9Y20+fPr2goODIkSMXL14MCwubOXOmx2YB4HbhxZxz4MCB/v37u3Q2uK3IOug/A7xo2LBhr776anV1dY8ePdavXy/LcmRk5I4dO2RZPn36tCRJ169fV1pmZWX5+/vX1NScPn3aYDDcuHFDOZ6WlhYUFCTLcm5ursFgsLY3m80Gg8FkMtm9bkZGRmRkpLtnB7fy1t4n59zWvJVzZFlevnx5ly5dioqK3DpBuI/+vd/C0+UVfEyLFi1Wr149e/bs5ORk68HLly8HBQW1bdtW+TYuLs5isRQVFV2+fDkiIqJ169bK8W7duilf5OXlGQyGAQMGWM8QFhaWn58fFhbmqXkAuD14Pue88sor6enpe/fujYiIcNes4PModyD9/ve/f+edd1avXm09Eh0dXVZWVlhYqGSfvLw8Pz8/o9HYsWNHk8lUWVnp5+cnSdK1a9eU9p07dzYYDMeOHaO+AeCUJ3PO0qVLt2/fvn///ujoaLdNCLcB3rsDSZKkNWvWrFu3rqSkRPm2e/fugwYNWrRoUWlpaUFBwbJly1JSUpo1a9ajR4++ffu+++67kiRVVlauW7dOaR8fHz9y5MhZs2ZdvXpVkqTCwsKtW7faXqWmpsZisSi/s7dYLJWVlR6aHgAf45mcs2DBgu3bt+/atctoNFosFv4jelNGuQNJkqSBAweOGTPG+l8hDAbD1q1by8vLu3Tp0rdv3969e69du1Z5acuWLVlZWf369Rs+fPjw4cOtZ8jIyOjQocOQIUNCQkIGDRp04MAB26t8+OGHAQEBycnJBQUFAQEBPFgGmiwP5ByTybR+/fqzZ8/GxcUFBAQEBAQkJiZ6ZnbwQQbrO4Aa09lgkCRJzxkA3I68tffJOUDTpH/v83QHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIroX+UxgMBv0nAQCNyDkAGoqnOwAAQHAGWZa9PQYAAAA34ukOAAAQHOUOAAAQHOUOAAAQHOUOAAAQHOUOAAAQHOUOAAAQnK6PGeTDvpqCxn1UAbHRFHj+YyyIq6aAnAM1enIOT3cAAIDgXPBHJPigQlHp/2mJ2BCVd3+SJq5ERc6BGv2xwdMdAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOPHLnezs7IcffthoNAYGBvbo0WPJkiWNOEmPHj0+//xzjY3vuuuuzMxMuy+lpaUNGTIkKCgoKiqqEcOAa/lUbDz77LOJiYmBgYGdO3devHhxVVVVIwYDX+BTcUXO8Sk+FRtNLecIXu7U1tY++OCDHTp0OHHiRFFRUWZmZlxcnBfHYzQaFy5cuGLFCi+OAQpfi43S0tKNGzdeunQpMzMzMzPz5Zdf9uJg0Gi+FlfkHN/ha7HR5HKOrIP+M7jbpUuXJEnKzs62fenKlStJSUlt27bt2LHjU089VVZWphy/efNmampq586dQ0JC+vbte/r0aVmWExISduzYobw6bNiw5OTkqqoqs9k8b9686Ohoo9E4ZcqUwsJCWZbnz5/fsmVLo9EYExOTnJxsd1QZGRmRkZHumrPr6Lm/xEbjYkOxfPny++67z/Vzdh1v3V/iipzjjr6e4ZuxoWgKOUfwpzsdOnTo3r37vHnz/vGPf1y8eLHuSxMnTmzZsmVOTs5PP/10+PDhRYsWKcenTZt24cKFb7/91mQybd68OSQkxNrlwoUL995779ChQzdv3tyyZcvp06cXFBQcOXLk4sWLYWFhM2fOlCRp/fr1iYmJ69evz8vL27x5swfniobx5dg4cOBA//79XT9nuJ8vxxW8y5djo0nkHO9WWx5QUFCwdOnSfv36tWjRomvXrhkZGbIsnz59WpKk69evK22ysrL8/f1rampycnIkScrPz693koSEhJdeeik6Onrjxo3KkdzcXIPBYD2D2Ww2GAwmk0mW5T59+ihXUcNPWj7CB2NDluXly5d36dKlqKjIhTN1OW/dX+KKnOOOvh7jg7EhN5mcI365Y1VSUvLOO+80a9bs+PHju3fvDgoKsr50/vx5SZIKCgqysrICAwNt+yYkJERGRg4cONBisShH9uzZ06xZs5g6wsPDT506JZN6dPf1PN+JjZUrV8bFxeXl5bl0fq5HuaOF78QVOcfX+E5sNJ2cI/gvs+oKDg5etGiRv7//8ePHo6Ojy8rKCgsLlZfy8vL8/PyUX3CWl5dfvXrVtvu6devatm07bty48vJySZI6d+5sMBiOHTuW9183b95MTEyU
JKlZsya0qmLwkdhYunRpenr6/v37Y2Ji3DBLeJqPxBV8kI/ERpPKOYJvkmvXrj3//PNHjhwpKyu7cePGqlWrqqurBwwY0L1790GDBi1atKi0tLSgoGDZsmUpKSnNmjWLj48fOXLknDlzrl69KsvyyZMnraHm5+e3ffv20NDQhx56qKSkRGk5a9YspUFhYeHWrVuVllFRUWfOnLE7npqaGovFUl1dLUmSxWKprKz0yDLADl+LjQULFmzfvn3Xrl1Go9FisQj/n0JF5WtxRc7xHb4WG00u53j34ZK7mc3m2bNnd+vWLSAgIDw8/N577/3qq6+Uly5fvjxhwgSj0di+ffvU1NTS0lLl+I0bN2bPnt2xY8eQkJB+/fqdOXNGrvNO+Fu3bj322GP33HPPjRs3TCbTggULYmNjg4OD4+LinnnmGeUM+/bt69atW3h4+MSJE+uN5/3336+7+HUfYPogPfeX2GhQbNy8ebPexoyPj/fcWjSct+4vcUXOcUdfz/Cp2GiCOcdgPUsjGAwG5fKNPgN8mZ77S2yIzVv3l7gSGzkHavTfX8F/mQUAAEC5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABNdC/ymUP8sO2CI24A7EFdQQG1DD0x0AACA4gyzL3h4DAACAG/F0BwAACI5yBwAACI5yBwAACI5yBwAACI5yBwAACI5yBwAACI5yBwAACE7Xpyrz+ZVNQeM+mYnYaAo8/6ldxFVTQM6BGj05h6c7AABAcC74m1l8LrOo9P+0RGyIyrs/SRNXoiLnQI3+2ODpDgAAEBzlDgAAEBzlDgAAEBzlDgAAEBzlDgAAEBzlDgAAEBzlDgAAEBzlDgAAEJyw5c7BgwfHjBnTpk2boKCgO++8c9myZWVlZR647q1btxYsWNCmTZvQ0NDp06cXFxfbbRYcHGyow8/Pr7Ky0gPDa7K8FQ8FBQWTJ082Go3h4eGjRo06c+aM3WZpaWlDhgwJCgqKioqqe3zmzJl14yQzM9MDY0bjkHNQFznH14hZ7nzxxRcPPPBAnz59vv322+vXr6enp1+/fv3YsWNa+sqyXF1d3ehLr1y5cteuXT/99NO5c+cuXLgwb948u80KCgpK/mvChAnjx4/38/Nr9EXhmBfjITU11WQy/frrr/n5+e3bt580aZLdZkajceHChStWrLB9adGiRdZQSUpKavRI4FbkHNRFzvFFsg76z+AONTU10dHRixYtqne8trZWluUrV64kJSW1bdu2Y8eOTz31VFlZmfJqQkLCsmXLhg4d2r17971795rN5nnz5kVHRxuNxilTphQWFirN1q5dGxMTExYW1r59+1dffdX26u3atfv444+Vr/fu3duiRYubN286GG1hYaGfn9+ePXt0ztod9Nxf34kN78ZDfHz8pk2blK/37t3brFmzW7duqQ01IyMjMjKy7pGUlJQlS5Y0dupu5K376ztxVRc5x1XIOeQcNS6oWLx7eXdQKugjR47YfXXw4MHTpk0rLi6+evXq4MGD586dqxxPSEi44447ioqKlG/Hjh07fvz4wsLC8vLyOXPmjBkzRpblM2fOBAcHnz17VpZlk8n0888/1zv51atX615aeap88OBBB6Nds2ZNt27ddEzXjcRIPV6MB1mWFy9e/MADDxQUFJjN5hkzZkyYMMHBUO2mnvbt20dHR/fv3//NN9+sqqpq+AK4BeVOXeQcVyHnkHPUUO7YsXv3bkmSrl+/bvvS6dOn676UlZXl7+9fU1Mjy3JCQsKGDRuU47m5uQaDwdrMbDYbDAaTyZSTkxMQEPDZZ58VFxfbvfSvv/4qSVJubq71SLNmzb755hsHo+3evfuaNWsaPktPECP1eDEelMbDhg1TVqNnz54XL150MFTb1LNr165Dhw6dPXt269atHTt2tP150Vsod+oi57gKOUc5Ts6xpf/+CvjenbZt20qSlJ+fb/vS5cuXg4KClAaSJMXFxVkslqKiIuXbDh06KF/k5eUZDIYBAwbExsbGxsb27t07LCwsPz8/Li4uLS3tL3/5S1RU1P/93//t37+/3vlDQkIkSTKbzcq3JSUltbW1oaGhn3zyifWdX3Xb7927Ny8vb+bMma6aO2x5MR5kWR4xYkRcXNyNGzdKS0snT548dOjQsrIytXiwNXLkyMGDB3ft2nXixIlvvvlmenq6nqWAm5BzUBc5x0d5t9pyB+X3ps8991y947W1tfUq67179/r5+Vkr6x07dijHz50717x5c5PJpHaJ8vLyN954o3Xr1srvYutq167d3/72N+Xrffv2Of49+pQpU6ZOndqw6XmQnvvrO7HhxXgoLCyUbH7R8N1336mdx/Ynrbo+++yzNm3aOJqqB3nr/vpOXNVFznEVco5ynJxjywUVi3cv7yY7d+709/d/6aWXcnJyLBbLyZMnU1NTDx48WFtbO2jQoBkzZpSUlFy7du3ee++dM2eO0qVuqMmy/NBDDyUlJV25ckWW5evXr2/ZskWW5V9++SUrK8tisciy/OGHH7Zr18429SxbtiwhISE3N7egoOC+++6bNm2a2iCvX7/eqlUr33zDoEKM1CN7NR5iYmJmz55tNpsrKipeeeWV4ODgGzdu2I7w1q1bFRUVaWlpkZGRFRUVyjlramo2bdqUl5dnMpn27dsXHx9v/TW/11Hu1EPOcQlyjvUM5Jx6KHdUHThw4KGHHgoPDw8MDLzzzjtXrVqlvAH+8uXLEyZMMBqN7du3T01NLS0tVdrXCzWTybRgwYLY2Njg4OC4uLhnnnlGluXDhw/fc889oaGhrVu3Hjhw4L/+9S/b61ZVVT399NPh4eHBwcHTpk0zm81qI3zrrbd89g2DCmFSj+y9eDh27NjIkSNbt24dGho6ePBgtX9p3n///brPXIOCgmRZrqmpGTFiRERERKtWreLi4l544YXy8nKXr0zjUO7YIufoR86xdifn1KP//hqsZ2kE5beAes4AX6bn/hIbYvPW/SWuxEbOgRr991fAtyoDAADURbkDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAE10L/KZz+hVU0WcQG3IG4ghpiA2p4ugMAAASn629mAQAA+D6e7gAAAMFR7gAAAMFR7gAAAMFR7gAAAMFR7gAAAMFR7gAAAMFR7gAAAMHp+lRlPr+yKWjcJzMRG02B5z+1i7hqCsg5UKMn5/B0BwAACM4FfzOLz2UWlf6flogNUXn3J2niSlTkHKjRHxs83QEAAIKj3AEAAIKj3AEAAIKj3AEAAIKj3AEAAIKj3AEAAIKj3JEkSerRo4fBYDAYDJGRkSkpKaWlpY04idFoPHfunMvHBu8iNuAOxBXUEBtuQrnzH1u2bJFl+dChQz/++OPq1au9PRz4EGID7kBcQQ2x4Q6UO/8jPj5+7Nixx48fV7598cUXO3fuHBoaOmjQoMOHDysHjUbj22+/PXDgwK5duz799NO
2J9m3b19MTMz333/vuXHD/YgNuANxBTXEhmtR7vwPs9mclZXVq1cv5ds777zz559/vnHjxqRJk6ZOnWr9vM6jR48eOnToxIkTu3fvzsrKqnuGr7/+Ojk5eefOnQMHDvT06OFOxAbcgbiCGmLDxWQd9J/BRyQkJISHh0dGRrZo0WL06NHl5eW2bcLDwy9fvizLckRExPfff68cnDt37po1a5SvIyIiXn/99ZiYmFOnTnls5G6l5/4SG8SGSNd1OeLKLnKOTGyo0H9/ebrzH2vWrDl8+HBaWtrBgwdzcnKUg5988kn//v07deoUGxtbUlJSVFSkHG/Tpo3yhb+/f933kf3pT3+aPHlyYmKihwcPtyI24A7EFdQQG+5AufMf4eHhHTp0mDZt2pNPPrl06VJJks6ePfvcc8+lp6dfunQpLy8vNDRUdvbH57Zs2bJt27YNGzZ4ZMjwEGID7kBcQQ2x4Q4u+Ivognn++ee7dOly7NixW7duBQUFxcfHS5KUnp5+8+ZNp307duyYlZU1bNiwwMDAJ554wv2DhUcRG3AH4gpqiA0XotypLyoqKiUlZfXq1RkZGRMnTuzTp0/btm0HDx7cqVMnLd1jY2OzsrLuv/9+f3//Rx991N2jhScRG3AH4gpqiA0XMjh9IOaos8EgSZKeM8CX6bm/xIbYvHV/iSuxkXOgRv/95b07AABAcJQ7AABAcJQ7AABAcJQ7AABAcJQ7AABAcD5R7pjN5qSkpKCgoJiYmE8//dS2gcVimTt3bpcuXYKCgu6+++49e/Y4ONvp06f9/PxmzJih1mDjxo3x8fF+fn49e/Y8ffq0bYNffvll+PDhoaGh0dHRr7zyit2TrF27tk+fPi1atFA+A0r7XNT6apyj2nU1zv2243i+irlz53bo0MHPz69Lly5qfz34wQcfNPxX165dbRtYLBbD/7J7qpMnTw4bNiwkJKR79+5ff/11g8asZS5qbbTMUa2v07nfphysp9M9rtbX6d53sE8bnce07H2nbRzsfcd9Ha+Vg75a8qRa3GrJk3qo5Vgte1wtNpzufQdr5XTvO+jrdO876Ksl76nFpNO1cnBdLXlSbV5a8mTj+ES5s3DhQovFkp+f//HHH8+ZMyc7O7teg8rKysDAwG3btl28eHHatGnjxo0rLCxUO9v8+fPvuecetVe3bNny2muv/fWvf7127VpaWlp4eLhtm+Tk5N69excVFWVlZb333ns7d+60bdOpU6fXX3993LhxDZ2LWl+Nc1S7rpa5344cz1eRnJz8ww8/FBUVbd68edWqVbt27bLbLC0traKioqKiwu5N8ff3r/iv/Pz8li1bjh8/vl6b6urqRx55ZMSIEb/99tv69eunTJly6dIl7WPWMhe1Nlrm6OD8jud+m1Kbr5Y97mCdHe99B/u00XlMy9532sbB3nfQ1+laOeirJU+qxa2WPKmH3furZY+r9dWy9x2sldO973idHe99x7HheO+r9dWyVmp9NeZJtXlpyZONpOcPbuk/gyzLFoslICDgxx9/VL6dMGHCiy++qHz95JNPPvnkk7ZdWrduvW/fPrttPv300xkzZixZsmT69OnWg3Xb9OrVKzMz0/acddsEBgZa/+ja+PHj33jjDbXxpKSkLFmypHFzqddX+xzV+tqdux567q9LYsPKdr52Y+PKlStRUVHffvutbZtRo0ZlZGTYntnueTZs2DBw4EDbNidOnGjZsmV1dbVy/He/+93q1avVzqN2f7XMxUFsOJijWl+1uevh2vur57q289Wyx9X6at/7Cus+1ZnH1I5r7Os076n11b5Wtn0btFZ149bBWrk25zjYR2p7XK1vg/a+wvb+asxjdvvKGva+bd8G5T216zpdq3p9G7pW9ealsF0r/TnH+5+qnJubW1FR0bt3b+Xb3r17HzlyRPl61KhRtu3PnTtXWlras2dP2zbFxcUrVqzYv3//unXr6naxtikrKzt16lR2dnb79u2bN2/+2GOPvfbaa82bN693nocffvjvf/977969z58//+OPPy5btszBePTMRY2DOapRm7uo6q3Jc889l5aWVlxcvG7dukGDBtlts2TJksWLFycmJq5cuXLgwIF22yg2b948c+ZMtWtZ1dbWnjx50nEbLTT21TJHNXbnLiSNe1xNg/Z+3X2qM4+pHdfS12neU+vb0LWqd12Na2Ubtw7WymM07nE1Tve+2v2tR2Nf7Xvftq/2vKc2Zi1r5WC+DtbK7rzcSE+tpP8Msiz/8MMPfn5+1m/Xrl37wAMPqDUuKysbMGDAihUr7L66YMGCt956S5ZltSccZ86ckSRpxIgRRUVFZ8+ejY+P//Of/2zbLC8vT/nTJJIkvfTSSw4GX68CbdBc1H7ycDxHtb5O594Ieu6vS2LDyvGTMFmWzWbzhQsXPvroo/Dw8FOnTtk2+PLLLw8fPpydnf3iiy+GhIRcuHBB7VTZ2dmtWrX67bffbF+qqqqKjY19+eWXy8vLv/zyy+bNm48fP76hY3Y6F7U2Tueo1lf73LVz7f3Vc91689W4x+32lRuy9+vtU5fkMS1737aN9r1fr2+D1sr2uhrXyjZuHayVa3OO2l5zsMfV+jZo76vdRy17325fjXvftq/2va82Zi1rVa+v9rVyMC93PN3x/nt3goODKysrq6qqlG+Li4uDg4PttqysrHzkkUd69eq1fPly21ePHTu2e/fuhQsXOrhWQECAJEl/+MMfIiIiunbtOnv2bNt3UVVWVg4fPnzu3LkWiyUnJ+eLL754//33XT4XNY7nqEbL3MUWGhrauXPnJ554YsSIEenp6bYNxowZ07dv3549e77++us9e/b85ptv1E61efPmsWPHtmnTxvalli1b7tixY/fu3ZGRkatWrRo3blx0dLQrp+GQ0zmq0T53AWjZ42q0733bfao/j2nZ+7ZttO99277a18q2r/a1so1b/XlSJwd7XI32vd+4HO64r5a9b7evxr3vYMxO18q2r/a1anROaxzv/zIrLi7O39//+PHjd999tyRJJ06c6NWrl22z6urqpKSk8PDwTZs2KX87o55///vfubm57du3lySpvLy8trY2Ozv78OHDddtER0eHh4dbu9s9T25ubm5u7tNPP+3n5xcXF5eUlJSVlZWamurCuahxOkc1WubedLRq1cppg5qaGrsv1dbWpqenv/fee2p977rrrgMHDihf9+vXb8KECY0epx5O5+igo9rcxaBlj6vRuPft7lOdeUzL3rfbRuPet9tX41rZ7du4PKnErc48qZPTPa5Gy95vdA7X3tfu3tfSV23vO+jrdK3U+jYiTzY6p2nn/ac7fn5+U6ZMWblypdls3rt37z//+c/p06crL82aNWvWrFmSJNXU1EyfPr26uvqjjz6qrq62WCy1tbX12jz++ONnz549evTo0aNHH3/88dGjR1t/UrG2MRgMycnJb7/9tslkunDhwqZNm8aOHVuvTWxsbFhY2AcffFBdXX3p0qVt27b16dOnXhtJkm7dumWxWGpqampqapQvNM5Fra+WOar1dTD3253d+Up11s
+/>
+
+    #!/bin/bash
+    #SBATCH --nodes=2
+    #SBATCH --tasks-per-node=16
+    #SBATCH --cpus-per-task=1
+
+    srun --ntasks 32 --cpu_bind=sockets ./application
+
+#### Distribution: block:block
+
+This method allocates the tasks linearly to the cores.
+
+\<img alt="block:block distribution with socket binding"
+src="[base64-encoded diagram omitted]"
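+If you want to combine socket binding with an explicit distribution but
+keep the binding report, the same verbose keyword can be added. This is
+an added, hedged variant of the script above, assuming the `verbose`
+keyword of `--cpu_bind` is supported by the installed Slurm version.
+
+    #!/bin/bash
+    #SBATCH --nodes=2
+    #SBATCH --tasks-per-node=16
+    #SBATCH --cpus-per-task=1
+
+    # Tasks may move between cores of their socket; "verbose" prints
+    # the socket mask each task was bound to.
+    srun --ntasks 32 --cpu_bind=verbose,sockets ./application
+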
EBwxB0AACA4m24zyM2+WoKm3aqA2mgJHH8bC+qqJaDnwBpbeg5XdwAAgOCa4UskuFGhqGx/tURtiMq5r6SpK1HRc2CN7bXB1R0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAghM/7ly9enXs2LE6nc7Ly6tbt27x8fFN2Em3bt2OHDnyiIOffvrp5OTkBlclJCQMHjxYo9EEBgY2YRpoXi5VG6+//npUVJSXl1dISMjy5csrKyubMBm4ApeqK3qOS3Gp2mhpPUfwuFNTU/PSSy8FBwd///33BQUFycnJ4eHhTpyPTqdbunTpunXrnDgHKFytNkpKSnbt2nX79u3k5OTk5OS1a9c6cTJoMlerK3qO63C12mhxPUe2ge17sLfbt29LknT16lXLVXfv3o2Ojvb39+/QocNrr71WWlqqLH/w4MGCBQtCQkLatGnTp0+f9PR0WZYjIyMPHz6srB02bFhMTExlZaXRaJw/f75er9fpdFOmTMnPz5dleeHCha1atdLpdKGhoTExMQ3OKikpKSAgwF7H3HxsOb/URtNqQ7FmzZrnnnuu+Y+5+Tjr/FJX9Bx7bOsYrlkbipbQcwS/uhMcHNy1a9f58+f/5S9/yc7Orrtq4sSJrVq1ysjIOH/+/IULF+Li4pTl06ZNu3Xr1jfffGMwGPbu3dumTZvaTW7dujVkyJChQ4fu3bu3VatWM2bMyM3NvXjxYnZ2tq+v76xZsyRJ2r59e1RU1Pbt27Oysvbu3evAY8XjceXaOH36dL9+/Zr/mGF/rlxXcC5Xro0W0XOcm7YcIDc3d8WKFX379nV3d+/cuXNSUpIsy+np6ZIk5eXlKWNSUlI8PDzMZnNGRoYkSTk5OfV2EhkZuXr1ar1ev2vXLmVJZmamSqWq3YPRaFSpVAaDQZbl3r17K89iDa+0XIQL1oYsy2vWrOnUqVNBQUEzHmmzc9b5pa7oOfbY1mFcsDbkFtNzxI87tYqLiz/88EM3N7fLly+fOHFCo9HUrvrpp58kScrNzU1JSfHy8rLcNjIyMiAgYMCAASaTSVly8uRJNze30Dq0Wu2VK1dkWo/N2zqe69TG+vXrw8PDs7KymvX4mh9x51G4Tl3Rc1yN69RGy+k5gr+ZVZe3t3dcXJyHh8fly5f1en1paWl+fr6yKisrS61WK29wlpWV3bt3z3Lzjz76yN/ff9y4cWVlZZIkhYSEqFSqtLS0rP968OBBVFSUJElubi3otyoGF6mNFStWJCYmfv3116GhoXY4Sjiai9QVXJCL1EaL6jmC/5Hcv39/2bJlFy9eLC0tLSwsfO+996qqqvr379+1a9eBAwfGxcWVlJTk5uauWrUqNjbWzc0tIiJixIgRc+fOvXfvnizLP/zwQ22pqdXqQ4cO+fj4vPzyy8XFxcrIV199VRmQn59/4MABZWRgYOD169cbnI/ZbDaZTFVVVZIkmUymiooKh/wa0ABXq43FixcfOnTo+PHjOp3OZDIJ/59CReVqdUXPcR2uVhstruc49+KSvRmNxjlz5nTp0sXT01Or1Q4ZMuTLL79UVt25c2fChAk6nS4oKGjBggUlJSXK8sLCwjlz5nTo0KFNmzZ9+/a9fv26XOeT8NXV1b/61a+effbZwsJCg8GwePHisLAwb2/v8PDwJUuWKHs4depUly5dtFrtxIkT681n586ddX/5dS9guiBbzi+18Vi18eDBg3p/mBEREY77XTw+Z51f6oqeY49tHcOlaqMF9hxV7V6aQKVSKU/f5D3AldlyfqkNsTnr/FJXYqPnwBrbz6/gb2YBAAAQdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIzt32XShfyw5YojZgD9QVrKE2YA1XdwAAgOBUsiw7ew4AAAB2xNUdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgbLqrMvevbAmadmcmaqMlcPxdu6irloCeA2ts6Tlc3QEAAIJrhu/M4r7MorL91RK1ISrnvpKmrkRFz4E1ttcGV3cAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMEJG3fOnDkzevTodu3aaTSap556atWqVaWlpQ543urq6sWLF7dr187Hx2fGjBlFRUUNDvP29lbVoVarKyoqHDC9FstZ9ZCbmzt58mSdTqfVakeOHHn9+vUGhyUkJAwePFij0QQGBtZdPmvWrLp1kpyc7IA5o2noOaiLnuNqxIw7n3/++QsvvNC7d+9vvvkmLy8vMTExLy8vLS3tUbaVZbmqqqrJT71+/frjx4+fP3/+5s2bt27dmj9/foPDcnNzi/9rwoQJ48ePV6vVTX5SNM6J9bBgwQKDwfDjjz/m5OQEBQVNmjSpwWE6nW7p0qXr1q2zXBUXF1dbKtHR0U2eCeyKnoO66DmuSLaB7XuwB7PZrNfr4+Li6i2vqamRZfnu3bvR0dH+/v4dOnR47bXXSktLlbWRkZGrVq0aOnRo165dU1NTjUbj/Pnz9Xq9TqebMmVKfn6+Mmzr1q2hoaG+vr5BQUHvvPOO5bO3b9/+k08+UX5OTU11d3d/8OBBI7PNz89Xq9UnT5608ajtwZbz6zq14dx6iIiI2LNnj/Jzamqqm5tbdXW1takmJSUFBATUXRIbGxsfH9/UQ7cjZ51f16mruug5zYWeQ8+xphkSi3Of3h6UBH3x4sUG1w4aNGjatGlFRUX37t0bNGjQvHnzlOWRkZE9e/YsKChQHo4ZM2b8+PH5+fllZWVz584dPXq0LMvXr1/39va+ceOGLMsGg+G7776rt/N79+7VfWrlqvKZM2came2WLVu6dOliw+HakRitx4n1IMvy8uXLX3jhhdzcXKPROHPmzAkTJjQy1QZbT1BQkF6v79ev3+bNmysrKx//F2AXxJ266DnNhZ5Dz7GGuNOAEydOSJKUl5dnuSo9Pb3uqpSUFA8PD7PZLMtyZGTkjh07lOWZmZkqlap2mNFoVKlUBoMhIyPD09Nz3759RUVFDT71jz/+KElSZmZm7RI3N7evvvqqkdl27dp1y5Ytj3+UjiBG63FiPSiDhw0bpvw2unfvnp2d3chULVvP8ePHz549e+PGjQMHDnTo0MHy9aKzEHfqouc0F3qOspyeY8n28yvgZ3f8/f0lScrJybFcdefOHY1GowyQJCk8PNxkMhUUFCgPg4ODlR+ysrJUKlX//v3DwsLCwsJ69erl6+ubk5MTHh6ekJDw8ccfBwYG/t///d/XX39db/9t2rSRJMloNCoPi4uLa2pqfHx8Pv3009pPftUdn5qampWVNWvWrOY6dlhyYj3Isvziiy+Gh4cXFhaWlJRMnjx56NChpaWl1urB0ogRIwYNGtS5c+eJEydu3rw5MTHRll8F7ISeg7roOS7KuWnLHpT3Td94
4416y2tqauol69TUVLVaXZusDx8+rCy/efPmz372M4PBYO0pysrKfvOb37Rt21Z5L7au9u3b/+lPf1J+PnXqVOPvo0+ZMmXq1KmPd3gOZMv5dZ3acGI95OfnSxZvNPzzn/+0th/LV1p17du3r127do0dqgM56/y6Tl3VRc9pLvQcZTk9x1IzJBbnPr2dHD161MPDY/Xq1RkZGSaT6YcffliwYMGZM2dqamoGDhw4c+bM4uLi+/fvDxkyZO7cucomdUtNluWXX345Ojr67t27sizn5eXt379fluVr166lpKSYTCZZlnfv3t2+fXvL1rNq1arIyMjMzMzc3Nznnntu2rRp1iaZl5fXunVr1/zAoEKM1iM7tR5CQ0PnzJljNBrLy8s3bNjg7e1dWFhoOcPq6ury8vKEhISAgIDy8nJln2azec+ePVlZWQaD4dSpUxEREbVv8zsdcaceek6zoOfU7oGeUw9xx6rTp0+//PLLWq3Wy8vrqaeeeu+995QPwN+5c2fChAk6nS4oKGjBggUlJSXK+HqlZjAYFi9eHBYW5u3tHR4evmTJElmWL1y48Oyzz/r4+LRt23bAgAF///vfLZ+3srJy0aJFWq3W29t72rRpRqPR2gzff/99l/3AoEKY1iM7rx7S0tJGjBjRtm1bHx+fQYMGWfuXZufOnXWvuWo0GlmWzWbziy++6Ofn17p16/Dw8JUrV5aVlTX7b6ZpiDuW6Dm2o+fUbk7Pqcf286uq3UsTKO8C2rIHuDJbzi+1ITZnnV/qSmz0HFhj+/kV8KPKAAAAdRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBudu+i4d+wypaLGoD9kBdwRpqA9ZwdQcAAAjOpu/MAgAAcH1c3QEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACM6muypz/8qWoGl3ZqI2WgLH37WLumoJ6Dmwxpaew9UdAAAguGb4zizuyywq218tURuicu4raepKVPQcWGN7bXB1BwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdSZKkbt26qVQqlUoVEBAQGxtbUlLShJ3odLqbN282+9zgXNQG7IG6gjXUhp0Qd/5j//79siyfPXv23LlzmzZtcvZ04EKoDdgDdQVrqA17IO78j4iIiDFjxly+fFl5+NZbb4WEhPj4+AwcOPDChQvKQp1O98EHHwwYMKBz586LFi2y3MmpU6dCQ0O//fZbx80b9kdtwB6oK1hDbTQv4s7/MBqNKSkpPXr0UB4+9dRT3333XWFh4aRJk6ZOnVp7v85Lly6dPXv2+++/P3HiREpKSt09HDt2LCYm5ujRowMGDHD07GFP1AbsgbqCNdRGM5NtYPseXERkZKRWqw0ICHB3dx81alRZWZnlGK1We+fOHVmW/fz8vv32W2XhvHnztmzZovzs5+e3cePG0NDQK1euOGzmdmXL+aU2qA2RnrfZUVcNoufI1IYVtp9fru78x5YtWy5cuJCQkHDmzJmMjAxl4aefftqvX7+OHTuGhYUVFxcXFBQoy9u1a6f84OHhUfdzZL/97W8nT54cFRXl4MnDrqgN2AN1BWuoDXsg7vyHVqsNDg6eNm3ar3/96xUrVkiSdOPGjTfeeCMxMfH27dtZWVk+Pj7yw758bv/+/QcPHtyxY4dDpgwHoTZgD9QVrKE27KEZvhFdMMuWLevUqVNaWlp1dbVGo4mIiJAkKTEx8cGDBw/dtkOHDikpKcOGDfPy8nrllVfsP1k4FLUBe6CuYA210YyIO/UFBgbGxsZu2rQpKSlp4sSJvXv39vf3HzRoUMeOHR9l87CwsJSUlOeff97Dw2P69On2ni0cidqAPVBXsIbaaEaqh14Qa2xjlUqSJFv2AFdmy/mlNsTmrPNLXYmNngNrbD+/fHYHAAAIjrgDAAAER9wBAACCI+4AAADBEXcaYDQao6OjNRpNaGjoZ599ZjnAZDKp/hff4gbgobZu3dq7d293d3flZip17dq1KyIiQq1Wd+/ePT09vd5ak8k0b968Tp06aTSaZ5555uTJk7Wr5s2bFxwcrFarO3XqRCN6QjVyfhXp6elqtXrmzJkNbm6tBhqptxaIuNOApUuXmkymnJycTz75ZO7cuVevXq03wMPDo/y/cnJyWrVqNX78eKdMFQ5w7dq14cOH+/j46PX6DRs2NDjGWlt56aWXajNx586dHTJfuK6OHTtu3Lhx3Lhx9Zbv37//3Xff/cMf/nD//v2EhAStVltvQEVFhZeX18GDB7Ozs6dNmzZu3Lj8/HxlVUxMzL/+9a+CgoK9e/e+9957x48fd8SRoFk1cn4VCxcufPbZZ61tbq0GrNVbC2XLN1DYvgcXZDKZPD09z507pzycMGHCW2+91cj4HTt2DBgwwCFTczRbzq9ItfHMM88sWbKkoqIiPT29ffv2R44csRyzb9++v/71r+PHj4+Pj6+7fOTIkQkJCUoyrqiocNSU7c5Z51eMuoqNja1XJz169EhOTn70PbRt2/bUqVP1Ft69ezcwMPCbb75phik6CT1HUe/8fvbZZzNnzoyPj58xY0bjGzZYA5b19iSy/fxydae+zMzM8vLyXr16KQ979ep15cqVRsbv3bs3JibGIVODc1y9enX69OmtW7eOjIwcMmSI5dU+SZImTZo0ZswYHx8fy1WtWrXy8PDw8PBo3bq1/SeLJ09paemVK1euXr0aFBSk1+tXrlxpNpsbGX/z5s2SkpLu3bvXLnnjjTf8/f3DwsLWrl07cOBA+08ZdlTv/BYVFa1bt+79999vfCtq4KGIO/WVlJSo1eraf5l8fHzqfulaPdeuXUtLS5s6daqjZgcnGDt27J///GeTyXTt2rVz586NHDnysTaPj48PCQl56aWXvv32WzvNEE+0nJwcSZLOnj37ww8/nDp1av/+/R9//LG1wWVlZdOnT3/77bfbt29fu3Dt2rXffffdzp07V65c2WAcx5PC8vyuXr16zpw5QUFBjW9IDTwUcac+b2/vioqKyspK5WFRUZG3t7ckSTt27FA+gTFmzJjawXv37h0zZkztF9JCSJs3b/7iiy88PT2joqJmz57dt2/fR9920aJFR44cOX78eL9+/X7+859nZ2fbb554Qnl6ekqS9Oabb/r5+XXu3HnOnDnHjh2TGuo5FRUVv/zlL3v06LFmzZq6e/Dx8QkJCXnllVdefPHFxMRExx8CmoXl+U1LSztx4sTSpUvrjbSsDWrgoYg79YWHh3t4eFy+fFl5+P333/fo0UOSpIULFyrv/33xxRfKqpqamsTERN7JEltFRcXw4cPnzZtnMpkyMjI+//zznTt3Slbir6XRo0f36dOne/fuGzdu7N69+1dffeWoieOJodfrtVqtco986b83y5csek5VVVV0dLRWq92zZ0/tGEu8Z/qEavD8/uMf/8jMzAwKCtLpdL/73e8OHDigvNyy/PeoLmqgQcSd+tRq9ZQpU9avX280GlNTU//2t7/NmDGjwZE
nTpyoqKgYNWqUg2cIR8rMzMzMzFy0aJFarQ4PD4+Ojk5JSZEe1m4a1Lp168Y/kwHhVVdXm0wms9lsNpuVHyRJUqlUMTExH3zwgcFguHXr1p49eywztNlsnjFjRlVV1R//+MeqqiqTyVRTUyNJksFg2LFjR1ZW1r///e/ExMQvv/xy7NixTjgw2Mba+Z09e/aNGzcuXbp06dKl2bNnjxo1SrnyV1cjNdBgvbVctnzO2fY9uCaDwTBhwgRPT8+OHTsmJiZaGzZ9+vTaf/OEZMv5FaY2ysrKfH19t23bVllZmZ2d/fTTT2/YsMFyWFVVVXl5+cyZM998883y8vLq6mpZlouKipKSku7evZuXl7d161ZPT88bN244/Ajswlnn90mvq/j4+Lrtd9u2bcrysrKy2NjYNm3aBAcHv/3222azud6GP/30U73WnZSUJMtyUVHRyJEj27Ztq9Fo+vXrd/ToUUcfUrNqsT3H2vmty9r/zGqkBqzV25PI9vPLN6LDKr6dWJGamhofH3/16lVvb+/x48dv27bNw8Oj3pgVK1Zs3ry59uG2bduWLl1aVFQ0evToy5cv19TU9OzZ8913333hhRccO3d74RvRYQ/0HFhj+/kl7sAqWg+sIe7AHug5sMb288tndwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABOdu+y4auZ05WjhqA/ZAXcEaagPWcHUHAAAIzqbbDAIAALg+ru4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcP8PtzynrHMtHtYAAAAASUVORK5CYII=" +/> + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=16 + #SBATCH --cpus-per-task=1 + + srun --ntasks 32 --cpu_bind=sockets --distribution=block:block ./application + +#### + +#### Distribution: block:cyclic + +The block:cyclic distribution will allocate the tasks of your job in +alternation between the first node and the second node while filling the +sockets linearly. + +\<img alt="" +src="data:;base64,iVBORw0KGgoAAAANSUhEUgAAAvoAAADyCAIAAACzsfbGAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3daXQUVdrA8Wq27AukIQECCQkQCAoCIov44iAHFJARCJtggsqWIyLiCOggghsoisOAo4zoSE6cZGQTj8twDmGZAdzZiSAkhCVASITurJ2EpN4PNdMnk+7qrqR64+b/+5RU31t1763nPjypNB2DLMsSAACAuJp5ewAAAADuRbkDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAE10JPZ4PB4KpxALjtyLLs4SuSc4CmTE/O4ekOAAAQnK6nOwrP/4QHwLu8+5SFnAM0NfpzDk93AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3AACA4Ch3mq7777/fYDAcOnTIeiQqKurzzz/XfoajR48GBwdrb5+WljZkyJCgoKCoqKgGDBSAEDyfc5599tnExMTAwMDOnTsvXry4qqqqAcOFWCh3mrSIiIjnn3/eY5czGo0LFy5csWKFx64IwKd4OOeUlpZu3Ljx0qVLmZmZmZmZL7/8sscuDV9DudOkzZo1KycnZ9u2bbYvXb16ddKkSe3atYuOjp4/f355ebly/NKlS6NGjQoPD7/jjjsOHjxobV9cXJyamtqpU6e2bdtOnTq1qKjI9pyjR4+ePHlyp06d3DQdAD7Owznnww8/vO+++yIiIoYMGfL444/X7Y6mhnKnSQsODl6xYsULL7xQXV1d76WJEye2bNkyJyfnp59+Onz48KJFi5TjkyZNio6Ovnbt2tdff/3BBx9Y20+fPr2goODIkSMXL14MCwubOXOmx2YB4HbhxZxz4MCB/v37u3Q2uK3IOug/A7xo2LBhr776anV1dY8ePdavXy/LcmRk5I4dO2RZPn36tCRJ169fV1pmZWX5+/vX1NScPn3aYDDcuHFDOZ6WlhYUFCTLcm5ursFgsLY3m80Gg8FkMtm9bkZGRmRkpLtnB7fy1t4n59zWvJVzZFlevnx5ly5dioqK3DpBuI/+vd/C0+UVfEyLFi1Wr149e/bs5ORk68HLly8HBQW1bdtW+TYuLs5isRQVFV2+fDkiIqJ169bK8W7duilf5OXlGQyGAQMGWM8QFhaWn58fFhbmqXkAuD14Pue88sor6enpe/fujYiIcNes4PModyD9/ve/f+edd1avXm09Eh0dXVZWVlhYqGSfvLw8Pz8/o9HYsWNHk8lUWVnp5+cnSdK1a9eU9p07dzYYDMeOHaO+AeCUJ3PO0qVLt2/fvn///ujoaLdNCLcB3rsDSZKkNWvWrFu3rqSkRPm2e/fugwYNWrRoUWlpaUFBwbJly1JSUpo1a9ajR4++ffu+++67kiRVVlauW7dOaR8fHz9y5MhZs2ZdvXpVkqTCwsKtW7faXqWmpsZisSi/s7dYLJWVlR6aHgAf45mcs2DBgu3bt+/atctoNFosFv4jelNGuQNJkqSBAweOGTPG+l8hDAbD1q1by8vLu3Tp0rdv3969e69du1Z5acuWLVlZWf369Rs+fPjw4cOtZ8jIyOjQocOQIUNCQkIGDRp04MAB26t8+OGHAQEBycnJBQUFAQEBPFgGmiwP5ByTybR+/fqzZ8/GxcUFBAQEBAQkJiZ6ZnbwQQbrO4Aa09lgkCRJzxkA3I68tffJOUDTpH/v83QHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIroX+UxgMBv0nAQCNyDkAGoqnOwAAQHAGWZa9PQYAAAA34ukOAAAQHOUOAAAQHOUOAAAQHOUOAAAQHO
UOAAAQHOUOAAAQnK6PGeTDvpqCxn1UAbHRFHj+YyyIq6aAnAM1enIOT3cAAIDgXPBHJPigQlHp/2mJ2BCVd3+SJq5ERc6BGv2xwdMdAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOModAAAgOPHLnezs7IcffthoNAYGBvbo0WPJkiWNOEmPHj0+//xzjY3vuuuuzMxMuy+lpaUNGTIkKCgoKiqqEcOAa/lUbDz77LOJiYmBgYGdO3devHhxVVVVIwYDX+BTcUXO8Sk+FRtNLecIXu7U1tY++OCDHTp0OHHiRFFRUWZmZlxcnBfHYzQaFy5cuGLFCi+OAQpfi43S0tKNGzdeunQpMzMzMzPz5Zdf9uJg0Gi+FlfkHN/ha7HR5HKOrIP+M7jbpUuXJEnKzs62fenKlStJSUlt27bt2LHjU089VVZWphy/efNmampq586dQ0JC+vbte/r0aVmWExISduzYobw6bNiw5OTkqqoqs9k8b9686Ohoo9E4ZcqUwsJCWZbnz5/fsmVLo9EYExOTnJxsd1QZGRmRkZHumrPr6Lm/xEbjYkOxfPny++67z/Vzdh1v3V/iipzjjr6e4ZuxoWgKOUfwpzsdOnTo3r37vHnz/vGPf1y8eLHuSxMnTmzZsmVOTs5PP/10+PDhRYsWKcenTZt24cKFb7/91mQybd68OSQkxNrlwoUL995779ChQzdv3tyyZcvp06cXFBQcOXLk4sWLYWFhM2fOlCRp/fr1iYmJ69evz8vL27x5swfniobx5dg4cOBA//79XT9nuJ8vxxW8y5djo0nkHO9WWx5QUFCwdOnSfv36tWjRomvXrhkZGbIsnz59WpKk69evK22ysrL8/f1rampycnIkScrPz693koSEhJdeeik6Onrjxo3KkdzcXIPBYD2D2Ww2GAwmk0mW5T59+ihXUcNPWj7CB2NDluXly5d36dKlqKjIhTN1OW/dX+KKnOOOvh7jg7EhN5mcI365Y1VSUvLOO+80a9bs+PHju3fvDgoKsr50/vx5SZIKCgqysrICAwNt+yYkJERGRg4cONBisShH9uzZ06xZs5g6wsPDT506JZN6dPf1PN+JjZUrV8bFxeXl5bl0fq5HuaOF78QVOcfX+E5sNJ2cI/gvs+oKDg5etGiRv7//8ePHo6Ojy8rKCgsLlZfy8vL8/PyUX3CWl5dfvXrVtvu6devatm07bty48vJySZI6d+5sMBiOHTuW9183b95MTEyUJKlZsya0qmLwkdhYunRpenr6/v37Y2Ji3DBLeJqPxBV8kI/ERpPKOYJvkmvXrj3//PNHjhwpKyu7cePGqlWrqqurBwwY0L1790GDBi1atKi0tLSgoGDZsmUpKSnNmjWLj48fOXLknDlzrl69KsvyyZMnraHm5+e3ffv20NDQhx56qKSkRGk5a9YspUFhYeHWrVuVllFRUWfOnLE7npqaGovFUl1dLUmSxWKprKz0yDLADl+LjQULFmzfvn3Xrl1Go9FisQj/n0JF5WtxRc7xHb4WG00u53j34ZK7mc3m2bNnd+vWLSAgIDw8/N577/3qq6+Uly5fvjxhwgSj0di+ffvU1NTS0lLl+I0bN2bPnt2xY8eQkJB+/fqdOXNGrvNO+Fu3bj322GP33HPPjRs3TCbTggULYmNjg4OD4+LinnnmGeUM+/bt69atW3h4+MSJE+uN5/3336+7+HUfYPogPfeX2GhQbNy8ebPexoyPj/fcWjSct+4vcUXOcUdfz/Cp2GiCOcdgPUsjGAwG5fKNPgN8mZ77S2yIzVv3l7gSGzkHavTfX8F/mQUAAEC5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABEe5AwAABNdC/ymUP8sO2CI24A7EFdQQG1DD0x0AACA4gyzL3h4DAACAG/F0BwAACI5yBwAACI5yBwAACI5yBwAACI5yBwAACI5yBwAACI5yBwAACE7Xpyrz+ZVNQeM+mYnYaAo8/6ldxFVTQM6BGj05h6c7AABAcC74m1l8LrOo9P+0RGyIyrs/SRNXoiLnQI3+2ODpDgAAEBzlDgAAEBzlDgAAEBzlDgAAEBzlDgAAEBzlDgAAEBzlDgAAEBzlDgAAEJyw5c7BgwfHjBnTpk2boKCgO++8c9myZWVlZR647q1btxYsWNCmTZvQ0NDp06cXFxfbbRYcHGyow8/Pr7Ky0gPDa7K8FQ8FBQWTJ082Go3h4eGjRo06c+aM3WZpaWlDhgwJCgqKioqqe3zmzJl14yQzM9MDY0bjkHNQFznH14hZ7nzxxRcPPPBAnz59vv322+vXr6enp1+/fv3YsWNa+sqyXF1d3ehLr1y5cteuXT/99NO5c+cuXLgwb948u80KCgpK/mvChAnjx4/38/Nr9EXhmBfjITU11WQy/frrr/n5+e3bt580aZLdZkajceHChStWrLB9adGiRdZQSUpKavRI4FbkHNRFzvFFsg76z+AONTU10dHRixYtqne8trZWluUrV64kJSW1bdu2Y8eOTz31VFlZmfJqQkLCsmXLhg4d2r17971795rN5nnz5kVHRxuNxilTphQWFirN1q5dGxMTExYW1r59+1dffdX26u3atfv444+Vr/fu3duiRYubN286GG1hYaGfn9+ePXt0ztod9Nxf34kN78ZDfHz8pk2blK/37t3brFmzW7duqQ01IyMjMjKy7pGUlJQlS5Y0dupu5K376ztxVRc5x1XIOeQcNS6oWLx7eXdQKugjR47YfXXw4MHTpk0rLi6+evXq4MGD586dqxxPSEi44447ioqKlG/Hjh07fvz4wsLC8vLyOXPmjBkzRpblM2fOBAcHnz17VpZlk8n0888/1zv51atX615aeap88OBBB6Nds2ZNt27ddEzXjcRIPV6MB1mWFy9e/MADDxQUFJjN5hkzZkyYMMHBUO2mnvbt20dHR/fv3//NN9+sqqpq+AK4BeVOXeQcVyHnkHPUUO7YsXv3bkmSrl+/bvvS6dOn676UlZXl7+9fU1Mjy3JCQsKGDRuU47m5uQaDwdrMbDYbDAaTyZSTkxMQEPDZZ58VFxfbvfSvv/4qSVJubq71SLNmzb755hsHo+3evfuaNWsaPktPECP1eDEelMbDhg1TVqNnz54XL150MFTb1LNr165Dhw6dPXt269atHTt2tP150Vsod+oi57gKOUc5Ts6xpf/+CvjenbZt20qSlJ+fb/vS5cuXg4KClAaSJMXFxVkslqKiIuXbDh06KF/k5eUZDIYBAwbExsbGxsb27t07LCwsPz8/Li4uLS3tL3/5S1RU1P/93//t37+/3vlDQkIkSTKbzcq3JSUltbW1oaGhn3zyifWdX3Xb7927Ny8vb+bMma6aO2x5MR5kWR4xYkRcXNyNGzdKS0snT548dOjQsrIytXiwNXLkyMGDB3ft2nXixIlvvvlme
nq6nqWAm5BzUBc5x0d5t9pyB+X3ps8991y947W1tfUq67179/r5+Vkr6x07dijHz50717x5c5PJpHaJ8vLyN954o3Xr1srvYutq167d3/72N+Xrffv2Of49+pQpU6ZOndqw6XmQnvvrO7HhxXgoLCyUbH7R8N1336mdx/Ynrbo+++yzNm3aOJqqB3nr/vpOXNVFznEVco5ynJxjywUVi3cv7yY7d+709/d/6aWXcnJyLBbLyZMnU1NTDx48WFtbO2jQoBkzZpSUlFy7du3ee++dM2eO0qVuqMmy/NBDDyUlJV25ckWW5evXr2/ZskWW5V9++SUrK8tisciy/OGHH7Zr18429SxbtiwhISE3N7egoOC+++6bNm2a2iCvX7/eqlUr33zDoEKM1CN7NR5iYmJmz55tNpsrKipeeeWV4ODgGzdu2I7w1q1bFRUVaWlpkZGRFRUVyjlramo2bdqUl5dnMpn27dsXHx9v/TW/11Hu1EPOcQlyjvUM5Jx6KHdUHThw4KGHHgoPDw8MDLzzzjtXrVqlvAH+8uXLEyZMMBqN7du3T01NLS0tVdrXCzWTybRgwYLY2Njg4OC4uLhnnnlGluXDhw/fc889oaGhrVu3Hjhw4L/+9S/b61ZVVT399NPh4eHBwcHTpk0zm81qI3zrrbd89g2DCmFSj+y9eDh27NjIkSNbt24dGho6ePBgtX9p3n///brPXIOCgmRZrqmpGTFiRERERKtWreLi4l544YXy8nKXr0zjUO7YIufoR86xdifn1KP//hqsZ2kE5beAes4AX6bn/hIbYvPW/SWuxEbOgRr991fAtyoDAADURbkDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAER7kDAAAE10L/KZz+hVU0WcQG3IG4ghpiA2p4ugMAAASn629mAQAA+D6e7gAAAMFR7gAAAMFR7gAAAMFR7gAAAMFR7gAAAMFR7gAAAMFR7gAAAMHp+lRlPr+yKWjcJzMRG02B5z+1i7hqCsg5UKMn5/B0BwAACM4FfzOLz2UWlf6flogNUXn3J2niSlTkHKjRHxs83QEAAIKj3AEAAIKj3AEAAIKj3AEAAIKj3AEAAIKj3AEAAIKj3JEkSerRo4fBYDAYDJGRkSkpKaWlpY04idFoPHfunMvHBu8iNuAOxBXUEBtuQrnzH1u2bJFl+dChQz/++OPq1au9PRz4EGID7kBcQQ2x4Q6UO/8jPj5+7Nixx48fV7598cUXO3fuHBoaOmjQoMOHDysHjUbj22+/PXDgwK5duz799NO2J9m3b19MTMz333/vuXHD/YgNuANxBTXEhmtR7vwPs9mclZXVq1cv5ds777zz559/vnHjxqRJk6ZOnWr9vM6jR48eOnToxIkTu3fvzsrKqnuGr7/+Ojk5eefOnQMHDvT06OFOxAbcgbiCGmLDxWQd9J/BRyQkJISHh0dGRrZo0WL06NHl5eW2bcLDwy9fvizLckRExPfff68cnDt37po1a5SvIyIiXn/99ZiYmFOnTnls5G6l5/4SG8SGSNd1OeLKLnKOTGyo0H9/ebrzH2vWrDl8+HBaWtrBgwdzcnKUg5988kn//v07deoUGxtbUlJSVFSkHG/Tpo3yhb+/f933kf3pT3+aPHlyYmKihwcPtyI24A7EFdQQG+5AufMf4eHhHTp0mDZt2pNPPrl06VJJks6ePfvcc8+lp6dfunQpLy8vNDRUdvbH57Zs2bJt27YNGzZ4ZMjwEGID7kBcQQ2x4Q4u+Ivognn++ee7dOly7NixW7duBQUFxcfHS5KUnp5+8+ZNp307duyYlZU1bNiwwMDAJ554wv2DhUcRG3AH4gpqiA0XotypLyoqKiUlZfXq1RkZGRMnTuzTp0/btm0HDx7cqVMnLd1jY2OzsrLuv/9+f3//Rx991N2jhScRG3AH4gpqiA0XMjh9IOaos8EgSZKeM8CX6bm/xIbYvHV/iSuxkXOgRv/95b07AABAcJQ7AABAcJQ7AABAcJQ7AABAcJQ7AABAcD5R7pjN5qSkpKCgoJiYmE8//dS2gcVimTt3bpcuXYKCgu6+++49e/Y4ONvp06f9/PxmzJih1mDjxo3x8fF+fn49e/Y8ffq0bYNffvll+PDhoaGh0dHRr7zyit2TrF27tk+fPi1atFA+A0r7XNT6apyj2nU1zv2243i+irlz53bo0MHPz69Lly5qfz34wQcfNPxX165dbRtYLBbD/7J7qpMnTw4bNiwkJKR79+5ff/11g8asZS5qbbTMUa2v07nfphysp9M9rtbX6d53sE8bnce07H2nbRzsfcd9Ha+Vg75a8qRa3GrJk3qo5Vgte1wtNpzufQdr5XTvO+jrdO876Ksl76nFpNO1cnBdLXlSbV5a8mTj+ES5s3DhQovFkp+f//HHH8+ZMyc7O7teg8rKysDAwG3btl28eHHatGnjxo0rLCxUO9v8+fPvuecetVe3bNny2muv/fWvf7127VpaWlp4eLhtm+Tk5N69excVFWVlZb333ns7d+60bdOpU6fXX3993LhxDZ2LWl+Nc1S7rpa5344cz1eRnJz8ww8/FBUVbd68edWqVbt27bLbLC0traKioqKiwu5N8ff3r/iv/Pz8li1bjh8/vl6b6urqRx55ZMSIEb/99tv69eunTJly6dIl7WPWMhe1Nlrm6OD8jud+m1Kbr5Y97mCdHe99B/u00XlMy9532sbB3nfQ1+laOeirJU+qxa2WPKmH3furZY+r9dWy9x2sldO973idHe99x7HheO+r9dWyVmp9NeZJtXlpyZONpOcPbuk/gyzLFoslICDgxx9/VL6dMGHCiy++qHz95JNPPvnkk7ZdWrduvW/fPrttPv300xkzZixZsmT69OnWg3Xb9OrVKzMz0/acddsEBgZa/+ja+PHj33jjDbXxpKSkLFmypHFzqddX+xzV+tqdux567q9LYsPKdr52Y+PKlStRUVHffvutbZtRo0ZlZGTYntnueTZs2DBw4EDbNidOnGjZsmV1dbVy/He/+93q1avVzqN2f7XMxUFsOJijWl+1uevh2vur57q289Wyx9X6at/7Cus+1ZnH1I5r7Os076n11b5Wtn0btFZ149bBWrk25zjYR2p7XK1vg/a+wvb+asxjdvvKGva+bd8G5T216zpdq3p9G7pW9ealsF0r/TnH+5+qnJubW1FR0bt3b+Xb3r17HzlyRPl61KhRtu3PnTtXWlras2dP2zbFxcUrVqzYv3//unXr6naxtikrKzt16lR2dnb79u2bN2/+2GOPvfbaa82bN693nocffvjvf/977969z58//+OPPy5btszBePTMRY2DOapRm7uo6q3Jc889l5aWVlxcvG7dukGDBtlts2TJksWLFycmJq5cuXLgwIF22yg2b948c+ZMtWtZ1dbWnjx50nEbLTT21TJHNXbnLiSNe1xNg/Z+3X2qM4+pHdfS12neU+vb0LWqd12Na2Ubtw7WymM07nE1Tve+2v2tR2Nf7Xvftq/2vKc2Zi1r5WC+DtbK7rzcSE+t
pP8Msiz/8MMPfn5+1m/Xrl37wAMPqDUuKysbMGDAihUr7L66YMGCt956S5ZltSccZ86ckSRpxIgRRUVFZ8+ejY+P//Of/2zbLC8vT/nTJJIkvfTSSw4GX68CbdBc1H7ycDxHtb5O594Ieu6vS2LDyvGTMFmWzWbzhQsXPvroo/Dw8FOnTtk2+PLLLw8fPpydnf3iiy+GhIRcuHBB7VTZ2dmtWrX67bffbF+qqqqKjY19+eWXy8vLv/zyy+bNm48fP76hY3Y6F7U2Tueo1lf73LVz7f3Vc91689W4x+32lRuy9+vtU5fkMS1737aN9r1fr2+D1sr2uhrXyjZuHayVa3OO2l5zsMfV+jZo76vdRy17325fjXvftq/2va82Zi1rVa+v9rVyMC93PN3x/nt3goODKysrq6qqlG+Li4uDg4PttqysrHzkkUd69eq1fPly21ePHTu2e/fuhQsXOrhWQECAJEl/+MMfIiIiunbtOnv2bNt3UVVWVg4fPnzu3LkWiyUnJ+eLL754//33XT4XNY7nqEbL3MUWGhrauXPnJ554YsSIEenp6bYNxowZ07dv3549e77++us9e/b85ptv1E61efPmsWPHtmnTxvalli1b7tixY/fu3ZGRkatWrRo3blx0dLQrp+GQ0zmq0T53AWjZ42q0733bfao/j2nZ+7ZttO99277a18q2r/a1so1b/XlSJwd7XI32vd+4HO64r5a9b7evxr3vYMxO18q2r/a1anROaxzv/zIrLi7O39//+PHjd999tyRJJ06c6NWrl22z6urqpKSk8PDwTZs2KX87o55///vfubm57du3lySpvLy8trY2Ozv78OHDddtER0eHh4dbu9s9T25ubm5u7tNPP+3n5xcXF5eUlJSVlZWamurCuahxOkc1WubedLRq1cppg5qaGrsv1dbWpqenv/fee2p977rrrgMHDihf9+vXb8KECY0epx5O5+igo9rcxaBlj6vRuPft7lOdeUzL3rfbRuPet9tX41rZ7du4PKnErc48qZPTPa5Gy95vdA7X3tfu3tfSV23vO+jrdK3U+jYiTzY6p2nn/ac7fn5+U6ZMWblypdls3rt37z//+c/p06crL82aNWvWrFmSJNXU1EyfPr26uvqjjz6qrq62WCy1tbX12jz++ONnz549evTo0aNHH3/88dGjR1t/UrG2MRgMycnJb7/9tslkunDhwqZNm8aOHVuvTWxsbFhY2AcffFBdXX3p0qVt27b16dOnXhtJkm7dumWxWGpqampqapQvNM5Fra+WOar1dTD3253d+Up11sRkMm3YsCEvL++3335LT0//6quvHn744XptSkpKMjMzr169WlhY+O677/78888jR46s10axe/fuysrK0aNH1x1D3TbffffdtWvX8vPzlyxZUlZWNnXqVNs2amN2Ohe1NlrmqNbXwdxvd3bnq2WPq/XVsvfV9qmePKZl76u10ZL31PpqWSu1vlrWSi1uHayVW2ND4XSPq/V1uvcd3Eene1+tr5a9r9ZXS95zMGana+Wgr9O1cjAvB/dOLz2/CdN/BoXJZJowYUJAQECnTp3S09Otx0eOHPnRRx/Jsnz+/Pl6w7a+29zapq56v8Ou26a8vDwlJSUkJKRDhw5//OMfa2pqbNvs2bNnwIABQUFBkZGR8+bNq6iosHfzt+cAAAHsSURBVG2zZMmSuuN59913Nc5Fra/GOapdV23ueui5v66KDbX5WtekuLh41KhRrVu3DgoK6t+//86dO619rW3MZvPQoUNDQ0ODg4MHDRq0e/du2zaKRx99dP78+fXGULfNCy+8EBYWFhAQMGbMmPPnz9ttozZmp3NRa6Nljmp9HcxdD1fdXz3XVVtPLXtcra/Tve9gnzY6j2nZ+w7aWKnlPQd9na6Vg75O18pB3KqtlZ640hIbsoY9rtbX6d53sFZO975aXy17X62vlrznOK4cr5WDvk7XysG81NZKT2z85wy6Ouu+vAPV1dW9evWqqqq6jdr4Wl+dXJV6XM7X7jux4fvXvR3vUVPrK3sp59yOa9XU+squyDkG61kaQfldnZ4zwJfpub/Ehti8dX+JK7GRc6BG//31/nt3AAAA3IpyBwAACI5yBwAACI5yBwAACI5yBwAACM4Fn6rc0M+ORNNBbMAdiCuoITaghqc7AABAcLo+dwcAAMD38XQHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAIjnIHAAAI7v8BE+cBiPwLm7cAAAAASUVORK5CYII=" +/> + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCh --tasks-per-node=16 + #SBATCH --cpus-per-task=1 + + srun --ntasks 32 --cpu_bind=sockets --distribution=block:cyclic ./application + +## Hybrid Strategies + +### Default Binding and Distribution Pattern + +The default binding pattern of hybrid jobs will split the cores +allocated to a rank between the sockets of a node. The example shows +that Rank 0 has 4 cores at its disposal. Two of them on first socket +inside the first node and two on the second socket inside the first +node. 
+ +\<img alt="" +src="data:;base64,iVBORw0KGgoAAAANSUhEUgAAAvoAAADyCAIAAACzsfbGAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3de1iUdf7/8XsQA+SoDgdhZHA4CaUlpijmYdXooJvrsbxqzXa1dCsPbFlt5qF2O2xbXV52bdtlV25c7iVrhrVXWVaEupJ2gjxUYAIDgjgcZJCDIIf7+8f9a36zjCAwM/eMn3k+/oJ77rnf9z3z5u1r7hnn1siyLAEAAIjLy9U7AAAA4FzEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABCctz131mg0jtoPANccWZZVrsjMATyZPTOHszsAAEBwdp3dUaj/Cg+Aa7n2LAszB/A09s8czu4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9zxXDNmzNBoNF9++aVlSURExPvvv9/3LXz//fcBAQF9Xz8zMzMtLc3f3z8iIqIfOwpACOrPnPXr1ycnJw8ZMiQ6OnrDhg2XL1/ux+5CLMQdjzZ8+PDHH39ctXJarXbdunVbtmxRrSIAt6LyzGlqanrzzTfPnj2blZWVlZW1efNm1UrD3RB3PNqKFSuKi4vfe+8925uqqqoWL14cFham0+keeeSRlpYWZfnZs2dvu+22kJCQG264IS8vz7L+xYsXV69ePXLkyNDQ0Hvuuae2ttZ2m3feeeeSJUtGjhzppMMB4OZUnjk7duyYOnXq8OHD09LSHnjgAeu7w9MQdzxaQEDAli1bnnrqqfb29m43LVy4cPDgwcXFxd9++21+fn5GRoayfPHixTqd7vz58/v37//HP/5hWf/ee+81mUwFBQXl5eXBwcHLly9X7SgAXCtcOHOOHDkyfvx4hx4NrimyHezfAlxo+vTpzz33XHt7++jRo7dv3y7Lcnh4+L59+2RZLiwslCSpurpaWTMnJ8fX17ezs7OwsFCj0Vy4cEFZnpmZ6e/vL8tySUmJRqOxrN/Q0KDRaMxm8xXr7t69Ozw83NlHB6dy1d8+M+ea5qqZI8vypk2bRo0aVVtb69QDhPPY/7fvrXa8gpvx9vZ+8cUXV65cuWzZMsvCiooKf3//0NBQ5VeDwdDa2lpbW1tRUTF8+PChQ4cqy+Pj45UfjEajRqOZMGGCZQvBwcGVlZXBwcFqHQeAa4P6M+fZZ5/dtWtXbm7u8OHDnXVUcHvEHUjz5s175ZVXXnzxRcsSnU7X3NxcU1OjTB+j0ejj46PVaqOiosxmc1tbm4+PjyRJ58+fV9aPjo7WaDTHjx8n3wC4KjVnzpNPPpmdnX3o0CGdTue0A8I1gM/uQJIk6eWXX962bVtjY6Pya0JCwqRJkzIyMpqamkwm08aNG++//34vL6/Ro0ePGzfutddekySpra1t27ZtyvqxsbHp6ekrVqyoqqqSJKmmpmbv3r22VTo7O1tbW5X37FtbW9va2lQ6PABuRp2Zs2bNmuzs7AMHDmi12tbWVv4juicj7kCSJCk1NXXOnDmW/wqh0Wj27t3b0tIyatSocePGjR079tVXX1Vuevfdd3NyclJSUmbOnDlz5kzLFnbv3h0ZGZmWlhYYGDhp0qQjR47YVtmxY4efn9+yZctMJpOfnx8nlgGPpcLMMZvN27dv//nnnw0Gg5+fn5+fX3JysjpHBzeksXwCaCB31mgkSbJnCwCuRa7622fmAJ7J/r99zu4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBedu/CY1GY/9GAKCPmDkA+ouzOwAAQHAaWZZdvQ8AAABOxNkdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADB2fU1g3zZlycY2FcV0BueQP2vsaCvPAEzBz2xZ+ZwdgcAAAjOAReR4IsKRWX/qyV6Q1SufSVNX4mKmYOe2N8bnN0BAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjx486PP/7461//WqvVDhkyZPTo0U888cQANjJ69Oj333+/jyvfdNNNWVlZV7wpMzMzLS3N398/IiJiALsBx3Kr3li/fn1ycvKQIUOio6M3bNhw+fLlAewM3IFb9RUzx624VW942swRPO50dXXdfvvtkZGRJ0+erK2tzcrKMhgMLtwfrVa7bt26LVu2uHAfoHC33mhqanrzzTfPnj2blZWVlZW1efNmF+4MBszd+oqZ4z7crTc8bubIdrB/C8529uxZSZJ+/PFH25vOnTu3aNGi0NDQqKiohx9+uLm5WVleX1+/evXq6OjowMDAcePGFRYWyrKcmJi4b98+5dbp06cvW7bs8uXLDQ0Nq1at0ul0Wq327rvvrqmpkWX5kUceGTx4sFar1ev1y5Ytu+Je7d69Ozw83FnH7Dj2PL/0xsB6Q7Fp06apU6c6/pgdx1XPL33FzHHGfdXhnr2h8ISZI/jZncjIyISEhFWrVv373/8uLy+3vmnhwoWDBw8uLi7+9ttv8/PzMzIylOVLly4tKys7evSo2Wx+5513AgMDLXcpKyubMmXKLbfc8s477wwePPjee+81mUwFBQXl5eXBwcHLly+XJGn79u3Jycnbt283Go3vvPOOiseK/nHn3jhy5Mj48eMdf8xwPnfuK7iWO/eGR8wc16YtFZhMpieffDIlJcXb2zsuLm737t2yLBcWFkqSVF1drayTk5Pj6+vb2dlZXFwsSVJlZWW3jSQmJj7zzDM6ne7NN99UlpSUlGg0GssWGhoaNBqN2WyWZfnGG29UqvSEV1puwg17Q5blTZs2jRo1qra21oFH6nCuen7pK2aOM+6rGjfsDdljZo74cceisbHxlVde8fLyOnHixOeff+7v72+5qbS0VJIkk8mUk5MzZMgQ2/smJiaGh4enpqa2trYqS7744gsvLy+9lZCQkB9++EFm9Nh9X/W5T29s3brVYDAYjUaHHp/jEXf6wn36ipnjbtynNzxn5gj+Zpa1gICAjIwMX1/fEydO6HS65ubmmpoa5Saj0ejj46O8wdnS0lJVVWV7923btoWGht51110tLS2SJEVHR2s0muPHjxt/UV9fn5ycLEmSl5cHPapicJPe
ePLJJ3ft2nXo0CG9Xu+Eo4Ta3KSv4IbcpDc8auYI/kdy/vz5xx9/vKCgoLm5+cKFCy+88EJ7e/uECRMSEhImTZqUkZHR1NRkMpk2btx4//33e3l5xcbGpqenP/jgg1VVVbIsnzp1ytJqPj4+2dnZQUFBd9xxR2Njo7LmihUrlBVqamr27t2rrBkREVFUVHTF/ens7GxtbW1vb5ckqbW1ta2tTZWHAVfgbr2xZs2a7OzsAwcOaLXa1tZW4f9TqKjcra+YOe7D3XrD42aOa08uOVtDQ8PKlSvj4+P9/PxCQkKmTJny0UcfKTdVVFQsWLBAq9WOGDFi9erVTU1NyvILFy6sXLkyKioqMDAwJSWlqKhItvokfEdHx29/+9uJEydeuHDBbDavWbMmJiYmICDAYDCsXbtW2cLBgwfj4+NDQkIWLlzYbX/eeOMN6wff+gSmG7Ln+aU3+tUb9fX13f4wY2Nj1Xss+s9Vzy99xcxxxn3V4Va94YEzR2PZygBoNBql/IC3AHdmz/NLb4jNVc8vfSU2Zg56Yv/zK/ibWQAAAMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAILztn8TymXZAVv0BpyBvkJP6A30hLM7AABAcBpZll29DwAAAE7E2R0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgODs+lZlvr/SEwzsm5noDU+g/rd20VeegJmDntgzczi7AwAABOeAa2bxvcyisv/VEr0hKte+kqavRMXMQU/s7w3O7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAghM27uTl5c2ZM2fYsGH+/v5jxozZuHFjc3OzCnU7OjrWrFkzbNiwoKCge++99+LFi1dcLSAgQGPFx8enra1Nhd3zWK7qB5PJtGTJEq1WGxIScttttxUVFV1xtczMzLS0NH9//4iICOvly5cvt+6TrKwsFfYZA8PMgTVmjrsRM+785z//mTVr1o033nj06NHq6updu3ZVV1cfP368L/eVZbm9vX3Apbdu3XrgwIFvv/32zJkzZWVlq1atuuJqJpOp8RcLFiyYP3++j4/PgIuidy7sh9WrV5vN5tOnT1dWVo4YMWLx4sVXXE2r1a5bt27Lli22N2VkZFhaZdGiRQPeEzgVMwfWmDnuSLaD/Vtwhs7OTp1Ol5GR0W15V1eXLMvnzp1btGhRaGhoVFTUww8/3NzcrNyamJi4cePGW265JSEhITc3t6GhYdWqVTqdTqvV3n333TU1Ncpqr776ql6vDw4OHjFixHPPPWdbPSws7O2331Z+zs3N9fb2rq+v72Vva2pqfHx8vvjiCzuP2hnseX7dpzdc2w+xsbFvvfWW8nNubq6Xl1dHR0dPu7p79+7w8HDrJffff/8TTzwx0EN3Ilc9v+7TV9aYOY7CzGHm9MQBicW15Z1BSdAFBQVXvHXy5MlLly69ePFiVVXV5MmTH3roIWV5YmLiDTfcUFtbq/w6d+7c+fPn19TUtLS0PPjgg3PmzJFluaioKCAg4Oeff5Zl2Ww2f/fdd902XlVVZV1aOaucl5fXy96+/PLL8fHxdhyuE4kxelzYD7Isb9iwYdasWSaTqaGh4b777luwYEEvu3rF0TNixAidTjd+/PiXXnrp8uXL/X8AnIK4Y42Z4yjMHGZOT4g7V/D5559LklRdXW17U2FhofVNOTk5vr6+nZ2dsiwnJia+/vrryvKSkhKNRmNZraGhQaPRmM3m4uJiPz+/PXv2XLx48YqlT58+LUlSSUmJZYmXl9fHH3/cy94mJCS8/PLL/T9KNYgxelzYD8rK06dPVx6NpKSk8vLyXnbVdvQcOHDgyy+//Pnnn/fu3RsVFWX7etFViDvWmDmOwsxRljNzbNn//Ar42Z3Q0FBJkiorK21vqqio8Pf3V1aQJMlgMLS2ttbW1iq/RkZGKj8YjUaNRjNhwoSYmJiYmJixY8cGBwdXVlYaDIbMzMy///3vERER06ZNO3ToULftBwYGSpLU0NCg/NrY2NjV1RUUFPTPf/7T8skv6/Vzc3ONRuPy5csddeyw5cJ+kGV59uzZBoPhwoULTU1NS5YsueWWW5qbm3vqB1vp6emTJ0+Oi4tbuHDhSy+9tGvXLnseCjgJMwfWmDluyrVpyxmU903/+Mc/dlve1dXVLVnn5ub6+PhYkvW+ffuU5WfOnBk0aJDZbO6pREtLy/PPPz906FDlvVhrYWFhO3fuVH4+ePBg7++j33333ffcc0//Dk9F9jy/7tMbLuyHmpoayeaNhmPHjvW0HdtXWtb27NkzbNiw3g5VRa56ft2nr6wxcxyFmaMsZ+bYckBicW15J/nggw98fX2feeaZ4uLi1tbWU6dOrV69Oi8vr6ura9KkSffdd19jY+P58+enTJny4IMPKnexbjVZlu+4445FixadO3dOluXq6up3331XluWffvopJyentbVVluUdO3aEhYXZjp6NGzcmJiaWlJSYTKapU6cuXbq0p52srq6+7rrr3PMDgwoxRo/s0n7Q6/UrV65saGi4dOnSs88+GxAQcOHCBds97OjouHTpUmZmZnh4+KVLl5RtdnZ2vvXWW0aj0Ww2Hzx4MDY21vI2v8sRd7ph5jgEM8eyBWZON8SdHh05cuSOO+4ICQkZMmTImDFjXnjhBeUD8BUVFQsWLNBqtSNGjFi9enVTU5OyfrdWM5vNa9asiYmJCQgIMBgMa9eulWU5Pz9/4sSJQUFBQ4cOTU1NPXz4sG3dy5cvP/rooyEhIQEBAUuXLm1oaOhpD//617+67QcGFcKMHtl1/XD8+PH09PShQ4cGBQVNnjy5p39p3njjDetzrv7+/rIsd3Z2zp49e/jw4dddd53BYHjqqadaWloc/sgMDHHHFjPHfswcy92ZOd3Y//xqLFsZAOVdQHu2AHdmz/NLb4jNVc8vfSU2Zg56Yv/zK+BHlQEAAKwRdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwXnbv4mrXmEVHovegDPQV+gJvYGecHYHAAAIzq5rZgEAALg/zu4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARn17cq8/2VnmBg38xEb3gC9b+1i77yBMwc9MSemcPZHQAAIDgHXDPLVa/wqKtOXXt42mPlaXVdxdMeZ0+raw9Pe6w8ra49OLsDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABCcw+KO2Wz29vaOiYnR6/V/+MMf+v6f8o1G4+zZs3u69cMPPzQYDDExMZmZmWrWnT9/fkhIyKJFi3pawRl1S0tLZ86cGRUVlZSU9Mknn6hWt6WlJSUlRafT6fX6bdu29XGDfUdv2F9X1N6wh5OeX0mSWlpa9Hr9unXr1Kzr7++v0+l0Ot3ixYvVrHv27Nm
ZM2eGhYUlJSW1traqU7egoED3C29v77y8vD5us4/oDYfUFa03ZDtYb6G+vj4qKkqW5dbW1gkTJnz88cd93EhpaemsWbOueFN7e7vBYDAajTU1NdHR0Q0NDerUlWU5Nzc3Ozt74cKF1gudXbe4uPjo0aOyLJ86dSo8PLyzs1Oduh0dHefPn5dlua6uLjIyUvm5W93+ojccW1ek3rCHCs+vLMsbN25cvHjx2rVr1ayr1+ttF6pQd/bs2Tt27JBluby8vL29XbW6ipqamhEjRnR0dNjW7S96w+F1hekNhePfzPLx8Zk4ceKZM2ckSWpra5s1a1ZKSsq4ceMOHTokSZLRaExNTX3ooYduvfXWRx991PqOeXl5kydPrqmpsSz5+uuvExIS9Hq9VqudMWNGTk6OOnUlSZoxY0ZgYKDKx2swGCZNmiRJ0vXXXy9JUnNzszp1Bw0aFB4eLklSR0dHQECAn59fXw58AOgNesMZHPv8lpSU/Pjjj3feeafKdV1yvKWlpUajccWKFZIkjRw50tu7t+/Zd8bxvvfee3fdddegQYMG9lBcFb1Bb/x/9mQl6y1YUt7FixfHjh2bm5sry3JnZ2d9fb0sy1VVVWlpabIsl5aWBgcH19TUyLI8bdq0kpISJeXl5eWlpqaaTCbr7b/77ru///3vlZ//9Kc/bd++XZ26is8++6wvr+AdXleW5U8//XTKlClq1m1oaIiOjh40aNAbb7xxxbr9RW84o64sRG/YQ4XjXbhwYWFh4c6dO6/6Ct6xdQMCAgwGw/jx4z/55BPV6n766aczZsyYP3/+TTfdtHnzZjWPVzFz5sycnJwr1u0vesOxdUXqjf+3Bbvu/L+HPWjQIL1ef9111y1btkxZ2NXV9fTTT6elpU2fPj04OFiW5dLS0mnTpim3rly5Mjc3t7S0VK/XjxkzxnKe3KKP/6Q5vK7iqv+kOaluWVlZUlLSTz/9pHJd5V6jRo0qLy+3rdtf9Aa94QzOPt5PPvlk/fr1siz3/k+aMx5no9Eoy3J+fn5kZGRdXZ06dT/++GNfX9/CwsJLly5NnTrV8maEOn1lMpkiIyMt71bI7j1z6A3Vjld2dG8oHPlmVkREhNFoLCsr++qrr3744QdJkvbv319cXHzo0KGDBw/6+voqqw0ePFj5wcvLq6OjQ5KksLAwPz+/EydOdNtgZGTkuXPnlJ8rKysjIyPVqeuq45UkyWw233XXXdu3bx89erSadRUxMTGpqamnTp3q/4NxFfQGveEMDj/eY8eO7dmzJyYm5rHHHnv77befffZZdepKkqTX6yVJGjduXHJy8unTp9WpGxUVlZiYmJiY6Ovre+utt548eVK145Uk6b333ps3b56T3smiN+iNbhz/2Z2IiIgtW7Zs3bpVkqT6+nqDweDt7f3111+bTKae7hIUFPTBBx889thj33zzjfXyiRMnFhUVlZeX19XV5ebm9v6BeQfW7RcH1r18+fKCBQvWr18/a9YsNetWVVUp0aGiouLYsWPJyclXrT4w9Aa94QwOPN7NmzdXVFQYjca//e1vv/vd7zZt2qRO3bq6ugsXLkiSVFRUdOrUqdjYWHXq3nDDDV1dXRUVFZ2dnf/973+TkpLUqavYs2fPkiVLeqloP3qD3rBwyvfuLF68+MSJE4WFhfPmzfv666+XLl36r3/9Kzo6upe7REREZGdnP/DAA0VFRZaF3t7er7322owZM1JSUrZu3RoUFKROXUmSbrvttqVLl+7fv1+n0xUUFKhT9/PPPz98+PDTTz+t/B88o9GoTt26urrZs2dHRUXNmjXrz3/+s/JKwknoDXrDGRz4/Lqk7tmzZ1NTU6Oion7zm9+8/vrroaGh6tTVaDTbtm1LT09PSkq6/vrr586dq05dSZJMJtPp06enTZvWe0X70Rv0hkJjeUtsIHfWaCRJsmcL1BW17rW4z9SlLnWv3brX4j5TV826fKsyAAAQHHEHAAAIjrgDAAAEp17c6eUKR1e9CNGAlfZ8pSEVLgbU09VVrnoBFHv0dJUTZ1+kxh5/+ctf4uPj4+Li1q9fb3tTQkJCQkLCvn377Kxi22b96skBd2m3O/bSk7Yr29OlV9zhnnrSdmWndqk6mDkWzJxumDk9rSzyzLHnS3v6vgXbKxw1NDR0dXUpt17xIkQOqWt7pSFL3Z4uBuSQugrrq6tYH+8VL4DiqLrdrnJiXVfR7UIkjqo74PuePXtWr9e3tLS0t7enpKR88803ln3+7rvvbrrppkuXLtXV1SmT1J663dqsvz3Ze5f2vW4vPWm78lW7tO91FT31pO3KvXep/dNjYJg5vWPm9GVNZo5nzhyVzu7YXuFo7NixlZWVyq19vwhRf9leachS19kXA+p2dRXr43WeUpurnNjWdfZFavorICDA19e3ra1NuQTd8OHDLftcWFiYmprq6+s7bNiwkSNHHj582J5C3dqsvz054C7tdsdeetJ2ZXu61HaHe+lJ5/0Nugozh5nTE2aOZ84cleLOuXPnoqKilJ91Ol1lZWVWVtZVvz/AgT777LO4uLjAwEDruhcvXtTr9ZGRkevXr7/qF7f014YNG55//nnLr9Z16+rqYmNjb7755gMHDji26JkzZ3Q63YIFC8aNG7dly5ZudRUqfLVXv4SEhGRkZERHR0dGRs6bN2/UqFGWfR4zZsyRI0caGxvPnz+fn5/v2Nntnj1py4Fd2ktP2nJel6rDPZ9fZo47YOZ45szp7RqnTqWETXWUl5evXbs2Ozu7W92goKCysjKj0Thz5sw5c+aMHDnSURUPHDgQHR2dmJh49OhRZYl13VOnTun1+oKCgrlz5548eXLYsGGOqtvZ2Xns2LHvv/9er9enp6dPmjTp9ttvt16hurq6sLBw+vTpjqpov/Ly8ldffbWkpMTX1/dXv/rV3LlzLY/VmDFjVq1aNX369IiIiLS0tN4vyWs/d+hJW47q0t570pbzutRV3OH5Zea4A2aOZ84clc7u9PEKR85w1SsNOeNiQL1fXaUvF0AZmKte5cSpF6kZmIKCgptvvlmr1QYEBMycOfOrr76yvvWRRx7Jz8/fv39/fX19XFycA+u6c0/asr9L+3jFHwvndak63Pn5Zea4FjOnL8SbOSrFHdsrHG3evNlsNju7ru2Vhix1nXoxINurq1jq9usCKP1le5WTbo+zu51VliQpPj7+m2++aWpqamtrO3z4cEJCgvU+l5WVSZL04Ycfms3m1NRUB9Z1w5605cAu7aUnbTm1S9Xhhs8vM8dNMHM8dObY8znnfm3hgw8+GDVqVHR09M6dO2VZHjlyZGNjo3JTenq6Vqv18/OLiorKz893YN2PPvpo0KBBUb8oLS211D158mRSUlJkZGRCQsKuXbv6srUBPGI7d+5UPpFuqVtQUBAXFxcZGTl69Oi9e/c6vO4XX3yRlJQUHx+/bt06+X8f5/Pnz0dGRnZ2dvZxU/Z0SL/u+/zzz8fFxcXGxmZkZMj/u88TJ04MCwu7+eabT506ZWdd2zbrV0/23qV9r9tLT9qufNUu7dfxKmx70nblq3ap/d
NjYJg5V8XM6QtmjgfOHPXijrWioqJHH32UuqLWtee+nvZYeVpdO11zx0tdderac19Pe6w8ra4FlwilrlPqXov7TF3qUvfarXst7jN11azLRSQAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAME54GsGITZ7vvILYnPVV41BbMwc9ISvGQQAAOiRXWd3AAAA3B9ndwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACC4/wNeW27o5DoAAAACSURBVCEI/r8gawAAAABJRU5ErkJggg==" +/> + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=4 + #SBATCH --cpus-per-task=4 + + export OMP_NUM_THREADS=4 + + srun --ntasks 8 --cpus-per-task $OMP_NUM_THREADS ./application + +### + +### + +### Core Bound + +#### Distribution: block:block + +This method allocates the tasks linearly to the cores. + +\<img alt="" +src="<data:;base64,iVBORw0KGgoAAAANSUhEUgAAAvoAAADyCAIAAACzsfbGAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3df1RUdf7H8TuIIgy/lOGXICgoGPZjEUtByX54aK3VVhDStVXyqMlapLSZfsOf7UnUU2buycgs5XBytkzdPadadkWyg9nZTDHN/AEK/uLn6gy/HPl1v3/csxyOMIQMd2b4zPPxF3Pnzr2f952Pb19zZ+aORpZlCQAAQFxOth4AAACAuog7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDhnSx6s0Wj6ahwA+h1Zlq28R3oO4Mgs6Tmc3QEAAIKz6OyOwvqv8ADYlm3PstBzAEdjec/h7A4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB3H9dhjj2k0mu+++659SUBAwMGDB3u+haKiInd3956vn5OTExcXp9VqAwIC7mGgAIRg/Z6zfPnyqKgoNze3kJCQFStWNDU13cNwIRbijkPz8fF57bXXrLY7nU63bNmydevWWW2PAOyKlXtOfX19dnb21atX9Xq9Xq9fu3at1XYNe0PccWgLFy4sKSn54osvOt9VXl6enJzs5+cXHBz80ksvNTY2KsuvXr361FNPeXt733///UePHm1fv7a2Ni0tbfjw4b6+vrNnz66pqem8zaeffjolJWX48OEqlQPAzlm55+zcuTM+Pt7HxycuLu6FF17o+HA4GuKOQ3N3d1+3bt2qVauam5vvuispKWngwIElJSXHjx8/ceJERkaGsjw5OTk4OLiiouKrr7764IMP2tefO3duZWXlyZMnr1y54uXllZqaarUqAPQXNuw5hYWFMTExfVoN+hXZApZvATY0ZcqUN998s7m5ecyYMdu3b5dl2d/f/8CBA7Isnzt3TpKkqqoqZc38/PzBgwe3traeO3dOo9HcvHlTWZ6Tk6PVamVZvnTpkkajaV/faDRqNBqDwdDlfvfu3evv7692dVCVrf7t03P6NVv1HFmW16xZM3LkyJqaGlULhHos/7fvbO14BTvj7OyclZW1aNGiefPmtS+8du2aVqv19fVVboaFhZlMppqammvXrvn4+AwZMkRZPnr0aOWP0tJSjUbz8MMPt2/By8vr+vXrXl5e1qoDQP9g/Z6zYcOG3NzcgoICHx8ftaqC3SPuQHr22WfffvvtrKys9iXBwcENDQ3V1dVK9yktLXVxcdHpdEFBQQaD4c6dOy4uLpIkVVRUKOuHhIRoNJpTp06RbwD8Kmv2nJUrV+7fv//IkSPBwcGqFYR+gM/uQJIkacuWLdu2baurq1NuRkRETJw4MSMjo76+vrKyMjMzc/78+U5OTmPGjImOjt66daskSXfu3Nm2bZuyfnh4eEJCwsKFC8vLyyVJqq6u3rdvX+e9tLa2mkwm5T17k8l0584dK5UHwM5Yp+ekp6fv378/Ly9Pp9OZTCa+iO7IiDuQJEmaMGHCM8880/5VCI1Gs2/fvsbGxpEjR0ZHRz/44IPvvPOOctfnn3+en58/bty4J5544oknnmjfwt69e4cNGxYXF+fh4TFx4sTCwsLOe9m5c6erq+u8efMqKytdXV05sQw4LCv0HIPBsH379osXL4aFhbm6urq6ukZFRVmnOtghTfsngHrzYI1GkiRLtgCgP7LVv316DuCYLP+3z9kdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOGfLN6HRaCzfCAD0ED0HwL3i7A4AABCcRpZlW48BAABARZzdAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQnEWXGeRiX46gd5cqYG44AutfxoJ55QjoOTDHkp7D2R0AACC4PvgRCS5UKCrLXy0xN0Rl21fSzCtR0XNgjuVzg7M7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAAQnftw5e/bs9OnTdTqdm5vbmDFjXn/99V5sZMyYMQcPHuzhyr/5zW/0en2Xd+Xk5MTFxWm12oCAgF4MA33LrubG8uXLo6Ki3NzcQkJCVqxY0dTU1IvBwB7Y1byi59gVu5objtZzBI87bW1tv/3tb4cNG3b69Omamhq9Xh8WFmbD8eh0umXLlq1bt86GY4DC3uZGfX19dnb21atX9Xq9Xq9fu3atDQeDXrO3eUXPsR/2NjccrufIFrB8C2q7evWqJElnz57tfNeNGzdmzZrl6+sbFBS0dOnShoYGZfmtW7fS0tJCQkI8PDyio6PPnTsny3JkZOSBAweUe6dMmTJv3rympiaj0bhkyZLg4GCdTvfcc89VV1fLsvzSSy8NHDhQp9OFho
bOmzevy1Ht3bvX399frZr7jiXPL3Ojd3NDsWbNmvj4+L6vue/Y6vllXtFz1Hisddjn3FA4Qs8R/OzOsGHDIiIilixZ8re//e3KlSsd70pKSho4cGBJScnx48dPnDiRkZGhLJ8zZ05ZWdmxY8cMBsOePXs8PDzaH1JWVjZp0qTJkyfv2bNn4MCBc+fOraysPHny5JUrV7y8vFJTUyVJ2r59e1RU1Pbt20tLS/fs2WPFWnFv7HluFBYWxsTE9H3NUJ89zyvYlj3PDYfoObZNW1ZQWVm5cuXKcePGOTs7jxo1au/evbIsnzt3TpKkqqoqZZ38/PzBgwe3traWlJRIknT9+vW7NhIZGbl69erg4ODs7GxlyaVLlzQaTfsWjEajRqMxGAyyLD/00EPKXszhlZadsMO5IcvymjVrRo4cWVNT04eV9jlbPb/MK3qOGo+1GjucG7LD9Bzx4067urq6t99+28nJ6aeffjp06JBWq22/6/Lly5IkVVZW5ufnu7m5dX5sZGSkv7//hAkTTCaTsuTw4cNOTk6hHXh7e//8888yrcfix1qf/cyN9evXh4WFlZaW9ml9fY+40xP2M6/oOfbGfuaG4/Qcwd/M6sjd3T0jI2Pw4ME//fRTcHBwQ0NDdXW1cldpaamLi4vyBmdjY2N5eXnnh2/bts3X13fGjBmNjY2SJIWEhGg0mlOnTpX+z61bt6KioiRJcnJyoKMqBjuZGytXrszNzT1y5EhoaKgKVcLa7GRewQ7ZydxwqJ4j+D+SioqK11577eTJkw0NDTdv3ty4cWNzc/PDDz8cERExceLEjIyM+vr6ysrKzMzM+fPnOzk5hYeHJyQkLF68uLy8XJblM2fOtE81FxeX/fv3e3p6Tps2ra6uTllz4cKFygrV1dX79u1T1gwICDh//nyX42ltbTWZTM3NzZIkmUymO3fuWOUwoAv2NjfS09P379+fl5en0+lMJpPwXwoVlb3NK3qO/bC3ueFwPce2J5fUZjQaFy1aNHr0aFdXV29v70mTJn355ZfKXdeuXUtMTNTpdIGBgWlpafX19crymzdvLlq0KCgoyMPDY9y4cefPn5c7fBK+paXlj3/84yOPPHLz5k2DwZCenj5ixAh3d/ewsLBXXnlF2cI333wzevRob2/vpKSku8azY8eOjge/4wlMO2TJ88vcuKe5cevWrbv+YYaHh1vvWNw7Wz2/zCt6jhqPtQ67mhsO2HM07VvpBY1Go+y+11uAPbPk+WVuiM1Wzy/zSmz0HJhj+fMr+JtZAAAAxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgnO2fBPKz7IDnTE3oAbmFcxhbsAczu4AAADBaWRZtvUYAAAAVMTZHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Cy6qjLXr3QEvbsyE3PDEVj/ql3MK0dAz4E5lvQczu4AAADB9cFvZnFdZlFZ/mqJuSEq276SZl6Jip4DcyyfG5zdAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAEJ2zcOXr06DPPPDN06FCtVvvAAw9kZmY2NDRYYb8tLS3p6elDhw719PScO3dubW1tl6u5u7trOnBxcblz544VhuewbDUfKisrU1JSdDqdt7f3U089df78+S5Xy8nJiYuL02q1AQEBHZenpqZ2nCd6vd4KY0bv0HPQET3H3ogZd/7xj388+eSTDz300LFjx6qqqnJzc6uqqk6dOtWTx8qy3Nzc3Otdr1+/Pi8v7/jx48XFxWVlZUuWLOlytcrKyrr/SUxMnDlzpouLS693iu7ZcD6kpaUZDIYLFy5cv349MDAwOTm5y9V0Ot2yZcvWrVvX+a6MjIz2qTJr1qxejwSqouegI3qOPZItYPkW1NDa2hocHJyRkXHX8ra2NlmWb9y4MWvWLF9f36CgoKVLlzY0NCj3RkZGZmZmTp48OSIioqCgwGg0LlmyJDg4WKfTPffcc9XV1cpq77zzTmhoqJeXV2Bg4Jtvvtl5735+fh9//LHyd0FBgbOz861bt7oZbXV1tYuLy+HDhy2sWg2WPL/2MzdsOx/Cw8M/+ugj5e+CggInJ6eWlhZzQ927d6+/v3/HJfPnz3/99dd7W7qKbPX82s+86oie01foOfQcc/ogsdh292pQEvTJkye7vDc2NnbOnDm1tbXl5eWxsbEvvviisjwyMvL++++vqalRbv7ud7+bOXNmdXV1Y2Pj4sWLn3nmGVmWz58/7+7ufvHiRVmWDQbDjz/+eNfGy8vLO+5aOat89OjRbka7ZcuW0aNHW1CuisRoPTacD7Isr1ix4sknn6ysrDQajc8//3xiYmI3Q+2y9QQGBgYHB8fExGzatKmpqeneD4AqiDsd0XP6Cj2HnmMOcacLhw4dkiSpqqqq813nzp3reFd+fv7gwYNbW1tlWY6MjPzrX/+qLL906ZJGo2lfzWg0ajQag8FQUlLi6ur62Wef1dbWdrnrCxcuSJJ06dKl9iVOTk5ff/11N6ONiIjYsmXLvVdpDWK0HhvOB2XlKVOmKEfjvvvuu3LlSjdD7dx68vLyvvvuu4sXL+7bty8oKKjz60VbIe50RM/pK/QcZTk9pzPLn18BP7vj6+srSdL169c733Xt2jWtVqusIElSWFiYyWSqqalRbg4bNkz5o7S0VKPRPPzwwyNGjBgxYsSDDz7o5eV1/fr1sLCwnJyc999/PyAg4NFHHz1y5Mhd2/fw8JAkyWg0Kjfr6ura2to8PT13797d/smvjusXFBSUlpampqb2Ve3ozIbzQZblqVOnhoWF3bx5s76+PiUlZfLkyQ0NDebmQ2cJCQmxsbGjRo1KSkratGlTbm6uJYcCKqHnoCN6jp2ybdpSg/K+6auvvnrX8ra2truSdUFBgYuLS3uyPnDggLK8uLh4wIABBoPB3C4aGxvfeuutIUOGKO/FduTn5/fJJ58of3/zzTfdv4/+3HPPzZ49+97KsyJLnl/7mRs2nA/V1dVSpzcavv/+e3Pb6fxKq6PPPvts6NCh3ZVqRbZ6fu1nXnVEz+kr9BxlOT2nsz5ILLbdvUr+/ve/Dx48ePXq1SUlJSaT6cyZM2lpaUePHm1ra5s4ceLzzz9fV1dXUVExadKkxYsXKw/pONVkWZ42bdqsWbNu3Lghy3JVVdXnn38uy/Ivv/ySn59vMplkWd65c6efn1/n1pOZmRkZGXnp0qXKysr4+Pg5c+aYG2RVVdWgQYPs8wODCjFaj2zT+RAaGrpo0SKj0Xj79u0NGza4u7vfvHmz8whbWlpu376dk5Pj7+9/+/ZtZZutra0fffRRaWmpwWD45ptvwsPD29/mtznizl3oOX2CntO+BXrOXYg7ZhUWFk6bNs3b29vNze2BBx7YuHGj8gH4a9euJSYm6nS6wMDAtLS0+vp6Zf27pprBYEhPTx8xYoS7u3tYWNgrr7wiy/KJE
yceeeQRT0/PIUOGTJgw4dtvv+2836amppdfftnb29vd3X3OnDlGo9HcCDdv3my3HxhUCNN6ZNvNh1OnTiUkJAwZMsTT0zM2Ntbc/zQ7duzoeM5Vq9XKstza2jp16lQfH59BgwaFhYWtWrWqsbGxz49M7xB3OqPnWI6e0/5wes5dLH9+Ne1b6QXlXUBLtgB7Zsnzy9wQm62eX+aV2Og5MMfy51fAjyoDAAB0RNwBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAATnbPkmfvUXVuGwmBtQA/MK5jA3YA5ndwAAgOAs+s0sAAAA+8fZHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Cy6qjLXr3QEvbsyE3PDEVj/ql3MK0dAz4E5lvQczu4AAADB9cFvZjnOdZmVVw+OVq8lHO1YOVq9tuJox9nR6rWEox0rR6vXEpzdAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOJvFnccee0yj0Wg0Gg8Pj0ceeSQvL6/XmxozZszBgwe7WaGlpSU9PX3o0KGenp5z586tra3t9b56zZr15uTkxMXFabXagICAXu/Fhqx5rJYvXx4VFeXm5hYSErJixYqmpqZe76vXrFlvZmbmyJEjXVxcfHx8ZsyYUVxc3Ot99TvWPM6KlpaW6OhojUZTUVHR6331mjXrTU1N1XSg1+t7vS+bsPLc+Ne//jVhwoTBgwf7+vquWLGi1/vqNWvW6+7u3nFuuLi43Llzp9e7s4Qtz+783//9X3Nz89WrV6dOnTpz5syamhqVdrR+/fq8vLzjx48XFxeXlZUtWbJEpR11z2r16nS6ZcuWrVu3TqXtW4HVjlV9fX12dvbVq1f1er1er1+7dq1KO+qe1eqdPn16fn5+TU3N8ePHnZyc5s+fr9KO7JPVjrMiKyvLx8dH1V10z5r1ZmRk1P3PrFmz1NuRSqx2rA4fPpyUlLRw4cKysrITJ05Mnz5dpR11z2r1VlZWtk+MxMTEmTNnuri4qLSv7tky7mg0GmdnZ29v71deeeX27du//PKLsvydd96JjIz08PAYMWLEW2+91b7+mDFj1q1b9/jjj99///3jx48/ffr0XRs0GAyPPfbY/Pnzm5ubOy7/8MMPV65cGRYW5ufn95e//OXzzz83GAxqV9eZ1ep9+umnU1JShg8frnZF6rHasdq5c2d8fLyPj09cXNwLL7xw9OhRtUvrktXqnTBhQlhYmIeHR3Bw8LBhw7y9vdUuza5Y7ThLknT27Nndu3dv3LhR1Yq6Z816Bw4c6P4/zs59cEU3K7PasVq9evXSpUsXLVrk7+8/fPjw+Ph4tUvrktXq1Wq1yqwwmUxffvnliy++qHZp5tjFZ3f0er2rq2tkZKRyMzg4+J///Gdtbe2BAwfee++9L774on3NL7/88sCBA2fOnElOTl66dGnHjZSVlU2aNGny5Ml79uwZOHBg+/KKioqqqqro6GjlZkxMTEtLy9mzZ9UvyyxV6xWMNY9VYWFhTEyMSoX0kBXqzcnJCQgI8PDwOH369Keffqp2RfZJ7ePc2tq6YMGCrVu3enh4WKGcX2WdeTV8+PDx48dv3ry5cxjqR1Q9ViaT6fvvv29tbb3vvvuGDBny5JNP/vTTT9apyxyr9djdu3eHhIQ8/vjj6tXyK2QLWLKFKVOmaLVaf39/JfodPny4y9VWrFiRlpam/B0ZGblz507l77Nnz7q6urYvX716dXBwcHZ2ductXLhwQZKkS5cutS9xcnL6+uuvezHmflFvu7179/r7+/dutApL6u1fx0qW5TVr1owcObKmpqZ3Y+5H9TY2Nt64cePbb7+Njo5euHBh78Zsefew/n6teZy3bNmSnJwsy7Lyorm8vLx3Y+4v9ebl5X333XcXL17ct29fUFBQRkZG78YsfM8pLy+XJGnkyJFnzpypr69ftmxZUFBQfX19L8bcL+rtKCIiYsuWLb0bsNwXPceWZ3cWLVpUVFT07bffRkVFffLJJ+3LDx48+Oijj4aEhISGhn744YfV1dXtd+l0OuUPV1fX27dvt7S0KDc//PDDoKCgLj+IoLy6MhqNys26urq2tjZPT0+ViuqGdeoVg5WP1YYNG3JzcwsKCmz1SQtr1uvq6hoYGBgfH79t27Zdu3Y1NjaqU5M9ss5xLi4u3rp16/bt29UspUesNq8SEhJiY2NHjRqVlJS0adOm3Nxc1WpSi3WOlbu7uyRJaWlpY8eO1Wq1GzdurKio+PHHH1UszAwr99iCgoLS0tLU1NS+r6THbBl3lK8OjRs37tNPP9Xr9YWFhZIklZeXp6SkrF27tqysTPlYsdyD3wTZtm2br6/vjBkzOvfugIAAPz+/oqIi5eaJEyecnZ2joqL6vJxfZZ16xWDNY7Vy5crc3NwjR46Ehob2cRk9Zqu5MWDAgAEDBvRBAf2EdY5zYWFhTU3N2LFjdTpdbGysJEljx47dtWuXGhV1zybzatCgQe3/EfYj1jlW7u7uo0aNav/5Jxv+9pyV50Z2dnZiYmJ7YLIJu/jsTnh4+IIFC1avXi1JUl1dnSRJDz74oEajuXHjRg8/W+Di4rJ//35PT89p06YpW+ho8eLFWVlZly9frqqqWr16dXJysm0/oal2va2trSaTSXn73GQy2epbf31C7WOVnp6+f//+vLw8nU5nMpls8kX0jlStt7m5OSsr6/z580aj8YcffsjIyHj22Wdt9S0J21L1OKekpJSUlBQVFRUVFSnf0T106NDs2bNVqKOnVK23ra1t165dZWVlRqPxyJEjq1atSk5OVqMK61C75/zpT3/asWPHhQsXTCZTZmbmsGHDxo8f3+dV9Jza9UqSVF1dfeDAgcWLF/ftyO+VXcQdSZLeeOONY8eOHT58OCIiYu3atZMmTZo0adKSJUsSEhJ6uIWBAwfq9frQ0NCpU6feunWr411r1qxJSEgYN25ceHh4cHDwBx98oEIF90bVenfu3Onq6jpv3rzKykpXV1fbfhXWcuodK4PBsH379osXL4aFhbm6urq6utrktN9d1KtXo9EcO3ZsypQpfn5+KSkp8fHxH3/8sTpF9APqHWc3N7fg//H395ckKTAwUKvVqlJGj6nac/R6fUxMjJ+f34IFC1JSUrZu3apCBdaj6rFatmzZ888//+ijj/r7+584ceKrr75yc3NToYh7oGq9kiTt3r07NDTUlh9SliRJkjQ9OVVl9sEO+QP01Kv2Y/sj6hV7v7ZCvdZ5bH9EvffKXs7uAAAAqIS4AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcPYbd1paWtLT04cOHerp6Tl37tza2lpza2ZmZo4cOdLFxcXHx2fGjBnFxcXWHGffamlpiY6O1mg0FRUV5tZxd3fXdODi4tKvLyTYQ5WVlSkpKTqdztvb+6mnnjp//nyXq+Xk5MTFxSkXDO35XXbC3AiXL18eFRXl5uYWEhKyYsWKbq6FaG4LqampHeeMXq9Xq4b+jJ5jbh16Dj3nXrdg
hz3HfuPO+vXr8/Lyjh8/XlxcrFzN2tya06dPz8/Pr6mpOX78uJOTU7/+JamsrKxfvSpgZWVl3f8kJibOnDnTES6Mm5aWZjAYLly4cP369cDAQHOXbdXpdMuWLVu3bt093WUnzI2wvr4+Ozv76tWrer1er9evXbv2XrcgSVJGRkb7tJk1a1afDlwQ9Bxz6Dn0nHvdgmSHPceS3xe1fAvd8PPz+/jjj5W/CwoKnJ2db9261f1Dmpqa0tLSnn76aZWGpGq9siz//PPP4eHh//nPf6Se/YRydXW1i4uLuR+ztZwl9fb5sQoPD//oo4+UvwsKCpycnFpaWsyt3M2vwVv+Q/Fd6sN6ux/hmjVr4uPj73UL8+fPf/311/tkeAq1/y3YZL/0nF9dn55jbmV6jv33HDs9u1NRUVFVVRUdHa3cjImJaWlpOXv2rLn1c3JyAgICPDw8Tp8+3cOf+bA3ra2tCxYs2Lp1q/IT7j2xe/fukJAQm1+Z2zqSkpL27t1bVVVVW1u7a9eu3//+9w7125btCgsLY2JievHAnJyc4cOHjx8/fvPmzcrvqaEjek5P0HNsPSgbEKbn2GncUX5mzMvLS7np4eHh5OTUzVvpycnJJ0+e/Pe//93Q0PDnP//ZSqPsU1u3bg0JCZk+fXrPH7Jz506b/+ia1bzxxhstLS3+/v5eXl4//vjju+++a+sR2cDatWsvX76cmZl5rw/8wx/+8MUXXxQUFKxateq9995buXKlGsPr1+g5PUHPcTQi9Rw7jTvKqw2j0ajcrKura2tr8/T0lCRp9+7d7Z9+al/f1dU1MDAwPj5+27Ztu3bt6uZn6O1TcXHx1q1bt2/f3vmuLuuVJKmgoKC0tDQ1NdVKQ7QpWZanTp0aFhZ28+bN+vr6lJSUyZMnNzQ0mDs4QtqwYUNubm5BQUH7Jy16Xn5CQkJsbOyoUaOSkpI2bdqUm5ur/nj7GXpOO3qORM+RJEm4nmOncScgIMDPz6+oqEi5eeLECWdnZ+XXqlNTU+96M+8uAwYM6HenHAsLC2tqasaOHavT6WJjYyVJGjt27K5duyTz9WZnZycmJup0OtuM2Lr++9///vDDD+np6UOGDDMTiKgAAAKYSURBVNFqta+++uqVK1fOnDnzq5NBGCtXrszNzT1y5EhoaGj7wt6VP2jQoJaWFhXG2L/Rc+g5HdFzxOs5dhp3JElavHhxVlbW5cuXq6qqVq9enZyc7O3t3Xm15ubmrKys8+fPG43GH374ISMj49lnn+133xpISUkpKSkpKioqKio6ePCgJEmHDh2aPXu2ufWrq6sPHDjgOGeVdTpdaGjo+++/X1tbazKZ3n33XXd394iIiM5rtra2mkwm5X1ik8nU8euy3dxlJ8yNMD09ff/+/Xl5eTqdzmQydfOl0C630NbWtmvXrrKyMqPReOTIkVWrVpn7jomDo+fQc9rRcwTsOZZ8ztnyLXSjqanp5Zdf9vb2dnd3nzNnjtFo7HK15ubmGTNm+Pv7Dxo0aMSIEcuXLze3puVUrbfdL7/8Iv3atyQ2b948evRotUdiSb19fqxOnTqVkJAwZMgQT0/P2NhYc98N2bFjR8fprdVqe3KX5fqk3i5HeOvWrbv+zYaHh9/TFlpbW6dOnerj4zNo0KCwsLBVq1Y1NjZaOFTr/Fuw8n7pOd2sQ8+h5/R8C/bZczSyBWfklHfvLNlC/0K91nlsf0S9Yu/XVqjXOo/tj6j3Xtnvm1kAAAB9grgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwzpZvwhGupd2Ro9VrCUc7Vo5Wr6042nF2tHot4WjHytHqtQRndwAAgOAsuswgAACA/ePsDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAAT3/8Z/zKE559m+AAAAAElFTkSuQmCC>" +/> + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=4 + #SBATCH --cpus-per-task=4 + + export OMP_NUM_THREADS=4 + + srun --ntasks 8 --cpus-per-task $OMP_NUM_THREADS --cpu_bind=cores --distribution=block:block ./application + +#### + +#### + +#### Distribution: cyclic:block + +The cyclic:block distribution will allocate the tasks of your job in +alternation between the first node and the second node while filling the +sockets linearly. 
+ +\<img alt="" +src="data:;base64,iVBORw0KGgoAAAANSUhEUgAAAvoAAADyCAIAAACzsfbGAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3df1RUdf7H8TuIIszwQxl+CYKCgmE/FrEUleyHh9ZabQUhXVslj5qsRUqb6Tf82Z5EPWXmnozMUg4nZ8vU3XOqZVckO5idzRTTzB+g4C9+rs7wy5Ff9/vHPTuHI0LIcGeGzzwffzl37tz7ed/5zNsXd2buaGRZlgAAAMTlYu8BAAAAqIu4AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIJztebBGo2mt8YBoM+RZdnGe6TnAM7Mmp7D2R0AACA4q87uKGz/Fx4A+7LvWRZ6DuBsrO85nN0BAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrjjvB577DGNRvPdd99ZlgQGBh44cKD7WygqKtLpdN1fPycnZ8KECVqtNjAw8B4GCkAItu85y5Yti46O9vDwCA0NXb58eVNT0z0MF2Ih7jg1X1/f1157zWa70+v1S5cuXbt2rc32CMCh2Ljn1NfXZ2dnX7lyxWAwGAyGNWvW2GzXcDTEHae2YMGCkpKSL774ouNd5eXlycnJ/v7+ISEhL730UmNjo7L8ypUrTz31lI+Pz/3333/kyBHL+rW1tWlpaUOHDvXz85s1a1ZNTU3HbT799NMpKSlDhw5VqRwADs7GPWfHjh3x8fG+vr4TJkx44YUX2j8czoa449R0Ot3atWtXrlzZ3Nx8x11JSUn9+/cvKSk5duzY8ePHMzIylOXJyckhISEVFRVfffXVBx98YFl/zpw5lZWVJ06cuHz5sre3d2pqqs2qANBX2LHnFBYWxsbG9mo16FNkK1i/BdjR5MmT33zzzebm5lGjRm3btk2W5YCAgP3798uyfPbsWUmSqqqqlDXz8/MHDhzY2tp69uxZjUZz48YNZXlOTo5Wq5Vl+eLFixqNxrK+yWTSaDRGo/Gu+92zZ09AQIDa1UFV9nrt03P6NHv1HFmWV69ePXz48JqaGlULhHqsf+272jpewcG4urpmZWUtXLhw7ty5loVXr17VarV+fn7KzfDwcLPZXFNTc/XqVV9f30GDBinLR44cqfyjtLRUo9E8/PDDli14e3tfu3bN29vbVnUA6Bts33PWr1+fm5tbUFDg6+urVlVweMQdSM8+++zbb7+dlZVlWRISEtLQ0FBdXa10n9LSUjc3N71eHxwcbDQab9++7ebmJklSRUWFsn5oaKhGozl58iT5BsCvsmXPWbFixb59+w4fPhwSEqJaQegD+OwOJEmSNm/evHXr1rq6OuVmZGTk+PHjMzIy6uvrKysrMzMz582b5+LiMmrUqJiYmC1btkiSdPv27a1btyrrR0REJCQkLFiwoLy8XJKk6urqvXv3dtxLa2ur2WxW3rM3m823b9+2UXkAHIxtek56evq+ffvy8vL0er3ZbOaL6M6MuANJkqRx48Y988wzlq9CaDSavXv3NjY2Dh8+PCYm5sEHH3znnXeUuz7//PP8/PwxY8Y88cQTTzzxhGULe/bsGTJkyIQJEzw9PcePH19YWNhxLzt27HB3d587d25lZaW7uzsnlgGnZYOeYzQat23bduHChfDwcHd3d3d39+joaNtUBweksXwCqCcP1mgkSbJmCwD6Inu99uk5gHOy/rXP2R0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA4V+s3odForN8IAHQTPQfAveLsDgAAEJxGlmV7jwEAAEBFnN0BAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABCcVZcZ5GJfzqBnlypgbjgD21/GgnnlDOg56Iw1PYezOwAAQHC98CMSXKhQVNb/tcTcEJV9/5JmXomKnoPOWD83OLsDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDix50zZ85MmzZNr9d7eHiMGjXq9ddf78FGRo0adeDAgW6u/Jvf/MZgMNz1rpycnAkTJmi12sDAwB4MA73LoebGsmXLoqOjPTw8QkNDly9f3tTU1IPBwBE41Lyi5zgUh5obztZzBI87bW1tv/3tb4cMGXLq1KmamhqDwRAeHm7H8ej1+qVLl65du9aOY4DC0eZGfX19dnb2lStXDAaDwWBYs2aNHQeDHnO0eUXPcRyONjecrufIVrB+C2q7cuWKJElnzpzpeNf169dnzpzp5+cXHBy8ZMmShoYGZfnNmzfT0tJCQ0M9PT1jYmLOnj0ry3JUVNT+/fuVeydPnjx37tympiaTybR48eKQkBC9Xv/cc89VV1fLsvzSSy/1799fr9eHhYXNnTv3rqPas2dPQECAWjX3HmueX+ZGz+aGYvXq1fHx8b1fc++x1/PLvKLnqPFY23DMuaFwhp4j+NmdIUOGREZGLl68+G9/+9vly5fb35WUlNS/f/+SkpJjx44dP348IyNDWT579uyysrKjR48ajcbdu3d7enpaHlJWVjZx4sRJkybt3r27f//+c+bMqaysPHHixOXLl729vVNTUyVJ2rZtW3R09LZt20pLS3fv3m3DWnFvHHluFBYWxsbG9n7NUJ8jzyvYlyPPDafoOfZNWzZQWVm5YsWKMWPGuLq6jhgxYs+ePbIsnz17VpKkqqoqZZ38/PyBAwe2traWlJRIknTt2rU7NhIVFbVq1aqQkJDs7GxlycWLFzUajWULJpNJo9EYjUZZlh966CFlL53hLy0H4YBzQ5bl1atXDx8+vKamphcr7XX2en6ZV/QcNR5rMw44N2Sn6Tnixx2Lurq6t99+28XF5aeffjp48KBWq7XcdenSJUmSKisr8/PzPTw8Oj42KioqICBg3LhxZrNZWXLo0CEXF5ewdnx8fH7++WeZ1mP1Y23PcebGunXrwsPDS0tLe7W+3kfc6Q7HmVf0HEfjOHPDeXqO4G9mtafT6TIyMgYOHPjTTz+FhIQ0NDRUV1crd5WWlrq5uSlvcDY2NpaXl3d8+NatW/38/KZPn97Y2ChJUmhoqEajOXnyZOn/3Lx5Mzo6WpIkFxcnOqpicJC5sWLFitzc
3MOHD4eFhalQJWzNQeYVHJCDzA2n6jmCv0gqKipee+21EydONDQ03LhxY8OGDc3NzQ8//HBkZOT48eMzMjLq6+srKyszMzPnzZvn4uISERGRkJCwaNGi8vJyWZZPnz5tmWpubm779u3z8vKaOnVqXV2dsuaCBQuUFaqrq/fu3ausGRgYeO7cubuOp7W11Ww2Nzc3S5JkNptv375tk8OAu3C0uZGenr5v3768vDy9Xm82m4X/UqioHG1e0XMch6PNDafrOfY9uaQ2k8m0cOHCkSNHuru7+/j4TJw48csvv1Tuunr1amJiol6vDwoKSktLq6+vV5bfuHFj4cKFwcHBnp6eY8aMOXfunNzuk/AtLS1//OMfH3nkkRs3bhiNxvT09GHDhul0uvDw8FdeeUXZwjfffDNy5EgfH5+kpKQ7xrN9+/b2B7/9CUwHZM3zy9y4p7lx8+bNO16YERERtjsW985ezy/zip6jxmNtw6HmhhP2HI1lKz2g0WiU3fd4C3Bk1jy/zA2x2ev5ZV6JjZ6Dzlj//Ar+ZhYAAABxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgXK3fhPKz7EBHzA2ogXmFzjA30BnO7gAAAMFpZFm29xgAAABUxNkdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgrLqqMtevdAY9uzITc8MZ2P6qXcwrZ0DPQWes6Tmc3QEAAILrhd/M4rrMorL+ryXmhqjs+5c080pU9Bx0xvq5wdkdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwwsadI0eOPPPMM4MHD9ZqtQ888EBmZmZDQ4MN9tvS0pKenj548GAvL685c+bU1tbedTWdTqdpx83N7fbt2zYYntOy13yorKxMSUnR6/U+Pj5PPfXUuXPn7rpaTk7OhAkTtFptYGBg++Wpqant54nBYLDBmNEz9By0R89xNGLGnX/84x9PPvnkQw89dPTo0aqqqtzc3KqqqpMnT3bnsbIsNzc393jX69aty8vLO3bsWHFxcVlZ2eLFi++6WmVlZd3/JCYmzpgxw83Nrcc7RdfsOB/S0tKMRuP58+evXbsWFBSUnJx819X0ev3SpUvXrl3b8a6MjAzLVJk5c2aPRwJV0XPQHj3HEclWsH4LamhtbQ0JCcnIyLhjeVtbmyzL169fnzlzpp+fX3Bw8JIlSxoaGpR7o6KiMjMzJ02aFBkZWVBQYDKZFi9eHBISotfrn3vuuerqamW1d955JywszNvbOygo6M033+y4d39//48//lj5d0FBgaur682bN7sYbXV1tZub26FDh6ysWg3WPL+OMzfsOx8iIiI++ugj5d8FBQUuLi4tLS2dDXXPnj0BAQHtl8ybN+/111/vaekqstfz6zjzqj16Tm+h59BzOtMLicW+u1eDkqBPnDhx13vj4uJmz55dW1tbXl4eFxf34osvKsujoqLuv//+mpoa5ebvfve7GTNmVFdXNzY2Llq06JlnnpFl+dy5czqd7sKFC7IsG43GH3/88Y6Nl5eXt9+1clb5yJEjXYx28+bNI0eOtKJcFYnReuw4H2RZXr58+ZNPPllZWWkymZ5//vnExMQuhnrX1hMUFBQSEhIbG7tx48ampqZ7PwCqIO60R8/pLfQcek5niDt3cfDgQUmSqqqqOt519uzZ9nfl5+cPHDiwtbVVluWoqKi//vWvyvKLFy9qNBrLaiaTSaPRGI3GkpISd3f3zz77rLa29q67Pn/+vCRJFy9etCxxcXH5+uuvuxhtZGTk5s2b771KWxCj9dhxPigrT548WTka99133+XLl7sYasfWk5eX99133124cGHv3r3BwcEd/160F+JOe/Sc3kLPUZbTczqy/vkV8LM7fn5+kiRdu3at411Xr17VarXKCpIkhYeHm83mmpoa5eaQIUOUf5SWlmo0mocffnjYsGHDhg178MEHvb29r127Fh4enpOT8/777wcGBj766KOHDx++Y/uenp6SJJlMJuVmXV1dW1ubl5fXrl27LJ/8ar9+QUFBaWlpampqb9WOjuw4H2RZnjJlSnh4+I0bN+rr61NSUiZNmtTQ0NDZfOgoISEhLi5uxIgRSUlJGzduzM3NteZQQCX0HLRHz3FQ9k1balDeN3311VfvWN7W1nZHsi4oKHBzc7Mk6/379yvLi4uL+/XrZzQaO9tFY2PjW2+9NWjQIOW92Pb8/f0/+eQT5d/ffPNN1++jP/fcc7Nmzbq38mzImufXceaGHedDdXW11OGNhu+//76z7XT8S6u9zz77bPDgwV2VakP2en4dZ161R8/pLfQcZTk9p6NeSCz23b1K/v73vw8cOHDVqlUlJSVms/n06dNpaWlHjhxpa2sbP378888/X1dXV1FRMXHixEWLFikPaT/VZFmeOnXqzJkzr1+/LstyVVXV559/LsvyL7/8kp+fbzabZVnesWOHv79/x9aTmZkZFRV18eLFysrK+Pj42bNndzbIqqqqAQMGOOYHBhVitB7ZrvMhLCxs4cKFJpPp1q1b69ev1+l0N27c6DjClpaWW7du5eTkBAQE3Lp1S9lma2vrRx99VFpaajQav/nmm4iICMvb/HZH3LkDPadX0HMsW6Dn3IG406nCwsKpU6f6+Ph4eHg88MADGzZsUD4Af/Xq1cTERL1eHxQUlJaWVl9fr6x/x1QzGo3p6enDhg3T6XTh4eGvvPKKLMvHjx9/5JFHvLy8Bg0aNG7cuG+//bbjfpuaml5++WUfHx+dTjd79myTydTZCDdt2uSwHxhUCNN6ZPvNh5MnTyYkJAwaNMjLyysuLq6z/2m2b9/e/pyrVquVZbm1tXXKlCm+vr4DBgwIDw9fuXJlY2Njrx+ZniHudETPsR49x/Jwes4drH9+NZat9IDyLqA1W4Ajs+b5ZW6IzV7PL/NKbPQcdMb651fAjyoDAAC0R9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAATnav0mfvUXVuG0mBtQA/MKnWFuoDOc3QEAAIKz6jezAAAAHB9ndwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgrPqqspcv9IZ9OzKTMwNZ2D7q3Yxr5wBPQedsabncHYHAAAIrhd+M8t5rsus/PXgbPVaw9mOlbPVay/OdpydrV5rONuxcrZ6rcHZHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgrNb3Hnsscc0Go1Go/H09HzkkUfy8vJ6vKlRo0YdOHCgixVaWlrS09MHDx7s5eU1Z86c2traHu+rx2xZ77Jly6Kjoz08PEJDQ5cvX97U1NTjfdmFLY+VoqWlJSYmRqPRVFRU9HhfPWbjev/1r3+NGzdu4MCBfn5+y5cv7/G++hxbHuecnJwJEyZotdrAwMAe78VKtqw3MzNz+PD
hbm5uvr6+06dPLy4u7vG+7MKWxyo1NVXTjsFg6PG+esyW9ep0uvb1urm53b59u8e7s4Y9z+783//9X3Nz85UrV6ZMmTJjxoyamhqVdrRu3bq8vLxjx44VFxeXlZUtXrxYpR11zWb11tfXZ2dnX7lyxWAwGAyGNWvWqLQj9djsWCmysrJ8fX1V3UXXbFbvoUOHkpKSFixYUFZWdvz48WnTpqm0I8dks+Os1+uXLl26du1albbfTTard9q0afn5+TU1NceOHXNxcZk3b55KO1KPLXtORkZG3f/MnDlTvR11wWb1VlZWWopNTEycMWOGm5ubSvvqmj3jjkajcXV19fHxeeWVV27duvXLL78oy995552oqChPT89hw4a99dZblvVHjRq1du3axx9//P777x87duypU6fu2KDRaHzsscfmzZvX3NzcfvmHH364YsWK8PBwf3//v/zlL59//rnRaFS7uo5sVu+OHTvi4+N9fX0nTJjwwgsvHDlyRO3Sep3NjpUkSWfOnNm1a9eGDRtUrahrNqt31apVS5YsWbhwYUBAwNChQ+Pj49UuzaHY7Dg//fTTKSkpQ4cOVbuirtms3nHjxoWHh3t6eoaEhAwZMsTHx0ft0nqdLXtO//79df/j6toLV7/rAZvVq9VqlUrNZvOXX3754osvql1aZxziszsGg8Hd3T0qKkq5GRIS8s9//rO2tnb//v3vvffeF198YVnzyy+/3L9//+nTp5OTk5csWdJ+I2VlZRMnTpw0adLu3bv79+9vWV5RUVFVVRUTE6PcjI2NbWlpOXPmjPpldUrVeu9QWFgYGxurUiE2oPaxam1tnT9//pYtWzw9PW1Qzq9StV6z2fz999+3trbed999gwYNevLJJ3/66Sfb1OVobPkadAQ2qDcnJycwMNDT0/PUqVOffvqp2hWpxzbHaujQoWPHjt20aVPHMGRjNnst7Nq1KzQ09PHHH1evll8hW8GaLUyePFmr1QYEBCjR79ChQ3ddbfny5Wlpacq/o6KiduzYofz7zJkz7u7uluWrVq0KCQnJzs7uuIXz589LknTx4kXLEhcXl6+//roHY+4T9ba3evXq4cOH19TU9GzM1tTbV47V5s2bk5OTZVlW/rgpLy/v2Zj7RL3l5eWSJA0fPvz06dP19fVLly4NDg6ur6/vwZit7x490yeOs8WePXsCAgJ6NlpFH6q3sbHx+vXr3377bUxMzIIFC3o2ZmfoOXl5ed99992FCxf27t0bHByckZHRszH3lXotIiMjN2/e3LMBy73Rc+x5dmfhwoVFRUXffvttdHT0J598Yll+4MCBRx99NDQ0NCws7MMPP6yurrbcpdfrlX+4u7vfunWrpaVFufnhhx8GBwff9Q1j5a92k8mk3Kyrq2tra/Py8lKpqC7Ypl6L9evX5+bmFhQU2PdTKT1jm2NVXFy8ZcuWbdu2qVlKt9imXp1OJ0lSWlra6NGjtVrthg0bKioqfvzxRxULczA2fg3anS3rdXd3DwoKio+P37p1686dOxsbG9WpSS02O1YJCQlxcXEjRoxISkrauHFjbm6uajV1xcavhYKCgtLS0tTU1N6vpNvsGXeUry2MGTPm008/NRgMhYWFkiSVl5enpKSsWbOmrKxM+Vix3I3fBNm6daufn9/06dM7vsYCAwP9/f2LioqUm8ePH3d1dY2Oju71cn6VbepVrFixIjc39/Dhw2FhYb1chk3Y5lgVFhbW1NSMHj1ar9fHxcVJkjR69OidO3eqUVHXbFOvTqcbMWKE5adnnPAXpG35GnQE9qq3X79+/fr164UCbMgux2rAgAGW0GBjNq43Ozs7MTHREpjswiE+uxMRETF//vxVq1ZJklRXVydJ0oMPPqjRaK5fv97N94Dd3Nz27dvn5eU1depUZQvtLVq0KCsr69KlS1VVVatWrUpOTrbvJ+nUrjc9PX3fvn15eXl6vd5sNve5L6K3p+qxSklJKSkpKSoqKioqUr5LefDgwVmzZqlQR3epPTf+9Kc/bd++/fz582azOTMzc8iQIWPHju31Khyf2se5tbXVbDYrH8swm832+uathar1Njc3Z2VlnTt3zmQy/fDDDxkZGc8++6y9vn1jPVWPVVtb286dO8vKykwm0+HDh1euXJmcnKxGFd2n9mtBkqTq6ur9+/cvWrSod0d+rxwi7kiS9MYbbxw9evTQoUORkZFr1qyZOHHixIkTFy9enJCQ0M0t9O/f32AwhIWFTZky5ebNm+3vWr16dUJCwpgxYyIiIkJCQj744AMVKrg36tVrNBq3bdt24cKF8PBwd3d3d3d3u5zK6kXqHSsPD4+Q/wkICJAkKSgoSKvVqlJGt6n6Wli6dOnzzz//6KOPBgQEHD9+/KuvvvLw8FChiD5A1eO8Y8cOd3f3uXPnVlZWuru7O8IbyurVq9Fojh49OnnyZH9//5SUlPj4+I8//lidImxE1blhMBhiY2P9/f3nz5+fkpKyZcsWFSq4N6rWK0nSrl27wsLC7PkhZUmSJEnTnVNVnT7YKX+AnnrVfmxfRL1i79deqNc2j+2LqPdeOcrZHQAAAJUQdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACM5x405LS0t6evrgwYO9vLzmzJlTW1vb2ZqZmZnDhw93c3Pz9fWdPn16cXGxLcfZu1paWmJiYjQaTUVFRWfr6HQ6TTtubm52v4iZDVRWVqakpOj1eh8fn6eeeurcuXN3XS0nJ2fChAnKBUO7f5eD6GyEy5Yti46O9vDwCA0NXb58eRfXjexsC6mpqe3njMFgUKuGvoye09k69Bx6zr1uwQF7juPGnXXr1uXl5R07dqy4uFi5mnVna06bNi0/P7+mpubYsWMuLi4O/is2XcvKyvrVK5JVVlbW/U9iYuKMGTP67gVMuy8tLc1oNJ4/f/7atWtBQUGdXYpUr9cvXbp07dq193SXg+hshPX19dnZ2VeuXDEYDAaDYc2aNfe6BUmSMjIyLNNm5syZvTpwQdBzOkPPoefc6xYkB+w51vy+qPVb6IK/v//HH3+s/LugoMDV1fXmzZtdP6SpqSktLe3pp59WaUiq1ivL8s8//xwREfGf//xH6t5Pc1dXV7u5uXX2Y7bWs6beXj9WERERH330kfLvgoICFxeXlpaWzlbu4peorf+R6rvqxXq7HuHq1avj4+PvdQvz5s17/fXXe2V4CrVfC3bZLz3nV9en53S2Mj3H8XuOg57dqaioqKqqiomJUW7Gxsa2tLScOXOms/VzcnICAwM9PT1PnTrVzZ/5cDStra3z58/fsmWL8hPu3bFr167Q0FC7X5nbNpKSkvbs2VNVVVVbW7tz587f//73fe43CHtFYWFhbGxsDx6Yk5MzdOjQsWPHbtq0SfktJ7RHz+kOeo69B2UHwvQcB407ys+MeXt7Kzc9PT1dXFy6eCs9OTn5xIkT//73vxsaGv785z/baJS9asuWLaGhodOmTev+Q3bs2GH3H12zmTfeeKOlpSUgIMDb2/vHH39899137T0iO1izZs2lS5cyMzPv9YF/+MMfvv
jii4KCgpUrV7733nsrVqxQY3h9Gj2nO+g5zkaknuOgcUf5a8NkMik36+rq2travLy8JEnatWuX5dNPlvXd3d2DgoLi4+O3bt26c+fOLn6G3jEVFxdv2bJl27ZtHe+6a72SJBUUFJSWlqamptpoiHYly/KUKVPCw8Nv3LhRX1+fkpIyadKkhoaGzg6OkNavX5+bm1tQUGD5pEX3y09ISIiLixsxYkRSUtLGjRtzc3PVH28fQ8+xoOdI9BxJkoTrOQ4adwIDA/39/YuKipSbx48fd3V1VX7ZOzU19Y438+7Qr1+/PnfKsbCwsKamZvTo0Xq9Pi4uTpKk0aNH79y5U+q83uzs7MTERL1eb58R29Z///vfH374IWpri2kAAAKdSURBVD09fdCgQVqt9tVXX718+fLp06d/dTIIY8WKFbm5uYcPHw4LC7Ms7Fn5AwYMaGlpUWGMfRs9h57THj1HvJ7joHFHkqRFixZlZWVdunSpqqpq1apVycnJPj4+HVdrbm7Oyso6d+6cyWT64YcfMjIynn322T73rYGUlJSSkpKioqKioqIDBw5IknTw4MFZs2Z1tn51dfX+/fud56yyXq8PCwt7//33a2trzWbzu+++q9PpIiMjO67Z2tpqNpuV94nNZnP7r8t2cZeD6GyE6enp+/bty8vL0+v1ZrO5iy+F3nULbW1tO3fuLCsrM5lMhw8fXrlyZWffMXFy9Bx6jgU9R8CeY83nnK3fQheamppefvllHx8fnU43e/Zsk8l019Wam5unT58eEBAwYMCAYcOGLVu2rLM1radqvRa//PKL9Gvfkti0adPIkSPVHok19fb6sTp58mRCQsKgQYO8vLzi4uI6+27I9u3b209vrVbbnbus1yv13nWEN2/evOM1GxERcU9baG1tnTJliq+v74ABA8LDw1euXNnY2GjlUG3zWrDxfuk5XaxDz6HndH8LjtlzNLIVZ+SUd++s2ULfQr22eWxfRL1i79deqNc2j+2LqPdeOe6bWQAAAL2CuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHCu1m/CGa6l3Z6z1WsNZztWzlavvTjbcXa2eq3hbMfK2eq1Bmd3AACA4Ky6zCAAAIDj4+wOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABPf/Kg/MoRR2M6oAAAAASUVORK5CYII=" +/> + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=4 + #SBATCH --cpus-per-task=4 + + export OMP_NUM_THREADS=4<br /><br />srun --ntasks 8 --cpus-per-task $OMP_NUM_THREADS --cpu_bind=cores --distribution=cyclic:block ./application diff --git a/twiki2md/root/SoftwareDevelopment/Compilers.md b/twiki2md/root/SoftwareDevelopment/Compilers.md new file mode 100644 index 000000000..f1dead17f --- /dev/null +++ b/twiki2md/root/SoftwareDevelopment/Compilers.md @@ -0,0 +1,121 @@ +# Compilers + +The following compilers are available on our platforms: + +| | | | | +|----------------------|-----------|------------|-------------| +| | **Intel** | **GNU** | **PGI** | +| **C Compiler** | `icc` | `gcc` | `pgcc` | +| **C++ Compiler** | `icpc` | `g++` | `pgc++` | +| **Fortran Compiler** | `ifort` | `gfortran` | `pgfortran` | + +For an overview of the installed compiler versions, please see our +automatically updated [SoftwareModulesList](SoftwareModulesList) + +All C compiler support ANSI C and C99 with a couple of different +language options. The support for Fortran77, Fortran90, Fortran95, and +Fortran2003 differs from one compiler to the other. Please check the man +pages to verify that your code can be compiled. + +Please note that the linking of C++ files normally requires the C++ +version of the compiler to link the correct libraries. For serious +problems with Intel's compilers please refer to +[FurtherDocumentation](FurtherDocumentation). + +## Compiler Flags + +Common options are: + +- `-g` to include information required for debugging +- `-pg` to generate gprof -style sample-based profiling information + during the run +- `-O0`, `-O1`, `-O2`, `-O3` to customize the optimization level from + no ( -O0 ) to aggressive ( -O3 ) optimization +- `-I` to set search path for header files +- `-L` to set search path for libraries + +Please note that aggressive optimization allows deviation from the +strict IEEE arithmetic. Since the performance impact of options like -mp +is very hard the user herself has to balance speed and desired accuracy +of her application. There are several options for profiling, +profile-guided optimization, data alignment and so on. 
You can list all +available compiler options with the option -help . Reading the man-pages +is a good idea, too. + +The user benefits from the (nearly) same set of compiler flags for +optimization for the C,C++, and Fortran-compilers. In the following +table, only a couple of important compiler-dependant options are listed. +For more detailed information, the user should refer to the man pages or +use the option -help to list all options of the compiler. + +\| **GCC** \| **Open64** \| **Intel** \| **PGI** \| **Pathscale** \| +Description\* \| + +| | | | | | | +|----------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-----------------|-------------------------------------------------------------------------------------| +| `-fopenmp` | `-mp` | `-openmp` | `-mp` | `-mp` | turn on OpenMP support | +| `-ieee-fp` | `-fno-fast-math` | `-mp` | `-Kieee` | `-no-fast-math` | use this flag to limit floating-point optimizations and maintain declared precision | +| `-ffast-math` | `-ffast-math` | `-mp1` | `-Knoieee` | `-ffast-math` | some floating-point optimizations are allowed, less performance impact than `-mp` . | +| `-Ofast` | `-Ofast` | `-fast` | `-fast` | `-Ofast` | Maximize performance, implies a couple of other flags | +| | | `-fpe`<span class="twiki-macro FOOTNOTE">ifort only</span> `-ftz`<span class="twiki-macro FOOTNOTE">flushes denormalized numbers to zero: On Itanium 2 an underflow raises an underflow exception that needs to be handled in software. This takes about 1000 cycles!</span> | `-Ktrap`... | | Controls the behavior of the processor when floating-point exceptions occur. | +| `-mavx` `-msse4.2` | `-mavx` `-msse4.2` | `-msse4.2` | `-fastsse` | `-mavx` | "generally optimal flags" for supporting SSE instructions | +| | `-ipa` | `-ipo` | `-Mipa` | `-ipa` | inter procedure optimization (across files) | +| | | `-ip` | `-Mipa` | | inter procedure optimization (within files) | +| | `-apo` | `-parallel` | `-Mconcur` | `-apo` | Auto-parallelizer | +| `-fprofile-generate` | | `-prof-gen` | `-Mpfi` | `-fb-create` | Create intrumented code to generate profile in file \<FN> | +| `-fprofile-use` | | `-prof-use` | `-Mpfo` | `-fb-opt` | Use profile data for optimization. - Leave all other optimization options | + +*We can not generally give advice as to which option should be used - +even -O0 sometimes leads to a fast code. To gain maximum performance +please test the compilers and a few combinations of optimization flags. +In case of doubt, you can also contact ZIH and ask the staff for help.* + +### Vector Extensions + +To build an executable for different node types (e.g. Sandybridge and +Westmere) the option `-msse4.2 -axavx` (for Intel compilers) uses SSE4.2 +as default path and runs along a different execution path if AVX is +available. This increases the size of the program code (might result in +poorer L1 instruction cache hits) but enables to run the same program on +different hardware types. + +To optimize for the host architecture, the flags: + +| GCC | Intel | +|:--------------|:-------| +| -march=native | -xHost | + +can be used. 
+ +The following matrix shows some proper optimization flags for the +different hardware in Taurus, as of 2020-04-08: + +| Arch | GCC | Intel Compiler | +|:-----------------------|:-------------------|:-----------------| +| **Intel Sandy Bridge** | -march=sandybridge | -xAVX | +| **Intel Haswell** | -march=haswell | -xCORE-AVX2 | +| **AMD Rome** | -march=znver2 | -march=core-avx2 | +| **Intel Cascade Lake** | -march=cascadelake | -xCOMMON-AVX512 | + +## Compiler Optimization Hints + +To achieve the best performance the compiler needs to exploit the +parallelism in the code. Therefore it is sometimes necessary to provide +the compiler with some hints. Some possible directives are (Fortran +style): + +| | | +|--------------------------|-----------------------------------| +| `CDEC$ ivdep` | ignore assumed vector dependences | +| `CDEC$ swp` | try to software-pipeline | +| `CDEC$ noswp` | disable softeware-pipeling | +| `CDEC$ loop count (n)` | hint for optimzation | +| `CDEC$ distribute point` | split this large loop | +| `CDEC$ unroll (n)` | unroll (n) times | +| `CDEC$ nounroll` | do not unroll | +| `CDEC$ prefetch a` | prefetch array a | +| `CDEC$ noprefetch a` | do not prefetch array a | + +The compiler directives are the same for `ifort` and `icc` . The syntax +for C/C++ is like `#pragma ivdep`, `#pragma swp`, and so on. +[Further Documentation](Further Documentation) diff --git a/twiki2md/root/SoftwareDevelopment/Debuggers.md b/twiki2md/root/SoftwareDevelopment/Debuggers.md new file mode 100644 index 000000000..0e4c0b8fc --- /dev/null +++ b/twiki2md/root/SoftwareDevelopment/Debuggers.md @@ -0,0 +1,247 @@ +# Debuggers + +This section describes how to start the debuggers on the HPC systems of +ZIH. + +Detailed i nformation about how to use the debuggers can be found on the +website of the debuggers (see below). + + + +## Overview of available Debuggers + +| | | | | +|--------------------|-----------------------------------|--------------------------------------------------------------------------------------------|---------------------------------------------------------| +| | **GNU Debugger** | **DDT** | **Totalview** | +| Interface | command line | graphical user interface | | +| Languages | C, C++, Fortran | C, C++, Fortran, F95 | | +| Parallel Debugging | Threads | Threads, MPI, hybrid | | +| Debugger Backend | GDB | | own backend | +| Website | <http://www.gnu.org/software/gdb> | [arm.com](https://developer.arm.com/products/software-development-tools/hpc/documentation) | <https://www.roguewave.com/products-services/totalview> | +| Licenses at ZIH | free | 1024 | 32 | + +## General Advices + +- You need to compile your code with the flag `-g` to enable + debugging. This tells the compiler to include information about + variable and function names, source code lines etc. into the + executable. +- It is also recommendable to reduce or even disable optimizations + (`-O0`). At least inlining should be disabled (usually + `-fno-inline`) +- For parallel applications: try to reconstruct the problem with less + processes before using a parallel debugger. +- The flag `-traceback` of the Intel Fortran compiler causes to print + stack trace and source code location when the program terminates + abnormally. 
+- If your program crashes and you get an address of the failing + instruction, you can get the source code line with the command + `addr2line -e <executable> <address>` +- Use the compiler's check capabilites to find typical problems at + compile time or run time + - Read manual (`man gcc`, `man ifort`, etc.) + - Intel C compile time checks: + `-Wall -Wp64 -Wuninitialized -strict-ansi` + - Intel Fortran compile time checks: `-warn all -std95` + - Intel Fortran run time checks: `-C -fpe0 -traceback` +- Use [memory debuggers](Compendium.Debuggers#Memory_Debugging) to + verify the proper usage of memory. +- Core dumps are useful when your program crashes after a long + runtime. +- More hints: [Slides about typical Bugs in parallel + Programs](%ATTACHURL%/typical_bugs.pdf) + +## GNU Debugger + +The GNU Debugger (GDB) offers only limited to no support for parallel +applications and Fortran 90. However, it might be the debugger you are +most used to. GDB works best for serial programs. You can start GDB in +several ways: + +| | | +|-------------------------------|--------------------------------| +| | Command | +| Run program under GDB | `gdb <executable>` | +| Attach running program to GDB | `gdb --pid <process ID>` | +| Open a core dump | `gdb <executable> <core file>` | + +This [GDB Reference +Sheet](http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf) makes life +easier when you often use GDB. + +Fortran 90 programmers which like to use the GDB should issue an +`module load ddt` before their debug session. This makes the GDB +modified by DDT available, which has better support for Fortran 90 (e.g. +derived types). + +## DDT + +\<img alt="" src="%ATTACHURL%/ddt.png" title="DDT Main Window" +width="500" /> + +- Commercial tool of Arm shipped as "Forge" together with MAP profiler +- Intuitive graphical user interface +- Great support for parallel applications +- We have 1024 licences, so many user can use this tool for parallel + debugging +- Don't expect that debugging an MPI program with 100ths of process + will work without problems + - The more processes and nodes involved, the higher is the + probability for timeouts or other problems + - Debug with as few processes as required to reproduce the bug you + want to find +- Module to load before using: `module load ddt` +- Start: `ddt <executable>` +- If you experience problems in DDTs configuration when changing the + HPC system, you should issue `rm -r ~/.ddt.` +- More Info + - [Slides about basic DDT + usage](%ATTACHURL%/parallel_debugging_ddt.pdf) + - [Official + Userguide](https://developer.arm.com/docs/101136/latest/ddt) + +### Serial Program Example (Taurus, Venus) + + % module load ddt + % salloc --x11 -n 1 --time=2:00:00 + salloc: Granted job allocation 123456 + % ddt ./myprog + +- uncheck MPI, uncheck OpenMP +- hit *Run*. 
+ +### Multithreaded Program Example (Taurus, Venus) + + % module load ddt + % salloc --x11 -n 1 --cpus-per-task=<number of threads> --time=2:00:00 + salloc: Granted job allocation 123456 + % srun --x11=first ddt ./myprog + +- uncheck MPI +- select OpenMP, set number of threads (if OpenMP) +- hit *Run* + +### MPI-Parallel Program Example (Taurus, Venus) + + % module load ddt + % module load bullxmpi # Taurus only + % salloc --x11 -n <number of processes> --time=2:00:00 + salloc: Granted job allocation 123456 + % ddt -np <number of processes> ./myprog + +- select MPI +- set the MPI implementation to "SLURM (generic)" +- set number of processes +- hit *Run* + +## Totalview + +\<img alt="" src="%ATTACHURL%/totalview-main.png" title="Totalview Main +Window" /> + +- Commercial tool +- Intuitive graphical user interface +- Good support for parallel applications +- Great support for complex data structures, also for Fortran 90 + derived types +- We have only 32 licences + - Large parallel runs are not possible + - Use DDT for these cases +- Module to load before using: `module load totalview` +- Start: `totalview <executable>` + +### Serial Program Example (Taurus, Venus) + + % module load totalview + % salloc --x11 -n 1 --time=2:00:00 + salloc: Granted job allocation 123456 + % srun --x11=first totalview ./myprog + +- ensure *Parallel system* is set to *None* in the *Parallel* tab +- hit *Ok* +- hit the *Go* button to start the program + +### Multithreaded Program Example (Taurus, Venus) + + % module load totalview + % salloc --x11 -n 1 --cpus-per-task=<number of threads> --time=2:00:00 + salloc: Granted job allocation 123456 + % export OMP_NUM_THREADS=<number of threads> # or any other method to set the number of threads + % srun --x11=first totalview ./myprog + +- ensure *Parallel system* is set to *None* in the *Parallel* tab +- hit *Ok* +- set breakpoints, if necessary +- hit the *Go* button to start the program + +### MPI-Parallel Program Example (Taurus, Venus) + + % module load totalview + % module load bullxmpi # Taurus only + % salloc -n <number of processes> --time=2:00:00 + salloc: Granted job allocation 123456 + % totalview + +- select your executable program with the button *Browse...* +- ensure *Parallel system* is set to *SLURM* in the *Parallel* tab +- set the number of Tasks in the *Parallel* tab +- hit *Ok* +- set breakpoints, if necessary +- hit the *Go* button to start the program + +## Memory Debugging + +- Memory debuggers find memory management bugs, e.g. 
+ - Use of non-initialized memory + - Access memory out of allocated bounds +- Very valuable tools to find bugs +- DDT and Totalview have memory debugging included (needs to be + enabled before run) + +### Valgrind + +- <http://www.valgrind.org> +- Simulation of the program run in a virtual machine which accurately + observes memory operations +- Extreme run time slow-down + - Use small program runs +- Sees more memory errors than the other debuggers +- Not available on mars + +<!-- --> + +- for serial programs: + +<!-- --> + + % module load Valgrind + % valgrind ./myprog + +- for MPI parallel programs (every rank writes own valgrind logfile): + +<!-- --> + + % module load Valgrind + % mpirun -np 4 valgrind --log-file=valgrind.%p.out ./myprog + +### DUMA + +**Note: DUMA is no longer installed on our systems** + +- DUMA = Detect Unintended Memory Access +- <http://duma.sourceforge.net> +- Replaces memory management functions through own versions and keeps + track of allocated memory +- Easy to use +- Triggers program crash when an error is detected + - use GDB or other debugger to find location +- Almost no run-time slow-down +- Does not see so many bugs like Valgrind + +<!-- --> + + % module load duma + icc -o myprog myprog.o ... -lduma # link program with DUMA library (-lduma) + % bsub -W 2:00 -n 1 -Is bash + <<Waiting for dispatch ...>> + % gdb ./myprog diff --git a/twiki2md/root/SoftwareDevelopment/DebuggingTools.md b/twiki2md/root/SoftwareDevelopment/DebuggingTools.md new file mode 100644 index 000000000..30b7ae112 --- /dev/null +++ b/twiki2md/root/SoftwareDevelopment/DebuggingTools.md @@ -0,0 +1,20 @@ +Debugging is an essential but also rather time consuming step during +application development. Tools dramatically reduce the amount of time +spent to detect errors. Besides the "classical" serial programming +errors, which may usually be easily detected with a regular debugger, +there exist programming errors that result from the usage of OpenMP, +Pthreads, or MPI. These errors may also be detected with debuggers +(preferably debuggers with support for parallel applications), however, +specialized tools like MPI checking tools (e.g. Marmot) or thread +checking tools (e.g. Intel Thread Checker) can simplify this task. The +following sections provide detailed information about the different +types of debugging tools: + +- [Debuggers](Debuggers) -- debuggers (with and without support for + parallel applications) +- [MPI Usage Error Detection](MPI Usage Error Detection) -- tools to + detect MPI usage errors +- [Thread Checking](Thread Checking) -- tools to detect OpenMP/Pthread + usage errors + +-- Main.hilbrich - 2009-12-21 diff --git a/twiki2md/root/SoftwareDevelopment/Libraries.md b/twiki2md/root/SoftwareDevelopment/Libraries.md new file mode 100644 index 000000000..5ee93b848 --- /dev/null +++ b/twiki2md/root/SoftwareDevelopment/Libraries.md @@ -0,0 +1,100 @@ +# Libraries + +The following libraries are available on our platforms: + +| | | | | +|-----------|-----------------------|-----------------|------------| +| | **Taurus** | **Venus** | **module** | +| **Boost** | 1.49, 1.5\[4-9\], 160 | 1.49, 1.51,1.54 | boost | +| **MKL** | 2013, 2015 | 2013 | mkl | +| **FFTW** | 3.3.4 | | fftw | + +## The Boost Library + +Boost provides free peer-reviewed portable C++ source libraries, ranging +from multithread and MPI support to regular expression and numeric +funtions. See at <http://www.boost.org> for detailed documentation. 
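As a rough sketch, a program using a compiled Boost library might be built like this after loading the module (the variables `BOOST_INC` and `BOOST_LIB` are assumptions; check `module show boost` for what the module actually sets):

    module load boost
    # paths below are assumptions; header-only Boost components need no -L/-l at all
    g++ -O2 -I$BOOST_INC -L$BOOST_LIB -o my_boost_app my_boost_app.cpp -lboost_regex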
+ +## BLAS/LAPACK + +### Example + + program ExampleProgram + + external dgesv + integer:: n, m, c, d, e, Z(2) !parameter definition + double precision:: A(2,2), B(2) + + n=2; m=1; c=2; d=2; + + A(1,1) = 1.0; A(1,2) = 2.0; !parameter setting + A(2,1) = 3.0; A(2,2) = 4.0; + + B(1) = 14.0; B(2) = 32.0; + + Call dgesv(n,m,A,c,Z,B,d,e); !call the subroutine + + write(*,*) "Solution ", B(1), " ", B(2) !display on desktop + + end program ExampleProgram + +### Math Kernel Library (MKL) + +The Intel Math Kernel Library is a collection of basic linear algebra +subroutines (BLAS) and fast fourier transformations (FFT). It contains +routines for: + +- Solvers such as linear algebra package (LAPACK) and BLAS +- Eigenvector/eigenvalue solvers (BLAS, LAPACK) +- PDEs, signal processing, seismic, solid-state physics (FFTs) +- General scientific, financial - vector transcendental functions, + vector markup language (XML) + +More speciï¬cally it contains the following components: + +- BLAS: + - Level 1 BLAS: vector-vector operations, 48 functions + - Level 2 BLAS: matrix-vector operations, 66 functions + - Level 3 BLAS: matrix-matrix operations, 30 functions +- LAPACK (linear algebra package), solvers and eigensolvers, hundreds + of routines, more than 1000 user callable routines +- FFTs (fast Fourier transform): one and two dimensional, with and + without frequency ordering (bit reversal). There are wrapper + functions to provide an interface to use MKL instead of FFTW. +- VML (vector math library), set of vectorized transcendental + functions +- Parallel Sparse Direct Linear Solver (Pardiso) + +Please note: MKL comes in an OpenMP-parallel version. If you want to use +it, make sure you know how to place your jobs. {{ In \[c't 18, 2010\], +Andreas Stiller proposes the usage of `GOMP_CPU_AFFINITY` to allow the +mapping of AMD cores. KMP_AFFINITY works only for Intel processors. }} + +#### Linking with the MKL + +For linker flag combinations, Intel provides the MKL Link Line Advisor +at +<http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/> +(please make sure that JavaScript is enabled for this page). + +Can be compiled with MKL 11 like this + + ifort -I$MKL_INC -L$MKL_LIB -lmkl_core -lm -lmkl_gf_ilp64 -lmkl_lapack example.f90 + +#### Linking with the MKL at VENUS + +Please follow the infomation at website \<br /> +<http://hpcsoftware.ncsa.illinois.edu/Software/user/show_all.php?deploy_id=951&view=NCSA> + + icc -O1 -I/sw/global/compilers/intel/2013/mkl//include -lmpi -mkl -lmkl_scalapack_lp64 -lmkl_blacs_sgimpt_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core example.c + +#### + +## FFTW + +FFTW is a C subroutine library for computing the discrete Fourier +transform (DFT) in one or more dimensions, of arbitrary input size, and +of both real and complex data (as well as of even/odd data, i.e. the +discrete cosine/sine transforms or DCT/DST). Before using this library, +please check out the functions of vendor speciï¬c libraries ACML and/or +MKL. 
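Similarly, a minimal sketch of compiling and linking against FFTW after loading the module (the variables `FFTW_INC` and `FFTW_LIB` are assumptions; check `module show fftw` for the actual names):

    module load fftw
    # link the double-precision FFTW 3 library; include/library paths are assumptions
    gcc -O2 -I$FFTW_INC -L$FFTW_LIB -o fft_example fft_example.c -lfftw3 -lm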
diff --git a/twiki2md/root/SoftwareDevelopment/Miscellaneous.md b/twiki2md/root/SoftwareDevelopment/Miscellaneous.md new file mode 100644 index 000000000..61b4678c0 --- /dev/null +++ b/twiki2md/root/SoftwareDevelopment/Miscellaneous.md @@ -0,0 +1,50 @@ +# Miscellaneous + +## Check Assembler Code + +If a binary `a.out` was built to include symbolic information (option +"-g") one can have a look at a commented disassembly with + + objdump -dS a.out + +to see something like: + + do { + checksum += d.D.ACC(idx).i[0] * d.D.ACC(idx).i[1]; + 8049940: 89 d9 mov %ebx,%ecx + 8049942: 2b 8d 08 fa ff ff sub -0x5f8(%ebp),%ecx + 8049948: 8b 55 c4 mov -0x3c(%ebp),%edx + 804994b: 2b 95 0c fa ff ff sub -0x5f4(%ebp),%edx + 8049951: 8b 45 c8 mov -0x38(%ebp),%eax + 8049954: 2b 85 10 fa ff ff sub -0x5f0(%ebp),%eax + 804995a: 0f af 85 78 fd ff ff imul -0x288(%ebp),%eax + 8049961: 01 c2 add %eax,%edx + 8049963: 0f af 95 74 fd ff ff imul -0x28c(%ebp),%edx + 804996a: 01 d1 add %edx,%ecx + 804996c: 8d 0c 49 lea (%ecx,%ecx,2),%ecx + 804996f: c1 e1 03 shl $0x3,%ecx + 8049972: 03 8d b8 fd ff ff add -0x248(%ebp),%ecx + 8049978: dd 01 fldl (%ecx) + 804997a: dd 41 08 fldl 0x8(%ecx) + 804997d: d9 c1 fld %st(1) + 804997f: d8 c9 fmul %st(1),%st + 8049981: dc 85 b8 f9 ff ff faddl -0x648(%ebp) + checksum += d.D.ACC(idx).i[2] * d.D.ACC(idx).i[0]; + 8049987: dd 41 10 fldl 0x10(%ecx) + 804998a: dc cb fmul %st,%st(3) + 804998c: d9 cb fxch %st(3) + 804998e: de c1 faddp %st,%st(1) + 8049990: d9 c9 fxch %st(1) + checksum -= d.D.ACC(idx).i[1] * d.D.ACC(idx).i[2]; + 8049992: de ca fmulp %st,%st(2) + 8049994: de e1 fsubp %st,%st(1) + 8049996: dd 9d b8 f9 ff ff fstpl -0x648(%ebp) + } + +## I/O from/to binary files + +## Compilation Problem Isolator + +`icpi` + +-- Main.mark - 2009-12-16 diff --git a/twiki2md/root/SoftwareDevelopment/PerformanceTools.md b/twiki2md/root/SoftwareDevelopment/PerformanceTools.md new file mode 100644 index 000000000..eb353e8d6 --- /dev/null +++ b/twiki2md/root/SoftwareDevelopment/PerformanceTools.md @@ -0,0 +1,15 @@ +# Performance Tools + +- [Score-P](ScoreP) - tool suite for profiling, event tracing, and + online analysis of HPC applications +- [VampirTrace](VampirTrace) - recording performance relevant data at + runtime +- [Vampir](Vampir) - visualizing performance data from your program +- [Hardware performance counters - PAPI](PapiLibrary) - generic + performance counters +- [perf tools](PerfTools) - general performance statistic +- [IOTrack](IOTrack) - I/O statistics +- [EnergyMeasurement](EnergyMeasurement) - energy/power measurements + on taurus + +-- Main.mark - 2009-12-16 diff --git a/twiki2md/root/SystemAtlas.md b/twiki2md/root/SystemAtlas.md new file mode 100644 index 000000000..13ce0ace2 --- /dev/null +++ b/twiki2md/root/SystemAtlas.md @@ -0,0 +1,126 @@ +# Atlas + +**`%RED%This page is deprecated! Atlas is a former system!%ENDCOLOR%`** +( [Current hardware](Compendium.Hardware)) + +Atlas is a general purpose HPC cluster for jobs using 1 to 128 cores in +parallel ( [Information on the hardware](HardwareAtlas)). + +## Compiling Parallel Applications + +When loading a compiler module on Atlas, the module for the MPI +implementation OpenMPI is also loaded in most cases. If not, you should +explicitly load the OpenMPI module with `module load openmpi`. This also +applies when you use the system's (old) GNU compiler. 
( [read more about Modules](Compendium.RuntimeEnvironment), [read more about Compilers](Compendium.Compilers))

Use the wrapper commands `mpicc`, `mpiCC`, `mpif77`, or `mpif90` to compile MPI source code. They use the currently loaded compiler. To reveal the command lines behind the wrappers, use the option `-show`.

For running your code, you have to load the same compiler and MPI module as for compiling the program. Please follow the guidelines below to run your parallel program using the batch system.

## Batch System

Applications on an HPC system cannot be run on the login node. They have to be submitted to compute nodes with dedicated resources for the user's job. Normally a job is submitted with these data:

- number of CPU cores,
- whether the requested CPU cores have to belong to one node (OpenMP programs) or can be distributed (MPI),
- memory per process,
- maximum wall clock time (after reaching this limit the process is killed automatically),
- files for redirection of output and error messages,
- executable and command line parameters.

### LSF

The batch system on Atlas is LSF. For general information on LSF, please follow [this link](PlatformLSF).

### Submission of Parallel Jobs

To run MPI jobs, ensure that the same MPI module is loaded as at compile time. If in doubt, check your loaded modules with `module list`. If your code has been compiled with the standard OpenMPI installation, you can load the OpenMPI module via `module load openmpi`.

Please pay attention to the messages you get when loading the module. They are more up-to-date than this manual. To submit a job, the user has to use a script or a command line like this:

    bsub -n <N> mpirun <program name>

### Memory Limits

**Memory limits are enforced.** This means that jobs which exceed their per-node memory limit **may be killed** automatically by the batch system.

The **default limit** is **300 MB** *per job slot* (`bsub -n`).

Atlas has sets of nodes with different amounts of installed memory which affect where your job may be run. To achieve the shortest possible waiting time for your jobs, you should be aware of the limits shown in the following table and read through the explanation below.

| Nodes        | No. of Cores | Avail. Memory per Job Slot | Max. Memory per Job Slot for Oversubscription |
|:-------------|:-------------|:---------------------------|:----------------------------------------------|
| `n[001-047]` | `3008`       | `940 MB`                   | `1880 MB`                                      |
| `n[049-072]` | `1536`       | `1950 MB`                  | `3900 MB`                                      |
| `n[085-092]` | `512`        | `8050 MB`                  | `16100 MB`                                     |

#### Explanation

The amount of memory that you request for your job (`-M`) restricts the nodes it can be scheduled to. Usually, the column **"Avail. Memory per Job Slot"** shows the maximum that will be allowed on the respective nodes.

However, we allow **oversubscribing of job slot memory**. This means that jobs which use **-n 32 or less** may be scheduled to smaller memory nodes.

Have a look at the **examples below**.

#### Monitoring memory usage

At the end of the job completion mail there will be a link to a website which shows the memory usage over time per node. This is only available for longer running jobs (>10 min).

#### Examples

| Job Spec.
| Nodes Allowed | Remark | +|:--------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------| +| `bsub %GREEN%-n 1 -M 500%ENDCOLOR%` | `All nodes` | \<= 940 Fits everywhere | +| `bsub %GREEN%-n 64 -M 700%ENDCOLOR%` | `All nodes` | \<= 940 Fits everywhere | +| `bsub %GREEN%-n 4 -M 1800%ENDCOLOR%` | `All nodes` | Is allowed to oversubscribe on small nodes n\[001-047\] | +| `bsub %GREEN%-n 64 -M 1800%ENDCOLOR%` | `n[049-092]` | 64\*1800 will not fit onto a single small node and is therefore restricted to running on medium and large nodes | +| \<span>bsub \<span style`"color: #eecc22;">-n 4 -M 2000</span></span> | =n[049-092]` | Over limit for oversubscribing on small nodes n\[001-047\], but may still go to medium nodes | | +| \<span>bsub \<span style`"color: #eecc22;">-n 32 -M 2000</span></span> | =n[049-092]` | Same as above | | +| `bsub %GREEN%-n 32 -M 1880%ENDCOLOR%` | `All nodes` | Using max. 1880 MB, the job is eligible for running on any node | +| \<span>bsub \<span style`"color: #eecc22;">-n 64 -M 2000</span></span> | =n[085-092]` | Maximum for medium nodes is 1950 per slot - does the job **really** need **2000 MB** per process? | | +| `bsub %GREEN%-n 64 -M 1950%ENDCOLOR%` | `n[049-092]` | When using 1950 as maximum, it will fit to the medium nodes | +| `bsub -n 32 -M 16000` | `n[085-092]` | Wait time might be **very long** | +| `bsub %RED%-n 64 -M 16000%ENDCOLOR%` | `n[085-092]` | Memory request cannot be satisfied (64\*16 MB = 1024 GB), **%RED%cannot schedule job%ENDCOLOR%** | + +### Batch Queues + +*Batch queues are subject to (mostly minor) changes anytime*. The +scheduling policy prefers short running jobs over long running ones. +This means that **short jobs get higher priorities** and are usually +started earlier than long running jobs. + +| Batch Queue | Admitted Users | Max. Cores | Default Runtime | \<div style="text-align: right;">Max. 
Runtime\</div> | +|:--------------|:---------------|:---------------------------------------------|:--------------------------------------------------|:-----------------------------------------------------| +| `interactive` | `all` | \<div style="text-align: right;">n/a\</div> | \<div style="text-align: right;">12h 00min\</div> | \<div style="text-align: right;">12h 00min\</div> | +| `short` | `all` | \<div style="text-align: right;">1024\</div> | \<div style="text-align: right;">1h 00min\</div> | \<div style="text-align: right;">24h 00min\</div> | +| `medium` | `all` | \<div style="text-align: right;">1024\</div> | \<div style="text-align: right;">24h 01min\</div> | \<div style="text-align: right;">72h 00min\</div> | +| `long` | `all` | \<div style="text-align: right;">1024\</div> | \<div style="text-align: right;">72h 01min\</div> | \<div style="text-align: right;">120h 00min\</div> | +| `rtc` | `on request` | \<div style="text-align: right;">4\</div> | \<div style="text-align: right;">12h 00min\</div> | \<div style="text-align: right;">300h 00min\</div> | diff --git a/twiki2md/root/SystemTaurus/EnergyMeasurement.md b/twiki2md/root/SystemTaurus/EnergyMeasurement.md new file mode 100644 index 000000000..607263056 --- /dev/null +++ b/twiki2md/root/SystemTaurus/EnergyMeasurement.md @@ -0,0 +1,310 @@ +# Energy Measurement Infrastructure + +All nodes of the HPC machine Taurus are equipped with power +instrumentation that allow the recording and accounting of power +dissipation and energy consumption data. The data is made available +through several different interfaces, which will be described below. + +## System Description + +The Taurus system is split into two phases. While both phases are +equipped with energy-instrumented nodes, the instrumentation +significantly differs in the number of instrumented nodes and their +spatial and temporal granularity. + +### Phase 1 + +In phase one, the 270 Sandy Bridge nodes are equipped with node-level +power instrumentation that is stored in the Dataheap infrastructure at a +rate of 1Sa/s and further the energy consumption of a job is available +in SLURM (see below). + +### Phase 2 + +In phase two, all of the 1456 Haswell DLC nodes are equipped with power +instrumentation. In addition to the access methods of phase one, users +will also be able to access the measurements through a C API to get the +full temporal and spatial resolution, as outlined below: + +- ** Blade:**1 kSa/s for the whole node, includes both sockets, DRAM, + SSD, and other on-board consumers. Since the system is directly + water cooled, no cooling components are included in the blade + consumption. +- **Voltage regulators (VR):** 100 Sa/s for each of the six VR + measurement points, one for each socket and four for eight DRAM + lanes (two lanes bundled). + +The GPU blades of each Phase as well as the Phase I Westmere partition +also have 1 Sa/s power instrumentation but have a lower accuracy. + +HDEEM is now generally available on all nodes in the "haswell" +partition. 
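As a minimal sketch, an interactive allocation of an instrumented node for HDEEM measurements might look like this (HDEEM requires `-p haswell --exclusive`, see the note below):

    # allocate one haswell node exclusively, e.g. to use the HDEEM command line tools
    salloc -p haswell --exclusive --nodes=1 --time=01:00:00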
+ +## Summary of Measurement Interfaces + +| Interface | Sensors | Rate | Phase I | Phase II Haswell | +|:-------------------------------------------|:----------------|:--------------------------------|:--------|:-----------------| +| Dataheap (C, Python, VampirTrace, Score-P) | Blade, (CPU) | 1 Sa/s | yes | yes | +| HDEEM\* (C, Score-P) | Blade, CPU, DDR | 1 kSa/s (Blade), 100 Sa/s (VRs) | no | yes | +| HDEEM Command Line Interface | Blade, CPU, DDR | 1 kSa/s (Blade), 100 Sa/s (VR) | no | yes | +| SLURM Accounting (sacct) | Blade | Per Job Energy | yes | yes | +| SLURM Profiling (hdf5) | Blade | up to 1 Sa/s | yes | yes | + +Note: Please specify `-p haswell --exclusive` along with your job +request if you wish to use hdeem. + +## Accuracy + +HDEEM measurements have an accuracy of 2 % for Blade (node) +measurements, and 5 % for voltage regulator (CPU, DDR) measurements. + +## Command Line Interface + +The HDEEM infrastructure can be controlled through command line tools +that are made available by loading the **hdeem** module. They are +commonly used on the node under test to start, stop, and query the +measurement device. + +- **startHdeem**: Start a measurement. After the command succeeds, the + measurement data with the 1000 / 100 Sa/s described above will be + recorded on the Board Management Controller (BMC), which is capable + of storing up to 8h of measurement data. +- **stopHdeem**: Stop a measurement. No further data is recorded and + the previously recorded data remains available on the BMC. +- **printHdeem**: Read the data from the BMC. By default, the data is + written into a CSV file, whose name can be controlled using the + **-o** argument. +- **checkHdeem**: Print the status of the measurement device. +- **clearHdeem**: Reset and clear the measurement device. No further + data can be read from the device after this command is executed + before a new measurement is started. + +## Integration in Application Performance Traces + +The per-node power consumption data can be included as metrics in +application traces by using the provided metric plugins for Score-P (and +VampirTrace). The plugins are provided as modules and set all necessary +environment variables that are required to record data for all nodes +that are part of the current job. + +For 1 Sa/s Blade values (Dataheap): + +- [Score-P](ScoreP): use the module **`scorep-dataheap`** +- [VampirTrace](VampirTrace): use the module + **vampirtrace-plugins/power-1.1** + +For 1000 Sa/s (Blade) and 100 Sa/s (CPU{0,1}, DDR{AB,CD,EF,GH}): + +- [Score-P](ScoreP): use the module **\<span + class="WYSIWYG_TT">scorep-hdeem\</span>**\<br />Note: %ENDCOLOR%This + module requires a recent version of "scorep/sync-...". Please use + the latest that fits your compiler & MPI version.**\<br />** +- [VampirTrace](VampirTrace): not supported + +By default, the modules are set up to record the power data for the +nodes they are used on. For further information on how to change this +behavior, please use module show on the respective module. + + # Example usage with gcc + % module load scorep/trunk-2016-03-17-gcc-xmpi-cuda7.5 + % module load scorep-dataheap + % scorep gcc application.c -o application + % srun ./application + +Once the application is finished, a trace will be available that allows +you to correlate application functions with the component power +consumption of the parallel application. Note: For energy measurements, +only tracing is supported in Score-P/VampirTrace. 
The modules therefore +disables profiling and enables tracing, please use [Vampir](Vampir) to +view the trace. + +\<img alt="demoHdeem_high_low_vampir_3.png" height="262" +src="%ATTACHURL%/demoHdeem_high_low_vampir_3.png" width="695" /> + +%RED%Note<span class="twiki-macro ENDCOLOR"></span>: the power +measurement modules **`scorep-dataheap`** and **`scorep-hdeem`** are +dynamic and only need to be loaded during execution. However, +**`scorep-hdeem`** does require the application to be linked with a +certain version of Score-P. + +By default,** `scorep-dataheap`**records all sensors that are available. +Currently this is the total node consumption and for Phase II the CPUs. +**`scorep-hdeem`** also records all available sensors (node, 2x CPU, 4x +DDR) by default. You can change the selected sensors by setting the +environment variables: + + # For HDEEM + % export SCOREP_METRIC_HDEEM_PLUGIN=Blade,CPU* + # For Dataheap + % export SCOREP_METRIC_DATAHEAP_PLUGIN=localhost/watts + +For more information on how to use Score-P, please refer to the +[respective documentation](ScoreP). + +## Access Using Slurm Tools + +[Slurm](Slurm) maintains its own database of job information, including +energy data. There are two main ways of accessing this data, which are +described below. + +### Post-Mortem Per-Job Accounting + +This is the easiest way of accessing information about the energy +consumed by a job and its job steps. The Slurm tool `sacct` allows users +to query post-mortem energy data for any past job or job step by adding +the field `ConsumedEnergy` to the `--format` parameter: + + $> sacct --format="jobid,jobname,ntasks,submit,start,end,ConsumedEnergy,nodelist,state" -j 3967027 + JobID JobName NTasks Submit Start End ConsumedEnergy NodeList State + ------------ ---------- -------- ------------------- ------------------- ------------------- -------------- --------------- ---------- + 3967027 bash 2014-01-07T12:25:42 2014-01-07T12:25:52 2014-01-07T12:41:20 taurusi1159 COMPLETED + 3967027.0 sleep 1 2014-01-07T12:26:07 2014-01-07T12:26:07 2014-01-07T12:26:18 0 taurusi1159 COMPLETED + 3967027.1 sleep 1 2014-01-07T12:29:06 2014-01-07T12:29:06 2014-01-07T12:29:16 1.67K taurusi1159 COMPLETED + 3967027.2 sleep 1 2014-01-07T12:33:25 2014-01-07T12:33:25 2014-01-07T12:33:36 1.84K taurusi1159 COMPLETED + 3967027.3 sleep 1 2014-01-07T12:34:06 2014-01-07T12:34:06 2014-01-07T12:34:11 1.09K taurusi1159 COMPLETED + 3967027.4 sleep 1 2014-01-07T12:38:03 2014-01-07T12:38:03 2014-01-07T12:39:44 18.93K taurusi1159 COMPLETED + +The job consisted of 5 job steps, each executing a sleep of a different +length. Note that the ConsumedEnergy metric is only applicable to +exclusive jobs. + +### + +### Slurm Energy Profiling + +The `srun` tool offers several options for profiling job steps by adding +the `--profile` parameter. Possible profiling options are `All`, +`Energy`, `Task`, `Lustre`, and `Network`. In all cases, the profiling +information is stored in an hdf5 file that can be inspected using +available hdf5 tools, e.g., `h5dump`. The files are stored under +`/scratch/profiling/` for each job, job step, and node. A description of +the data fields in the file can be found +[here](http://slurm.schedmd.com/hdf5_profile_user_guide.html#HDF5). 
In +general, the data files contain samples of the current **power** +consumption on a per-second basis: + + $> srun -p sandy --acctg-freq=2,energy=1 --profile=energy sleep 10 + srun: job 3967674 queued and waiting for resources + srun: job 3967674 has been allocated resources + $> h5dump /scratch/profiling/jschuch/3967674_0_taurusi1073.h5 + [...] + DATASET "Energy_0000000002 Data" { + DATATYPE H5T_COMPOUND { + H5T_STRING { + STRSIZE 24; + STRPAD H5T_STR_NULLTERM; + CSET H5T_CSET_ASCII; + CTYPE H5T_C_S1; + } "Date_Time"; + H5T_STD_U64LE "Time"; + H5T_STD_U64LE "Power"; + H5T_STD_U64LE "CPU_Frequency"; + } + DATASPACE SIMPLE { ( 1 ) / ( 1 ) } + DATA { + (0): { + "", + 1389097545, # timestamp + 174, # power value + 1 + } + } + } + +## + +## Using the HDEEM C API + +Note: Please specify -p haswell --exclusive along with your job request +if you wish to use hdeem. + +Please download the offical documentation at \<font face="Calibri" +size="2"> [\<font +color="#0563C1">\<u>http://www.bull.com/download-hdeem-library-reference-guide\</u>\</font>](http://www.bull.com/download-hdeem-library-reference-guide)\</font> + +The HDEEM headers and sample code are made available via the hdeem +module. To find the location of the hdeem installation use + + % module show hdeem + ------------------------------------------------------------------- + /sw/modules/taurus/libraries/hdeem/2.1.9ms: + + conflict hdeem + module-whatis Load hdeem version 2.1.9ms + prepend-path PATH /sw/taurus/libraries/hdeem/2.1.9ms/include + setenv HDEEM_ROOT /sw/taurus/libraries/hdeem/2.1.9ms + ------------------------------------------------------------------- + +You can find an example of how to use the API under +\<span>$HDEEM_ROOT/sample.\</span> + +## Access Using the Dataheap Infrastructure + +In addition to the energy accounting data that is stored by Slurm, this +information is also written into our local data storage and analysis +infrastructure called +[Dataheap](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/dataheap/). +From there, the data can be used in various ways, such as including it +into application performance trace data or querying through a Python +interface. + +The Dataheap infrastructure is designed to store various types of +time-based samples from different data sources. In the case of the +energy measurements on Taurus, the data is stored as a timeline of power +values which allows the reconstruction of the power and energy +consumption over time. The timestamps are stored as UNIX timestamps with +a millisecond granularity. The data is stored for each node in the form +of `nodename/watts`, e.g., `taurusi1073/watts`. Further metrics might +already be available or might be added in the future for which +information is available upon request. + +**Note**: The dataheap infrastructure can only be accessed from inside +the university campus network. + +### Using the Python Interface + +The module `dataheap/1.0` provides a Python module that can be used to +query the data in the Dataheap for personalized data analysis. 
The +following is an example of how to use the interface: + + import time + import os + from dhRequest import dhClient + + # Connect to the dataheap manager + dhc = dhClient() + dhc.connect(os.environ['DATAHEAP_MANAGER_ADDR'], int(os.environ['DATAHEAP_MANAGER_PORT'])) + + # take timestamps + tbegin = dhc.getTimeStamp() + # workload + os.system("srun -n 6 a.out") + tend = dhc.getTimeStamp() + + # wait for the data to get to the + # dataheap + time.sleep(5) + + # replace this with name of the node the job ran on + # Note: use multiple requests if the job used multiple nodes + countername = "taurusi1159/watts" + + # query the dataheap + integral = dhc.storageRequest("INTEGRAL(%d,%d,\"%s\", 0)"%(tbegin, tend, countername)) + # Remember: timestamps are stored in millisecond UNIX timestamps + energy = integral/1000 + + print energy + + timeline = dhc.storageRequest("TIMELINE(%d,%d,\"%s\", 0)"%(tbegin, tend, countername)) + + # output a list of all timestamp/power-value pairs + print timeline + +## More information and Citing + +More information can be found in the paper \<a +href="<http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7016382>" +title="HDEEM Paper E2SC 2014">HDEEM: high definition energy efficiency +monitoring\</a> by Hackenberg et al. Please cite this paper if you are +using HDEEM for your scientific work. diff --git a/twiki2md/root/SystemTaurus/HardwareTaurus.md b/twiki2md/root/SystemTaurus/HardwareTaurus.md new file mode 100644 index 000000000..f6c53f795 --- /dev/null +++ b/twiki2md/root/SystemTaurus/HardwareTaurus.md @@ -0,0 +1,110 @@ +---# Central Components + +- Login-Nodes (`tauruslogin[3-6].hrsk.tu-dresden.de`) + - each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 each with 12 cores + @ 2.50GHz, MultiThreading Disabled, 64 GB RAM, 128 GB SSD local + disk + - IPs: 141.30.73.\[102-105\] +- Transfer-Nodes (`taurusexport3/4.hrsk.tu-dresden.de`, DNS Alias + `taurusexport.hrsk.tu-dresden.de`) + - 2 Servers without interactive login, only available via file + transfer protocols (rsync, ftp) + - IPs: 141.30.73.82/83 +- Direct access to these nodes is granted via IP whitelisting (contact + <hpcsupport@zih.tu-dresden.de>) - otherwise use TU Dresden VPN. 
+ +# AMD Rome CPUs + NVIDIA A100 + +- 32 nodes, each with + - 8 x NVIDIA A100-SXM4 + - 2 x AMD EPYC CPU 7352 (24 cores) @ 2.3 GHz, MultiThreading + disabled + - 1 TB RAM + - 3.5 TB /tmp local NVMe device +- Hostnames: taurusi\[8001-8034\] +- SLURM partition `alpha` (old:gpu3) +- dedicated mostly for ScaDS-AI + +# Island 7 - AMD Rome CPUs + +- 192 nodes, each with + - 2x AMD EPYC CPU 7702 (64 cores) @ 2.0GHz, MultiThreading + enabled, + - 512 GB RAM + - 200 GB /tmp on local SSD local disk +- Hostnames: taurusi\[7001-7192\] +- SLURM partition `romeo` +- more information under [RomeNodes](RomeNodes) + +# Large SMP System HPE Superdome Flex + +- 32 x Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz (28 cores) +- 47 TB RAM +- currently configured as one single node + - Hostname: taurussmp8 +- SLURM partition `julia` +- more information under [HPE SD Flex](SDFlex) + +# IBM Power9 Nodes for Machine Learning + +For machine learning, we have 32 IBM AC922 nodes installed with this +configuration: + +- 2 x IBM Power9 CPU (2.80 GHz, 3.10 GHz boost, 22 cores) +- 256 GB RAM DDR4 2666MHz +- 6x NVIDIA VOLTA V100 with 32GB HBM2 +- NVLINK bandwidth 150 GB/s between GPUs and host +- SLURM partition `ml` +- Hostnames: taurusml\[1-32\] + +# Island 4 to 6 - Intel Haswell CPUs + +- 1456 nodes, each with 2x Intel(R) Xeon(R) CPU E5-2680 v3 (12 cores) + @ 2.50GHz, MultiThreading disabled, 128 GB SSD local disk +- Hostname: taurusi4\[001-232\], taurusi5\[001-612\], + taurusi6\[001-612\] +- varying amounts of main memory (selected automatically by the batch + system for you according to your job requirements) + - 1328 nodes with 2.67 GB RAM per core (64 GB total): + taurusi\[4001-4104,5001-5612,6001-6612\] + - 84 nodes with 5.34 GB RAM per core (128 GB total): + taurusi\[4105-4188\] + - 44 nodes with 10.67 GB RAM per core (256 GB total): + taurusi\[4189-4232\] +- SLURM Partition `haswell` +- [Node topology](%ATTACHURL%/i4000.png) + +### Extension of Island 4 with Broadwell CPUs + +- 32 nodes, eachs witch 2 x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz + (**14 cores**) , MultiThreading disabled, 64 GB RAM, 256 GB SSD + local disk +- from the users' perspective: Broadwell is like Haswell +- Hostname: taurusi\[4233-4264\] +- SLURM partition `broadwell` + +# Island 2 Phase 2 - Intel Haswell CPUs + NVIDIA K80 GPUs + +- 64 nodes, each with 2x Intel(R) Xeon(R) CPU E5-E5-2680 v3 (12 cores) + @ 2.50GHz, MultiThreading Disabled, 64 GB RAM (2.67 GB per core), + 128 GB SSD local disk, 4x NVIDIA Tesla K80 (12 GB GDDR RAM) GPUs +- Hostname: taurusi2\[045-108\] +- SLURM Partition `gpu` +- [Node topology](%ATTACHURL%/i4000.png) (without GPUs) + +# SMP Nodes - up to 2 TB RAM + +- 5 Nodes each with 4x Intel(R) Xeon(R) CPU E7-4850 v3 (14 cores) @ + 2.20GHz, MultiThreading Disabled, 2 TB RAM + - Hostname: `taurussmp[3-7]` + - SLURM Partition `smp2` + - [Node topology](%ATTACHURL%/smp2.png) + +---# Island 2 Phase 1 - Intel Sandybridge CPUs + NVIDIA K20x GPUs + +- 44 nodes, each with 2x Intel(R) Xeon(R) CPU E5-2450 (8 cores) @ + 2.10GHz, MultiThreading Disabled, 48 GB RAM (3 GB per core), 128 GB + SSD local disk, 2x NVIDIA Tesla K20x (6 GB GDDR RAM) GPUs +- Hostname: `taurusi2[001-044]` +- SLURM Partition `gpu1` +- [Node topology](%ATTACHURL%/i2000.png) (without GPUs diff --git a/twiki2md/root/SystemTaurus/KnlNodes.md b/twiki2md/root/SystemTaurus/KnlNodes.md new file mode 100644 index 000000000..f779a68bc --- /dev/null +++ b/twiki2md/root/SystemTaurus/KnlNodes.md @@ -0,0 +1,55 @@ +# Intel Xeon Phi (Knights Landing) %RED%- Out of Service<span 
class="twiki-macro ENDCOLOR"></span> + +The nodes `taurusknl[1-32]` are equipped with + +- Intel Xeon Phi procesors: 64 cores Intel Xeon Phi 7210 (1,3 GHz) +- 96 GB RAM DDR4 +- 16 GB MCDRAM +- /scratch, /lustre/ssd, /projects, /home are mounted + +Benchmarks, so far (single node): + +- HPL (Linpack): 1863.74 GFlops +- SGEMM (single precision) MKL: 4314 GFlops +- Stream (only 1.4 GiB memory used): 431 GB/s + +Each of them can run 4 threads, so one can start a job here with e.g. + + srun -p knl -N 1 --mem=90000 -n 1 -c 64 a.out + +In order to get their optimal performance please re-compile your code +with the most recent Intel compiler and explicitely set the compiler +flag **`-xMIC-AVX512`**. + +MPI works now, we recommend to use the latest Intel MPI version +(intelmpi/2017.1.132). To utilize the OmniPath Fabric properly, make +sure to use the "ofi" fabric provider, which is the new default set by +the module file. + +Most nodes have a fixed configuration for cluster mode (Quadrant) and +memory mode (Cache). For testing purposes, we have configured a few +nodes with different modes (other configurations are possible upon +request): + +| Nodes | Cluster Mode | Memory Mode | +|:-------------------|:-------------|:------------| +| taurusknl\[1-28\] | Quadrant | Cache | +| taurusknl29 | Quadrant | Flat | +| taurusknl\[30-32\] | SNC4 | Flat | + +They have SLURM features set, so that you can request them specifically +by using the SLURM parameter **--constraint** where multiple values can +be linked with the & operator, e.g. **--constraint="SNC4&Flat"**. If you +don't set a constraint, your job will run preferably on the nodes with +Quadrant+Cache. + +Note that your performance might take a hit if your code is not +NUMA-aware and does not make use of the Flat memory mode while running +on the nodes that have those modes set, so you might want to use +--constraint="Quadrant&Cache" in such a case to ensure your job does not +run on an unfavorable node (which might happen if all the others are +already allocated). + +\<a +href="<http://www.prace-ri.eu/best-practice-guide-knights-landing-january-2017/>" +title="Knl Best Practice Guide">KNL Best Practice Guide\</a> from PRACE diff --git a/twiki2md/root/SystemTaurus/RunningNxGpuAppsInOneJob.md b/twiki2md/root/SystemTaurus/RunningNxGpuAppsInOneJob.md new file mode 100644 index 000000000..2152522ae --- /dev/null +++ b/twiki2md/root/SystemTaurus/RunningNxGpuAppsInOneJob.md @@ -0,0 +1,85 @@ +# Running Multiple GPU Applications Simultaneously in a Batch Job + +Keywords: slurm, job, gpu, multiple, instances, application, program, +background, parallel, serial, concurrently, simultaneously + +## Objective + +Our starting point is a (serial) program that needs a single GPU and +four CPU cores to perform its task (e.g. TensorFlow). The following +batch script shows how to run such a job on the Taurus partition called +"ml". + + #!/bin/bash + #SBATCH --ntasks=1 + #SBATCH --cpus-per-task=4 + #SBATCH --gres=gpu:1 + #SBATCH --gpus-per-task=1 + #SBATCH --time=01:00:00 + #SBATCH --mem-per-cpu=1443 + #SBATCH --partition=ml + + srun some-gpu-application + +When srun is used within a submission script, it inherits parameters +from sbatch, including --ntasks=1, --cpus-per-task=4, etc. So we +actually implicitly run the following + + srun --ntasks=1 --cpus-per-task=4 ... --partition=ml some-gpu-application + +Now, our goal is to run four instances of this program concurrently in a +**single** batch script. 
Of course we could also start the above script +multiple times with sbatch, but this is not what we want to do here. + +## Solution + +In order to run multiple programs concurrently in a single batch +script/allocation we have to do three things: + +1\. Allocate enough resources to accommodate multiple instances of our +program. This can be achieved with an appropriate batch script header +(see below). + +2\. Start job steps with srun as background processes. This is achieved +by adding an ampersand at the end of the srun command + +3\. Make sure that each background process gets its private resources. +We need to set the resource fraction needed for a single run in the +corresponding srun command. The total aggregated resources of all job +steps must fit in the allocation specified in the batch script header. +Additionally, the option --exclusive is needed to make sure that each +job step is provided with its private set of CPU and GPU resources. + +The following example shows how four independent instances of the same +program can be run concurrently from a single batch script. Each +instance (task) is equipped with 4 CPUs (cores) and one GPU. + + #!/bin/bash + #SBATCH --ntasks=4 + #SBATCH --cpus-per-task=4 + #SBATCH --gres=gpu:4 + #SBATCH --gpus-per-task=1 + #SBATCH --time=01:00:00 + #SBATCH --mem-per-cpu=1443 + #SBATCH --partition=ml + + srun --exclusive --gres=gpu:1 --ntasks=1 --cpus-per-task=4 --gpus-per-task=1 --mem-per-cpu=1443 some-gpu-application & + srun --exclusive --gres=gpu:1 --ntasks=1 --cpus-per-task=4 --gpus-per-task=1 --mem-per-cpu=1443 some-gpu-application & + srun --exclusive --gres=gpu:1 --ntasks=1 --cpus-per-task=4 --gpus-per-task=1 --mem-per-cpu=1443 some-gpu-application & + srun --exclusive --gres=gpu:1 --ntasks=1 --cpus-per-task=4 --gpus-per-task=1 --mem-per-cpu=1443 some-gpu-application & + + echo "Waiting for all job steps to complete..." + wait + echo "All jobs completed!" + +In practice it is possible to leave out resource options in srun that do +not differ from the ones inherited from the surrounding sbatch context. +The following line would be sufficient to do the job in this example: + + srun --exclusive --gres=gpu:1 --ntasks=1 some-gpu-application & + +Yet, it adds some extra safety to leave them in, enabling the SLURM +scheduler to complain if not enough resources in total were specified in +the header of the batch script. + +-- Main.HolgerBrunst - 2021-04-16 diff --git a/twiki2md/root/SystemTaurus/Slurm.md b/twiki2md/root/SystemTaurus/Slurm.md new file mode 100644 index 000000000..70563d411 --- /dev/null +++ b/twiki2md/root/SystemTaurus/Slurm.md @@ -0,0 +1,570 @@ +# Slurm + + The HRSK-II systems are operated +with the batch system Slurm. Just specify the resources you need in +terms of cores, memory, and time and your job will be placed on the +system. + +## Job Submission + +Job submission can be done with the command: `srun [options] <command>` + +However, using srun directly on the shell will be blocking and launch an +interactive job. Apart from short test runs, it is recommended to launch +your jobs into the background by using batch jobs. 
For that, you can +conveniently put the parameters directly in a job file which you can +submit using `sbatch [options] <job file>` + +Some options of `srun/sbatch` are: + +| slurm option | Description | +|:---------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| -n \<N> or --ntasks \<N> | set a number of tasks to N(default=1). This determines how many processes will be spawned by srun (for MPI jobs). | +| -N \<N> or --nodes \<N> | set number of nodes that will be part of a job, on each node there will be --ntasks-per-node processes started, if the option --ntasks-per-node is not given, 1 process per node will be started | +| --ntasks-per-node \<N> | how many tasks per allocated node to start, as stated in the line before | +| -c \<N> or --cpus-per-task \<N> | this option is needed for multithreaded (e.g. OpenMP) jobs, it tells SLURM to allocate N cores per task allocated; typically N should be equal to the number of threads you program spawns, e.g. it should be set to the same number as OMP_NUM_THREADS | +| -p \<name> or --partition \<name> | select the type of nodes where you want to execute your job, on Taurus we currently have haswell, `smp`, `sandy`, `west`, ml and `gpu` available | +| --mem-per-cpu \<name> | specify the memory need per allocated CPU in MB | +| --time \<HH:MM:SS> | specify the maximum runtime of your job, if you just put a single number in, it will be interpreted as minutes | +| --mail-user \<your email> | tell the batch system your email address to get updates about the status of the jobs | +| --mail-type ALL | specify for what type of events you want to get a mail; valid options beside ALL are: BEGIN, END, FAIL, REQUEUE | +| -J \<name> or --job-name \<name> | give your job a name which is shown in the queue, the name will also be included in job emails (but cut after 24 chars within emails) | +| --no-requeue | At node failure, jobs are requeued automatically per default. Use this flag to disable requeueing. | +| --exclusive | tell SLURM that only your job is allowed on the nodes allocated to this job; please be aware that you will be charged for all CPUs/cores on the node | +| -A \<project> | Charge resources used by this job to the specified project, useful if a user belongs to multiple projects. | +| -o \<filename> or --output \<filename> | \<p>specify a file name that will be used to store all normal output (stdout), you can use %j (job id) and %N (name of first node) to automatically adopt the file name to the job, per default stdout goes to "slurm-%j.out"\</p> \<p>%RED%NOTE:<span class="twiki-macro ENDCOLOR"></span> the target path of this parameter must be writeable on the compute nodes, i.e. 
it may not point to a read-only mounted file system like /projects.\</p> | +| -e \<filename> or --error \<filename> | \<p>specify a file name that will be used to store all error output (stderr), you can use %j (job id) and %N (name of first node) to automatically adopt the file name to the job, per default stderr goes to "slurm-%j.out" as well\</p> \<p>%RED%NOTE:<span class="twiki-macro ENDCOLOR"></span> the target path of this parameter must be writeable on the compute nodes, i.e. it may not point to a read-only mounted file system like /projects.\</p> | +| -a or --array | submit an array job, see the extra section below | +| -w \<node1>,\<node2>,... | restrict job to run on specific nodes only | +| -x \<node1>,\<node2>,... | exclude specific nodes from job | + +The following example job file shows how you can make use of sbatch + + #!/bin/bash + #SBATCH --time=01:00:00 + #SBATCH --output=simulation-m-%j.out + #SBATCH --error=simulation-m-%j.err + #SBATCH --ntasks=512 + #SBATCH -A myproject + + echo Starting Program + +During runtime, the environment variable SLURM_JOB_ID will be set to the +id of your job. + +You can also use our **[Slurm Batch File Generator](Slurmgenerator)**, +which could help you create basic SLURM job scripts. + +Detailed information on \<a +href="SystemTaurus#Run_45time_and_Memory_Limits" target="\_self" +title="memory limits on Taurus">memory limits on Taurus\</a>. + +#InteractiveJobs + +### Interactive Jobs + +Interactive activities like editing, compiling etc. are normally limited +to the login nodes. For longer interactive sessions you can allocate +cores on the compute node with the command "salloc". It takes the same +options like `sbatch` to specify the required resources. + +The difference to LSF is, that `salloc` returns a new shell on the node, +where you submitted the job. You need to use the command `srun` in front +of the following commands to have these commands executed on the +allocated resources. If you allocate more than one task, please be aware +that srun will run the command on each allocated task! + +An example of an interactive session looks like: + + tauruslogin3 /home/mark> srun --pty -n 1 -c 4 --time=1:00:00 --mem-per-cpu=1700 bash<br />srun: job 13598400 queued and waiting for resources<br />srun: job 13598400 has been allocated resources<br />taurusi1262 /home/mark> # start interactive work with e.g. 4 cores. + +<span class="twiki-macro RED"></span> **Note:** <span +class="twiki-macro ENDCOLOR"></span> A dedicated partition "interactive" +is reserved for short jobs +(\<`8h) with not more than one job per user. Please check the availability of nodes there with =sinfo -p interactive` +. + +### Interactive X11/GUI Jobs + +SLURM will forward your X11 credentials to the first (or even all) node +for a job with the (undocumented) --x11 option. For example, an +interactive session for 1 hour with Matlab using eigth cores can be +started with: + + module load matlab + srun --ntasks=1 --cpus-per-task=8 --time=1:00:00 --pty --x11=first matlab + +<span class="twiki-macro RED"></span> **Note:** <span +class="twiki-macro ENDCOLOR"></span> If you are getting the error: + + srun: error: x11: unable to connect node taurusiXXXX + +that probably means you still have an old host key for the target node +in your \~/.ssh/known_hosts file (e.g. from pre-SCS5). This can be +solved either by removing the entry from your known_hosts or by simply +deleting the known_hosts file altogether if you don't have important +other entries in it. 
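For example, a single stale entry can be removed with `ssh-keygen` (the node name is just an example):

    # remove only the outdated host key of the affected node from ~/.ssh/known_hosts
    ssh-keygen -R taurusi1262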
+ +### Requesting an Nvidia K20X / K80 / A100 + +SLURM will allocate one or many GPUs for your job if requested. Please +note that GPUs are only available in certain partitions, like +\<span>gpu\</span>2, gpu3 or `gpu2-interactive`. The option for +sbatch/srun in this case is `--gres=gpu:[NUM_PER_NODE]` (where +`NUM_PER_NODE` can be `1`, 2 or 4, meaning that one, two or four of the +GPUs per node will be used for the job). A sample job file could look +like this + + #!/bin/bash + #SBATCH -A Project1 # account CPU time to Project1 + #SBATCH --nodes=2 # request 2 nodes<br />#SBATCH --mincpus=1 # allocate one task per node...<br />#SBATCH --ntasks=2 # ...which means 2 tasks in total (see note below) + #SBATCH --cpus-per-task=6 # use 6 threads per task + #SBATCH --gres=gpu:1 # use 1 GPU per node (i.e. use one GPU per task) + #SBATCH --time=01:00:00 # run for 1 hour + srun ./your/cuda/application # start you application (probably requires MPI to use both nodes) + +Please be aware that the partitions `gpu`, `gpu1` and `gpu2` can only be +used for non-interactive jobs which are submitted by \<span> `sbatch`. +\</span>Interactive jobs (`salloc`, `srun`) will have to use the +partition `gpu-interactive`. SLURM will automatically select the right +partition if the partition parameter (-p) is omitted. + +<span class="twiki-macro RED"></span> **Note:** %ENDCOLOR%Due to an +unresolved issue concering the SLURM job scheduling behavior, it is +currently not practical to use --ntasks-per-node together with GPU jobs. +If you want to use multiple nodes, please use the parameters --ntasks +and --mincpus instead. The values of mincpus \* nodes has to equal +ntasks in this case. + +### Limitations of GPU job allocations + +The number of cores per node that are currently allowed to be allocated +for GPU jobs is limited depending on how many GPUs are being requested. +On the K80 nodes, you may only request up to 6 cores per requested GPU +(8 per on the K20 nodes). This is because we do not wish that GPUs +remain unusable due to all cores on a node being used by a single job +which does not, at the same time, request all GPUs. + +E.g., if you specify --gres=gpu:2, your total number of cores per node +(meaning: ntasks \* cpus-per-task) may not exceed 12 (on the K80 nodes) + +Note that this also has implications for the use of the --exclusive +parameter. Since this sets the number of allocated cores to 24 (or 16 on +the K20X nodes), you also **must** request all four GPUs by specifying +--gres=gpu:4, otherwise your job will not start. In the case of +--exclusive, it won't be denied on submission, because this is evaluated +in a later scheduling step. Jobs that directly request too many cores +per GPU will be denied with the error message: \<div +style="padding-left: 30px;">Batch job submission failed: Requested node +configuration is not available\</div> + +#ParallelJobs + +### Parallel Jobs + +For submitting parallel jobs, a few rules have to be understood and +followed. In general, they depend on the type of parallelization and +architecture. + +#OpenMPJobs + +#### OpenMP Jobs + +An SMP-parallel job can only run within a node, so it is necessary to +include the options `-N 1` and `-n 1`. The maximum number of processors +for an SMP-parallel program is 488 on Venus and 56 on taurus (smp +island). Using --cpus-per-task N SLURM will start one task and you will +have N CPUs available for your job. 
An example job file would look like: + + #!/bin/bash + #SBATCH -J Science1 + #SBATCH --nodes=1 + #SBATCH --tasks-per-node=1 + #SBATCH --cpus-per-task=8 + #SBATCH --mail-type=end + #SBATCH --mail-user=your.name@tu-dresden.de + #SBATCH --time=08:00:00 + + export OMP_NUM_THREADS=8 + ./path/to/binary + +#MpiJobs + +#### MPI Jobs + +For MPI jobs one typically allocates one core per task that has to be +started.** Please note:** There are different MPI libraries on Taurus +and Venus, so you have to compile the binaries specifically for their +target. + + #!/bin/bash + #SBATCH -J Science1 + #SBATCH --ntasks=864 + #SBATCH --mail-type=end + #SBATCH --mail-user=your.name@tu-dresden.de + #SBATCH --time=08:00:00 + + srun ./path/to/binary + +#PseudoParallelJobs + +#### Multiple Programms Running Simultaneously in a Job + +In this short example, our goal is to run four instances of a program +concurrently in a **single** batch script. Of course we could also start +a batch script four times with sbatch but this is not what we want to do +here. Please have a look at [Running Multiple GPU Applications +Simultaneously in a Batch Job](Compendium.RunningNxGpuAppsInOneJob) in +case you intend to run GPU programs simultaneously in a **single** job. + + #!/bin/bash + #SBATCH -J PseudoParallelJobs + #SBATCH --ntasks=4 + #SBATCH --cpus-per-task=1 + #SBATCH --mail-type=end + #SBATCH --mail-user=your.name@tu-dresden.de + #SBATCH --time=01:00:00 + + # The following sleep command was reported to fix warnings/errors with srun by users (feel free to uncomment). + #sleep 5 + srun --exclusive --ntasks=1 ./path/to/binary & + + #sleep 5 + srun --exclusive --ntasks=1 ./path/to/binary & + + #sleep 5 + srun --exclusive --ntasks=1 ./path/to/binary & + + #sleep 5 + srun --exclusive --ntasks=1 ./path/to/binary & + + echo "Waiting for parallel job steps to complete..." + wait + echo "All parallel job steps completed!" + +#ExclusiveJobs + +### Exclusive Jobs for Benchmarking + +Jobs on taurus run, by default, in shared-mode, meaning that multiple +jobs can run on the same compute nodes. Sometimes, this behaviour is not +desired (e.g. for benchmarking purposes), in which case it can be turned +off by specifying the SLURM parameter: `--exclusive` . + +*Setting --exclusive **only** makes sure that there will be **no other +jobs running on your nodes**. It does not, however, mean that you +automatically get access to all the resources which the node might +provide without explicitly requesting them, e.g. you still have to +request a GPU via the generic resources parameter (gres) to run on the +GPU partitions, or you still have to request all cores of a node if you +need them. CPU cores can either to be used for a task (--ntasks) or for +multi-threading within the same task (--cpus-per-task). Since those two +options are semantically different (e.g., the former will influence how +many MPI processes will be spawned by 'srun' whereas the latter does +not), SLURM cannot determine automatically which of the two you might +want to use. Since we use cgroups for separation of jobs, your job is +not allowed to use more resources than requested.* + +If you just want to use all available cores in a node, you have to +specify how Slurm should organize them, like with \<span>"-p haswell -c +24\</span>" or "\<span>-p haswell --ntasks-per-node=24". 
+
+Here is a short example to ensure that a benchmark is not spoiled by
+other jobs, even if it doesn't use up all resources of the nodes:
+
+    #!/bin/bash
+    #SBATCH -J Benchmark
+    #SBATCH -p haswell
+    #SBATCH --nodes=2
+    #SBATCH --ntasks-per-node=2
+    #SBATCH --cpus-per-task=8
+    #SBATCH --exclusive            # ensure that nobody spoils my measurement on 2 x 2 x 8 cores
+    #SBATCH --mail-user=your.name@tu-dresden.de
+    #SBATCH --time=00:10:00
+
+    srun ./my_benchmark
+
+### Array Jobs
+
+Array jobs can be used to create a sequence of jobs that share the same
+executable and resource requirements, but have different input files, to
+be submitted, controlled, and monitored as a single unit. The arguments
+-a or --array take an additional parameter that specifies the array
+indices. Within the job you can read the environment variables
+SLURM_ARRAY_JOB_ID, which will be set to the first job ID of the array,
+and SLURM_ARRAY_TASK_ID, which will be set individually for each step.
+
+Within an array job, you can use %a and %A in addition to %j and %N
+(described above) to make the output file name specific to the job. %A
+will be replaced by the value of SLURM_ARRAY_JOB_ID and %a will be
+replaced by the value of SLURM_ARRAY_TASK_ID.
+
+Here is an example of what an array job can look like:
+
+    #!/bin/bash
+    #SBATCH -J Science1
+    #SBATCH --array 0-9
+    #SBATCH -o arraytest-%A_%a.out
+    #SBATCH -e arraytest-%A_%a.err
+    #SBATCH --ntasks=864
+    #SBATCH --mail-type=end
+    #SBATCH --mail-user=your.name@tu-dresden.de
+    #SBATCH --time=08:00:00
+
+    echo "Hi, I am step $SLURM_ARRAY_TASK_ID in this array job $SLURM_ARRAY_JOB_ID"
+
+**Note:** If you submit a large number of jobs doing heavy I/O in the
+Lustre file systems, you should limit the number of your simultaneously
+running jobs with a second parameter like:
+
+    #SBATCH --array=1-100000%100
+
+For further details please read the Slurm documentation at
+<https://slurm.schedmd.com/sbatch.html>
+
+### Chain Jobs
+
+You can use chain jobs to create dependencies between jobs. This is
+often the case if a job relies on the result of one or more preceding
+jobs. Chain jobs can also be used if the runtime limit of the batch
+queues is not sufficient for your job. SLURM has an option `-d` or
+`--dependency` that allows you to specify that a job is only allowed to
+start if another job has finished.
+
+Here is an example of what a chain job can look like. The example
+submits 4 jobs (described in a job file) that will be executed one after
+another with different CPU numbers:
+
+    #!/bin/bash
+    TASK_NUMBERS="1 2 4 8"
+    DEPENDENCY=""
+    JOB_FILE="myjob.slurm"
+
+    for TASKS in $TASK_NUMBERS ; do
+        JOB_CMD="sbatch --ntasks=$TASKS"
+        if [ -n "$DEPENDENCY" ] ; then
+            JOB_CMD="$JOB_CMD --dependency afterany:$DEPENDENCY"
+        fi
+        JOB_CMD="$JOB_CMD $JOB_FILE"
+        echo -n "Running command: $JOB_CMD "
+        OUT=`$JOB_CMD`
+        echo "Result: $OUT"
+        DEPENDENCY=`echo $OUT | awk '{print $4}'`
+    done
+
+### Binding and Distribution of Tasks
+
+SLURM provides several binding strategies to place and bind the tasks
+and/or threads of your job to cores, sockets and nodes. Note: Keep in
+mind that the distribution method has a direct impact on the execution
+time of your application. The manipulation of the distribution can
+either speed up or slow down your application. More detailed
+information about the binding can be found
+[here](BindingAndDistributionOfTasks).
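+
+As an illustration only (see the page linked above for the full set of
+options and their exact semantics), a binding and distribution can be
+requested explicitly via the `srun` options `--cpu_bind` and
+`--distribution`; the script below is a sketch based on the MPI example
+that follows:
+
+    #!/bin/bash
+    #SBATCH --nodes=2
+    #SBATCH --tasks-per-node=16
+    #SBATCH --cpus-per-task=1
+
+    # bind each task to a core, distribute tasks block-wise across the
+    # nodes and cyclically across the sockets within a node
+    srun --ntasks 32 --cpu_bind=cores --distribution=block:cyclic ./application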
+ +The default allocation of the tasks/threads for OpenMP, MPI and Hybrid +(MPI and OpenMP) are as follows. + +**OpenMP** + +The illustration below shows the default binding of a pure OpenMP-job on +1 node with 16 cpus on which 16 threads are allocated. + + #!/bin/bash + #SBATCH --nodes=1 + #SBATCH --tasks-per-node=1 + #SBATCH --cpus-per-task=16 + + export OMP_NUM_THREADS=16 + + srun --ntasks 1 --cpus-per-task $OMP_NUM_THREADS ./application + +\<img alt="" +src="data:;base64,iVBORw0KGgoAAAANSUhEUgAAAX4AAADeCAIAAAC10/zxAAAABmJLR0QA/wD/AP+gvaeTAAASvElEQVR4nO3de1BU5ePH8XMIBN0FVllusoouCuZ0UzMV7WtTDqV2GRU0spRm1GAqtG28zaBhNmU62jg2WWkXGWegNLVmqnFGQhsv/WEaXQxLaFEQdpfBXW4ul+X8/jgTQ1z8KQd4luX9+mv3Oc8+5zl7nv3wnLNnObKiKBIA9C8/0R0AMBgRPQAEIHoACED0ABCA6AEgANEDQACiB4AARA8AAYgeAAIQPQAEIHoACED0ABCA6AEgANEDQACiB4AARA8AAYgeAAIQPQAEIHoACOCv5cWyLPdWPwAMOFr+szuzHgACaJr1qLinBTDYaD/iYdYDQACiB4AARA8AAYgeAAIQPQAEIHoACED0ABCA6AEgANEDQACiB4AARA8AAYgeAAIQPQAEIHoACED0ABCA6AEgANEDQACiB4AARA8AAYgeAAIQPQAEIHoACED0ABCA6Bm8HnnkEVmWz54921YSFRV17Nix22/hl19+0ev1t18/JycnMTFRp9NFRUXdQUfhi4ieQS0sLGzt2rX9tjqj0bhmzZrs7Ox+WyO8FtEzqK1YsaK4uPirr77qvKiioiIlJSUiIsJkMr3yyisNDQ1q+bVr1x5//HGDwXDPPfecOXOmrX5NTU1GRsaoUaPCw8OfffbZqqqqzm3Omzdv8eLFo0aN6qPNwQBC9Axqer0+Ozt748aNzc3NHRYtWrQoICCguLj4/PnzFy5csFgsanlKSorJZKqsrPzuu+8+/PDDtvpLly612WwXL168evVqaGhoWlpav20FBiRFA+0tQKDZs2dv3bq1ubl5woQJe/bsURQlMjLy6NGjiqIUFRVJkmS329Wa+fn5QUFBHo+nqKhIluXq6mq1PCcnR6fTKYpSUlIiy3JbfZfLJcuy0+nscr25ubmRkZF9vXXoU9o/+/7CMg/ewd/ff9u2bStXrly2bFlbYVlZmU6nCw8PV5+azWa3211VVVVWVhYWFjZ8+HC1fPz48eoDq9Uqy/LUqVPbWggNDS0vLw8NDe2v7cAAQ/RAeuaZZ3bu3Llt27a2EpPJVF9f73A41PSxWq2BgYFGozEmJsbpdDY2NgYGBkqSVFlZqdYfPXq0LMuFhYVkDW4T53ogSZK0Y8eO3bt319bWqk/j4+OnT59usVjq6upsNltWVtby5cv9/PwmTJgwadKk9957T5KkxsbG3bt3q/Xj4uKSkpJWrFhRUVEhSZLD4Th8+HDntXg8HrfbrZ5XcrvdjY2N/bR58D5EDyRJkqZNmzZ//vy2r7FkWT58+HBDQ8PYsWMnTZp033337dq1S1106NCh/Pz8yZMnP/roo48++mhbC7m5uSNHjkxMTAwODp4+ffrp06c7r2Xfvn1Dhw5dtmyZzWYbOnRoWFhYP2wavJPcdsaoJy+WZUmStLQAYCDS/tln1gNAAKIHgABEDwABiB4AAhA9AAQgegAIQPQAEIDoASAA0QNAAKIHgABEDwABiB4AAhA9AAQgegAIQPQAEIDoASAA0QNAAKIHgABEDwABiB4AAhA9AAQgegAIQPQAEIDoASAA0QNAAH/tTaj3IQSA28esB4AAmu65DgA9w6wHgABEDwABiB4AAhA9AAQgegAIQPQAEEDTJYVcTDgY9OzyC8bGYKDl0hxmPQAE6IUfUnBRoq/SPnNhbPgq7WODWQ8AAYgeAAIQPQAEIHoACED0ABCA6AEgANEDQACiB4AARA8AAYgeAAIQPQAEIHoACED0ABDA96Pn0qVLTz31lNFoHDZs2IQJE9avX9+DRiZMmHDs2LHbrPzAAw/k5eV1uSgnJycxMVGn00VFRfWgG+hdXjU2XnvttYkTJw4bNmz06NHr1q1ramrqQWcGEB+PntbW1ieeeGLkyJG//fZbVVVVXl6e2WwW2B+j0bhmzZrs7GyBfYDK28ZGXV3dRx99dO3atby8vLy8vDfeeENgZ/qDooH2FvratWvXJEm6dOlS50XXr19PTk4ODw+PiYl5+eWX6+vr1fIbN25kZGSMHj06ODh40qRJRUVFiqIkJCQcPXpUXTp79uxly5Y1NTW5XK709HSTyWQ0GpcsWeJwOBRFeeWVVwICAoxGY2xs7LJly7rsVW5ubmRkZF9tc+/Rsn8ZGz0bG6rNmzc//PDDvb/NvUf7/vXxWc/IkSPj4+PT09O/+OKLq1evtl+0aNGigICA4uLi8+fPX7hwwWKxqOWpqamlpaXnzp1zOp0HDhwIDg5ue0lpaenMmTNnzZp14MCBgICApUuX2my2ixcvXr16NTQ0NC0tTZKkPXv2TJw4cc+ePVar9cCBA/24rbgz3jw2Tp8+PWXKlN7fZq8iNvn6gc1m27Bhw+TJk/39/ceNG5ebm6soSlFRkSRJdrtdrZOfnx8UFOTxeIqLiyVJKi8v79BIQkLCpk2bTCbTRx99pJaUlJTIstzWgsvlkmXZ6XQqinL//fera+kOsx4v4YVjQ1GUzZs3jx07tqqqqhe3tNf1QnqIXX1/qq2t3blzp5+f36+//nrixAmdTte26J9//pEkyWaz5efnDxs2rPNrExISIiMjp02b5na71ZIffvjBz88vth2DwfDHH38oRI/m1/Y/7xkbW7ZsMZvNVqu1V7ev92nfvz5+wNWeXq+3WCxBQUG//vqryWSqr693OBzqIqvVGhgYqB6ENzQ0VFRUdH757t27w8PDn3766YaGBkmSRo8eLctyYWGh9V83btyYOHGiJEl+foPoXfUNXjI2NmzYcPDgwVOnTsXGxvbBVnoXH/+QVFZWrl279uLFi/X19dXV1e+8805zc/PUqVPj4+OnT59usVjq6upsNltWVtby5cv9/Pzi4uKSkpJWrVpVUVGhKMrvv//eNtQCAwOPHDkSEhIyd+7c2tpateaKFSvUCg6H4/Dhw2rNqKioy5cvd9kfj8fjdrubm5slSXK73Y2Njf3yNqAL3jY2MjMzjxw5cvz4caPR6Ha7ff7LdR8/4HK5XCtXrhw/fvzQoUMNBsPMmTO//fZbdVF
ZWdnChQuNRmN0dHRGRkZdXZ1aXl1dvXLlypiYmODg4MmTJ1++fFlp9y1GS0vLCy+88NBDD1VXVzudzszMzDFjxuj1erPZvHr1arWFkydPjh8/3mAwLFq0qEN/9u7d2/7Nbz+x90Ja9i9j447Gxo0bNzp8MOPi4vrvvbhz2vevrGi4XYl6QwwtLcCbadm/jA3fpn3/+vgBFwDvRPQAEIDoASAA0QNAAKIHgABEDwABiB4AAhA9AAQgegAIQPQAEIDoASAA0QNAAKIHgABEDwAB/LU3of58HuiMsYHuMOsBIICmfxUGAD3DrAeAAEQPAAGIHgACED0ABCB6AAhA9AAQgOgBIICmq5m5VnUw0HILQPg2bgEIYIDphd9wcT20r9I+c2Fs+CrtY4NZDwABiB4AAhA9AAQgegAIQPQAEIDoASAA0QNAAKIHgAA+Gz1nzpyZP3/+iBEjdDrdvffem5WVVV9f3w/rbWlpyczMHDFiREhIyNKlS2tqarqsptfr5XYCAwMbGxv7oXuDlqjxYLPZFi9ebDQaDQbD448/fvny5S6r5eTkJCYm6nS6qKio9uVpaWntx0leXl4/9Ll/+Gb0fPPNN4899tj9999/7tw5u91+8OBBu91eWFh4O69VFKW5ubnHq96yZcvx48fPnz9/5cqV0tLS9PT0LqvZbLbafy1cuHDBggWBgYE9XiluTeB4yMjIcDqdf/31V3l5eXR0dEpKSpfVjEbjmjVrsrOzOy+yWCxtQyU5ObnHPfE6igbaW+gLHo/HZDJZLJYO5a2trYqiXL9+PTk5OTw8PCYm5uWXX66vr1eXJiQkZGVlzZo1Kz4+vqCgwOVypaenm0wmo9G4ZMkSh8OhVtu1a1dsbGxoaGh0dPTWrVs7rz0iIuLTTz9VHxcUFPj7+9+4ceMWvXU4HIGBgT/88IPGre4LWvav94wNseMhLi5u//796uOCggI/P7+WlpbuupqbmxsZGdm+ZPny5evXr+/ppvehXkgPsavvC+pfs4sXL3a5dMaMGampqTU1NRUVFTNmzHjppZfU8oSEhHvuuaeqqkp9+uSTTy5YsMDhcDQ0NKxatWr+/PmKoly+fFmv1//999+Kojidzp9//rlD4xUVFe1XrR5tnTlz5ha93bFjx/jx4zVsbh/yjegROB4URVm3bt1jjz1ms9lcLtfzzz+/cOHCW3S1y+iJjo42mUxTpkx59913m5qa7vwN6BNETxdOnDghSZLdbu+8qKioqP2i/Pz8oKAgj8ejKEpCQsL777+vlpeUlMiy3FbN5XLJsux0OouLi4cOHfrll1/W1NR0ueq//vpLkqSSkpK2Ej8/v++///4WvY2Pj9+xY8edb2V/8I3oETge1MqzZ89W342777776tWrt+hq5+g5fvz42bNn//7778OHD8fExHSeu4miff/64Lme8PBwSZLKy8s7LyorK9PpdGoFSZLMZrPb7a6qqlKfjhw5Un1gtVplWZ46deqYMWPGjBlz3333hYaGlpeXm83mnJycDz74ICoq6n//+9+pU6c6tB8cHCxJksvlUp/W1ta2traGhIR8/vnnbWcK29cvKCiwWq1paWm9te3oTOB4UBRlzpw5ZrO5urq6rq5u8eLFs2bNqq+v7248dJaUlDRjxoxx48YtWrTo3XffPXjwoJa3wruITb6+oB7bv/766x3KW1tbO/yVKygoCAwMbPsrd/ToUbX8ypUrd911l9Pp7G4VDQ0Nb7/99vDhw9XzBe1FRER89tln6uOTJ0/e+lzPkiVLnn322TvbvH6kZf96z9gQOB4cDofU6QD8p59+6q6dzrOe9r788ssRI0bcalP7US+kh9jV95Gvv/46KCho06ZNxcXFbrf7999/z8jIOHPmTGtr6/Tp059//vna2trKysqZM2euWrVKfUn7oaYoyty5c5OTk69fv64oit1uP3TokKIof/75Z35+vtvtVhRl3759ERERnaMnKysrISGhpKTEZrM9/PDDqamp3XXSbrcPGTLEO08wq3wjehSh4yE2NnblypUul+vmzZtvvvmmXq+vrq7u3MOWlpabN2/m5ORERkbevHlTbdPj8ezfv99qtTqdzpMnT8bFxbWdihKO6OnW6dOn586dazAYhg0bdu+9977zzjvqlxdlZWULFy40Go3R0dEZGRl1dXVq/Q5Dzel0ZmZmjhkzRq/Xm83m1atXK4py4cKFhx56KCQkZPjw4dOmTfvxxx87r7epqenVV181GAx6vT41NdXlcnXXw+3bt3vtCWaVz0SPIm48FBYWJiUlDR8+PCQkZMaMGd39pdm7d2/7YxGdTqcoisfjmTNnTlhY2JAhQ8xm88aNGxsaGnr9nekZ7ftX0z3X1SNVLS3Am2nZv4wN36Z9//rgaWYA3o/oASAA0QNAAKIHgABEDwABiB4AAhA9AAQgegAIQPQAEKAX7rmu/e7L8FWMDXSHWQ8AATT9hgsAeoZZDwABiB4AAhA9AAQgegAIQPQAEIDoASAA0QNAAE1XM3OtKjCY8b+ZAQwwvfAbLq6HBgYb7Uc8zHoACED0ABCA6AEgANGDjlpaWjIzM0eMGBESErJ06dKampouq+Xk5CQmJup0uqioqA6L0tLS5Hby8vL6vtcYYIgedLRly5bjx4+fP3/+ypUrpaWl6enpXVYzGo1r1qzJzs7ucqnFYqn9V3Jych92FwMT0YOOPv744w0bNpjN5oiIiLfeeuvQoUNOp7NztXnz5i1evHjUqFFdNhIQEKD/l79/L3yRCh9D9OA/Kisr7Xb7pEmT1KdTpkxpaWm5dOnSnbaTk5MzatSoBx98cPv27c3Nzb3dTQx4/DnCf9TW1kqSFBoaqj4NDg728/Pr7nRPd5577rmXXnopPDy8sLBw9erVNptt586dvd9XDGRED/4jODhYkiSXy6U+ra2tbW1tDQkJ+fzzz1988UW18P+9iDQpKUl9MG7cOLfbbbFYiB50wAEX/iMqKioiIuKXX35Rn164cMHf33/ixIlpaWnKv+6owSFDhrS0tPRBTzGwET3oaNWqVdu2bfvnn3/sdvumTZtSUlIMBkPnah6Px+12q+dx3G53Y2OjWt7a2vrJJ5+Ulpa6XK5Tp05t3LgxJSWlXzcAA4KigfYW4IWamppeffVVg8Gg1+tTU1NdLleX1fbu3dt+IOl0OrXc4/HMmTMnLCxsyJAhZrN548aNDQ0N/dh99Aftn31NN8NRf0KmpQUAA5H2zz4HXAAEIHoACED0ABCA6AEgQC9cUsh/aAZwp5j1ABBA05frANAzzHoACED0ABCA6AEgANEDQACiB4AARA8AAYgeAAIQPQAEIHoACED0ABCA6AEgANEDQACiB4AARA8AAYgeAAIQPQAEIHoACED0ABCA6AEgwP8BhqBe/aVBoe8AAAAASUVORK5CYII=" +/> + +**MPI** + +The illustration below shows the default binding of a pure MPI-job. 
In +which 32 global ranks are distributed onto 2 nodes with 16 cores each. +Each rank has 1 core assigned to it. + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=16 + #SBATCH --cpus-per-task=1 + + srun --ntasks 32 ./application + +\<img alt="" +src="data:;base64,iVBORw0KGgoAAAANSUhEUgAAAw4AAADeCAIAAAAb9sCoAAAABmJLR0QA/wD/AP+gvaeTAAAfBklEQVR4nO3dfXBU1f348bshJEA2ISGbB0gIZAMJxqciIhCktGKxaqs14UEGC9gBJVUjxIo4EwFlpiqMOgydWipazTBNVATbGevQMQQYUMdSEEUNYGIID8kmMewmm2TzeH9/3On+9pvN2T27N9nsJu/XX+Tu/dx77uee8+GTu8tiUFVVAQAAQH/ChnoAAAAAwYtWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQIhWCQAAQChcT7DBYBiocQAIOaqqDvUQfEC9AkYyPfWKp0oAAABCup4qaULrN0sA+oXuExrqFTDS6K9XPFUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUCAAAQolUauX72s58ZDIZPP/3UuSU5OfnDDz+UP8KXX35pNBrl9y8uLs7JyYmKikpOTvZhoABGvMDXq40bN2ZnZ48bNy4tLW3Tpk2dnZ0+DBfDC63SiBYfH//0008H7HQmk2nDhg3btm0L2BkBDBsBrld2u33Pnj2XLl0qLS0tLS3dunVrwE6NYEOrNKKtXbu2srLygw8+cH+ptrZ26dKliYmJqampjz/+eFtbm7b90qVLd911V2xs7A033HDixAnn/s3Nzfn5+ZMnT05ISHjwwQcbGxvdj3nPPfcsW7Zs8uTJg3Q5AIaxANerN954Y8GCBfHx8Tk5OQ8//LBrOEYaWqURzWg0btu27dlnn+3q6urzUl5e3ujRoysrK0+ePHnq1KnCwkJt+9KlS1NTU+vq6v71r3/95S9/ce6/cuVKi8Vy+vTpmpqa8ePHr1mzJmBXAWAkGMJ6dfz48VmzZg3o1SCkqDroPwKG0MKFC7dv397V1TVjxozdu3erqpqUlHTw4EFVVSsqKhRFqa+v1/YsKysbM2ZMT09PRUWFwWBoamrSthcXF0dFRamqWlVVZTAYnPvbbDaDwWC1Wvs9b0lJSVJS0mBfHQZVKK79UBwznIaqXqmqumXLlvT09MbGxkG9QAwe/Ws/PNCtGYJMeHj4Sy+9tG7dulWrVjk3Xr58OSoqKiEhQfvRbDY7HI7GxsbLly/Hx8fHxcVp26dPn679obq62mAwzJ4923mE8ePHX7lyZfz48YG6DgDDX+Dr1QsvvLBv377y8vL4+PjBuioEPVolKPfff/8rr7zy0ksvObekpqa2trY2NDRo1ae6ujoyMtJkMqWkpFit1o6OjsjISEVR6urqtP3T0tIMBsOZM2fojQAMqkDWq82bNx84cODo0aOpqamDdkEIAXxWCYqiKDt37ty1a1dLS4v2Y2Zm5ty5cwsLC+12u8ViKSoqWr16dVhY2IwZM2bOnPnaa68pitLR0bFr1y5t/4yMjMWLF69du7a2tlZRlIaGhv3797ufpaenx+FwaJ8zcDgcHR0dAbo8AMNIYOpVQUHBgQMHDh06ZDKZHA4HXxYwktEqQVEUZc6cOffee6/zn40YDIb9+/e3tbWlp6fPnDnzpptuevXVV7WX3n///bKysltuueWOO+644447nEcoKSmZNGlSTk5OdHT03Llzjx8/7n6WN954Y+zYsatWrbJYLGPHjuWBNgA/BKBeWa3W3bt3X7hwwWw2jx07duzYsdnZ2YG5OgQhg/MTT/4EGwyKoug5AoBQFIprPxTHDEA//Wufp0oAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABCtEoAAABC4UM9AAAInKqqqqEeAoAQY1BV1f9gg0FRFD1HABCKQnHta2MGMDLpqVcD8FSJAgQg+JnN5qEeAoCQNABPlQCMTKH1VAkA/KOrVQIAABje+BdwAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrq+gpLvVRoJ/Ps6CebGSBBaXzXCnBwJqFcQ0VOveKoEAAAgNAD/sUlo/WYJefp/02JuDFeh+1s4c3K4ol5BRP/c4KkSAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACAEK0SAACA0PBvlb799ttf//rXJpNp3LhxM2bMeOaZZ/w4yIwZMz788EPJnX/yk5+Ulpb2+1JxcXFOTk5UVFRycrIfw8DACqq5sXHjxuzs7HHjxqWlpW3atKmzs9OPwSDUBdWcpF4FlaCaGyOtXg3zVqm3t/eXv/zlpEmTvv7668bGxtLSUrPZPITjMZlMGzZs2LZt2xCOAZpgmxt2u33Pnj2XLl0qLS0tLS3dunXrEA4GQyLY5iT1KngE29wYcfVK1UH/EQbbpUuXFEX59ttv3V+6evXqkiVLEhISUlJSHnvssdbWVm37tWvX8vPz09LSoqOjZ86cWVFRoapqVlbWwYMHtVcXLly4atWqzs5Om822fv361NRUk8m0fPnyhoYGVVUff/zx0aNHm0ymKVOmrFq1qt9RlZSUJCUlDdY1Dxw995e54d/c0GzZsmXBggUDf80DJ/jvr7vgH3NwzknqVTAIzrmhGQn1apg/VZo0aVJmZub69evffffdmpoa15fy8vJGjx5dWVl58uTJU6dOFRYWattXrFhx8eLFzz77zGq1vvPOO9HR0c6Qixcvzp8///bbb3/nnXdGjx69cuVKi8Vy+vTpmpqa8ePHr1mzRlGU3bt3Z2dn7969u7q6+p133gngtcI3wTw3jh8/PmvWrIG/ZgS3YJ6TGFrBPDdGRL0a2k4tACwWy+bNm2+55Zbw8PBp06aVlJSoqlpRUaEoSn19vbZPWVnZmDFjenp6KisrFUW5cuVKn4NkZWU999xzqampe/bs0bZUVVUZDAbnEWw2m8FgsFqtqq
refPPN2llE+C0tSATh3FBVdcuWLenp6Y2NjQN4pQMuJO5vHyEx5iCck9SrIBGEc0MdMfVq+LdKTi0tLa+88kpYWNhXX331ySefREVFOV/64YcfFEWxWCxlZWXjxo1zj83KykpKSpozZ47D4dC2HD58OCwsbIqL2NjYb775RqX06I4NvOCZG88//7zZbK6urh7Q6xt4oXV/NaE15uCZk9SrYBM8c2Pk1Kth/gacK6PRWFhYOGbMmK+++io1NbW1tbWhoUF7qbq6OjIyUntTtq2trba21j18165dCQkJ9913X1tbm6IoaWlpBoPhzJkz1f9z7dq17OxsRVHCwkZQVoeHIJkbmzdv3rdv39GjR6dMmTIIV4lQEiRzEkEoSObGiKpXw3yR1NXVPf3006dPn25tbW1qanrxxRe7urpmz56dmZk5d+7cwsJCu91usViKiopWr14dFhaWkZGxePHiRx55pLa2VlXVs2fPOqdaZGTkgQMHYmJi7r777paWFm3PtWvXajs0NDTs379f2zM5OfncuXP9jqenp8fhcHR1dSmK4nA4Ojo6ApIG9CPY5kZBQcGBAwcOHTpkMpkcDsew/8e3cBdsc5J6FTyCbW6MuHo1tA+1BpvNZlu3bt306dPHjh0bGxs7f/78jz76SHvp8uXLubm5JpNp4sSJ+fn5drtd297U1LRu3bqUlJTo6Ohbbrnl3Llzqsu/Guju7v7tb3972223NTU1Wa3WgoKCqVOnGo1Gs9n85JNPakc4cuTI9OnTY2Nj8/Ly+ozn9ddfd02+64PTIKTn/jI3fJob165d67MwMzIyApcL3wX//XUX/GMOqjmpUq+CSVDNjRFYrwzOo/jBYDBop/f7CAhmeu4vc2N4C8X7G4pjhjzqFUT0399h/gYcAACAHrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQrRKAAAAQuH6D2EwGPQfBMMScwPBhjkJEeYGRHiqBAAAIGRQVXWoxwAAABCkeKoEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgRKsEAAAgpOvbuvlu05HAv2/eYm6MBKH1rWzMyZGAegURPfWKp0oAAABCA/B/wIXWb5aQp/83LebGcBW6v4UzJ4cr6hVE9M8NnioBAAAI0SoBAAAI0SoBAAAI0SoBAAAI0SoBAAAI0SoBAAAI0SoBAAAI0SoBAAAIDdtW6cSJE/fee++ECROioqJuvPHGoqKi1tbWAJy3u7u7oKBgwoQJMTExK1eubG5u7nc3o9FocBEZGdnR0RGA4Y1YQzUfLBbLsmXLTCZTbGzsXXfdde7cuX53Ky4uzsnJiYqKSk5Odt2+Zs0a13lSWloagDEj8KhXcEW9CjbDs1X65z//uWjRoptvvvmzzz6rr6/ft29ffX39mTNnZGJVVe3q6vL71M8///yhQ4dOnjz5/fffX7x4cf369f3uZrFYWv4nNzf3gQceiIyM9Puk8GwI50N+fr7Vaj1//vyVK1cmTpy4dOnSfnczmUwbNmzYtm2b+0uFhYXOqbJkyRK/R4KgRb2CK+pVMFJ10H+EwdDT05OamlpYWNhne29vr6qqV69eXbJkSUJCQkpKymOPPdba2qq9mpWVVVRUdPvtt2dmZpaXl9tstvXr16empppMpuXLlzc0NGi7vfrqq1OmTBk/fvzEiRO3b9/ufvbExMS33npL+3N5eXl4ePi1a9c8jLahoSEyMvLw4cM6r3ow6Lm/wTM3hnY+ZGRk7N27V/tzeXl5WFhYd3e3aKglJSVJSUmuW1avXv3MM8/4e+mDKHjur7zgHDP1aqBQr6hXIgPQ7Qzt6QeD1n2fPn2631fnzZu3YsWK5ubm2traefPmPfroo9r2rKysG264obGxUfvxV7/61QMPPNDQ0NDW1vbII4/ce++9qqqeO3fOaDReuHBBVVWr1frf//63z8Fra2tdT609zT5x4oSH0e7cuXP69Ok6LncQDY/SM4TzQVXVTZs2LVq0yGKx2Gy2hx56KDc318NQ+y09EydOTE1NnTVr1ssvv9zZ2el7AgZF8NxfecE5ZurVQKFeUa9EaJX68cknnyiKUl9f7/5SRUWF60tlZWVjxozp6elRVTUrK+tPf/qTtr2qqspgMDh3s9lsBoPBarVWVlaOHTv2vffea25u7vfU58+fVxSlqqrKuSUsLOzjjz/2MNrMzMydO3f6fpWBMDxKzxDOB23nhQsXatm47rrrampqPAzVvfQcOnTo008/vXDhwv79+1NSUtx/1xwqwXN/5QXnmKlXA4V6pW2nXrnTf3+H4WeVEhISFEW5cuWK+0uXL1+OiorSdlAUxWw2OxyOxsZG7cdJkyZpf6iurjYYDLNnz546derUqVNvuumm8ePHX7lyxWw2FxcX//nPf05OTv7pT3969OjRPsePjo5WFMVms2k/trS09Pb2xsTEvP32285PurnuX15eXl1dvWbNmoG6drgbwvmgquqdd95pNpubmprsdvuyZctuv/321tZW0Xxwt3jx4nnz5k2bNi0vL+/ll1/et2+fnlQgCFGv4Ip6FaSGtlMbDNp7vU899VSf7b29vX268vLy8sjISGdXfvDgQW37999/P2rUKKvVKjpFW1vbH//4x7i4OO39Y1eJiYl/+9vftD8fOXLE83v/y5cvf/DBB327vADSc3+DZ24M4XxoaGhQ3N7g+Pzzz0XHcf8tzdV77703YcIET5caQMFzf+UF55ipVwOFeqVtp165G4BuZ2hPP0j+8Y9/jBkz5rnnnqusrHQ4HGfPns3Pzz9x4kRvb+/cuXMfeuihlpaWurq6+fPnP/LII1qI61RTVfXuu+9esmTJ1atXVVWtr69///33VVX97rvvysrKHA6HqqpvvPFGYmKie+kpKirKysqqqqqyWCwLFixYsWKFaJD19fURERHB+QFJzfAoPeqQzocpU6asW7fOZrO1t7e/8MILRqOxqanJfYTd3d3t7e3FxcVJSUnt7e3aMXt6evbu3VtdXW21Wo8cOZKRkeH8aMKQC6r7Kylox0y9GhDUK+cRqFd90CoJHT9+/O67746NjR03btyNN9744osvav9Y4PLly7m5uSaTaeLEifn5+Xa7Xdu/z1SzWq0FBQVTp041Go1ms/nJJ59UVfXUqVO33XZbTExMXFzcnDlzjh075n7ezs7OJ554IjY21mg0rlixwmaziUa4Y8eOoP2ApGbYlB516ObDmTNnFi9eHBcXFxMTM2/ePNHfNK+//rrrs96oqChVVXt6eu688874+PiIiAiz2fzss8+2tbUNeGb8E2z3V0Ywj5l6pR/1yhlOvepD//01OI/iB+2dSz1HQDDTc3+ZG8NbKN7fUBwz5FGvIKL//g7Dj3UDAAAMFFolAAAAIVolAAAAIVolAAAAIVolAAAAIVolAAAAIVolAAAAIVolAAAAIVolAAAAoXD9h/D6vw1jxGJuINgwJyHC3IAIT5UAAACEdP0fcAAAA
MMbT5UAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEdH1bN99tOhL4981bzI2RILS+lY05ORJQryCip17xVAkAAEBoAP4POD1dPLHBH6tHKF4vsfKxoSgU80ysfKweoXi9xMrH6sFTJQAAACFaJQAAACFaJQAAAKFBaZW6u7sLCgomTJgQExOzcuXK5uZm+diNGzdmZ2ePGzcuLS1t06ZNnZ2dfpx95syZBoOhrq7Op8B///vfc+bMGTNmTEJCwqZNm+QDLRbLsmXLTCZTbGzsXXfdde7cOc/7FxcX5+TkREVFJScn9xm517yJYmXyJop1nt2/vPnE8xg8KyoqSk9Pj4yMjI+Pv++++77//nv52DVr1hhclJaWyscajUbX2MjIyI6ODsnYy5cv5+XlxcfHT5gw4fe//73XQFF+ZPIm2kcmb6JYPXkLZh7y6bUOiGJl6oBoncqsfVGszNr3vI/nte8h1muuRLEyuRLNWz1/v8gQ3V+ZOiCKlakDolzJrH1RrMzaF8XKrH1RrEyuRLEyuRJdl56/X7xQdRAdoaioKDMzs7Ky0mKxzJ8/f8WKFfKxa9euPXbsWGNj44kTJyZPnrx582b5WM327dsXLVqkKEptba18bFlZmdFo/Otf/1pXV1dTU3Ps2DH52AceeOAXv/jFjz/+aLfbV69efeONN3qO/eijj959990dO3YkJSW57iPKm0ysKG8ysRr3vOmZIaJYz2PwHPv5559XVlY2NzdXVVXdf//9OTk58rGrV68uLCxs+Z+uri75WLvd7gzMzc1dvny5fOxtt9324IMP2my2q1evzp0798knn/QcK8qPaLtMrChvMrGivOmvHoEnc72iOiATK6oDrrGidSqz9kWxMmvfc131vPZFsTK5EsXK5Eo0b2Vy5SuZ+yuqAzKxojogkyuZtS+KlVn7oliZtS+KlcmVKFYmV6LrksmVfwalVUpMTHzrrbe0P5eXl4eHh1+7dk0y1tWWLVsWLFggf15VVb/55puMjIwvvvhC8bFVysnJeeaZZzyPRxSbkZGxd+9e7c/l5eVhYWHd3d1eY0tKSvrcTlHeZGJdueZNMrbfvA1U6XHnefxez9vZ2Zmfn3/PPffIx65evdrv++vU0NAQGRl5+PBhydgrV64oilJRUaH9ePDgQaPR2NHR4TVWlB/37T7NjT55k4kV5U1/6Qk8mesV1QGZWFEdEOXKdZ3Kr333WNF2yVif1r5rrHyu3GN9ylWfeetrrmT4tI761AGvsR7qgPz9lVn7olhVYu27x/q69vs9r9dc9Yn1NVf9/l0gnyt5A/8GXF1dXX19/cyZM7UfZ82a1d3d/e233/pxqOPHj8+aNUt+/56ent/97nevvfZadHS0TydyOByff/55T0/PddddFxcXt2jRoq+++ko+PC8vr6SkpL6+vrm5+c033/zNb34zatQonwaghGbeAq+4uDg5OTk6Ovrrr7/++9//7mvs5MmTb7311h07dnR1dflx9rfffjstLe3nP/+55P7OJepkt9t9et9woAxt3kJFgOuAc536sfZFa1xm7bvu4+vad8b6kSvX80rmyn3eDmCd9FsA6oCvNdxDrE9r3z1Wfu33O2bJXDlj5XOlp6b5Q0+f1e8Rzp8/ryhKVVXV/2/HwsI+/vhjmVhXW7ZsSU9Pb2xslDyvqqo7d+5cunSpqqrfffed4stTpdraWkVR0tPTz549a7fbN2zYkJKSYrfbJc9rs9kWLlyovXrdddfV1NTInLdP5+shb15jXfXJm0ysKG96ZojnWL+fKrW1tV29evXYsWMzZ85cu3atfOyhQ4c+/fTTCxcu7N+/PyUlpbCw0Ncxq6qamZm5c+dOn8Z86623Oh8mz5s3T1GUzz77zGvsgD9V6jdvMrGivOmvHoHn9Xo91AGZXInqQL+5cl2nPq19VVwbva599318WvuusT7lyv28krlyn7e+5kqSTzW2Tx2QiRXVAfn7K/mkxD1Wcu27x/q09kVz0muu3GMlc+Xh74LBeKo08K2StoROnz6t/ah95u7EiRMysU7PP/+82Wyurq6WP++FCxcmTZpUV1en+t4qtbS0KIqyY8cO7cf29vZRo0YdPXpUJra3t3f27NkPP/xwU1OT3W7funVrWlqaTJvVb5nuN2/yy9g9b15jPeRtYEuPzPjlz3vs2DGDwdDa2upH7L59+xITE3097+HDhyMiIhoaGnwa88WLF3Nzc5OSktLT07du3aooyvnz573GDtIbcOr/zZuvsa550196As/r9XqoA15jPdQB99g+69SntS+qjTJrv88+Pq39PrE+5apPrE+50jjnrU+5kie/FtzrgEysqA7I31+Zte/5703Pa99zrOe1L4qVyZV7rHyu3K9LExpvwCUnJycmJn755Zfaj6dOnQoPD8/OzpY/wubNm/ft23f06NEpU6bIRx0/fryxsfH66683mUxaK3r99de/+eabMrFGo3HatGnOL/T06Zs9f/zxx//85z8FBQVxcXFRUVFPPfVUTU3N2bNn5Y+gCcW8Da1Ro0b58UanoigRERHd3d2+Ru3Zsyc3N9dkMvkUlZaW9sEHH9TV1VVVVaWmpqakpEybNs3XUw+sAOcthASmDrivU/m1L1rjMmvffR/5te8eK58r91j/aqY2b/XXSZ0GtQ74V8PlY0Vr32ush7XvIdZrrvqN9aNm+l3TfKCnzxIdoaioKCsrq6qqymKxLFiwwKd/AffEE09Mnz69qqqqvb29vb3d/TOwotjW1tZL/3PkyBFFUU6dOiX/Jtqrr75qNpvPnTvX3t7+hz/8YfLkyfJPLKZMmbJu3Tqbzdbe3v7CCy8YjcampiYPsd3d3e3t7cXFxUlJSe3t7Q6HQ9suyptMrChvXmM95E3PDBHFisbvNbazs/PFF1+sqKiwWq1ffPHFrbfempeXJxnb09Ozd+/e6upqq9V65MiRjIyMRx99VH7MqqrW19dHRET0+4Fuz7EnT5784YcfGhsbDxw4kJCQ8Pbbb3uOFeVHtN1rrIe8eY31kDf91SPwZPIsqgMysaI64BorWqcya18UK7P2+91Hcu2Lji+TK1Gs11x5mLcyuRqMuaEK6oBMrKgOyORKZu33Gyu59vuNlVz7Hv6+9porUazXXHm4Lplc+WdQWqXOzs4nnngiNjbWaDSuWLHCZrNJxl67dk35vzIyMuTP6+TrG3Cqqvb29m7ZsiUpKSkmJuaOO+74+uuv5WPPnDmzePHiuLi4mJiYefPmef0XUq+//rrrNUZFRWnbRXnzGushbzLnFeVNz/QSxXodgyi2q6vrvvvuS0pKioiImDp16saNG+XnVU9Pz5133hkfHx8REWE2m5999tm2tjb5MauqumPHjunTp/f7kufYXbt2JSYmjh49Ojs7u7i42GusKD+i7V5jPeTNa6yHvOmZG0NFJs+iOiATK6oDzlgP69Tr2hfFyqx9mboqWvseYr3mykOs11x5mLcyddJXMvdXFdQBmVhRHZDJlde1L4qVWfuiWJm173leec6Vh1ivufJwXTJ1
0j8G51H8ELr/bR6xxBI7VLFDJRRzRSyxxA5trIb/2AQAAECIVgkAAECIVgkAAECIVgkAAEBoAD7WjeFNz8foMLyF4se6MbxRryDCx7oBAAAGha6nSgAAAMMbT5UAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACEaJUAAACE/h82xQH7rLtt0wAAAABJRU5ErkJggg==" +/> + +**Hybrid (MPI and OpenMP)** + +In the illustration below the default binding of a Hybrid-job is shown. +In which 8 global ranks are distributed onto 2 nodes with 16 cores each. +Each rank has 4 cores assigned to it. + + #!/bin/bash + #SBATCH --nodes=2 + #SBATCH --tasks-per-node=4 + #SBATCH --cpus-per-task=4 + + export OMP_NUM_THREADS=4 + + srun --ntasks 8 --cpus-per-task $OMP_NUM_THREADS ./application + +\<img alt="" +src="data:;base64,iVBORw0KGgoAAAANSUhEUgAAAvoAAADyCAIAAACzsfbGAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3de1iUdf7/8XsQA+SoDgdhZHA4CaUlpijmYdXooJvrsbxqzXa1dCsPbFlt5qF2O2xbXV52bdtlV25c7iVrhrVXWVaEupJ2gjxUYAIDgjgcZJCDIIf7+8f9a36zjCAwM/eMn3k+/oJ77rnf9z3z5u1r7hnn1siyLAEAAIjLy9U7AAAA4FzEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABCctz131mg0jtoPANccWZZVrsjMATyZPTOHszsAAEBwdp3dUaj/Cg+Aa7n2LAszB/A09s8czu4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9zxXDNmzNBoNF9++aVlSURExPvvv9/3LXz//fcBAQF9Xz8zMzMtLc3f3z8iIqIfOwpACOrPnPXr1ycnJw8ZMiQ6OnrDhg2XL1/ux+5CLMQdjzZ8+PDHH39ctXJarXbdunVbtmxRrSIAt6LyzGlqanrzzTfPnj2blZWVlZW1efNm1UrD3RB3PNqKFSuKi4vfe+8925uqqqoWL14cFham0+keeeSRlpYWZfnZs2dvu+22kJCQG264IS8vz7L+xYsXV69ePXLkyNDQ0Hvuuae2ttZ2m3feeeeSJUtGjhzppMMB4OZUnjk7duyYOnXq8OHD09LSHnjgAeu7w9MQdzxaQEDAli1bnnrqqfb29m43LVy4cPDgwcXFxd9++21+fn5GRoayfPHixTqd7vz58/v37//HP/5hWf/ee+81mUwFBQXl5eXBwcHLly9X7SgAXCtcOHOOHDkyfvx4hx4NrimyHezfAlxo+vTpzz33XHt7++jRo7dv3y7Lcnh4+L59+2RZLiwslCSpurpaWTMnJ8fX17ezs7OwsFCj0Vy4cEFZnpmZ6e/vL8tySUmJRqOxrN/Q0KDRaMxm8xXr7t69Ozw83NlHB6dy1d8+M+ea5qqZI8vypk2bRo0aVVtb69QDhPPY/7fvrXa8gpvx9vZ+8cUXV65cuWzZMsvCiooKf3//0NBQ5VeDwdDa2lpbW1tRUTF8+PChQ4cqy+Pj45UfjEajRqOZMGGCZQvBwcGVlZXBwcFqHQeAa4P6M+fZZ5/dtWtXbm7u8OHDnXVUcHvEHUjz5s175ZVXXnzxRcsSnU7X3NxcU1OjTB+j0ejj46PVaqOiosxmc1tbm4+PjyRJ58+fV9aPjo7WaDTHjx8n3wC4KjVnzpNPPpmdnX3o0CGdTue0A8I1gM/uQJIk6eWXX962bVtjY6Pya0JCwqRJkzIyMpqamkwm08aNG++//34vL6/Ro0ePGzfutddekySpra1t27ZtyvqxsbHp6ekrVqyoqqqSJKmmpmbv3r22VTo7O1tbW5X37FtbW9va2lQ6PABuRp2Zs2bNmuzs7AMHDmi12tbWVv4juicj7kCSJCk1NXXOnDmW/wqh0Wj27t3b0tIyatSocePGjR079tVXX1Vuevfdd3NyclJSUmbOnDlz5kzLFnbv3h0ZGZmWlhYYGDhp0qQjR47YVtmxY4efn9+yZctMJpOfnx8nlgGPpcLMMZvN27dv//nnnw0Gg5+fn5+fX3JysjpHBzeksXwCaCB31mgkSbJnCwCuRa7622fmAJ7J/r99zu4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBedu/CY1GY/9GAKCPmDkA+ouzOwAAQHAaWZZdvQ8AAABOxNkdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADB2fU1g3zZlycY2FcV0BueQP2vsaCvPAEzBz2xZ+ZwdgcAAAjOAReR4IsKRWX/qyV6Q1SufSVNX4mKmYOe2N8bnN0BAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjx486PP/7461//WqvVDhkyZPTo0U888cQANjJ69Oj333+/jyvfdNNNWVlZV7wpMzMzLS3N398/IiJiALsBx3Kr3li/fn1ycvKQIUOio6M3bNhw+fLlAewM3IFb9RUzx624VW942swRPO50dXXdfvvtkZGRJ0+erK2tzcrKMhgMLtwfrVa7bt26LVu2uHAfoHC33mhqanrzzTfPnj2blZWVlZW1efNmF+4MBszd+oqZ4z7crTc8bubIdrB/C8529uxZSZJ+/PFH25vOnTu3aNGi0NDQqKiohx9+uLm5WVleX1+/evXq6OjowMDAcePGFRYWyrKcmJi4b98+5dbp06cvW7bs8uXLDQ0Nq1at0ul0Wq327rvvrqmpkWX5kUceGTx4sFar1ev1y5
Ytu+Je7d69Ozw83FnH7Dj2PL/0xsB6Q7Fp06apU6c6/pgdx1XPL33FzHHGfdXhnr2h8ISZI/jZncjIyISEhFWrVv373/8uLy+3vmnhwoWDBw8uLi7+9ttv8/PzMzIylOVLly4tKys7evSo2Wx+5513AgMDLXcpKyubMmXKLbfc8s477wwePPjee+81mUwFBQXl5eXBwcHLly+XJGn79u3Jycnbt283Go3vvPOOiseK/nHn3jhy5Mj48eMdf8xwPnfuK7iWO/eGR8wc16YtFZhMpieffDIlJcXb2zsuLm737t2yLBcWFkqSVF1drayTk5Pj6+vb2dlZXFwsSVJlZWW3jSQmJj7zzDM6ne7NN99UlpSUlGg0GssWGhoaNBqN2WyWZfnGG29UqvSEV1puwg17Q5blTZs2jRo1qra21oFH6nCuen7pK2aOM+6rGjfsDdljZo74cceisbHxlVde8fLyOnHixOeff+7v72+5qbS0VJIkk8mUk5MzZMgQ2/smJiaGh4enpqa2trYqS7744gsvLy+9lZCQkB9++EFm9Nh9X/W5T29s3brVYDAYjUaHHp/jEXf6wn36ipnjbtynNzxn5gj+Zpa1gICAjIwMX1/fEydO6HS65ubmmpoa5Saj0ejj46O8wdnS0lJVVWV7923btoWGht51110tLS2SJEVHR2s0muPHjxt/UV9fn5ycLEmSl5cHPapicJPeePLJJ3ft2nXo0CG9Xu+Eo4Ta3KSv4IbcpDc8auYI/kdy/vz5xx9/vKCgoLm5+cKFCy+88EJ7e/uECRMSEhImTZqUkZHR1NRkMpk2btx4//33e3l5xcbGpqenP/jgg1VVVbIsnzp1ytJqPj4+2dnZQUFBd9xxR2Njo7LmihUrlBVqamr27t2rrBkREVFUVHTF/ens7GxtbW1vb5ckqbW1ta2tTZWHAVfgbr2xZs2a7OzsAwcOaLXa1tZW4f9TqKjcra+YOe7D3XrD42aOa08uOVtDQ8PKlSvj4+P9/PxCQkKmTJny0UcfKTdVVFQsWLBAq9WOGDFi9erVTU1NyvILFy6sXLkyKioqMDAwJSWlqKhItvokfEdHx29/+9uJEydeuHDBbDavWbMmJiYmICDAYDCsXbtW2cLBgwfj4+NDQkIWLlzYbX/eeOMN6wff+gSmG7Ln+aU3+tUb9fX13f4wY2Nj1Xss+s9Vzy99xcxxxn3V4Va94YEzR2PZygBoNBql/IC3AHdmz/NLb4jNVc8vfSU2Zg56Yv/zK/ibWQAAAMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAILztn8TymXZAVv0BpyBvkJP6A30hLM7AABAcBpZll29DwAAAE7E2R0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAMERdwAAgODs+lZlvr/SEwzsm5noDU+g/rd20VeegJmDntgzczi7AwAABOeAa2bxvcyisv/VEr0hKte+kqavRMXMQU/s7w3O7gAAAMERdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAghM27uTl5c2ZM2fYsGH+/v5jxozZuHFjc3OzCnU7OjrWrFkzbNiwoKCge++99+LFi1dcLSAgQGPFx8enra1Nhd3zWK7qB5PJtGTJEq1WGxIScttttxUVFV1xtczMzLS0NH9//4iICOvly5cvt+6TrKwsFfYZA8PMgTVmjrsRM+785z//mTVr1o033nj06NHq6updu3ZVV1cfP368L/eVZbm9vX3Apbdu3XrgwIFvv/32zJkzZWVlq1atuuJqJpOp8RcLFiyYP3++j4/PgIuidy7sh9WrV5vN5tOnT1dWVo4YMWLx4sVXXE2r1a5bt27Lli22N2VkZFhaZdGiRQPeEzgVMwfWmDnuSLaD/Vtwhs7OTp1Ol5GR0W15V1eXLMvnzp1btGhRaGhoVFTUww8/3NzcrNyamJi4cePGW265JSEhITc3t6GhYdWqVTqdTqvV3n333TU1Ncpqr776ql6vDw4OHjFixHPPPWdbPSws7O2331Z+zs3N9fb2rq+v72Vva2pqfHx8vvjiCzuP2hnseX7dpzdc2w+xsbFvvfWW8nNubq6Xl1dHR0dPu7p79+7w8HDrJffff/8TTzwx0EN3Ilc9v+7TV9aYOY7CzGHm9MQBicW15Z1BSdAFBQVXvHXy5MlLly69ePFiVVXV5MmTH3roIWV5YmLiDTfcUFtbq/w6d+7c+fPn19TUtLS0PPjgg3PmzJFluaioKCAg4Oeff5Zl2Ww2f/fdd902XlVVZV1aOaucl5fXy96+/PLL8fHxdhyuE4kxelzYD7Isb9iwYdasWSaTqaGh4b777luwYEEvu3rF0TNixAidTjd+/PiXXnrp8uXL/X8AnIK4Y42Z4yjMHGZOT4g7V/D5559LklRdXW17U2FhofVNOTk5vr6+nZ2dsiwnJia+/vrryvKSkhKNRmNZraGhQaPRmM3m4uJiPz+/PXv2XLx48YqlT58+LUlSSUmJZYmXl9fHH3/cy94mJCS8/PLL/T9KNYgxelzYD8rK06dPVx6NpKSk8vLyXnbVdvQcOHDgyy+//Pnnn/fu3RsVFWX7etFViDvWmDmOwsxRljNzbNn//Ar42Z3Q0FBJkiorK21vqqio8Pf3V1aQJMlgMLS2ttbW1iq/RkZGKj8YjUaNRjNhwoSYmJiYmJixY8cGBwdXVlYaDIbMzMy///3vERER06ZNO3ToULftBwYGSpLU0NCg/NrY2NjV1RUUFPTPf/7T8skv6/Vzc3ONRuPy5csddeyw5cJ+kGV59uzZBoPhwoULTU1NS5YsueWWW5qbm3vqB1vp6emTJ0+Oi4tbuHDhSy+9tGvXLnseCjgJMwfWmDluyrVpyxmU903/+Mc/dlve1dXVLVnn5ub6+PhYkvW+ffuU5WfOnBk0aJDZbO6pREtLy/PPPz906FDlvVhrYWFhO3fuVH4+ePBg7++j33333ffcc0//Dk9F9jy/7tMbLuyHmpoayeaNhmPHjvW0HdtXWtb27NkzbNiw3g5VRa56ft2nr6wxcxyFmaMsZ+bYckBicW15J/nggw98fX2feeaZ4uLi1tbWU6dOrV69Oi8vr6ura9KkSffdd19jY+P58+enTJny4IMPKnexbjVZlu+4445FixadO3dOluXq6up3331XluWffvopJyentbVVluUdO3aEhYXZjp6NGzcmJiaWlJSYTKapU6cuXbq0p52srq6+7rrr3PMDgwoxRo/s0n7Q6/UrV65saGi4dOnSs88+GxAQcOHCBds97OjouHTpUmZmZnh4+KVLl5RtdnZ2vvXWW0aj0Ww2Hzx4MDY21vI2v8sRd7ph5jgEM8eyBWZON8SdHh05cuSOO+4ICQkZMmTImDFjXnjhBeUD8BUVFQsWLNBqtSNGjFi9enVTU5OyfrdWM5vNa9asiYmJCQgIMBgMa9eulWU5Pz9/4sSJQUFBQ
4cOTU1NPXz4sG3dy5cvP/rooyEhIQEBAUuXLm1oaOhpD//617+67QcGFcKMHtl1/XD8+PH09PShQ4cGBQVNnjy5p39p3njjDetzrv7+/rIsd3Z2zp49e/jw4dddd53BYHjqqadaWloc/sgMDHHHFjPHfswcy92ZOd3Y//xqLFsZAOVdQHu2AHdmz/NLb4jNVc8vfSU2Zg56Yv/zK+BHlQEAAKwRdwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwXnbv4mrXmEVHovegDPQV+gJvYGecHYHAAAIzq5rZgEAALg/zu4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARn17cq8/2VnmBg38xEb3gC9b+1i77yBMwc9MSemcPZHQAAIDgHXDPLVa/wqKtOXXt42mPlaXVdxdMeZ0+raw9Pe6w8ra49OLsDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACA44g4AABCcw+KO2Wz29vaOiYnR6/V/+MMf+v6f8o1G4+zZs3u69cMPPzQYDDExMZmZmWrWnT9/fkhIyKJFi3pawRl1S0tLZ86cGRUVlZSU9Mknn6hWt6WlJSUlRafT6fX6bdu29XGDfUdv2F9X1N6wh5OeX0mSWlpa9Hr9unXr1Kzr7++v0+l0Ot3ixYvVrHv27NmZM2eGhYUlJSW1traqU7egoED3C29v77y8vD5us4/oDYfUFa03ZDtYb6G+vj4qKkqW5dbW1gkTJnz88cd93EhpaemsWbOueFN7e7vBYDAajTU1NdHR0Q0NDerUlWU5Nzc3Ozt74cKF1gudXbe4uPjo0aOyLJ86dSo8PLyzs1Oduh0dHefPn5dlua6uLjIyUvm5W93+ojccW1ek3rCHCs+vLMsbN25cvHjx2rVr1ayr1+ttF6pQd/bs2Tt27JBluby8vL29XbW6ipqamhEjRnR0dNjW7S96w+F1hekNhePfzPLx8Zk4ceKZM2ckSWpra5s1a1ZKSsq4ceMOHTokSZLRaExNTX3ooYduvfXWRx991PqOeXl5kydPrqmpsSz5+uuvExIS9Hq9VqudMWNGTk6OOnUlSZoxY0ZgYKDKx2swGCZNmiRJ0vXXXy9JUnNzszp1Bw0aFB4eLklSR0dHQECAn59fXw58AOgNesMZHPv8lpSU/Pjjj3feeafKdV1yvKWlpUajccWKFZIkjRw50tu7t+/Zd8bxvvfee3fdddegQYMG9lBcFb1Bb/x/9mQl6y1YUt7FixfHjh2bm5sry3JnZ2d9fb0sy1VVVWlpabIsl5aWBgcH19TUyLI8bdq0kpISJeXl5eWlpqaaTCbr7b/77ru///3vlZ//9Kc/bd++XZ26is8++6wvr+AdXleW5U8//XTKlClq1m1oaIiOjh40aNAbb7xxxbr9RW84o64sRG/YQ4XjXbhwYWFh4c6dO6/6Ct6xdQMCAgwGw/jx4z/55BPV6n766aczZsyYP3/+TTfdtHnzZjWPVzFz5sycnJwr1u0vesOxdUXqjf+3Bbvu/L+HPWjQIL1ef9111y1btkxZ2NXV9fTTT6elpU2fPj04OFiW5dLS0mnTpim3rly5Mjc3t7S0VK/XjxkzxnKe3KKP/6Q5vK7iqv+kOaluWVlZUlLSTz/9pHJd5V6jRo0qLy+3rdtf9Aa94QzOPt5PPvlk/fr1siz3/k+aMx5no9Eoy3J+fn5kZGRdXZ06dT/++GNfX9/CwsJLly5NnTrV8maEOn1lMpkiIyMt71bI7j1z6A3Vjld2dG8oHPlmVkREhNFoLCsr++qrr3744QdJkvbv319cXHzo0KGDBw/6+voqqw0ePFj5wcvLq6OjQ5KksLAwPz+/EydOdNtgZGTkuXPnlJ8rKysjIyPVqeuq45UkyWw233XXXdu3bx89erSadRUxMTGpqamnTp3q/4NxFfQGveEMDj/eY8eO7dmzJyYm5rHHHnv77befffZZdepKkqTX6yVJGjduXHJy8unTp9WpGxUVlZiYmJiY6Ovre+utt548eVK145Uk6b333ps3b56T3smiN+iNbhz/2Z2IiIgtW7Zs3bpVkqT6+nqDweDt7f3111+bTKae7hIUFPTBBx889thj33zzjfXyiRMnFhUVlZeX19XV5ebm9v6BeQfW7RcH1r18+fKCBQvWr18/a9YsNetWVVUp0aGiouLYsWPJyclXrT4w9Aa94QwOPN7NmzdXVFQYjca//e1vv/vd7zZt2qRO3bq6ugsXLkiSVFRUdOrUqdjYWHXq3nDDDV1dXRUVFZ2dnf/973+TkpLUqavYs2fPkiVLeqloP3qD3rBwyvfuLF68+MSJE4WFhfPmzfv666+XLl36r3/9Kzo6upe7REREZGdnP/DAA0VFRZaF3t7er7322owZM1JSUrZu3RoUFKROXUmSbrvttqVLl+7fv1+n0xUUFKhT9/PPPz98+PDTTz+t/B88o9GoTt26urrZs2dHRUXNmjXrz3/+s/JKwknoDXrDGRz4/Lqk7tmzZ1NTU6Oion7zm9+8/vrroaGh6tTVaDTbtm1LT09PSkq6/vrr586dq05dSZJMJtPp06enTZvWe0X70Rv0hkJjeUtsIHfWaCRJsmcL1BW17rW4z9SlLnWv3brX4j5TV826fKsyAAAQHHEHAAAIjrgDAAAEp17c6eUKR1e9CNGAlfZ8pSEVLgbU09VVrnoBFHv0dJUTZ1+kxh5/+ctf4uPj4+Li1q9fb3tTQkJCQkLCvn377Kxi22b96skBd2m3O/bSk7Yr29OlV9zhnnrSdmWndqk6mDkWzJxumDk9rSzyzLHnS3v6vgXbKxw1NDR0dXUpt17xIkQOqWt7pSFL3Z4uBuSQugrrq6tYH+8VL4DiqLrdrnJiXVfR7UIkjqo74PuePXtWr9e3tLS0t7enpKR88803ln3+7rvvbrrppkuXLtXV1SmT1J663dqsvz3Ze5f2vW4vPWm78lW7tO91FT31pO3KvXep/dNjYJg5vWPm9GVNZo5nzhyVzu7YXuFo7NixlZWVyq19vwhRf9leachS19kXA+p2dRXr43WeUpurnNjWdfZFavorICDA19e3ra1NuQTd8OHDLftcWFiYmprq6+s7bNiwkSNHHj582J5C3dqsvz054C7tdsdeetJ2ZXu61HaHe+lJ5/0Nugozh5nTE2aOZ84cleLOuXPnoqKilJ91Ol1lZWVWVtZVvz/AgT777LO4uLjAwEDruhcvXtTr9ZGRkevXr7/qF7f014YNG55//nnLr9Z16+rqYmNjb7755gMHDji26JkzZ3Q63YIFC8aNG7dly5ZudRUqfLVXv4SEhGRkZERHR0dGRs6bN2/UqFGWfR4zZsyRI0caGxvPnz+fn5/v2Nntnj1py4Fd2ktP2nJel6rDPZ9fZo47YOZ45szp7RqnTqWETXWUl5evXbs2Ozu7W92goKCysjKj0Thz5sw5c+aMHDnSURUPHDgQHR2dmJh49OhRZYl13VOnTun1+oKCgrlz5548eXLY
sGGOqtvZ2Xns2LHvv/9er9enp6dPmjTp9ttvt16hurq6sLBw+vTpjqpov/Ly8ldffbWkpMTX1/dXv/rV3LlzLY/VmDFjVq1aNX369IiIiLS0tN4vyWs/d+hJW47q0t570pbzutRV3OH5Zea4A2aOZ84clc7u9PEKR85w1SsNOeNiQL1fXaUvF0AZmKte5cSpF6kZmIKCgptvvlmr1QYEBMycOfOrr76yvvWRRx7Jz8/fv39/fX19XFycA+u6c0/asr9L+3jFHwvndak63Pn5Zea4FjOnL8SbOSrFHdsrHG3evNlsNju7ru2Vhix1nXoxINurq1jq9usCKP1le5WTbo+zu51VliQpPj7+m2++aWpqamtrO3z4cEJCgvU+l5WVSZL04Ycfms3m1NRUB9Z1w5605cAu7aUnbTm1S9Xhhs8vM8dNMHM8dObY8znnfm3hgw8+GDVqVHR09M6dO2VZHjlyZGNjo3JTenq6Vqv18/OLiorKz893YN2PPvpo0KBBUb8oLS211D158mRSUlJkZGRCQsKuXbv6srUBPGI7d+5UPpFuqVtQUBAXFxcZGTl69Oi9e/c6vO4XX3yRlJQUHx+/bt06+X8f5/Pnz0dGRnZ2dvZxU/Z0SL/u+/zzz8fFxcXGxmZkZMj/u88TJ04MCwu7+eabT506ZWdd2zbrV0/23qV9r9tLT9qufNUu7dfxKmx70nblq3ap/dNjYJg5V8XM6QtmjgfOHPXijrWioqJHH32UuqLWtee+nvZYeVpdO11zx0tdderac19Pe6w8ra4FlwilrlPqXov7TF3qUvfarXst7jN11azLRSQAAIDgiDsAAEBwxB0AACA44g4AABAccQcAAAiOuAMAAARH3AEAAIIj7gAAAME54GsGITZ7vvILYnPVV41BbMwc9ISvGQQAAOiRXWd3AAAA3B9ndwAAgOCIOwAAQHDEHQAAIDjiDgAAEBxxBwAACI64AwAABEfcAQAAgiPuAAAAwRF3AACA4Ig7AABAcMQdAAAgOOIOAAAQHHEHAAAIjrgDAAAER9wBAACCI+4AAADBEXcAAIDgiDsAAEBwxB0AACC4/wNeW27o5DoAAAACSURBVCEI/r8gawAAAABJRU5ErkJggg==" +/> + +## Editing Jobs + +Jobs that have not yet started can be altered. Using +`scontrol update timelimit=4:00:00 jobid=<jobid>` is is for example +possible to modify the maximum runtime. scontrol understands many +different options, please take a look at the man page for more details. + +## Job and SLURM Monitoring + +On the command line, use `squeue` to watch the scheduling queue. This +command will tell the reason, why a job is not running (job status in +the last column of the output). More information about job parameters +can also be determined with `scontrol -d show job <jobid>` Here are +detailed descriptions of the possible job status: + +| Reason | Long description | +|:-------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------| +| Dependency | This job is waiting for a dependent job to complete. | +| None | No reason is set for this job. | +| PartitionDown | The partition required by this job is in a DOWN state. | +| PartitionNodeLimit | The number of nodes required by this job is outside of its partitions current limits. Can also indicate that required nodes are DOWN or DRAINED. | +| PartitionTimeLimit | The jobs time limit exceeds its partitions current time limit. | +| Priority | One or higher priority jobs exist for this partition. | +| Resources | The job is waiting for resources to become available. | +| NodeDown | A node required by the job is down. | +| BadConstraints | The jobs constraints can not be satisfied. | +| SystemFailure | Failure of the SLURM system, a file system, the network, etc. | +| JobLaunchFailure | The job could not be launched. This may be due to a file system problem, invalid program name, etc. | +| NonZeroExitCode | The job terminated with a non-zero exit code. | +| TimeLimit | The job exhausted its time limit. | +| InactiveLimit | The job reached the system InactiveLimit. | + +In addition, the `sinfo` command gives you a quick status overview. + +For detailed information on why your submitted job has not started yet, +you can use: `whypending` \<span> \<jobid>\</span>. + +## Accounting + +The SLRUM command `sacct` provides job statistics like memory usage, CPU +time, energy usage etc. 
Examples:
+
+    # show all own jobs contained in the accounting database
+    sacct
+    # show a specific job
+    sacct -j <JOBID>
+    # specify fields
+    sacct -j <JOBID> -o JobName,MaxRSS,MaxVMSize,CPUTime,ConsumedEnergy
+    # show all fields
+    sacct -j <JOBID> -o ALL
+
+Read the manpage (`man sacct`) for information on the provided fields.
+
+Note that sacct by default only shows data of the last day. If you want
+to look further into the past without specifying an explicit job id, you
+need to provide a start date via the **-S** or **--starttime**
+parameter, e.g.:
+
+    # show all jobs since the beginning of year 2020:
+    sacct -S 2020-01-01
+
+## Killing jobs
+
+The command `scancel <jobid>` kills a single job and removes it from the
+queue. By using `scancel -u <username>` you are able to kill all of your
+jobs at once.
+
+#HostList
+
+## Host List
+
+If you want to place your job onto specific nodes, there are two options
+for doing this. Either use -p to specify a host group that fits your
+needs, or use -w (or --nodelist) with a list of nodes that will work for
+you.
+
+## Job Profiling
+
+\<a href="%ATTACHURL%/hdfview_memory.png"> \<img alt="" height="272"
+src="%ATTACHURL%/hdfview_memory.png" style="float: right; margin-left:
+10px;" title="hdfview" width="324" /> \</a>
+
+SLURM offers the option to gather profiling data from every task/node of
+the job. The following data can be gathered:
+
+- Task data, such as CPU frequency, CPU utilization, memory
+  consumption (RSS and VMSize), I/O
+- Energy consumption of the nodes
+- Infiniband data (currently deactivated)
+- Lustre filesystem data (currently deactivated)
+
+The data is sampled at a fixed rate (i.e. every 5 seconds) and is stored
+in an HDF5 file.
+
+**CAUTION**: Please be aware that the profiling data may be quite large,
+depending on job size, runtime, and sampling rate. Always remove the
+local profiles from /lustre/scratch2/profiling/${USER}, either by
+running sh5util as shown below or by simply removing those files.
+
+Usage examples:
+
+    # create energy and task profiling data (--acctg-freq is the sampling rate in seconds)
+    srun --profile=All --acctg-freq=5,energy=5 -n 32 ./a.out
+    # create task profiling data only
+    srun --profile=All --acctg-freq=5 -n 32 ./a.out
+
+    # merge the node-local files in /lustre/scratch2/profiling/${USER} into a single file
+    # (without the -o option the output file defaults to job_<JOBID>.h5)
+    sh5util -j <JOBID> -o profile.h5
+    # in job scripts or in interactive sessions (via salloc):
+    sh5util -j ${SLURM_JOBID} -o profile.h5
+
+    # view the data:
+    module load HDFView
+    hdfview.sh profile.h5
+
+More information about profiling with SLURM:
+
+- SLURM Profiling:
+  <http://slurm.schedmd.com/hdf5_profile_user_guide.html>
+- sh5util: <http://slurm.schedmd.com/sh5util.html>
+
+## Reservations
+
+If you want to run jobs whose specifications are outside of our job
+limits, you can ask for a reservation
+(<hpcsupport@zih.tu-dresden.de>). Please add the following information
+to your request mail:
+
+- start time (please note that the start time has to be at least 7 days
+  after the day of the request, better more, because the longest jobs
+  run for 7 days)
+- duration or end time
+- account
+- node count or cpu count
+- partition
+
+Once we have agreed on your requirements, we will send you an e-mail
+with your reservation name. You can then see more information about your
+reservation with the following command:
+
+    scontrol show res=<reservation name>
+    e.g.
scontrol show res=hpcsupport_123 + +If you want to use your reservation, you have to add the parameter +"--reservation=\<reservation name>" either in your sbatch script or to +your srun or salloc command. + +# SLURM External Links + +- Manpages, tutorials, examples, etc: <http://slurm.schedmd.com/> +- Comparison with other batch systems: + <http://www.schedmd.com/slurmdocs/rosetta.html> + +\</noautolink> diff --git a/twiki2md/root/SystemVenus/RamDiskDocumentation.md b/twiki2md/root/SystemVenus/RamDiskDocumentation.md new file mode 100644 index 000000000..d024d3203 --- /dev/null +++ b/twiki2md/root/SystemVenus/RamDiskDocumentation.md @@ -0,0 +1,59 @@ +## Using parts of the main memory as a temporary file system + +On systems with a very large main memory, it is for some workloads very +attractive to use parts of the main memory as a temporary file system. +This will reduce file access times dramatically and has proven to speed +up applications that are otherwise limited by I/O. + +We provide tools to allow users to create and destroy their own +ramdisks. Currently, this is only allowed on the SGI UV2 (venus). Please +note that the content of the ramdisk will vanish immediatelly when the +ramdisk is destroyed or the machine crashes. Always copy out result data +written to the ramdisk to another location. + +### Creating a ramdisk + +On venus, the creation of ramdisks is only allowed from within an LSF +job. The memory used for the ramdisk will be deducted from the memory +assigned to the LSF job. Thus, the amount of memory available for an LSF +job determines the maximum size of the ramdisk. Per LSF job only a +single ramdisk can be created (but you can create and delete a ramdisk +multiple times during a job). You need to load the corresponding +software module via + + module load ramdisk + +Afterwards, the ramdisk can be created with the command + + make-ramdisk «size of the ramdisk in GB» + +The path to the ramdisk is fixed to `/ramdisks/«JOBID»`. + +### Putting data onto the ramdisk + +The ramdisk itself works like a normal file system or directory. We +provide a script that uses multiple threads to copy a directory tree. It +can also be used to transfer single files but will only use one thread +in this case. It is used as follows + + parallel-copy.sh «source directory or file» «target directory» + +It is not specifically tailored to be used with the ramdisk. It can be +used for any copy process between two locations. + +### Destruction of the ramdisk + +A ramdisk will automatically be deleted at the end of the job. As an +alternative, you can delete your own ramdisk via the command + + kill-ramdisk + +. It is possible, that the deletion of the ramdisk fails. The reason for +this is typically that some process still has a file open within the +ramdisk or that there is still a program using the ramdisk or having the +ramdisk as its current path. 
Locating these processes, that block the +destruction of the ramdisk is possible via using the command + + lsof +d /ramdisks/«JOBID» + +-- Main.MichaelKluge - 2013-03-22 diff --git a/twiki2md/root/TensorFlow/Keras.md b/twiki2md/root/TensorFlow/Keras.md new file mode 100644 index 000000000..e69c8e7ad --- /dev/null +++ b/twiki2md/root/TensorFlow/Keras.md @@ -0,0 +1,245 @@ +# Keras + +\<span style="font-size: 1em;">This is an introduction on how to run a +Keras machine learning application on the new machine learning partition +of Taurus.\</span> + +\<span style="font-size: 1em;">Keras is a high-level neural network API, +written in Python and capable of running on top of \</span> +[TensorFlow](https://github.com/tensorflow/tensorflow).\<span +style="font-size: 1em;"> In this page, \</span>\<a +href="<https://www.tensorflow.org/guide/keras>" +target="\_top">Keras\</a>\<span style="font-size: 1em;"> will be +considered as a TensorFlow's high-level API for building and training +deep learning models. Keras includes support for TensorFlow-specific +functionality, such as \</span>\<a +href="<https://www.tensorflow.org/guide/keras#eager_execution>" +target="\_top">eager execution\</a>\<span style="font-size: 1em;">, +\</span>\<a href="<https://www.tensorflow.org/api_docs/python/tf/data>" +target="\_blank">tf.data\</a>\<span style="font-size: 1em;"> pipelines +and \</span>\<a href="<https://www.tensorflow.org/guide/estimators>" +target="\_top">estimators\</a>\<span style="font-size: 1em;">.\</span> + +On the machine learning nodes (machine learning partition), you can use +the tools from [IBM Power AI](PowerAI). PowerAI is an enterprise +software distribution that combines popular open-source deep learning +frameworks, efficient AI development tools (Tensorflow, Caffe, etc). + +In machine learning partition (modenv/ml) Keras is available as part of +the Tensorflow library at Taurus and also as a separate module named +"Keras". For using Keras in machine learning partition you have two +options: + +- use Keras as part of the TensorFlow module; +- use Keras separately and use Tensorflow as an interface between + Keras and GPUs. + +**Prerequisites**: To work with Keras you, first of all, need \<a +href="Login" target="\_blank">access\</a> for the Taurus system, loaded +Tensorflow module on ml partition, activated Python virtual environment. +Basic knowledge about Python, SLURM system also required. + +**Aim** of this page is to introduce users on how to start working with +Keras and TensorFlow on the \<a href="HPCDA" target="\_self">HPC-DA\</a> +system - part of the TU Dresden HPC system. + +There are three main options on how to work with Keras and Tensorflow on +the HPC-DA: 1. Modules; 2. JupyterNotebook; 3. Containers. One of the +main ways is using the [Modules +system](RuntimeEnvironment#Module_Environments) and Python virtual +environment. \<span style="font-size: 1em;">Please see the \</span> +[Python page](Python)\<span style="font-size: 1em;"> for the HPC-DA +system.\</span> + +The information about the Jupyter notebook and the **JupyterHub** could +be found \<a href="JupyterHub" target="\_blank">here\</a>. The use of +Containers is described \<a href="TensorFlowContainerOnHPCDA" +target="\_blank">here\</a>. 
+ +Keras contains numerous implementations of commonly used neural-network +building blocks such as layers, \<a +href="<https://en.wikipedia.org/wiki/Objective_function>" +title="Objective function">objectives\</a>, \<a +href="<https://en.wikipedia.org/wiki/Activation_function>" +title="Activation function">activation functions\</a>, \<a +href="<https://en.wikipedia.org/wiki/Mathematical_optimization>" +title="Mathematical optimization">optimizers\</a>, and a host of tools +to make working with image and text data easier. Keras, for example, has +a library for preprocessing the image data. + +\<span style="font-size: 1em;">The core data structure of Keras is a +**model**, a way to organize layers. The Keras functional API is the way +to go for defining as simple (sequential) as complex models, such as +multi-output models, directed acyclic graphs, or models with shared +layers. \</span> + +## Getting started with Keras + +This example shows how to install and start working with TensorFlow and +Keras (using the module system). To get started, import \<a +href="<https://www.tensorflow.org/api_docs/python/tf/keras>" +target="\_blank">tf.keras\</a> as part of your TensorFlow program setup. +tf.keras is TensorFlow's implementation of the [Keras API +specification](https://keras.io/). This is a modified example that we +used for the \<a href="TensorFlow" target="\_blank">Tensorflow +page\</a>. + + srun -p ml --gres=gpu:1 -n 1 --pty --mem-per-cpu=8000 bash + + module load modenv/ml #example output: The following have been reloaded with a version change: 1) modenv/scs5 => modenv/ml + + mkdir python-virtual-environments + cd python-virtual-environments + module load TensorFlow #example output: Module TensorFlow/1.10.0-PythonAnaconda-3.6 and 1 dependency loaded. + which python + python3 -m venv --system-site-packages env #create virtual environment "env" which inheriting with global site packages + source env/bin/activate #example output: (env) bash-4.2$ + module load TensorFlow + python + import tensorflow as tf + from tensorflow.keras import layers + + print(tf.VERSION) #example output: 1.10.0 + print(tf.keras.__version__) #example output: 2.1.6-tf + +As was said the core data structure of Keras is a **model**, a way to +organize layers. In Keras, you assemble *layers* to build *models*. A +model is (usually) a graph of layers. For our example we use the most +common type of model is a stack of layers. The below \<a +href="<https://www.tensorflow.org/guide/keras#model_subclassing>" +target="\_blank">example\</a> of using the advanced model with model +subclassing and custom layers illustrate using TF-Keras API. 
+ + import tensorflow as tf + from tensorflow.keras import layers + import numpy as np + + # Numpy arrays to train and evaluate a model + data = np.random.random((50000, 32)) + labels = np.random.random((50000, 10)) + + # Create a custom layer by subclassing + class MyLayer(layers.Layer): + + def __init__(self, output_dim, **kwargs): + self.output_dim = output_dim + super(MyLayer, self).__init__(**kwargs) + + # Create the weights of the layer + def build(self, input_shape): + shape = tf.TensorShape((input_shape[1], self.output_dim)) + # Create a trainable weight variable for this layer + self.kernel = self.add_weight(name='kernel', + shape=shape, + initializer='uniform', + trainable=True) + super(MyLayer, self).build(input_shape) + # Define the forward pass + def call(self, inputs): + return tf.matmul(inputs, self.kernel) + + # Specify how to compute the output shape of the layer given the input shape. + def compute_output_shape(self, input_shape): + shape = tf.TensorShape(input_shape).as_list() + shape[-1] = self.output_dim + return tf.TensorShape(shape) + + # Serializing the layer + def get_config(self): + base_config = super(MyLayer, self).get_config() + base_config['output_dim'] = self.output_dim + return base_config + + @classmethod + def from_config(cls, config): + return cls(**config) + # Create a model using your custom layer + model = tf.keras.Sequential([ + MyLayer(10), + layers.Activation('softmax')]) + + # The compile step specifies the training configuration + model.compile(optimizer=tf.compat.v1.train.RMSPropOptimizer(0.001), + loss='categorical_crossentropy', + metrics=['accuracy']) + + # Trains for 10 epochs(steps). + model.fit(data, labels, batch_size=32, epochs=10) + +## Running the sbatch script on ML modules (modenv/ml) + +Generally, for machine learning purposes ml partition is used but for +some special issues, SCS5 partition can be useful. The following sbatch +script will automatically execute the above Python script on ml +partition. If you have a question about the sbatch script see the +article about \<a href="BindingAndDistributionOfTasks" +target="\_blank">SLURM\</a>. Keep in mind that you need to put the +executable file (Keras_example) with python code to the same folder as +bash script or specify the path. + + #!/bin/bash + #SBATCH --mem=4GB # specify the needed memory + #SBATCH -p ml # specify ml partition + #SBATCH --gres=gpu:1 # use 1 GPU per node (i.e. use one GPU per task) + #SBATCH --nodes=1 # request 1 node + #SBATCH --time=00:05:00 # runs for 5 minutes + #SBATCH -c 16 # how many cores per task allocated + #SBATCH -o HLR_Keras_example.out # save output message under HLR_${SLURMJOBID}.out + #SBATCH -e HLR_Keras_example.err # save error messages under HLR_${SLURMJOBID}.err + + module load modenv/ml + module load TensorFlow + + python Keras_example.py + + ## when finished writing, submit with: sbatch <script_name> + +Output results and errors file you can see in the same folder in the +corresponding files after the end of the job. Part of the example +output: + + ...... + Epoch 9/10 + 50000/50000 [==============================] - 2s 37us/sample - loss: 11.5159 - acc: 0.1000 + Epoch 10/10 + 50000/50000 [==============================] - 2s 37us/sample - loss: 11.5159 - acc: 0.1020 + +## Tensorflow 2 + +\<a +href="<https://blog.tensorflow.org/2019/09/tensorflow-20-is-now-available.html>" +target="\_top">TensorFlow 2.0\</a> is a significant milestone for the +TensorFlow and the community. There are multiple important changes for +users. 
+
+There are a number of TensorFlow 2 modules for both the ml and scs5
+partitions on Taurus (2.0 (anaconda), 2.0 (python), 2.1 (python))
+(11.04.20). Please check [the software modules
+list](SoftwareModulesList) for information about the available modules.
+
+**Note:** The current TensorFlow module loads TensorFlow 2 by default.
+
+TensorFlow 2.0 includes many API changes, such as reordering arguments,
+renaming symbols, and changing default values for parameters. Thus, in
+some cases, code written for TensorFlow 1 is not compatible with
+TensorFlow 2. However, if you are using the high-level APIs
+**(tf.keras)**, there may be little or no action you need to take to
+make your code fully TensorFlow 2.0
+[compatible](https://www.tensorflow.org/guide/migrate). It is still
+possible to run 1.X code, unmodified ([except for
+contrib](https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md)),
+in TensorFlow 2.0:
+
+    import tensorflow.compat.v1 as tf
+    tf.disable_v2_behavior()    # instead of "import tensorflow as tf"
+
+To make the transition to TF 2.0 as seamless as possible, the TensorFlow
+team has created the
+[`tf_upgrade_v2`](https://www.tensorflow.org/guide/upgrade) utility to
+help transition legacy code to the new API.
+
+## F.A.Q:
diff --git a/twiki2md/root/TermsOfUse.md b/twiki2md/root/TermsOfUse.md
new file mode 100644
index 000000000..92a0fa4c2
--- /dev/null
+++ b/twiki2md/root/TermsOfUse.md
@@ -0,0 +1,54 @@
+# Nutzungsbedingungen / Terms Of Use
+
+**Attention:** Only the German version of the Terms of Use is binding.
+
+These new Terms of Use are valid from April 1, 2018:
+[HPC-Nutzungsbedingungen_20180305.pdf](%ATTACHURL%/HPC-Nutzungsbedingungen_20180305.pdf?t=1520317028)
+
+**The key points are:**
+
+- For support reasons, we store your contact data according to our IDM.
+  (Will be anonymized at least 15 months after the cancellation of the
+  HPC login.) The data of the HPC project (incl. contact of the project
+  leader) will be kept for further reference.
+- Our HPC systems may only be used according to the project description.
+- Responsibilities of the project leader:
+  - She has to assign a team member with an HPC login as the technical
+    project administrator. She can do this herself if she has a login at
+    our systems.
+  - The project leader or the administrator will have to add/remove
+    members to/from their project. She has access to accounting data for
+    her project.
+- These issues cover the data storage in our systems:
+  - Please work in the scratch file systems.
+  - Upon request, the project leader or the administrator can be given
+    access to a user's directory.
+  - The scratch file systems (/tmp, /scratch, /fastfs, /fasttemp) are
+    for temporary use only. After a certain time, files may be removed
+    automatically (for /tmp after 7 days, parallel scratch: after 100
+    days).
+  - Before a user leaves a project, the leader/administrator has to
+    store away worthy data. For this, the storage services of ZIH (long
+    term storage, intermediate archive) can be used.
+- Project termination (**new**)
+  - At the project's end, jobs cannot be submitted and started any
+    longer.
+  - Logins are valid for 30 more days for saving data.
+  - One hundred days after project termination, its files will be
+    deleted in the HPC file systems.
+- The HPC user agrees to follow the instructions and hints of the
+  support team. In case of non-compliance, she can be disabled for the
+  batch system or banned from the system.
+- Working with logs and HPC performance data (**new**)
+  - For HPC related research, ZIH will collect and analyze performance
+    data. Anonymized, it might be shared with research partners.
+  - Log data will be kept for long-term analyses.
+
+These key points are only a brief summary. If in doubt, please consult
+the German original.
+
+# Historie / History
+
+| Valid | Document |
+|:--------------------------------|:------------------------------------------------------------------------------------------------------|
+| 1 April 2018 - | [HPC-Nutzungsbedingungen_20180305.pdf](%ATTACHURL%/HPC-Nutzungsbedingungen_20180305.pdf?t=1520317028) |
+| 1 October 2016 - 31 March 2018 | [HPC-Nutzungsbedingungen_20160901.pdf](%ATTACHURL%/HPC-Nutzungsbedingungen_20160901.pdf) |
+| 5 June 2014 - 30 September 2016 | [HPC-Nutzungsbedingungen_20140606.pdf](%ATTACHURL%/HPC-Nutzungsbedingungen_20140606.pdf) |
+
+-- Main.MatthiasKraeusslein - 2016-09-01
+
+- [Terms-of-use-HPC-20180305-engl.pdf](%ATTACHURL%/Terms-of-use-HPC-20180305-engl.pdf):
+  Non-binding English translation
diff --git a/twiki2md/root/Unicore_access/UNICORERestAPI.md b/twiki2md/root/Unicore_access/UNICORERestAPI.md
new file mode 100644
index 000000000..02cc0bf61
--- /dev/null
+++ b/twiki2md/root/Unicore_access/UNICORERestAPI.md
@@ -0,0 +1,20 @@
+# UNICORE access via REST API
+
+**The UNICORE support has been abandoned and so this way of access is no
+longer available.**
+
+Most of the UNICORE features are also available using its REST API.
+
+This API is documented here:
+
+<https://sourceforge.net/p/unicore/wiki/REST_API/>
+
+Some useful examples of job submission via REST are available at:
+
+<https://sourceforge.net/p/unicore/wiki/REST_API_Examples/>
+
+The base address for the Taurus system at the ZIH is:
+
+<https://unicore.zih.tu-dresden.de:8080/TAURUS/rest/core>
+
+-- Main.AlvaroAguilera - 2017-02-01
diff --git a/twiki2md/root/VenusOpen.md b/twiki2md/root/VenusOpen.md
new file mode 100644
index 000000000..cfee2b385
--- /dev/null
+++ b/twiki2md/root/VenusOpen.md
@@ -0,0 +1,9 @@
+# Venus open to HPC projects
+
+The new HPC server [Venus](SystemVenus) is open to all HPC projects
+running on Mars with a quota of 20000 CPU h for testing the system.
+Projects without access to Mars have to apply for the new resource.
+
+To increase the CPU quota beyond this limit, a follow-up (but full)
+proposal is needed. This should be done via the new project management
+system.
diff --git a/twiki2md/root/WebCreateNewTopic/ProjectManagement.md b/twiki2md/root/WebCreateNewTopic/ProjectManagement.md
new file mode 100644
index 000000000..1b3a17b27
--- /dev/null
+++ b/twiki2md/root/WebCreateNewTopic/ProjectManagement.md
@@ -0,0 +1,118 @@
+
+
+## Project management
+
+The HPC project leader has overall responsibility for the project and
+for all activities within his project on ZIH's HPC systems. In
+particular he shall:
+
+- add and remove users from the project,
+- update contact details of the project members,
+- monitor the resources of his project,
+- inspect and store data of retiring users.
+
+For this he can appoint a *project administrator* with an HPC account to
+manage technical details.
+ +The front-end to the HPC project database enables the project leader and +the project administrator to + +- add and remove users from the project, +- define a technical administrator, +- view statistics (resource consumption), +- file a new HPC proposal, +- file results of the HPC project. + +## Access + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="password" width="100">%ATTACHURLPATH%/external_login.png</span> + +\<div style="text-align: justify;"> The entry point to the project +management system is \<a +href="<https://hpcprojekte.zih.tu-dresden.de/managers>" +target="\_blank"><https://hpcprojekte.zih.tu-dresden.de/managers>\</a>. +The project leaders of an ongoing project and their accredited admins +are allowed to login to the system. In general each of these persons +should possess a ZIH login at the Technical University of Dresden, with +which it is possible to log on the homepage. In some cases, it may +happen that a project leader of a foreign organization do not have a ZIH +login. For this purpose, it is possible to set a local password: " +[Passwort +vergessen](https://hpcprojekte.zih.tu-dresden.de/managers/members/missingPassword)". +\</div> \<br style="clear: both;" /> <span class="twiki-macro IMAGE" +type="frame" align="right" caption="password reset" +width="100">%ATTACHURLPATH%/password.png</span> \<div style="text-align: +justify;"> On the 'Passwort vergessen' page, it is possible to reset the +passwords of a 'non-ZIH-login'. For this you write your login, which +usually corresponds to your email address, in the field and click on +'zurcksetzen'. Within 10 minutes the system sends a signed e-mail from +<hpcprojekte@zih.tu-dresden.de> to the registered e-mail address. this +e-mail contains a link to reset the password. \</div> \<br style="clear: +both;" /> + +## Projects + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="projects overview" +width="100">%ATTACHURLPATH%/overview.png</span> + +\<div style="text-align: justify;"> After login you reach an overview +that displays all available projects. In each of these projects are +listed, you are either project leader or an assigned project +administrator. From this list, you have the option to view the details +of a project or make a following project request. The latter is only +possible if a project has been approved and is active or was. In the +upper right area you will find a red button to log out from the system. +\</div> \<br style="clear: both;" /> \<br /> <span +class="twiki-macro IMAGE" type="frame" align="right" +caption="project details" +width="100">%ATTACHURLPATH%/project_details.png</span> \<div +style="text-align: justify;"> The project details provide information +about the requested and allocated resources. The other tabs show the +employee and the statistics about the project. \</div> \<br +style="clear: both;" /> + +### manage project members (dis-/enable) + +<span class="twiki-macro IMAGE" type="frame" align="right" +caption="project members" width="100">%ATTACHURLPATH%/members.png</span> +\<div style="text-align: justify;"> The project members can be managed +under the tab 'employee' in the project details. This page gives an +overview of all ZIH logins that are a member of a project and its +status. If a project member marked in green, it can work on all +authorized HPC machines when the project has been approved. 
If an
+employee is marked in red, this can have several causes:
+
+- he was manually disabled by the project manager, the project
+  administrator or an employee of the ZIH
+- he was disabled by the system because his ZIH login expired
+- his confirmation of the current HPC terms is missing
+
+You can specify a user as an administrator. This user can then access
+the project management system. Next, you can disable individual project
+members. This disabling is only a "request of disabling" and has a time
+delay of 5 minutes. A user can add or reactivate himself, with his ZIH
+login, to a project via the link at the end of the page. To prevent
+misuse, this link is valid for 2 weeks and will then be renewed
+automatically. \</div> \<br style="clear: both;" />
+
+<span class="twiki-macro IMAGE" type="frame" align="right"
+caption="add member" width="100">%ATTACHURLPATH%/add_member.png</span>
+
+\<div style="text-align: justify;"> The link leads to a page where you
+can sign in to a project by accepting the terms of use. You also need a
+valid ZIH login. After this step it can take 1-1.5 hours to transfer the
+login to all cluster nodes. \</div> \<br style="clear: both;" />
+
+### statistic
+
+<span class="twiki-macro IMAGE" type="frame" align="right"
+caption="project statistic" width="100">%ATTACHURLPATH%/stats.png</span>
+
+\<div style="text-align: justify;"> The statistics are located under the
+tab 'Statistik' in the project details. The data is updated once a day
+and shows the used CPU time and the used disk space of a project.
+Follow-up projects also show the data of their predecessor. \</div>
+
+\<br style="clear: both;" />
diff --git a/twiki2md/root/WebCreateNewTopic/SlurmExamples.md b/twiki2md/root/WebCreateNewTopic/SlurmExamples.md
new file mode 100644
index 000000000..0dd646a58
--- /dev/null
+++ b/twiki2md/root/WebCreateNewTopic/SlurmExamples.md
@@ -0,0 +1 @@
+- Array-Job with Afterok-Dependency and DataMover Usage
diff --git a/twiki2md/root/WebCreateNewTopic/WebVNC.md b/twiki2md/root/WebCreateNewTopic/WebVNC.md
new file mode 100644
index 000000000..719da7dcf
--- /dev/null
+++ b/twiki2md/root/WebCreateNewTopic/WebVNC.md
@@ -0,0 +1,91 @@
+# WebVNC
+
+We provide a Singularity container with a VNC setup that can be used as
+an alternative to X-Forwarding to start graphical applications on the
+cluster.
+
+It utilizes [noVNC](https://novnc.com) to offer a web-based client that
+you can use with your browser, so there is no additional client software
+necessary.
+
+Also, we have prepared a script that makes launching the VNC server much
+easier.
+
+## %RED%new<span class="twiki-macro ENDCOLOR"></span> method with JupyterHub
+
+**Check out the [new documentation about virtual
+desktops](Compendium.VirtualDesktops).**
+
+The [JupyterHub](Compendium.JupyterHub) service is now able to start a
+VNC session based on the Singularity container mentioned here.
+
+Quickstart:
+
+1. Click here to start a session immediately: \<a
+   href="<https://taurus.hrsk.tu-dresden.de/jupyter/hub/spawn#/>\~(partition\~'interactive\~cpuspertask\~'2\~mempercpu\~'2583\~environment\~'test)"
+   target="\_blank"><https://taurus.hrsk.tu-dresden.de/jupyter/hub/spawn#/>\~(partition\~'interactive\~cpuspertask\~'2\~mempercpu\~'2583\~environment\~'test)\</a>
+2. Wait for the JupyterLab interface to appear. Then click on the + in
+   the upper left corner -> New launcher -> WebVNC.
+
+Steps to get started manually:
+
+1. Visit <https://taurus.hrsk.tu-dresden.de> to log in to JupyterHub.
+2. Choose your (SLURM) parameters for the job on Taurus.
+3. Select the "test" environment in the "advanced" tab.
+4. Click on the "Spawn" button.
+5. Wait for the JupyterLab interface to appear. Then click on the WebVNC
+   button.
+
+## Example usage
+
+### Step 1.1:
+
+Start the `runVNC` script in our prepared container in an interactive
+batch job (here with 4 cores and 2.5 GB of memory per core):
+
+    srun --pty -p interactive --mem-per-cpu=2500 -c 4 -t 8:00:00 singularity exec /scratch/singularity/xfce.sif runVNC
+
+Of course, you can adjust the batch job parameters to your liking. Note
+that the default time limit in the partition interactive is only 30
+minutes, so you should specify a longer one with `-t`.
+
+The script will automatically generate a self-signed SSL certificate and
+place it in your home directory under the name `self.pem`. This path can
+be overridden via the `--cert` parameter of `runVNC`.
+
+On success, it will print a URL and a one-time password:
+
+    Note: Certificate file /home/user/self.pem already exists. Skipping generation.
+    Starting VNC server...
+    Server started successfully.
+    Please browse to: https://172.24.146.46:5901/vnc.html
+    The one-time password is: 71149997
+
+### %RED%Step 1.2:<span class="twiki-macro ENDCOLOR"></span>
+
+<span class="twiki-macro RED"></span> NEW: Since the last security
+issue, direct access to the compute nodes is not allowed anymore.
+Therefore, you have to create a tunnel from your laptop or workstation
+to the specific compute node and port as follows:
+
+    ssh -NL <local port>:<compute node>:<remote port> <zih login>@tauruslogin.hrsk.tu-dresden.de
+    e.g.: ssh -NL 5901:172.24.146.46:5901 rotscher@tauruslogin.hrsk.tu-dresden.de
+
+<span class="twiki-macro ENDCOLOR"></span>
+
+### Step 2:
+
+%RED%Open your local web browser and connect to the following URL:
+
+    https://localhost:<local port>/vnc.html (e.g. https://localhost:5901/vnc.html)
+
+<span class="twiki-macro ENDCOLOR"></span>
+
+<span class="twiki-macro BLACK"></span> Since you are using a
+self-signed certificate and the node does not have a public DNS name,
+your browser will not be able to verify it and you will have to add an
+exception (via the "Advanced" button).<span
+class="twiki-macro ENDCOLOR"></span>
+
+### Step 3:
+
+On the website, click the `Connect` button and enter the one-time
+password that was previously displayed in order to authenticate. You
+will then see an Xfce4 desktop and can start a terminal in there, where
+you can use the "ml" or "module" command as usual to load and then run
+your graphical applications. Enjoy!
diff --git a/twiki2md/root/WebHome.md b/twiki2md/root/WebHome.md
new file mode 100644
index 000000000..7d09bf683
--- /dev/null
+++ b/twiki2md/root/WebHome.md
@@ -0,0 +1,47 @@
+# Foreword
+
+This compendium is work in progress, since we try to incorporate more
+information with increasing experience and with every question you ask
+us. We invite you to take part in the improvement of these pages by
+correcting or adding useful information or commenting on the pages.
+ +Ulf Markwardt + +# Contents + +- [Introduction](Introduction) +- [Access](Access), [TermsOfUse](TermsOfUse), [login](Login), [project + management](ProjectManagement), [ step-by step + examples](StepByStepTaurus) +- Our HPC Systems + - [Taurus: general purpose HPC cluster (HRSK-II)](SystemTaurus) + - [Venus: SGI Ultraviolet](SystemVenus) + - **[HPC for Data Analytics](HPCDA)** +- **[Data Management](Data Management)**, [WorkSpaces](WorkSpaces) +- [Batch Systems](Batch Systems) +- HPC Software + - [Runtime Environment](Runtime Environment) + - [Available Software](Applications) + - [Custom EasyBuild Environment](Custom EasyBuild Environment) +- [Software Development](Software Development) + - [BuildingSoftware](BuildingSoftware) + - [GPU Programming](GPU Programming) + +<!-- --> + +- [Checkpoint/Restart](CheckpointRestart) +- [Containers](Containers) +- [Further Documentation](Further Documentation) + +<!-- --> + +- [Older Hardware](Hardware) + +# News + +- 2021-05-10 GPU sub-cluster "\<a href="AlphaCentauri" + title="AlphaCentauri">AlphaCentauri\</a>" ready for production +- 2021-03-18 [HPC Introduction - + Slides](%ATTACHURL%/HPC-Introduction.pdf) +- 2021-01-20 new file system /beegfs/global0, introducing [Slurm + features](Slurmfeatures) diff --git a/twiki2md/root/WebHome/Access.md b/twiki2md/root/WebHome/Access.md new file mode 100644 index 000000000..f673950ad --- /dev/null +++ b/twiki2md/root/WebHome/Access.md @@ -0,0 +1,138 @@ +# Project Application for using the High Performance Computers + +In order to use the HPC systems installed at ZIH, an application form +has to be filled in. The project applications are reviewed by the +[Scientific Advisory +Board](https://tu-dresden.de/zih/die-einrichtung/wissenschaftlicher-beirat) +of the ZIH. The approval of the requested resources is always valid for +one year. The project duration for your project can span more than one +year, but the resources will be granted for each year individually. + +**The HPC project manager should hold a professorship (university) or +head a research group. The project manager is called to inform the ZIH +about any changes according the staff of the project (retirements, a +change to another institute).\<br /> (That is also regarding trial +accounts. And also trial accounts have to fill in the application +form.)\<br />** + +It is invariably possible to apply for more/different resources. Whether +additional resources are granted or not depends on the current +allocations and on the availablility of the installed systems. + +The terms of use of the HPC systems are only \<a href="TermsOfUse" +target="\_blank" title="NutzungsbedingungenHPC">available in German\</a> +- at the moment. + +## Online Project Application + +You may also apply for the "Schnupperaccount" (trial account) of up to +43.000 CPU hours for one year (3500 CPUh per month). If you would like +to continue after this, a **complete project application** is required +(see below). + +For obtaining access to the machines, the following forms have to be +filled in: 1 an [online +application](https://hpcprojekte.zih.tu-dresden.de/) form for the +project (one form per project). The data will be stored automatically in +a database.\<br />\<br /> 1 Users/guests at TU Dresden without a +ZIH-login have to fill in the following \<a +href="<http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare/projektantrag/dateien/ZIH_HPC_login_eng.pdf>" +target="\_blank" title="Login High Performance Computers">pdf\</a> +additionally. 
\<br />TUD-external Users fill please \<a +href="<http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare/projektantrag/dateien/ZIH_HPC_login_externe_eng.pdf>" +target="\_blank" title="HPClogin-externe">this form (pdf)\</a> to get a +login.\<br />Please sign and stamp it and send it by fax to +49 351 +46342328, or by mail to TU Dresden, ZIH - Service Desk, 01062 Dresden, +Germany.\<br />\<br />To add members with a valid ZIH-login to your +project(s), please use this website: +<https://hpcprojekte.zih.tu-dresden.de/managers/> \<br />(Access for +proposers with valid ZIH-login after submission of the online +application (see1.) ) + +** \<u>**subsequent applications / view for project leader:**\</u>\<br +/>** + +Subsequent applications will be neccessary, + +- if the end of project is reached +- if the applied resources won't be sufficient + +The project leader and one person instructed by him, the project +administrator, should use \<a +href="<https://hpcprojekte.zih.tu-dresden.de/managers/>" +target="\_blank">this website\</a>.\<br /> (ZIH-login neccessary)\<br /> +\<br /> At this website you have an overview of your projects, the usage +of resources, you can submit subsequent applications, and you are able +to add staff members to your project. + + + +## Complete Project Application + +Some reference points for an extended description of the application +form are: + +- Presentation of the problem and description of project content (with + references of publications) +- Description of achieved preliminary work, pre-studies with results, + experiences +- Description of target objectives and target cognitions +- Description of physical and mathematical methods or solutions +- An idea of resources you need (time, number of CPUs, parallel use of + CPUs) +- 1-2 figures are helpful to understand the description + + + +#### Here are some templates in German/English and Word/LaTeX. + +Word-template( +[German](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare/projektantrag/dateien/zih-projektantrag-lang.doc), +[English](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare/projektantrag/dateien/zih-application-long.doc)) +LaTeX-template( +[German](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare/projektantrag/dateien/zih-projektantrag-lang.tex), +[English](http://tu-dresden.de/die_tu_dresden/zentrale_einrichtungen/zih/dienste/formulare/projektantrag/dateien/zih-application-long.tex)) + +#### A small petition: + +If you plan to publish a paper with results based on the used CPU hours +of our machines, please insert in the acknowledgement an small part with +thank for the support by the machines of the ZIH/TUD. (see example +below) Please send us a link/reference to the paper if it was puplished. +It will be very helpfull for the next acquirement of compute power. +Thank you very much. + +Two examples: + +- The computations were performed on an Bull Cluster TAURUS at the + Center for Information Services and High Performance Computing (ZIH) + at TU Dresden. +- or We thank the Center for Information Services and High Performance + Computing (ZIH) at TU Dresden for generous allocations of computer + time. + +### + + + +## Access for Lectures/Courses + +You would like to use the clusters at ZIH for your courses? No +problem.\<br />Please initiate a normal project application (see above), +where you can tell us your wishes. 
In addition we need an application +for an HPC-login for the head of the course (also via link above). + +Boundary conditions: 1 Teaching has uncomplicated access - but is +subordinate to research projects when resource problems occurs. 1 As +with all other projects, access is only possible with online project +application\<br +/><https://tu-dresden.de/zih/hochleistungsrechnen/zugang/projektantrag>. +1 For courses, there is no need to deliver additional detailed project +description.\<br />(In exceptional cases, the ZIH reserves the right to +require a more detailed abstract.) 1 5.000 CPUh / month are possible +without additional communication. 1 More than 5.000h / month should be +possible after consulting the HPCSupport. 1 Post-iterations for 2.+3. +are also possible at any time by arrangement. 1 For immediate job +processing during classroom sessions (lecture, seminar) reservations are +absolutely necessary. \<br />They have to be submitted at least 8 days +before the appointment - via hpcsupport @ zih. tu-dresden. de diff --git a/twiki2md/root/WebHome/Accessibility.md b/twiki2md/root/WebHome/Accessibility.md new file mode 100644 index 000000000..022418cf2 --- /dev/null +++ b/twiki2md/root/WebHome/Accessibility.md @@ -0,0 +1,54 @@ +# Erklrung zur Barrierefreiheit + +Diese Erklrung zur Barrierefreiheit gilt fr die unter +<https://doc.zih.tu-dresden.de> verffentlichte Website der Technischen +Universitt Dresden. + +Als ffentliche Stelle im Sinne des Barrierefreie-Websites-Gesetz +(BfWebG) ist die Technische Universitt Dresden bemht, ihre Websites und +mobilen Anwendungen im Einklang mit den Bestimmungen des +Barrierefreie-Websites-Gesetz (BfWebG) in Verbindung mit der +Barrierefreie-Informationstechnik-Verordnung (BITV 2.0) barrierefrei +zugnglich zu machen. + +## Erstellung dieser Erklrung zur Barrierefreiheit + +Diese Erklrung wurde am 17.09.2020 erstellt und zuletzt am 17.09.2020 +aktualisiert. Grundlage der Erstellung dieser Erklrung zur +Barrierefreiheit ist eine am 17.09.2020 von der TU Dresden durchgefhrte +Selbstbewertung. + +## Stand der Barrierefreiheit + +Es wurde bisher noch kein BITV-Test fr die Website durchgefhrt. Dieser +ist bis 30.11.2020 geplant. + +## Kontakt + +Sollten Ihnen Mngel in Bezug auf die barrierefreie Gestaltung auffallen, +knnen Sie uns diese ber das Formular [Barriere +melden](https://tu-dresden.de/barrierefreiheit/barriere-melden) +mitteilen und im zugnglichen Format anfordern. Alternativ knnen Sie sich +direkt an die Meldestelle fr Barrieren wenden (Koordinatorin: Mandy +Weickert, E-Mail: <barrieren@tu-dresden.de>, Telefon: [+49 351 +463-42022](tel:+49-351-463-42022), Fax: [+49 351 +463-42021](tel:+49-351-463-42021), Besucheradresse: Nthnitzer Strae 46, +APB 1102, 01187 Dresden). 
+ +## Durchsetzungsverfahren + +Wenn wir Ihre Rckmeldungen aus Ihrer Sicht nicht befriedigend +bearbeiten, knnen Sie sich an die Schsische Durchsetzungsstelle wenden: + +**Beauftragter der Schsischen Staatsregierung fr die Belange von +Menschen mit Behinderungen**\<br /> Albertstrae 10\<br /> 01097 Dresden + +Postanschrift: Archivstrae 1, 01097 Dresden\<br /> E-Mail: +<info.behindertenbeauftragter@sk.sachsen.de>\<br /> Telefon: [+49 351 +564-12161](tel:+49-351-564-12161)\<br /> Fax: [+49 351 +564-12169](tel:+49-351-564-12169)\<br /> Webseite: +<https://www.inklusion.sachsen.de> + +\<div id="footer"> \</div> + +-- Main.MatthiasKraeusslein - 2020-09-18 diff --git a/twiki2md/root/WebHome/Applications.md b/twiki2md/root/WebHome/Applications.md new file mode 100644 index 000000000..8c11779dd --- /dev/null +++ b/twiki2md/root/WebHome/Applications.md @@ -0,0 +1,52 @@ +# Installed Applications + +The following applications are available on the HRSK systems. (General +descriptions are taken from the vendor's web site or from +Wikipedia.org.) + +Before running an application you normally have to load a +[module](RuntimeEnvironment#Modules). Please read the instructions given +while loading the module, they might be more up-to-date than this +manual. + +- [Complete List of Modules](SoftwareModulesList) +- [Using Software Modules](RuntimeEnvironment#Modules) + +<!-- --> + +- [Mathematics](Mathematics) +- [Nanoscale Simulations](Nanoscale Simulations) +- [FEM Software](FEMSoftware) +- [Computational Fluid Dynamics](CFD) +- [Deep Learning](DeepLearning) + +<!-- --> + +- [Visualization Tools](Visualization) , \<a + href="DesktopCloudVisualization" title="Remote Rendering on GPU + nodes">Remote Rendering on GPU nodes\</a> +- \<s>[Graphical access using UNICORE](unicore_access)\</s> The + UNICORE support has been abandoned and so this way of access is no + longer available. + +## \<a name="ExternalLicense">\</a>Use of external licenses + +It is possible (please contact the support team first) for our users to +install their own software and use their own license servers, e.g. +FlexLM. The outbound IP addresses from Taurus are: + +- compute nodes: NAT via 141.76.3.193 +- login nodes: 141.30.73.102-141.30.73.105 + +The IT department of the external institute has to open the firewall for +license communications (might be multiple ports) from Taurus and enable +handing-out license to these IPs and login. + +The user has to configure the software to use the correct license +server. This can typically be done by environment variable or file. + +Attention: If you are using software we have installed, but bring your +own license key (e.g. commercial ANSYS), make sure that to substitute +the environment variables we are using as default! (To verify this, run +`printenv|grep licserv` and make sure that you dont' see entries +refering to our ZIH license server.) 
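+
+As an illustration only, a minimal sketch of what such a configuration
+could look like for a FlexLM-based product (the server address below is
+a placeholder, and the exact variable name depends on the product, so
+please check its documentation):
+
+    # Hypothetical example: point FlexLM to your institute's license server.
+    # Port and host name are placeholders for your own license server.
+    export LM_LICENSE_FILE=27000@license.my-institute.example.org
+
+    # Verify that no ZIH default license server is configured anymore:
+    printenv | grep -i licserv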
diff --git a/twiki2md/root/WebHome/BigDataFrameworks:ApacheSparkApacheFlinkApacheHadoop.md b/twiki2md/root/WebHome/BigDataFrameworks:ApacheSparkApacheFlinkApacheHadoop.md new file mode 100644 index 000000000..540cbcdf4 --- /dev/null +++ b/twiki2md/root/WebHome/BigDataFrameworks:ApacheSparkApacheFlinkApacheHadoop.md @@ -0,0 +1,194 @@ +# BIG DATA FRAMEWORKS: APACHE SPARK, APACHE FLINK, APACHE HADOOP + +<span class="twiki-macro RED"></span> **This page is under +construction** <span class="twiki-macro ENDCOLOR"></span> + + + +[Apache Spark](https://spark.apache.org/), [Apache +Flink](https://flink.apache.org/) and [Apache +Hadoop](https://hadoop.apache.org/) are frameworks for processing and +integrating Big Data. These frameworks are also offered as software +[modules](RuntimeEnvironment#Modules) on Taurus for both ml and scs5 +partitions. You could check module availability in [the software module +list](SoftwareModulesList) or by the command: + + ml av Spark + +**Aim** of this page is to introduce users on how to start working with +the frameworks on Taurus in general as well as on the \<a href="HPCDA" +target="\_self">HPC-DA\</a> system. + +**Prerequisites:** To work with the frameworks, you need \<a +href="Login" target="\_blank">access\</a> to the Taurus system and basic +knowledge about data analysis and [SLURM](Slurm). + +\<span style="font-size: 1em;">The usage of big data frameworks is +different from other modules due to their master-worker approach. That +means, before an application can be started, one has to do additional +steps. In the following, we assume that a Spark application should be +started.\</span> + +The steps are: 1 Load the Spark software module 1 Configure the Spark +cluster 1 Start a Spark cluster 1 Start the Spark application + +## Interactive jobs with Apache Spark with the default configuration + +The Spark module is available for both **scs5** and **ml** partitions. +Thus, it could be used for different CPU architectures: Haswell, Power9 +(ml partition) etc. + +Let us assume that 2 nodes should be used for the computation. Use a +`srun` command similar to the following to start an interactive session +using the Haswell partition: + + srun --partition=haswell -N2 --mem=60g --exclusive --time=01:00:00 --pty bash -l #Job submission to haswell nodes with an allocation of 2 nodes with 60 GB main memory exclusively for 1 hour + +The command for different resource allocation on the **ml** partition is +similar: + + srun -p ml -N 1 -n 1 -c 2 --gres=gpu:1 --time=01:00:00 --pty --mem-per-cpu=10000 bash #job submission to ml nodes with an allocation of 1 node, 1 task per node, 2 CPUs per task, 1 gpu per node, with 10000 MB for 1 hour. + +Once you have the shell, load Spark using the following command: + + ml Spark + +Before the application can be started, the Spark cluster needs to be set +up. To do this, configure Spark first using configuration template at +`$SPARK_HOME/conf`: + + source framework-configure.sh spark $SPARK_HOME/conf + +This places the configuration in a directory called +`cluster-conf-<JOB_ID>` in your home directory, where `<JOB_ID>` stands +for the job id of the SLURM job. After that, you can start Spark in the +usual way: + + start-all.sh + +The Spark processes should now be set up and you can start your +application, e. 
g.:
+
+    spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.4.jar 1000
+
+%RED%Note<span class="twiki-macro ENDCOLOR"></span>: Please do not
+delete the directory `cluster-conf-<JOB_ID>` while the job is still
+running. This may lead to errors.
+
+## Batch jobs
+
+Using **srun** directly on the shell blocks the shell and launches an
+interactive job. Apart from short test runs, it is **recommended to
+launch your jobs in the background using batch jobs**. For that, you can
+conveniently put the parameters directly into the job file, which you can
+submit using **sbatch \[options\] \<job file>**.
+
+Please use a [batch job](Slurm) similar to the one attached:
+[example-spark.sbatch](%ATTACHURL%/example-spark.sbatch).
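+
+For orientation, a minimal sketch of such a batch script is shown below.
+It only wraps the module, configuration and start-up steps described
+above into a job file; the resource requests are copied from the
+interactive example and are placeholders, and the attached
+`example-spark.sbatch` may differ in its details:
+
+    #!/bin/bash
+    #SBATCH --partition=haswell
+    #SBATCH --nodes=2
+    #SBATCH --mem=60g
+    #SBATCH --exclusive
+    #SBATCH --time=01:00:00
+
+    # Load the Spark module and derive a job-specific configuration
+    ml Spark
+    source framework-configure.sh spark $SPARK_HOME/conf
+
+    # Start the Spark cluster and submit the example application
+    start-all.sh
+    spark-submit --class org.apache.spark.examples.SparkPi \
+        $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.4.jar 1000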
+
+## Apache Spark with [Jupyter](JupyterHub) notebook
+
+There are two general options on how to work with Jupyter notebooks on
+Taurus: There is [JupyterHub](JupyterHub), where you can simply run your
+Jupyter notebook on HPC nodes (the preferable way). Alternatively, you
+can run a remote Jupyter server manually within an sbatch GPU job and
+with the modules and packages you need. You can find the manual server
+setup [here](DeepLearning).
+
+### Preparation
+
+If you want to run Spark in Jupyter notebooks, you have to prepare it
+first. This is comparable to the \<a href="JupyterHub#Conda_environment"
+title="description for custom environments">description of custom
+environments in Jupyter\</a>. You start with an allocation:
+
+    srun --pty -n 1 -c 2 --mem-per-cpu 2583 -t 01:00:00 bash -l
+
+When a node is allocated, install the required packages with Anaconda:
+
+    module load Anaconda3
+    cd
+    mkdir user-kernel
+
+    conda create --prefix $HOME/user-kernel/haswell-py3.6-spark python=3.6 #Example output: Collecting package metadata: done Solving environment: done [...]
+
+    conda activate $HOME/user-kernel/haswell-py3.6-spark
+
+    conda install ipykernel #Example output: Collecting package metadata: done Solving environment: done [...]
+
+    python -m ipykernel install --user --name haswell-py3.6-spark --display-name="haswell-py3.6-spark" #Example output: Installed kernelspec haswell-py3.6-spark in [...]
+
+    conda install -c conda-forge findspark
+    conda install pyspark
+    conda install keras
+
+    conda deactivate
+
+You are now ready to spawn a notebook with Spark.
+
+### Spawning a notebook
+
+Assuming that you have prepared everything as described above, you can
+go to <https://taurus.hrsk.tu-dresden.de/jupyter>. In the tab "Advanced",
+go to the field "Preload modules" and select one of the Spark modules.
+When your Jupyter instance is started, check whether the kernel that
+you created in the preparation phase (see above) is shown in the top
+right corner of the notebook. If it is not already selected, select the
+kernel haswell-py3.6-spark. Then, you can set up Spark. Since the setup
+in the notebook requires more steps than in an interactive session, we
+have created an example notebook that you can use as a starting point
+for convenience: [SparkExample.ipynb](%ATTACHURL%/SparkExample.ipynb)
+
+%RED%Note<span class="twiki-macro ENDCOLOR"></span>: You could work with
+simple examples in your home directory, but according to the \<a
+href="HPCStorageConcept2019" target="\_blank">storage concept\</a>,
+**please use \<a href="WorkSpaces" target="\_blank">workspaces\</a> for
+your study and work projects**. For this reason, you have to use the
+advanced options of JupyterHub and put "/" in the "Workspace scope" field.
+
+## Interactive jobs using a custom configuration
+
+The script framework-configure.sh is used to derive a configuration from
+a template. It takes 2 parameters:
+
+- The framework to set up (Spark, Flink, Hadoop)
+- A configuration template
+
+Thus, you can modify the configuration by replacing the default
+configuration template with a customized one. This way, your custom
+configuration template is reusable for different jobs. You can start
+with a copy of the default configuration ahead of your interactive
+session:
+
+    cp -r $SPARK_HOME/conf my-config-template
+
+After you have changed my-config-template, you can use your new template
+in an interactive job with:
+
+    source framework-configure.sh spark my-config-template
+
+## Interactive jobs with Spark and Hadoop Distributed File System (HDFS)
+
+If you want to use Spark and HDFS together (or in general more than one
+framework), a scheme similar to the following can be used:
+
+    ml Hadoop
+    ml Spark
+    source framework-configure.sh hadoop $HADOOP_ROOT_DIR/etc/hadoop
+    source framework-configure.sh spark $SPARK_HOME/conf
+    start-dfs.sh
+    start-all.sh
+
+Note: It is recommended to use SSH keys to avoid entering the password
+every time you log in to nodes. For the details, please check the
+[documentation](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-ssh-configuration-keypairs).
+
+## FAQ
+
+Q: The command `source framework-configure.sh hadoop $HADOOP_ROOT_DIR/etc/hadoop` gives the output `bash: framework-configure.sh: No such file or directory`.
+
+A: Please try to re-submit or re-run the job, and if that does not help,
+log in to Taurus again.
+
+Q: There are a lot of errors and warnings during the setup of the
+session.
+
+A: Please check that everything works on a simple example. The source of
+the warnings could be SSH or similar, and it might not affect the
+frameworks.
+
+Note: If you have questions or need advice, please see
+<https://www.scads.de/services> or contact the HPC support.
diff --git a/twiki2md/root/WebHome/BuildingSoftware.md b/twiki2md/root/WebHome/BuildingSoftware.md
new file mode 100644
index 000000000..33aeaf919
--- /dev/null
+++ b/twiki2md/root/WebHome/BuildingSoftware.md
@@ -0,0 +1,42 @@
+# Building Software
+
+While it is possible to do short compilations on the login nodes, it is
+generally considered good practice to use a job for that, especially
+when using many parallel make processes. Note that starting on December
+6th 2016, the /projects file system will be mounted read-only on all
+compute nodes in order to prevent users from doing large I/O there
+(which is what the /scratch is for). In consequence, you cannot compile
+in /projects within a job anymore. If you wish to install software for
+your project group anyway, you can use a build directory in the /scratch
+file system instead:
+
+Every sane build system should allow you to keep your source code tree
+and your build directory separate, some even demand them to be different
+directories. Plus, you can set your installation prefix (the target
+directory) back to your /projects folder and do the "make install" step
+on the login nodes.
+ +For instance, when using CMake and keeping your source in /projects, you +could do the following: + + # save path to your source directory: + export SRCDIR=/projects/p_myproject/mysource + + # create a build directory in /scratch: + mkdir /scratch/p_myproject/mysoftware_build + + # change to build directory within /scratch: + cd /scratch/p_myproject/mysoftware_build + + # create Makefiles: + cmake -DCMAKE_INSTALL_PREFIX=/projects/p_myproject/mysoftware $SRCDIR + + # build in a job: + srun --mem-per-cpu=1500 -c 12 --pty make -j 12 + + # do the install step on the login node again: + make install + +As a bonus, your compilation should also be faster in the parallel +/scratch file system than it would be in the comparatively slow +NFS-based /projects file system. diff --git a/twiki2md/root/WebHome/CXFSEndOfSupport.md b/twiki2md/root/WebHome/CXFSEndOfSupport.md new file mode 100644 index 000000000..4e82e0e7b --- /dev/null +++ b/twiki2md/root/WebHome/CXFSEndOfSupport.md @@ -0,0 +1,46 @@ +# Changes in the CXFS File System + +With the ending support from SGI, the CXFS file system will be seperated +from its tape library by the end of March, 2013. + +This file system is currently mounted at + +- SGI Altix: `/fastfs/` +- Atlas: `/hpc_fastfs/` + +We kindly ask our users to remove their large data from the file system. +Files worth keeping can be moved + +- to the new [Intermediate Archive](IntermediateArchive) (max storage + duration: 3 years) - see + [MigrationHints](CXFSEndOfSupport#MigrationHints) below, +- or to the [Log-term Archive](PreservationResearchData) (tagged with + metadata). + +To run the file system without support comes with the risk of losing +data. So, please store away your results into the Intermediate Archive. +`/fastfs` might on only be used for really temporary data, since we are +not sure if we can fully guarantee the availability and the integrity of +this file system, from then on. + +With the new HRSK-II system comes a large scratch file system with appr. +800 TB disk space. It will be made available for all running HPC systems +in due time. + +#MigrationHints + +## Migration from CXFS to the Intermediate Archive + +Data worth keeping shall be moved by the users to the directory +`archive_migration`, which can be found in your project's and your +personal `/fastfs` directories. (`/fastfs/my_login/archive_migration`, +`/fastfs/my_project/archive_migration` ) + +\<u>Attention:\</u> Exclusively use the command `mv`. Do **not** use +`cp` or `rsync`, for they will store a second version of your files in +the system. + +Please finish this by the end of January. Starting on Feb/18/2013, we +will step by step transfer these directories to the new hardware. + +- Set DENYTOPICVIEW = WikiGuest diff --git a/twiki2md/root/WebHome/Cloud.md b/twiki2md/root/WebHome/Cloud.md new file mode 100644 index 000000000..63f8910ab --- /dev/null +++ b/twiki2md/root/WebHome/Cloud.md @@ -0,0 +1,113 @@ +# Virtual machine on Taurus + +The following instructions are primarily aimed at users who want to +build their [Singularity](Container) containers on Taurus. + +The \<a href="Container" title="Singularity">container setup\</a> +requires a Linux machine with root privileges, the same architecture and +a compatible kernel. \<br />If some of these requirements can not be +fulfilled, then there is also the option of using the provided virtual +machines on Taurus. 
+ +Currently, starting VMs is only possible on ML and HPDLF nodes.\<br +/>The VMs on the ML nodes are used to build singularity containers for +the Power9 architecture and the HPDLF nodes to build singularity +containers for the x86 architecture. + +## Create a virtual machine + +The **--cloud=kvm** SLURM parameter specifies that a virtual machine +should be started. + +### On Power9 architecture + + rotscher@tauruslogin3:~> srun -p ml -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash + srun: job 6969616 queued and waiting for resources + srun: job 6969616 has been allocated resources + bash-4.2$ + +### On x86 architecture + + rotscher@tauruslogin3:~> srun -p hpdlf -N 1 -c 4 --hint=nomultithread --cloud=kvm --pty /bin/bash + srun: job 2969732 queued and waiting for resources + srun: job 2969732 has been allocated resources + bash-4.2$ + +## %RED%NEW:<span class="twiki-macro ENDCOLOR"></span> Access virtual machine + +\<span style="font-size: 1em;">Since the security issue on Taurus, we +restricted the file system permissions.\<br />\</span>\<span +style="font-size: 1em;">Now you have to wait until the file +/tmp/${SLURM_JOB_USER}\_${SLURM_JOB_ID}/activate is created, then you +can try to ssh into the virtual machine (VM), but it could be that the +VM needs some more seconds to boot and start the SSH daemon.\<br +/>\</span>\<span style="font-size: 1em;">So you may need to try the +\`ssh\` command multiple times till it succeeds.\</span> + + bash-4.2$ cat /tmp/rotscher_2759627/activate + #!/bin/bash + + if ! grep -q -- "Key for the VM on the ml partition" "/home/rotscher/.ssh/authorized_keys" >& /dev/null; then + cat "/tmp/rotscher_2759627/kvm.pub" >> "/home/rotscher/.ssh/authorized_keys" + else + sed -i "s|.*Key for the VM on the ml partition.*|ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3siZfQ6vQ6PtXPG0RPZwtJXYYFY73TwGYgM6mhKoWHvg+ZzclbBWVU0OoU42B3Ddofld7TFE8sqkHM6M+9jh8u+pYH4rPZte0irw5/27yM73M93q1FyQLQ8Rbi2hurYl5gihCEqomda7NQVQUjdUNVc6fDAvF72giaoOxNYfvqAkw8lFyStpqTHSpcOIL7pm6f76Jx+DJg98sXAXkuf9QK8MurezYVj1qFMho570tY+83ukA04qQSMEY5QeZ+MJDhF0gh8NXjX/6+YQrdh8TklPgOCmcIOI8lwnPTUUieK109ndLsUFB5H0vKL27dA2LZ3ZK+XRCENdUbpdoG2Czz Key for the VM on the ml partition|" "/home/rotscher/.ssh/authorized_keys" + fi + + ssh -i /tmp/rotscher_2759627/kvm root@192.168.0.6 + bash-4.2$ source /tmp/rotscher_2759627/activate + Last login: Fri Jul 24 13:53:48 2020 from gateway + [root@rotscher_2759627 ~]# + +## Example usage + +## Automation + +We provide [Tools](VMTools) to automate these steps. You may just type +**startInVM --arch=power9** on a tauruslogin node and you will be inside +the VM with everything mounted. + +## Known Issues + +### Temporary Memory + +The available space inside the VM can be queried with **df -h**. +Currently the whole VM has 8G and with the installed operating system, +6.6GB of available space. + +Sometimes the Singularity build might fail because of a disk +out-of-memory error. In this case it might be enough to delete leftover +temporary files from Singularity: + + rm -rf /tmp/sbuild-* + +If that does not help, e.g., because one build alone needs more than the +available disk memory, then it will be necessary to use the tmp folder +on scratch. In order to ensure that the files in the temporary folder +will be owned by root, it is necessary to set up an image inside +/scratch/tmp instead of using it directly. 
E.g., to create a 25GB of +temporary memory image: + + tmpDir="$( mktemp -d --tmpdir=/host_data/tmp )" && tmpImg="$tmpDir/singularity-build-temp-dir" + export LANG_BACKUP=$LANG + unset LANG + truncate -s 25G "$tmpImg.ext4" && echo yes | mkfs.ext4 "$tmpImg.ext4" + export LANG=$LANG_BACKUP + +The image can now be mounted and with the **SINGULARITY_TMPDIR** +environment variable can be specified as the temporary directory for +Singularity builds. Unfortunately, because of an open Singularity +[bug](https://github.com/sylabs/singularity/issues/32) it is should be +avoided to mount the image using **/dev/loop0**. + + mkdir -p "$tmpImg" && i=1 && while test -e "/dev/loop$i"; do (( ++i )); done && mknod -m 0660 "/dev/loop$i" b 7 "$i"<br />mount -o loop="/dev/loop$i" "$tmpImg"{.ext4,}<br /><br />export SINGULARITY_TMPDIR="$tmpImg"<br /><br />singularity build my-container.{sif,def} + +The architecture of the base image is automatically chosen when you use +an image from DockerHub. This may not work for Singularity Hub, so in +order to build for the power architecture the Bootstraps **shub** and +**library** should be avoided. + +### Transport Endpoint is not Connected + +This happens when the SSHFS mount gets unmounted because it is not very +stable. It is sufficient to run \~/mount_host_data.sh again or just the +sshfs command inside that script. diff --git a/twiki2md/root/WebHome/Container.md b/twiki2md/root/WebHome/Container.md new file mode 100644 index 000000000..ddb6309ae --- /dev/null +++ b/twiki2md/root/WebHome/Container.md @@ -0,0 +1,280 @@ +# Singularity + + + +If you wish to containerize your workflow/applications, you can use +Singularity containers on Taurus. As opposed to Docker, this solution is +much more suited to being used in an HPC environment. Existing Docker +containers can easily be converted. + +Website: \<a href="<https://www.sylabs.io>" +target="\_blank"><https://www.sylabs.io>\</a>\<br />Docu: \<a +href="<https://www.sylabs.io/docs/>" +target="\_blank"><https://www.sylabs.io/docs/>\</a> + +ZIH wiki sites: + +- [Example Definitions](SingularityExampleDefinitions) +- [Building Singularity images on Taurus](VMTools) +- [Hints on Advanced usage](SingularityRecipeHints) + +It is available on Taurus without loading any module. + +## Local installation + +One advantage of containers is that you can create one on a local +machine (e.g. your laptop) and move it to the HPC system to execute it +there. This requires a local installation of singularity. The easiest +way to do so is: 1 Check if go is installed by executing \`go version\`. 
+If it is **not**: \<verbatim>wget +<https://storage.googleapis.com/golang/getgo/installer_linux> && chmod ++x installer_linux && ./installer_linux && source +$HOME/.bash_profile\</verbatim> 1 Follow the instructions to [install +Singularity](https://github.com/sylabs/singularity/blob/master/INSTALL.md#clone-the-repo)\<br +/>\<br /> Clone the repo\<br /> \<pre>mkdir -p +${GOPATH}/src/github.com/sylabs && cd ${GOPATH}/src/github.com/sylabs && +git clone <https://github.com/sylabs/singularity.git> && cd +singularity\</pre> Checkout the version you want (see the \<a +href="<https://github.com/sylabs/singularity/releases>" +target="\_blank">Github Releases page\</a> for available releases), +e.g.\<br /> \<pre>git checkout v3.2.1\</pre> Build and install\<br /> +\<pre>cd ${GOPATH}/src/github.com/sylabs/singularity && ./mconfig && cd +./builddir && make && sudo make install\</pre> + +## + +## Container creation + +Since creating a new container requires access to system-level tools and +thus root privileges, it is not possible for users to generate new +custom containers on Taurus directly. You can, however, import an +existing container from, e.g., Docker. + +In case you wish to create a new container, you can do so on your own +local machine where you have the necessary privileges and then simply +copy your container file to Taurus and use it there.\<br />This does not +work on our **ml** partition, as it uses Power9 as its architecture +which is different to the x86 architecture in common +computers/laptops.** For that you can use the [VM Tools](VMTools).** + +### Creating a container + +Creating a container is done by writing a definition file and passing it +to + + singularity build myContainer.sif myDefinition.def + +NOTE: This must be done on a machine (or \<a href="Cloud" +target="\_blank">VM\</a>) with root rights. + +A definition file contains a bootstrap \<a +href="<https://sylabs.io/guides/3.2/user-guide/definition_files.html#header>" +target="\_blank">header\</a> where you choose the base and \<a +href="<https://sylabs.io/guides/3.2/user-guide/definition_files.html#sections>" +target="\_blank">sections\</a> where you install your software. + +The most common approach is to start from an existing docker image from +DockerHub. For example, to start from an \<a +href="<https://hub.docker.com/_/ubuntu>" target="\_blank">ubuntu +image\</a> copy the following into a new file called ubuntu.def (or any +other filename of your choosing) + + Bootstrap: docker<br />From: ubuntu:trusty<br /><br />%runscript<br /> echo "This is what happens when you run the container..."<br /><br />%post<br /> apt-get install g++ + +Then you can call: + + singularity build ubuntu.sif ubuntu.def + +And it will install Ubuntu with g++ inside your container, according to +your definition file. + +More bootstrap options are available. The following example, for +instance, bootstraps a basic CentOS 7 image. + + BootStrap: yum + OSVersion: 7 + MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearch/ + Include: yum + + %runscript + echo "This is what happens when you run the container..." 
+ + %post + echo "Hello from inside the container" + yum -y install vim-minimal + +More examples of definition files can be found at +<https://github.com/singularityware/singularity/tree/master/examples> + +### Importing a docker container + +You can import an image directly from the Docker repository (Docker +Hub): + + singularity build my-container.sif docker://ubuntu:latest + +As opposed to bootstrapping a container, importing from Docker does +**not require root privileges** and therefore works on Taurus directly. + +Creating a singularity container directly from a local docker image is +possible but not recommended. Steps: + + # Start a docker registry + $ docker run -d -p 5000:5000 --restart=always --name registry registry:2 + + # Push local docker container to it + $ docker tag alpine localhost:5000/alpine + $ docker push localhost:5000/alpine + + # Create def file for singularity like this... + $ cat example.def + Bootstrap: docker + Registry: <a href="http://localhost:5000" rel="nofollow" target="_blank">http://localhost:5000</a> + From: alpine + + # Build singularity container + $ singularity build --nohttps alpine.sif example.def + +### Starting from a Dockerfile + +As singularity definition files and Dockerfiles are very similar you can +start creating a definition file from an existing Dockerfile by +"translating" each section. + +There are tools to automate this. One of them is \<a +href="<https://github.com/singularityhub/singularity-cli>" +target="\_blank">spython\</a> which can be installed with \`pip\` (add +\`--user\` if you don't want to install it system-wide): + +`pip3 install -U spython` + +With this you can simply issue the following command to convert a +Dockerfile in the current folder into a singularity definition file: + +`spython recipe Dockerfile myDefinition.def<br />` + +Now please **verify** your generated defintion and adjust where +required! + +There are some notable changes between singularity definitions and +Dockerfiles: 1 Command chains in Dockerfiles (\`apt-get update && +apt-get install foo\`) must be split into separate commands (\`apt-get +update; apt-get install foo). Otherwise a failing command before the +ampersand is considered "checked" and does not fail the build. 1 The +environment variables section in Singularity is only set on execution of +the final image, not during the build as with Docker. So \`*ENV*\` +sections from Docker must be translated to an entry in the +*%environment* section and **additionally** set in the *%runscript* +section if the variable is used there. 1 \`*VOLUME*\` sections from +Docker cannot be represented in Singularity containers. Use the runtime +option \`-B\` to bind folders manually. 1 *\`CMD\`* and *\`ENTRYPOINT\`* +from Docker do not have a direct representation in Singularity. The +closest is to check if any arguments are given in the *%runscript* +section and call the command from \`*ENTRYPOINT*\` with those, if none +are given call \`*ENTRYPOINT*\` with the arguments of \`*CMD*\`: +\<verbatim>if \[ $# -gt 0 \]; then \<ENTRYPOINT> "$@" else \<ENTRYPOINT> +\<CMD> fi\</verbatim> + +## Using the containers + +### Entering a shell in your container + +A read-only shell can be entered as follows: + + singularity shell my-container.sif + +**IMPORTANT:** In contrast to, for instance, Docker, this will mount +various folders from the host system including $HOME. This may lead to +problems with, e.g., Python that stores local packages in the home +folder, which may not work inside the container. 
It also makes +reproducibility harder. It is therefore recommended to use +\`--contain/-c\` to not bind $HOME (and others like /tmp) automatically +and instead set up your binds manually via \`-B\` parameter. Example: + + singularity shell --contain -B /scratch,/my/folder-on-host:/folder-in-container my-container.sif + +You can write into those folders by default. If this is not desired, add +an \`:ro\` for read-only to the bind specification (e.g. \`-B +/scratch:/scratch:ro\`).\<br />Note that we already defined bind paths +for /scratch, /projects and /sw in our global singularity.conf, so you +needn't use the -B parameter for those. + +If you wish, for instance, to install additional packages, you have to +use the `-w` parameter to enter your container with it being writable. +This, again, must be done on a system where you have the necessary +privileges, otherwise you can only edit files that your user has the +permissions for. E.g: + + singularity shell -w my-container.sif + Singularity.my-container.sif> yum install htop + +The -w parameter should only be used to make permanent changes to your +container, not for your productive runs (it can only be used writeable +by one user at the same time). You should write your output to the usual +Taurus file systems like /scratch.Launching applications in your +container + +### Running a command inside the container + +While the "shell" command can be useful for tests and setup, you can +also launch your applications inside the container directly using +"exec": + + singularity exec my-container.img /opt/myapplication/bin/run_myapp + +This can be useful if you wish to create a wrapper script that +transparently calls a containerized application for you. E.g.: + + #!/bin/bash + + X=`which singularity 2>/dev/null` + if [ "z$X" = "z" ] ; then + echo "Singularity not found. Is the module loaded?" + exit 1 + fi + + singularity exec /scratch/p_myproject/my-container.sif /opt/myapplication/run_myapp "$@" + The better approach for that however is to use `singularity run` for that, which executes whatever was set in the _%runscript_ section of the definition file with the arguments you pass to it. + Example: + Build the following definition file into an image: + Bootstrap: docker + From: ubuntu:trusty + + %post + apt-get install -y g++ + echo '#include <iostream>' > main.cpp + echo 'int main(int argc, char** argv){ std::cout << argc << " args for " << argv[0] << std::endl; }' >> main.cpp + g++ main.cpp -o myCoolApp + mv myCoolApp /usr/local/bin/myCoolApp + + %runscript + myCoolApp "$@ + singularity build my-container.sif example.def + +Then you can run your application via + + singularity run my-container.sif first_arg 2nd_arg + +Alternatively you can execute the container directly which is +equivalent: + + ./my-container.sif first_arg 2nd_arg + +With this you can even masquerade an application with a singularity +container as if it was an actual program by naming the container just +like the binary: + + mv my-container.sif myCoolAp + +### Use-cases + +One common use-case for containers is that you need an operating system +with a newer GLIBC version than what is available on Taurus. E.g., the +bullx Linux on Taurus used to be based on RHEL6 having a rather dated +GLIBC version 2.12, some binary-distributed applications didn't work on +that anymore. You can use one of our pre-made CentOS 7 container images +(`/scratch/singularity/centos7.img`) to circumvent this problem. 
+Example: + + $ singularity exec /scratch/singularity/centos7.img ldd --version + ldd (GNU libc) 2.17 diff --git a/twiki2md/root/WebHome/DataManagement.md b/twiki2md/root/WebHome/DataManagement.md new file mode 100644 index 000000000..de1f95297 --- /dev/null +++ b/twiki2md/root/WebHome/DataManagement.md @@ -0,0 +1,16 @@ +# HPC Data Management + +To efficiently handle different types of storage systems, please design +your data workflow according to characteristics, like I/O footprint +(bandwidth/IOPS) of the application, size of the data, (number of +files,) and duration of the storage. In general, the mechanisms of +so-called** [Workspaces](WorkSpaces)** are compulsory for all HPC users +to store data for a defined duration - depending on the requirements and +the storage system this time span might range from days to a few years. + +- [HPC file systems](FileSystems) +- [Intermediate Archive](IntermediateArchive) +- [Special data containers](Special data containers) +- [Move data between file systems](DataMover) +- [Move data to/from ZIH's file systems](ExportNodes) +- [Longterm Preservation for Research Data](PreservationResearchData) diff --git a/twiki2md/root/WebHome/FurtherDocumentation.md b/twiki2md/root/WebHome/FurtherDocumentation.md new file mode 100644 index 000000000..2e6586a43 --- /dev/null +++ b/twiki2md/root/WebHome/FurtherDocumentation.md @@ -0,0 +1,81 @@ +# Further Documentation + + + +## Libraries and Compiler + +- <http://www.intel.com/software/products/mkl/index.htm> +- <http://www.intel.com/software/products/ipp/index.htm> +- <http://www.ball-project.org/> +- <http://www.intel.com/software/products/compilers/> - Intel Compiler + Suite +- <http://www.pgroup.com/doc> - PGI Compiler +- <http://pathscale.com/ekopath.html> - PathScale Compilers + +## Tools + +- <http://www.allinea.com/downloads/userguide.pdf> - Allinea DDT + Manual +- <http://www.totalviewtech.com/support/documentation.html> - + Totalview Documentation +- <http://www.gnu.org/software/gdb/documentation/> - GNU Debugger +- <http://vampir-ng.de> - official homepage of Vampir, an outstanding + tool for performance analysis developed at ZIH. +- <http://www.fz-juelich.de/zam/kojak/> - homepage of KOJAK at the FZ + Jlich. Parts of this project are used by Vampirtrace. +- <http://www.intel.com/software/products/threading/index.htm> + +## OpenMP + +You will find a lot of information at the following web pages: + +- <http://www.openmp.org> +- <http://www.compunity.org> + +## MPI + +The following sites may be interesting: + +- <http://www.mcs.anl.gov/mpi/> - the MPI homepage. +- <http://www.mpi-forum.org/> - Message Passing Interface (MPI) Forum + Home Page +- <http://www.open-mpi.org/> - the dawn of a new standard for a more + fail-tolerant MPI. +- The manual for SGI-MPI (installed on Mars ) can be found at: + +<http://techpubs.sgi.com/library/manuals/3000/007-3773-003/pdf/007-3773-003.pdf> + +## SGI developer forum + +The web sites behind +<http://www.sgi.com/developers/resources/tech_pubs.html> are full of +most detailed information on SGI systems. Have a look onto the section +'Linux Publications'. You will be redirected to the public part of SGI's +technical publication repository. + +- Linux Application Tuning Guide +- Linux Programmer's Guide, The +- Linux Device Driver Programmer's Guide +- Linux Kernel Internals.... and more. 
+
+## Intel Itanium
+
+There is a lot of additional material regarding the Itanium CPU:
+
+- <http://www.intel.com/design/itanium/manuals/iiasdmanual.htm>
+- <http://www.intel.com/design/archives/processors/itanium/index.htm>
+- <http://www.intel.com/design/itanium2/documentation.htm>
+
+You will find the following manuals:
+
+- Intel Itanium Processor Floating-Point Software Assistance handler
+  (FPSWA)
+- Intel Itanium Architecture Software Developer's Manuals Volume 1:
+  Application Architecture
+- Intel Itanium Architecture Software Developer's Manuals Volume 2:
+  System Architecture
+- Intel Itanium Architecture Software Developer's Manuals Volume 3:
+  Instruction Set
+- Intel Itanium 2 Processor Reference Manual for Software Development
+  and Optimization
+- Itanium Architecture Assembly Language Reference Guide
diff --git a/twiki2md/root/WebHome/GPUProgramming.md b/twiki2md/root/WebHome/GPUProgramming.md
new file mode 100644
index 000000000..8a7c57a06
--- /dev/null
+++ b/twiki2md/root/WebHome/GPUProgramming.md
@@ -0,0 +1,52 @@
+# GPU Programming
+
+
+
+## Directive Based GPU Programming
+
+Directives are special compiler commands in your C/C++ or Fortran source
+code. They tell the compiler how to parallelize and offload work to a
+GPU. This section explains how to use this technique.
+
+### OpenACC
+
+OpenACC ([link to the website](http://www.openacc-standard.org)) is a
+directive based GPU programming model. It currently only supports NVIDIA
+GPUs as a target.
+
+Please use the following information as a start on OpenACC:
+
+- [Introduction](%ATTACHURL%/Dresden_OpenACC_Intro.pdf)
+
+OpenACC can be used with the PGI and CAPS compilers. For PGI please be
+sure to load version 13.4 or newer for full support for the NVIDIA Tesla
+K20x GPUs at ZIH.
+
+#### Using OpenACC with PGI compilers
+
+- For compilation, please add the compiler flag `-acc` to enable
+  OpenACC interpretation by the compiler.
+- `-Minfo` will tell you what the compiler is actually doing to your
+  code.
+- If you only want to use the created binary at ZIH resources, please
+  also add `-ta=nvidia:kepler`.
+- OpenACC Tutorial: [intro1.pdf](%ATTACHURL%/intro1.pdf),
+  [intro2.pdf](%ATTACHURL%/intro2.pdf).
+
+### HMPP
+
+HMPP is available from the CAPS compilers.
+
+## Native GPU Programming
+
+### CUDA
+
+Native CUDA ([link to website](http://www.nvidia.com/cuda)) programs can
+sometimes offer better performance. Please use the following slides as
+an introduction:
+
+- [Introduction to CUDA](%ATTACHURL%/Dresden_CUDA_Intro.pdf)
+- [Advanced Tuning for NVIDIA Kepler
+  GPUs](%ATTACHURL%/Dresden_AccelerateWithKepler.pdf)
+
+In order to compile an application with CUDA, use the `nvcc` compiler
+command.
diff --git a/twiki2md/root/WebHome/HPCDA.md b/twiki2md/root/WebHome/HPCDA.md
new file mode 100644
index 000000000..19151bd29
--- /dev/null
+++ b/twiki2md/root/WebHome/HPCDA.md
@@ -0,0 +1,79 @@
+# HPC for Data Analytics
+
+With the HPC-DA system, the TU Dresden provides infrastructure for
+High-Performance Computing and Data Analytics (HPC-DA) for German
+researchers for computing projects with a focus on one of the following
+areas:
+
+- machine learning scenarios for large systems
+- evaluation of various hardware settings for large machine learning
+  problems, including accelerator and compute node configuration and
+  memory technologies
+- processing of large amounts of data on highly parallel machine
+  learning infrastructure.
+
+Currently, we offer 25 million core hours of compute time per year for
+external computing projects.
Computing projects have a duration of up to one year +with the possibility of extensions, thus enabling projects to continue +seamlessly. Applications for regular projects on HPC-DA can be submitted +at any time via the \<a +href="<https://tu-dresden.de/zih/hochleistungsrechnen/zugang/hpc-da>" +target="new">online web-based submission\</a> and review system. The +reviews of the applications are carried out by experts in their +respective scientific fields. Applications are evaluated only according +to their scientific excellence. + +ZIH provides a portfolio of preinstalled applications and offers support +for software installation/configuration of project-specific +applications. In particular, we provide consulting services for all our +users, and advise researchers on using the resources in an efficient +way. + +\<img align="right" alt="HPC-DA Overview" +src="%ATTACHURL%/bandwidth.png" title="bandwidth.png" width="250" /> + +## Access + +- Application for access using this \<a class="TMLlink" + href="<https://tu-dresden.de/zih/hochleistungsrechnen/zugang/hpc-da>" + target="new">Online Web Form\</a> + +## Hardware Overview + +- [Nodes for machine learning (Power9)](Power9) +- [NVMe Storage](NvmeStorage) (2 PB) +- \<a href="HpcdaWarmArchive" title="Warm Archive">Warm archive\</a> + (10 PB) +- HPC nodes (x86) for DA (island 6) +- Compute nodes with high memory bandwidth: [AMD Rome + Nodes](RomeNodes) (island 7) + +Additional hardware: + +- [Multi-GPU-Cluster](AlphaCentauri) for projects of SCADS.AI + +## File Systems and Object Storage + +- Lustre +- BeeGFS +- Quobyte +- S3 + +## HOWTOS + +- [Get started with HPC-DA](GetStartedWithHPCDA) +- [IBM Power AI](PowerAI) +- [Work with Singularity Containers on Power9\<br />](Cloud) +- [TensorFlow on HPC-DA (native)](TensorFlow) +- \<a href="TensorFlowOnJupyterNotebook" target="\_blank">Tensorflow + on Jupyter notebook\</a> +- Create and run your own TensorFlow container for HPC-DA (Power9) +- [TensorFlow on x86](DeepLearning) +- \<a href="PyTorch" target="\_blank">PyTorch on HPC-DA (Power9)\</a> +- \<a href="Python" target="\_blank">Python on HPC-DA (Power9)\</a> +- [JupyterHub](JupyterHub) +- [R on HPC-DA (Power9)](DataAnalyticsWithR) +- [Big Data frameworks: Apache Spark, Apache Flink, Apache + Hadoop](BigDataFrameworks:ApacheSparkApacheFlinkApacheHadoop) + +-- Main.UlfMarkwardt - 2019-05-05 diff --git a/twiki2md/root/WebHome/HPCStorageConcept2019.md b/twiki2md/root/WebHome/HPCStorageConcept2019.md new file mode 100644 index 000000000..c73cc86e1 --- /dev/null +++ b/twiki2md/root/WebHome/HPCStorageConcept2019.md @@ -0,0 +1,67 @@ +# HPC Storage Changes 2019 + +## \<font face="Open Sans, sans-serif"> **Hardware changes require new approach** \</font> + +\<font face="Open Sans, sans-serif">At the moment we are preparing to +remove our old hardware from 2013. This comes with a shrinking of our +/scratch from 5 to 4 PB. At the same time we have now our "warm archive" +operational for HPC with a capacity of 5 PB for now. \</font> + +\<font face="Open Sans, sans-serif">The tool concept of "workspaces" is +common in a large number of HPC centers. The idea is to allocate a +workspace directory in a certain storage system - connected with an +expiry date. After a grace period the data is deleted automatically. The +validity of a workspace can be extended twice. 
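+
+A minimal sketch of this life cycle with the workspace command line
+tools (the file system name, workspace name, and durations are only
+examples; see the examples linked in the next section for details):
+
+    ws_allocate -F scratch my_run 30    # allocate workspace "my_run" on scratch for 30 days
+    ws_list                             # list your workspaces and their expiration dates
+    ws_extend -F scratch my_run 30      # extend the validity (possible twice)
+    ws_release -F scratch my_run        # release the workspace once the data is no longer needed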
+
+## **How to use workspaces?**
+
+We have prepared a few examples at
+<https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/WorkSpaces>
+
+- For transient data, allocate a workspace, run your job, remove the
+  data, and release the workspace from within your job file.
+- If you are working on a set of data for weeks, you might use
+  workspaces in scratch and share them with your group by setting the
+  file access attributes.
+- For mid-term storage (max. 3 years), use our "warm archive", which is
+  large but slow. It is available read-only on the compute hosts and
+  read-write on the login and export nodes. To move your data in, you
+  might want to use the [datamover nodes](DataMover).
+
+## Moving Data from /scratch and /lustre/ssd to your workspaces
+
+We are now mounting /lustre/ssd and /scratch read-only on the compute
+nodes. As soon as the non-workspace /scratch directories are mounted
+read-only on the login nodes as well, you won't be able to remove your
+old data from there in the usual way. So you will have to use the
+DataMover commands and ideally just move your data to your prepared
+workspace:
+
+    dtmv /scratch/p_myproject/some_data /scratch/ws/myuser-mynewworkspace
+    #or:
+    dtmv /scratch/p_myproject/some_data /warm_archive/ws/myuser-mynewworkspace
+
+Obsolete data can also be deleted like this:
+
+    dtrm -rf /scratch/p_myproject/some_old_data
+
+**%RED%At the end of the year we will delete all data on /scratch and
+/lustre/ssd outside the workspaces.%ENDCOLOR%**
+
+## **Data life cycle management**
+
+Please be aware: Data in workspaces will be deleted automatically after
+the grace period. This is especially true for the warm archive. If you
+want to keep your data for a longer time, please use our options for
+[long-term storage](PreservationResearchData).
+
+To help you with that, you can attach your email address for
+notification or simply create an ICAL entry for your calendar
+(...@tu-dresden.de only).
diff --git a/twiki2md/root/WebHome/Hardware.md b/twiki2md/root/WebHome/Hardware.md
new file mode 100644
index 000000000..bac841daf
--- /dev/null
+++ b/twiki2md/root/WebHome/Hardware.md
@@ -0,0 +1,17 @@
+# Hardware
+
+Here, you can find basic information about the hardware installed at
+ZIH. We try to keep this list up-to-date.
+ +- [BULL HPC-Cluster Taurus](HardwareTaurus) +- [SGI Ultraviolet (UV)](HardwareVenus) + +Hardware hosted by ZIH: + +Former systems + +- [PC-Farm Deimos](HardwareDeimos) +- [SGI Altix](HardwareAltix) +- [PC-Farm Atlas](HardwareAtlas) +- [PC-Cluster Triton](HardwareTriton) +- [HPC-Windows-Cluster Titan](HardwareTitan) diff --git a/twiki2md/root/WebHome/Impressum.md b/twiki2md/root/WebHome/Impressum.md new file mode 100644 index 000000000..9c34f3d81 --- /dev/null +++ b/twiki2md/root/WebHome/Impressum.md @@ -0,0 +1,14 @@ +Es gilt das \<a href="<https://tu-dresden.de/impressum>" rel="nofollow" +title="<https://tu-dresden.de/impressum>">Impressum der TU Dresden\</a> +mit folgenden nderungen: + +**Ansprechpartner/Betreiber:** + +Technische Universitt Dresden\<br />Zentrum fr Informationsdienste und +Hochleistungsrechnen\<br />01062 Dresden\<br />\<br />Tel.: +49 351 +463-40000\<br />Fax: +49 351 463-42328\<br />E-Mail: +<servicedesk@tu-dresden.de>\<br />\<br />**Konzeption, Technische +Umsetzung, Anbieter:**\<br />\<br />Technische Universitt Dresden\<br +/>Zentrum fr Informationsdienste und Hochleistungsrechnen\<br />Prof. +Dr. Wolfgang E. Nagel\<br />01062 Dresden\<br />\<br />Tel.: +49 351 +463-35450\<br />Fax: +49 351 463-37773\<br />E-Mail: <zih@tu-dresden.de> diff --git a/twiki2md/root/WebHome/JupyterHub.md b/twiki2md/root/WebHome/JupyterHub.md new file mode 100644 index 000000000..cccbdc07a --- /dev/null +++ b/twiki2md/root/WebHome/JupyterHub.md @@ -0,0 +1,374 @@ +# JupyterHub + +With our JupyterHub service we offer you now a quick and easy way to +work with jupyter notebooks on Taurus. + +Subpages: + +- [JupyterHub for Teaching (git-pull feature, quickstart links, direct + links to notebook files)](Compendium.JupyterHubForTeaching) + + + +## Disclaimer + +This service is provided "as-is", use at your own discretion. Please +understand that JupyterHub is a complex software system of which we are +not the developers and don't have any downstream support contracts for, +so we merely offer an installation of it but cannot give extensive +support in every case. + +## Access + +**%RED%NOTE:%ENDCOLOR%** This service is only available for users with +an active HPC project. See [here](Access) how to apply for an HPC +project. + +JupyterHub is available here:\<br /> +<https://taurus.hrsk.tu-dresden.de/jupyter> + +## Start a session + +Start a new session by clicking on the \<img alt="" height="24" +src="%ATTACHURL%/start_my_server.png" /> button. + +A form opens up where you can customize your session. Our simple form +offers you the most important settings to start quickly. + +\<a href="%ATTACHURL%/simple_form.png">\<img alt="session form" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/simple_form.png>" +style="border: 1px solid #888;" title="simple form" width="400" />\</a> + +For advanced users we have an extended form where you can change many +settings. 
You can: + +- modify Slurm parameters to your needs ( [more about + Slurm](Compendium.Slurm)) +- assign your session to a project or reservation +- load modules from the [LMOD module + system](Compendium.RuntimeEnvironment) +- choose a different standard environment (in preparation for future + software updates or testing additional features) + +\<a href="%ATTACHURL%/advanced_form_nov2019.png">\<img alt="session +form" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/advanced_form_nov2019.png>" +style="border: 1px solid #888;" title="advanced form" width="400" +/>\</a> + +You can save your own configurations as additional presets. Those are +saved in your browser and are lost if you delete your browsing data. Use +the import/export feature (available through the button) to save your +presets in text files. + +Note: the \<a +href`"https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/AlphaCentauri" target="_blank"> ==alpha=` +\</a> partition is available only in the extended form. + +## Applications + +You can choose between JupyterLab or the classic notebook app. + +### JupyterLab + +\<a href="%ATTACHURL%/jupyterlab_app.png">\<img alt="jupyterlab app" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/jupyterlab_app.png>" +style="border: 1px solid #888;" title="JupyterLab overview" width="400" +/>\</a> + +The main workspace is used for multiple notebooks, consoles or +terminals. Those documents are organized with tabs and a very versatile +split screen feature. On the left side of the screen you can open +several views: + +- file manager +- controller for running kernels and terminals +- overview of commands and settings +- details about selected notebook cell +- list of open tabs + +### Classic notebook + +\<a href="%ATTACHURL%/jupyter_notebook_app_filebrowser.png">\<img +alt="filebrowser in jupyter notebook server" width="400" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/jupyter_notebook_app_filebrowser.png>" +style="border: 1px solid #888;" title="Classic notebook (file browser)" +/>\</a> + +\<a href="%ATTACHURL%/jupyter_notebook_example_matplotlib.png">\<img +alt="jupyter_notebook_example_matplotlib" width="400" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/jupyter_notebook_example_matplotlib.png>" +style="border: 1px solid #888;" title="Classic notebook (matplotlib +demo)" />\</a> + +Initially you will get a list of your home directory. You can open +existing notebooks or files by clicking on them. + +Above the table on the right side is the "New â·" button which lets you +create new notebooks, files, directories or terminals. + +## The notebook + +In JupyterHub you can create scripts in notebooks. +Notebooks are programs which are split in multiple logical code blocks. +In between those code blocks you can insert text blocks for +documentation and each block can be executed individually. Each notebook +is paired with a kernel which runs the code. We currently offer one for +Python, C++, MATLAB and R. + +## Stop a session + +It's good practise to stop your session once your work is done. This +releases resources for other users and your quota is less charged. If +you just log out or close the window your server continues running and +will not stop until the Slurm job runtime hits the limit (usually 8 +hours). + +At first you have to open the JupyterHub control panel. + +**JupyterLab**: Open the file menu and then click on Logout. 
You can +also click on "Hub Control Panel" which opens the control panel in a new +tab instead. + +\<a href="%ATTACHURL%/jupyterlab_logout.png">\<img alt="" height="400" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/jupyterlab_logout.png>" +style="border: 1px solid #888;" title="JupyterLab logout button"/>\</a> + +**Classic notebook**: Click on the control panel button on the top right +of your screen. + +\<img alt="" src="%ATTACHURL%/notebook_app_control_panel_btn.png" +style="border: 1px solid #888;" title="Classic notebook (control panel +button)" /> + +Now you are back on the JupyterHub page and you can stop your server by +clicking on \<img alt="" height="24" +src="%ATTACHURL%/stop_my_server.png" title="Stop button" />. + +## Error handling + +We want to explain some errors that you might face sooner or later. If +you need help open a ticket at HPC support. + +### Error while starting a session + +\<a href="%ATTACHURL%/error_batch_job_submission_failed.png">\<img +alt="" width="400" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/error_batch_job_submission_failed.png>" +style="border: 1px solid #888;" title="Error message: Batch job +submission failed."/>\</a> + +This message often appears instantly if your Slurm parameters are not +valid. Please check those settings against the available hardware. +Useful pages for valid Slurm parameters: + +- [Slurm batch system (Taurus)](SystemTaurus#Batch_System) +- [General information how to use Slurm](Slurm) + +### Error message in JupyterLab + +\<a href="%ATTACHURL%/jupyterlab_error_directory_not_found.png">\<img +alt="" width="400" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/jupyterlab_error_directory_not_found.png>" +style="border: 1px solid #888;" title="Error message: Directory not +found"/>\</a> + +If the connection to your notebook server unexpectedly breaks you maybe +will get this error message. +Sometimes your notebook server might hit a Slurm or hardware limit and +gets killed. Then usually the logfile of the corresponding Slurm job +might contain useful information. These logfiles are located in your +home directory and have the name "jupyter-session-\<jobid>.log". + +------------------------------------------------------------------------ + +## Advanced tips + +### Standard environments + +The default python kernel uses conda environments based on the [Watson +Machine Learning Community Edition (formerly +PowerAI)](https://developer.ibm.com/linuxonpower/deep-learning-powerai/) +package suite. You can open a list with all included packages of the +exact standard environment through the spawner form: + +\<img alt="environment_package_list.png" +src="%ATTACHURL%/environment_package_list.png" style="border: 1px solid +#888;" title="JupyterHub environment package list" /> + +This list shows all packages of the currently selected conda +environment. This depends on your settings for partition (cpu +architecture) and standard environment. + +There are three standard environments: + +- production, +- test, +- python-env-python3.8.6. + +**Python-env-python3.8.6**virtual environment can be used for all x86 +partitions(gpu2, alpha, etc). It gives the opportunity to create a user +kernel with the help of a python environment. 
+ +Here's a short list of some included software: + +| | | | +|------------|-----------|--------| +| | generic\* | ml | +| Python | 3.6.10 | 3.6.10 | +| R\*\* | 3.6.2 | 3.6.0 | +| WML CE | 1.7.0 | 1.7.0 | +| PyTorch | 1.3.1 | 1.3.1 | +| TensorFlow | 2.1.1 | 2.1.1 | +| Keras | 2.3.1 | 2.3.1 | +| numpy | 1.17.5 | 1.17.4 | +| matplotlib | 3.3.1 | 3.0.3 | + +\* generic = all partitions except ml + +\*\* R is loaded from the [module system](Compendium.RuntimeEnvironment) + +### Creating and using your own environment + +Interactive code interpreters which are used by Jupyter Notebooks are +called kernels. +Creating and using your own kernel has the benefit that you can install +your own preferred python packages and use them in your notebooks. + +We currently have two different architectures at Taurus. Build your +kernel environment on the **same architecture** that you want to use +later on with the kernel. In the examples below we use the name +"my-kernel" for our user kernel. We recommend to prefix your kernels +with keywords like "intel", "ibm", "ml", "venv", "conda". This way you +can later recognize easier how you built the kernel and on which +hardware it will work. + +**Intel nodes** (e.g. haswell, gpu2): + + srun --pty -n 1 -c 2 --mem-per-cpu 2583 -t 08:00:00 bash -l + +If you don't need Sandy Bridge support for your kernel you can create +your kernel on partition 'haswell'. + +**Power nodes** (ml partition): + + srun --pty -p ml -n 1 -c 2 --mem-per-cpu 5772 -t 08:00:00 bash -l + +Create a virtual environment in your home directory. You can decide +between python virtualenvs or conda environments. + +<span class="twiki-macro RED"></span> **Note** <span +class="twiki-macro ENDCOLOR"></span>: Please take in mind that Python +venv is the preferred way to create a Python virtual environment. + +#### Python virtualenv + + $ module load Python/3.8.6-GCCcore-10.2.0 + + $ mkdir user-kernel #please use Workspaces! + + $ cd user-kernel + + $ virtualenv --system-site-packages my-kernel + Using base prefix '/sw/installed/Python/3.6.6-fosscuda-2018b' + New python executable in .../user-kernel/my-kernel/bin/python + Installing setuptools, pip, wheel...done. + + $ source my-kernel/bin/activate + + (my-kernel) $ pip install ipykernel + Collecting ipykernel + ... + Successfully installed ... ipykernel-5.1.0 ipython-7.5.0 ... + + (my-kernel) $ pip install --upgrade pip + + (my-kernel) $ python -m ipykernel install --user --name my-kernel --display-name="my kernel" + Installed kernelspec my-kernel in .../.local/share/jupyter/kernels/my-kernel + + [now install additional packages for your notebooks] + + (my-kernel) $ deactivate + +#### Conda environment + +Load the needed module for Intel nodes + + $ module load Anaconda3 + +... or for IBM nodes (ml partition): + + $ module load PythonAnaconda + +Continue with environment creation, package installation and kernel +registration: + + $ mkdir user-kernel #please use Workspaces! + + $ conda create --prefix /home/<USER>/user-kernel/my-kernel python=3.6 + Collecting package metadata: done + Solving environment: done + [...] + + $ conda activate /home/<USER>/user-kernel/my-kernel + + $ conda install ipykernel + Collecting package metadata: done + Solving environment: done + [...] + + $ python -m ipykernel install --user --name my-kernel --display-name="my kernel" + Installed kernelspec my-kernel in [...] + + [now install additional packages for your notebooks] + + $ conda deactivate + +Now you can start a new session and your kernel should be available. 
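+
+To check which kernels are currently registered for your user, or to
+remove a kernel registration you no longer need, the following sketch
+might help (`my-kernel` is the example name from above; run it with the
+corresponding environment activated or module loaded):
+
+    $ jupyter kernelspec list                          # show registered kernels and their paths
+    $ rm -r ~/.local/share/jupyter/kernels/my-kernel   # unregister "my-kernel"; the venv/conda environment itself is kept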
+ +\*In JupyterLab\*: + +Your kernels are listed on the launcher page: + +\<a href="%ATTACHURL%/user-kernel_in_jupyterlab_launcher.png">\<img +alt="jupyterlab_app.png" height="410" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/user-kernel_in_jupyterlab_launcher.png>" +style="border: 1px solid #888;" title="JupyterLab kernel launcher +list"/>\</a> + +You can switch kernels of existing notebooks in the menu: + +\<a href="%ATTACHURL%/jupyterlab_change_kernel.png">\<img +alt="jupyterlab_app.png" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/jupyterlab_change_kernel.png>" +style="border: 1px solid #888;" title="JupyterLab kernel switch"/>\</a> + +**In classic notebook app**: + +Your kernel is listed in the New menu: + +\<a href="%ATTACHURL%/user-kernel_in_jupyter_notebook.png">\<img +alt="jupyterlab_app.png" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/user-kernel_in_jupyter_notebook.png>" +style="border: 1px solid #888;" title="Classic notebook (create notebook +with new kernel)"/>\</a> + +You can switch kernels of existing notebooks in the kernel menu: + +\<a href="%ATTACHURL%/switch_kernel_in_jupyter_notebook.png">\<img +alt="jupyterlab_app.png" +src="<https://doc.zih.tu-dresden.de/hpc-wiki/pub/Compendium/JupyterHub/switch_kernel_in_jupyter_notebook.png>" +style="border: 1px solid #888;" title="Classic notebook (kernel +switch)"/>\</a> + +**Note**: Both python venv and conda virtual environments will be +mention in the same list. + +### Loading modules + +You have now the option to preload modules from the LMOD module +system. +Select multiple modules that will be preloaded before your notebook +server starts. The list of available modules depends on the module +environment you want to start the session in (scs5 or ml). The right +module environment will be chosen by your selected partition. diff --git a/twiki2md/root/WebHome/Login.md b/twiki2md/root/WebHome/Login.md new file mode 100644 index 000000000..4b23d7c11 --- /dev/null +++ b/twiki2md/root/WebHome/Login.md @@ -0,0 +1,81 @@ +# Login to the High Performance Computers + + + +The best way to login to the linux machines is via ssh. From a Linux +console, the command syntax is `ssh user@host`. The additional option +\<span>-X\</span>C enables X11 forwarding for graphical applications +(the -C enables compression which usually improves usability in this +case). + +The following table gives an overview of all clusters. + +| Hostname | Description | +|:--------------------------|:--------------------| +| taurus.hrsk.tu-dresden.de | BULL system - SLURM | + +**Attention:** For security reasons, this port is only accessible for +hosts within the domains of TU Dresden. Guests from other research +institutes can use the +[VPN](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn) +gateway of the ZIH. Information on these topics can be found on our web +pages <http://www.tu-dresden.de/zih>. + +## Access from a Windows workstation + +We suggest Windows users use MobaXTerm. ( [see details +here](MobaXterm)). Benefits of MobaXTerm include: + +- easy to use +- graphical user interface +- file transfer via drag and drop + +## Access from a Linux workstation + +### SSH access + +**Attention:** Please use an up-to-date SSH client. 
The login nodes +accept the following encryption algorithms: +*aes128-ctr,aes192-ctr,aes256-ctr,<aes128-gcm@openssh.com>,<aes256-gcm@openssh.com>,<chacha20-poly1305@openssh.com>,<chacha20-poly1305@openssh.com>* + +If your workstation is within the campus network, you can connect to the +HPC login servers directly, e.g., for Taurus: + + ssh <zih-login>@taurus.hrsk.tu-dresden.de + +If you connect for the fist time, the client will ask you to verify the +host by the fingerprint: + +``` bash +user@pc:~# ssh <zih-login>@taurus.hrsk.tu-dresden.de +The authenticity of host 'taurus.hrsk.tu-dresden.de (141.30.73.104)' can't be established. +RSA key fingerprint is SHA256:HjpVeymTpk0rqoc8Yvyc8d9KXQ/p2K0R8TJ27aFnIL8. +Are you sure you want to continue connecting (yes/no)? +``` + +Compare this fingerprint with the fingerprints in the table below. If +they match you can type "yes". + +<span class="twiki-macro TABLE" caption="Fingerprints"></span> + +| Host | Fingerprint | +|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| tauruslogin3.hrsk.tu-dresden.de\<br />tauruslogin4.hrsk.tu-dresden.de\<br />tauruslogin5.hrsk.tu-dresden.de\<br />tauruslogin6.hrsk.tu-dresden.de\<br />taurus.hrsk.tu-dresden.de | SHA256:/M1lW1KTOlxj8UFZJS4tLi+8TyndcDqrZfLGX7KAU8s (RSA)\<br />SHA256:PeCpW/gAFLvHDzTP2Rb93NxD+rpUsyQY8WebjQC7kz0 (ECDSA)\<br />SHA256:nNxjtCny1kB0N0epHaOPeY1YFd0ri2Dvt2CK7rOGlXg (ED25519)\<br /> or:\<br /> MD5:b8:e1:21:ed:38:1a:ba:1a:5b:2b:bc:35:31:62:21:49 (RSA)\<br />MD5:47:7e:24:46:ab:30:59:2c:1f:e8:fd:37:2a:5d:ee:25 (ECDSA)\<br />MD5:7c:0c:2b:8b:83:21:b2:08:19:93:6d:03:80:76:8a:7b (ED25519) | +| taurusexport.hrsk.tu-dresden.de\<br /> taurusexport3.hrsk.tu-dresden.de\<br /> taurusexport4.hrsk.tu-dresden.de | SHA256:Qjg79R+5x8jlyHhLBZYht599vRk+SujnG1yT1l2dYUM (RSA)\<br />SHA256:qXTZnZMvdqTs3LziA12T1wkhNcFqTHe59fbbU67Qw3g (ECDSA)\<br />SHA256:jxWiddvDe0E6kpH55PHKF0AaBg/dQLefQaQZ2P4mb3o (ED25519)\<br /> or:\<br />MD5:1e:4c:2d:81:ee:58:1b:d1:3c:0a:18:c4:f7:0b:23:20 (RSA)\<br />MD5:96:62:c6:80:a8:1f:34:64:86:f3:cf:c5:9b:cd:af:da (ECDSA)\<br />MD5:fe:0a:d2:46:10:4a:08:40:fd:e1:99:b7:f2:06:4f:bc (ED25519) | + +From outside the TUD campus network + +Use a VPN (virtual private network) to enter the campus network, which +allows you to connect directly to the HPC login servers. + +For more information on our VPN and how to set it up, please visit the +corresponding [ZIH service catalogue +page](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn). + +### Access using JupyterHub + +A JupyterHub installation offering IPython Notebook is available under: + +<https://taurus.hrsk.tu-dresden.de/jupyter> + +See the documentation under [JupyterHub](JupyterHub). 
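+
+### SSH client configuration
+
+If you log in frequently, you can store the connection options described
+above (X11 forwarding, compression) in your OpenSSH client
+configuration. A minimal sketch - the host alias is arbitrary and
+`<zih-login>` is a placeholder for your login name:
+
+    # ~/.ssh/config
+    Host taurus
+        HostName taurus.hrsk.tu-dresden.de
+        User <zih-login>
+        ForwardX11 yes
+        Compression yes
+
+Afterwards, `ssh taurus` is enough to open a connection.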
diff --git a/twiki2md/root/WebHome/MachineLearning.md b/twiki2md/root/WebHome/MachineLearning.md new file mode 100644 index 000000000..0436dc670 --- /dev/null +++ b/twiki2md/root/WebHome/MachineLearning.md @@ -0,0 +1,59 @@ +## Machine Learning + +On the machine learning nodes, you can use the tools from [IBM Power +AI](PowerAI). + +### Interactive Session Examples + +#### Tensorflow-Test + + tauruslogin6 :~> srun -p ml --gres=gpu:1 -n 1 --pty --mem-per-cpu=10000 bash + srun: job 4374195 queued and waiting for resources + srun: job 4374195 has been allocated resources + taurusml22 :~> ANACONDA2_INSTALL_PATH='/opt/anaconda2' + taurusml22 :~> ANACONDA3_INSTALL_PATH='/opt/anaconda3' + taurusml22 :~> export PATH=$ANACONDA3_INSTALL_PATH/bin:$PATH + taurusml22 :~> source /opt/DL/tensorflow/bin/tensorflow-activate + taurusml22 :~> tensorflow-test + Basic test of tensorflow - A Hello World!!!... + + #or: + taurusml22 :~> module load TensorFlow/1.10.0-PythonAnaconda-3.6 + +Or to use the whole node: `--gres=gpu:6 --exclusive --pty` + +##### In Singularity container: + + rotscher@tauruslogin6:~> srun -p ml --gres=gpu:6 --pty bash + [rotscher@taurusml22 ~]$ singularity shell --nv /scratch/singularity/powerai-1.5.3-all-ubuntu16.04-py3.img + Singularity powerai-1.5.3-all-ubuntu16.04-py3.img:~> export PATH=/opt/anaconda3/bin:$PATH + Singularity powerai-1.5.3-all-ubuntu16.04-py3.img:~> . /opt/DL/tensorflow/bin/tensorflow-activate + Singularity powerai-1.5.3-all-ubuntu16.04-py3.img:~> tensorflow-test + +### Additional libraries + +The following NVIDIA libraries are available on all nodes: + +| | | +|-------|---------------------------------------| +| NCCL | /usr/local/cuda/targets/ppc64le-linux | +| cuDNN | /usr/local/cuda/targets/ppc64le-linux | + +Note: For optimal NCCL performance it is recommended to set the +**NCCL_MIN_NRINGS** environment variable during execution. You can try +different values but 4 should be a pretty good starting point. + + export NCCL_MIN_NRINGS=4 + +\<span style="color: #222222; font-size: 1.385em;">HPC\</span> + +The following HPC related software is installed on all nodes: + +| | | +|------------------|------------------------| +| IBM Spectrum MPI | /opt/ibm/spectrum_mpi/ | +| PGI compiler | /opt/pgi/ | +| IBM XLC Compiler | /opt/ibm/xlC/ | +| IBM XLF Compiler | /opt/ibm/xlf/ | +| IBM ESSL | /opt/ibmmath/essl/ | +| IBM PESSL | /opt/ibmmath/pessl/ | diff --git a/twiki2md/root/WebHome/MigrateToAtlas.md b/twiki2md/root/WebHome/MigrateToAtlas.md new file mode 100644 index 000000000..fa53026bc --- /dev/null +++ b/twiki2md/root/WebHome/MigrateToAtlas.md @@ -0,0 +1,115 @@ +# Migration to Atlas + + Atlas is a different machine than +Deimos, please have a look at the table: + +| | | | +|---------------------------------------------------|------------|-----------| +| | **Deimos** | **Atlas** | +| **number of hosts** | 584 | 92 | +| **cores per host** | 2...8 | 64 | +| **memory \[GB\] per host** | 8...64 | 64..512 | +| **example benchmark: SciMark (higher is better)** | 655 | 584 | + +A single thread on Atlas runs with a very poor performance in comparison +with the 6 year old Deimos. The reason for this is that the AMD CPU +codenamed "Bulldozer" is designed for multi-threaded use. + +## Modules + +We have grouped the module definitions for a better overview. This is +only for displaying the available modules, not for loading a module. All +available modules can be made visible with `module load ALL; module av` +. 
For more details, please see [module +groups.](RuntimeEnvironment#Module_Groups) + +#BatchSystem + +## Batch System + +Although we are running LSF as batch system there are certain +differences to the older versions known from Deimos and Mars. + +The most important changes are: + +- Specify maximum runtime instead of a queue (`-W <hh:mm>`). +- Specify needed memory (per process in MByte) with + `-M <memory per process in MByte>`, the default is 300 MB, e.g. + `-M 2000`. + +| | | | +|-----------------------|--------|------------------------------------------------------| +| Hosts on Atlas | number | per process/core user memory limit in MB (-M option) | +| nodes with 64 GB RAM | 48 | 940 | +| nodes with 128 GB RAM | 24 | 1950 | +| nodes with 256 GB RAM | 12 | 4000 | +| nodes with 512 GB RAM | 8 | 8050 | + +- Jobs with a job runtime greater than 72 hours (jobs that will run in + the queue `long`) will be collected over the day and scheduled in a + time window in accordance with their priority. +- Interactive Jobs with X11 tunneling need an additional option `-XF` + to work (`bsub -Is -XF -n <N> -W <hh:mm> -M <MEM> bash`). +- The load of the system can be seen with `lsfview` and `lsfnodestat`. + +Atlas is designed as a high-throughput machine. With the large compute +nodes you have to be more precise in your resource requests. + +- In ninety nine percent of the cases it is enough when you specify + your processor requirements with `-n <n>` and your memory + requirements with `-M <memory per process in MByte>`. +- Please use \<span class="WYSIWYG_TT">-x\</span>("exclusive use of a + hosts") only with care and when you really need it. + - The option `-x` in combination with `-n 1` leads to an + "efficiency" of only 1.5% - in contrast with 50% on the single + socket nodes at Deimos. + - You will be charged for the whole blocked host(s) within your + CPU hours budget. + - Don't use `-x` for memory reasons, please use `-M` instead. +- Please use `-M <memory per process in MByte>` to specify your memory + requirements per process. +- Please don't use `-R "span[hosts=1]"` or `-R "span[ptile=<n>]"` or + any other \<span class="WYSIWYG_TT">-R "..."\</span>option, the + batch system is smart enough to select the best hosts in accordance + with your processor and memory requirements. + - Jobs with a processor requirement ≤ 64 will always be scheduled + on one node. + - Larger jobs will use just as many hosts as needed, e.g. 160 + processes will be scheduled on three hosts. + +For more details, please see the pages on [LSF](PlatformLSF). + +## Software + +Depending on the applications we have seen a broad variety of different +performances running binaries from Deimos on Atlas. Some can run without +touching, with others we have seen significant degradations so that a +re-compile made sense. + +### Applications + +As a default, all applications provided by ZIH should run an atlas +without problems. Please [tell us](mailto:hpcsupport@zih.tu-dresden.de) +if you are missing your application or experience severe performance +degradation. Please include "Atlas" in your subject. + +### Development + +From the benchmarking point of view, the best compiler for the AMD +Bulldozer processor, the best compiler comes from the Open64 suite. For +convenience, other compilers are installed, Intel 12.1 shows good +results as well. Please check the best compiler flags at [this +overview](http://developer.amd.com/Assets/CompilerOptQuickRef-62004200.pdf). 
+For best performance, please use [ACML](Libraries#ACML) as BLAS/LAPACK +library. + +### MPI parallel applications + +Please note the more convenient syntax on Atlas. Therefore, please use a +command like + + bsub -W 2:00 -M 200 -n 8 mpirun a.out + +to submit your MPI parallel applications. + +- Set DENYTOPICVIEW = WikiGuest diff --git a/twiki2md/root/WebHome/NoIBJobs.md b/twiki2md/root/WebHome/NoIBJobs.md new file mode 100644 index 000000000..580f7af87 --- /dev/null +++ b/twiki2md/root/WebHome/NoIBJobs.md @@ -0,0 +1,48 @@ +# Run jobs on a nodes without Infiniband + +## **Please be aware:** + +- These hints are meant only for the downtime of the IB fabric or + parts of it. Do not use this setup in a normal, healthy system! +- This setup must not run by jobs producing large amounts of output + data! +- MPI jobs over multiple nodes can not run. +- Jobs using /scratch or /lustre/ssd can not run.\<hr /> + +At the moment when parts of the IB stop we will start batch system +plugins to parse for this batch system option: +\<span>--comment=NO_IB\</span> . Jobs with this option set can run on +nodes without Infiniband access if (and only if) they have set the +\<span>--tmp\</span>-option as well: + +*From the Slurm documentation:* + +`--tmp` = Specify a minimum amount of temporary disk space per node. +Default units are megabytes unless the SchedulerParameters configuration +parameter includes the "default_gbytes" option for gigabytes. Different +units can be specified using the suffix \[K\|M\|G\|T\]. This option +applies to job allocations. + +Keep in mind: Since the scratch file system are not available and the +project file system is read-only mounted at the compute nodes you have +to work in /tmp. + +A simple job script should do this: + +- create a temporary directory on the compute node in `/tmp` and go + there\<span class="WYSIWYG_TT">\<br />\</span> +- start the application (under /sw/ or /projects/)using input data + from somewhere in the project file system +- archive and transfer the results to some global location + +<!-- --> + + #SBATCH --comment=NO_IB + #SBATCH --tmp 2G + MYTEMP=/tmp/$JOBID + mkdir $MYTEMP; + cd $MYTEMP + <path_to_binary>/myapp < <path_to_input_data> > ./$JOBID_out + # tar if it makes sense! + rsync -a $MYTEMP taurusexport3:<path_to_output_data>/ + rm -rf $MYTEMP diff --git a/twiki2md/root/WebHome/PreservationResearchData.md b/twiki2md/root/WebHome/PreservationResearchData.md new file mode 100644 index 000000000..d99a4e058 --- /dev/null +++ b/twiki2md/root/WebHome/PreservationResearchData.md @@ -0,0 +1,112 @@ +# Longterm Preservation for Research Data + + + +### Why should research data be preserved? + +There are several reasons. On the one hand, research data should be +preserved to make the results reproducible. On the other hand research +data could be used a second time for investigating another question. In +the latter case persistent identifiers (like DOI) are needed to make +these data findable and citable. In both cases it is important to add +meta-data to the data. + +### Which research data should be preserved? + +Since large quantities of data are nowadays produced it is not possible +to store everything. The researcher needs to decide which data are worth +and important to keep. 
+ +In case these data come from simulations, there are two possibilities: 1 +Storing the result of the simulations 1 Storing the software and the +input-values + +Which of these possibilities is preferable depends on the time the +simulations need and on the size of the result of the calculations. Here +one needs to estimate, which possibility is cheaper. + +**This is, what DFG says** (translated from +<http://www.dfg.de/download/pdf/foerderung/programme/lis/ua_inf_empfehlungen_200901.pdf>, +page 2): + +*Primary research data are data, which were created in the course* *of +studies of sources, experiments, measurements or surveys. They are the* +*basis of scholarly publications*. *The definition of primary research +data depends on the subject*. *Each community of researchers should +decide by itself, if raw data are* *already primary research data or at +which degree of aggregation data* *should be preserved. Further it +should be agreed upon the granularity* *of research data: how many data +yield one set of data, which will be* *given a persistent identifier*. + +### Why should I add Meta-Data to my data? + +Many researchers think, that adding meta-data is time-consuming and +senseless but that isn't true. On the contrary, adding meta-data is very +important, since they should enable other researchers to know, how and +in which circumstances these data are created, in which format they are +saved, and which software in which version is needed to view the data, +and so on, so that other researchers can reproduce these data or use +them for new investigations. Last but not least meta-data should enable +you in ten years time to know what your data describe, which you created +such a long time ago. + +### What are Meta-Data? + +Meta-data means data about data. Meta-data are information about the +stored file. There can be administrative meta-data, descriptive +meta-data, technical meta-data and so on. Often meta-data are stored in +XML-format but free text is also possible. There are some meta-data +standards like [Dublin Core](http://dublincore.org/) or +[LMER](http://www.dnb.de/EN/Standardisierung/LMER/lmer_node.html). Below +are some examples: + +- possible meta-data for a book would be: + - Title + - Author + - Publisher + - Publication year + - ISBN + +<!-- --> + +- possible meta-data for an electronically saved image would be: + - resolution of the image + - information about the colour depth of the picture + - file format (jpg or tiff or ...) + - file size + - how was this image created (digital camera, scanner, ...) + - description of what the image shows + - creation date of the picture + - name of the person who made the picture + +<!-- --> + +- meta-data for the result of a calculation/simulation could be: + - file format + - file size + - input data + - which software in which version was used to calculate the + result/to do the simulation + - configuration of the software + - date of the calculation/simulation (start/end or start/duration) + - computer on which the calculation/simulation was done + - name of the person who submitted the calculation/simulation + - description of what was calculated/simulated + +### Where can I get more information about management of research data? + +Got to + +- <http://www.forschungsdaten.org/> (german version) or +- <http://www.forschungsdaten.org/en/> (english version) + +to find more information about managing research data. + +### I want to store my research data at ZIH. How can I do that? 
+ +Longterm preservation of research data is under construction at ZIH and +in a testing phase. Nevertheless you can already use the archiving +service. \<br /> If you would like to become a test user, please write +an E-Mail to Dr. Klaus Khler (klaus.koehler \[at\] tu-dresden.de). + +-- Main.DanielaKoudela - 2012-03-26 diff --git a/twiki2md/root/WebHome/RuntimeEnvironment.md b/twiki2md/root/WebHome/RuntimeEnvironment.md new file mode 100644 index 000000000..4c350b33c --- /dev/null +++ b/twiki2md/root/WebHome/RuntimeEnvironment.md @@ -0,0 +1,200 @@ +# Runtime Environment + + Make sure you know how to work +with a Linux system. Documentations and tutorials can be easily found on +the internet or in your library. + +#AnchorModule + +## Modules + +To allow the user to switch between different versions of installed +programs and libraries we use a *module concept*. A module is a user +interface that provides utilities for the dynamic modification of a +user's environment, i.e., users do not have to manually modify their +environment variables ( `PATH` , `LD_LIBRARY_PATH`, ...) to access the +compilers, loader, libraries, and utilities. + +For all applications, tools, libraries etc. the correct environment can +be easily set by e.g. `module load Mathematica`. If several versions are +installed they can be chosen like `module load MATLAB/2019b`. A list of +all modules shows `module avail`. Other important commands are: + +| Command | Description | +|:------------------------------|:-----------------------------------------------------------------| +| `module help` | show all module options | +| `module list` | list all user-installed modules | +| `module purge` | remove all user-installed modules | +| `module avail` | list all available modules | +| `module spider` | search for modules across all environments, can take a parameter | +| `module load <modname>` | load module `modname` | +| `module unload <modname>` | unloads module `modname` | +| `module switch <mod1> <mod2>` | unload module `mod1` ; load module `mod2` | + +Module files are ordered by their topic on our HPC systems. By default, +with `module av` you will see all available module files and topics. If +you just wish to see the installed versions of a certain module, you can +use `module av softwarename` and it will display the available versions +of `softwarename` only. + +### Lmod: An Alternative Module Implementation + +Historically, the module command on our HPC systems has been provided by +the rather dated *Environment Modules* software which was first +introduced in 1991. As of late 2016, we also offer the new and improved +[LMOD](https://www.tacc.utexas.edu/research-development/tacc-projects/lmod) +as an alternative. It has a handful of advantages over the old Modules +implementation: + +- all modulefiles are cached, which especially speeds up tab + completion with bash +- sane version ordering (9.0 \< 10.0) +- advanced version requirement functions (atleast, between, latest) +- auto-swapping of modules (if a different version was already loaded) +- save/auto-restore of loaded module sets (module save) +- multiple language support +- properties, hooks, ... +- depends_on() function for automatic dependency resolution with + reference counting + +### Module Environments + +On Taurus, there exist different module environments, each containing a +set of software modules. They are activated via the meta module +**modenv** which has different versions, one of which is loaded by +default. 
You can switch between them by simply loading the desired +modenv-version, e.g.: + + module load modenv/ml + +| | | | +|--------------|------------------------------------------------------------------------|---------| +| modenv/scs5 | SCS5 software | default | +| modenv/ml | HPC-DA software (for use on the "ml" partition) | | +| modenv/hiera | Hierarchical module tree (for use on the "romeo" and "gpu3" partition) | | + +The old modules (pre-SCS5) are still available after loading +**modenv**/**classic**, however, due to changes in the libraries of the +operating system, it is not guaranteed that they still work under SCS5. +Please don't use modenv/classic if you do not absolutely have to. Most +software is available under modenv/scs5, too, just be aware of the +possibly different spelling (case-sensitivity). + +You can use **module spider \<modname>** to search for a specific +software in all modenv environments. It will also display information on +how to load a found module when giving a precise module (with version) +as the parameter. + +Also see the information under \<a href="SCS5Software" +title="SCS5Software">SCS5Software\</a>. + +### Per-Architecture Builds + +Since we have a heterogenous cluster, we do individual builds of some of +the software for each architecture present. This ensures that, no matter +what partition the software runs on, a build optimized for the host +architecture is used automatically. This is achieved by having +`/sw/installed` symlinked to different directories on the compute nodes. + +However, not every module will be available for each node type or +partition. Especially when introducing new hardware to the cluster, we +do not want to rebuild all of the older module versions and in some +cases cannot fall-back to a more generic build either. That's why we +provide the script: **ml_arch_avail** that displays the availability of +modules for the different node architectures. + +E.g.: + + $ ml_arch_avail CP2K + CP2K/6.1-foss-2019a: haswell, rome + CP2K/5.1-intel-2018a: sandy, haswell + CP2K/6.1-foss-2019a-spglib: haswell, rome + CP2K/6.1-intel-2018a: sandy, haswell + CP2K/6.1-intel-2018a-spglib: haswell + +shows all modules that match on CP2K, and their respective availability. +Note that this will not work for meta-modules that do not have an +installation directory (like some toolchain modules). + +#AnchorPrivateModule + +### Private User Module Files + +Private module files allow you to load your own installed software into +your environment and to handle different versions without getting into +conflicts. + +You only have to call `module use <path to your module files>`, which +adds your directory to the list of module directories that are searched +by the `module` command. Within the privatemodules directory you can add +directories for each software you wish to install and add - also in this +directory - a module file for each version you have installed. Further +information about modules can be found at <https://lmod.readthedocs.io> +. 
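+
+Setting this up might look like the following sketch; the directory and
+software names are only placeholders that match the session shown below:
+
+    mkdir -p $HOME/privatemodules/testsoftware
+    # place a module file for each installed version into this directory,
+    # e.g. $HOME/privatemodules/testsoftware/1.0 (see the example below)
+    module use $HOME/privatemodules
+    module load testsoftware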
+ +This is an example of a private module file: + + dolescha@venus:~/module use $HOME/privatemodules + + dolescha@venus:~/privatemodules> ls + null testsoftware + + dolescha@venus:~/privatemodules/testsoftware> ls + 1.0 + + dolescha@venus:~> module av + ------------------------------- /work/home0/dolescha/privatemodules --------------------------- + null testsoftware/1.0 + + dolescha@venus:~> module load testsoftware + Load testsoftware version 1.0 + + dolescha@venus:~/privatemodules/testsoftware> cat 1.0 + #%Module###################################################################### + ## + ## testsoftware modulefile + ## + proc ModulesHelp { } { + puts stderr "Loads testsoftware" + } + + set version 1.0 + set arch x86_64 + set path /home/<user>/opt/testsoftware/$version/$arch/ + + prepend-path PATH $path/bin + prepend-path LD_LIBRARY_PATH $path/lib + + if [ module-info mode load ] { + puts stderr "Load testsoftware version $version" + } + +### Private Project Module Files + +Private module files allow you to load your group-wide installed +software into your environment and to handle different versions without +getting into conflicts. + +The module files have to be stored in your global projects directory, +e.g. `/projects/p_projectname/privatemodules`. An example for a module +file can be found in the section above. + +To use a project-wide module file you have to add the path to the module +file to the module environment with following command +`module use /projects/p_projectname/privatemodules`. + +After that, the modules are available in your module environment and you +can load the modules with `module load` . + +## Misc + +An automated [backup](FileSystems#AnchorBackup) system provides security +for the HOME-directories on `Taurus` and `Venus` on a daily basis. This +is the reason why we urge our users to store (large) temporary data +(like checkpoint files) on the /scratch -Filesystem or at local scratch +disks. + +`Please note`: We have set `ulimit -c 0` as a default to prevent you +from filling the disk with the dump of a crashed program. `bash` -users +can use `ulimit -Sc unlimited` to enable the debugging via analyzing the +core file ( limit coredumpsize unlimited for tcsh ). diff --git a/twiki2md/root/WebHome/SCS5Software.md b/twiki2md/root/WebHome/SCS5Software.md new file mode 100644 index 000000000..6d6032084 --- /dev/null +++ b/twiki2md/root/WebHome/SCS5Software.md @@ -0,0 +1,161 @@ +# SCS5 Migration Hints + +Bull's new cluster software is called SCS 5 (*Super Computing Suite*). +Here are the major changes from the user's perspective: + +| software | old | new | +|:--------------------------------|:-------|:---------| +| Red Hat Enterprise Linux (RHEL) | 6.x | 7.x | +| Linux kernel | 2.26 | 3.10 | +| glibc | 2.12 | 2.17 | +| Infiniband stack | OpenIB | Mellanox | +| Lustre client | 2.5 | 2.10 | + +## Host Keys + +Due to the new operating system, the host keys of the login nodes have +also changed. If you have logged into tauruslogin6 before and still have +the old one saved in your `known_hosts` file, just remove it and accept +the new one after comparing its fingerprint with those listed under +[Login](Login#tableLogin2). + +## Using software modules + +Starting with SCS5, we only provide +[Lmod](RuntimeEnvironment#Lmod:_An_Alternative_Module_Implementation) as +the environment module tool of choice. 
+ +As usual, you can get a list of the available software modules via: + + module available + # or short: + ml av + +There is a special module that is always loaded (sticky) called +**modenv**. It determines the module environment you can see. + +| | | | +|----------------|-------------------------------------------------|---------| +| modenv/scs5 | SCS5 software | default | +| modenv/ml | HPC-DA software (for use on the "ml" partition) | | +| modenv/classic | Manually built pre-SCS5 (AE4.0) software | hidden | + +The old modules (pre-SCS5) are still available after loading the +corresponding **modenv** version (**classic**), however, due to changes +in the libraries of the operating system, it is not guaranteed that they +still work under SCS5. That's why those modenv versions are hidden. + +Example: + + $ ml modenv/classic ansys/19.0 + + The following have been reloaded with a version change: + 1) modenv/scs5 => modenv/classic + + Module ansys/19.0 loaded. + +**modenv/scs5** will be loaded by default and contains all the software +that was built especially for SCS5. + +### Which modules should I use? + +If possible, please use the modules from **modenv/scs5**. In case there +is a certain software missing, you can write an email to +<hpcsupport@zih.tu-dresden.de> and we will try to install the latest +version of this particular software for you. + +However, if you still need *older* versions of some software, you have +to resort to using the modules in the old module environment +(**modenv/classic** most probably). We won't keep those around forever +though, so in the long-term, it is advisable to migrate your workflow to +up-to-date versions of the software used. + +### Compilers, MPI-Libraries and Toolchains + +Since we are mainly using EasyBuild to install software now, we are +following their toolchain schemes: +<http://easybuild.readthedocs.io/en/latest/Common-toolchains.html> + +We mostly install software using the "intel" toolchain, because in most +cases, the resulting code performs best on our Intel-based +architectures. There are alternatives like GCC (foss), PGI or Clang/LLVM +though. + +Generally speaking, the toolchains in this new environment are separated +into more parts (modules) than you will be used to, coming from +modenv/classic. A full toolchain, like "intel", "foss" or "iomkl" +consists of several sub-modules making up the layers of + +- compilers +- MPI library +- math library (providing BLAS/LAPACK/FFT routines etc.) + +For instance, the "intel" toolchain has the following structure: + +| | | +|--------------|------------| +| toolchain | intel | +| compilers | icc, ifort | +| mpi library | impi | +| math library | imkl | + +On the other hand, the "foss" toolchain looks like this: + +| | | +|----------------|---------------------| +| toolchain | foss | +| compilers | GCC (gcc, gfortran) | +| mpi library | OpenMPI | +| math libraries | OpenBLAS, FFTW | + +If you want to combine the Intel compilers and MKL with OpenMPI, you'd +have to use the "iomkl" toolchain: + +| | | +|--------------|------------| +| toolchain | iomkl | +| compilers | icc, ifort | +| mpi library | OpenMPI | +| math library | imkl | + +There are also subtoolchains that skip a layer or two, e.g. "iccifort" +only consists of the respective compilers, same as "GCC". Then there is +"iompi" that includes Intel compilers+OpenMPI but no math library, etc. + +#### What is this "GCCcore" I keep seeing and how does it relate to "GCC"? 
+ +GCCcore includes only the compilers/standard libraries of the GNU +compiler collection but without "binutils". It is used as a dependency +for many modules without getting in the way, e.g. the Intel compilers +also rely on libstdc++ from GCC, but you don't want to load two compiler +modules at the same time, so "intel" also depends on "GCCcore". You can +think of it as more of a runtime dependency rather than a full-fledged +compiler toolchain. If you want to compile your own code with the GNU +compilers, you have to load the module: "**GCC"** instead, "GCCcore" +won't be enough. + +There are [ongoing +discussions](https://github.com/easybuilders/easybuild-easyconfigs/issues/6366) +in the EasyBuild community to maybe change this in the future in order +to avoid the potential confusion this GCCcore module brings with it. + +#### I have been using "bullxmpi" so far, where can I find it? + +bullxmpi was more or less a rebranded OpenMPI 1.6 with some additions +from Bull. It is not supported anymore and Bull has abandoned it in +favor of a standard OpenMPI 2.0.2 build as their default in SCS5. You +should migrate your code to our OpenMPI module or maybe even try Intel +MPI instead. + +#### Where have the analysis tools from Intel Parallel Studio XE gone? + +Since "intel" is only a toolchain module now, it does not include the +entire Parallel Studio anymore. Tools like the Intel Advisor, Inspector, +Trace Analyzer or VTune Amplifier are available as separate modules now: + +| product | module | +|:----------------------|:----------| +| Intel Advisor | Advisor | +| Intel Inspector | Inspector | +| Intel Trace Analyzer | itac | +| Intel VTune Amplifier | VTune | diff --git a/twiki2md/root/WebHome/Slurmfeatures.md b/twiki2md/root/WebHome/Slurmfeatures.md new file mode 100644 index 000000000..419b1dea8 --- /dev/null +++ b/twiki2md/root/WebHome/Slurmfeatures.md @@ -0,0 +1,40 @@ +## Node features for selective job submission + +The nodes in our HPC system are becoming more diverse in multiple +aspects: hardware, mounted storage, software. The system administrators +can decribe the set of properties and it is up to the user to specify +her/his requirements. These features should be thought of as changing +over time (e.g. a file system get stuck on a certain node). + +A feature can be used with the Slurm option `--constrain` or `-C` like +"\<span>srun -C fs_lustre_scratch2 ...\</span>" with `srun` or `sbatch`. +Combinations like\<span class="WYSIWYG_TT"> +--constraint="fs_beegfs_global0&DA"\</span> are allowed. For a detailed +description of the possible constraints, please refer to the Slurm +documentation (<https://slurm.schedmd.com/srun.html>). + +**Remark:** A feature is checked only for scheduling. Running jobs are +not affected by changing features. + +### Available features on Taurus + +| feature | description | +|:--------|:-------------------------------------------------------------------------| +| DA | subset of Haswell nodes with a high bandwidth to NVMe storage (island 6) | + +#### File system features + +A feature \<span class="WYSIWYG_TT">fs\_\* \</span>is active if a +certain file system is mounted and available on a node. Access to these +file systems are tested every few minutes on each node and the Slurm +features set accordingly. 
+ +| feature | description | +|:-------------------|:---------------------------------------------------------------------| +| fs_lustre_scratch2 | /scratch mounted read-write (the OS mount point is /lustre/scratch2) | +| fs_lustre_ssd | /lustre/ssd mounted read-write | +| fs_warm_archive_ws | /warm_archive/ws mounted read-only | +| fs_beegfs_global0 | /beegfs/global0 mounted read-write | + +For certain projects, specific file systems are provided. For those, +additional features are available, like `fs_beegfs_<projectname>`. diff --git a/twiki2md/root/WebHome/SoftwareDevelopment.md b/twiki2md/root/WebHome/SoftwareDevelopment.md new file mode 100644 index 000000000..6e45f442e --- /dev/null +++ b/twiki2md/root/WebHome/SoftwareDevelopment.md @@ -0,0 +1,59 @@ +# Software Development at HPC systems + +This section should provide you with the basic knowledge and tools to +get you out of trouble. It will tell you: + +- How to compile your code +- Using mathematical libraries +- Find caveats and hidden errors in application codes +- Handle debuggers +- Follow system calls and interrupts +- Understand the relationship between correct code and performance + +Some hints that are helpful: + +- Stick to standards wherever possible, e.g. use the **`-std`** flag + for GNU and Intel C/C++ compilers. Computers are short living + creatures, migrating between platforms can be painful. In addition, + running your code on different platforms greatly increases the + reliably. You will find many bugs on one platform that never will be + revealed on another. +- Before and during performance tuning: Make sure that your code + delivers the correct results. + +Some questions you should ask yourself: + +- Given that a code is parallel, are the results independent from the + numbers of threads or processes? +- Have you ever run your Fortran code with array bound and subroutine + argument checking (the **`-check all`** and **`-traceback`** flags + for the Intel compilers)? +- Have you checked that your code is not causing floating point + exceptions? +- Does your code work with a different link order of objects? +- Have you made any assumptions regarding storage of data objects in + memory? + +Subsections: + +- [Compilers](Compilers) +- [Debugging Tools](Debugging Tools) + - [Debuggers](Debuggers) (GDB, Allinea DDT, Totalview) + - [Tools to detect MPI usage errors](MPIUsageErrorDetection) + (MUST) +- [Performance Tools](Performance Tools) (Score-P, Vampir, performance + counters, etc.) +- [Libraries](Libraries) +- [Miscellaneous](Miscellaneous) + +Intel Tools Seminar \[Oct. 
+Subsections:
+
+- [Compilers](Compilers)
+- [Debugging Tools](Debugging Tools)
+  - [Debuggers](Debuggers) (GDB, Allinea DDT, Totalview)
+  - [Tools to detect MPI usage errors](MPIUsageErrorDetection)
+    (MUST)
+- [Performance Tools](Performance Tools) (Score-P, Vampir, performance
+  counters, etc.)
+- [Libraries](Libraries)
+- [Miscellaneous](Miscellaneous)
+
+Intel Tools Seminar \[Oct. 2013\]
+
+- [TU-Dresden_Intel_Multithreading_Methodologies.pdf](%ATTACHURL%/TU-Dresden_Intel_Multithreading_Methodologies.pdf):
+  Intel Multithreading Methodologies
+- [TU-Dresden_Advisor_XE.pdf](%ATTACHURL%/TU-Dresden_Advisor_XE.pdf):
+  Intel Advisor XE - Threading prototyping tool for software
+  architects
+- [TU-Dresden_Inspector_XE.pdf](%ATTACHURL%/TU-Dresden_Inspector_XE.pdf):
+  Inspector XE - Memory-, Thread-, Pointer-Checker, Debugger
+- [TU-Dresden_Intel_Composer_XE.pdf](%ATTACHURL%/TU-Dresden_Intel_Composer_XE.pdf):
+  Intel Composer - Compilers, Libraries
diff --git a/twiki2md/root/WebHome/StepByStepTaurus.md b/twiki2md/root/WebHome/StepByStepTaurus.md
new file mode 100644
index 000000000..03aa8538a
--- /dev/null
+++ b/twiki2md/root/WebHome/StepByStepTaurus.md
@@ -0,0 +1,10 @@
+# Step by step examples for working on Taurus
+
+(in development)
+
+- From Windows:
+  [login](Login#Prerequisites_for_Access_to_a_Linux_Cluster_From_a_Windows_Workstation)
+  and file transfer
+- Short introductory presentation on the module and job system on
+  Taurus with a focus on AI/ML: [Using taurus for
+  AI](%ATTACHURL%/Scads_-_Using_taurus_for_AI.pdf)
diff --git a/twiki2md/root/WebHome/SystemAltix.md b/twiki2md/root/WebHome/SystemAltix.md
new file mode 100644
index 000000000..6f26ccc8c
--- /dev/null
+++ b/twiki2md/root/WebHome/SystemAltix.md
@@ -0,0 +1,74 @@
+# SGI Altix
+
+**`%RED%This page is deprecated! The SGI Altix is a former system! [[Compendium.Hardware][(Current hardware)]]%ENDCOLOR%`**
+
+The SGI Altix is a shared memory system for large parallel jobs using up
+to 2000 cores in parallel ([information on the
+hardware](HardwareAltix)). Its partitions are Mars (login), Jupiter,
+Saturn, Uranus, and Neptun (interactive).
+
+## Compiling Parallel Applications
+
+This installation of the Message Passing Interface supports the MPI 1.2
+standard with a few MPI-2 features (see `man mpi`). There is no command
+like `mpicc`; instead, you just have to use the normal compiler (e.g.
+`icc`, `icpc`, or `ifort`) and append `-lmpi` to the linker command
+line. Since the include files as well as the library are in standard
+directories, there is no need to append additional library or include
+paths.
+
+- Note for C++ programmers: You need to link with
+  `-lmpi++abi1002 -lmpi` instead of `-lmpi`.
+- Note for Fortran programmers: The MPI module is only provided for
+  the Intel compiler and does not work with gfortran.
+
+Please follow these guidelines to run your parallel program using the
+batch system on Mars.
+
+## Batch system
+
+Applications on an HPC system cannot be run on the login node. They
+have to be submitted to compute nodes with dedicated resources for the
+user's job. Normally a job can be submitted with these data:
+
+- number of CPU cores,
+- whether the requested CPU cores have to belong to one node (OpenMP
+  programs) or can be distributed (MPI),
+- memory per process,
+- maximum wall clock time (after reaching this limit the process is
+  killed automatically),
+- files for redirection of output and error messages,
+- executable and command line parameters.
+
+### LSF
+
+The batch system on the Altix is LSF. For general information on LSF,
+please follow [this link](PlatformLSF).
+
+### Submission of Parallel Jobs
+
+The MPI library running on the Altix is provided by SGI and highly
+optimized for the ccNUMA architecture of this machine. However,
+communication within a partition is faster than across partitions. Take
+this into consideration when you submit your job.
+
+Single-partition jobs can be started like this:
+
+    bsub -R "span[hosts=1]" -n 16 mpirun -np 16 a.out
+
+Really large jobs with over 256 CPUs might run over multiple partitions.
+Cross-partition jobs can be submitted via PAM like this:
+
+    bsub -n 1024 pamrun a.out
+
+### Batch Queues
+
+| Batch Queue    | Admitted Users   | Available CPUs       | Default Runtime | Max. Runtime |
+|:---------------|:-----------------|:---------------------|:----------------|:-------------|
+| `interactive`  | `all`            | `min. 1, max. 32`    | `12h`           | `12h`        |
+| `small`        | `all`            | `min. 1, max. 63`    | `12h`           | `120h`       |
+| `intermediate` | `all`            | `min. 64, max. 255`  | `12h`           | `120h`       |
+| `large`        | `all`            | `min. 256, max. 1866` | `12h`          | `24h`        |
+| `ilr`          | `selected users` | `min. 1, max. 768`   | `12h`           | `24h`        |
+
+-- Main.UlfMarkwardt - 2013-02-27
diff --git a/twiki2md/root/WebHome/SystemTaurus.md b/twiki2md/root/WebHome/SystemTaurus.md
new file mode 100644
index 000000000..1d9c9347f
--- /dev/null
+++ b/twiki2md/root/WebHome/SystemTaurus.md
@@ -0,0 +1,216 @@
+# Taurus
+
+
+
+## Information about the Hardware
+
+[Detailed information on the current HPC hardware can be found
+here.](HardwareTaurus)
+
+## Applying for Access to the System
+
+Project and login application forms for Taurus are available
+[here](Access).
+
+## Login to the System
+
+Login to the system is available via ssh at taurus.hrsk.tu-dresden.de.
+There are several login nodes (internally called tauruslogin3 to
+tauruslogin6). Currently, if you use taurus.hrsk.tu-dresden.de, you will
+be placed on tauruslogin5. It might be a good idea to give the other
+login nodes a try if the load on tauruslogin5 is rather high (there will
+once again be a load balancer soon, but at the moment, there is none).
+
+Please note that if you store data on the local disk (e.g. under /tmp),
+it will be on only one of the login nodes. If you log in again and the
+data is not there, you are probably on another node.
+
+You can find a list of fingerprints [here](Login#SSH_access).
+
+## Transferring Data from/to Taurus
+
+Taurus has two specialized data transfer nodes. Both nodes are
+accessible via `taurusexport.hrsk.tu-dresden.de`. Currently, only rsync,
+scp and sftp to these nodes will work. A login via SSH is not possible
+as these nodes are dedicated to data transfers.
+
+These nodes are located behind a firewall. By default, they are only
+accessible from IP addresses within the campus of TU Dresden.
+External IP addresses can be enabled upon request. These requests should
+be sent via email to `servicedesk@tu-dresden.de` and mention the IP
+address range (or node names), the desired protocol and the time frame
+that the firewall needs to be open.
+
+We are open to discussing options to export the data in the scratch file
+system via CIFS or other protocols. If you have a need for this, please
+contact the Service Desk as well.
+
+**Phase 2:** The nodes taurusexport\[3,4\] provide access to the
+`/scratch` file system of the second phase.
+
+You can find a list of fingerprints [here](Login#SSH_access).
+
+## Compiling Parallel Applications
+
+You have to explicitly load a compiler module and an MPI module on
+Taurus, e.g. with `module load GCC OpenMPI`
+(read more about [Modules](Compendium.RuntimeEnvironment) and
+[Compilers](Compendium.Compilers)).
+
+Use the wrapper commands like e.g. `mpicc` (`mpiicc` for Intel),
+`mpicxx` (`mpiicpc`) or `mpif90` (`mpiifort`) to compile MPI source
+code. To reveal the command lines behind the wrappers, use the option
+`-show`.
+
+For running your code, you have to load the same compiler and MPI module
+as for compiling the program. Please follow these guidelines to run
+your parallel program using the batch system.
+
+## Batch System
+
+Applications on an HPC system cannot be run on the login node. They
+have to be submitted to compute nodes with dedicated resources for the
+user's job. Normally a job can be submitted with these data:
+
+- number of CPU cores,
+- whether the requested CPU cores have to belong to one node (OpenMP
+  programs) or can be distributed (MPI),
+- memory per process,
+- maximum wall clock time (after reaching this limit the process is
+  killed automatically),
+- files for redirection of output and error messages,
+- executable and command line parameters.
+
+The batch system on Taurus is Slurm. If you are migrating from LSF
+(deimos, mars, atlas), the biggest difference is that Slurm has no
+notion of batch queues any more.
+
+- [General information on the Slurm batch system](Slurm)
+- Slurm also provides process-level and node-level [profiling of
+  jobs](Slurm#Job_Profiling)
+
+### Partitions
+
+Please note that the islands are also present as partitions for the
+batch system. They are called
+
+- romeo (Island 7 - AMD Rome CPUs)
+- julia (large SMP machine)
+- haswell (Islands 4 to 6 - Haswell CPUs)
+- gpu (Island 2 - GPUs)
+  - gpu2 (K80X)
+- smp2 (SMP Nodes)
+
+**Note:** Usually you don't have to specify a partition explicitly with
+the parameter -p, because SLURM will automatically select a suitable
+partition depending on your memory and gres requirements.
+
+### Run-time Limits
+
+**Run-time limits are enforced**. This means a job will be canceled as
+soon as it exceeds its requested limit. On Taurus, the maximum run time
+is 7 days.
+
+Shorter jobs come with multiple advantages: \<img alt="part.png"
+height="117" src="%ATTACHURL%/part.png" style="float: right;"
+title="part.png" width="284" />
+
+- lower risk of loss of computing time,
+- shorter waiting time for reservations,
+- higher job fluctuation; thus, jobs with high priorities may start
+  faster.
+
+To bring down the percentage of long running jobs, we restrict the
+number of cores available for jobs longer than 2 days to approximately
+50% and for jobs longer than 24 hours to 75% of the total number of
+cores. (These numbers are subject to change.) As a best practice we
+advise a run time of about 8 hours.
+
+Please always try to make a good estimate of your needed time limit.
+For this, you can use a command line like this to compare the requested
+time limit with the elapsed time of your completed jobs that started
+after a given date:
+
+    sacct -X -S 2021-01-01 -E now --format=start,JobID,jobname,elapsed,timelimit -s COMPLETED
+
+Instead of running one long job, you should split it up into a chain
+job. Even applications that are not capable of checkpoint/restart can
+be adapted. The HOWTO can be found [here](CheckpointRestart).
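+
+One way to build such a chain is via Slurm job dependencies; a minimal
+sketch (the job script names are placeholders, not existing files):
+
+    # submit the first part and remember its job id
+    JOBID=$(sbatch --parsable part1.sh)
+
+    # the second part starts only after the first one finished successfully
+    sbatch --dependency=afterok:${JOBID} part2.sh
+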
+### Memory Limits
+
+**Memory limits are enforced.** This means that jobs which exceed their
+per-node memory limit will be killed automatically by the batch system.
+Memory requirements for your job can be specified via the *sbatch/srun*
+parameters: **--mem-per-cpu=\<MB>** or **--mem=\<MB>** (which is "memory
+per node"). The **default limit** is **300 MB** per CPU.
+
+Taurus has sets of nodes with different amounts of installed memory,
+which affect where your job may be run. To achieve the shortest possible
+waiting time for your jobs, you should be aware of the limits shown in
+the following table.
+
+| Partition | Nodes | # Nodes | Cores per Node | Avail. Memory per Core | Avail. Memory per Node | GPUs per node |
+|:-------------------|:-----------------------------------------|:--------|:----------------|:-----------------------|:-----------------------|:------------------|
+| `haswell64` | `taurusi[4001-4104,5001-5612,6001-6612]` | `1328` | `24` | `2541 MB` | `61000 MB` | `-` |
+| `haswell128` | `taurusi[4105-4188]` | `84` | `24` | `5250 MB` | `126000 MB` | `-` |
+| `haswell256` | `taurusi[4189-4232]` | `44` | `24` | `10583 MB` | `254000 MB` | `-` |
+| `broadwell` | `taurusi[4233-4264]` | `32` | `28` | `2214 MB` | `62000 MB` | `-` |
+| `smp2` | `taurussmp[3-7]` | `5` | `56` | `36500 MB` | `2044000 MB` | `-` |
+| `gpu2` | `taurusi[2045-2106]` | `62` | `24` | `2583 MB` | `62000 MB` | `4 (2 dual GPUs)` |
+| `gpu2-interactive` | `taurusi[2045-2108]` | `64` | `24` | `2583 MB` | `62000 MB` | `4 (2 dual GPUs)` |
+| `hpdlf` | `taurusa[3-16]` | `14` | `12` | `7916 MB` | `95000 MB` | `3` |
+| `ml` | `taurusml[1-32]` | `32` | `44 (HT: 176)` | `1443 MB*` | `254000 MB` | `6` |
+| `romeo` | `taurusi[7001-7192]` | `192` | `128 (HT: 256)` | `1972 MB*` | `505000 MB` | `-` |
+| `julia` | `taurussmp8` | `1` | `896` | `27343 MB*` | `49000000 MB` | `-` |
+
+\* Note that the ML nodes have 4-way SMT, so for every physical core
+allocated (e.g., with SLURM_HINT=nomultithread), you will always get
+4\*1443 MB, because the memory of the other threads is allocated
+implicitly, too.
+
+### Submission of Parallel Jobs
+
+To run MPI jobs, ensure that the same MPI module is loaded as at compile
+time. If in doubt, check your loaded modules with `module list`. If
+your code has been compiled with the standard `bullxmpi` installation,
+you can load the module via `module load bullxmpi`. Alternative MPI
+libraries (`intelmpi`, `openmpi`) are also available.
+
+Please pay attention to the messages you get when loading the module.
+They are more up-to-date than this manual.
+
+## GPUs
+
+Island 2 of Taurus contains a total of 128 NVIDIA Tesla K80 (dual) GPUs
+in 64 nodes.
+
+More information on how to program applications for GPUs can be found
+at [GPU Programming](GPU Programming).
+
+The following software modules on Taurus offer GPU support:
+
+- `CUDA` : The NVIDIA CUDA compilers
+- `PGI` : The PGI compilers with OpenACC support
+
+## Hardware for Deep Learning (HPDLF)
+
+The partition hpdlf contains 14 servers. Each of them has:
+
+- 2 sockets CPU E5-2603 v4 (1.70GHz) with 6 cores each,
+- 3 consumer GPU cards NVIDIA GTX1080,
+- 96 GB RAM.
+
+## Energy Measurement
+
+Taurus contains sophisticated energy measurement instrumentation.
+Especially HDEEM is available on the Haswell nodes of Phase II. More
+detailed information can be found at
+[EnergyMeasurement](EnergyMeasurement).
+
+## Low level optimizations
+
+x86 processors provide registers that can be used for optimizations and
+performance monitoring. Taurus provides you access to such features via
+the [X86Adapt](X86Adapt) software infrastructure.
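+
+As a compact summary of the submission-related settings described on
+this page, a minimal batch script might look like the following sketch
+(resource values and the program name are placeholders you need to
+adapt):
+
+    #!/bin/bash
+    #SBATCH --ntasks=24               # number of MPI tasks
+    #SBATCH --mem-per-cpu=2000        # memory per core in MB, see the table above
+    #SBATCH --time=08:00:00           # run time limit; shorter jobs start earlier
+    #SBATCH --output=myjob-%j.out     # redirection of output (and errors)
+
+    module load GCC OpenMPI           # the same modules as used for compiling
+    srun ./a.out                      # start the MPI program
+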
diff --git a/twiki2md/root/WebHome/SystemVenus.md b/twiki2md/root/WebHome/SystemVenus.md
new file mode 100644
index 000000000..f8b7d14cc
--- /dev/null
+++ b/twiki2md/root/WebHome/SystemVenus.md
@@ -0,0 +1,86 @@
+# Venus
+
+
+
+## Information about the hardware
+
+Detailed information on the current HPC hardware can be found
+[here.](HardwareVenus)
+
+## Applying for Access to the System
+
+Project and login application forms are available [here](Access).
+
+## Login to the System
+
+Login to the system is available via ssh at `venus.hrsk.tu-dresden.de`.
+
+The RSA fingerprints of the Phase 2 login nodes are:
+
+    MD5:63:65:c6:d6:4e:5e:03:9e:07:9e:70:d1:bc:b4:94:64
+
+and
+
+    SHA256:Qq1OrgSCTzgziKoop3a/pyVcypxRfPcZT7oUQ3V7E0E
+
+You can find a list of fingerprints [here](Login#SSH_access).
+
+## MPI
+
+The installation of the Message Passing Interface on Venus (SGI MPT)
+supports the MPI 2.2 standard (see `man mpi`). There is no command like
+`mpicc`; instead, you just have to use the "serial" compiler (e.g.
+`icc`, `icpc`, or `ifort`) and append `-lmpi` to the linker command
+line.
+
+Example:
+
+    % icc -o myprog -g -O2 -xHost myprog.c -lmpi
+
+Notes:
+
+- C++ programmers: You need to link with both libraries:
+  `-lmpi++ -lmpi`.
+- Fortran programmers: The MPI module is only provided for the Intel
+  compiler and does not work with gfortran.
+
+Please follow these guidelines to run your parallel program using the
+batch system on Venus.
+
+## Batch system
+
+Applications on an HPC system cannot be run on the login node. They
+have to be submitted to compute nodes with dedicated resources for the
+user's job. Normally a job can be submitted with these data:
+
+- number of CPU cores,
+- whether the requested CPU cores have to belong to one node (OpenMP
+  programs) or can be distributed (MPI),
+- memory per process,
+- maximum wall clock time (after reaching this limit the process is
+  killed automatically),
+- files for redirection of output and error messages,
+- executable and command line parameters.
+
+The batch system on Venus is Slurm. For general information on Slurm,
+please follow [this link](Slurm).
+
+### Submission of Parallel Jobs
+
+The MPI library running on the UV is provided by SGI and highly
+optimized for the ccNUMA architecture of this machine.
+
+On Venus, you can only submit jobs with a core number which is a
+multiple of 8 (a whole CPU chip and 128 GB RAM). Parallel jobs can be
+started like this:
+
+    srun -n 16 a.out
+
+**Please note:** There are different MPI libraries on Taurus and Venus,
+so you have to compile the binaries specifically for their target.
+
+### File Systems
+
+- The large main memory on the system allows users to create ramdisks
+  within their own jobs. The documentation on how to use these
+  ramdisks can be found [here](RamDiskDocumentation).
diff --git a/twiki2md/root/WebHome/TaurusII.md b/twiki2md/root/WebHome/TaurusII.md
new file mode 100644
index 000000000..1517542e7
--- /dev/null
+++ b/twiki2md/root/WebHome/TaurusII.md
@@ -0,0 +1,31 @@
+# Taurus phase II - user testing
+
+With the installation of the second phase of Taurus, we now have the
+full capacity of the system. Until the merger in September, both phases
+work like isolated HPC systems. Both machines share their accounting
+data, so that all projects can seamlessly migrate to the new system.
+
+In September we will shut down the phase 1 nodes, their hardware will be
+updated, and they will be merged with phase 2.
+
+Basic information for Taurus, phase 2:
+
+- Please use the login nodes `tauruslogin[3-5].hrsk.tu-dresden.de` for
+  the new system.
+- We have mounted the same file systems as on our other HPC systems:
+  - /home/
+  - /projects/
+  - /sw
+  - Taurus phase 2 has its own /scratch file system (capacity 2.5 PB).
+- All nodes have 24 cores.
+- Memory capacity is 64/128/256 GB per node. The batch system handles
+  your requests like in phase 1. Note that the memory-per-core limits
+  differ!
+- Our 64 GPU nodes now have 2 cards with 2 GPUs each.
+
+For more details, please refer to our updated
+[documentation](SystemTaurus).
+
+Thank you for testing the system with us!
+
+Ulf Markwardt
diff --git a/twiki2md/root/WebHome/Test.md b/twiki2md/root/WebHome/Test.md
new file mode 100644
index 000000000..a1b5589b3
--- /dev/null
+++ b/twiki2md/root/WebHome/Test.md
@@ -0,0 +1,2 @@
+<span class="twiki-macro TREE" web="Compendium"
+formatting="ullist"></span> -- Main.MatthiasKraeusslein - 2021-05-10
diff --git a/twiki2md/root/WebHome/TypicalProjectSchedule.md b/twiki2md/root/WebHome/TypicalProjectSchedule.md
new file mode 100644
index 000000000..c7b404ea8
--- /dev/null
+++ b/twiki2md/root/WebHome/TypicalProjectSchedule.md
@@ -0,0 +1,546 @@
+# Typical project schedule
+
+
+
+## 0. Application for HPC login
+
+In order to use the HPC systems installed at ZIH, a project application
+form has to be filled in. The HPC project manager should hold a
+professorship (university) or head a research group. You may also apply
+for the "Schnupperaccount" (trial account) for one year. Check the
+[Access](Access) page for details.
+
+## 1. Request for resources
+
+Important note: Taurus is a Linux-based system. Thus, for effective
+work you should know how to work with
+[Linux](https://en.wikipedia.org/wiki/Linux)-based systems and the
+[Linux shell](https://ubuntu.com/tutorials/command-line-for-beginners#1-overview).
+Beginners can find many different tutorials on the internet, [for
+example](https://swcarpentry.github.io/shell-novice/).
+
+### 1.1 How do I determine the required CPU / GPU hours?
+
+Taurus is focused on data-intensive computing. The cluster is oriented
+towards highly parallel code. Please keep this in mind when transferring
+sequential code from a local machine. If you know the execution time of
+your sequential program, it is reasonable to use
+[Amdahl's law](https://en.wikipedia.org/wiki/Amdahl%27s_law) to roughly
+predict the execution time in parallel. Think in advance about the
+parallelization strategy for your project.
+
+### 1.2 What software do I need? What is already available (in the correct version)?
+
+It is good practice on HPC clusters to use software and packages where
+parallelization is possible. Open-source software is preferable to
+proprietary software. However, the majority of popular programming
+languages, scientific applications, software and packages are available
+or can be installed on Taurus in different ways. First of all, check the
+[Software module list](SoftwareModulesList). There are two different
+software environments: **scs5** (the regular one) and **ml** (the
+environment for the Machine Learning partition). A quick way to check
+from the command line what is installed is sketched below.
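+
+This sketch assumes the standard `module` commands of the module system
+referenced above; the software name is a placeholder:
+
+    module avail              # list the modules visible in the current environment
+    module avail Python       # search for a specific software
+    module load Python        # load a module into your session
+    module list               # show what is currently loaded
+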
+Keep in mind that Taurus runs a Linux-based operating system.
+
+## 2. Access to the cluster
+
+### SSH access
+
+%RED%Important note:%ENDCOLOR% ssh to Taurus is only possible from
+**inside** the TU Dresden campus. Users from outside should use **VPN**
+(see
+[here](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn)).
+
+The recommended way to connect to the HPC login servers is directly via
+ssh:
+
+    ssh <zih-login>@taurus.hrsk.tu-dresden.de
+
+Enter this command in your terminal and replace \<zih-login> with the
+login that you received during the access procedure. Accept the host key
+verification and enter your password. You will land on one of the login
+nodes, in your Taurus home directory.
+
+This method requires two conditions: a Linux OS and a workstation within
+the campus network. For other options and details check the
+[Login page](Login).
+
+Useful links: [Access](Access), [Project Request
+Form](ProjectRequestForm), [Terms Of Use](TermsOfUse)
+
+## 3. Available software, use of the software
+
+According to 1.2, first of all check the [Software module
+list](SoftwareModulesList). Keep in mind that there are two different
+environments: **scs5** (for the x86 architecture) and **ml** (the
+environment for the Machine Learning partition based on the Power9
+architecture).
+
+Work with the software on Taurus can only be started after allocating
+resources via the [batch systems](BatchSystems). By default, you are on
+the login nodes. They are intended for logging in and preparing work
+only, not for computations. Resources are allocated by the batch system
+[SLURM](Slurm).
+
+There are a lot of different possibilities to work with software on
+Taurus:
+
+**a.** **Modules**
+
+The easiest way to start working with software is using the
+[Modules system](RuntimeEnvironment#Module_Environments). Modules are a
+way to use frameworks, compilers, loaders, libraries, and utilities. The
+module system is a user interface that provides utilities for the
+dynamic modification of a user's environment without manual
+modifications. You can use modules with **srun**, batch jobs
+(**sbatch**) and JupyterHub.
+
+**b. JupyterNotebook**
+
+The Jupyter Notebook is an open-source web application that allows
+creating documents containing live code, equations, visualizations, and
+narrative text. There is a [JupyterHub](JupyterHub) on Taurus, where you
+can simply run your Jupyter notebook on HPC nodes using modules,
+preloaded or custom virtual environments. Moreover, you can run a
+[manually created remote Jupyter server](DeepLearning#Jupyter_notebook)
+for more specific cases.
+
+**c.** **Containers**
+
+Some tasks require the use of containers. This can be done on Taurus
+with [Singularity](https://sylabs.io/). Details can be found in the
+[following chapter](TypicalProjectSchedule#Use_of_containers). A minimal
+usage sketch is shown below.
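+
+As a rough illustration only (the image name and command are
+placeholders, assuming a Singularity image file is already available to
+you):
+
+    # run a command inside a container image
+    singularity exec my-image.sif python3 my_script.py
+
+    # or open a shell inside the container
+    singularity shell my-image.sif
+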
+
+Useful links: [Libraries](Libraries), [Deep Learning](DeepLearning),
+[Jupyter Hub](JupyterHub), [Big Data
+Frameworks](BigDataFrameworks:ApacheSparkApacheFlinkApacheHadoop),
+[R](DataAnalyticsWithR), [Applications for various fields of
+science](Applications)
+
+## 4. Create a project structure. Data management
+
+Correct organisation of the project structure is a straightforward way
+to efficient work of the whole team. There have to be rules and
+regulations for working with the project that every member should
+follow. The uniformity of the project can be achieved by every member of
+the team using the same **data storage** (or set of storages) and the
+same **set of software** (packages, libraries, etc.); **access rights**
+to project data should be taken into account and set up correctly.
+
+### 4.1 Data storage and management
+
+#### 4.1.1 Taxonomy of file systems
+
+As soon as you have access to Taurus you have to manage your data. The
+main [concept](HPCStorageConcept2019) of working with data on Taurus is
+using [Workspaces](WorkSpaces). Use them properly:
+
+- Use the `/home` directory for a limited amount of personal data,
+  simple examples and the results of calculations. The home directory
+  is not a working directory! However, the `/home` file system is
+  backed up using snapshots.
+- Use a **workspace** as a place for working data (i.e. datasets).
+  Recommendations for choosing the correct storage system for a
+  workspace are presented below.
+
+**Recommendations for choosing a storage system:** For data that seldom
+changes but consumes a lot of space, the `warm_archive` can be used
+(note that this is mounted **read-only** on the compute nodes). For a
+series of calculations that work on the same data, please use a
+**scratch**-based workspace. **SSD**, in its turn, is the fastest
+available file system, made only for large parallel applications running
+with millions of small I/O (input/output) operations. If a batch job
+needs a directory for temporary data, then **SSD** is a good choice as
+well. The data can be deleted afterwards.
+
+Note: Keep in mind that every workspace has a storage duration (e.g.
+ssd: 30 days). Be careful with the expiration date, otherwise the data
+could vanish. The core data of your project should be [backed
+up](FileSystems#Backup_and_snapshots_of_the_file_system) and
+[archived](PreservationResearchData) (for the most
+[important](https://www.dcc.ac.uk/guidance/how-guides/five-steps-decide-what-data-keep)
+data). A sketch of typical workspace commands follows below.
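+
+This is a hypothetical sketch assuming the workspace tools described on
+the [WorkSpaces](WorkSpaces) page; the workspace name, file system and
+duration are placeholders:
+
+    ws_allocate -F scratch my_dataset 30   # allocate a scratch workspace for 30 days
+    ws_list                                # list your active workspaces
+    ws_release -F scratch my_dataset       # release it when no longer needed
+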
+#### 4.1.2 Backup
+
+The backup is a crucial part of any project. Organize it at the
+beginning of the project. If you lose/delete your data in the "no
+backup" file systems, it cannot be restored! The backup on Taurus is
+**only** available in the `/home` and the `/projects` file systems!
+Backed-up files can be restored by the user. Details can be found
+[here](FileSystems#Backup_and_snapshots_of_the_file_system).
+
+#### 4.1.3 Folder structure and organizing data
+
+Organizing your data using the file system helps to keep the project
+consistent and structured. We recommend following these rules for your
+work regarding:
+
+- Organizing the data: Never change the original data; automate the
+  organizing of the data; clearly separate intermediate and final
+  output in the filenames; carry identifier and original name along in
+  your analysis pipeline; make outputs clearly identifiable; document
+  your analysis steps.
+- Naming data: Keep short, but meaningful names; keep standard file
+  endings; file names don't replace documentation and metadata; use
+  standards of your discipline; make rules for your project, document
+  and keep them (see the [README
+  recommendations](TypicalProjectSchedule#README_recommendation) below).
+
+This is an example of a (hierarchical) organisation of the folder
+structure. Use it as a visual illustration of the above:
+
+\<img align="justify" alt="Organizing_Data-using_file_systems.png"
+height="161" src="%ATTACHURL%/Organizing_Data-using_file_systems.png"
+title="Organizing_Data-using_file_systems.png" width="300" />
+
+Keep in mind the [input-process-output
+pattern](https://en.wikipedia.org/wiki/IPO_model#Programming) for the
+work with the folder structure.
+
+#### 4.1.4 README recommendation
+
+In general, a [README](https://en.wikipedia.org/wiki/README) is just
+general information about the software/project that lives in the same
+directory/repository as the project. The README is used to explain the
+details of the project and the **structure** of the project/folder in a
+short way. We recommend using a README both for the entire project and
+for every important folder in the project.
+
+Example of the structure for the README. Think first: What is calculated
+and why? (Description) What is expected? (software and version) An
+example text file:
+
+    Title:
+    User:
+    Date:
+    Description:
+    software:
+    version:
+
+#### 4.1.5 Metadata
+
+Another important aspect is
+[metadata](http://dublincore.org/resources/metadata-basics/). It is
+worthwhile to use
+[metadata](PreservationResearchData#Why_should_I_add_Meta_45Data_to_my_data_63)
+for your project on Taurus. [Metadata
+standards](https://en.wikipedia.org/wiki/Metadata_standard) will make
+this easier (e.g.
+[Dublin core](https://dublincore.org/),
+[OME](https://www.openmicroscopy.org/)).
+
+#### 4.1.6 Data hygiene
+
+Don't forget about data hygiene: Classify your current data into
+critical (need it now), necessary (need it later) or unnecessary
+(redundant, trivial or obsolete); track and classify data throughout its
+lifecycle (from creation, storage and use to sharing, archiving and
+destruction); erase the data you don't need throughout its lifecycle.
+
+### 4.2 Software packages
+
+As mentioned before, the module concept is the basic concept for using
+software on Taurus. Uniformity of the project has to be achieved by
+using the same set of software on different levels. This can be done by
+using environments. Two types of environments should be distinguished:
+the runtime environment (the project level; use scripts to load
+[modules](RuntimeEnvironment)) and the Python virtual environment. The
+concept of the environment gives the opportunity to use the same version
+of the software on every level of the project for every project member.
+
+#### Private individual and project module files
+
+[Private individual and project module
+files](RuntimeEnvironment#Private_Project_Module_Files) will be
+discussed in [chapter
+7](TypicalProjectSchedule#A_7._Use_of_specific_software_40packages_44_libraries_44_etc_41).
+A project module list is a powerful instrument for effective teamwork.
+
+#### Python virtual environment
+
+If you are working with Python, then it is crucial to use a virtual
+environment on Taurus. The main purpose of Python virtual environments
+(not to be confused with the software environments for modules) is to
+create an isolated environment for Python projects (a self-contained
+directory tree that contains a Python installation for a particular
+version of Python, plus a number of additional packages).
+
+**Virtualenv (venv)** is a standard Python tool to create isolated
+Python environments. We recommend using venv to work with TensorFlow and
+PyTorch on Taurus. It has been integrated into the standard library
+under the [venv module](https://docs.python.org/3/library/venv.html).
+**Conda** is the second way to use a virtual environment on Taurus.
+[Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
+is an open-source package management system and environment management
+system from Anaconda.
+
+[Detailed information](Python#Virtual_environment) about using the
+virtual environment.
+
+### 4.3 Application software availability
+
+Software created for the purpose of the project should be available to
+all members of the group. Instructions on how to use the software
+(installation of packages, compilation, etc.) should be documented; this
+enables comfortable, efficient and safe work.
+
+### 4.4 Access rights
+
+The concept of **permissions** and **ownership** is crucial in Linux.
+See the
+[HPC-introduction](%PUBURL%/Compendium/WebHome/HPC-Introduction.pdf?t=1602081321)
+slides for an understanding of the main concept. The standard Linux
+commands for changing permissions (e.g. `chmod`) are valid on Taurus as
+well. The **group** access level contains the members of your project
+group. Be careful with 'write' permission and never allow the original
+data to be changed; a short sketch follows below.
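+
+A minimal sketch of such permission settings (directory names are
+placeholders):
+
+    # group members may read and enter the directory, others get no access
+    chmod -R g+rX,o-rwx my_project_dir
+
+    # remove the write permission on the original data for everybody
+    chmod -R a-w my_project_dir/original_data
+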
+
+Useful links: [Data Management](DataManagement), [File
+Systems](FileSystems), [Get Started with HPC-DA](GetStartedWithHPCDA),
+[Project Management](ProjectManagement), [Preservation research
+data](PreservationResearchData)
+
+## 5. Data moving
+
+### 5.1 Moving data to/from the HPC machines
+
+To copy data to/from the HPC machines, the Taurus [export
+nodes](ExportNodes) should be used as the preferred way. There are three
+possibilities for exchanging data between your local machine (lm) and
+the HPC machines (hm): **SCP, RSYNC, SFTP**. Type the following commands
+in the terminal of the local machine. The `scp` command is used in the
+following examples.
+
+#### Copy data from lm to hm
+
+    scp <file> <zih-user>@taurusexport.hrsk.tu-dresden.de:<target-location> #Copy file from your local machine. For example: scp helloworld.txt mustermann@taurusexport.hrsk.tu-dresden.de:/scratch/ws/mustermann-Machine_learning_project/
+
+    scp -r <directory> <zih-user>@taurusexport.hrsk.tu-dresden.de:<target-location> #Copy directory from your local machine.
+
+#### Copy data from hm to lm
+
+    scp <zih-user>@taurusexport.hrsk.tu-dresden.de:<file> <target-location> #Copy file. For example: scp mustermann@taurusexport.hrsk.tu-dresden.de:/scratch/ws/mustermann-Machine_learning_project/helloworld.txt /home/mustermann/Downloads
+
+    scp -r <zih-user>@taurusexport.hrsk.tu-dresden.de:<directory> <target-location> #Copy directory
+
+### 5.2 Moving data inside the HPC machines. Datamover
+
+The best way to transfer data inside Taurus is the
+[datamover](DataMover). It is a special data transfer machine that
+provides the best transfer speed. To load, move, copy, etc. files from
+one file system to another file system, you have to use commands with
+the **dt** prefix, such as **dtcp, dtwget, dtmv, dtrm, dtrsync, dttar,
+dtls**. These commands submit a job to the data transfer machines that
+executes the selected command. Except for the 'dt' prefix, their syntax
+is the same as that of the shell command without the 'dt'.
+
+Keep in mind: The warm_archive is not writable for jobs. However, you
+can store data in the warm archive with the datamover, as sketched
+below.
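+
+For example, archiving a workspace to the warm archive could roughly
+look like this (both paths are placeholders for your own workspace
+directories):
+
+    # copy a directory to the warm archive via the datamover machines
+    dtcp -r /scratch/ws/mustermann-Machine_learning_project /warm_archive/ws/<your-archive-workspace>/
+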
+
+Useful links: [Data Mover](DataMover), [Export Nodes](ExportNodes)
+
+## 6. Use of hardware
+
+To run software, do calculations or compile your code, the compute nodes
+have to be used. The login nodes, which are meant for logging in, cannot
+be used for your computations. Submit your tasks (by using
+[jobs](https://en.wikipedia.org/wiki/Job_(computing))) to the compute
+nodes. [SLURM](Slurm) (the scheduler that handles your jobs) is used on
+Taurus for this purpose. The [HPC
+Introduction](%PUBURL%/Compendium/WebHome/HPC-Introduction.pdf) is a
+good resource to get started with it.
+
+### 6.1 What do I need, a CPU or GPU?
+
+The main difference between CPU and GPU architecture is that a CPU is
+designed to handle a wide range of tasks quickly, but is limited in the
+number of tasks it can run concurrently. A GPU can process data much
+faster than a CPU due to massive parallelism (but the amount of data a
+single GPU core can handle is small), so GPUs are not as versatile as
+CPUs.
+
+### 6.2 Selection of suitable hardware
+
+Available [hardware](HardwareTaurus): normal compute nodes (Haswell
+\[[64,128,256](SystemTaurus#Run_45time_and_Memory_Limits)\], Broadwell,
+[Rome](RomeNodes)), large [SMP nodes](SDFlex), accelerator (GPU) nodes
+(gpu2 partition, [ml partition](Power9)).
+
+The exact partition can be specified with the `-p` flag of the srun
+command or in your batch job.
+
+The majority of basic tasks can be done on the conventional nodes like
+the Haswell ones. SLURM will automatically select a suitable partition
+depending on your memory and --gres (GPU) requirements. If you do not
+specify the partition, most likely you will be placed on the Haswell
+partition (1328 nodes in total).
+
+#### Parallel jobs:
+
+**MPI jobs**: MPI jobs typically allocate one core per task. Several
+nodes can be allocated if necessary. SLURM will automatically find
+suitable hardware. Normal compute nodes are perfect for this task.
+
+**OpenMP jobs**: An SMP-parallel job can only run **within a node**, so
+it is necessary to include the options **-N 1** and **-n 1**. Using
+--cpus-per-task N, SLURM will start one task and you will have N CPUs.
+The maximum number of processors for an SMP-parallel program is 896 on
+Taurus (the [SMP](SDFlex) island).
+
+**GPU** partitions are best suited for **repetitive** and
+**highly parallel** computing tasks. If you have a task with potential
+[data
+parallelism](https://en.wikipedia.org/wiki/Data_parallelism#:~:text=Data%20parallelism%20is%20parallelization%20across,on%20each%20element%20in%20parallel.)
+most likely you need the GPUs. Beyond video rendering, GPUs excel in
+tasks such as machine learning, financial simulations and risk
+modelling. Use the gpu2 and ml partitions only if you need GPUs!
+Otherwise, using the x86 partitions (e.g. Haswell) most likely would be
+more beneficial.
+
+**Interactive jobs**: SLURM can forward your X11 credentials to the
+first (or even all) nodes of a job with the --x11 option. To use an
+interactive job you have to specify the -X flag for the ssh login.
+
+### 6.3 Interactive vs. sbatch
+
+Using srun directly on the shell will block and launch an interactive
+job. Apart from short test runs, it is **recommended to launch your jobs
+in the background by using batch jobs**. For that, you can conveniently
+put the parameters directly into the job file, which you can submit
+using `sbatch [options] <job file>`.
+
+### 6.4 Processing of data for input and output
+
+Pre-processing and post-processing of data is a crucial part of the
+majority of data-dependent projects. The quality of this work influences
+the computations. However, pre- and post-processing can in many cases be
+done completely or partially on a local PC and then
+[transferred](TypicalProjectSchedule#A_5._Data_moving) to Taurus. Please
+use Taurus for the computation-intensive tasks.
+
+Useful links: [Batch Systems](BatchSystems), [Hardware
+Taurus](HardwareTaurus), [HPC-DA](HPCDA), [Slurm](Slurm)
+
+## 7. Use of specific software (packages, libraries, etc)
+
+### 7.1 Modular system
+
+The modular concept is the easiest way to work with the software on
+Taurus. It allows the user to switch between different versions of
+installed programs and provides utilities for the dynamic modification
+of a user's environment.
+The information can be found [here](RuntimeEnvironment#Modules).
+
+#### Private project and user module files
+
+[Private project module
+files](RuntimeEnvironment#Private_Project_Module_Files) allow you to
+load your group-wide installed software into your environment and to
+handle different versions. This allows you to create your own software
+environment for the project. You can create a list of modules that will
+be loaded for every member of the team. This helps to unify the work of
+the team and improves the reproducibility of results. Private modules
+can be loaded like other modules with `module load`.
+
+[Private user module
+files](RuntimeEnvironment#Private_User_Module_Files) allow you to load
+your own installed software into your environment. They work in the same
+manner as project modules, but for your private use.
+
+### 7.2 Use of containers
+
+[Containerization](https://www.ibm.com/cloud/learn/containerization)
+means encapsulating or packaging up software code and all its
+dependencies to run uniformly and consistently on any infrastructure. On
+Taurus, [Singularity](https://sylabs.io/) is used as the standard
+container solution. Singularity enables users to have full control of
+their environment. This means that you don't have to ask the HPC support
+to install anything for you - you can put it in a Singularity container
+and run it! As opposed to Docker (the most famous container solution),
+Singularity is much more suited to being used in an HPC environment and
+more efficient in many cases. Docker containers can easily be used in
+Singularity. Information about the use of Singularity on Taurus can be
+found [here](Container).
+
+In some cases, using Singularity requires a Linux machine with root
+privileges (e.g. when using the ml partition), the same architecture and
+a compatible kernel. For many reasons, users on Taurus cannot be granted
+root permissions. A solution is a Virtual Machine (VM) on the ml
+partition which allows users to gain root permissions in an isolated
+environment. There are two main options for working with VMs on Taurus:
+
+1. [VM tools](VMTools): automated workflows for using virtual machines;
+2. [Manual method](Cloud): requires more steps, but gives you more
+   flexibility and reliability.
+
+Additional information: examples of definitions for Singularity
+containers can be found [here](SingularityExampleDefinitions), and some
+hints [here](SingularityRecipeHints).
+
+Useful links: [Containers](Container), [Custom EasyBuild
+Environment](CustomEasyBuildEnvironment), [Cloud](Cloud)
+
+## 8. Structuring experiments
+
+- Input data
+- Calculation results
+- Log files
+- Submission scripts (examples / code for survival)
+
+## What if everything didn't help?
+
+### Create a ticket: how do I do that?
+
+The best way to ask for help is to create a ticket. In order to do that,
+you have to write a message to <hpcsupport@zih.tu-dresden.de> with a
+detailed description of your problem.
+If possible, please add logs, the environment used, and a minimal
+executable example that reproduces the error or issue.
+
+### Communication with HPC support
+
+The HPC support team is responsible for supporting HPC users and for the
+stable operation of the cluster. You can find the
+[details](https://tu-dresden.de/zih/hochleistungsrechnen/support) in the
+right part of any page of the compendium. However, before contacting the
+HPC support team, please check the documentation carefully (starting
+points: [main page](WebHome), [HPC-DA](HPCDA)), use the
+[search](WebSearch) and then create a ticket. A ticket is the preferred
+way to solve an issue, but in urgent cases you can also call and ask for
+help.
+
+Useful link: [Further Documentation](FurtherDocumentation)
+
+-- Main.AndreiPolitov - 2020-09-14
-- GitLab