diff --git a/doc.zih.tu-dresden.de/docs/software/compilers.md b/doc.zih.tu-dresden.de/docs/software/compilers.md
index 19a70e4638aa126176c8d705d472176e4bbbb915..4292602e02e77bf01ad04c8c01643aadcc8c580a 100644
--- a/doc.zih.tu-dresden.de/docs/software/compilers.md
+++ b/doc.zih.tu-dresden.de/docs/software/compilers.md
@@ -1,20 +1,20 @@
 # Compilers
 
-The following compilers are available on our platforms:
+The following compilers are available on the ZIH system:
 
-| | | | |
+| | GNU Compiler Collection | Intel Compiler | PGI Compiler (Nvidia HPC SDK) |
 |----------------------|-----------|------------|-------------|
-| | **Intel** | **GNU** | **PGI** |
-| **C Compiler** | `icc` | `gcc` | `pgcc` |
-| **C++ Compiler** | `icpc` | `g++` | `pgc++` |
-| **Fortran Compiler** | `ifort` | `gfortran` | `pgfortran` |
+| Further information | [GCC website](https://gcc.gnu.org/) | [C/C++](https://software.intel.com/en-us/c-compilers), [Fortran](https://software.intel.com/en-us/fortran-compilers) | [PGI website](https://www.pgroup.com) |
+| Module name | GNU | intel | PGI |
+| C Compiler | `gcc` | `icc` | `pgcc` |
+| C++ Compiler | `g++` | `icpc` | `pgc++` |
+| Fortran Compiler | `gfortran` | `ifort` | `pgfortran` |
 
-For an overview of the installed compiler versions, please see our automatically updated
-[SoftwareModulesList]**todo**SoftwareModulesList.
+For an overview of the installed compiler versions, please use `module spider <module name>`
+on the ZIH systems.
 
-All C compiler support ANSI C and C99 with a couple of different language options. The support for
-Fortran77, Fortran90, Fortran95, and Fortran2003 differs from one compiler to the other. Please
-check the man pages to verify that your code can be compiled.
+All compilers support various language standards, at least up to ISO C11, ISO C++ 2014, and Fortran 2003.
+Please check the man pages to verify that your code can be compiled.
 
 Please note that the linking of C++ files normally requires the C++ version of the compiler to link
 the correct libraries.
@@ -24,89 +24,59 @@ the correct libraries.
 Common options are:
 
 - `-g` to include information required for debugging
-- `-pg` to generate gprof -style sample-based profiling information during the run
+- `-pg` to generate gprof-like sample-based profiling information during the run
 - `-O0`, `-O1`, `-O2`, `-O3` to customize the optimization level from no (`-O0`) to aggressive
   (`-O3`) optimization
 - `-I` to set search path for header files
 - `-L` to set search path for libraries
 
-Please note that aggressive optimization allows deviation from the strict IEEE arithmetic. Since the
-performance impact of options like `-mp` is very hard the user herself has to balance speed and
-desired accuracy of her application. There are several options for profiling, profile-guided
-optimization, data alignment and so on. You can list all available compiler options with the option
-`-help`. Reading the man-pages is a good idea, too.
-
-The user benefits from the (nearly) same set of compiler flags for optimization for the C,C++, and
-Fortran-compilers. In the following table, only a couple of important compiler-dependent options are
-listed. For more detailed information, the user should refer to the man pages or use the option
--help to list all options of the compiler.
-
-\| **GCC** \| **Open64** \| **Intel** \| **PGI** \| **Pathscale** \|
-Description\* \|
-
-| | | | | | |
-|----------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-----------------|-------------------------------------------------------------------------------------|
-| `-fopenmp` | `-mp` | `-openmp` | `-mp` | `-mp` | turn on OpenMP support |
-| `-ieee-fp` | `-fno-fast-math` | `-mp` | `-Kieee` | `-no-fast-math` | use this flag to limit floating-point optimizations and maintain declared precision |
-| `-ffast-math` | `-ffast-math` | `-mp1` | `-Knoieee` | `-ffast-math` | some floating-point optimizations are allowed, less performance impact than `-mp` . |
-| `-Ofast` | `-Ofast` | `-fast` | `-fast` | `-Ofast` | Maximize performance, implies a couple of other flags |
-| | | `-fpe`<span class="twiki-macro FOOTNOTE">ifort only</span> `-ftz`<span class="twiki-macro FOOTNOTE">flushes denormalized numbers to zero: On Itanium 2 an underflow raises an underflow exception that needs to be handled in software. This takes about 1000 cycles!</span> | `-Ktrap`... | | Controls the behavior of the processor when floating-point exceptions occur. |
-| `-mavx` `-msse4.2` | `-mavx` `-msse4.2` | `-msse4.2` | `-fastsse` | `-mavx` | "generally optimal flags" for supporting SSE instructions |
-| | `-ipa` | `-ipo` | `-Mipa` | `-ipa` | inter procedure optimization (across files) |
-| | | `-ip` | `-Mipa` | | inter procedure optimization (within files) |
-| | `-apo` | `-parallel` | `-Mconcur` | `-apo` | Auto-parallelizer |
-| `-fprofile-generate` | | `-prof-gen` | `-Mpfi` | `-fb-create` | Create instrumented code to generate profile in file \<FN> |
-| `-fprofile-use` | | `-prof-use` | `-Mpfo` | `-fb-opt` | Use profile data for optimization. - Leave all other optimization options |
-
-*We can not generally give advice as to which option should be used - even -O0 sometimes leads to a
-fast code. To gain maximum performance please test the compilers and a few combinations of
-optimization flags. In case of doubt, you can also contact ZIH and ask the staff for help.*
-
-### Vector Extensions
-
-To build an executable for different node types (e.g. Sandybridge and
-Westmere) the option `-msse4.2 -axavx` (for Intel compilers) uses SSE4.2
-as default path and runs along a different execution path if AVX is
-available. This increases the size of the program code (might result in
+Please note that aggressive optimization allows deviation from the strict IEEE arithmetic.
+Since the performance impact of options like `-fp-model strict` is very hard to predict, you
+have to balance speed and desired accuracy of your application yourself.
+
+The user benefits from the (nearly) same set of compiler flags for optimization for the C, C++, and
+Fortran compilers.
+In the following table, only a couple of important compiler-dependent options are listed.
+For more detailed information about these and further flags, the user should refer to the man
+pages or use the option `--help` to list all options of the compiler.
+
+| GCC | Intel | PGI | Description |
+|----------------------|--------------|-------------|-------------------------------------------------------------------------------------|
+| `-fopenmp` | `-fopenmp` | `-mp` | turn on OpenMP support |
+| `-std=c99`, `-std=c++11`, `-std=f2018` | `-std=c99`, `-std=c++11`, `-std18` | `-c99`, `--c++11`, n/a | set language standard, for example C99, C++11, Fortran 2018 |
+| `-mieee-fp` `-frounding-math` | `-fp-model precise` or `-fp-model strict` | `-Kieee` | limit floating-point optimizations and maintain declared precision |
+| `-ffast-math` | `-mp1` or `-fp-model fast` | `-Mfprelaxed` | allow floating-point optimizations, may violate IEEE conformance |
+| `-Ofast` | `-fast` | `-fast` | maximize performance, implies a couple of other flags |
+| `-fsignaling-nans` `-fno-trapping-math` | C/C++: `-fp-trap`, Fortran: `-fpe-all` | `-Ktrap` | control the behavior when floating-point exceptions occur |
+| `-mavx` `-msse4.2` | `-mavx` `-msse4.2` | `-fastsse` | "generally optimal flags" for supporting SSE instructions |
+| `-flto` | `-ipo` | `-Mipa` | interprocedural / link-time optimization (across source files) |
+| `-floop-parallelize-all -ftree-parallelize-loops=<numthreads>` | `-parallel` | `-Mconcur` | auto-parallelizer |
+| `-fprofile-generate` | `-prof-gen` | `-Mpfi` | create instrumented code to generate profile in file |
+| `-fprofile-use` | `-prof-use` | `-Mpfo` | use profile data for optimization |
+
+!!! note
+    We cannot generally give advice as to which option should be used.
+    To gain maximum performance please test the compilers and a few combinations of
+    optimization flags.
+    In case of doubt, you can also contact [HPC support](../support.md) and ask the staff for help.
+
+### Architecture-specific Optimizations
+
+Different architectures of CPUs feature different vector extensions (like SSE4.2 and AVX)
+to accelerate computations.
+The following matrix shows proper compiler flags for the architectures at the ZIH:
+
+| Architecture | GCC | Intel | PGI |
+|--------------------|----------------------|----------------------|-----|
+| Intel Haswell | `-march=haswell` | `-march=haswell` | `-tp=haswell` |
+| AMD Rome | `-march=znver2` | `-march=core-avx2` | `-tp=zen` |
+| Intel Cascade Lake | `-march=cascadelake` | `-march=cascadelake` | `-tp=skylake` |
+| Host's architecture | `-march=native` | `-xHost` | |
+
+To build an executable for different node types (e.g. Cascade Lake with AVX512 and
+Haswell without AVX512) the option `-march=haswell -axcascadelake` (for Intel compilers)
+uses vector extensions up to AVX2 as default path and runs along a different execution
+path if AVX512 is available.
+This increases the size of the program code (might result in
 poorer L1 instruction cache hits) but enables to run the same program on
 different hardware types.
-
-To optimize for the host architecture, the flags:
-
-| GCC | Intel |
-|:--------------|:-------|
-| -march=native | -xHost |
-
-can be used.
-
-The following matrix shows some proper optimization flags for the
-different hardware in Taurus, as of 2020-04-08:
-
-| Arch | GCC | Intel Compiler |
-|:-----------------------|:-------------------|:-----------------|
-| **Intel Sandy Bridge** | -march=sandybridge | -xAVX |
-| **Intel Haswell** | -march=haswell | -xCORE-AVX2 |
-| **AMD Rome** | -march=znver2 | -march=core-avx2 |
-| **Intel Cascade Lake** | -march=cascadelake | -xCOMMON-AVX512 |
-
-## Compiler Optimization Hints
-
-To achieve the best performance the compiler needs to exploit the
-parallelism in the code. Therefore it is sometimes necessary to provide
-the compiler with some hints. Some possible directives are (Fortran
-style):
-
-| | |
-|--------------------------|------------------------------------|
-| `CDEC$ ivdep` | ignore assumed vector dependencies |
-| `CDEC$ swp` | try to software-pipeline |
-| `CDEC$ noswp` | disable software-pipeline |
-| `CDEC$ loop count (n)` | hint for optimization |
-| `CDEC$ distribute point` | split this large loop |
-| `CDEC$ unroll (n)` | unroll (n) times |
-| `CDEC$ nounroll` | do not unroll |
-| `CDEC$ prefetch a` | prefetch array a |
-| `CDEC$ noprefetch a` | do not prefetch array a |
-
-The compiler directives are the same for `ifort` and `icc` . The syntax for C/C++ is like `#pragma
-ivdep`, `#pragma swp`, and so on.