review of compilers.md: thorough content update, removed deprecated parts

Commit 895a3d0d authored 3 years ago by Matthias Lieber
--- a/doc.zih.tu-dresden.de/docs/software/compilers.md
+++ b/doc.zih.tu-dresden.de/docs/software/compilers.md
@@ -2,19 +2,18 @@
 The following compilers are available on our platforms:
-|                      |           |            |             |
+|                      | GNU Compiler Collection | Intel Compiler | PGI Compiler (Nvidia HPC SDK) |
 |----------------------|-----------|------------|-------------|
-|                      | **Intel** | **GNU**    | **PGI**     |
+| Further information  | [GCC website](https://gcc.gnu.org/) | [C/C++](https://software.intel.com/en-us/c-compilers), [Fortran](https://software.intel.com/en-us/fortran-compilers) | [PGI website](https://www.pgroup.com) |
-| **C Compiler**       | `icc`     | `gcc`      | `pgcc`      |
+| Module name          | GNU        | intel     | PGI         |
-| **C++ Compiler**     | `icpc`    | `g++`      | `pgc++`     |
+| C Compiler           | `gcc`      | `icc`     | `pgcc`      |
-| **Fortran Compiler** | `ifort`   | `gfortran` | `pgfortran` |
+| C++ Compiler         | `g++`      | `icpc`    | `pgc++`     |
+| Fortran Compiler     | `gfortran` | `ifort`   | `pgfortran` |
-For an overview of the installed compiler versions, please see our automatically updated
+For an overview of the installed compiler versions, please use `ml spider <module name>` on the ZIH systems.
-[SoftwareModulesList]**todo**SoftwareModulesList.
-All C compiler support ANSI C and C99 with a couple of different language options. The support for
+All compilers support various language standards, at least up to ISO C11, ISO C++ 2014, and Fortran 2003.
-Fortran77, Fortran90, Fortran95, and Fortran2003 differs from one compiler to the other. Please
+Please check the man pages to verify that your code can be compiled.
-check the man pages to verify that your code can be compiled.
 Please note that the linking of C++ files normally requires the C++ version of the compiler to link
 the correct libraries.
@@ -30,83 +29,53 @@ Common options are:
 - `-I` to set search path for header files
 - `-L` to set search path for libraries
-Please note that aggressive optimization allows deviation from the strict IEEE arithmetic. Since the
+Please note that aggressive optimization allows deviation from the strict IEEE arithmetic.
-performance impact of options like `-mp` is very hard the user herself has to balance speed and
+Since the performance impact of options like `-fp-model strict` is very hard to predict, the user herself
-desired accuracy of her application. There are several options for profiling, profile-guided
+has to balance speed and desired accuracy of her application.
-optimization, data alignment and so on. You can list all available compiler options with the option
-`-help`. Reading the man-pages is a good idea, too.
+Within each compiler suite, the C, C++, and Fortran compilers accept (nearly) the same set of
+optimization flags.
-The user benefits from the (nearly) same set of compiler flags for optimization for the C,C++, and
+In the following table, only a couple of important compiler-dependent options are listed.
-Fortran-compilers. In the following table, only a couple of important compiler-dependent options are
+For more detailed information about these and further flags, the user should refer to the man
-listed.  For more detailed information, the user should refer to the man pages or use the option
+pages or use the option `--help` to list all options of the compiler.
-help to list all options of the compiler.
+| GCC | Intel | PGI | Description |
-\| **GCC** \| **Open64** \| **Intel** \| **PGI** \| **Pathscale** \|
+|----------------------|--------------|-------------|-------------------------------------------------------------------------------------|
-Description\* \|
+| `-fopenmp`           | `-fopenmp`    | `-mp`       | turn on OpenMP support |
+| `-std=c99`, `-std=c++11`, `-std=f2018`   | `-std=c99`, `-std=c++11`, `-std18`       | `-c99`, `--c++11`, n/a  | set language standard, for example C99, C++11, Fortran 2018 |
-|                      |                    |                                                                                                                                                                                                                                                                              |             |                 |                                                                                     |
+| `-mieee-fp` `-frounding-math`  | `-fp-model precise` or `-fp-model strict`        | `-Kieee`    | limit floating-point optimizations and maintain declared precision |
-|----------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-----------------|-------------------------------------------------------------------------------------|
+| `-ffast-math`        | `-mp1` or `-fp-model fast`  | `-Mfprelaxed`  | allow floating-point optimizations, may violate IEEE conformance |
-| `-fopenmp`           | `-mp`              | `-openmp`                                                                                                                                                                                                                                                                    | `-mp`       | `-mp`           | turn on OpenMP support                                                              |
+| `-Ofast`             | `-fast`      | `-fast`     | maximize performance, implies a couple of other flags                               |
-| `-ieee-fp`           | `-fno-fast-math`   | `-mp`                                                                                                                                                                                                                                                                        | `-Kieee`    | `-no-fast-math` | use this flag to limit floating-point optimizations and maintain declared precision |
+| `-fsignaling-nans` `-fno-trapping-math` | C/C++: `-fpe-trap`, Fortran: `-fpe-all` | `-Ktrap` | controls the behavior when floating-point exceptions occur   |
-| `-ffast-math`        | `-ffast-math`      | `-mp1`                                                                                                                                                                                                                                                                       | `-Knoieee`  | `-ffast-math`   | some floating-point optimizations are allowed, less performance impact than `-mp` . |
+| `-mavx` `-msse4.2`   | `-mavx` `-msse4.2`   | `-fastsse`  | "generally optimal flags" for supporting SSE instructions                           |
-| `-Ofast`             | `-Ofast`           | `-fast`                                                                                                                                                                                                                                                                      | `-fast`     | `-Ofast`        | Maximize performance, implies a couple of other flags                               |
+| `-flto`              | `-ipo`       | `-Mipa`     | interprocedural / link-time optimization (across source files)                                         |
-|                      |                    | `-fpe`<span class="twiki-macro FOOTNOTE">ifort only</span> `-ftz`<span class="twiki-macro FOOTNOTE">flushes denormalized numbers to zero: On Itanium 2 an underflow raises an underflow exception that needs to be handled in software. This takes about 1000 cycles!</span> | `-Ktrap`... |                 | Controls the behavior of the processor when floating-point exceptions occur.        |
+| `-floop-parallelize-all -ftree-parallelize-loops=<numthreads>` | `-parallel`  | `-Mconcur`  | auto-parallelizer                                                                   |
-| `-mavx` `-msse4.2`   | `-mavx` `-msse4.2` | `-msse4.2`                                                                                                                                                                                                                                                                   | `-fastsse`  | `-mavx`         | "generally optimal flags" for supporting SSE instructions                           |
+| `-fprofile-generate` | `-prof-gen`  | `-Mpfi`     | create instrumented code to generate profile in file                                |
-|                      | `-ipa`             | `-ipo`                                                                                                                                                                                                                                                                       | `-Mipa`     | `-ipa`          | inter procedure optimization (across files)                                         |
+| `-fprofile-use`      | `-prof-use`  | `-Mpfo`     | use profile data for optimization      |
-|                      |                    | `-ip`                                                                                                                                                                                                                                                                        | `-Mipa`     |                 | inter procedure optimization (within files)                                         |
-|                      | `-apo`             | `-parallel`                                                                                                                                                                                                                                                                  | `-Mconcur`  | `-apo`          | Auto-parallelizer                                                                   |
+!!! note
-| `-fprofile-generate` |                    | `-prof-gen`                                                                                                                                                                                                                                                                  | `-Mpfi`     | `-fb-create`    | Create instrumented code to generate profile in file \<FN>                           |
+    We cannot generally give advice as to which option should be used.
-| `-fprofile-use`      |                    | `-prof-use`                                                                                                                                                                                                                                                                  | `-Mpfo`     | `-fb-opt`       | Use profile data for optimization. - Leave all other optimization options           |
+    To gain maximum performance, please test the compilers and a few combinations of
+    optimization flags.
-*We can not generally give advice as to which option should be used - even -O0 sometimes leads to a
+    In case of doubt, you can also contact [HPC support](../support.md) and ask the staff for help.
-fast code. To gain maximum performance please test the compilers and a few combinations of
-optimization flags.  In case of doubt, you can also contact ZIH and ask the staff for help.*
+### Architecture-specific Optimizations
-### Vector Extensions
+Different architectures of CPUs feature different vector extensions (like SSE4.2 and AVX)
+to accelerate computations.
-To build an executable for different node types (e.g. Sandybridge and
+The following matrix shows proper compiler flags for the architectures at the ZIH:
-Westmere) the option `-msse4.2 -axavx` (for Intel compilers) uses SSE4.2
-as default path and runs along a different execution path if AVX is
+| Architecture       | GCC                  | Intel                | PGI |
-available. This increases the size of the program code (might result in
+|--------------------|----------------------|----------------------|-----|
+| Intel Haswell      | `-march=haswell`     | `-march=haswell`     | `-tp=haswell` |
+| AMD Rome           | `-march=znver2`      | `-march=core-avx2`   | `-tp=zen` |
+| Intel Cascade Lake | `-march=cascadelake` | `-march=cascadelake` | `-tp=skylake` |
+| Host's architecture  | `-march=native`      | `-xHost`             | |
+To build an executable for different node types (e.g. Cascade Lake with AVX512 and
+Haswell without AVX512) the option `-march=haswell -axcascadelake` (for Intel compilers)
+uses vector extensions up to AVX2 as the default path and runs along a different execution
+path if AVX512 is available.
+This increases the size of the program code (which might result in
 a lower L1 instruction cache hit rate) but makes it possible to run the same program on
 different hardware types.
-To optimize for the host architecture, the flags:
-| GCC           | Intel  |
-|:--------------|:-------|
-| -march=native | -xHost |
-can be used.
-The following matrix shows some proper optimization flags for the
-different hardware in Taurus, as of 2020-04-08:
-| Arch                   | GCC                | Intel Compiler   |
-|:-----------------------|:-------------------|:-----------------|
-| **Intel Sandy Bridge** | -march=sandybridge | -xAVX            |
-| **Intel Haswell**      | -march=haswell     | -xCORE-AVX2      |
-| **AMD Rome**           | -march=znver2      | -march=core-avx2 |
-| **Intel Cascade Lake** | -march=cascadelake | -xCOMMON-AVX512  |
-## Compiler Optimization Hints
-To achieve the best performance the compiler needs to exploit the
-parallelism in the code. Therefore it is sometimes necessary to provide
-the compiler with some hints. Some possible directives are (Fortran
-style):
-|                          |                                    |
-|--------------------------|------------------------------------|
-| `CDEC$ ivdep`            | ignore assumed vector dependencies |
-| `CDEC$ swp`              | try to software-pipeline           |
-| `CDEC$ noswp`            | disable software-pipeline          |
-| `CDEC$ loop count (n)`   | hint for optimization              |
-| `CDEC$ distribute point` | split this large loop              |
-| `CDEC$ unroll (n)`       | unroll (n) times                   |
-| `CDEC$ nounroll`         | do not unroll                      |
-| `CDEC$ prefetch a`       | prefetch array a                   |
-| `CDEC$ noprefetch a`     | do not prefetch array a            |
-The compiler directives are the same for `ifort` and `icc` . The syntax for C/C++ is like `#pragma
-ivdep`, `#pragma swp`, and so on.