
# Compilers

The following compilers are available on our platforms:

|                      | **Intel** | **GNU**    | **PGI**     |
|----------------------|-----------|------------|-------------|
| **C Compiler**       | `icc`     | `gcc`      | `pgcc`      |
| **C++ Compiler**     | `icpc`    | `g++`      | `pgc++`     |
| **Fortran Compiler** | `ifort`   | `gfortran` | `pgfortran` |

For an overview of the installed compiler versions, please see our automatically updated
[SoftwareModulesList]**todo**SoftwareModulesList.

All C compilers support ANSI C and C99 with a couple of different language options. The support for
Fortran77, Fortran90, Fortran95, and Fortran2003 differs from one compiler to the other. Please
check the man pages to verify that your code can be compiled.

Please note that linking C++ object files normally requires the C++ version of the compiler
(e.g. `g++` rather than `gcc`), so that the correct libraries are linked.

## Compiler Flags

Common options are:

- `-g` to include information required for debugging
- `-pg` to generate gprof-style sample-based profiling information during the run
- `-O0`, `-O1`, `-O2`, `-O3` to customize the optimization level from
  no (`-O0`) to aggressive (`-O3`) optimization
- `-I` to set search path for header files
- `-L` to set search path for libraries

Please note that aggressive optimization allows deviation from strict IEEE arithmetic. Since
options like `-mp` can have a severe performance impact, users have to balance the speed and the
desired accuracy of their application themselves. There are several further options for profiling,
profile-guided optimization, data alignment, and so on. You can list all available compiler options
with the option `-help`. Reading the man pages is a good idea, too.
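The remark about IEEE arithmetic can be illustrated with a minimal C sketch. It is not part of the original documentation, and the function names are hypothetical. Floating-point addition is not associative, so an optimizer that is allowed to reorder operations (for example with GCC's `-ffast-math`) may return a different result than the code as written.

```c
/* Floating-point addition is not associative: the grouping changes the result.
   Both function names are hypothetical and only serve as an illustration. */

/* Evaluates (a + b) + c in exactly the written order. */
double sum_left_to_right(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c); mathematically identical, numerically not. */
double sum_right_to_left(double a, double b, double c)
{
    return a + (b + c);
}
```

Called with `a = 1e20`, `b = -1e20`, and `c = 1.0` under strict IEEE semantics, the first function returns `1.0`, while the second returns `0.0` because `1.0` is lost when it is added to `-1e20`. Which grouping a reordering optimizer picks is up to the compiler.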

The user benefits from the (nearly) same set of optimization flags for the C, C++, and Fortran
compilers. In the following table, only a couple of important compiler-dependent options are
listed. For more detailed information, the user should refer to the man pages or use the option
`-help` to list all options of the compiler.

| **GCC**              | **Open64**         | **Intel**   | **PGI**     | **Pathscale**   | **Description** |
|----------------------|--------------------|-------------|-------------|-----------------|-----------------|
| `-fopenmp`           | `-mp`              | `-openmp`   | `-mp`       | `-mp`           | turn on OpenMP support |
| `-ieee-fp`           | `-fno-fast-math`   | `-mp`       | `-Kieee`    | `-no-fast-math` | use this flag to limit floating-point optimizations and maintain declared precision |
| `-ffast-math`        | `-ffast-math`      | `-mp1`      | `-Knoieee`  | `-ffast-math`   | some floating-point optimizations are allowed, less performance impact than `-mp` |
| `-Ofast`             | `-Ofast`           | `-fast`     | `-fast`     | `-Ofast`        | maximize performance, implies a couple of other flags |
|                      |                    | `-fpe` (`ifort` only), `-ftz` (flushes denormalized numbers to zero; on Itanium 2 an underflow raises an underflow exception that needs to be handled in software, which takes about 1000 cycles) | `-Ktrap`... |                 | control the behavior of the processor when floating-point exceptions occur |
| `-mavx` `-msse4.2`   | `-mavx` `-msse4.2` | `-msse4.2`  | `-fastsse`  | `-mavx`         | "generally optimal flags" for supporting SSE instructions |
|                      | `-ipa`             | `-ipo`      | `-Mipa`     | `-ipa`          | interprocedural optimization (across files) |
|                      |                    | `-ip`       | `-Mipa`     |                 | interprocedural optimization (within files) |
|                      | `-apo`             | `-parallel` | `-Mconcur`  | `-apo`          | auto-parallelizer |
| `-fprofile-generate` |                    | `-prof-gen` | `-Mpfi`     | `-fb-create`    | create instrumented code to generate profile in file \<FN> |
| `-fprofile-use`      |                    | `-prof-use` | `-Mpfo`     | `-fb-opt`       | use profile data for optimization; leave all other optimization options unchanged |

*We cannot give general advice as to which option should be used; even `-O0` sometimes leads to
fast code. To gain maximum performance, please test the compilers and a few combinations of
optimization flags. In case of doubt, you can also contact ZIH and ask the staff for help.*
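As an illustration of the OpenMP flags in the table above, here is a minimal C sketch. The function names `parallel_sum` and `available_threads` are hypothetical. The same source builds with or without an OpenMP flag such as `-fopenmp` (GCC): without the flag, the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>   /* only available when OpenMP support is switched on */
#endif

/* Sums the array; iterations are shared among threads when OpenMP is enabled. */
double parallel_sum(const double *values, size_t n)
{
    double total = 0.0;
    /* Without an OpenMP flag this pragma is ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; ++i) {
        total += values[i];
    }
    return total;
}

/* Returns the maximum number of OpenMP threads, or 1 if built without OpenMP. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

The `_OPENMP` macro is defined by the compiler only when OpenMP support is active, which makes it a convenient way to check whether the flag actually took effect.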

### Vector Extensions

To build an executable for different node types (e.g. Sandy Bridge and
Westmere), the option `-msse4.2 -axavx` (for Intel compilers) uses SSE4.2
as the default path and takes a different execution path if AVX is
available. This increases the size of the program code (which might
result in a poorer L1 instruction cache hit rate) but makes it possible
to run the same program on different hardware types.

To optimize for the host architecture, the following flags can be used:

| GCC             | Intel    |
|:----------------|:---------|
| `-march=native` | `-xHost` |

The following matrix shows suitable optimization flags for the
different hardware in Taurus, as of 2020-04-08:

| Arch                   | GCC                  | Intel Compiler     |
|:-----------------------|:---------------------|:-------------------|
| **Intel Sandy Bridge** | `-march=sandybridge` | `-xAVX`            |
| **Intel Haswell**      | `-march=haswell`     | `-xCORE-AVX2`      |
| **AMD Rome**           | `-march=znver2`      | `-march=core-avx2` |
| **Intel Cascade Lake** | `-march=cascadelake` | `-xCOMMON-AVX512`  |
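To check which vector extension a given set of flags actually enabled, the predefined feature macros can be inspected. The following C sketch assumes the macro names predefined by GCC and Clang (the Intel compilers are expected to define the same ones); the function name `compiled_vector_isa` is hypothetical.

```c
/* Reports the widest x86 vector extension the compiler was allowed to use.
   Assumption: the compiler predefines the GCC-style feature macros below. */
const char *compiled_vector_isa(void)
{
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#else
    return "baseline";
#endif
}
```

The result is fixed at compile time and reflects the flags used for the build, not the CPU the program runs on. For example, a build with `-march=haswell` would be expected to report `AVX2`.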

## Compiler Optimization Hints

To achieve the best performance, the compiler needs to exploit the
parallelism in the code. Therefore, it is sometimes necessary to provide
the compiler with some hints. Some possible directives are (Fortran
style):

| **Directive**            | **Effect**                        |
|--------------------------|-----------------------------------|
| `CDEC$ ivdep`            | ignore assumed vector dependences |
| `CDEC$ swp`              | try to software-pipeline          |
| `CDEC$ noswp`            | disable software-pipelining       |
| `CDEC$ loop count (n)`   | hint for optimization             |
| `CDEC$ distribute point` | split this large loop             |
| `CDEC$ unroll (n)`       | unroll (n) times                  |
| `CDEC$ nounroll`         | do not unroll                     |
| `CDEC$ prefetch a`       | prefetch array a                  |
| `CDEC$ noprefetch a`     | do not prefetch array a           |

The compiler directives are the same for `ifort` and `icc`. The syntax for C/C++ is
`#pragma ivdep`, `#pragma swp`, and so on.
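A minimal C sketch of the `ivdep` hint follows. The function name and parameters are hypothetical. The compiler cannot know the sign of `k`, so it has to assume a loop-carried dependence. The pragma asserts that no such dependence exists, which is only true if the caller guarantees `k >= 0`.

```c
/* Scales a shifted view of the array in place: a[i] = a[i + k] * c.
   Assumption: the caller guarantees k >= 0 and that a has at least m + k
   elements. For k < 0 the loop has a real dependence and the hint is wrong. */
void scale_shifted(int *a, int k, int c, int m)
{
    /* Intel-style hint; compilers that do not know it ignore the pragma. */
    #pragma ivdep
    for (int i = 0; i < m; i++) {
        a[i] = a[i + k] * c;
    }
}
```

GCC does not recognize this spelling and ignores it (possibly with a warning); it offers `#pragma GCC ivdep` for the same purpose. The hint never changes the result of a correct program, it only removes an assumption that blocks vectorization.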