Commit 895a3d0d authored by Matthias Lieber

review of compilers.md: thorough content update, removed deprecaded parts

parent 10a8b607
3 merge requests: !322 Merge preview into main, !319 Merge preview into main, !266 review of compilers.md: thorough content update, removed deprecaded parts
@@ -2,19 +2,18 @@
 The following compilers are available on our platforms:
-| | | | |
+| | GNU Compiler Collection | Intel Compiler | PGI Compiler (Nvidia HPC SDK) |
 |----------------------|-----------|------------|-------------|
-| | **Intel** | **GNU** | **PGI** |
-| **C Compiler** | `icc` | `gcc` | `pgcc` |
-| **C++ Compiler** | `icpc` | `g++` | `pgc++` |
-| **Fortran Compiler** | `ifort` | `gfortran` | `pgfortran` |
+| Further information | [GCC website](https://gcc.gnu.org/) | [C/C++](https://software.intel.com/en-us/c-compilers), [Fortran](https://software.intel.com/en-us/fortran-compilers) | [PGI website](https://www.pgroup.com) |
+| Module name | GNU | intel | PGI |
+| C Compiler | `gcc` | `icc` | `pgcc` |
+| C++ Compiler | `g++` | `icpc` | `pgc++` |
+| Fortran Compiler | `gfortran` | `ifort` | `pgfortran` |
-For an overview of the installed compiler versions, please see our automatically updated
-[SoftwareModulesList]**todo**SoftwareModulesList.
+For an overview of the installed compiler versions, please use `ml spider <module name>` on the ZIH systems.
-All C compiler support ANSI C and C99 with a couple of different language options. The support for
-Fortran77, Fortran90, Fortran95, and Fortran2003 differs from one compiler to the other. Please
-check the man pages to verify that your code can be compiled.
+All compilers support various language standards, at least up to ISO C11, ISO C++ 2014, and Fortran 2003.
+Please check the man pages to verify that your code can be compiled.
 Please note that the linking of C++ files normally requires the C++ version of the compiler to link
 the correct libraries.
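The rule about linking C++ code can be captured in a small build helper. The following Python sketch is illustrative only: the function name `pick_link_driver`, the dictionary layout, and the suffix lists are assumptions made for this example, and only the driver names come from the compiler table in this file.

```python
from pathlib import Path

# Driver names as listed in the compiler table; the layout is an assumption.
DRIVERS = {
    "gnu":   {"c": "gcc",  "c++": "g++",   "fortran": "gfortran"},
    "intel": {"c": "icc",  "c++": "icpc",  "fortran": "ifort"},
    "pgi":   {"c": "pgcc", "c++": "pgc++", "fortran": "pgfortran"},
}

# Common source-file suffixes (an assumption, not an exhaustive list).
CXX_SUFFIXES = {".cpp", ".cxx", ".cc", ".C"}
FORTRAN_SUFFIXES = {".f", ".f90", ".f03", ".F90"}


def pick_link_driver(family, sources):
    """Return the driver to use for the link step.

    If any source is C++, the C++ driver is chosen so that the C++
    runtime libraries are linked; otherwise Fortran, then C.
    """
    suffixes = {Path(s).suffix for s in sources}
    drivers = DRIVERS[family]
    if suffixes & CXX_SUFFIXES:
        return drivers["c++"]
    if suffixes & FORTRAN_SUFFIXES:
        return drivers["fortran"]
    return drivers["c"]


if __name__ == "__main__":
    # Mixed C and C++ sources: link with the C++ driver.
    print(pick_link_driver("gnu", ["main.c", "solver.cpp"]))  # prints "g++"
```

Projects that mix Fortran and C++ usually need additional runtime libraries at link time, which this sketch does not attempt to handle.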
@@ -30,83 +29,53 @@ Common options are:
 - `-I` to set search path for header files
 - `-L` to set search path for libraries
-Please note that aggressive optimization allows deviation from the strict IEEE arithmetic. Since the
-performance impact of options like `-mp` is very hard the user herself has to balance speed and
-desired accuracy of her application. There are several options for profiling, profile-guided
-optimization, data alignment and so on. You can list all available compiler options with the option
-`-help`. Reading the man-pages is a good idea, too.
+Please note that aggressive optimization allows deviation from the strict IEEE arithmetic.
+Since the performance impact of options like `-fp-model strict` is very hard the user herself
+has to balance speed and desired accuracy of her application.
-The user benefits from the (nearly) same set of compiler flags for optimization for the C,C++, and
-Fortran-compilers. In the following table, only a couple of important compiler-dependent options are
-listed. For more detailed information, the user should refer to the man pages or use the option
--help to list all options of the compiler.
+The user benefits from the (nearly) same set of compiler flags for optimization for the C, C++, and
+Fortran-compilers.
+In the following table, only a couple of important compiler-dependent options are listed.
+For more detailed information about these and further flags, the user should refer to the man
+pages or use the option `--help` to list all options of the compiler.
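Why aggressive optimization can deviate from strict IEEE arithmetic is easiest to see with a reordered sum. The Python sketch below is only an illustration (it assumes Python floats are IEEE 754 doubles, as on common platforms, and the values are chosen for effect). It evaluates the same three-term sum in two orders, which is the kind of reassociation that options such as `-ffast-math` permit a compiler to perform.

```python
def sum_left_to_right(a, b, c):
    """Evaluate (a + b) + c, the order written in the source."""
    return (a + b) + c


def sum_reassociated(a, b, c):
    """Evaluate a + (b + c), an order a compiler may choose when
    value-changing floating-point optimizations are allowed."""
    return a + (b + c)


if __name__ == "__main__":
    a, b, c = 1e20, -1e20, 1.0
    print(sum_left_to_right(a, b, c))  # 1.0
    print(sum_reassociated(a, b, c))   # 0.0, the 1.0 is absorbed by -1e20
```

Both results are valid roundings of their respective expressions. Whether such a difference matters depends on the application, which is why the balance between speed and accuracy is left to the user.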
-\| **GCC** \| **Open64** \| **Intel** \| **PGI** \| **Pathscale** \|
-Description\* \|
-| | | | | | |
-|----------------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|-----------------|-------------------------------------------------------------------------------------|
-| `-fopenmp` | `-mp` | `-openmp` | `-mp` | `-mp` | turn on OpenMP support |
-| `-ieee-fp` | `-fno-fast-math` | `-mp` | `-Kieee` | `-no-fast-math` | use this flag to limit floating-point optimizations and maintain declared precision |
-| `-ffast-math` | `-ffast-math` | `-mp1` | `-Knoieee` | `-ffast-math` | some floating-point optimizations are allowed, less performance impact than `-mp` . |
-| `-Ofast` | `-Ofast` | `-fast` | `-fast` | `-Ofast` | Maximize performance, implies a couple of other flags |
-| | | `-fpe`<span class="twiki-macro FOOTNOTE">ifort only</span> `-ftz`<span class="twiki-macro FOOTNOTE">flushes denormalized numbers to zero: On Itanium 2 an underflow raises an underflow exception that needs to be handled in software. This takes about 1000 cycles!</span> | `-Ktrap`... | | Controls the behavior of the processor when floating-point exceptions occur. |
-| `-mavx` `-msse4.2` | `-mavx` `-msse4.2` | `-msse4.2` | `-fastsse` | `-mavx` | "generally optimal flags" for supporting SSE instructions |
-| | `-ipa` | `-ipo` | `-Mipa` | `-ipa` | inter procedure optimization (across files) |
-| | | `-ip` | `-Mipa` | | inter procedure optimization (within files) |
-| | `-apo` | `-parallel` | `-Mconcur` | `-apo` | Auto-parallelizer |
-| `-fprofile-generate` | | `-prof-gen` | `-Mpfi` | `-fb-create` | Create instrumented code to generate profile in file \<FN> |
-| `-fprofile-use` | | `-prof-use` | `-Mpfo` | `-fb-opt` | Use profile data for optimization. - Leave all other optimization options |
-*We can not generally give advice as to which option should be used - even -O0 sometimes leads to a
-fast code. To gain maximum performance please test the compilers and a few combinations of
-optimization flags. In case of doubt, you can also contact ZIH and ask the staff for help.*
+| GCC | Intel | PGI | Description |
+|----------------------|--------------|-------------|-------------------------------------------------------------------------------------|
+| `-fopenmp` | `-fopenmp` | `-mp` | turn on OpenMP support |
+| `-std=c99`, `-std=c++11`, `-std=f2018` | `-std=c99`, `-std=c++11`, `-std18` | `-c99`, `--c++11`, n/a | set language standard, for example C99, C++11, Fortran 2018 |
+| `-mieee-fp` `-frounding-math` | `-fp-model precise` or `-fp-model strict` | `-Kieee` | limit floating-point optimizations and maintain declared precision |
+| `-ffast-math` | `-mp1` or `-fp-model fast` | `-Mfprelaxed` | allow floating-point optimizations, may violate IEEE conformance |
+| `-Ofast` | `-fast` | `-fast` | Maximize performance, implies a couple of other flags |
+| `-fsignaling-nans` `-fno-trapping-math` | C/C++: `-fpe-trap`, Fortran: `-fpe-all` | `-Ktrap` | controls the behavior when floating-point exceptions occur |
+| `-mavx` `-msse4.2` | `-mavx` `-msse4.2` | `-fastsse` | "generally optimal flags" for supporting SSE instructions |
+| `-flto` | `-ipo` | `-Mipa` | interprocedural / link-time optimization (across source files) |
+| `-floop-parallelize-all -ftree-parallelize-loops=<numthreads>` | `-parallel` | `-Mconcur` | auto-parallelizer |
+| `-fprofile-generate` | `-prof-gen` | `-Mpfi` | create instrumented code to generate profile in file |
+| `-fprofile-use` | `-prof-use` | `-Mpfo` | use profile data for optimization |
+!!! note
+We can not generally give advice as to which option should be used.
+To gain maximum performance please test the compilers and a few combinations of
+optimization flags.
+In case of doubt, you can also contact [HPC support](../support.md) and ask the staff for help.
+### Architecture-specific Optimizations
-### Vector Extensions
+Different architectures of CPUs feature different vector extensions (like SSE4.2 and AVX)
+to accelerate computations.
+The following matrix shows proper compiler flags for the architectures at the ZIH:
+| Architecture | GCC | Intel | PGI |
+|--------------------|----------------------|----------------------|-----|
+| Intel Haswell | `-march=haswell` | `-march=haswell` | `-tp=haswell` |
+| AMD Rome | `-march=znver2` | `-march=core-avx2` | `-tp=zen` |
+| Intel Cascade Lake | `-march=cascadelake` | `-march=cascadelake` | `-tp=skylake` |
+| Host's architecture | `-march=native` | `-xHost` | |
-To build an executable for different node types (e.g. Sandybridge and
-Westmere) the option `-msse4.2 -axavx` (for Intel compilers) uses SSE4.2
-as default path and runs along a different execution path if AVX is
-available. This increases the size of the program code (might result in
+To build an executable for different node types (e.g. Cascade Lake with AVX512 and
+Haswell without AVX512) the option `-march=haswell -axcascadelake` (for Intel compilers)
+uses vector extension up to AVX2 as default path and runs along a different execution
+path if AVX512 is available.
+This increases the size of the program code (might result in
 poorer L1 instruction cache hits) but enables to run the same program on
 different hardware types.
-To optimize for the host architecture, the flags:
-| GCC | Intel |
-|:--------------|:-------|
-| -march=native | -xHost |
-can be used.
-The following matrix shows some proper optimization flags for the
-different hardware in Taurus, as of 2020-04-08:
-| Arch | GCC | Intel Compiler |
-|:-----------------------|:-------------------|:-----------------|
-| **Intel Sandy Bridge** | -march=sandybridge | -xAVX |
-| **Intel Haswell** | -march=haswell | -xCORE-AVX2 |
-| **AMD Rome** | -march=znver2 | -march=core-avx2 |
-| **Intel Cascade Lake** | -march=cascadelake | -xCOMMON-AVX512 |
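The architecture matrix lends itself to a lookup table in a build script. The sketch below is a hypothetical helper: the names `ARCH_FLAGS`, `C_DRIVERS`, and `compile_command` and the dictionary keys are assumptions made for this example. The GCC and Intel flags are copied from the updated matrix in this commit, and the PGI column is left out for brevity.

```python
# Flags copied from the updated architecture matrix; the keys are made up here.
ARCH_FLAGS = {
    "haswell":     {"gcc": "-march=haswell",     "intel": "-march=haswell"},
    "rome":        {"gcc": "-march=znver2",      "intel": "-march=core-avx2"},
    "cascadelake": {"gcc": "-march=cascadelake", "intel": "-march=cascadelake"},
    "host":        {"gcc": "-march=native",      "intel": "-xHost"},
}

# C driver per compiler family, as named in the compiler table.
C_DRIVERS = {"gcc": "gcc", "intel": "icc"}


def compile_command(compiler, arch, source, output):
    """Build an argument list for compiling one C file for a given architecture."""
    try:
        flag = ARCH_FLAGS[arch][compiler]
    except KeyError:
        raise ValueError(f"no flag known for {compiler!r} on {arch!r}") from None
    return [C_DRIVERS[compiler], flag, source, "-o", output]


if __name__ == "__main__":
    print(" ".join(compile_command("gcc", "rome", "hello.c", "hello")))
    # prints: gcc -march=znver2 hello.c -o hello
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting concerns.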
-## Compiler Optimization Hints
-To achieve the best performance the compiler needs to exploit the
-parallelism in the code. Therefore it is sometimes necessary to provide
-the compiler with some hints. Some possible directives are (Fortran
-style):
-| | |
-|--------------------------|------------------------------------|
-| `CDEC$ ivdep` | ignore assumed vector dependencies |
-| `CDEC$ swp` | try to software-pipeline |
-| `CDEC$ noswp` | disable software-pipeline |
-| `CDEC$ loop count (n)` | hint for optimization |
-| `CDEC$ distribute point` | split this large loop |
-| `CDEC$ unroll (n)` | unroll (n) times |
-| `CDEC$ nounroll` | do not unroll |
-| `CDEC$ prefetch a` | prefetch array a |
-| `CDEC$ noprefetch a` | do not prefetch array a |
-The compiler directives are the same for `ifort` and `icc` . The syntax for C/C++ is like `#pragma
-ivdep`, `#pragma swp`, and so on.