Commit f51815cf authored by Frank Winkler

Added all pika supported architectures.

parent ffb64ca2
SHORT Metrics for ProPE
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE
PMC1 FP_ARITH_INST_RETIRED_SCALAR_SINGLE
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX1C0 CAS_COUNT_RD
MBOX1C1 CAS_COUNT_WR
MBOX2C0 CAS_COUNT_RD
MBOX2C1 CAS_COUNT_WR
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR
MBOX4C0 CAS_COUNT_RD
MBOX4C1 CAS_COUNT_WR
MBOX5C0 CAS_COUNT_RD
MBOX5C1 CAS_COUNT_WR
MBOX6C0 CAS_COUNT_RD
MBOX6C1 CAS_COUNT_WR
MBOX7C0 CAS_COUNT_RD
MBOX7C1 CAS_COUNT_WR
PWR0 PWR_PKG_ENERGY
PWR3 PWR_DRAM_ENERGY
METRICS
ipc FIXC0/FIXC1
mem_bw (MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1)*64.0/time
flops_sp (PMC0*4.0+PMC1+PMC2*8.0)/time
rapl_power (PWR0+PWR3)/time
LONG
--
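The mem_bw and rapl_power formulas of this group can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical and not part of LIKWID. It assumes that each CAS command moves one 64-byte cache line (which is what the 64.0 factor suggests), that the energy counters have already been converted to joules, and that time is given in seconds.

```python
# Sketch only: function and parameter names are hypothetical, not LIKWID API.
CACHE_LINE_BYTES = 64.0  # the *64.0 factor in the mem_bw formula


def mem_bw(cas_rd, cas_wr, time_s):
    """Return bytes per second from per-channel CAS read/write counts."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    # Sum reads and writes over all memory channels (MBOX0..MBOX7 here).
    return (sum(cas_rd) + sum(cas_wr)) * CACHE_LINE_BYTES / time_s


def rapl_power(pkg_energy, dram_energy, time_s):
    """Return average power, assuming both energies are already in joules."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return (pkg_energy + dram_energy) / time_s


if __name__ == "__main__":
    reads = [1_000_000] * 8   # made-up CAS_COUNT_RD values, one per channel
    writes = [500_000] * 8    # made-up CAS_COUNT_WR values, one per channel
    print(mem_bw(reads, writes, 2.0))
    print(rapl_power(150.0, 50.0, 2.0))
```

The counter values in the usage part are invented for illustration and are not measurements.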
SHORT Metrics for ProPE
EVENTSET
PMC0 FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE
PMC1 FP_ARITH_INST_RETIRED_SCALAR_DOUBLE
PMC2 FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE
METRICS
flops_dp (PMC0*2.0+PMC1+PMC2*4.0)/time
LONG
--
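The weights in the flops_sp and flops_dp formulas match the number of elements that fit into one vector register. A minimal sketch of that relation, with hypothetical helper names and assuming 32-bit single-precision and 64-bit double-precision elements:

```python
# Sketch only: helper names are hypothetical, not LIKWID API.

def lanes(vector_bits, element_bits):
    """Number of floating-point elements in one packed register."""
    return vector_bits // element_bits


def flop_rate(scalar, packed_128, packed_256, element_bits, time_s):
    """Weighted FLOP rate using the same structure as flops_sp / flops_dp."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    weighted = (
        packed_128 * lanes(128, element_bits)   # PMC0 weight: 4 (SP) or 2 (DP)
        + scalar                                # PMC1 weight: 1
        + packed_256 * lanes(256, element_bits) # PMC2 weight: 8 (SP) or 4 (DP)
    )
    return weighted / time_s


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    print(flop_rate(scalar=5, packed_128=10, packed_256=20, element_bits=32, time_s=1.0))
    print(flop_rate(scalar=5, packed_128=10, packed_256=20, element_bits=64, time_s=1.0))
```

Deriving the weights from the register width instead of hard-coding them makes it easier to see why the single-precision factors are twice the double-precision ones.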
SHORT Metrics for ProPE
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
PMC0 AVX_INSTS_CALC
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX1C0 CAS_COUNT_RD
MBOX1C1 CAS_COUNT_WR
MBOX2C0 CAS_COUNT_RD
MBOX2C1 CAS_COUNT_WR
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR
MBOX4C0 CAS_COUNT_RD
MBOX4C1 CAS_COUNT_WR
MBOX5C0 CAS_COUNT_RD
MBOX5C1 CAS_COUNT_WR
MBOX6C0 CAS_COUNT_RD
MBOX6C1 CAS_COUNT_WR
MBOX7C0 CAS_COUNT_RD
MBOX7C1 CAS_COUNT_WR
PWR0 PWR_PKG_ENERGY
PWR3 PWR_DRAM_ENERGY
METRICS
ipc FIXC0/FIXC1
mem_bw (MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1)*64.0/time
flops_sp (PMC0*8.0)/time
rapl_power (PWR0+PWR3)/time
LONG
--
flops_sp: Packed SP MFLOP/s
SHORT ProPE metrics: IPC, FLOPS, MEM_BW
EVENTSET
PMC3 PM_FLOP_CMPL
PMC4 PM_RUN_INST_CMPL
PMC5 PM_RUN_CYC
MBOX0C0 PM_MBA0_READ_BYTES
MBOX0C1 PM_MBA0_WRITE_BYTES
MBOX1C0 PM_MBA1_READ_BYTES
MBOX1C1 PM_MBA1_WRITE_BYTES
MBOX2C0 PM_MBA2_READ_BYTES
MBOX2C1 PM_MBA2_WRITE_BYTES
MBOX3C0 PM_MBA3_READ_BYTES
MBOX3C1 PM_MBA3_WRITE_BYTES
MBOX4C0 PM_MBA4_READ_BYTES
MBOX4C1 PM_MBA4_WRITE_BYTES
MBOX5C0 PM_MBA5_READ_BYTES
MBOX5C1 PM_MBA5_WRITE_BYTES
MBOX6C0 PM_MBA6_READ_BYTES
MBOX6C1 PM_MBA6_WRITE_BYTES
MBOX7C0 PM_MBA7_READ_BYTES
MBOX7C1 PM_MBA7_WRITE_BYTES
METRICS
ipc PMC4/PMC5
flops_sp PMC3*2.0/time
mem_bw (MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1)*64.0/time
LONG
--
-
mem_bw (MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX4C0+MBOX5C0+MBOX6C0+MBOX7C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1+MBOX4C1+MBOX5C1+MBOX6C1+MBOX7C1)*64.0/time
SHORT Metrics for ProPE
EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_SINGLE
PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_SINGLE
PMC2 SIMD_FP_256_PACKED_SINGLE
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX1C0 CAS_COUNT_RD
MBOX1C1 CAS_COUNT_WR
MBOX2C0 CAS_COUNT_RD
MBOX2C1 CAS_COUNT_WR
MBOX3C0 CAS_COUNT_RD
MBOX3C1 CAS_COUNT_WR
PWR0 PWR_PKG_ENERGY
PWR3 PWR_DRAM_ENERGY
METRICS
ipc FIXC0/FIXC1
mem_bw (MBOX0C0+MBOX1C0+MBOX2C0+MBOX3C0+MBOX0C1+MBOX1C1+MBOX2C1+MBOX3C1)*64.0/time
flops_sp (PMC0*4.0+PMC1+PMC2*8.0)/time
rapl_power (PWR0+PWR3)/time
LONG
--
SHORT Metrics for ProPE
EVENTSET
PMC0 FP_COMP_OPS_EXE_SSE_FP_PACKED_DOUBLE
PMC1 FP_COMP_OPS_EXE_SSE_FP_SCALAR_DOUBLE
PMC2 SIMD_FP_256_PACKED_DOUBLE
METRICS
flops_dp (PMC0*2.0+PMC1+PMC2*4.0)/time
LONG
--
SHORT PIKA metric group 1
EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 CPU_CLOCKS_UNHALTED
PMC2 RETIRED_SSE_AVX_FLOPS_SINGLE_ALL
PMC3 MERGE
DFC0 DRAM_CHANNEL_0
DFC1 DRAM_CHANNEL_1
METRICS
ipc PMC0/PMC1
mem_bw (DFC0+DFC1)*64.0/time
flops_sp PMC2/time
LONG
--
Profiling group to measure the memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events, measurements are only possible on a
per-socket basis.
The group provides almost accurate results for the total bandwidth and data volume.
AMD describes this metric as "approximate" in the documentation for AMD Rome.
-
Profiling group to measure the single-precision FLOP rate. The event might
have a higher per-cycle increment than 15, so the MERGE event is required.
SHORT PIKA metric group 2
EVENTSET
PMC2 RETIRED_SSE_AVX_FLOPS_DOUBLE_ALL
PMC3 MERGE
METRICS
flops_dp PMC2/time
LONG
--
Profiling group to measure the double-precision FLOP rate. The event might
have a higher per-cycle increment than 15, so the MERGE event is required.
SHORT PIKA metric group 1
EVENTSET
PMC0 RETIRED_INSTRUCTIONS
PMC1 CPU_CLOCKS_UNHALTED
DFC0 DRAM_CHANNEL_0
DFC1 DRAM_CHANNEL_1
METRICS
ipc PMC0/PMC1
mem_bw (DFC0+DFC1)*(4.0/(num_numadomains/num_sockets))*64.0/time
LONG
--
The fixed counter registers are broken. Thus, we use PMC registers to determine
the IPC.
-
Ryzen implements the RAPL interface previously introduced by Intel.
This interface makes it possible to monitor the energy consumed in the core and
package domains. AMD does not document which parts of the CPU belong to which domain.
For now, we ignore PWR0 RAPL_CORE_ENERGY.
-
Profiling group to measure the memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events, measurements are only possible on a
per-socket basis.
The group provides almost accurate results for the total bandwidth
and data volume.
The metric formulas contain a correction factor of (4.0/(num_numadomains/num_sockets)) to
return the value for all 4 memory controllers per socket in NPS1 mode. This is probably
a work-around. We requested information from AMD but have received no answer.
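The effect of the correction factor can be sketched in Python as follows. The function names are hypothetical and not part of LIKWID. The sketch assumes 4 memory controllers per socket, as stated in the description above, and that num_numadomains counts the NUMA domains of the whole node.

```python
# Sketch only: function names are hypothetical, not LIKWID API.

def nps_correction(num_numadomains, num_sockets, controllers_per_socket=4.0):
    """Return the factor 4.0/(num_numadomains/num_sockets) from the formula."""
    if num_numadomains <= 0 or num_sockets <= 0:
        raise ValueError("domain and socket counts must be positive")
    domains_per_socket = num_numadomains / num_sockets
    return controllers_per_socket / domains_per_socket


def mem_bw(dfc0, dfc1, num_numadomains, num_sockets, time_s):
    """Bytes per second, following the mem_bw formula of this group."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    factor = nps_correction(num_numadomains, num_sockets)
    return (dfc0 + dfc1) * factor * 64.0 / time_s


if __name__ == "__main__":
    # Two sockets in NPS1 mode: one NUMA domain per socket, factor 4.
    print(nps_correction(num_numadomains=2, num_sockets=2))
    # Two sockets in NPS4 mode: four NUMA domains per socket, factor 1.
    print(nps_correction(num_numadomains=8, num_sockets=2))
    # Made-up counter values, for illustration only.
    print(mem_bw(100, 50, num_numadomains=2, num_sockets=2, time_s=1.0))
```

With one NUMA domain per socket the measured value is scaled by 4, and with four domains per socket it is left unchanged, which matches the intent described above as far as it is documented here.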
SHORT PIKA metric group 2
EVENTSET
PMC2 RETIRED_SSE_AVX_FLOPS_ALL
PMC3 MERGE
METRICS
flops_any PMC2/time
LONG
--
Profiling group to measure the (single-precision) FLOP rate. The event might
have a higher per-cycle increment than 15, so the MERGE event is required. In
contrast to AMD Zen, the Zen2 microarchitecture does not provide events to
differentiate between single and double precision.