Enable profiling
Enable tracing
Enables recording calling context information for every event
The calling context is the call chain of functions to the current position in
the running program. This call chain will also be annotated with source code
information if possible.
This is a prerequisite for sampling but also works with instrumented
applications.
Note that when tracing is also enabled, Score-P does not write the usual
Enter/Leave records into the OTF2 trace, but the new OTF2 CallingContext records instead.
See also SCOREP_TRACING_CONVERT_CALLING_CONTEXT_EVENTS.
Note also that this suppresses events from the compiler instrumentation.
Be verbose
Total memory in bytes per process to be consumed by the measurement system
SCOREP_TOTAL_MEMORY will be split into pages of size SCOREP_PAGE_SIZE (SCOREP_TOTAL_MEMORY is potentially reduced to a multiple of SCOREP_PAGE_SIZE). Maximum size is 4 GB minus one SCOREP_PAGE_SIZE.
Memory page size in bytes
If not a power of two, SCOREP_PAGE_SIZE will be increased to the next larger power of two. SCOREP_TOTAL_MEMORY will be split up into pages of (the adjusted) SCOREP_PAGE_SIZE. Minimum size is 512 bytes.
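The two rules above (page size rounded up to a power of two, total memory cut down to a whole number of pages) can be illustrated with a small sketch. It is not Score-P code; reading "4 GB" as 2**32 bytes, clamping oversized totals instead of rejecting them, and raising sub-512-byte page sizes to 512 are assumptions made only for this illustration.

```python
from typing import Tuple

MIN_PAGE_SIZE = 512          # bytes, from the description above
FOUR_GB = 4 * 1024 ** 3      # assumption: "4 GB" read as 2**32 bytes


def adjust_page_size(page_size: int) -> int:
    """Raise SCOREP_PAGE_SIZE to the next power of two (at least 512 bytes)."""
    size = max(page_size, MIN_PAGE_SIZE)   # assumption: small values are raised, not rejected
    return 1 << (size - 1).bit_length()    # smallest power of two >= size


def page_layout(total_memory: int, page_size: int) -> Tuple[int, int]:
    """Return (adjusted page size, number of pages) for SCOREP_TOTAL_MEMORY."""
    page = adjust_page_size(page_size)
    # Assumption: oversized values are clamped to 4 GB minus one page.
    total = min(total_memory, FOUR_GB - page)
    # Integer division drops the remainder, i.e. the total is reduced
    # to a multiple of the page size.
    return page, total // page


page, pages = page_layout(16_000_100, 5000)
print(page, pages, page * pages)   # prints 8192 1953 15998976
```

A requested page size of 5000 bytes becomes 8192, and the roughly 16 MB budget is used as 1953 full pages.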
Name of the experiment directory as child of the current working directory
The experiment directory is created directly under the current working directory. No parent directories will be created. The experiment directory is only created if it is requested by at least one substrate. When no experiment name is given (the default), Score-P names the experiment directory 'scorep-measurement-tmp' and, after a successful measurement, renames it to a generated name based on the current time.
Overwrite an existing experiment directory
If you specified an experiment directory name that already exists, you can force overwriting it with this flag. The previous experiment directory will be renamed.
The machine name used in profile and trace output
We suggest using a unique name, e.g., the fully qualified domain name. The default machine name was set at configure time (see the INSTALL file for customization options).
Executable of the application
File name, preferably with full path, of the application's executable. This is a fallback if Score-P cannot determine the executable's name automatically. The name is required by some compiler adapters, which will complain if this environment variable is needed but not set.
Use system tree sequence definitions
Enables an internal system tree representation that specifies a sequence of system tree nodes with one record instead of creating one record per system tree node, location group, or location. It is more scalable and has lower memory requirements than single-node records. However, individual node names are lost; nodes are simply enumerated based on their types. Currently, system tree sequence definitions support only MPI (and trivially single-process) applications.
Force the creation of experiment directory and configuration files
If this is set to 'true' (which is the default), the experiment directory will
be created along with some configuration files, even if no substrate writes
data (i.e., profiling and tracing are disabled and no substrate plugin
registered for writing).
If this is set to 'false', the directory will only be created if any substrate
actually writes data.
Timer used during measurement
The following timers are available for this installation:
Low overhead time stamp counter (X86_64) timer.
gettimeofday timer.
clock_gettime timer with CLOCK_MONOTONIC_RAW as clock.
Application's symbol table obtained via 'nm -l' for compiler instrumentation
File name, preferably with full path, of a file containing the application's symbol table, as obtained by the command:
$ nm -l <application>
Only needed if generating the file at measurement initialization time fails,
e.g., if using the 'system()' command from the compute nodes isn't possible.
Number of foreign task objects that are collected before they are put into the common task object exchange buffer
The profiling creates a record for every running task instance. To avoid locking, the required memory is taken from a preallocated memory block; each thread has its own memory block. On task completion, the created object can be reused by other tasks. However, if tasks migrate, their data structures migrate with them. Thus, if migration is imbalanced, i.e., tasks start executing on a source thread and complete on a sink thread, the source thread may continually create new task objects while released task objects accumulate in the sink. Therefore, once the sink has collected a certain number of task objects, it should trigger a backflow of its collected task objects. However, this requires locking, which should be avoided as much as possible: the locking should not happen on every migrated task, but only when a certain imbalance occurs. This environment variable determines the number of migrated task instances that must be collected before the backflow is triggered.
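The backflow mechanism described above can be modeled in a few lines. This is a hypothetical sketch, not Score-P's implementation: the class `TaskObjectPool`, the dict-based task objects, and the choice that a thread with an empty free list first draws from the common buffer are all assumptions. It shows the key property that the lock is taken only once per `exchange_num` foreign objects, not on every migrated task.

```python
import threading


class TaskObjectPool:
    """Hypothetical model of per-thread task-object reuse with a shared exchange buffer."""

    def __init__(self, exchange_num: int):
        self.exchange_num = exchange_num   # plays the role of this variable
        self.exchange_buffer = []          # common buffer, guarded by self.lock
        self.lock = threading.Lock()
        self._local = threading.local()    # per-thread free and foreign lists

    def _lists(self):
        if not hasattr(self._local, "free"):
            self._local.free = []      # acquired and released by this thread
            self._local.foreign = []   # released here, but acquired elsewhere
        return self._local

    def acquire(self) -> dict:
        lists = self._lists()
        if lists.free:
            obj = lists.free.pop()          # lock-free fast path
        else:
            with self.lock:                 # assumption: try the common buffer first
                obj = self.exchange_buffer.pop() if self.exchange_buffer else {}
        obj["owner"] = threading.current_thread()
        return obj

    def release(self, obj: dict) -> None:
        lists = self._lists()
        if obj["owner"] is threading.current_thread():
            lists.free.append(obj)
            return
        lists.foreign.append(obj)           # the task migrated to this thread
        if len(lists.foreign) >= self.exchange_num:
            with self.lock:                 # one lock per exchange_num foreign objects
                self.exchange_buffer.extend(lists.foreign)
            lists.foreign.clear()


pool = TaskObjectPool(exchange_num=3)
created = [pool.acquire() for _ in range(5)]   # the main thread acts as the source


def drain():
    # The sink thread completes all five tasks.
    for obj in created:
        pool.release(obj)


sink = threading.Thread(target=drain)
sink.start()
sink.join()
print(len(pool.exchange_buffer))   # prints 3; two objects still wait in the sink
```

After the sink has released three foreign objects they flow back into the common buffer, where the source thread can reuse them instead of allocating new ones.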
Maximum depth of the calltree
Base for construction of the profile filename
String used as the base for creating the profile file names.
Profile output format
Sets the output format for the profile.
The following formats are supported:
No profile output. This does not disable profile recording.
Tau snapshot format.
Stores the sum for every metric per callpath per location in Cube4 format.
Stores an extended set of statistics in Cube4 format.
Sums all locations within a location group and stores the data in Cube4 format.
Sums all locations within a location group and, in addition, stores some statistical data about the distribution among the locations of a location group.
Stores the initial location, the slowest location and the fastest location per process. Sums all other locations within a location group. The result is stored in Cube4 format.
Clusters locations within a location group if they have the same calltree structure. Sums locations within a cluster. Stores the result in Cube4 format.
Default format. If Cube4 is supported, Cube4 is the default; otherwise, the Tau snapshot format is the default.
Enable clustering
Maximum cluster count for iteration clustering
Specifies the level of strictness when comparing call trees for equivalence
Possible levels:
No structural similarity required.
The sub-tree structure must match.
The sub-tree structure and the number of visits must match.
The structure of the call-path to MPI calls must match.
Nodes that are not on an MPI call-path may differ.
Like above, but the number of visits of the MPI calls must match, too.
Like above, but the number of visits must also match on all nodes on the call-path to an MPI function.
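To make the six strictness levels concrete, here is a sketch of a call-tree comparison. Everything here is an assumption for illustration: the `Node` class, numbering the levels 0 to 5 in the order listed above (the actual option values are not shown in this text), recognizing MPI calls by an `MPI_` name prefix, and matching children by name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    visits: int = 1
    children: List["Node"] = field(default_factory=list)

    @property
    def is_mpi(self) -> bool:
        return self.name.startswith("MPI_")   # assumption: MPI regions recognized by name


def on_mpi_path(node: Node) -> bool:
    """True if node is an MPI call or lies on a call-path leading to one."""
    return node.is_mpi or any(on_mpi_path(c) for c in node.children)


def compared_children(node: Node, mode: int) -> List[Node]:
    kids = node.children
    if mode >= 3:                              # MPI-based levels ignore other sub-trees
        kids = [c for c in kids if on_mpi_path(c)]
    return sorted(kids, key=lambda c: c.name)


def equivalent(a: Node, b: Node, mode: int) -> bool:
    """Compare two call trees at strictness level 0..5 (hypothetical numbering)."""
    if mode == 0:
        return True                            # no structural similarity required
    if a.name != b.name:
        return False
    check_visits = (
        mode == 2
        or (mode == 4 and a.is_mpi)
        or (mode == 5 and on_mpi_path(a))
    )
    if check_visits and a.visits != b.visits:
        return False
    ka, kb = compared_children(a, mode), compared_children(b, mode)
    if [c.name for c in ka] != [c.name for c in kb]:
        return False
    return all(equivalent(x, y, mode) for x, y in zip(ka, kb))


t1 = Node("main", 1, [Node("compute", 10), Node("MPI_Send", 5)])
t2 = Node("main", 1, [Node("compute", 7), Node("MPI_Send", 5)])
print([equivalent(t1, t2, m) for m in range(6)])
# prints [True, True, False, True, True, True]
```

The two trees differ only in how often `compute` was visited, so they are rejected only at the level that checks visit counts on every node; the MPI-based levels prune `compute` entirely.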
Name of the clustered region
The clustering can only cluster one dynamic region. If more than one dynamic region is defined by the user, the region that is exited first is clustered. If another region should be clustered instead, you can specify its name in this variable. If the variable is unset or empty, the first exited dynamic region is clustered.
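The selection rule above reduces to a small decision, sketched here. The function name and its input format (the dynamic regions in the order in which they are first exited) are hypothetical.

```python
from typing import Iterable, Optional


def select_clustered_region(exit_order: Iterable[str], configured: Optional[str]) -> Optional[str]:
    """Pick the dynamic region to cluster (hypothetical helper).

    exit_order: user-defined dynamic regions in the order they are first exited.
    configured: value of this variable, or None if it is unset.
    """
    if configured:                        # set and non-empty: use it
        return configured
    return next(iter(exit_order), None)   # otherwise: the first exited region


print(select_clustered_region(["iteration", "timestep"], ""))   # prints iteration
```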
Write .core files if an error occurred
If an error occurs inside the profiling system, the profiling is disabled. For debugging, it might be useful to obtain the state of the local stack at these points. Enabling this feature is not recommended for large-scale measurements.
Whether or not to use libsion as OTF2 substrate
Maximum number of processes that share one sion file (must be > 0)
All processes are then evenly distributed over the number of files needed to fulfill this constraint. E.g., having 4 processes and setting the maximum to 3 results in 2 files, each holding 2 processes.
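The even distribution in the example above can be reproduced with ceiling division followed by an even split. This is a sketch of the arithmetic only; the function name is hypothetical, and giving the leftover processes to the first files is an assumption about how uneven counts are handled.

```python
def sion_file_layout(num_procs: int, max_per_file: int) -> list:
    """Return the number of processes stored in each SION file (hypothetical helper)."""
    if max_per_file <= 0:
        raise ValueError("maximum number of processes per file must be > 0")
    if num_procs == 0:
        return []
    num_files = -(-num_procs // max_per_file)   # ceiling division
    base, extra = divmod(num_procs, num_files)
    # Assumption: the first `extra` files take one additional process.
    return [base + 1 if i < extra else base for i in range(num_files)]


print(sion_file_layout(4, 3))   # prints [2, 2]
```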
Write calling context information as a sequence of Enter/Leave events to trace
When recording the calling context of events (instrumented or sampled), these
can be stored in the trace either as the new OTF2 CallingContext records or
converted to the legacy Enter/Leave records. This variable selects between the
two: 'false' selects the CallingContext records, 'true' the Enter/Leave records.
This is only in effect if SCOREP_ENABLE_UNWINDING is on.
Note that enabling this increases the number of records per event and also
loses the source code locations.
This option exists only for backwards compatibility with tools that cannot
handle the new OTF2 records. It may thus be removed in future
releases.
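One straightforward way to turn a series of calling contexts into Enter/Leave records is to diff each call chain against the previous one: frames that disappeared are left, and new frames are entered. The sketch below illustrates this and shows why the conversion produces more records per event. It is not Score-P's converter: frames are plain function names here (an assumption; real calling contexts carry more information, and recursive frames with equal names would need a richer identity).

```python
from typing import List, Tuple


def to_enter_leave(stacks: List[List[str]]) -> List[Tuple[str, str]]:
    """Convert successive call chains (outermost frame first) to Enter/Leave events."""
    events = []
    current: List[str] = []
    for stack in stacks:
        # Length of the common prefix of the previous and the new call chain.
        common = 0
        while common < min(len(current), len(stack)) and current[common] == stack[common]:
            common += 1
        for frame in reversed(current[common:]):   # innermost frames are left first
            events.append(("Leave", frame))
        for frame in stack[common:]:
            events.append(("Enter", frame))
        current = list(stack)
    for frame in reversed(current):                # close everything at the end
        events.append(("Leave", frame))
    return events


for event in to_enter_leave([["main", "foo"], ["main", "foo", "bar"], ["main", "baz"]]):
    print(*event)
```

Three calling-context events become eight Enter/Leave records.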
Enable online access interface
Online access registry service port
Online access registry service hostname
Base port for online access server
Application name to be registered
Name of a file that contains the filter rules
Specify list of used plugins
List of requested substrate plugin names that will be used during program run.
Separator of substrate plugin names
Character that separates plugin names in SCOREP_SUBSTRATE_PLUGINS.
PAPI metric names to measure
List of requested PAPI metric names that will be collected during program run.
PAPI metric names to measure per-process
List of requested PAPI metric names that will be recorded only by the first thread of a process.
Separator of PAPI metric names
Character that separates metric names in SCOREP_METRIC_PAPI and SCOREP_METRIC_PAPI_PER_PROCESS.
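The per-thread and per-process lists above are plain separator-delimited strings. The sketch below shows how such values split into metric names; the function names are hypothetical, and both the ',' default separator and the stripping of surrounding whitespace are assumptions rather than documented Score-P behavior. The same pattern applies to the substrate plugin, rusage, metric plugin, and PERF lists.

```python
from typing import List, Mapping, Tuple


def parse_metric_list(value: str, sep: str) -> List[str]:
    """Split a metric list at the separator, dropping empty entries."""
    return [name.strip() for name in value.split(sep) if name.strip()]


def requested_papi_metrics(env: Mapping[str, str], sep: str = ",") -> Tuple[List[str], List[str]]:
    """Return (per-thread metrics, per-process metrics); ',' as default is an assumption."""
    per_thread = parse_metric_list(env.get("SCOREP_METRIC_PAPI", ""), sep)
    per_process = parse_metric_list(env.get("SCOREP_METRIC_PAPI_PER_PROCESS", ""), sep)
    return per_thread, per_process


env = {"SCOREP_METRIC_PAPI": "PAPI_TOT_CYC,PAPI_L2_DCM",
       "SCOREP_METRIC_PAPI_PER_PROCESS": "PAPI_TOT_INS"}
print(requested_papi_metrics(env))
# prints (['PAPI_TOT_CYC', 'PAPI_L2_DCM'], ['PAPI_TOT_INS'])
```

Passing `os.environ` instead of the dictionary reads the variables of the current process.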
Resource usage metric names to measure
List of requested resource usage metric names that will be collected during program run.
Resource usage metric names to measure per-process
List of requested resource usage metric names that will be recorded only by the first thread of a process.
Separator of resource usage metric names
Character that separates metric names in SCOREP_METRIC_RUSAGE and SCOREP_METRIC_RUSAGE_PER_PROCESS.
Specify list of used plugins
List of requested metric plugin names that will be used during program run.
Separator of plugin names
Character that separates plugin names in SCOREP_METRIC_PLUGINS.
PERF metric names to measure
List of requested PERF metric names that will be collected during program run.
PERF metric names to measure per-process
List of requested PERF metric names that will be recorded only by the first thread of a process.
Separator of PERF metric names
Character that separates metric names in SCOREP_METRIC_PERF and SCOREP_METRIC_PERF_PER_PROCESS.
Set the sampling event and period: <event>[@<period>]
This selects the interrupt source for sampling.
This is only in effect if SCOREP_ENABLE_UNWINDING is on.
Possible values:
- perf event (perf_<event name>)
  period in number of events, default: 10000000
  e.g., perf_cycles@2000000
- PAPI event (PAPI_<event name>)
  period in number of events, default: 10000000
  e.g., PAPI_TOT_CYC@2000000
- timer (POSIX timer, not valid for multi-threaded applications)
  period in microseconds, default: 10000
  e.g., timer@2000
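The event specifications above follow an event-name-plus-optional-period pattern, as the examples perf_cycles@2000000 and timer@2000 show. The sketch below splits such a specification and fills in the listed defaults. It is an illustration only: the function names are hypothetical, the separator is passed in because its default is not stated here, and the sketch does not enforce the restriction that the timer is invalid for multi-threaded applications.

```python
from typing import List, Tuple

# Default periods from the list above (events for perf/PAPI, microseconds for timer).
DEFAULT_PERIOD = {"perf": 10_000_000, "PAPI": 10_000_000, "timer": 10_000}


def parse_sampling_event(spec: str) -> Tuple[str, str, int]:
    """Split '<event>[@<period>]' into (source, event, period)."""
    event, _, period = spec.strip().partition("@")
    if event == "timer":
        source = "timer"
    elif event.startswith("perf_"):
        source = "perf"
    elif event.startswith("PAPI_"):
        source = "PAPI"
    else:
        raise ValueError(f"unknown sampling event: {spec!r}")
    return source, event, (int(period) if period else DEFAULT_PERIOD[source])


def parse_sampling_events(value: str, sep: str) -> List[Tuple[str, str, int]]:
    """Split a SCOREP_SAMPLING_EVENTS value at the given separator."""
    return [parse_sampling_event(s) for s in value.split(sep) if s.strip()]


print(parse_sampling_event("PAPI_TOT_CYC"))   # prints ('PAPI', 'PAPI_TOT_CYC', 10000000)
```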
Separator of sampling event names
Character that separates sampling event names in SCOREP_SAMPLING_EVENTS
Record hardware topology information for this platform, if available.
Record the Process x Thread topology.
Record topologies provided by user instrumentation
Record MPI cartesian topologies.
Name of a file that configures selective recording
Determines the number of concurrently used communicators per process
Determines the number of concurrently used windows for MPI one-sided communication per process
Maximum number of concurrently active access or exposure epochs per process
Maximum number of concurrently used MPI groups per process
The names of the function groups which are measured
Other functions are not measured.
Possible groups are:
All MPI functions
Communicator and group management
Collective functions
Default configuration.
Includes:
- cg
- coll
- env
- io
- p2p
- rma
- topo
- xnonblock
Environmental management
MPI Error handling
External interface functions
MPI file I/O
Peer-to-peer communication
Miscellaneous
PControl
One sided communication
Process management
Topology
MPI datatype functions
Extended non-blocking events
Test events for uncompleted requests
Enable tracking of memory allocations done by calls to MPI_ALLOC_MEM and MPI_FREE_MEM
Requires that the MISC group is also recorded.
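The group list above, including the expansion of the default configuration and the MISC requirement for memory-allocation tracking, can be sketched as follows. The keyword `default` for the default configuration, the lowercase key `misc`, the ',' separator, and case-insensitive matching are all assumptions; the actual option values are not shown in this text. The function names are hypothetical.

```python
from typing import Set

# Members of the default configuration, as listed above.
DEFAULT_GROUPS = {"cg", "coll", "env", "io", "p2p", "rma", "topo", "xnonblock"}


def enabled_mpi_groups(value: str, sep: str = ",") -> Set[str]:
    """Expand a group list; keyword 'default', ',' and case folding are assumptions."""
    groups: Set[str] = set()
    for token in value.split(sep):
        name = token.strip().lower()
        if not name:
            continue
        if name == "default":
            groups |= DEFAULT_GROUPS
        else:
            groups.add(name)
    return groups


def memory_recording_possible(groups: Set[str]) -> bool:
    """MPI_ALLOC_MEM/MPI_FREE_MEM tracking requires the MISC group."""
    return "misc" in groups


print(memory_recording_possible(enabled_mpi_groups("default")))        # prints False
print(memory_recording_possible(enabled_mpi_groups("default,misc")))   # prints True
```

The default configuration does not include MISC, so tracking MPI memory allocations needs MISC to be listed explicitly.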
Enable online MPI wait states analysis
Enable tracking of memory allocations done by calls to the SHMEM allocation API
CUDA measurement features
Sets the CUDA measurement mode to capture:
CUDA runtime API
CUDA driver API
CUDA kernels
Serialized kernel recording
Fixed CUDA kernel metrics
CUDA memory copies
Record implicit and explicit CUDA synchronization
GPU compute idle time
GPU idle time (memory copies are not idle)
Record CUDA memory (de)allocations as a counter
Record references between CUDA activities
Flush CUDA activity buffer at program exit
CUDA runtime API and GPU activities.
Includes:
- runtime
- kernel
- memcpy
Total memory in bytes for the CUDA record buffer
Chunk size in bytes for the CUDA record buffer (ignored for CUDA 5.5 and earlier)
OpenCL measurement features
Sets the OpenCL measurement mode to capture:
OpenCL runtime API
OpenCL kernels
OpenCL buffer reads/writes
OpenCL API and GPU activities.
Includes:
- api
- kernel
- memcpy
Memory in bytes for the OpenCL command queue buffer
OpenACC measurement features
Sets the OpenACC measurement mode to capture:
OpenACC regions
OpenACC wait operations
OpenACC enqueue operations (kernel, upload, download)
OpenACC device memory allocations
Record kernel properties such as the kernel name as well as the gang, worker and vector size for kernel launch operations
Record variable names for OpenACC data allocation and enqueue upload/download
OpenACC regions, enqueue and wait operations.
Includes:
- regions
- wait
- enqueue
Memory recording
Memory (de)allocations are recorded via the libc/C++ API.