The following section focuses on the ability to list and to query performance variables provided by the MPI implementation. Performance variables provide insight into MPI implementation-specific internals and can represent information such as the state of the MPI implementation (e.g., waiting blocked, receiving, not active), aggregated timing data for submodules, or queue sizes and lengths.
Rationale.
The interface for performance variables is separate from the interface for
control variables, since performance variables have different requirements
and parameters. By keeping them separate, the interface provides cleaner
semantics and allows for more performance optimization opportunities.
( End of rationale.)
Some performance variables and classes refer to events. In general,
such events describe state transitions within software or hardware related
to the performance of an MPI application. The events offered through the
callback-driven event-notification interface described in
Section Events also refer to such state transitions;
however, the set of state transitions referred to by performance variables and
events as described in Section Events may not be identical.
Each performance variable is associated with a class that describes its basic semantics, possible datatypes, basic behavior, its starting value, whether it can overflow, and when and how an MPI implementation can change the variable's value. The starting value is the value that is assigned to the variable the first time that it is used or whenever it is reset.
Advice to users.
If a performance variable belongs to a class that can overflow, it is up to
the user to protect against this overflow, e.g., by frequently reading and
resetting the variable value.
( End of advice to users.)
Advice to implementors.
MPI implementations should use large enough datatypes for each
performance variable to avoid overflows under normal circumstances.
( End of advice to implementors.)
The classes are defined by the following constants:
An MPI implementation exports a set of N performance variables through the MPI tool information interface. If N is zero, then the MPI implementation does not export any performance variables; otherwise the provided performance variables are indexed from 0 to N-1. This index number is used in subsequent calls to identify the individual variables.
An MPI implementation is allowed to increase the number of performance variables during the execution of an MPI application when new variables become available through dynamic loading. However, MPI implementations are not allowed to change the index of a performance variable or to delete a variable once it has been added to the set. When a variable becomes inactive, e.g., through dynamic unloading, accessing its value should return a corresponding return code.
The following function can be used to query the number of performance variables, num_pvar:
MPI_T_PVAR_GET_NUM(num_pvar) | |
OUT num_pvar | returns number of performance variables (integer) |
The function MPI_T_PVAR_GET_INFO provides access to additional information for each variable.
MPI_T_PVAR_GET_INFO(pvar_index, name, name_len, verbosity, var_class, datatype, enumtype, desc, desc_len, bind, readonly, continuous, atomic) | |
IN pvar_index | index of the performance variable to be queried, between 0 and num_pvar-1 (integer) |
OUT name | buffer to return the string containing the name of the performance variable (string) |
INOUT name_len | length of the string and/or buffer for name (integer) |
OUT verbosity | verbosity level of this variable (integer) |
OUT var_class | class of performance variable (integer) |
OUT datatype | MPI datatype of the information stored in the performance variable (handle) |
OUT enumtype | optional descriptor for enumeration information (handle) |
OUT desc | buffer to return the string containing a description of the performance variable (string) |
INOUT desc_len | length of the string and/or buffer for desc (integer) |
OUT bind | type of MPI object to which this variable must be bound (integer) |
OUT readonly | flag indicating whether the variable can be written/reset (integer) |
OUT continuous | flag indicating whether the variable can be started and stopped or is continuously active (integer) |
OUT atomic | flag indicating whether the variable can be atomically read and reset (integer) |
After a successful call to MPI_T_PVAR_GET_INFO for a particular variable, subsequent calls to this routine that query information about the same variable must return the same information. An MPI implementation is not allowed to alter any of the returned values.
If any OUT parameter to MPI_T_PVAR_GET_INFO is a NULL pointer, the implementation will ignore the parameter and not return a value for the parameter.
The arguments name and name_len are used to return the name of the performance variable as described in Section Convention for Returning Strings. If completed successfully, the routine is required to return a name of at least length one.
The argument verbosity returns the verbosity level of the variable (see Section Verbosity Levels).
The class of the performance variable is returned in the parameter var_class. The class must be one of the constants defined in Section Performance Variable Classes.
The combination of the name and the class of the performance variable must be unique with respect to all other names for performance variables used by the MPI implementation.
Advice to implementors.
Groups of variables that belong closely together, but have different
classes, can have the same name. This choice is useful, e.g., to refer to
multiple variables that describe a single resource (like the level, the
total size, as well as high- and low-water marks).
( End of advice to implementors.)
The argument datatype returns the MPI datatype that is used to
represent the performance variable.
If the variable is of type MPI_INT, MPI can optionally specify an enumeration for the values represented by this variable and return it in enumtype. In this case, MPI returns an enumeration identifier, which can then be used to gather more information as described in Section Datatype System. Otherwise, enumtype is set to MPI_T_ENUM_NULL. If the datatype is not MPI_INT or the argument enumtype is the null pointer, no enumeration type is returned.
Returning a description is optional. If an MPI implementation does not return a description, the first character for desc must be set to the null character and desc_len must be set to one at the return from this function.
The parameter bind returns the type of the MPI object to which the variable must be bound or the value MPI_T_BIND_NO_OBJECT (see Section Binding MPI Tool Information Interface Variables to MPI Objects).
Upon return, the argument readonly is set to zero if the variable can be written or reset by the user. It is set to one if the variable can only be read.
Upon return, the argument continuous is set to zero if the variable can be started and stopped by the user, i.e., it is possible for the user to control if and when the value of a variable is updated. It is set to one if the variable is always active and cannot be controlled by the user.
Upon return, the argument atomic is set to zero if the variable cannot be read and reset atomically. Only variables for which the call sets atomic to one can be used in a call to MPI_T_PVAR_READRESET.
If a performance variable has an equivalent name and has the same class across connected MPI processes, the following OUT parameters must be identical: verbosity, var_class, datatype, enumtype, bind, readonly, continuous, and atomic. The returned description must be equivalent.
MPI_T_PVAR_GET_INDEX(name, var_class, pvar_index) | |
IN name | the name of the performance variable (string) |
IN var_class | the class of the performance variable (integer) |
OUT pvar_index | the index of the performance variable (integer) |
MPI_T_PVAR_GET_INDEX is a function for retrieving the index of a performance variable given a known variable name and class. The name and var_class parameters are provided by the caller, and pvar_index is returned by the MPI implementation. The name parameter is a string terminated with a null character.
This routine returns MPI_SUCCESS on success and returns MPI_T_ERR_INVALID_NAME if name does not match the name of any performance variable of the specified var_class provided by the implementation at the time of the call.
Rationale.
This routine is provided to enable fast retrieval of performance variables
by a tool, assuming it knows the name of the variable for which it is
looking. The number of variables exposed by the implementation can change
over time, so it is not possible for the tool to simply iterate over the
list of variables once at initialization. Although using MPI
implementation specific variable names is not portable across MPI
implementations, tool developers may choose to take this route for lower
overhead at runtime because the tool will not have to iterate over the
entire set of variables to find a specific one.
( End of rationale.)
Within a single program, multiple components can use the MPI tool information interface. To avoid collisions with respect to accesses to performance variables, users of the MPI tool information interface must first create a performance experiment session. Subsequent calls that access performance variables can then be made within the context of this performance experiment session. Starting, stopping, reading, writing, or resetting a variable in one performance experiment session shall not influence whether a variable is started, stopped, read, written, or reset in another performance experiment session.
MPI_T_PVAR_SESSION_CREATE(pe_session) | |
OUT pe_session | identifier of performance experiment session (handle) |
This call creates a new performance experiment session for accessing performance variables and returns a handle for this performance experiment session in the argument pe_session of type MPI_T_pvar_session.
MPI_T_PVAR_SESSION_FREE(pe_session) | |
INOUT pe_session | identifier of performance experiment session (handle) |
This call frees an existing performance experiment session. Calls to the MPI tool information interface can no longer be made within the context of a performance experiment session after it is freed. On a successful return, MPI sets the performance experiment session identifier to MPI_T_PVAR_SESSION_NULL.
Before using a performance variable, a user must first allocate a handle of type MPI_T_pvar_handle for the variable by binding it to an MPI object (see also Section Binding MPI Tool Information Interface Variables to MPI Objects).
MPI_T_PVAR_HANDLE_ALLOC(pe_session, pvar_index, obj_handle, handle, count) | |
INOUT pe_session | identifier of performance experiment session (handle) |
IN pvar_index | index of performance variable for which handle is to be allocated (integer) |
IN obj_handle | reference to a handle of the MPI object to which this variable is supposed to be bound (pointer) |
OUT handle | allocated handle (handle) |
OUT count | number of elements used to represent this variable (integer) |
This routine binds the performance variable specified by the argument pvar_index to an MPI object in the performance experiment session identified by the parameter pe_session. The object is passed in the argument obj_handle as an address to a local variable that stores the object's handle. The argument obj_handle is ignored if the MPI_T_PVAR_GET_INFO call for this performance variable returned MPI_T_BIND_NO_OBJECT in the argument bind. The handle allocated to reference the variable is returned in the argument handle. Upon successful return, count contains the number of elements (of the datatype returned by a previous MPI_T_PVAR_GET_INFO call) used to represent this variable.
Advice to users.
The count can be different based on the MPI object to which the performance variable was bound. For example, variables bound to communicators could have a count that matches the size of the communicator.
It is not portable to pass references to predefined MPI object handles,
such as MPI_COMM_WORLD, to this routine, since their
implementation depends on the MPI library. Instead, such an object
handle should be stored in a local variable and the address of this local
variable should be passed into MPI_T_PVAR_HANDLE_ALLOC.
( End of advice to users.)
The value of pvar_index should be in the range from 0 to num_pvar-1,
where num_pvar is the number of available performance
variables as determined from a prior call to
MPI_T_PVAR_GET_NUM. The type of the MPI object referenced by
obj_handle must be consistent with the type returned in the bind
argument in a prior call to MPI_T_PVAR_GET_INFO.
For all routines in the rest of this section that take both handle and pe_session as IN or INOUT arguments, if the handle argument passed in is not associated with the pe_session argument, MPI_T_ERR_INVALID_HANDLE is returned.
MPI_T_PVAR_HANDLE_FREE(pe_session, handle) | |
INOUT pe_session | identifier of performance experiment session (handle) |
INOUT handle | handle to be freed (handle) |
When a handle is no longer needed, a user of the MPI tool information interface should call MPI_T_PVAR_HANDLE_FREE to free the handle in the performance experiment session identified by the parameter pe_session and the associated resources in the MPI implementation. On a successful return, MPI sets the handle to MPI_T_PVAR_HANDLE_NULL.
Performance variables that have the continuous flag set during the query procedure are continuously updated once a handle has been allocated. Such variables may be queried at any time, but they cannot be started or stopped by the user. All other variables are in a stopped state after their handle has been allocated; their values are not updated until they have been started by the user.
MPI_T_PVAR_START(pe_session, handle) | |
IN pe_session | identifier of performance experiment session (handle) |
INOUT handle | handle of a performance variable (handle) |
This function starts the performance variable with the handle identified by the parameter handle in the performance experiment session identified by the parameter pe_session.
If the constant MPI_T_PVAR_ALL_HANDLES is passed in handle, the MPI implementation attempts to start all variables within the performance experiment session identified by the parameter pe_session for which handles have been allocated. In this case, the routine returns MPI_SUCCESS if all variables are started successfully (even if there are no noncontinuous variables to be started), otherwise MPI_T_ERR_PVAR_NO_STARTSTOP is returned. Continuous variables and variables that are already started are ignored when MPI_T_PVAR_ALL_HANDLES is specified.
MPI_T_PVAR_STOP(pe_session, handle) | |
IN pe_session | identifier of performance experiment session (handle) |
INOUT handle | handle of a performance variable (handle) |
This function stops the performance variable with the handle identified by the parameter handle in the performance experiment session identified by the parameter pe_session.
If the constant MPI_T_PVAR_ALL_HANDLES is passed in handle, the MPI implementation attempts to stop all variables within the performance experiment session identified by the parameter pe_session for which handles have been allocated. In this case, the routine returns MPI_SUCCESS if all variables are stopped successfully (even if there are no noncontinuous variables to be stopped), otherwise MPI_T_ERR_PVAR_NO_STARTSTOP is returned. Continuous variables and variables that are already stopped are ignored when MPI_T_PVAR_ALL_HANDLES is specified.
MPI_T_PVAR_READ(pe_session, handle, buf) | |
IN pe_session | identifier of performance experiment session (handle) |
IN handle | handle of a performance variable (handle) |
OUT buf | initial address of storage location for variable value (choice) |
The MPI_T_PVAR_READ call queries the value of the performance variable with the handle identified by the parameter handle in the performance experiment session identified by the parameter pe_session and stores the result in the buffer identified by the parameter buf. The user must ensure that the buffer is of the appropriate size to hold the entire value of the performance variable (based on the datatype and count returned by the corresponding previous calls to MPI_T_PVAR_GET_INFO and MPI_T_PVAR_HANDLE_ALLOC, respectively).
The constant MPI_T_PVAR_ALL_HANDLES cannot be used as an argument for the function MPI_T_PVAR_READ.
MPI_T_PVAR_WRITE(pe_session, handle, buf) | |
IN pe_session | identifier of performance experiment session (handle) |
INOUT handle | handle of a performance variable (handle) |
IN buf | initial address of storage location for variable value (choice) |
The MPI_T_PVAR_WRITE call attempts to write the value of the performance variable with the handle identified by the parameter handle in the performance experiment session identified by the parameter pe_session. The value to be written is passed in the buffer identified by the parameter buf. The user must ensure that the buffer is of the appropriate size to hold the entire value of the performance variable (based on the datatype and count returned by the corresponding previous calls to MPI_T_PVAR_GET_INFO and MPI_T_PVAR_HANDLE_ALLOC, respectively).
If it is not possible to change the variable, the function returns MPI_T_ERR_PVAR_NO_WRITE.
The constant MPI_T_PVAR_ALL_HANDLES cannot be used as an argument for the function MPI_T_PVAR_WRITE.
MPI_T_PVAR_RESET(pe_session, handle) | |
IN pe_session | identifier of performance experiment session (handle) |
INOUT handle | handle of a performance variable (handle) |
The MPI_T_PVAR_RESET call sets the performance variable with the handle identified by the parameter handle to its starting value specified in Section Performance Variable Classes. If it is not possible to change the variable, the function returns MPI_T_ERR_PVAR_NO_WRITE.
If the constant MPI_T_PVAR_ALL_HANDLES is passed in handle, the MPI implementation attempts to reset all variables within the performance experiment session identified by the parameter pe_session for which handles have been allocated. In this case, the routine returns MPI_SUCCESS if all variables are reset successfully (even if there are no valid handles or all are read-only), otherwise MPI_T_ERR_PVAR_NO_WRITE is returned. Read-only variables are ignored when MPI_T_PVAR_ALL_HANDLES is specified.
MPI_T_PVAR_READRESET(pe_session, handle, buf) | |
IN pe_session | identifier of performance experiment session (handle) |
INOUT handle | handle of a performance variable (handle) |
OUT buf | initial address of storage location for variable value (choice) |
This call atomically combines the functionality of MPI_T_PVAR_READ and MPI_T_PVAR_RESET with the same semantics as if these two calls were called separately. If the variable cannot be read and reset atomically, this routine returns MPI_T_ERR_PVAR_NO_ATOMIC.
The constant MPI_T_PVAR_ALL_HANDLES cannot be used as an argument for the function MPI_T_PVAR_READRESET.
Advice to implementors.
Sampling-based tools rely on the ability to call the MPI tool information
interface, in particular routines to start, stop, read, write, and reset
performance variables, from any program context, including asynchronous
contexts such as signal handlers. MPI implementations should strive, if
possible in their particular environment, to enable these usage scenarios
for all or a subset of the routines mentioned above. If implementing only
a subset, the read, write, and reset routines are typically the most
critical for sampling-based tools. An MPI implementation should clearly
document any restrictions on the program contexts in which the MPI tool
information interface can be used. Restrictions might include guaranteeing
usage outside of all signals or outside a specific set of signals. Any
restrictions could be documented, for example, through the description
returned by MPI_T_PVAR_GET_INFO.
( End of advice to implementors.)
Rationale.
All routines to read, to write or to reset performance variables require
the performance experiment session argument. This requirement keeps the interface consistent and
allows the use of MPI_T_PVAR_ALL_HANDLES where appropriate.
Further, this opens up additional performance optimizations for the
implementation of handles.
( End of rationale.)
Example
Detecting Receives with long unexpected message queues.
The following example shows a sample tool to identify receive operations that occur during times with long message queues. This example assumes that the MPI implementation exports a variable with the name "MPI_T_UMQ_LENGTH" to represent the current length of the unexpected message queue. The tool is implemented as a PMPI tool using the MPI profiling interface.
The tool consists of three parts: (1) the initialization (by intercepting the call to MPI_INIT), (2) the test for long unexpected message queues (by intercepting calls to MPI_RECV), and (3) the clean-up phase (by intercepting the call to MPI_FINALIZE). To capture all receives, the example would have to be extended to have similar wrappers for all receive operations.
Part 1: Initialization. During initialization, the tool searches for the variable and, once the right index is found, allocates a performance experiment session and a handle for the variable with the found index, and starts the performance variable.
Part 2: Testing the Queue Lengths During Receives. During every receive operation, the tool reads the unexpected queue length through the matching performance variable and compares it against a predefined threshold.
Part 3: Termination. In the wrapper for MPI_FINALIZE, the MPI tool information interface is finalized.