The following section focuses on the ability to list and to query performance variables provided by the MPI implementation. Performance variables provide insight into MPI implementation specific internals and can represent information such as the state of the MPI implementation (e.g., waiting blocked, receiving, not active), aggregated timing data for submodules, or queue sizes and lengths.
Rationale.
The interface for performance variables is separate from the interface
for control variables, since performance variables have different
requirements and parameters. By keeping them separate, the interface
provides cleaner semantics and allows for more performance optimization
opportunities.
(End of rationale.)
Each performance variable is associated with a class that describes its basic semantics, possible datatypes, basic behavior, its starting value, whether it can overflow, and when and how an MPI implementation can change the variable's value. The starting value is the value that is assigned to the variable the first time that it is used or whenever it is reset.
Advice to users.
If a performance variable belongs to a class that can overflow, it is
up to the user to protect against this overflow, e.g., by
frequently reading and resetting the variable value.
(End of advice to users.)
Advice to implementors.
MPI implementations should use large enough datatypes for each
performance variable to avoid overflows under normal circumstances.
(End of advice to implementors.)
The classes are defined by the following constants:
An MPI implementation exports a set of N performance variables through the MPI tool information interface. If N is zero, then the MPI implementation does not export any performance variables; otherwise the provided performance variables are indexed from 0 to N-1. This index number is used in subsequent calls to identify the individual variables.
An MPI implementation is allowed to increase the number of performance variables during the execution of an MPI application when new variables become available through dynamic loading. However, MPI implementations are not allowed to change the index of a performance variable or to delete a variable once it has been added to the set. When a variable becomes inactive, e.g., through dynamic unloading, accessing its value should return a corresponding error code.
The following function can be used to query the number of performance variables, N:
MPI_T_PVAR_GET_NUM(num_pvar) | |
OUT num_pvar | returns number of performance variables (integer) |
int MPI_T_pvar_get_num(int *num_pvar)
The function MPI_T_PVAR_GET_INFO provides access to additional information for each variable.
int MPI_T_pvar_get_info(int pvar_index, char *name, int *name_len, int *verbosity, int *var_class, MPI_Datatype *datatype, MPI_T_enum *enumtype, char *desc, int *desc_len, int *bind, int *readonly, int *continuous, int *atomic)
After a successful call to MPI_T_PVAR_GET_INFO for a particular variable, subsequent calls to this routine that query information about the same variable must return the same information. An MPI implementation is not allowed to alter any of the returned values.
If any OUT parameter to MPI_T_PVAR_GET_INFO is a NULL pointer, the implementation will ignore the parameter and not return a value for the parameter.
The arguments name and name_len are used to return the name of the performance variable as described in Section Convention for Returning Strings . If completed successfully, the routine is required to return a name of at least length one.
The argument verbosity returns the verbosity level of the variable (see Section Verbosity Levels ).
The class of the performance variable is returned in the parameter var_class. The class must be one of the constants defined in Section Performance Variable Classes .
The combination of the name and the class of the performance variable must be unique with respect to all other names for performance variables used by the MPI implementation.
Advice to implementors.
Groups of variables that belong closely together, but have different classes,
can have the same name. This choice is useful, e.g., to refer to multiple variables
that describe a single resource (like the level, the total size, as well as
high and low watermarks).
(End of advice to implementors.)
The argument datatype returns the MPI datatype that is used
to represent the performance variable.
If the variable is of type MPI_INT, MPI can optionally specify an enumeration for the values represented by this variable and return it in enumtype. In this case, MPI returns an enumeration identifier, which can then be used to gather more information as described in Section Datatype System . Otherwise, enumtype is set to MPI_T_ENUM_NULL. If the datatype is not MPI_INT or the argument enumtype is the null pointer, no enumeration type is returned.
Returning a description is optional. If an MPI implementation does not return a description, the first character for desc must be set to the null character and desc_len must be set to one at the return from this function.
The parameter bind returns the type of the MPI object to which the variable must be bound or the value MPI_T_BIND_NO_OBJECT (see Section Binding MPI Tool Information Interface Variables to MPI Objects ).
Upon return, the argument readonly is set to zero if the variable can be written or reset by the user. It is set to one if the variable can only be read.
Upon return, the argument continuous is set to zero if the variable can be started and stopped by the user, i.e., it is possible for the user to control if and when the value of a variable is updated. It is set to one if the variable is always active and cannot be controlled by the user.
Upon return, the argument atomic is set to zero if the variable cannot be read and reset atomically. Only variables for which the call sets atomic to one can be used in a call to MPI_T_PVAR_READRESET.
If a performance variable has an equivalent name and has the same class across connected processes, the following OUT parameters must be identical: verbosity, var_class, datatype, enumtype, bind, readonly, continuous, and atomic. The returned description must be equivalent.
MPI_T_PVAR_GET_INDEX(name, var_class, pvar_index) | |
IN name | the name of the performance variable (string) |
IN var_class | the class of the performance variable (integer) |
OUT pvar_index | the index of the performance variable (integer) |
int MPI_T_pvar_get_index(const char *name, int var_class, int *pvar_index)
MPI_T_PVAR_GET_INDEX is a function for retrieving the index of a performance variable given a known variable name and class. The name and var_class parameters are provided by the caller, and pvar_index is returned by the MPI implementation. The name parameter is a string terminated with a null character.
This routine returns MPI_SUCCESS on success and returns MPI_T_ERR_INVALID_NAME if name does not match the name of any performance variable of the specified var_class provided by the implementation at the time of the call.
Rationale.
This routine is provided to enable fast retrieval of performance
variables by a tool, assuming it knows the name of the variable for
which it is looking. The number of variables exposed by the implementation
can change over time, so it is not possible for the tool to simply iterate
over the list of variables once at initialization. Although using MPI
implementation specific variable names is not portable across MPI
implementations, tool developers may choose to take this route for lower
overhead at runtime because the tool will not have to iterate over the
entire set of variables to find a specific one.
(End of rationale.)
Within a single program, multiple components can use the MPI tool information interface. To avoid collisions with respect to accesses to performance variables, users of the MPI tool information interface must first create a session. Subsequent calls that access performance variables can then be made within the context of this session. Any call executed in a session must not influence the results in any other session.
MPI_T_PVAR_SESSION_CREATE(session) | |
OUT session | identifier of performance session (handle) |
int MPI_T_pvar_session_create(MPI_T_pvar_session *session)
This call creates a new session for accessing performance variables and returns a handle for this session in the argument session of type MPI_T_pvar_session.
MPI_T_PVAR_SESSION_FREE(session) | |
INOUT session | identifier of performance experiment session (handle) |
int MPI_T_pvar_session_free(MPI_T_pvar_session *session)
This call frees an existing session. Calls to the MPI tool information interface can no longer be made within the context of a session after it is freed. On a successful return, MPI sets the session identifier to MPI_T_PVAR_SESSION_NULL.
Before using a performance variable, a user must first allocate a handle of type MPI_T_pvar_handle for the variable by binding it to an MPI object (see also Section Binding MPI Tool Information Interface Variables to MPI Objects ).
MPI_T_PVAR_HANDLE_ALLOC(session, pvar_index, obj_handle, handle, count) | |
IN session | identifier of performance experiment session (handle) |
IN pvar_index | index of performance variable for which handle is to be allocated (integer) |
IN obj_handle | reference to a handle of the MPI object to which this variable is supposed to be bound (pointer) |
OUT handle | allocated handle (handle) |
OUT count | number of elements used to represent this variable (integer) |
int MPI_T_pvar_handle_alloc(MPI_T_pvar_session session, int pvar_index, void *obj_handle, MPI_T_pvar_handle *handle, int *count)
This routine binds the performance variable specified by the argument pvar_index to an MPI object in the session identified by the parameter session. The object is passed in the argument obj_handle as an address to a local variable that stores the object's handle. The argument obj_handle is ignored if the MPI_T_PVAR_GET_INFO call for this performance variable returned MPI_T_BIND_NO_OBJECT in the argument bind. The handle allocated to reference the variable is returned in the argument handle. Upon successful return, count contains the number of elements (of the datatype returned by a previous MPI_T_PVAR_GET_INFO call) used to represent this variable.
Advice to users.
The count can be different based on the MPI object to which the performance variable was bound. For example, variables bound to communicators could have a count that matches the size of the communicator.
It is not portable to pass references to predefined MPI object handles,
such as MPI_COMM_WORLD, to this routine, since their
implementation depends on the MPI library. Instead, such an object handle should
be stored in a local variable and the address of this local variable
should be passed into MPI_T_PVAR_HANDLE_ALLOC.
(End of advice to users.)
The value of pvar_index
should be in the range 0 to num_pvar-1, where num_pvar is the number of
available performance variables as determined from a prior call to
MPI_T_PVAR_GET_NUM. The type of the MPI object that obj_handle references must be consistent with the type
returned in the bind argument in a prior call to MPI_T_PVAR_GET_INFO.
For all routines in the rest of this section that take both handle and session as IN or INOUT arguments, if the handle argument passed in is not associated with the session argument, MPI_T_ERR_INVALID_HANDLE is returned.
MPI_T_PVAR_HANDLE_FREE(session, handle) | |
IN session | identifier of performance experiment session (handle) |
INOUT handle | handle to be freed (handle) |
int MPI_T_pvar_handle_free(MPI_T_pvar_session session, MPI_T_pvar_handle *handle)
When a handle is no longer needed, a user of the MPI tool information interface should call MPI_T_PVAR_HANDLE_FREE to free the handle in the session identified by the parameter session and the associated resources in the MPI implementation. On a successful return, MPI sets the handle to MPI_T_PVAR_HANDLE_NULL.
Performance variables that have the continuous flag set during the query operation are continuously operating once a handle has been allocated. Such variables may be queried at any time, but they cannot be started or stopped by the user. All other variables are in a stopped state after their handle has been allocated; their values are not updated until they have been started by the user.
MPI_T_PVAR_START(session, handle) | |
IN session | identifier of performance experiment session (handle) |
IN handle | handle of a performance variable (handle) |
int MPI_T_pvar_start(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
This function starts the performance variable with the handle identified by the parameter handle in the session identified by the parameter session.
If the constant MPI_T_PVAR_ALL_HANDLES is passed in handle, the MPI implementation attempts to start all variables within the session identified by the parameter session for which handles have been allocated. In this case, the routine returns MPI_SUCCESS if all variables are started successfully (even if there are no non-continuous variables to be started), otherwise MPI_T_ERR_PVAR_NO_STARTSTOP is returned. Continuous variables and variables that are already started are ignored when MPI_T_PVAR_ALL_HANDLES is specified.
MPI_T_PVAR_STOP(session, handle) | |
IN session | identifier of performance experiment session (handle) |
IN handle | handle of a performance variable (handle) |
int MPI_T_pvar_stop(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
This function stops the performance variable with the handle identified by the parameter handle in the session identified by the parameter session.
If the constant MPI_T_PVAR_ALL_HANDLES is passed in handle, the MPI implementation attempts to stop all variables within the session identified by the parameter session for which handles have been allocated. In this case, the routine returns MPI_SUCCESS if all variables are stopped successfully (even if there are no non-continuous variables to be stopped), otherwise MPI_T_ERR_PVAR_NO_STARTSTOP is returned. Continuous variables and variables that are already stopped are ignored when MPI_T_PVAR_ALL_HANDLES is specified.
MPI_T_PVAR_READ(session, handle, buf) | |
IN session | identifier of performance experiment session (handle) |
IN handle | handle of a performance variable (handle) |
OUT buf | initial address of storage location for variable value (choice) |
int MPI_T_pvar_read(MPI_T_pvar_session session, MPI_T_pvar_handle handle, void* buf)
The MPI_T_PVAR_READ call queries the value of the performance variable with the handle identified by the parameter handle in the session identified by the parameter session and stores the result in the buffer identified by the parameter buf. The user must ensure that the buffer is of the appropriate size to hold the entire value of the performance variable (based on the datatype and count returned by the corresponding previous calls to MPI_T_PVAR_GET_INFO and MPI_T_PVAR_HANDLE_ALLOC, respectively).
The constant MPI_T_PVAR_ALL_HANDLES cannot be used as an argument for the function MPI_T_PVAR_READ.
MPI_T_PVAR_WRITE(session, handle, buf) | |
IN session | identifier of performance experiment session (handle) |
IN handle | handle of a performance variable (handle) |
IN buf | initial address of storage location for variable value (choice) |
int MPI_T_pvar_write(MPI_T_pvar_session session, MPI_T_pvar_handle handle, const void* buf)
The MPI_T_PVAR_WRITE call attempts to write the value of the performance variable with the handle identified by the parameter handle in the session identified by the parameter session. The value to be written is passed in the buffer identified by the parameter buf. The user must ensure that the buffer is of the appropriate size to hold the entire value of the performance variable (based on the datatype and count returned by the corresponding previous calls to MPI_T_PVAR_GET_INFO and MPI_T_PVAR_HANDLE_ALLOC, respectively).
If it is not possible to change the variable, the function returns MPI_T_ERR_PVAR_NO_WRITE.
The constant MPI_T_PVAR_ALL_HANDLES cannot be used as an argument for the function MPI_T_PVAR_WRITE.
MPI_T_PVAR_RESET(session, handle) | |
IN session | identifier of performance experiment session (handle) |
IN handle | handle of a performance variable (handle) |
int MPI_T_pvar_reset(MPI_T_pvar_session session, MPI_T_pvar_handle handle)
The MPI_T_PVAR_RESET call sets the performance variable with the handle identified by the parameter handle to its starting value specified in Section Performance Variable Classes . If it is not possible to change the variable, the function returns MPI_T_ERR_PVAR_NO_WRITE.
If the constant MPI_T_PVAR_ALL_HANDLES is passed in handle, the MPI implementation attempts to reset all variables within the session identified by the parameter session for which handles have been allocated. In this case, the routine returns MPI_SUCCESS if all variables are reset successfully (even if there are no valid handles or all are read-only), otherwise MPI_T_ERR_PVAR_NO_WRITE is returned. Read-only variables are ignored when MPI_T_PVAR_ALL_HANDLES is specified.
MPI_T_PVAR_READRESET(session, handle, buf) | |
IN session | identifier of performance experiment session (handle) |
IN handle | handle of a performance variable (handle) |
OUT buf | initial address of storage location for variable value (choice) |
int MPI_T_pvar_readreset(MPI_T_pvar_session session, MPI_T_pvar_handle handle, void* buf)
This call atomically combines the functionality of MPI_T_PVAR_READ and MPI_T_PVAR_RESET with the same semantics as if these two calls were called separately. If atomic operations on this variable are not supported, this routine returns MPI_T_ERR_PVAR_NO_ATOMIC.
The constant MPI_T_PVAR_ALL_HANDLES cannot be used as an argument for the function MPI_T_PVAR_READRESET.
Advice to implementors.
Sampling-based tools rely on the ability to call the MPI
tool information interface, in particular routines to start, stop,
read, write, and reset performance variables, from any program
context, including asynchronous contexts such as signal handlers.
MPI implementations should strive, if possible in their particular
environment, to enable these usage scenarios for all or a subset of the
routines mentioned above. If implementing only a subset, the
read, write, and reset routines are typically the most critical
for sampling-based tools. An MPI implementation should clearly
document any restrictions on the program contexts in which
the MPI tool information interface can be used. Restrictions
might include guaranteeing usage outside of all signals or
outside a specific set of signals. Any restrictions could be
documented, for example, through the description returned by
MPI_T_PVAR_GET_INFO.
(End of advice to implementors.)
Rationale.
All routines to read, to write or to reset performance variables require the
session argument. This requirement keeps the interface consistent and allows the
use of MPI_T_PVAR_ALL_HANDLES where appropriate.
Further, this opens up additional performance optimizations for
the implementation of handles.
(End of rationale.)
Example
The following example shows a sample tool to identify receive operations that occur during times with long message queues. This example assumes that the MPI implementation exports a variable with the name "MPI_T_UMQ_LENGTH" to represent the current length of the unexpected message queue. The tool is implemented as a PMPI tool using the MPI profiling interface.
The tool consists of three parts: (1) the initialization (by intercepting the call to MPI_INIT), (2) the test for long unexpected message queues (by intercepting calls to MPI_RECV), and (3) the clean-up phase (by intercepting the call to MPI_FINALIZE). To capture all receives, the example would have to be extended to have similar wrappers for all receive operations.
Part 1 (Initialization): During initialization, the tool searches for the variable and, once the right index is found, allocates a session and a handle for the variable with the found index, and starts the performance variable.
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <assert.h>
    #include <mpi.h>

    /* Global variables for the tool */
    static MPI_T_pvar_session session;
    static MPI_T_pvar_handle handle;

    int MPI_Init(int *argc, char ***argv )
    {
      int err, num, i, index, namelen, verbosity;
      int var_class, bind, threadsup;
      int readonly, continuous, atomic, count;
      char name[18];
      MPI_Comm comm;
      MPI_Datatype datatype;
      MPI_T_enum enumtype;

      err=PMPI_Init(argc,argv);
      if (err!=MPI_SUCCESS) return err;

      err=PMPI_T_init_thread(MPI_THREAD_SINGLE,&threadsup);
      if (err!=MPI_SUCCESS) return err;

      err=PMPI_T_pvar_get_num(&num);
      if (err!=MPI_SUCCESS) return err;

      index=-1;
      i=0;
      while ((i<num) && (index<0) && (err==MPI_SUCCESS)) {
        /* Pass a buffer that is at least one character longer than */
        /* the name of the variable being searched for to avoid */
        /* finding variables that have a name that has a prefix */
        /* equal to the name of the variable being searched. */
        namelen=18;
        err=PMPI_T_pvar_get_info(i, name, &namelen, &verbosity,
              &var_class, &datatype, &enumtype, NULL, NULL, &bind,
              &readonly, &continuous, &atomic);
        if (strcmp(name,"MPI_T_UMQ_LENGTH")==0) index=i;
        i++;
      }
      if (err!=MPI_SUCCESS) return err;

      /* this could be handled in a more flexible way for a generic tool */
      assert(index>=0);
      assert(var_class==MPI_T_PVAR_CLASS_LEVEL);
      assert(datatype==MPI_INT);
      assert(bind==MPI_T_BIND_MPI_COMM);

      /* Create a session */
      err=PMPI_T_pvar_session_create(&session);
      if (err!=MPI_SUCCESS) return err;

      /* Get a handle and bind to MPI_COMM_WORLD */
      comm=MPI_COMM_WORLD;
      err=PMPI_T_pvar_handle_alloc(session, index, &comm, &handle, &count);
      if (err!=MPI_SUCCESS) return err;

      /* this could be handled in a more flexible way for a generic tool */
      assert(count==1);

      /* Start variable */
      err=PMPI_T_pvar_start(session, handle);
      if (err!=MPI_SUCCESS) return err;

      return MPI_SUCCESS;
    }
Part 2 (Testing the Queue Lengths During Receives): During every receive operation, the tool reads the unexpected queue length through the matching performance variable and compares it against a predefined threshold.
    #define THRESHOLD 5

    int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source,
                 int tag, MPI_Comm comm, MPI_Status *status)
    {
      int value, err;

      if (comm==MPI_COMM_WORLD) {
        err=PMPI_T_pvar_read(session, handle, &value);
        if ((err==MPI_SUCCESS) && (value>THRESHOLD)) {
          /* tool identified receive called with long UMQ */
          /* execute tool functionality, */
          /* e.g., gather and print call stack */
        }
      }

      return PMPI_Recv(buf, count, datatype, source, tag, comm, status);
    }
Part 3 (Termination): In the wrapper for MPI_FINALIZE, the tool frees the handle and the session, and the MPI tool information interface is finalized.
    int MPI_Finalize(void)
    {
      int err;

      err=PMPI_T_pvar_handle_free(session, &handle);
      err=PMPI_T_pvar_session_free(&session);
      err=PMPI_T_finalize();

      return PMPI_Finalize();
    }