Many parallel computation algorithms involve repetitively executing a collective communication operation with the same arguments each time. As with persistent point-to-point operations (see Section Persistent Communication Requests), persistent collective operations allow the MPI programmer to specify operations that will be reused frequently (with fixed arguments). The MPI implementation can then select a more efficient way to perform the collective operation, based on the parameters specified when the operation is initialized. This ``planned-transfer'' approach [53,42] can offer significant performance benefits for programs with repetitive communication patterns.
In terms of data movement, each persistent collective operation, once completed, has the same effect as its blocking and nonblocking counterparts, for both intra-communicators and inter-communicators. Likewise, upon completion, persistent collective reduction operations perform the same operation as their blocking and nonblocking counterparts, and the same restrictions and recommendations on reduction order apply (see also Section Reduce).
Initialization calls for MPI persistent collective operations are nonlocal and follow all the existing rules for collective operations, in particular ordering; programs that do not conform to these restrictions are erroneous. After initialization, all arrays associated with input arguments (such as arrays of counts, displacements, and datatypes in the vector versions of the collectives) must not be modified until the corresponding persistent request is freed with MPI_REQUEST_FREE.
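For example (a minimal sketch; the one-int-per-process exchange pattern is chosen arbitrarily), the count and displacement arrays passed to a vector collective such as MPI_ALLTOALLV_INIT must remain unchanged until the request is freed:

#include <mpi.h>
#include <stdlib.h>

void plan_exchange(MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);

    int *sendbuf = malloc(size * sizeof(int));
    int *recvbuf = malloc(size * sizeof(int));
    int *counts  = malloc(size * sizeof(int));
    int *displs  = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) { counts[i] = 1; displs[i] = i; }

    MPI_Request req;
    MPI_Alltoallv_init(sendbuf, counts, displs, MPI_INT,
                       recvbuf, counts, displs, MPI_INT,
                       comm, MPI_INFO_NULL, &req);

    /* counts and displs must not be modified here, even between starts */
    for (int i = 0; i < size; i++) sendbuf[i] = i;
    MPI_Start(&req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Request_free(&req);   /* only now may counts and displs change */
    free(sendbuf); free(recvbuf); free(counts); free(displs);
}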
According to the definitions in Section MPI Procedures, the persistent collective initialization procedures are incomplete. They are also nonlocal procedures because they may or may not return before they are called in all MPI processes of the MPI process group associated with the specified communicator.
Advice to users.
This is one of the exceptions in which incomplete procedures are nonlocal and therefore blocking.
(End of advice to users.)
The request argument is an output argument that can be used zero or more times with MPI_START or MPI_STARTALL in order to start the collective operation. The request is initially inactive after the initialization call. Once initialized, persistent collective operations can be started in any order, and the order can differ among the MPI processes in the communicator.
Rationale.
Any ordering that an implementation may need in order to match collective operations across the communicator is established by the ordering requirements on the initialization functions. This enables out-of-order starts of the persistent operations and, in particular, supports their use with MPI_STARTALL.
(End of rationale.)
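For illustration (a minimal sketch with arbitrarily chosen operations and buffer sizes), two persistent collective operations that are initialized in the same order on all MPI processes can later be started together with MPI_STARTALL and reused across iterations:

#include <mpi.h>

void iterate(MPI_Comm comm, int iterations)
{
    double local[4], global[4];
    MPI_Request reqs[2];

    /* Initialization calls must be ordered identically on all processes. */
    MPI_Allreduce_init(local, global, 4, MPI_DOUBLE, MPI_SUM,
                       comm, MPI_INFO_NULL, &reqs[0]);
    MPI_Barrier_init(comm, MPI_INFO_NULL, &reqs[1]);

    for (int i = 0; i < iterations; i++) {
        /* ... compute local contributions ... */
        MPI_Startall(2, reqs);   /* starts need not match the init order */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        /* both requests are inactive again and may be restarted */
    }

    MPI_Request_free(&reqs[0]);
    MPI_Request_free(&reqs[1]);
}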
Advice to implementors.
An MPI implementation should do no worse than duplicating the communicator during the initialization function, caching the input arguments, and calling the appropriate nonblocking collective function, with the cached arguments, during MPI_START. High-quality implementations should be able to amortize setup costs and optimize further by taking advantage of early binding, for example through efficient pre-allocation of certain resources and algorithm selection.
(End of advice to implementors.)
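The baseline described above might look roughly as follows (a hypothetical sketch; the allreduce_plan type and the plan_* functions are illustrative internal names, not part of MPI):

#include <mpi.h>

/* Hypothetical baseline: initialization duplicates the communicator and
   caches the arguments; starting simply issues the corresponding
   nonblocking collective on the duplicated communicator. */
typedef struct {
    const void  *sendbuf;
    void        *recvbuf;
    int          count;
    MPI_Datatype datatype;
    MPI_Op       op;
    MPI_Comm     dupcomm;   /* private communicator for matching */
    MPI_Request  inner;     /* request of the underlying MPI_Iallreduce */
} allreduce_plan;

static void plan_allreduce_init(const void *sendbuf, void *recvbuf,
                                int count, MPI_Datatype datatype,
                                MPI_Op op, MPI_Comm comm,
                                allreduce_plan *plan)
{
    MPI_Comm_dup(comm, &plan->dupcomm);   /* collective: preserves ordering */
    plan->sendbuf = sendbuf;  plan->recvbuf = recvbuf;
    plan->count = count;  plan->datatype = datatype;  plan->op = op;
}

static void plan_start(allreduce_plan *plan)
{
    MPI_Iallreduce(plan->sendbuf, plan->recvbuf, plan->count,
                   plan->datatype, plan->op, plan->dupcomm, &plan->inner);
}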
A request must be inactive when it is started; starting the operation makes the request active. Once any MPI process starts a persistent collective operation, it must complete that operation, and all other MPI processes in the communicator must eventually start (and complete) the same persistent collective operation. Persistent collective operations cannot be matched with blocking or nonblocking collective operations.
Completion of a persistent collective operation makes the corresponding request inactive. After starting a persistent collective operation, the associated send buffers must not be modified and the associated receive buffers must not be accessed until the corresponding persistent request is completed.
Completing a persistent collective request, for example using MPI_TEST or MPI_WAIT, makes it inactive, but does not free the request. This is the same behavior as for persistent point-to-point requests. Inactive persistent collective requests can be freed using MPI_REQUEST_FREE. It is erroneous to free an active persistent collective request. Persistent collective operations cannot be canceled; it is erroneous to use MPI_CANCEL on a persistent collective request.
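For example (a minimal sketch, assuming req points to an active persistent collective request), a test loop completes the operation and leaves the request inactive, after which it may be freed:

#include <mpi.h>

/* Sketch: complete an active persistent collective request with a test
   loop, then free the (now inactive) request. */
void finish_and_free(MPI_Request *req)
{
    int flag = 0;
    while (!flag) {
        MPI_Test(req, &flag, MPI_STATUS_IGNORE);
        /* ... overlap waiting with local work ... */
    }
    /* The request is now inactive but not freed; calling MPI_Cancel on it
       would be erroneous at any point. */
    MPI_Request_free(req);
}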
For every nonblocking collective communication operation in MPI, there is a corresponding persistent collective operation with an analogous API signature.
The persistent collective API signatures include an info object in order to support optimization hints and other information that may be nonstandard. Persistent collective operations may be optimized during communicator creation or by the initialization call of an individual persistent collective operation. Note that communicator-scoped hints should be provided using MPI_COMM_SET_INFO, while operation-scoped hints are supplied to the persistent collective initialization functions via the info argument.
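For example (a minimal sketch; the hint key shown is made up and implementation-specific, not defined by the standard), an operation-scoped hint can be attached at initialization time through the info argument:

#include <mpi.h>

void init_bcast(void *buf, int count, MPI_Comm comm, MPI_Request *req)
{
    MPI_Info info;
    MPI_Info_create(&info);
    /* "mpi_assert_same_buffers" is a made-up, implementation-specific key */
    MPI_Info_set(info, "mpi_assert_same_buffers", "true");

    MPI_Bcast_init(buf, count, MPI_BYTE, 0, comm, info, req);

    MPI_Info_free(&info);   /* the info object may be freed once the call returns */
}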