MPI includes a variant of the reduce operations where the result is scattered to all processes in a group on return.
MPI_REDUCE_SCATTER(sendbuf, recvbuf, recvcounts, datatype, op, comm)
IN sendbuf | starting address of send buffer (choice) |
OUT recvbuf | starting address of receive buffer (choice) |
IN recvcounts | non-negative integer array specifying the number of elements of the result distributed to each process. The array must be identical on all calling processes. |
IN datatype | data type of elements of input buffer (handle) |
IN op | operation (handle) |
IN comm | communicator (handle) |
int MPI_Reduce_scatter(void* sendbuf, void* recvbuf, int *recvcounts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
MPI_REDUCE_SCATTER(SENDBUF, RECVBUF, RECVCOUNTS, DATATYPE, OP, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER RECVCOUNTS(*), DATATYPE, OP, COMM, IERROR
void MPI::Comm::Reduce_scatter(const void* sendbuf, void* recvbuf, int recvcounts[], const MPI::Datatype& datatype, const MPI::Op& op) const = 0
If comm is an intracommunicator, MPI_REDUCE_SCATTER first does an element-wise reduction on a vector of count elements, where count is the sum of recvcounts[i] over all i, in the send buffer defined by sendbuf, count and datatype. Next, the resulting vector of results is split into n disjoint segments, where n is the number of members in the group. Segment i contains recvcounts[i] elements. The i-th segment is sent to process i and stored in the receive buffer defined by recvbuf, recvcounts[i] and datatype.
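The two steps above (element-wise reduction, then splitting into segments) can be illustrated with a serial model. This is only a sketch of the semantics, not an MPI program: the function name reduce_scatter_sum_model, the use of int data, and the choice of a sum as the operation are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Serial model of MPI_REDUCE_SCATTER semantics on an intracommunicator.
 * Hypothetical helper, NOT an MPI call; the operation is fixed to an
 * integer sum for illustration.
 * send[p] : send buffer of process p, holding
 *           count = recvcounts[0] + ... + recvcounts[nprocs-1] elements
 * recv[i] : receive buffer of process i, holding recvcounts[i] elements */
void reduce_scatter_sum_model(int nprocs, const int *const *send,
                              int *const *recv, const int *recvcounts)
{
    assert(nprocs > 0);
    size_t offset = 0; /* start of segment i within the reduced vector */
    for (int i = 0; i < nprocs; i++) {
        assert(recvcounts[i] >= 0);
        for (int k = 0; k < recvcounts[i]; k++) {
            int acc = 0;
            /* element-wise reduction across all send buffers */
            for (int p = 0; p < nprocs; p++)
                acc += send[p][offset + (size_t)k];
            /* element k of segment i is stored at process i */
            recv[i][k] = acc;
        }
        offset += (size_t)recvcounts[i];
    }
}
```

For example, with three processes and recvcounts = {1, 2, 3}, each send buffer holds six elements; process 0 receives the first reduced element, process 1 the next two, and process 2 the last three.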
Advice to implementors.
The MPI_REDUCE_SCATTER routine is functionally equivalent to an MPI_REDUCE collective operation with count equal to the sum of recvcounts[i], followed by MPI_SCATTERV with sendcounts equal to recvcounts. However, a direct implementation may run faster.
(End of advice to implementors.)
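To write out the two-step formulation, one needs the count passed to MPI_REDUCE and the displacement array passed to MPI_SCATTERV. A minimal sketch follows, assuming the segments are laid out contiguously in the reduced vector; the helper name scatterv_layout and the names tmp, root and rank in the comment are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: fills displs[i] with the offset of segment i in the
 * reduced vector (exclusive prefix sum of recvcounts) and returns
 * count = sum of recvcounts[i].
 *
 * With these values, the two-step form could be written roughly as
 * (tmp, root and rank are illustrative names, not part of the text above):
 *   MPI_Reduce(sendbuf, tmp, count, datatype, op, root, comm);
 *   MPI_Scatterv(tmp, recvcounts, displs, datatype,
 *                recvbuf, recvcounts[rank], datatype, root, comm);
 */
int scatterv_layout(int nprocs, const int *recvcounts, int *displs)
{
    assert(nprocs > 0);
    int count = 0;
    for (int i = 0; i < nprocs; i++) {
        assert(recvcounts[i] >= 0);
        displs[i] = count;      /* segment i starts after segments 0..i-1 */
        count += recvcounts[i];
    }
    return count;
}
```

In this sketch a zero entry in recvcounts yields an empty segment that shares its displacement with the next one.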
The "in place" option for intracommunicators is specified by passing MPI_IN_PLACE in the sendbuf argument. In this case, the input data is taken from the top of the receive buffer.
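The in-place variant can be modelled serially as follows. This sketch assumes that "the top of the receive buffer" means the input vector starts at the first element of recvbuf, so each buffer initially holds all count input elements and afterwards holds that process's segment in its first recvcounts[i] elements. The function name, the int data, and the sum operation are assumptions for illustration, and this is not an MPI call.

```c
#include <assert.h>
#include <stdlib.h>

/* Serial model of the in-place option (hypothetical helper, integer sum).
 * buf[p] is process p's receive buffer: on entry it holds the full input
 * vector of count = sum of recvcounts[i] elements; on return its first
 * recvcounts[p] elements hold segment p of the reduced vector.
 * Returns 0 on success, -1 if the temporary vector cannot be allocated. */
int reduce_scatter_sum_in_place_model(int nprocs, int *const *buf,
                                      const int *recvcounts)
{
    assert(nprocs > 0);
    size_t count = 0;
    for (int i = 0; i < nprocs; i++) {
        assert(recvcounts[i] >= 0);
        count += (size_t)recvcounts[i];
    }
    if (count == 0)
        return 0;

    /* Reduce into a temporary first, because the buffers are both
     * input and output in this model. */
    int *reduced = malloc(count * sizeof *reduced);
    if (reduced == NULL)
        return -1;
    for (size_t k = 0; k < count; k++) {
        int acc = 0;
        for (int p = 0; p < nprocs; p++)
            acc += buf[p][k];
        reduced[k] = acc;
    }

    /* Copy segment i to the start of process i's buffer. */
    size_t offset = 0;
    for (int i = 0; i < nprocs; i++) {
        for (int k = 0; k < recvcounts[i]; k++)
            buf[i][k] = reduced[offset + (size_t)k];
        offset += (size_t)recvcounts[i];
    }
    free(reduced);
    return 0;
}
```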
If comm is an intercommunicator, then the result of the reduction
of the data provided by processes in group A is scattered among
processes in group B, and vice versa. Within each group, all
processes provide the same recvcounts argument, and the sum
of the recvcounts entries should be the same for the two groups.
Rationale.
The last restriction is needed so that the length of the send
buffer can be determined by the sum of the local recvcounts entries.
Otherwise, a communication is needed to figure out how many elements
are reduced.
(End of rationale.)
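The intercommunicator restriction on recvcounts can be expressed as a simple consistency check. The sketch below is illustrative only: the function names are hypothetical, and a real program would not normally hold both groups' arrays in one process, which is exactly why the restriction lets each side derive the send buffer length from its local recvcounts.

```c
#include <assert.h>

/* Hypothetical helper: sum of the entries of a recvcounts array. */
static long sum_counts(int n, const int *counts)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        assert(counts[i] >= 0);
        total += counts[i];
    }
    return total;
}

/* Returns 1 if the recvcounts arrays used by group A and group B have the
 * same sum, as required for an intercommunicator, and 0 otherwise.
 * The common sum is the number of elements each process contributes. */
int intercomm_counts_consistent(int size_a, const int *recvcounts_a,
                                int size_b, const int *recvcounts_b)
{
    return sum_counts(size_a, recvcounts_a) ==
           sum_counts(size_b, recvcounts_b);
}
```

For instance, recvcounts = {2, 2} in a two-process group A is consistent with recvcounts = {1, 1, 2} in a three-process group B, since both sum to 4.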