MPI_REDUCE_SCATTER extends the functionality of MPI_REDUCE_SCATTER_BLOCK such that the scattered blocks can vary in size. Block sizes are determined by the recvcounts array, such that the i-th block contains recvcounts[i] elements.
MPI_REDUCE_SCATTER(sendbuf, recvbuf, recvcounts, datatype, op, comm)
IN sendbuf | starting address of send buffer (choice)
OUT recvbuf | starting address of receive buffer (choice)
IN recvcounts | nonnegative integer array (of length group size) specifying the number of elements of the result distributed to each MPI process
IN datatype | datatype of elements of send and receive buffers (handle)
IN op | operation (handle)
IN comm | communicator (handle)
If comm is an intra-communicator, MPI_REDUCE_SCATTER first performs a global, element-wise reduction on vectors of count = recvcounts[0] + ... + recvcounts[n-1] elements in the send buffers defined by sendbuf, count and datatype, using the operation op, where n is the number of MPI processes in the group of comm. The routine is called by all group members using the same arguments for recvcounts, datatype, op and comm. The resulting vector is treated as n consecutive blocks where the number of elements of the i-th block is recvcounts[i]. The blocks are scattered to the MPI processes of the group. The i-th block is sent to MPI process i and stored in the receive buffer defined by recvbuf, recvcounts[i] and datatype.
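For example, the following sketch (not part of the standard text; the block sizes and data values are illustrative assumptions) has each MPI process contribute a vector of integers whose length equals the sum of the recvcounts entries; after the call, MPI process i holds the element-wise sum of the i-th block, which here has i+1 elements.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, n;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n);

    /* Block i has i+1 elements; the full vector has n*(n+1)/2 elements. */
    int *recvcounts = malloc(n * sizeof(int));
    int total = 0;
    for (int i = 0; i < n; i++) {
        recvcounts[i] = i + 1;
        total += recvcounts[i];
    }

    int *sendbuf = malloc(total * sizeof(int));
    for (int j = 0; j < total; j++)
        sendbuf[j] = rank + j;                 /* arbitrary per-process data */

    int *recvbuf = malloc(recvcounts[rank] * sizeof(int));

    MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts, MPI_INT,
                       MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d received %d reduced elements\n", rank, recvcounts[rank]);

    free(sendbuf); free(recvbuf); free(recvcounts);
    MPI_Finalize();
    return 0;
}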
Advice to implementors.
The MPI_REDUCE_SCATTER routine is functionally equivalent to an MPI_REDUCE collective operation with count equal to the sum of the recvcounts[i] entries, followed by MPI_SCATTERV with sendcounts equal to recvcounts. However, a direct implementation may run faster.
(End of advice to implementors.)
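A sketch of this equivalence follows (assumed for illustration: integer data, rank 0 as root, and a hypothetical helper name; implementations are not required to proceed this way).

#include <mpi.h>
#include <stdlib.h>

/* Reduce the full vector to rank 0, then scatter variable-sized blocks
 * back, mirroring the equivalence stated in the advice above. */
static void reduce_scatter_via_reduce_and_scatterv(
        const int *sendbuf, int *recvbuf, const int *recvcounts,
        MPI_Op op, MPI_Comm comm)
{
    int rank, n;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &n);

    int *displs = malloc(n * sizeof(int));
    int count = 0;
    for (int i = 0; i < n; i++) {
        displs[i] = count;          /* offset of block i in the result */
        count += recvcounts[i];     /* count = sum of recvcounts[i]    */
    }

    int *tmp = (rank == 0) ? malloc(count * sizeof(int)) : NULL;

    /* Step 1: element-wise reduction of the full vector to the root. */
    MPI_Reduce(sendbuf, tmp, count, MPI_INT, op, 0, comm);

    /* Step 2: scatter block i (recvcounts[i] elements) to MPI process i. */
    MPI_Scatterv(tmp, recvcounts, displs, MPI_INT,
                 recvbuf, recvcounts[rank], MPI_INT, 0, comm);

    free(displs);
    free(tmp);
}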
The ``in place'' option for intra-communicators is specified by passing MPI_IN_PLACE in the sendbuf argument. In this case, the input data is taken from the receive buffer. It is not required to specify the ``in place'' option on all MPI processes, since the MPI processes for which recvcounts[i] = 0 may not have allocated a receive buffer.
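A minimal ``in place'' sketch (assumed: integer data, MPI_SUM, and a buffer buf that already holds this process's full input vector of sum-of-recvcounts elements):

#include <mpi.h>

/* Passing MPI_IN_PLACE as sendbuf tells MPI to take the input data
 * from the receive buffer. */
static void reduce_scatter_in_place(int *buf, const int *recvcounts,
                                    MPI_Comm comm)
{
    MPI_Reduce_scatter(MPI_IN_PLACE, buf, recvcounts,
                       MPI_INT, MPI_SUM, comm);
}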
If comm is an inter-communicator, then the result of the reduction of the data provided by MPI processes in one group (group A) is scattered among MPI processes in the other group (group B), and vice versa. Within each group, all MPI processes provide the same recvcounts argument, and provide input vectors of count = recvcounts[0] + ... + recvcounts[n-1] elements stored in the send buffers, where n is the size of the group. The resulting vector from the other group is scattered in blocks of recvcounts[i] elements among the MPI processes in the group. The number of elements count must be the same for the two groups.
Rationale.
The last restriction is needed so that the length of the send buffer can be determined by the sum of the local recvcounts entries. Otherwise, communication is needed to figure out how many elements are reduced.
(End of rationale.)
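The inter-communicator case can be exercised with a sketch such as the following (all setup choices here are illustrative assumptions: an even number of MPI processes, even/odd world ranks forming the two groups, uniform block sizes of 2, and MPI_SUM over integers). Each MPI process receives its block of the reduction of the other group's send buffers.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int wrank, wsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);
    if (wsize % 2 != 0)                 /* assumption: even number of processes */
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* Group A = even world ranks, group B = odd world ranks. */
    int color = wrank % 2;
    MPI_Comm intracomm, intercomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, wrank, &intracomm);
    /* World ranks 0 and 1 are the local leaders of groups A and B. */
    MPI_Intercomm_create(intracomm, 0, MPI_COMM_WORLD, 1 - color, 0, &intercomm);

    int n, lrank;
    MPI_Comm_size(intracomm, &n);
    MPI_Comm_rank(intracomm, &lrank);

    /* Identical recvcounts within each group; both groups use the same
     * total count = 2 * n, as required. */
    int *recvcounts = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++)
        recvcounts[i] = 2;
    int count = 2 * n;

    int *sendbuf = malloc(count * sizeof(int));
    for (int j = 0; j < count; j++)
        sendbuf[j] = wrank;             /* arbitrary contribution */

    int *recvbuf = malloc(recvcounts[lrank] * sizeof(int));

    /* Each process obtains its block of the reduction of the OTHER
     * group's data. */
    MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts, MPI_INT,
                       MPI_SUM, intercomm);

    printf("world rank %d received %d elements from the other group\n",
           wrank, recvcounts[lrank]);

    free(sendbuf); free(recvbuf); free(recvcounts);
    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&intracomm);
    MPI_Finalize();
    return 0;
}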