All-Reduce

107. All-Reduce

MPI includes a variant of the reduce operations where the result is returned to all processes in a group. MPI requires that all processes from the same group participating in these operations receive identical results.

MPI_ALLREDUCE( sendbuf, recvbuf, count, datatype, op, comm)

IN sendbuf starting address of send buffer (choice)

OUT recvbuf starting address of receive buffer (choice)

IN count number of elements in send buffer (non-negative integer)

IN datatype data type of elements of send buffer (handle)

IN op operation (handle)

IN comm communicator (handle)

int MPI_Allreduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

MPI_ALLREDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE, OP, COMM, IERROR)
    <type> SENDBUF(*), RECVBUF(*)
    INTEGER COUNT, DATATYPE, OP, COMM, IERROR

{ void MPI::Comm::Allreduce(const void* sendbuf, void* recvbuf, int count, const MPI::Datatype& datatype, const MPI::Op& op) const = 0 (binding deprecated, see Section Deprecated since MPI-2.2) }

If comm is an intracommunicator, MPI_ALLREDUCE behaves the same as MPI_REDUCE except that the result appears in the receive buffer of all the group members.

Advice to implementors.

The all-reduce operations can be implemented as a reduce, followed by a broadcast. However, a direct implementation can lead to better performance. (End of advice to implementors.)

The "in place" option for intracommunicators is specified by passing the value MPI_IN_PLACE to the argument sendbuf at all processes. In this case, the input data is taken at each process from the receive buffer, where it will be replaced by the output data.

If comm is an intercommunicator, then the result of the reduction of the data provided by processes in group A is stored at each process in group B, and vice versa. Both groups should provide count and datatype arguments that specify the same type signature.
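To make the "and vice versa" concrete, the following is a serial model in plain C, with no MPI calls. It assumes each process contributes a single int and that the operation is a sum; the type intercomm_sums and the function model_intercomm_allreduce_sum are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>

/* Serial model, NOT MPI code: each array element stands for the single
 * value contributed by one process of a group. */
typedef struct {
    int result_in_a;  /* value every process in group A would receive */
    int result_in_b;  /* value every process in group B would receive */
} intercomm_sums;

static int sum_of(const int *vals, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; ++i) {
        s += vals[i];
    }
    return s;
}

intercomm_sums model_intercomm_allreduce_sum(const int *group_a, size_t na,
                                             const int *group_b, size_t nb)
{
    intercomm_sums r;
    r.result_in_a = sum_of(group_b, nb);  /* A receives the reduction of B's data */
    r.result_in_b = sum_of(group_a, na);  /* B receives the reduction of A's data */
    return r;
}
```

For example, if group A contributes 1, 2, 3 and group B contributes 10, 20, every process in A receives 30 and every process in B receives 6. Neither group sees a reduction that includes its own data.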

The following example uses an intracommunicator.

Example. A routine that computes the product of a vector and an array that are distributed across a group of processes and returns the answer at all nodes (see also Example Predefined Reduction Operations).

SUBROUTINE PAR_BLAS2(m, n, a, b, c, comm) 
REAL a(m), b(m,n)    ! local slice of array 
REAL c(n)            ! result 
REAL sum(n) 
INTEGER m, n, comm, i, j, ierr 
INCLUDE 'mpif.h'     ! defines MPI_REAL and MPI_SUM 
 
! local sum 
DO j= 1, n 
  sum(j) = 0.0 
  DO i = 1, m 
    sum(j) = sum(j) + a(i)*b(i,j) 
  END DO 
END DO 
 
! global sum 
CALL MPI_ALLREDUCE(sum, c, n, MPI_REAL, MPI_SUM, comm, ierr) 
 
! return result at all nodes 
RETURN 
END

(Unofficial) MPI-2.2 of September 4, 2009
HTML Generated on September 10, 2009
