7.8. All-to-All Scatter/Gather


MPI_ALLTOALL(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)
IN sendbuf	starting address of send buffer (choice)
IN sendcount	number of elements sent to each MPI process (non-negative integer)
IN sendtype	datatype of send buffer elements (handle)
OUT recvbuf	address of receive buffer (choice)
IN recvcount	number of elements received from any MPI process (non-negative integer)
IN recvtype	datatype of receive buffer elements (handle)
IN comm	communicator (handle)

C binding
int MPI_Alltoall(const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Alltoall_c(const void *sendbuf, MPI_Count sendcount, MPI_Datatype sendtype, void *recvbuf, MPI_Count recvcount, MPI_Datatype recvtype, MPI_Comm comm)

Fortran 2008 binding
MPI_Alltoall(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, ierror)
    TYPE(*), DIMENSION(..), INTENT(IN) :: sendbuf
    INTEGER, INTENT(IN) :: sendcount, recvcount
    TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype
    TYPE(*), DIMENSION(..) :: recvbuf
    TYPE(MPI_Comm), INTENT(IN) :: comm
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror
MPI_Alltoall(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm, ierror) !(_c)
    TYPE(*), DIMENSION(..), INTENT(IN) :: sendbuf
    INTEGER(KIND=MPI_COUNT_KIND), INTENT(IN) :: sendcount, recvcount
    TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype
    TYPE(*), DIMENSION(..) :: recvbuf
    TYPE(MPI_Comm), INTENT(IN) :: comm
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror

Fortran binding
MPI_ALLTOALL(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, COMM, IERROR)
    <type> SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, COMM, IERROR

MPI_ALLTOALL is an extension of MPI_ALLGATHER to the case where each MPI process sends distinct data to each of the receivers. The j-th block sent from MPI process i is received by MPI process j and is placed in the i-th block of recvbuf.
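
The block movement rule can be pictured as a transpose of blocks across processes. The following is a minimal serial sketch of that rule only, not an MPI program: the function name alltoall_model and the flat row-per-process layout are assumptions made for illustration, and the element type is fixed to int.

```c
#include <stddef.h>

/* Serial reference model of the MPI_ALLTOALL data movement (hypothetical
 * helper, no MPI calls). send and recv each hold one row per process,
 * with nprocs * count ints per row, stored row-major.
 * Block j of process i's send row becomes block i of process j's recv row. */
void alltoall_model(int nprocs, int count, const int *send, int *recv)
{
    size_t row = (size_t)nprocs * (size_t)count;  /* ints per process buffer */

    for (int i = 0; i < nprocs; i++)              /* sending process */
        for (int j = 0; j < nprocs; j++)          /* receiving process */
            for (int k = 0; k < count; k++)       /* element within block */
                recv[(size_t)j * row + (size_t)i * count + k] =
                    send[(size_t)i * row + (size_t)j * count + k];
}
```

With two processes and count = 2, send rows {0,1,2,3} and {4,5,6,7} yield recv rows {0,1,4,5} and {2,3,6,7}.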

The type signature associated with sendcount, sendtype at an MPI process must be equal to the type signature associated with recvcount, recvtype at any other MPI process. This implies that the amount of data sent must be equal to the amount of data received, pairwise between every pair of MPI processes. As usual, however, the type maps may be different.

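If comm is an intra-communicator, the outcome is as if each MPI process executed a send to each MPI process (itself included) with a call to

    MPI_Send(sendbuf + i*sendcount*extent(sendtype), sendcount, sendtype, i, ...),

and a receive from every other MPI process with a call to

    MPI_Recv(recvbuf + i*recvcount*extent(recvtype), recvcount, recvtype, i, ...).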

All arguments on all MPI processes are significant. The argument comm must have identical values on all MPI processes.

The "in place" option for intra-communicators is specified by passing MPI_IN_PLACE to the argument sendbuf at all MPI processes. In such a case, sendcount and sendtype are ignored. The data to be sent is taken from the recvbuf and replaced by the received data. Data sent and received must have the same type map as specified by recvcount and recvtype.
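
The effect of the in-place variant on the data can be sketched serially as a pairwise swap of blocks. This is an illustrative model under the same assumptions as a flat row-per-process layout of ints; alltoall_in_place_model is a hypothetical name, and a real implementation is free to move the data differently.

```c
#include <stddef.h>

/* Serial model of MPI_ALLTOALL with MPI_IN_PLACE (hypothetical helper,
 * no MPI calls). buf holds one row of nprocs * count ints per process.
 * Block j on process i is exchanged with block i on process j; the
 * diagonal blocks (i == j) stay where they are. */
void alltoall_in_place_model(int nprocs, int count, int *buf)
{
    size_t row = (size_t)nprocs * (size_t)count;

    for (int i = 0; i < nprocs; i++)
        for (int j = i + 1; j < nprocs; j++)
            for (int k = 0; k < count; k++) {
                size_t a = (size_t)i * row + (size_t)j * count + k;
                size_t b = (size_t)j * row + (size_t)i * count + k;
                int tmp = buf[a];
                buf[a] = buf[b];
                buf[b] = tmp;
            }
}
```

Only one buffer per process is involved, which is the memory saving the option is meant to provide.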

Rationale.

For large MPI_ALLTOALL instances, allocating both send and receive buffers may consume too much memory. The "in place" option effectively halves the application memory consumption and is useful in situations where the data to be sent will not be used by the sending MPI process after the MPI_ALLTOALL exchange (e.g., in parallel Fast Fourier Transforms). (End of rationale.)

Advice to implementors.

Users may opt to use the "in place" option in order to conserve memory. Quality MPI implementations should thus strive to minimize system buffering. (End of advice to implementors.)

If comm is an inter-communicator, then the outcome is as if each MPI process in group A sends a message to each MPI process in group B, and vice versa. The j-th send buffer of MPI process i in group A should be consistent with the i-th receive buffer of MPI process j in group B, and vice versa.

Advice to users.

When a complete exchange is executed in the inter-communicator case, then the number of data items sent from MPI processes in group A to MPI processes in group B need not equal the number of items sent in the reverse direction. In particular, one can have unidirectional communication by specifying sendcount = 0 in the reverse direction. (End of advice to users.)
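
The asymmetry between the two directions can be illustrated with a serial sketch. The function intercomm_alltoall_model, its argument names, and the flat int layout are assumptions for illustration; it models only where the data ends up, not how an MPI library transfers it.

```c
#include <stddef.h>

/* Serial model of MPI_ALLTOALL on an inter-communicator (hypothetical
 * helper, no MPI calls). Group A has na processes, group B has nb.
 * send_a: na rows of nb * count_ab ints; recv_b: nb rows of na * count_ab.
 * send_b: nb rows of na * count_ba ints; recv_a: na rows of nb * count_ba.
 * A count of 0 disables that direction, and the corresponding buffers
 * are then never accessed. */
void intercomm_alltoall_model(int na, int nb, int count_ab, int count_ba,
                              const int *send_a, int *recv_b,
                              const int *send_b, int *recv_a)
{
    /* A -> B: block j of process i in A lands in block i of process j in B. */
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            for (int k = 0; k < count_ab; k++)
                recv_b[((size_t)j * na + i) * count_ab + k] =
                    send_a[((size_t)i * nb + j) * count_ab + k];

    /* B -> A: block i of process j in B lands in block j of process i in A. */
    for (int j = 0; j < nb; j++)
        for (int i = 0; i < na; i++)
            for (int k = 0; k < count_ba; k++)
                recv_a[((size_t)i * nb + j) * count_ba + k] =
                    send_b[((size_t)j * na + i) * count_ba + k];
}
```

Calling it with count_ba = 0 gives the unidirectional case described above: only the buffers of the A-to-B direction are touched.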

MPI_ALLTOALLV(sendbuf, sendcounts, sdispls, sendtype, recvbuf, recvcounts, rdispls, recvtype, comm)
IN sendbuf	starting address of send buffer (choice)
IN sendcounts	nonnegative integer array (of length group size) specifying the number of elements to send to each rank
IN sdispls	integer array (of length group size). Entry j specifies the displacement (relative to sendbuf) from which to take the outgoing data destined for MPI process j
IN sendtype	datatype of send buffer elements (handle)
OUT recvbuf	address of receive buffer (choice)
IN recvcounts	nonnegative integer array (of length group size) specifying the number of elements that can be received from each rank
IN rdispls	integer array (of length group size). Entry i specifies the displacement (relative to recvbuf) at which to place the incoming data from MPI process i
IN recvtype	datatype of receive buffer elements (handle)
IN comm	communicator (handle)

C binding
int MPI_Alltoallv(const void *sendbuf, const int sendcounts[], const int sdispls[], MPI_Datatype sendtype, void *recvbuf, const int recvcounts[], const int rdispls[], MPI_Datatype recvtype, MPI_Comm comm)
int MPI_Alltoallv_c(const void *sendbuf, const MPI_Count sendcounts[], const MPI_Aint sdispls[], MPI_Datatype sendtype, void *recvbuf, const MPI_Count recvcounts[], const MPI_Aint rdispls[], MPI_Datatype recvtype, MPI_Comm comm)

Fortran 2008 binding
MPI_Alltoallv(sendbuf, sendcounts, sdispls, sendtype, recvbuf, recvcounts, rdispls, recvtype, comm, ierror)
    TYPE(*), DIMENSION(..), INTENT(IN) :: sendbuf
    INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*)
    TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype
    TYPE(*), DIMENSION(..) :: recvbuf
    TYPE(MPI_Comm), INTENT(IN) :: comm
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror
MPI_Alltoallv(sendbuf, sendcounts, sdispls, sendtype, recvbuf, recvcounts, rdispls, recvtype, comm, ierror) !(_c)
    TYPE(*), DIMENSION(..), INTENT(IN) :: sendbuf
    INTEGER(KIND=MPI_COUNT_KIND), INTENT(IN) :: sendcounts(*), recvcounts(*)
    INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: sdispls(*), rdispls(*)
    TYPE(MPI_Datatype), INTENT(IN) :: sendtype, recvtype
    TYPE(*), DIMENSION(..) :: recvbuf
    TYPE(MPI_Comm), INTENT(IN) :: comm
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror

Fortran binding
MPI_ALLTOALLV(SENDBUF, SENDCOUNTS, SDISPLS, SENDTYPE, RECVBUF, RECVCOUNTS, RDISPLS, RECVTYPE, COMM, IERROR)
    <type> SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNTS(*), SDISPLS(*), SENDTYPE, RECVCOUNTS(*), RDISPLS(*), RECVTYPE, COMM, IERROR

MPI_ALLTOALLV adds flexibility to MPI_ALLTOALL in that the location of data for the send is specified by sdispls and the location of the placement of the data on the receive side is specified by rdispls.

If comm is an intra-communicator, then the j-th block sent from MPI process i is received by MPI process j and is placed in the i-th block of recvbuf. These blocks need not all have the same size.

The type signature associated with sendcounts[j], sendtype at MPI process i must be equal to the type signature associated with recvcounts[i], recvtype at MPI process j. This implies that the amount of data sent must be equal to the amount of data received, pairwise between every pair of MPI processes. Distinct type maps between sender and receiver are still allowed.
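
The matching rule and the displacement arrays can be sketched with two small helpers. Both names (packed_displs, matching_recvcounts) are hypothetical, and the global matrix sendcounts_all is an assumption made so the rule can be shown serially; in a real program each process normally knows only its own row, and one common approach is to exchange the counts first (for example with MPI_ALLTOALL and a count of 1).

```c
/* Exclusive prefix sum (hypothetical helper): displs[j] is the element
 * offset of block j when blocks are packed back to back in the buffer.
 * Returns the total number of elements, i.e. the buffer length needed.
 * Packing is one possible layout; MPI_ALLTOALLV does not require it. */
int packed_displs(int n, const int *counts, int *displs)
{
    int total = 0;
    for (int j = 0; j < n; j++) {
        displs[j] = total;
        total += counts[j];
    }
    return total;
}

/* Given a global view sendcounts_all[i * n + j] = number of elements
 * process i sends to process j (an illustration-only assumption), fill
 * the recvcounts array that process me must pass: recvcounts[i] on
 * process me equals sendcounts[me] on process i. */
void matching_recvcounts(int n, int me, const int *sendcounts_all,
                         int *recvcounts)
{
    for (int i = 0; i < n; i++)
        recvcounts[i] = sendcounts_all[i * n + me];
}
```

The results of packed_displs can be passed as sdispls or rdispls, since for MPI_ALLTOALLV these displacements are counted in elements of sendtype and recvtype respectively.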

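The outcome is as if each MPI process sent a message to every other MPI process with

    MPI_Send(sendbuf + sdispls[i]*extent(sendtype), sendcounts[i], sendtype, i, ...),

and received a message from every other MPI process with a call to

    MPI_Recv(recvbuf + rdispls[i]*extent(recvtype), recvcounts[i], recvtype, i, ...).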

All arguments on all MPI processes are significant. The argument comm must have identical values on all MPI processes.

The "in place" option for intra-communicators is specified by passing MPI_IN_PLACE to the argument sendbuf at all MPI processes. In such a case, sendcounts, sdispls and sendtype are ignored. The data to be sent is taken from the recvbuf and replaced by the received data. Data sent and received must have the same type map as specified by the recvcounts array and the recvtype, and is taken from the locations of the receive buffer specified by rdispls.

Advice to users.

Specifying the "in place" option (which must be given on all MPI processes) implies that the same amount and type of data is sent and received between any two MPI processes in the group of the communicator. Different pairs of MPI processes can exchange different amounts of data. Users must ensure that recvcounts[j] and recvtype on MPI process i match recvcounts[i] and recvtype on MPI process j. This symmetric exchange can be useful in applications where the data to be sent will not be used by the sending MPI process after the MPI_ALLTOALLV exchange. (End of advice to users.)
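
The symmetry requirement on the counts can be expressed as a simple check. This sketch assumes a global matrix recvcounts_all that gathers every process's recvcounts array, which exists here for illustration only; in_place_counts_symmetric is a hypothetical name. It checks the counts alone and says nothing about whether the recvtype signatures match.

```c
#include <stdbool.h>

/* recvcounts_all[i * n + j] is recvcounts[j] as passed on process i
 * (a global view assumed for illustration). The in-place exchange is
 * consistent, as far as counts go, only if the matrix is symmetric. */
bool in_place_counts_symmetric(int n, const int *recvcounts_all)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (recvcounts_all[i * n + j] != recvcounts_all[j * n + i])
                return false;
    return true;
}
```

Different off-diagonal pairs may hold different values, which corresponds to different pairs of processes exchanging different amounts of data.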

If comm is an inter-communicator, then the outcome is as if each MPI process in group A sends a message to each MPI process in group B, and vice versa. The j-th send buffer of MPI process i in group A should be consistent with the i-th receive buffer of MPI process j in group B, and vice versa.

Rationale.

The definitions of MPI_ALLTOALL and MPI_ALLTOALLV give as much flexibility as one would achieve by specifying n independent, point-to-point communications, with two exceptions: all messages use the same datatype, and messages are scattered from (or gathered to) sequential storage. (End of rationale.)

Advice to implementors.

Although the discussion of collective communication in terms of point-to-point operation implies that each message is transferred directly from sender to receiver, implementations may use a tree communication pattern. Messages can be forwarded by intermediate nodes where they are split (for scatter) or concatenated (for gather), if this is more efficient. (End of advice to implementors.)

MPI_ALLTOALLW(sendbuf, sendcounts, sdispls, sendtypes, recvbuf, recvcounts, rdispls, recvtypes, comm)
IN sendbuf	starting address of send buffer (choice)
IN sendcounts	nonnegative integer array (of length group size) specifying the number of elements to send to each rank
IN sdispls	integer array (of length group size). Entry j specifies the displacement in bytes (relative to sendbuf) from which to take the outgoing data destined for MPI process j (array of integers)
IN sendtypes	array of datatypes (of length group size). Entry j specifies the type of data to send to MPI process j (array of handles)
OUT recvbuf	address of receive buffer (choice)
IN recvcounts	nonnegative integer array (of length group size) specifying the number of elements that can be received from each rank
IN rdispls	integer array (of length group size). Entry i specifies the displacement in bytes (relative to recvbuf) at which to place the incoming data from MPI process i (array of integers)
IN recvtypes	array of datatypes (of length group size). Entry i specifies the type of data received from MPI process i (array of handles)
IN comm	communicator (handle)

C binding
int MPI_Alltoallw(const void *sendbuf, const int sendcounts[], const int sdispls[], const MPI_Datatype sendtypes[], void *recvbuf, const int recvcounts[], const int rdispls[], const MPI_Datatype recvtypes[], MPI_Comm comm)
int MPI_Alltoallw_c(const void *sendbuf, const MPI_Count sendcounts[], const MPI_Aint sdispls[], const MPI_Datatype sendtypes[], void *recvbuf, const MPI_Count recvcounts[], const MPI_Aint rdispls[], const MPI_Datatype recvtypes[], MPI_Comm comm)

Fortran 2008 binding
MPI_Alltoallw(sendbuf, sendcounts, sdispls, sendtypes, recvbuf, recvcounts, rdispls, recvtypes, comm, ierror)
    TYPE(*), DIMENSION(..), INTENT(IN) :: sendbuf
    INTEGER, INTENT(IN) :: sendcounts(*), sdispls(*), recvcounts(*), rdispls(*)
    TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*)
    TYPE(*), DIMENSION(..) :: recvbuf
    TYPE(MPI_Comm), INTENT(IN) :: comm
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror
MPI_Alltoallw(sendbuf, sendcounts, sdispls, sendtypes, recvbuf, recvcounts, rdispls, recvtypes, comm, ierror) !(_c)
    TYPE(*), DIMENSION(..), INTENT(IN) :: sendbuf
    INTEGER(KIND=MPI_COUNT_KIND), INTENT(IN) :: sendcounts(*), recvcounts(*)
    INTEGER(KIND=MPI_ADDRESS_KIND), INTENT(IN) :: sdispls(*), rdispls(*)
    TYPE(MPI_Datatype), INTENT(IN) :: sendtypes(*), recvtypes(*)
    TYPE(*), DIMENSION(..) :: recvbuf
    TYPE(MPI_Comm), INTENT(IN) :: comm
    INTEGER, OPTIONAL, INTENT(OUT) :: ierror

Fortran binding
MPI_ALLTOALLW(SENDBUF, SENDCOUNTS, SDISPLS, SENDTYPES, RECVBUF, RECVCOUNTS, RDISPLS, RECVTYPES, COMM, IERROR)
    <type> SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNTS(*), SDISPLS(*), SENDTYPES(*), RECVCOUNTS(*), RDISPLS(*), RECVTYPES(*), COMM, IERROR

MPI_ALLTOALLW is the most general form of complete exchange. Like MPI_TYPE_CREATE_STRUCT, the most general type constructor, MPI_ALLTOALLW allows separate specification of count, displacement and datatype. In addition, to allow maximum flexibility, the displacement of blocks within the send and receive buffers is specified in bytes.
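
Because the displacements are in bytes, blocks of different element sizes can share one buffer. The helper below is a sketch of one way to compute such displacements for back-to-back packing; packed_byte_displs is a hypothetical name, and the element sizes are passed in explicitly as an assumption (an MPI program would typically derive them from the datatypes, for instance via their extents).

```c
#include <stddef.h>

/* Byte displacements for blocks packed back to back (hypothetical helper).
 * counts[j] elements of elem_sizes[j] bytes each go to process j.
 * byte_displs is an int array, matching the non-_c C binding of
 * MPI_Alltoallw; this sketch does not guard against int overflow and
 * does not insert alignment padding between blocks.
 * Returns the total number of bytes spanned by all blocks. */
size_t packed_byte_displs(int n, const int *counts,
                          const size_t *elem_sizes, int *byte_displs)
{
    size_t total = 0;
    for (int j = 0; j < n; j++) {
        byte_displs[j] = (int)total;
        total += (size_t)counts[j] * elem_sizes[j];
    }
    return total;
}
```

For counts {2, 1, 3} with element sizes {4, 8, 4}, the displacements are {0, 8, 16} and the total is 28 bytes.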

If comm is an intra-communicator, then the j-th block sent from MPI process i is received by MPI process j and is placed in the i-th block of recvbuf. These blocks need not all have the same size.

The type signature associated with sendcounts[j], sendtypes[j] at MPI process i must be equal to the type signature associated with recvcounts[i], recvtypes[i] at MPI process j. This implies that the amount of data sent must be equal to the amount of data received, pairwise between every pair of MPI processes. Distinct type maps between sender and receiver are still allowed.

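The outcome is as if each MPI process sent a message to every other MPI process with

    MPI_Send(sendbuf + sdispls[i], sendcounts[i], sendtypes[i], i, ...),

and received a message from every other MPI process with a call to

    MPI_Recv(recvbuf + rdispls[i], recvcounts[i], recvtypes[i], i, ...).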

All arguments on all MPI processes are significant. The argument comm must describe the same communicator on all MPI processes.

As for MPI_ALLTOALLV, the "in place" option for intra-communicators is specified by passing MPI_IN_PLACE to the argument sendbuf at all MPI processes. In such a case, sendcounts, sdispls and sendtypes are ignored. The data to be sent is taken from the recvbuf and replaced by the received data. Data sent and received must have the same type map as specified by the recvcounts and recvtypes arrays, and is taken from the locations of the receive buffer specified by rdispls.

If comm is an inter-communicator, then the outcome is as if each MPI process in group A sends a message to each MPI process in group B, and vice versa. The j-th send buffer of MPI process i in group A should be consistent with the i-th receive buffer of MPI process j in group B, and vice versa.

Rationale.

The MPI_ALLTOALLW function generalizes several MPI functions when its input arguments are chosen carefully. For example, setting sendcounts[i] = 0 for all i on every MPI process except one yields the effect of an MPI_SCATTERW function. (End of rationale.)
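
The scatter-like pattern can be sketched by showing how each process might fill its count arrays. The function scatter_counts and its parameters (root, root_counts, recv_from_root) are hypothetical names chosen for illustration; displacements and datatype arrays would still have to be supplied separately in an actual MPI_ALLTOALLW call.

```c
/* Fill sendcounts and recvcounts so that an all-to-all exchange with
 * per-destination counts behaves like a scatter from root (hypothetical
 * helper). Only the root sends: root_counts[j] elements go to process j.
 * Every process receives recv_from_root elements from the root and
 * nothing from anyone else. root_counts is read only when me == root. */
void scatter_counts(int n, int me, int root, const int *root_counts,
                    int recv_from_root, int *sendcounts, int *recvcounts)
{
    for (int j = 0; j < n; j++) {
        sendcounts[j] = (me == root) ? root_counts[j] : 0;
        recvcounts[j] = (j == root) ? recv_from_root : 0;
    }
}
```

For the exchange to be consistent, the value each process passes as recv_from_root has to describe the same type signature as the block the root sends to it.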



(Unofficial) MPI-4.1 of November 2, 2023
HTML Generated on November 19, 2023