The semantics of RMA operations is best understood by assuming that the system maintains a separate public copy of each window, in addition to the original location in process memory (the private window copy). There is only one instance of each variable in process memory, but a distinct public copy of the variable for each window that contains it. A load accesses the instance in process memory (this includes MPI sends). A store accesses and updates the instance in process memory (this includes MPI receives), but the update may affect other public copies of the same locations. A get on a window accesses the public copy of that window. A put or accumulate on a window accesses and updates the public copy of that window, but the update may affect the private copy of the same locations in process memory, and public copies of other overlapping windows. This is illustrated in Figure 21.
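The separate-copy model can be made concrete with a small toy simulation. The sketch below is not MPI code and does not reflect how any implementation works; the type toy_window and the functions toy_init, toy_load, toy_store, toy_get, toy_put, and toy_sync are hypothetical names introduced only for illustration. It models the most pessimistic case the text allows, in which an update reaches the other copy only at a synchronization point.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (NOT MPI): a single window location with a private copy in
   process memory and a separate public copy used by RMA calls. */
typedef struct {
    int  private_copy;   /* accessed by local load/store        */
    int  public_copy;    /* accessed by put/get                 */
    bool private_dirty;  /* local store not yet propagated      */
    bool public_dirty;   /* remote put not yet propagated       */
} toy_window;

void toy_init(toy_window *w, int value)
{
    w->private_copy = value;
    w->public_copy = value;
    w->private_dirty = false;
    w->public_dirty = false;
}

/* Local load and store touch only the private copy. */
int toy_load(const toy_window *w) { return w->private_copy; }

void toy_store(toy_window *w, int value)
{
    w->private_copy = value;
    w->private_dirty = true;
}

/* Get and put touch only the public copy. */
int toy_get(const toy_window *w) { return w->public_copy; }

void toy_put(toy_window *w, int value)
{
    w->public_copy = value;
    w->public_dirty = true;
}

/* Synchronization point: reconcile the two copies. The model assumes a
   correct program, so at most one copy changed since the last sync. */
void toy_sync(toy_window *w)
{
    assert(!(w->private_dirty && w->public_dirty));
    if (w->private_dirty)
        w->public_copy = w->private_copy;
    if (w->public_dirty)
        w->private_copy = w->public_copy;
    w->private_dirty = false;
    w->public_dirty = false;
}
```

In this model, after toy_store(&w, 7) on a window initialised to 0, toy_get(&w) still returns 0 until toy_sync(&w) is called, after which it returns 7. A real implementation may make the update visible earlier.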
The following rules specify the latest time at which an operation must complete at the origin or the target. The update performed by a get call in the origin process memory is visible when the get operation is complete at the origin (or earlier); the update performed by a put or accumulate call in the public copy of the target window is visible when the put or accumulate has completed at the target (or earlier). The rules also specify the latest time at which an update of one window copy becomes visible in another overlapping copy.
The rules above also define, by implication, when an update to a public window copy becomes visible in another overlapping public window copy. Consider, for example, two overlapping windows, win1 and win2. A call to MPI_WIN_FENCE(0, win1) by the window owner makes visible in the process memory previous updates to window win1 by remote processes. A subsequent call to MPI_WIN_FENCE(0, win2) makes these updates visible in the public copy of win2. A correct program must obey the following rules.
Rationale.
The last constraint on correct RMA accesses may seem unduly
restrictive, as it forbids concurrent accesses to nonoverlapping
locations in a window. The reason for this constraint is that, on
some architectures, explicit coherence restoring operations may be
needed at synchronization points.
A different operation may be needed for locations that were
locally updated by stores and for locations that were remotely
updated by put or accumulate operations. Without this constraint,
the MPI library would have to track precisely which locations in a
window were updated by a put or accumulate call. The additional
overhead of maintaining such information is considered prohibitive.
(End of rationale.)
Advice to users.
A user can write correct programs by following these rules:
With the post-start synchronization, the target process can tell the origin process that its window is now ready for RMA access; with the complete-wait synchronization, the origin process can tell the target process that it has finished its RMA accesses to the window.
The RMA synchronization operations define when updates are guaranteed
to become visible in public and private windows. Updates may become
visible earlier, but such behavior is implementation dependent.
(End of advice to users.)
The semantics are illustrated by the following examples:
Example
Rule 5:
Process A:                 Process B:
                           window location X

                           MPI_Win_lock(EXCLUSIVE,B)
                           store X /* local update to private copy of B */
                           MPI_Win_unlock(B)
                           /* now visible in public window copy */

MPI_Barrier                MPI_Barrier

MPI_Win_lock(EXCLUSIVE,B)
MPI_Get(X) /* ok, read from public window */
MPI_Win_unlock(B)
Example
Rule 6:
Process A:                 Process B:
                           window location X

MPI_Win_lock(EXCLUSIVE,B)
MPI_Put(X) /* update to public window */
MPI_Win_unlock(B)

MPI_Barrier                MPI_Barrier

                           MPI_Win_lock(EXCLUSIVE,B)
                           /* now visible in private copy of B */
                           load X
                           MPI_Win_unlock(B)
Note that the private copy of X has not necessarily been updated
after the barrier, so omitting the lock-unlock at process B may lead to
the load returning an obsolete value.
Example
The rules do not guarantee that process A in the following sequence will
see the value of X as updated by the local store by B before the lock.
Process A:                 Process B:
                           window location X

                           store X /* update to private copy of B */
                           MPI_Win_lock(SHARED,B)
MPI_Barrier                MPI_Barrier

MPI_Win_lock(SHARED,B)
MPI_Get(X) /* X may not be in public window copy */
MPI_Win_unlock(B)
                           MPI_Win_unlock(B)
                           /* update on X now visible in public window */
Example
In the following sequence
Process A:                 Process B:
window location X
window location Y

store Y
MPI_Win_post(A,B) /* Y visible in public window */
MPI_Win_start(A)           MPI_Win_start(A)

store X /* update to private window */

MPI_Win_complete           MPI_Win_complete
MPI_Win_wait
/* update on X may not yet be visible in public window */

MPI_Barrier                MPI_Barrier

                           MPI_Win_lock(EXCLUSIVE,A)
                           MPI_Get(X) /* may return an obsolete value */
                           MPI_Get(Y)
                           MPI_Win_unlock(A)
it is not guaranteed that process B reads the value of X as per the local
update by process A, because neither the MPI_WIN_WAIT nor the
MPI_WIN_COMPLETE call by process A ensures visibility in the public window copy.
To allow B to read the value of X stored by A, the local store must be replaced by a local
MPI_PUT that updates the public window copy. Note that with this replacement X
may become visible in the private copy in process memory of A only
after the MPI_WIN_WAIT call in process A. The update on Y made
before the MPI_WIN_POST call is visible in the public window
after the MPI_WIN_POST call and is therefore correctly read by
process B. The MPI_GET(Y) call could be moved to the epoch
started by the MPI_WIN_START operation, and process B would
still get the value stored by A.
Example
Finally, in the following sequence
Process A:                 Process B:
                           window location X

MPI_Win_lock(EXCLUSIVE,B)
MPI_Put(X) /* update to public window */
MPI_Win_unlock(B)

MPI_Barrier                MPI_Barrier

                           MPI_Win_post(B)
                           MPI_Win_start(B)

                           load X /* access to private window */
                                  /* may return an obsolete value */

                           MPI_Win_complete
                           MPI_Win_wait
rules (5,6) do not guarantee that the private copy of X at B has
been updated before the load takes place. To ensure that the value put
by process A is read, the local load must be replaced with a local
MPI_GET operation, or must be placed after the call to
MPI_WIN_WAIT.
(Unofficial) MPI-2.2 of September 4, 2009
HTML Generated on September 10, 2009