Consistency semantics define the outcome of multiple accesses to a single file. All file accesses in MPI are relative to a specific file handle created from a collective open. MPI provides three levels of consistency: sequential consistency among all accesses using a single file handle, sequential consistency among all accesses using file handles created from a single collective open with atomic mode enabled, and user-imposed consistency among accesses other than the above. Sequential consistency means the behavior of a set of operations will be as if the operations were performed in some serial order consistent with program order; each access appears atomic, although the exact ordering of accesses is unspecified. User-imposed consistency may be obtained using program order and calls to MPI_FILE_SYNC.
Let FH1 be the set of file handles created from one particular collective open of the file FOO, and FH2 be the set of file handles created from a different collective open of FOO. Note that nothing restrictive is said about FH1 and FH2: the sizes of FH1 and FH2 may be different, the groups of processes used for each open may or may not intersect, the file handles in FH1 may be destroyed before those in FH2 are created, etc. Consider the following three cases: a single file handle (e.g., fh1 ∈ FH1), two file handles created from a single collective open (e.g., fh1a ∈ FH1 and fh1b ∈ FH1), and two file handles from different collective opens (e.g., fh1 ∈ FH1 and fh2 ∈ FH2).
For the purpose of consistency semantics, a matched pair (Section Split Collective Data Access Routines) of split collective data access operations (e.g., MPI_FILE_READ_ALL_BEGIN and MPI_FILE_READ_ALL_END) compose a single data access operation. Similarly, a nonblocking data access routine (e.g., MPI_FILE_IREAD) and the routine that completes the request (e.g., MPI_WAIT) also compose a single data access operation. For all cases below, these data access operations are subject to the same constraints as blocking data access operations.
Advice to users. For an MPI_FILE_IREAD and MPI_WAIT pair, the operation begins when MPI_FILE_IREAD is called and ends when MPI_WAIT returns. (End of advice to users.)
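The following C sketch illustrates this pairing. The file handle argument, buffer sizes, and datatypes are illustrative assumptions; the point is only that each bracketed pair counts as one data access operation for the consistency rules in this section.

```c
#include <mpi.h>

/* Each bracketed pair below composes a single data access operation. */
void single_access_pairs(MPI_File fh)
{
    int         buf1[100], buf2[100];
    MPI_Request req;

    /* Nonblocking pair: the operation begins at MPI_FILE_IREAD and
     * ends when MPI_WAIT returns. */
    MPI_File_iread(fh, buf1, 100, MPI_INT, &req);
    /* ... computation overlapped with the read; the access is still in progress ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    /* Split collective pair: the matched begin/end calls likewise
     * compose one data access operation. */
    MPI_File_read_all_begin(fh, buf2, 100, MPI_INT);
    /* ... computation overlapped with the collective read ... */
    MPI_File_read_all_end(fh, buf2, MPI_STATUS_IGNORE);
}
```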
Assume that A1 and A2 are two data access operations. Let D1 (D2) be the set of absolute byte displacements of every byte accessed in A1 (A2). The two data accesses overlap if D1 ∩ D2 ≠ ∅. The two data accesses conflict if they overlap and at least one is a write access.
Let SEQfh be a sequence of file operations on a single file handle, bracketed by MPI_FILE_SYNCs on that file handle. (Both opening and closing a file implicitly perform an MPI_FILE_SYNC.) SEQfh is a ``write sequence'' if any of the data access operations in the sequence are writes or if any of the file manipulation operations in the sequence change the state of the file (e.g., MPI_FILE_SET_SIZE or MPI_FILE_PREALLOCATE). Given two sequences, SEQ1 and SEQ2, we say they are not concurrent if one sequence is guaranteed to completely precede the other (temporally).
The requirements for guaranteeing sequential consistency among all accesses to a particular file are divided into the three cases given below. If any of these requirements are not met, then the value of all data in that file is implementation dependent.
Case 1: fh1 ∈ FH1. All operations on fh1 are sequentially consistent if atomic mode is set. If nonatomic mode is set, then all operations on fh1 are sequentially consistent if they are either nonconcurrent, nonconflicting, or both.
Case 2: fh1a ∈ FH1 and fh1b ∈ FH1. Assume A1 is a data access operation using fh1a, and A2 is a data access operation using fh1b. If for any access A1, there is no access A2 that conflicts with A1, then MPI guarantees sequential consistency.
However, unlike POSIX semantics, the default MPI semantics for conflicting accesses do not guarantee sequential consistency. If A1 and A2 conflict, sequential consistency can be guaranteed by either enabling atomic mode via the MPI_FILE_SET_ATOMICITY routine, or meeting the condition described in Case 3 below.
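As a sketch of Case 2, the following has two processes open a file collectively and then perform conflicting accesses to the same bytes; enabling atomic mode, together with a barrier that orders the write before the read, is one way to obtain sequential consistency. The file name "workfile" and the 10-integer transfer are assumptions made for the example.

```c
#include <mpi.h>

void case2_atomic(MPI_Comm comm)     /* comm contains at least 2 processes */
{
    MPI_File fh;
    int rank, i, buf[10];

    MPI_Comm_rank(comm, &rank);
    MPI_File_open(comm, "workfile",
                  MPI_MODE_RDWR | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);

    /* Collective: every process passes the same flag. */
    MPI_File_set_atomicity(fh, 1);

    if (rank == 0) {
        for (i = 0; i < 10; i++) buf[i] = 42;
        MPI_File_write_at(fh, 0, buf, 10, MPI_INT, MPI_STATUS_IGNORE);
    }

    MPI_Barrier(comm);               /* orders the write before the read */

    if (rank == 1) {
        /* Conflicts with rank 0's write; atomic mode plus the barrier
         * guarantees rank 1 reads the newly written values. */
        MPI_File_read_at(fh, 0, buf, 10, MPI_INT, MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
}
```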
Case 3: fh1 ∈ FH1 and fh2 ∈ FH2. Consider access to a single file using file handles from distinct collective opens. In order to guarantee sequential consistency, MPI_FILE_SYNC must be used (both opening and closing a file implicitly perform an MPI_FILE_SYNC).
Sequential consistency is guaranteed among accesses to a single file if for any write sequence SEQ1 to the file, there is no sequence SEQ2 to the file that is concurrent with SEQ1. To guarantee sequential consistency when there are write sequences, MPI_FILE_SYNC must be used together with a mechanism that guarantees nonconcurrency of the sequences.
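A sketch of the sync-barrier-sync construct for Case 3 follows. Each process opens the file on MPI_COMM_SELF, so the two handles come from distinct collective opens; the MPI_FILE_SYNC calls delimit write sequences, and the barrier guarantees the writer's sequence completely precedes the reader's. The file name and transfer size are assumptions.

```c
#include <mpi.h>

void case3_sync(void)
{
    MPI_File fh;
    int rank, i, buf[10];

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Separate collective opens: each handle belongs to a different set. */
    MPI_File_open(MPI_COMM_SELF, "workfile",
                  MPI_MODE_RDWR | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);

    if (rank == 0) {
        for (i = 0; i < 10; i++) buf[i] = 42;
        MPI_File_write_at(fh, 0, buf, 10, MPI_INT, MPI_STATUS_IGNORE);
    }

    /* sync-barrier-sync: ends the writer's write sequence, guarantees the
     * sequences are not concurrent, and starts a new sequence for the reader. */
    MPI_File_sync(fh);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_File_sync(fh);

    if (rank == 1) {
        MPI_File_read_at(fh, 0, buf, 10, MPI_INT, MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
}
```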
See the examples in Section Examples for further clarification of some of these consistency semantics.
MPI_FILE_SET_ATOMICITY(fh, flag)
INOUT fh | file handle (handle)
IN flag | true to set atomic mode, false to set nonatomic mode (logical)
Let FH be the set of file handles created by one collective open. The consistency semantics for data access operations using FH is set by collectively calling MPI_FILE_SET_ATOMICITY on FH. MPI_FILE_SET_ATOMICITY is collective; all processes in the group must pass identical values for fh and flag. If flag is true, atomic mode is set; if flag is false, nonatomic mode is set.
Changing the consistency semantics for an open file only affects new data accesses. All completed data accesses are guaranteed to abide by the consistency semantics in effect during their execution. Nonblocking data accesses and split collective operations that have not completed (e.g., via MPI_WAIT) are only guaranteed to abide by nonatomic mode consistency semantics.
Advice to implementors. Since the semantics guaranteed by atomic mode are stronger than those guaranteed by nonatomic mode, an implementation is free to adhere to the more stringent atomic mode semantics for outstanding requests. (End of advice to implementors.)
MPI_FILE_GET_ATOMICITY(fh, flag)
IN fh | file handle (handle)
OUT flag | true if atomic mode, false if nonatomic mode (logical)
MPI_FILE_GET_ATOMICITY returns the current consistency semantics for data access operations on the set of file handles created by one collective open. If flag is true, atomic mode is enabled; if flag is false, nonatomic mode is enabled.
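A minimal sketch pairing the two atomicity routines, assuming a handle opened on comm and an illustrative file name:

```c
#include <mpi.h>
#include <stdio.h>

void toggle_and_check(MPI_Comm comm)
{
    MPI_File fh;
    int flag;

    MPI_File_open(comm, "workfile",
                  MPI_MODE_RDWR | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);

    MPI_File_set_atomicity(fh, 1);        /* collective: same flag on all processes */

    MPI_File_get_atomicity(fh, &flag);    /* flag is now true (nonzero) */
    printf("atomic mode is %s\n", flag ? "enabled" : "disabled");

    MPI_File_close(&fh);
}
```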
MPI_FILE_SYNC(fh)
INOUT fh | file handle (handle)
Calling MPI_FILE_SYNC with fh causes all previous writes to fh by the calling process to be transferred to the storage device. If other processes have made updates to the storage device, then all such updates become visible to subsequent reads of fh by the calling process. MPI_FILE_SYNC may be necessary to ensure sequential consistency in certain cases (see above).
MPI_FILE_SYNC is a collective operation.
The user is responsible for ensuring that all nonblocking requests and split collective operations on fh have been completed before calling MPI_FILE_SYNC---otherwise, the call to MPI_FILE_SYNC is erroneous.
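A brief sketch of this completion requirement, with an assumed buffer size and datatype:

```c
#include <mpi.h>

void sync_after_completion(MPI_File fh)
{
    int         buf[100];
    MPI_Request req;

    MPI_File_iread(fh, buf, 100, MPI_INT, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete the request first... */

    MPI_File_sync(fh);                   /* ...then the collective sync is legal */
}
```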