20.1.18. Temporary Data Movement and Temporary Memory Modification

The compiler is allowed to temporarily modify data in memory. Normally, this causes a problem only when communication and computation overlap, as in Case (b) of the example in Section Solutions. The first example below shows another situation that could be problematic.

Example: Overlapping communication and computation.

[Code listing not reproduced: image in the source]

Example: The compiler may substitute the nested loops through loop fusion.

[Code listing not reproduced: image in the source]

Example: Another optimization is based on the usage of a separate memory storage area, e.g., in a GPU.

[Code listing not reproduced: image in the source]

In the possible compiler-generated optimization shown in the loop-fusion example, buf(100,100) from the first example is equivalenced with the one-dimensional array buf_1dim(10000). The nonblocking receive may asynchronously store incoming data in the boundary buf(1,1:100) while the fused loop is temporarily using this part of the buffer. When the tmp data is written back to buf, the previous contents of buf(1,1:100) are restored and the received data is lost. The principle behind this optimization is that the receive buffer data buf(1,1:100) was temporarily moved to tmp.
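The sequence of events described above can be replayed deterministically without MPI. The following is a minimal sketch under stated assumptions, not the standard's listing and not actual compiler output: the save, the fused loop, and the restore are written out by hand, and the arrival of the message is simulated by a plain assignment in the middle of the loop. The module name fused_loop_replay, the function lost_after_fusion, the marker value received, and the placeholder computation are hypothetical; only buf, buf_1dim, and tmp are taken from the text.

```fortran
! MPI-free replay of the save/restore pattern around a fused loop.
! Assumptions: module and function names, the marker value, and the
! placeholder computation are hypothetical, chosen for illustration.
module fused_loop_replay
  implicit none
  private
  public :: lost_after_fusion
  integer, parameter :: n = 100
  real, parameter :: received = -1.0   ! marker for "data from the message"
contains
  ! Returns how many elements of buf(1,1:n) no longer hold the
  ! received value after the restore step.
  function lost_after_fusion() result(n_lost)
    integer :: n_lost
    real :: buf(n, n), buf_1dim(n*n), tmp(n)
    equivalence (buf(1, 1), buf_1dim(1))   ! same storage, two shapes
    integer :: h

    buf = 0.0                      ! old contents, boundary row included
    ! (the nonblocking receive into buf(1,1:n) would be posted here)

    tmp = buf(1, 1:n)              ! save the boundary row

    do h = 1, n*n                  ! fused loop over the whole storage
      buf_1dim(h) = real(h)        ! placeholder computation
      ! Simulated arrival of the message halfway through the loop.
      if (h == n*n/2) buf(1, 1:n) = received
    end do

    buf(1, 1:n) = tmp              ! restore: overwrites the received data
    ! (the matching wait would be called here)

    n_lost = count(buf(1, 1:n) /= received)
  end function lost_after_fusion
end module fused_loop_replay
```

In this replay, lost_after_fusion() returns 100: after the restore, every element of the boundary row holds its old value again, regardless of when during the loop the simulated message arrived.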

The third example (separate memory storage area) shows a second possible optimization. The whole array is temporarily moved to local_buf.

Storing local_buf back to the original location buf overwrites the section of buf that serves as the receive buffer in the nonblocking MPI call. This write-back is therefore likely to interfere with the data received asynchronously in buf(1,1:100).
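The effect of the write-back can also be replayed without MPI. The following is a hedged sketch, not the standard's listing: the copy to local_buf and back is written by hand, and the arrival of the message is simulated by a plain assignment to the original storage. The module name local_copy_replay, the function lost_after_copy_back, its argument whole_array, the marker value received, and the placeholder computation are hypothetical. The second branch, which copies back only the computed rows, is included purely for contrast; nothing obliges a compiler to generate it.

```fortran
! MPI-free replay of moving the whole array to separate storage.
! Assumptions: all names except buf and local_buf are hypothetical.
module local_copy_replay
  implicit none
  private
  public :: lost_after_copy_back
  integer, parameter :: n = 100
  real, parameter :: received = -1.0   ! marker for "data from the message"
contains
  ! whole_array = .true.  : buf = local_buf (the case described in the text)
  ! whole_array = .false. : only the computed rows 2:n are copied back
  ! Returns how many elements of buf(1,1:n) lost the received value.
  function lost_after_copy_back(whole_array) result(n_lost)
    logical, intent(in) :: whole_array
    integer :: n_lost
    real :: buf(n, n), local_buf(n, n)
    integer :: i, j

    buf = 0.0                      ! old contents, boundary row included
    ! (the nonblocking receive into buf(1,1:n) would be posted here)

    local_buf = buf                ! move the whole array to separate storage

    buf(1, 1:n) = received         ! simulated arrival in the original storage

    do j = 1, n
      do i = 2, n                  ! the computation never touches row 1
        local_buf(i, j) = real(i + j)   ! placeholder computation
      end do
    end do

    if (whole_array) then
      buf = local_buf              ! also writes the stale row 1 back
    else
      buf(2:n, :) = local_buf(2:n, :)
    end if
    ! (the matching wait would be called here)

    n_lost = count(buf(1, 1:n) /= received)
  end function lost_after_copy_back
end module local_copy_replay
```

Here lost_after_copy_back(.true.) returns 100, because the stale copy of the boundary row is stored back over the received data, while lost_after_copy_back(.false.) returns 0. The application cannot rely on the compiler choosing the second form.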

Note that this problem may also occur:

With the local buffer at the origin process, between an RMA communication call and the ensuing synchronization call; see Chapter One-Sided Communications.
With the window buffer at the target process between two ensuing RMA synchronization calls.
With the local buffer in MPI parallel file I/O split collective operations between the MPI_XXX_BEGIN and MPI_XXX_END calls; see Section Split Collective Data Access Routines.

As already mentioned in Section The Fortran ASYNCHRONOUS Attribute (part of Section Problems with Code Movement and Register Optimization), the ASYNCHRONOUS attribute can prevent compiler optimization with temporary data movement, but only if the receive buffer and the local references are separated into different variables, as shown in the example in Section Permanent Data Movement and in the example in Section Comparison with C.

Note also that the methods

calling MPI_F_SYNC_REG (or such a user-defined routine),
using module variables and COMMON blocks, and
the TARGET attribute

cannot be used to prevent such temporary data movement. These methods influence compiler optimization when library routines are called. They cannot prevent the optimizations of the code fragments shown in the loop-fusion example and in the separate-storage example above.

Note also that compiler optimization with temporary data movement should not be prevented by declaring buf as VOLATILE, because VOLATILE implies that all accesses to any storage unit (word) of buf must be performed directly in main memory, exactly in the sequence defined by the application program. The VOLATILE attribute prevents all register and cache optimizations. Therefore, VOLATILE may cause a huge performance degradation.

Instead of solving the problem, it is better to prevent the problem: when overlapping communication and computation, the nonblocking communication (or nonblocking or split collective I/O) and the computation should be executed on different variables, and the communication should be protected with the ASYNCHRONOUS attribute. In this case, the temporary memory modifications are done only on the variables used in the computation and cannot have any side effect on the data used in the nonblocking MPI operations.

Rationale.

This is a strong restriction for application programs. To weaken this restriction, a new or modified asynchronous feature in the Fortran language would be necessary: an asynchronous attribute that can be used on parts of an array and together with asynchronous operations outside the scope of Fortran. If such a feature becomes available in a future edition of the Fortran standard, then this restriction also may be weakened in a later version of the MPI standard. (End of rationale.)

In the example in Section Permanent Data Movement (which is a solution for the problem shown in the example in Section Solutions) and in the example in Section Comparison with C (which is a solution for the problem shown in the first example of this section), the array is split into an inner part and a halo part, and both disjoint parts are passed to a subroutine separated_sections. This routine overlaps the receiving of the halo data and the calculations on the inner part of the array. In a second step, the whole array is used to do the calculation on the elements where inner+halo is needed. Note that the halo and the inner area are strided arrays. Those can be used in nonblocking communication only with a Fortran 2018 (or TS 29113) based MPI library.
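The recommendation to keep communication and computation on different variables can be sketched as follows. This is a simplified illustration, not the separated_sections routine from the referenced examples: it assumes that the halo and the inner data are two separate, contiguous arrays rather than strided sections of one array, so it does not depend on subarray support in the MPI library. The names overlap_sketch, overlap_step, halo, n_halo, and inner, the tag value 0, and the placeholder computation are hypothetical; the MPI routines and the constant MPI_ASYNC_PROTECTS_NONBLOCKING are those of the mpi_f08 module.

```fortran
! Sketch only. Assumptions: the names overlap_step, halo, n_halo, and
! inner, the tag value 0, and the placeholder update are hypothetical.
module overlap_sketch
  use mpi_f08
  implicit none
  private
  public :: overlap_step
contains
  subroutine overlap_step(halo, n_halo, inner, source, comm)
    integer, intent(in) :: n_halo
    ! Used only as the receive buffer of the nonblocking call.
    real, asynchronous, intent(inout) :: halo(n_halo)
    ! Used only by the computation; never passed to MPI in this routine.
    real, intent(inout) :: inner(:, :)
    integer, intent(in) :: source
    type(MPI_Comm), intent(in) :: comm
    type(MPI_Request) :: req

    call MPI_Irecv(halo, n_halo, MPI_REAL, source, 0, comm, req)

    ! Any temporary copy the compiler makes here concerns inner only,
    ! so it cannot overwrite data arriving in halo.
    inner = 0.5 * inner            ! placeholder computation

    call MPI_Wait(req, MPI_STATUS_IGNORE)
    ! Needed only if ASYNCHRONOUS does not protect nonblocking calls.
    if (.not. MPI_ASYNC_PROTECTS_NONBLOCKING) call MPI_F_SYNC_REG(halo)
  end subroutine overlap_step
end module overlap_sketch
```

A caller would pass its halo array and its computation array as two distinct actual arguments, for example call overlap_step(halo, size(halo), inner, left, comm) with a hypothetical neighbor rank left. Because the receive is both posted and completed inside overlap_step, the ASYNCHRONOUS attribute on the dummy argument covers the whole time during which the communication is pending.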

(Unofficial) MPI-4.1 of November 2, 2023
HTML Generated on November 19, 2023