An MPI implementation may be unable or choose not to handle some failures that occur during MPI calls. These can include failures that generate exceptions or traps, such as floating point errors or access violations. The set of failures that are handled by MPI is implementation-dependent. Each such failure causes an error to be raised.
The above text takes precedence over any text on error handling within this document. Specifically, text that states that errors will be handled should be read as may be handled. More background information about how MPI treats errors can be found in Section Error Handling.
Figure 24: Diagram for deciding which error handler is invoked.
A user can associate error handlers with four types of objects: communicators, windows, files, and sessions. The specified error handling routine will be used for any error that occurs during an MPI procedure or an operation that refers to the respective object. Figure 24 shows which error handler is invoked in different situations. When the MPI procedure or operation refers to a communicator, window, or file, the error handler for that object will be invoked; otherwise, if the procedure or operation refers to a session, the error handler for the session will be invoked. Some MPI procedures have indirect references to these objects. For example, in a procedure that takes a request handle as a parameter, an error during the corresponding operation is raised on the communicator, window, or file on which the request has been initialized. Similarly, a group contains a reference to the session from which it was derived, and procedures on groups invoke the error handler from that session. The referenced object may have been destroyed before an error is raised (e.g., a procedure on a group derived from a session that has been finalized). In this case, the associated error handler for the object cannot be obtained.
MPI procedures that do not refer to an MPI object from which the associated error handler can be obtained, directly or indirectly, are considered to be attached to the communicator MPI_COMM_SELF when using the World Model (see Section The World Model). When MPI_COMM_SELF is not initialized (i.e., before MPI_INIT / MPI_INIT_THREAD, after MPI_FINALIZE, or when using the Sessions Model exclusively) the error raises the initial error handler (set during the launch operation, see Reserved Keys). The attachment of error handlers to objects is purely local: different processes may attach different error handlers to corresponding objects.
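The selection rule described in the two paragraphs above can be summarized as a small decision function. The following is only a sketch of one reading of that rule: it is a plain C model, not MPI API, and every name in it (`ref_kind`, `handler_choice`, `select_handler`) is hypothetical. In particular, it assumes that a destroyed object is treated the same as a procedure with no usable object reference.

```c
#include <stdbool.h>

/* Hypothetical model types; none of these names belong to MPI. */
typedef enum {
    REF_NONE,          /* no object reference, direct or indirect */
    REF_COMM_WIN_FILE, /* refers to a communicator, window, or file */
    REF_SESSION        /* refers to a session, e.g., through a group */
} ref_kind;

typedef enum {
    USE_OBJECT_HANDLER,    /* handler of the communicator, window, or file */
    USE_SESSION_HANDLER,   /* handler of the session */
    USE_COMM_SELF_HANDLER, /* handler attached to MPI_COMM_SELF */
    USE_INITIAL_HANDLER    /* initial error handler set at launch */
} handler_choice;

/*
 * handler_obtainable:    false if the referenced object was already
 *                        destroyed (assumption: then fall through).
 * comm_self_initialized: true only inside the World Model, between
 *                        MPI_INIT / MPI_INIT_THREAD and MPI_FINALIZE.
 */
handler_choice select_handler(ref_kind ref,
                              bool handler_obtainable,
                              bool comm_self_initialized)
{
    if (handler_obtainable) {
        if (ref == REF_COMM_WIN_FILE) {
            return USE_OBJECT_HANDLER;
        }
        if (ref == REF_SESSION) {
            return USE_SESSION_HANDLER;
        }
    }
    /* No usable object: fall back to MPI_COMM_SELF or the initial handler. */
    return comm_self_initialized ? USE_COMM_SELF_HANDLER
                                 : USE_INITIAL_HANDLER;
}
```

For instance, under these assumptions a procedure on a group whose session has been finalized, in a program that uses the Sessions Model exclusively, maps to `select_handler(REF_SESSION, false, false)` and therefore to the initial error handler.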
Several predefined error handlers are available in MPI:
Advice to implementors.
The implementation-specific error information resulting from MPI_ERRORS_ARE_FATAL and MPI_ERRORS_ABORT provided to the invoking environment should be meaningful to the end-user, for example a predefined error class.
( End of advice to implementors.)
Implementations may provide additional predefined error handlers and programmers can code their own error handlers.
Unless otherwise requested, the error handler MPI_ERRORS_ARE_FATAL is set as the default initial error handler and associated with predefined communicators. Thus, if the user chooses not to control error handling, every error that MPI handles is treated as fatal. Since (almost) all MPI calls return an error code, a user may choose to handle errors in the main code by testing the return code of each MPI call and executing suitable recovery code when the call was not successful. In this case, the error handler MPI_ERRORS_RETURN will be used. Usually it is more convenient and more efficient not to test for errors after each MPI call, and to have such errors handled by a nontrivial MPI error handler. Note that, unlike predefined communicators, windows and files do not inherit from the initial error handler, as defined in Sections Error Handling and I/O Error Handling, respectively.
When an error is raised, MPI will provide the user information about that error using an error code. Some errors might prevent MPI from completing further API calls successfully, and those functions will continue to report errors until the cause of the error is corrected or the user terminates the application. The user can decide whether or not to attempt to continue when handling such an error.
Advice to users.
For example, users may be unable to correct errors corresponding to some error classes, such as MPI_ERR_INTERN. Such errors may cause subsequent MPI calls to complete in error.
( End of advice to users.)
Advice to implementors.
A high-quality implementation will, to the greatest possible extent, circumscribe the impact of an error, so that normal processing can continue after an error handler was invoked. The implementation documentation will provide information on the possible effect of each class of errors and available recovery actions.
( End of advice to implementors.)
An MPI error handler is an opaque object, which is accessed by a handle. MPI calls are provided to create new error handlers, to associate error handlers with objects, and to test which error handler is associated with an object.
C has distinct typedefs for user-defined error handling callback functions that accept communicator, file, window, and session arguments. In Fortran there are four user routines.
An error handler object is created by a call to MPI_XXX_CREATE_ERRHANDLER, where XXX is, respectively, COMM, WIN, FILE, or SESSION.
An error handler is attached to a communicator, window, file, or session by a call to MPI_XXX_SET_ERRHANDLER. The error handler must be either a predefined error handler, or an error handler that was created by a call to MPI_XXX_CREATE_ERRHANDLER, with matching XXX. An error handler can also be attached to a session using the errorhandler argument to MPI_SESSION_INIT. The predefined error handlers MPI_ERRORS_RETURN and MPI_ERRORS_ARE_FATAL can be attached to communicators, windows, files, or sessions.
The error handler currently associated with a communicator, window, file, or session can be retrieved by a call to MPI_XXX_GET_ERRHANDLER.
The MPI function MPI_ERRHANDLER_FREE can be used to free an error handler that was created by a call to MPI_XXX_CREATE_ERRHANDLER.
MPI_XXX_GET_ERRHANDLER behaves as if a new error handler object is created. That is, once the error handler is no longer needed, MPI_ERRHANDLER_FREE should be called with the error handler returned from MPI_XXX_GET_ERRHANDLER to mark the error handler for deallocation. This provides behavior similar to that of MPI_COMM_GROUP and MPI_GROUP_FREE.
Advice to implementors.
High-quality implementations should raise an error when an error handler that was created by a call to MPI_XXX_CREATE_ERRHANDLER is attached to an object of the wrong type with a call to MPI_YYY_SET_ERRHANDLER. To do so, it is necessary to maintain, with each error handler, information on the typedef of the associated user function.
( End of advice to implementors.)
The syntax for these calls is given below.