Abstract
Checkpoints and their causal relationships are modeled by means of occurrence graphs. An algorithm is presented in occurrence graph terms, which can be used to produce sets of checkpoints at varying levels of detail from the set at the lowest level. An informal proof of the correctness of the algorithm is presented. To ensure that the operation of application software running on a fault-tolerant computer may be continued after a fault has arisen, the execution of that software may be checkpointed. It is desirable to produce sets of checkpoints at a hierarchy of levels in the system, so that recovery can take place on a scale commensurate with the component that has failed. An experimental implementation of the algorithm on a simulated multicomputer system is briefly described.
Original language | English |
---|---|
Publisher | Royal Signals and Radar Establishment |
Volume | RSRE-M--3322 |
Publication status | Published - 1980 |