State restoration in Ada 95: A portable approach to supporting software fault tolerance

P. Rogers, A. J. Wellings

Research output: Contribution to journal › Article › peer-review

Abstract

Studies indicate that techniques for tolerating hardware faults are so effective that software design errors are the leading cause of the faults encountered. Two main approaches have been proposed for handling these unanticipated software faults: N-version programming and recovery blocks. Both rely on design diversity: the assumption that different designs will exhibit different faults (if any) for the same inputs and can therefore serve as alternatives for one another. Both approaches have advantages, but this paper focuses on recovery blocks, and specifically on the requirement to save and restore application state. Judicious saving of state has been called 'checkpointing' for over a decade. Using the object-oriented features of the revised Ada language (Ada 95), a language widely used in this domain, we present three portable implementations of a checkpointing facility and discuss the trade-offs each offers. The results of implementing these mechanisms highlight both the strengths and the weaknesses of some of Ada's object-oriented features. We then present a reusable implementation of recovery blocks that illustrates the checkpointing schemes. Finally, we give a performance analysis, supported by measurements.
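The recovery-block scheme summarised above (checkpoint the state, run an alternate, apply an acceptance test, roll back and try the next alternate on failure) can be sketched as follows. This is a minimal Python analogue, not the authors' Ada 95 implementation: the names `Checkpoint`, `recovery_block` and `RecoveryBlockFailure` are hypothetical, application state is assumed to be a mutable `dict`, and the checkpoint is a plain full deep copy, which is only one possible scheme and not necessarily any of the three the paper evaluates.

```python
import copy
from typing import Callable, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when every alternate fails (hypothetical name)."""


class Checkpoint:
    """Hypothetical checkpoint: saves a full deep copy of a dict state."""

    def __init__(self, state: dict) -> None:
        self._saved = copy.deepcopy(state)

    def restore(self, state: dict) -> None:
        # Copy again so the checkpoint can be restored more than once.
        state.clear()
        state.update(copy.deepcopy(self._saved))


def recovery_block(state: dict,
                   alternates: Sequence[Callable[[dict], None]],
                   acceptance_test: Callable[[dict], bool]) -> dict:
    checkpoint = Checkpoint(state)  # save state before the first attempt
    for alternate in alternates:
        try:
            alternate(state)
            if acceptance_test(state):
                return state  # result accepted; discard the checkpoint
        except Exception:
            pass  # an unanticipated exception counts as a failed alternate
        checkpoint.restore(state)  # roll back before trying the next one
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


# Example alternates (hypothetical): the primary contains a design fault.
def faulty_sort(s: dict) -> None:
    s["data"].append(0)           # corrupts the state...
    s["data"].sort(reverse=True)  # ...and sorts in the wrong order


def simple_sort(s: dict) -> None:
    s["data"].sort()


def is_sorted(s: dict) -> bool:
    d = s["data"]
    return all(a <= b for a, b in zip(d, d[1:]))


if __name__ == "__main__":
    result = recovery_block({"data": [3, 1, 2]},
                            [faulty_sort, simple_sort], is_sorted)
    print(result["data"])  # prints [1, 2, 3]; the stray 0 was rolled back
```

The deep copy keeps the sketch short but copies the whole state on every checkpoint; cheaper schemes (for example, saving only the objects an alternate may modify) trade that cost for more bookkeeping, which is the kind of trade-off the abstract refers to.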

Original language: English
Pages (from-to): 237-255
Number of pages: 19
Journal: Journal of Systems and Software
Volume: 50
Issue number: 3
Publication status: Published, 15 Mar 2000

Keywords

  • ERROR RECOVERY
  • ATOMIC ACTIONS
  • SYSTEMS
  • RELIABILITY
