Hierarchical Strategies for Efficient Fault Recovery on the Reconfigurable PAnDA Device

Martin Albrecht Trefzer, David Michael Renwick Lawson, Simon Jonathan Bale, James Walker, Andrew Martin Tyrrell

Research output: Contribution to journal › Article › peer-review

Abstract

A novel hierarchical fault-tolerance methodology for reconfigurable devices is presented. A bespoke multi-reconfigurable FPGA architecture, the programmable analogue and digital array (PAnDA), is introduced, which allows finer-grained reconfiguration than any other FPGA architecture currently in existence. Fault-blind circuit repair strategies, which require no specific information about the nature or location of faults, are developed by exploiting architectural features of PAnDA. Two fault recovery techniques, one stochastic and one deterministic, are proposed; results for each, and a comparison of the two, are presented. Both approaches repair faulty circuits using algorithms that perform fine-grained hierarchical partial reconfiguration. While the stochastic approach provides insight into the feasibility of the method, the deterministic approach aims to generate optimal repair strategies for generic faults induced in a specific circuit. It is shown that both techniques successfully repair the benchmark circuits after random faults are induced at random circuit locations, and that the deterministic strategies operate efficiently and effectively once optimised for a specific use case. The methods are shown to be generally applicable to any circuit on PAnDA, and to be straightforwardly customisable for any FPGA fabric whose structure offers some regularity and symmetry.
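The abstract describes fault-blind, hierarchical repair only at a high level. As a minimal illustration of the general idea, and explicitly not the PAnDA architecture or the authors' algorithms, the toy model below assumes a hypothetical row of identical cells, each with several equivalent configurations ("variants"). A circuit is a placement (coarse level) plus one variant per used cell (fine level). All names (`Mapping`, `passes_test`, `stochastic_repair`, `deterministic_repair`, `N_CELLS`, `N_VARIANTS`, `CIRCUIT_SIZE`) are hypothetical. The repair functions consult the hidden faults only through a pass/fail test, which mimics the fault-blind setting.

```python
import random
from dataclasses import dataclass

# Hypothetical toy fabric (NOT the PAnDA architecture): a row of identical
# cells, each offering several functionally equivalent configurations
# ("variants", loosely standing in for alternative transistor sizings).
N_CELLS = 16
N_VARIANTS = 4
CIRCUIT_SIZE = 4
N_OFFSETS = N_CELLS - CIRCUIT_SIZE + 1  # legal placements of the circuit


@dataclass
class Mapping:
    offset: int          # coarse level: where the circuit is placed
    variants: list[int]  # fine level: configuration chosen for each used cell


def passes_test(m: Mapping, faults: set[tuple[int, int]]) -> bool:
    """Black-box functional test: only pass/fail is visible to the repairer.

    A fault is a hidden (cell, variant) pair; the circuit fails if any cell it
    uses is configured with a faulty variant.
    """
    return all((m.offset + i, v) not in faults for i, v in enumerate(m.variants))


def stochastic_repair(m, faults, rng, budget=1000, p_fine=0.8):
    """Random reconfigurations: mostly fine-grained, occasionally coarse."""
    m = Mapping(m.offset, list(m.variants))  # work on a copy
    for attempt in range(budget):
        if passes_test(m, faults):
            return m, attempt
        if rng.random() < p_fine:
            # Fine level: re-pick the configuration of one used cell.
            m.variants[rng.randrange(CIRCUIT_SIZE)] = rng.randrange(N_VARIANTS)
        else:
            # Coarse level: relocate the whole circuit.
            m.offset = rng.randrange(N_OFFSETS)
    return (m, budget) if passes_test(m, faults) else (None, budget)


def deterministic_repair(m, faults):
    """Fixed hierarchical sequence: try each uniform variant in place, then
    move to the next placement and repeat."""
    steps = 0
    for k in range(N_OFFSETS):
        offset = (m.offset + k) % N_OFFSETS
        for v in range(N_VARIANTS):
            candidate = Mapping(offset, [v] * CIRCUIT_SIZE)
            steps += 1
            if passes_test(candidate, faults):
                return candidate, steps
    return None, steps
```

For example, with a single faulty variant at cell 1, `deterministic_repair(Mapping(0, [0, 0, 0, 0]), {(1, 0)})` returns `(Mapping(offset=0, variants=[1, 1, 1, 1]), 2)`, so a fine-level change suffices. If every variant of cell 2 is faulty, the fixed sequence has to escalate to relocation and returns a mapping at offset 3 after 13 steps. The stochastic variant trades this predictable ordering for random exploration, loosely echoing the abstract's contrast between a feasibility-oriented stochastic approach and an optimisable deterministic one.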
Original language: English
Article number: 7756359
Pages (from-to): 930-945
Number of pages: 16
Journal: IEEE Transactions on Computers
Volume: 66
Issue number: 6
Early online date: 24 Nov 2016
Publication status: Published - 1 Jun 2017

Bibliographical note

© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Keywords

  • Circuit faults
  • Computer architecture
  • Routing
  • Field programmable gate arrays
  • Maintenance engineering
  • Transistors
  • Fabrics
  • FPGA
  • Special-Purpose and Application-Based Systems
  • Performance of Systems
  • Reconfigurable Computing Architectures
  • Reconfigurable Hardware
  • Fault tolerance
