Abstract
The ability of Large Language Models (LLMs) to explain complex texts has raised the question of whether their encoded knowledge is sufficient to reason about system failures. Current weaknesses in LLMs, such as misalignment and hallucinations, suggest they would make unsuitable safety analysts, but could "fast but flawed" analysis still be useful? LLMs can rapidly parse system descriptions for design mitigation strategies such as redundancy; they can trace failure propagation from common mode faults (such as loss of power or hydraulics) to higher-level events, and they can even incorporate non-functional risks from outside the functional specification into an analysis. Yet despite their knowledge of hardware component failure modes, we found that LLMs remain weak at failure logic reasoning. We used OpenAI's Generative Pre-trained Transformer (GPT) Builder to develop a specific role for analysing failure logic and generating the corresponding fault tree visualisation. Although there are no objective measures for assessing the quality of a failure logic analysis (i.e. logical errors have variable significance) or for judging whether the choice of higher-level failure modes is a "good model" of system failure, we report on the iterative process of developing the GPT and our inability to override the underlying model behaviour to counter its weaknesses, and we conclude by reflecting on the productivity gains of using LLMs despite their flawed reasoning.
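To make the notion of failure logic concrete, the sketch below (an illustration only, not code or data from the paper) evaluates a toy fault tree in Python. The gate structure, basic events, and the pump/power-supply scenario are hypothetical, chosen to mirror the abstract's example of a common mode fault defeating redundancy and propagating to a higher-level event.

```python
# Illustrative sketch (assumed example, not from the paper): a minimal
# fault-tree evaluation showing AND/OR failure logic.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Gate:
    name: str
    op: Callable[[List[bool]], bool]  # all() for an AND gate, any() for an OR gate
    inputs: List                      # child Gates or basic-event names (str)

    def evaluate(self, events: dict) -> bool:
        # Recursively evaluate child gates; look up basic events by name.
        values = [
            child.evaluate(events) if isinstance(child, Gate) else events[child]
            for child in self.inputs
        ]
        return self.op(values)


# Hypothetical system: pump output is lost if the pump itself fails,
# or if BOTH redundant power supplies fail (a common mode fault
# defeats the redundancy).
power_loss = Gate("loss_of_power", all, ["psu_a_fails", "psu_b_fails"])
top_event = Gate("loss_of_pump_output", any, ["pump_fails", power_loss])

# A shared upstream cause takes out both power supplies at once,
# so the top event occurs even though the pump itself is healthy.
print(top_event.evaluate(
    {"pump_fails": False, "psu_a_fails": True, "psu_b_fails": True}
))  # True
```

This is the kind of gate-level reasoning the paper reports LLMs handle poorly, even when they can name plausible component failure modes.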
Original language | English |
---|---|
Publication status | Published - 16 Jul 2024 |
Keywords
- Large Language Models
- Fault Tree Analysis
- Failure Logic