Abstract
Neural networks have proved an effective means of learning control policies for
autonomous systems, such as Maritime Autonomous Surface Ships (MASS), but
these learned policies are difficult to understand due to the black-box nature of
neural networks. This lack of interpretability makes safety assurance for such
autonomous systems challenging.
The fields of eXplainable Artificial Intelligence (XAI) and eXplainable
Reinforcement Learning (XRL) aim to interpret the decision-making processes of
neural networks and autonomous agents, respectively. In particular, work on
causal explanations aims to provide "why" and "why not" explanations of a
model's decisions. However, most work on explainability to date relies on a
distilled version of the original model. While this distilled policy is
interpretable, it necessarily loses performance compared to the original model
and is not guaranteed to accurately reflect the original model's
decision-making, so it cannot be used to guarantee the model's safety.
Recent work on understanding the geometry of ReLU neural networks shows
that a ReLU network corresponds to a piecewise linear function whose linear
regions are each defined by an n-dimensional convex polytope. Through this
lens, a neural network can be understood as dividing the input space into
distinct regions, within each of which every output neuron is computed by a
single linear function.
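As a concrete illustration of this view (a minimal sketch, not the construction used in the paper), the code below builds a hypothetical one-hidden-layer ReLU network with random weights: the activation pattern at an input identifies its linear region, and within that region the network is exactly a single affine map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU network with random weights (illustration only).
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def forward(x):
    h = np.maximum(W1 @ x + b1, 0.0)           # ReLU hidden layer
    return W2 @ h + b2

def local_affine_map(x):
    """Return (A, c) such that the network computes A @ x + c on the
    linear region (convex polytope) that contains x."""
    mask = (W1 @ x + b1 > 0).astype(float)     # activation pattern identifies the region
    A = W2 @ (mask[:, None] * W1)              # zero out rows of W1 for inactive units
    c = W2 @ (mask * b1) + b2
    return A, c

x = rng.normal(size=4)
A, c = local_affine_map(x)
assert np.allclose(forward(x), A @ x + c)      # the affine map is exact on this region
```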
We show that this geometric representation can be used to generate causal
explanations for the network's behaviour, similar to previous work, but with
rules extracted directly from the geometry of neural networks with the ReLU
activation function; the explanations are therefore an accurate reflection of
the network's behaviour.
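Continuing the same hypothetical example (reusing W1, b1 and x from the sketch above), the linear region containing an input is the intersection of one half-space per hidden unit; constraints of this form are the geometric facts from which "why" and "why not" rules could be read off. This is an illustrative sketch only, not the rule-extraction procedure of the paper.

```python
def region_constraints(x):
    """Half-spaces G @ z + g >= 0 whose intersection is the convex polytope
    (linear region) containing x, for the network defined above."""
    signs = np.where(W1 @ x + b1 > 0, 1.0, -1.0)   # +1 for active units, -1 for inactive
    return signs[:, None] * W1, signs * b1

G, g = region_constraints(x)
assert np.all(G @ x + g >= 0)                      # x satisfies its own region's constraints
# A "why not" explanation can point to the constraints a contrasting input violates.
```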
| Original language | English |
|---|---|
| Publication status | Published - 2025 |
| Event | Yorkshire Innovation in Science and Engineering Conference (YISEC), University of York, York; Duration: 26 Jun 2025 → 27 Jun 2025 |

Conference

| Conference | Yorkshire Innovation in Science and Engineering Conference (YISEC) |
|---|---|
| City | York |
| Period | 26/06/25 → 27/06/25 |