Projects per year
Regions of high-dimensional input spaces that are underrepresented in training datasets reduce machine-learnt classifier performance, and may lead to corner cases and unwanted bias for classifiers used in decision making systems. When these regions belong to otherwise well-represented classes, their presence and negative impact are very hard to identify. We propose an approach for the detection and mitigation of such rare subclasses in deep neural network classifiers. The new approach is underpinned by an easy-to-compute commonality metric that supports the detection of rare subclasses, and comprises methods for reducing the impact of these subclasses during both model training and model exploitation. We demonstrate our approach using two well-known datasets, MNIST's handwritten digits and Kaggle's cats/dogs, identifying rare subclasses and producing models which compensate for subclass rarity. In addition we demonstrate how our run-time approach increases the ability of users to identify samples likely to be misclassified at run-time.
|Title of host publication||IEEE AI Test 2021 conference|
|Publication status||Accepted/In press - 27 Jun 2021|
- Machine learning
- Deep Neural Networks
- data imbalance