Classifying changes to LabVIEW and Simulink models via changeset metrics

Saheed Popoola*, Xin Zhao, Jeff Gray, Antonio Garcia-Dominguez

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Automated classification of software changes can help to understand why a change was made. Support for change classification can also guide the adoption of quality-control practices as bug-fix trends are observed, and can cluster related sets of changes so that the changed artifacts are managed similarly, thereby reducing maintenance effort. A number of change classification techniques have been developed based on information extracted from the change author, change message, change size, or changed file. However, most of these approaches have targeted textual general-purpose programming languages. Furthermore, some of these approaches are computationally expensive because they often require analysis of the whole source code, while others rely on the developers’ ability to describe a commit via a well-written message. In this paper, we present an approach to classify changes to models into the appropriate maintenance type via a set of metrics that are extracted from the version history of models. We developed seven metrics related to changes applied to models and model elements. We then conducted an empirical study involving 10 classifiers to determine which classifier offers the best performance for automating the change classification process. These classifiers were trained on over 300 changesets extracted from the version history of 28 Simulink repositories, and 60 changesets from 10 LabVIEW repositories. The results of the study show that the Random Forest classifier offers the best performance for Simulink models, while the Bayes Net classifier offers the best performance for LabVIEW models. The Random Forest classifier has also been evaluated by comparing its results with labels extracted from the discussions within issues reported in a similar time frame.
The evaluation results show that the Random Forest classifier is able to achieve an F-1 score of 0.83, thereby showing its ability to classify changes into the categories intended by the original developers.
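As a reminder of what the reported F-1 score of 0.83 measures, the F-1 score is the harmonic mean of precision and recall for a given class. The sketch below computes it in plain Python over hypothetical maintenance-type labels (the label values and data are illustrative, not taken from the paper's dataset):

```python
def f1_score(y_true, y_pred, positive):
    """F-1 score for one class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical maintenance-type labels for 8 changesets (illustrative only)
y_true = ["corrective", "perfective", "corrective", "adaptive",
          "corrective", "perfective", "adaptive", "corrective"]
y_pred = ["corrective", "perfective", "perfective", "adaptive",
          "corrective", "corrective", "adaptive", "corrective"]

print(round(f1_score(y_true, y_pred, "corrective"), 2))  # → 0.75
```

A multi-class evaluation, as in the paper, would typically average this per-class score across all maintenance types (macro- or weighted-averaging).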

Original language: English
Journal: Innovations in Systems and Software Engineering
Early online date: 9 Sept 2024
DOIs
Publication status: E-pub ahead of print - 9 Sept 2024

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. This is an author-produced version of the published paper. Uploaded in accordance with the University’s Research Publications and Open Access policy.

Keywords

  • Change classification
  • Changeset metrics
  • Classifier
  • LabVIEW
  • Simulink
