Is Complexity-based Clustering of Process Metrics as Effective as in Static Code Metrics


ÖZTÜRK M. M.

Baltic Journal of Modern Computing, vol.7, no.1, pp.31-46, 2019 (Journal Indexed in ESCI)

  • Publication Type: Article
  • Volume: 7 Issue: 1
  • Publication Date: 2019
  • DOI Number: 10.22364/bjmc.2019.7.1.03
  • Title of Journal: Baltic Journal of Modern Computing
  • Page Numbers: pp.31-46
  • Keywords: cross-project defect prediction, fuzzy clustering, process metrics, ATTRIBUTES

Abstract

Defect prediction is not a new sub-field of software engineering, yet it still poses research problems that attract many researchers. Cross-project defect prediction (CPDP), one of its most popular problems, remains intriguing. Experiments that address CPDP generally focus on instances or features, and novel pre-processing methods are lacking. The main objective of this work is to propose a complexity-based fuzzy clustering method that helps select the training instances of CPDP, which in turn makes it possible to compare static code metrics with process metrics. In the method, fuzzy membership levels are associated with a complexity value based on process metrics. The experiment employs 18 data sets that include both static code and process metrics. The findings show that although static code metrics perform better than process metrics in terms of area under the curve (AUC), process metrics outperform static code metrics in Matthews correlation coefficient (MCC) and F-measure. Furthermore, for the data sets used, no linear model was detected among the process metrics number of revisions (NR), number of modified lines (NML), and number of distinct committers (NDC). This work asserts that training instance selection for CPDP yields remarkable success with process metrics. Moreover, in overall performance, process metrics are rather suitable for clustering-based instance selection.
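For readers unfamiliar with the two threshold-based measures named in the abstract, the following minimal Python sketch shows their standard textbook definitions computed from confusion-matrix counts. It is not taken from the paper: the function names `f_measure` and `mcc`, the argument names, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration only.

```python
import math


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives,
    and false negatives (hypothetical argument names).
    """
    if tp == 0:
        # Assumed convention: with no true positives the score is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined cases are reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(f_measure(tp=40, fp=5, fn=10))
    print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Both measures depend on a single classification threshold, whereas AUC summarizes ranking quality across all thresholds, so a metric set can rank higher on one kind of measure and lower on the other.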