The impact of parameter optimization of ensemble learning on defect prediction


Computer Science Journal of Moldova, vol.27, no.1, pp.85-128, 2019 (ESCI)

  • Publication Type: Article
  • Volume: 27 Issue: 1
  • Publication Date: 2019
  • Journal Name: Computer Science Journal of Moldova
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus
  • Page Numbers: pp.85-128
  • Keywords: Defect prediction, parameter optimization, ensemble learning, boosting algorithm, software, machine
  • Süleyman Demirel University Affiliated: Yes


Machine learning algorithms have configurable parameters that practitioners generally leave at their default settings. Tuning the parameters of a machine learning algorithm is called hyperparameter optimization (HO); it is performed to find the most suitable parameter setting for classification experiments. Prior studies propose using either the default classification model or an optimal parameter configuration. This work investigates the effects of applying HO to ensemble learning algorithms in terms of defect prediction performance. Further, this paper presents a new ensemble learning algorithm, called novelEnsemble, for defect prediction data sets. The method has been tested on 27 data sets and compared with three alternatives. Welch's Heteroscedastic F Test is used to examine the differences between performance measures, and Cliff's Delta is applied to the results of the compared algorithms to quantify the magnitude of those differences. According to the experimental results: 1) ensemble methods featuring HO perform better than a single predictor; 2) although the error of triTraining decreases linearly, it remains at an unacceptable level; 3) novelEnsemble yields promising results, especially in terms of area under the curve (AUC) and Matthews Correlation Coefficient (MCC); 4) the benefit of HO does not stagnate as the scale of the data set changes; 5) not every ensemble learning approach benefits from HO. To demonstrate the importance of the hyperparameter selection process, the experiment is validated with suitable statistical analyses. The study revealed that the success of HO, contrary to expectations, depends not on the type of the classifiers but on the design of the ensemble learners.
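For readers unfamiliar with the effect-size measure used in the evaluation, Cliff's Delta compares two samples by counting pairwise dominance: delta = (#pairs where x > y minus #pairs where x < y) / (m * n), ranging from -1 to 1, with 0 indicating complete overlap. A minimal stdlib-only sketch follows; the function names and the conventional magnitude thresholds (0.147 / 0.33 / 0.474) are illustrative, not taken from the paper:

```python
def cliffs_delta(xs, ys):
    """Cliff's Delta effect size between two samples.

    Returns (#pairs x > y - #pairs x < y) / (len(xs) * len(ys)),
    a value in [-1, 1]; 0 means the samples fully overlap.
    """
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Map |delta| to the conventional labels via common thresholds."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

For example, two identical samples give a delta of 0.0 ("negligible"), while a sample that dominates the other in every pair gives 1.0 ("large").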