Prediction of heart disease by classifying with feature selection and machine learning methods


PROGRESS IN NUTRITION, vol.22, no.2, pp.660-670, 2020 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 22 Issue: 2
  • Publication Date: 2020
  • Doi Number: 10.23751/pn.v22i2.9830
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, EMBASE
  • Page Numbers: pp.660-670
  • Süleyman Demirel University Affiliated: Yes


Study Objectives: Cardiovascular diseases are among the most common diseases affecting humans, and they are costly to treat. According to the World Health Organization report, 56 million deaths occurred worldwide in 2012. Methods: The aim was to determine the method(s) with the most accurate classification rate for cardiovascular diseases by using machine learning and feature selection methods. To this end, 18 machine learning methods divided into 6 categories and 3 feature selection methods were used in this study. These methods were analyzed with the WEKA, Python, and MATLAB programs. Results: Without feature selection, SVM (PolyKernel) was the most successful machine learning algorithm, with a classification rate of 85.148%. After Correlation-based Feature Selection (CFS), the most successful algorithms were Naive Bayes and Fuzzy Rough Set, with a rate of 84.818%. After Chi-Square feature selection, however, the most successful algorithm was RBF Network, with a rate of 81.188%. Conclusion: Consequently, specialist doctors who want to classify heart disease are advised to use the SVM (PolyKernel) algorithm if they do not use feature selection, and the Naive Bayes algorithm if they use CFS for feature selection. If they use Fuzzy Rough Set and Chi-Square for feature selection, the RBF Network algorithm is recommended.
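To illustrate what Chi-Square feature selection measures, the following is a minimal sketch in plain Python. It is not the study's WEKA, Python, or MATLAB code. It computes the Pearson chi-square statistic between each categorical feature and the class label and ranks the features by that score. The feature names (`chest_pain`, `smoker`) and the toy values are hypothetical and were made up for this example; they are not taken from the study's data.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # cell counts of the contingency table
    feature_totals = Counter(feature)          # row totals
    label_totals = Counter(labels)             # column totals
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n   # count expected under independence
            diff = observed[(f_value, l_value)] - expected
            score += diff * diff / expected
    return score


def rank_features(columns, labels):
    """Return (feature name, score) pairs sorted from most to least class-dependent."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 10 patients with heart disease (1) and 10 without (0).
labels = [1] * 10 + [0] * 10
columns = {
    "chest_pain": [1] * 10 + [0] * 10,  # matches the label exactly
    "smoker": [1, 0] * 10,              # evenly split within each class
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: chi-square = {score:.3f}")
```

In a selection step of this kind, only the top-ranked features would be passed to the classifier. How many features to keep is a separate choice that the abstract does not specify.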