SCLAVOEM: hyper parameter optimization approach to predictive modelling of COVID-19 infodemic tweets using smote and classifier vote ensemble


Creative Commons License

Olaleye T., Abayomi-Alli A., Adesemowo K., Arogundade O. T., Misra S., Köse U.

SOFT COMPUTING, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Publication Date: 2022
  • Doi Number: 10.1007/s00500-022-06940-0
  • Journal Name: SOFT COMPUTING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Keywords: Fake news, COVID-19, Infodemic, Twitter, Tweet, Ensemble machine learning, Bag-of-words, Parameter optimization, DETECTING FAKE NEWS, SOCIAL MEDIA, SENTIMENT ANALYSIS, FUTURE, REPUTATION, LEVEL
  • Süleyman Demirel University Affiliated: Yes

Abstract

Fake COVID-19 tweets are dangerous because they are misinformative and completely inaccurate, and they threaten efforts to flatten the pandemic curve. Thus, aside from the COVID-19 pandemic itself, dealing with fake news and myths about the virus constitutes an infodemic issue, which must be tackled by ensuring that only valid information circulates. In this context, this study proposed SCLAVOEM, a method that combines the Synthetic Minority Over-sampling Technique (SMOTE) with a classifier vote ensemble, as a fake news classifier and a hyperparameter optimization approach for predictive modelling of COVID-19 infodemic tweets. Hyperparameter optimization variables were deployed at specific points of the proposed model, and minority oversampling was applied to the training sets to address imbalanced class representation. Experimental applications of SCLAVOEM to COVID-19 infodemic prediction returned weighted averages of 0.999 for F-measure and 1.000 for area under the curve (AUC). Applying SMOTE yielded performance increases of 3.74% and 1.11%, 5.05% and 0.29%, and 4.59% and 8.05% on three different datasets. Finally, SCLAVOEM provides a framework for the predictive detection of 'fake tweets' and for classifying tweets into three classes: 'positive', 'negative' and 'click-trap'. The model is expected to flag fake information on Twitter automatically, thereby protecting the public from inaccurate information and information overload.
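The abstract combines two generic building blocks: SMOTE oversampling of the minority class and a vote ensemble over several classifiers. As a rough illustration only, the pure-Python sketch below implements the basic SMOTE interpolation step and a hard (majority) vote. It is not the authors' SCLAVOEM implementation. The function names `smote` and `hard_vote`, the toy data, the three threshold classifiers and the value of `k` are all hypothetical, and features are assumed to be numeric vectors already (for example, bag-of-words counts).

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples (basic SMOTE idea).

    Each synthetic vector is a random point on the line segment between a
    randomly chosen minority sample and one of its k nearest minority-class
    neighbours (Euclidean distance). Requires at least two minority samples.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest minority neighbours of `base`, excluding the sample itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def hard_vote(classifiers, x):
    """Majority (hard) vote: the label predicted by most classifiers wins.

    Ties go to the label predicted first, as Counter.most_common keeps
    first-encountered order for equal counts.
    """
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, imbalanced toy data (2-D feature vectors).
    genuine = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0], [0.2, 0.2]]
    fake = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

    # Oversample the minority ("fake") class until both classes are equal in size.
    fake_balanced = fake + smote(fake, n_new=len(genuine) - len(fake), k=2)
    print(len(genuine), len(fake_balanced))  # 6 6

    # Three hypothetical stand-in classifiers voting on one new sample.
    classifiers = [
        lambda v: "fake" if v[0] > 0.5 else "genuine",
        lambda v: "fake" if v[1] > 0.5 else "genuine",
        lambda v: "fake" if sum(v) > 2.5 else "genuine",
    ]
    print(hard_vote(classifiers, [0.8, 0.7]))  # fake
```

In line with the abstract, oversampling would be applied to the training split only, so that no synthetic points leak into evaluation. The abstract does not state which base classifiers, voting rule or hyperparameter values SCLAVOEM uses; in practice, `imblearn.over_sampling.SMOTE` and scikit-learn's `VotingClassifier` (with `voting="hard"` or `"soft"`) provide tested versions of these two steps.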