Enhanced Textual Data Classification using Particle Swarm Optimization (PSO) Algorithm

Taye Oladele Aro, Hakeem Babalola Akande, Kayode Sakariyah Adewole, Kehinde Moses Aregbesola and Muhammed Besiru Jibrin
Keywords: Feature Selection, Text Classification, Particle Swarm Optimization, Textual Data
Journal of ICT Development, Applications and Research 2020 2(), 1-14. Published: July 11, 2020


Abstract

The upturn in digital data acquisition techniques has resulted in a large volume of data. Finding suitable arrangements and trends for analyzing text documents within such a volume remains a serious challenge. The number of irrelevant and redundant features in text data keeps increasing, hence the need for an effective feature selection approach that retains the most relevant features from large text collections. This paper applied the Particle Swarm Optimization (PSO) algorithm to select important features for accurate text classification. Five classification algorithms were used: C4.5 Decision Tree, K-Nearest Neighbour (KNN), Multinomial Naïve Bayes (MNB), Rep-Tree (RT), and Radial Basis Function (RBF). The developed text classification model was evaluated on two datasets: the SMS Spam dataset and the Sentiment Analysis dataset. Experimental results showed that, when PSO was not applied, MNB obtained the best accuracy on both datasets: 98.2239% on the SMS Spam dataset and 84.0333% on the Sentiment Analysis dataset. Without PSO, MNB also gave the best precision (0.983 on the SMS Spam dataset and 0.724 on the Sentiment Analysis dataset) and the best recall (0.982 on the SMS Spam dataset and 0.840 on the Sentiment Analysis dataset). When PSO was applied, improvements appeared only on the Sentiment Analysis dataset: accuracy improved to 65.2667% for KNN and 71.2333% for RBF, precision improved to 0.713 for RBF, and recall improved to 0.653 for KNN and 0.712 for RBF. Finally, the study concluded that MNB performed effectively as a classifier for text data, in terms of accuracy, precision and recall, without the application of the PSO algorithm.
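
To make the idea of PSO-based feature selection concrete, the following is a minimal sketch of one common variant, binary PSO, in which each particle is a 0/1 mask over the features and the fitness is the accuracy of a classifier restricted to the selected features. The abstract does not state which PSO variant, fitness function, or parameter values the authors used, so everything here is an assumption made for illustration: the function names `centroid_accuracy` and `binary_pso_select`, the nearest-centroid classifier standing in for the five classifiers, the small penalty on subset size, and the values of `w`, `c1`, and `c2`.

```python
import math
import random


def centroid_accuracy(X, y, mask):
    """Training-set accuracy of a nearest-centroid classifier that
    only looks at the features whose mask bit is 1 (illustrative fitness)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y)


def binary_pso_select(X, y, n_particles=10, n_iters=20,
                      w=0.7, c1=1.5, c2=1.5, penalty=0.01, seed=0):
    """Binary PSO over feature masks. All parameter values are assumed
    defaults for this sketch, not settings taken from the paper."""
    rng = random.Random(seed)
    n_feat = len(X[0])

    def fitness(mask):
        # Accuracy minus a small penalty that favours smaller subsets.
        return centroid_accuracy(X, y, mask) - penalty * sum(mask) / n_feat

    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_feat):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, vel[i][j]))  # clamp velocity
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

Calling `binary_pso_select(X, y)` on a list of numeric feature vectors `X` (for text, for example term counts) and labels `y` returns the best mask found and its fitness. In a realistic setting the fitness would be measured on held-out data rather than on the training set, and the classifier would be whichever one is being evaluated.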