TY - JOUR
T1 - Classification of Tweets Related to Natural Disasters Using Machine Learning Algorithms
AU - Iparraguirre-Villanueva, Orlando
AU - Melgarejo-Graciano, Melquiades
AU - Castro-Leon, Gloria
AU - Olaya-Cotera, Sandro
AU - Ruiz-Alvarado, John
AU - Epifanía-Huerta, Andrés
AU - Cabanillas-Carbonell, Michael
AU - Zapata-Paulini, Joselyn
N1 - Publisher Copyright: © 2023, International Association of Online Engineering. All Rights Reserved.
PY - 2023
Y1 - 2023
N2 - In recent years, computer science has advanced rapidly, contributing significantly to the identification and classification of text extracted from social networks, specifically Twitter. This work identifies, classifies, and analyzes tweets about real natural disasters, collected with the hashtag #NaturalDisasters, using machine learning (ML) algorithms: Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). First, tweets related to natural disasters were identified, creating a dataset of 122k geo-located tweets for training. Second, the data were cleaned by applying stemming and lemmatization techniques. Third, exploratory data analysis (EDA) was performed to gain an initial understanding of the data. Fourth, the BNB, MNB, LR, KNN, DT, and RF models were trained and tested using tools and libraries for this type of task. The trained models performed well: the BNB, MNB, and LR models each achieved 87% accuracy, while the KNN, DT, and RF models achieved 82%, 75%, and 86%, respectively. The BNB, MNB, and LR models also performed better on the other metrics, such as processing time, test accuracy, precision, and F1 score, demonstrating that, for this context and this training dataset, they are the best text classifiers.
AB - In recent years, computer science has advanced rapidly, contributing significantly to the identification and classification of text extracted from social networks, specifically Twitter. This work identifies, classifies, and analyzes tweets about real natural disasters, collected with the hashtag #NaturalDisasters, using machine learning (ML) algorithms: Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). First, tweets related to natural disasters were identified, creating a dataset of 122k geo-located tweets for training. Second, the data were cleaned by applying stemming and lemmatization techniques. Third, exploratory data analysis (EDA) was performed to gain an initial understanding of the data. Fourth, the BNB, MNB, LR, KNN, DT, and RF models were trained and tested using tools and libraries for this type of task. The trained models performed well: the BNB, MNB, and LR models each achieved 87% accuracy, while the KNN, DT, and RF models achieved 82%, 75%, and 86%, respectively. The BNB, MNB, and LR models also performed better on the other metrics, such as processing time, test accuracy, precision, and F1 score, demonstrating that, for this context and this training dataset, they are the best text classifiers.
KW - classification
KW - disasters
KW - machine learning
KW - natural
KW - tweets
UR - http://www.scopus.com/inward/record.url?scp=85168821565&partnerID=8YFLogxK
U2 - 10.3991/ijim.v17i14.39907
DO - 10.3991/ijim.v17i14.39907
M3 - Article
AN - SCOPUS:85168821565
SN - 1865-7923
VL - 17
SP - 144
EP - 162
JO - International Journal of Interactive Mobile Technologies
JF - International Journal of Interactive Mobile Technologies
IS - 14
ER -