TY - GEN
T1 - Exploring Deep Neural Networks and Decision Tree for Spanish Text Classification
AU - Shiguihara, Pedro
AU - Berton, Lilian
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022/8/11
Y1 - 2022/8/11
N2 - Nowadays, huge amounts of information are available on social networks, blogs, websites, and digital libraries. Most of this information is in unstructured text format, so text mining approaches are increasingly studied as a way to process all this data. Text classification aims to automatically assign documents to predetermined categories by applying machine learning (ML) algorithms. In this paper, we collected a dataset of reviews of a food store in Peru and compared different vectorization models, such as Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW), and different classification algorithms, including the traditional ML classifiers SVM, Decision Tree, MLP, KNN, and Naive Bayes, and a recent approach, "deep jointly informed neural networks" (DJINN), which initializes deep feedforward neural networks based on decision trees. The results show that DJINN achieves a higher F1-score than the traditional ML classifiers, making it a promising technique for text classification.
AB - Nowadays, huge amounts of information are available on social networks, blogs, websites, and digital libraries. Most of this information is in unstructured text format, so text mining approaches are increasingly studied as a way to process all this data. Text classification aims to automatically assign documents to predetermined categories by applying machine learning (ML) algorithms. In this paper, we collected a dataset of reviews of a food store in Peru and compared different vectorization models, such as Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BoW), and different classification algorithms, including the traditional ML classifiers SVM, Decision Tree, MLP, KNN, and Naive Bayes, and a recent approach, "deep jointly informed neural networks" (DJINN), which initializes deep feedforward neural networks based on decision trees. The results show that DJINN achieves a higher F1-score than the traditional ML classifiers, making it a promising technique for text classification.
KW - Classification
KW - Machine Learning
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85138826886&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/3b848ee4-e767-32fa-98b8-c1842599eeaa/
U2 - 10.1109/INTERCON55795.2022.9870087
DO - 10.1109/INTERCON55795.2022.9870087
M3 - Conference contribution
AN - SCOPUS:85138826886
SN - 9781665486361
T3 - Proceedings of the 2022 IEEE 29th International Conference on Electronics, Electrical Engineering and Computing, INTERCON 2022
SP - 1
EP - 4
BT - Proceedings of the 2022 IEEE 29th International Conference on Electronics, Electrical Engineering and Computing, INTERCON 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 August 2022 through 13 August 2022
ER -