Effect of part-of-speech and lemmatization filtering in email classification for automatic reply

Rogerio Bonatti, Arthur G. De Paula, Victor S. Lamarca, Fabio G. Cozman

Research output: Contribution to conference › Conference paper

1 Citation (Scopus)

Abstract

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). We study automatic reply to business email messages in Brazilian Portuguese. We present a novel corpus of messages from a real application, along with baseline categorization experiments using Naive Bayes and Support Vector Machines (SVM). We then discuss the effect of lemmatization and of part-of-speech filtering on precision and recall. SVM classification coupled with non-lemmatized selection of verbs, nouns, adjectives and adverbs was the best approach, with 87.3% maximum accuracy. Straightforward lemmatization in Portuguese led to the lowest classification results in the group, with 85.3% and 81.7% precision for SVM and Naive Bayes respectively. Thus, while lemmatization reduced precision and recall, part-of-speech filtering improved overall results.
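The part-of-speech filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the coarse tag names, the category labels, and the hand-tagged toy messages are all hypothetical, the tags are assumed to come from an external tagger, and only a small Naive Bayes classifier is shown (the SVM variant is omitted).

```python
import math
from collections import Counter, defaultdict

# Hypothetical coarse tag set; the abstract names verbs, nouns, adjectives, adverbs.
KEEP_TAGS = {"VERB", "NOUN", "ADJ", "ADV"}


def pos_filter(tagged_tokens, keep=KEEP_TAGS):
    """Keep lowercased surface forms (no lemmatization) whose tag is in `keep`."""
    return [word.lower() for word, tag in tagged_tokens if tag in keep]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over token counts."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for w in tokens:
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hand-tagged toy messages (hypothetical data, not from the paper's corpus).
train = [
    ([("Quero", "VERB"), ("cancelar", "VERB"), ("o", "DET"), ("pedido", "NOUN")], "cancelamento"),
    ([("Cancelem", "VERB"), ("minha", "DET"), ("compra", "NOUN")], "cancelamento"),
    ([("Onde", "ADV"), ("está", "VERB"), ("a", "DET"), ("entrega", "NOUN")], "entrega"),
    ([("A", "DET"), ("entrega", "NOUN"), ("atrasou", "VERB")], "entrega"),
]
model = NaiveBayes().fit([pos_filter(doc) for doc, _ in train], [label for _, label in train])

query = [("Preciso", "VERB"), ("cancelar", "VERB"), ("a", "DET"), ("compra", "NOUN")]
print(model.predict(pos_filter(query)))  # prints "cancelamento"
```

Filtering happens before counting, so determiners and other function words never enter the vocabulary; because surface forms are kept, "cancelar" and "cancelem" remain distinct features, which corresponds to the non-lemmatized setting.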
Original language: American English
Pages: 496-501
Number of pages: 6
Status: Published - 1 Jan 2016
Published externally
Event: AAAI Workshop - Technical Report
Duration: 1 Jan 2016 → …

Conference

Conference: AAAI Workshop - Technical Report
Period: 1/01/16 → …


  • Cite this

    Bonatti, R., De Paula, A. G., Lamarca, V. S., & Cozman, F. G. (2016). Effect of part-of-speech and lemmatization filtering in email classification for automatic reply. 496-501. Paper presented at AAAI Workshop - Technical Report.