TY - GEN
T1 - A Gaussian Model for Feature Selection in Protein Fold Recognition
AU - Shiguihara-Juárez, Pedro
AU - Murrugarra-Llerena, Nils
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/12/27
Y1 - 2018/12/27
N2 - Protein fold recognition is an important task for discovering new biological functions of proteins. In this context, machine learning techniques have been applied to protein fold recognition by framing the task as a classification problem. However, in many cases, the similarity of patterns in protein fold recognition makes this a complex task, limiting the performance of machine learning techniques. In this paper, we propose a feature selection method to support machine learning methods for protein fold recognition, using Gaussian distributions in the feature analysis process. We cluster features by Gaussian distributions. These clusters provide information for reducing the dimensionality of the features. We then apply baseline classifiers to protein fold recognition, using a well-known dataset for this task. The results suggest that clustering features and reducing their dimensionality using Gaussian distributions can help improve the accuracy of machine learning techniques on this task.
AB - Protein fold recognition is an important task for discovering new biological functions of proteins. In this context, machine learning techniques have been applied to protein fold recognition by framing the task as a classification problem. However, in many cases, the similarity of patterns in protein fold recognition makes this a complex task, limiting the performance of machine learning techniques. In this paper, we propose a feature selection method to support machine learning methods for protein fold recognition, using Gaussian distributions in the feature analysis process. We cluster features by Gaussian distributions. These clusters provide information for reducing the dimensionality of the features. We then apply baseline classifiers to protein fold recognition, using a well-known dataset for this task. The results suggest that clustering features and reducing their dimensionality using Gaussian distributions can help improve the accuracy of machine learning techniques on this task.
UR - http://www.scopus.com/inward/record.url?scp=85061483159&partnerID=8YFLogxK
U2 - 10.1109/SHIRCON.2018.8593155
DO - 10.1109/SHIRCON.2018.8593155
M3 - Conference contribution
AN - SCOPUS:85061483159
T3 - Proceedings of the 2018 IEEE Sciences and Humanities International Research Conference, SHIRCON 2018
BT - Proceedings of the 2018 IEEE Sciences and Humanities International Research Conference, SHIRCON 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 November 2018 through 22 November 2018
ER -