Artificial Intelligence Approach for Protein Sequence Analysis

Farida Alaaeldin Mostafa Mohamed

Abstract


Protein sequence analysis helps predict protein function. The objective of this thesis is to propose new deep learning models capable of classifying proteins based on features extracted in either 1D or 3D, and to investigate how variations in the data affect deep learning-based protein sequence classification when 3D features are used.
Regarding the 1D features, several protein descriptors were computed and decomposed with Empirical Mode Decomposition (EMD) into modified feature descriptors that had not previously been employed in protein studies. Uniquely, we introduced the use of a Convolutional Neural Network (CNN) to learn and classify protein diseases. A dataset of 1563 protein sequences was classified into three disease classes: AIDS, Tumor suppressor, and Proto-oncogene.
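The abstract does not spell out how EMD was applied to the descriptors. As a rough illustration only, the sketch below treats a descriptor vector as a 1D signal and runs a simplified EMD sifting loop in NumPy. Two simplifications are assumptions and not the thesis's method: the upper and lower envelopes are piecewise-linear (standard EMD uses cubic splines), and sifting stops on a simple convergence test. The names `emd`, `_local_extrema`, and `_envelope` are hypothetical.

```python
import numpy as np


def _local_extrema(x):
    """Return indices of strict interior local maxima and minima (plateaus are ignored)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def _envelope(x, idx):
    """Piecewise-linear envelope through x[idx], pinned to the two endpoint values."""
    n = x.size
    knots = np.concatenate(([0], idx, [n - 1]))
    return np.interp(np.arange(n), knots, x[knots])


def emd(signal, max_imfs=5, max_sift=50, sd_tol=0.2):
    """Simplified EMD of a 1D signal.

    Returns (imfs, residue), where imfs has shape (n_imfs, len(signal)) and
    imfs.sum(axis=0) + residue equals the signal up to floating-point rounding.
    """
    x = np.asarray(signal, dtype=float)
    residue = x.copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _local_extrema(residue)
        if maxima.size < 2 or minima.size < 2:
            break  # too few extrema left: treat what remains as the residue
        h = residue.copy()
        for _ in range(max_sift):
            mx, mn = _local_extrema(h)
            if mx.size < 2 or mn.size < 2:
                break
            mean_env = 0.5 * (_envelope(h, mx) + _envelope(h, mn))
            # Convergence test: energy of the removed mean relative to the candidate IMF.
            sd = np.sum(mean_env ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h - mean_env
            if sd < sd_tol:
                break
        imfs.append(h)
        residue = residue - h
    stacked = np.array(imfs) if imfs else np.empty((0, x.size))
    return stacked, residue


# Example: a synthetic "descriptor" made of a fast and a slow oscillation.
t = np.linspace(0.0, 1.0, 200)
descriptor = np.sin(2 * np.pi * 25 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)
imfs, residue = emd(descriptor)
print(imfs.shape)
```

By construction, the extracted IMFs plus the final residue add back up to the input, which is a quick sanity check for any EMD implementation. For real experiments, a maintained package such as PyEMD is likely a more reliable choice than this sketch.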
Results showed a significant increase in the performance of the CNN model using the modified feature descriptors over a Support Vector Machine (SVM) with a radial basis function (RBF) kernel, by 23.3% in accuracy. The CTDT modified feature descriptors improved the deep learning model's results by 19.5%, 39.6%, 23.3%, 29.9%, 24.3%, and 31.2% in the AUC, MCC, accuracy, F1-score, recall, and precision evaluation metrics, respectively.
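For reference, the threshold-based metrics listed above can be computed from a confusion matrix. The sketch below is a minimal NumPy version, assuming macro averaging of per-class precision, recall, and F1 (the abstract does not state which averaging scheme was used) and the multiclass generalization of MCC that scikit-learn's `matthews_corrcoef` also implements. AUC is omitted because it needs predicted scores rather than hard labels. The function names are hypothetical, not the thesis's evaluation code.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def _safe_div(num, den):
    """Element-wise num / den, with 0 wherever den == 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=den != 0)


def classification_metrics(y_true, y_pred, n_classes):
    cm = confusion_matrix(y_true, y_pred, n_classes)
    tp = np.diag(cm).astype(float)
    pred_counts = cm.sum(axis=0).astype(float)  # how often each class was predicted
    true_counts = cm.sum(axis=1).astype(float)  # how often each class truly occurs
    precision = _safe_div(tp, pred_counts)
    recall = _safe_div(tp, true_counts)
    f1 = _safe_div(2 * precision * recall, precision + recall)

    # Multiclass MCC computed from the full confusion matrix.
    s = cm.sum()
    c = tp.sum()
    cov_tp = c * s - pred_counts @ true_counts
    cov_pp = s ** 2 - pred_counts @ pred_counts
    cov_tt = s ** 2 - true_counts @ true_counts
    mcc = cov_tp / np.sqrt(cov_pp * cov_tt) if cov_pp * cov_tt > 0 else 0.0

    return {
        "accuracy": c / s,
        "precision": precision.mean(),  # macro average over classes
        "recall": recall.mean(),
        "f1": f1.mean(),
        "mcc": float(mcc),
    }


# Toy labels for three classes (e.g. 0 = AIDS, 1 = Tumor suppressor, 2 = Proto-oncogene).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(classification_metrics(y_true, y_pred, n_classes=3))
```

Note that macro averaging weights every class equally regardless of its size; a weighted average would give different numbers on imbalanced data, so the averaging choice matters when comparing reported results.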
Regarding the 3D features, five feature extraction groups were uniquely utilized to create 3D features of two sizes (7x7x7 and 9x9x9). Three datasets that differ in type, size, and class balance are employed in the assessment: a Disease dataset and two Phage Virion Protein (PVP) datasets.
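The abstract does not describe how the five feature groups are arranged inside a 7x7x7 or 9x9x9 matrix. Purely as a hypothetical illustration, the sketch below concatenates the group vectors in order, zero-pads (or truncates) the result to side³ values, and reshapes it into a cube. The helper `to_cube` and the group lengths in the example are made up and do not come from the thesis.

```python
import numpy as np


def to_cube(feature_groups, side=7):
    """Pack concatenated feature-group vectors into a (side, side, side) array.

    Hypothetical layout: groups are concatenated in order, then zero-padded
    (or truncated) to exactly side**3 values and reshaped in row-major order.
    """
    flat = np.concatenate([np.ravel(np.asarray(g, dtype=float)) for g in feature_groups])
    size = side ** 3
    if flat.size < size:
        flat = np.pad(flat, (0, size - flat.size))  # zero-pad the tail
    else:
        flat = flat[:size]  # truncate: values beyond side**3 are discarded
    return flat.reshape(side, side, side)


# Five hypothetical feature-group vectors with made-up lengths (299 values in total).
rng = np.random.default_rng(0)
groups = [rng.random(n) for n in (20, 60, 39, 100, 80)]
cube7 = to_cube(groups, side=7)  # 343 cells: 299 features followed by 44 zeros
cube9 = to_cube(groups, side=9)  # 729 cells: 299 features followed by 430 zeros
```

A 3D CNN layer such as PyTorch's `nn.Conv3d` expects input shaped (batch, channels, depth, height, width), so a single cube would be given two leading axes, e.g. `cube7[np.newaxis, np.newaxis]`.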
Results showed that the 7x7x7 feature matrix has a positive correlation between its dimensions, which had a positive impact on the results, reaching 71% on the PVP-Balanced dataset and 86% on the Disease dataset. Using the sum of the first three Intrinsic Mode Function (IMF) components had a better impact than using the first component alone, improving accuracy to 86.6% on the Disease dataset. The dataset size had a significant positive impact on training the CNN model, reaching 84%.
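EMD extracts IMFs in order from the highest-frequency oscillation to the lowest, so summing the first three keeps the three fastest modes while leaving out slower components and the residue. A minimal sketch of the two variants compared above, assuming the IMFs are stacked as rows of a NumPy array; `imf_feature` is a hypothetical helper, not the thesis's code.

```python
import numpy as np


def imf_feature(imfs, n_components=3):
    """Sum the first `n_components` IMFs (rows of `imfs`) into a single feature vector.

    n_components=1 gives the "first IMF only" variant; n_components=3 gives the
    "sum of the first three IMFs" variant. If fewer IMFs exist, all are summed.
    """
    imfs = np.asarray(imfs, dtype=float)
    if imfs.ndim != 2:
        raise ValueError("expected shape (n_imfs, n_features)")
    k = min(n_components, imfs.shape[0])
    return imfs[:k].sum(axis=0)


# Made-up IMF matrix: 4 IMFs (rows) over 2 feature positions.
imfs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
first_only = imf_feature(imfs, n_components=1)   # -> [1., 2.]
first_three = imf_feature(imfs, n_components=3)  # -> [9., 12.]
```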


Other data

Title: Artificial Intelligence Approach for Protein Sequence Analysis
Other Titles: An Artificial Intelligence Approach for Protein Sequence Analysis (Arabic)
Authors: Farida Alaaeldin Mostafa Mohamed
Issue Date: 2022

Attached Files

File: BB13182.pdf | Size: 387.6 kB | Format: Adobe PDF

Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.