Intelligent Techniques for Protein Secondary Structure Prediction

Hanan Yousry Wahba Hendy

Abstract


Proteins are considered the building blocks of any living organism. They perform various functions in the human body, and these functions differ from one protein to another according to the way the protein bonds together. A protein is initially composed of a sequence of amino acids, known as its primary structure. The protein then forms its secondary, tertiary and quaternary structures by forming hydrogen bonds.

The primary structure can be extracted from raw protein using simple scientific experiments, and many amino acid sequences have been discovered over the years. However, the secondary structure cannot be extracted in the same manner. Moreover, diseases and protein disorders can be detected by examining the secondary structure, not the primary one. It is therefore crucial to find a way to obtain the secondary structure for a given primary sequence. Prediction is considered a solution to this problem: given only the primary sequence, the task is to predict the corresponding secondary structure.

Various machine learning techniques have been used over the last decade to predict protein secondary structure. The most commonly used paradigm has been Artificial Neural Networks (ANN), and variations of ANN have been used to increase prediction accuracy. A few studies have also used case based reasoning and mixed integer optimization.

This thesis presents a study of the different techniques used for protein secondary structure prediction. The techniques are divided into three generations, starting with the statistical generation and ending with the machine learning generation.
The thesis then discusses in detail five approaches used for predicting protein secondary structure, along with their computation parameters: Case Based Reasoning, Artificial Neural Networks, Decision Tables, Decision Trees and Bayes Networks. Two datasets are used, with different sequence lengths and a proper distribution among the different amino acids. In Case Based Reasoning, eight experiments are conducted, resulting in a prediction accuracy of 88%. In ANN, 1,024 experiments are conducted using different computation parameters, resulting in accuracies of 68%, 81% and 86% for predicting alpha, beta, and alpha and beta together, respectively.
For the statistical techniques, ZeroR is used to determine the baseline accuracy for the other three. Eight experiments are conducted for each of the Decision Tree, Decision Table and Bayes Network techniques, with accuracies reaching 70%, 71% and 75% respectively. Moreover, two ANN hybrid techniques are
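
As an illustration of the ZeroR baseline mentioned above, the following is a minimal sketch, not the thesis's implementation. ZeroR ignores the input features and always predicts the most frequent class in the training data. The per-residue labels (H for helix, E for strand, C for coil), the example label strings, and the function names zero_r_fit and accuracy are all assumptions made for this example.

```python
from collections import Counter


def zero_r_fit(labels):
    """Return the most frequent class label in the training labels."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical per-residue secondary structure labels (assumed encoding):
# H = helix, E = strand, C = coil. These strings are made up for illustration.
train_labels = list("CCHHHHHHCCEEEECCCHHHHCC")
test_labels = list("CCCHHHHCCEEECC")

# ZeroR "training": find the majority class, ignoring the amino acid sequence.
majority = zero_r_fit(train_labels)

# ZeroR "prediction": assign the majority class to every test residue.
baseline_predictions = [majority] * len(test_labels)
baseline_accuracy = accuracy(baseline_predictions, test_labels)

print(f"majority class: {majority}, baseline accuracy: {baseline_accuracy:.2f}")
```

Any classifier that actually uses the sequence should score above this figure on the same test data; otherwise it has learned nothing beyond the class distribution.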


Other data

Title: Intelligent Techniques for Protein Secondary Structure Prediction
Other Titles: الطرق الذكية للتنبؤ بالبنية الثانوية للبروتين
Authors: Hanan Yousry Wahba Hendy
Issue Date: 2016

Attached Files

File        Size      Format
G13069.pdf  346.5 kB  Adobe PDF


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.