Intelligent Techniques for Protein Secondary Structure Prediction
Hanan Yousry Wahba Hendy
Abstract
Protein is considered the building block of any living organism. Proteins perform various functions in the human body, and these functions differ from one protein to another according to the way the protein bonds together. A protein is initially composed of a sequence of amino acids, known as the primary structure. The protein then forms its secondary, tertiary and quaternary structures by forming hydrogen bonds.
The primary structure can be extracted from raw protein using simple scientific experiments, and many amino acid sequences have been discovered over the years. However, secondary structure sequences cannot be extracted in the same manner. Moreover, diseases and protein disorders can be detected by examining the secondary structure, not the primary one. That is why it is crucial to find a way to obtain the secondary structure of a given primary sequence. Prediction is considered a solution to this problem: given only knowledge of the primary sequence, the task is to predict the corresponding secondary structure.
Various machine learning techniques have been used over the last decade to try to predict the protein secondary structure. The most commonly used paradigm has been Artificial Neural Networks (ANN), and variations of ANN have been used to increase the prediction accuracy. A few studies have used case based reasoning and mixed integer optimization.
This thesis presents a study of the different techniques used for protein secondary structure prediction. The techniques are divided into three generations, starting with the statistical generation and ending with the machine learning generation.
The thesis then discusses in detail five approaches used for predicting the protein secondary structure, along with their computation parameters. These approaches are: Case Based Reasoning, Artificial Neural Networks, Decision Tables, Decision Trees and Bayes Networks. Two different datasets are used, with different sequence lengths and with proper distribution among the different amino acids. In Case Based Reasoning, eight different experiments are conducted, resulting in a prediction accuracy of 88%. In ANN, 1,024 experiments are conducted using different computation parameters, resulting in accuracies of 68%, 81% and 86% for predicting alpha, beta, and alpha and beta together, respectively.
For the statistical techniques, ZeroR is used to determine the baseline accuracy for the other three. Eight experiments are conducted for each of the Decision Tree, Decision Table and Bayes Network. The accuracies reach 70%, 71% and 75%, respectively. Moreover, two ANN hybrid techniques are
Other data
| Field | Value |
|---|---|
| Title | Intelligent Techniques for Protein Secondary Structure Prediction |
| Other Titles | الطرق الذكية للتنبؤ بالبنية الثانوية للبروتين |
| Authors | Hanan Yousry Wahba Hendy |
| Issue Date | 2016 |
Attached Files
| File | Size | Format | |
|---|---|---|---|
| G13069.pdf | 346.5 kB | Adobe PDF | View/Open |
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.