A NOVEL SEQUENCE-BASED NEGATIVE SAMPLING APPROACH FOR IMPROVING PROTEIN-PROTEIN INTERACTIONS PREDICTION USING MACHINE LEARNING TECHNIQUES
Barkat, M. Sayed; Sherin M. Moussa; Badr, Nagwa
Abstract
Protein–protein interactions (PPIs) are involved in the progression of numerous diseases, which makes them important in drug discovery. Although PPI prediction is a crucial and well-studied task in bioinformatics, the interactions of many proteins have still not been thoroughly investigated. The cost of understanding PPIs and of identifying protein–protein non-interactions (PPNIs) through sequence alignment makes current computational methods inefficient, so identifying PPNIs without sequence alignment has become a necessity. In this research, we propose a machine learning approach to PPI prediction based on protein sequence information, and we introduce “Features-based Negative Generation”, a novel approach for identifying PPNI samples. This method measures the similarity of sequence features without alignment, which keeps its computational cost affordable. After PPNI identification, the Conjoint Triad (COT) method and Epitopes are used for feature extraction, and the results of both are compared in order to achieve higher accuracy with less computation time. Five machine learning techniques were investigated for learning PPI features from the sequences of interacting pairs: support vector machines (SVMs) with polynomial and RBF kernel functions, a linear SVM, a Tree Model (TM), and a Linear Model. The TM achieved the best result, with an accuracy of 97.8%. PPI prediction using the generated negative dataset and COT with 343 features achieved an accuracy of 97.8%, versus 93% using a random negative dataset, also with COT. Applying Epitopes with our PPNI dataset using 21 features achieved an accuracy of 94.5%, versus 92.5% with a random negative dataset. This indicates that the identified PPNI datasets are cleaner and less noisy, and that PPI prediction using the identified PPNIs is more accurate. We compared the PPI prediction accuracy obtained with PPNIs identified by our method against that of other methods in the literature and found an improvement of between 2% and 7% in our favor.
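To make the feature-extraction and negative-sampling steps above more concrete, the following Python sketch computes 343-dimensional Conjoint Triad vectors using the widely cited formulation of Shen et al. (2007): seven amino-acid classes, 7^3 = 343 triads, and the normalisation d_i = (f_i - min) / max. The negative-selection part is a hypothetical stand-in, not the authors' method: the abstract does not say how “Features-based Negative Generation” scores similarity, so the cosine similarity, the `threshold` value, and the function name `select_negatives` are all assumptions made for illustration.

```python
import math
from itertools import product

# Seven amino-acid classes of the Conjoint Triad method (Shen et al., 2007),
# grouped by dipole strength and side-chain volume.
COT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(COT_CLASSES) for aa in group}
# All 7^3 = 343 ordered class triads, each mapped to a fixed vector position.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(sequence: str) -> list[float]:
    """Return the 343-dimensional conjoint triad vector of a protein sequence."""
    seq = sequence.upper()
    counts = [0] * len(TRIAD_INDEX)
    for i in range(len(seq) - 2):
        triad = seq[i:i + 3]
        # Skip windows containing non-standard residues such as X or U.
        if all(aa in AA_TO_CLASS for aa in triad):
            counts[TRIAD_INDEX[tuple(AA_TO_CLASS[aa] for aa in triad)]] += 1
    lo, hi = min(counts), max(counts)
    if hi == 0:  # sequence too short or no valid triads
        return [0.0] * len(counts)
    # Normalisation d_i = (f_i - min) / max.
    return [(c - lo) / hi for c in counts]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_negatives(sequences: dict[str, str],
                     positives: list[tuple[str, str]],
                     threshold: float = 0.5) -> list[tuple[str, str]]:
    """HYPOTHETICAL alignment-free negative sampler (not the paper's exact rule).

    A non-positive pair (a, b) is kept as a putative non-interaction only if
    b is dissimilar (cosine < threshold) to every known partner of a, and
    a is dissimilar to every known partner of b. Proteins with no known
    partners impose no constraint.
    """
    feats = {pid: conjoint_triad(seq) for pid, seq in sequences.items()}
    partners: dict[str, set[str]] = {pid: set() for pid in sequences}
    for a, b in positives:
        partners[a].add(b)
        partners[b].add(a)
    ids = sorted(sequences)
    negatives = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if b in partners[a]:
                continue  # known interaction, never a negative
            b_unlike_a_partners = all(
                cosine(feats[b], feats[p]) < threshold for p in partners[a])
            a_unlike_b_partners = all(
                cosine(feats[a], feats[p]) < threshold for p in partners[b])
            if b_unlike_a_partners and a_unlike_b_partners:
                negatives.append((a, b))
    return negatives
```

For example, with toy sequences `{"P1": "AAAAAA", "P2": "AAAAAG", "P3": "CCCCCC", "P4": "DDDDDD", "P5": "GGGGGG"}` and the single positive pair `("P1", "P2")`, the pairs `("P1", "P5")` and `("P2", "P5")` are rejected because P5 has the same triad profile as a known partner, while the remaining non-positive pairs are kept. A real pipeline would use curated sequences and tune the similarity measure and threshold on validation data.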
Using Epitopes for feature extraction is faster than COT by an average of 83%.
Other data
Title: A NOVEL SEQUENCE-BASED NEGATIVE SAMPLING APPROACH FOR IMPROVING PROTEIN-PROTEIN INTERACTIONS PREDICTION USING MACHINE LEARNING TECHNIQUES
Authors: Barkat, M. Sayed; Sherin M. Moussa; Badr, Nagwa
Keywords: Biological Pathways; Conjoint Triad Method; Drug Discovery; Epitopes; Machine Learning; PPNIs Sampling; Protein-Protein Interaction; Protein–Protein Negative Interactions
Issue Date: 31-Jul-2022
Journal: Journal of Theoretical and Applied Information Technology
ISSN: 1992-8645
Scopus ID: 2-s2.0-85137985322
Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.