A NOVEL SEQUENCE-BASED NEGATIVE SAMPLING APPROACH FOR IMPROVING PROTEIN-PROTEIN INTERACTIONS PREDICTION USING MACHINE LEARNING TECHNIQUES

Barkat, M. Sayed; Sherin M. Moussa; Badr, Nagwa

Abstract

Protein–protein interactions (PPIs) are involved in the progression of numerous diseases and are therefore relevant to drug discovery. Although PPI prediction is a crucial and well-studied task in bioinformatics, the interactions of several proteins still lack thorough investigation. The cost of understanding PPIs and identifying protein–protein non-interactions (PPNIs) using sequence alignment makes current computational methods inefficient, so identifying PPNIs without sequence alignment has become a necessity. In this research, a machine learning approach is proposed for PPI prediction based on protein sequence information, in which we introduce “Features-based Negative Generation”, a novel approach for identifying PPNI samples. The method measures the similarity of sequence features without alignment, which keeps the computational cost affordable. After PPNI identification, the Conjoint Triad (COT) and Epitopes methods are used for feature extraction, and their results are compared to achieve higher accuracy with less time consumption. Five machine learning techniques were investigated to learn PPI features from the sequences of interacting pairs: support vector machines (SVMs) with polynomial and RBF kernel functions, linear SVM, Tree Model (TM), and Linear Model. The TM achieved the best result, with an accuracy of 97.8%. PPI prediction using the generated negative dataset and COT with 343 features achieved an accuracy of 97.8%, versus 93% using a random negative dataset, also with COT. Applying Epitopes with 21 features to our PPNI dataset achieved an accuracy of 94.5%, versus 92.5% with a random negative dataset, which indicates that the identified PPNI datasets are cleaner and less noisy, and that PPI prediction using the identified PPNIs is more accurate. We compared the PPI prediction accuracy obtained with PPNIs identified by our method against that of other methods in the literature and found an improvement in our favor of between 2% and 7%. Feature extraction with Epitopes is faster than with COT by an average of 83%.
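
As an illustration of where the 343 COT features mentioned in the abstract come from, the following is a minimal sketch of conjoint triad encoding, not the authors' implementation. It assumes the seven-class amino-acid grouping commonly associated with the conjoint triad method (7 × 7 × 7 = 343 possible triads); the grouping, the simple frequency normalization, and the function name `conjoint_triad` are assumptions made here for illustration and may differ from what the paper uses.

```python
# Seven amino-acid classes commonly used with the conjoint triad method.
# Assumption: the paper may use a different grouping.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(sequence):
    """Return a 343-element list of triad frequencies for one protein sequence.

    Each residue is mapped to its class index (0..6); every window of three
    consecutive class indices is one triad. Characters outside the 20 standard
    amino acids are dropped before windowing (a simplification).
    """
    codes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * (7 ** 3)
    for a, b, c in zip(codes, codes[1:], codes[2:]):
        # Treat the triad as a three-digit base-7 number to get its index.
        counts[a * 49 + b * 7 + c] += 1
    total = sum(counts)
    # Assumption: normalize by the number of triads so that sequences of
    # different lengths are comparable; other normalizations are possible.
    return [n / total for n in counts] if total else counts


if __name__ == "__main__":
    vector = conjoint_triad("MKTAYIAKQRQISFVKSHFSRQ")
    print(len(vector))
```

Because the resulting vectors have a fixed length regardless of sequence length, two proteins can be compared feature by feature without aligning their sequences, which is the general kind of alignment-free comparison the abstract refers to.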

Other data

Title	A NOVEL SEQUENCE-BASED NEGATIVE SAMPLING APPROACH FOR IMPROVING PROTEIN-PROTEIN INTERACTIONS PREDICTION USING MACHINE LEARNING TECHNIQUES
Authors	Barkat, M. Sayed; Sherin M. Moussa; Badr, Nagwa
Keywords	Biological Pathways;Conjoint Triad Method;Drug Discovery;Epitopes;Machine Learning;PPNIs Sampling;Protein-Protein Interaction;Protein–Protein Negative Interactions
Issue Date	31-Jul-2022
Journal	Journal of Theoretical and Applied Information Technology
ISSN	1992-8645
Scopus ID	2-s2.0-85137985322
