Developing a DNA Sequence Classification Model Using Deep Learning Methods

Marwah Ahmad Mohamed Sobhy Ahmad Helaly

Abstract


Biological taxonomic classification is needed for the identification and discovery of organisms, as well as the inference of their evolutionary relationships. DNA sequence classification is a key task in Bioinformatics. Biological Systematics, in which species are organized into taxonomies, is one of the ways DNA sequences can be classified. This field supports the identification, grouping, and study of organisms and the interrelationships between species. It allows biologists to discover the origins of organisms, understand how they may have evolved, and help answer unresolved biological research questions.

DNA sequences are long and are divided into contiguous segments called genes. Each gene is responsible for a certain function and contains the information needed to build either an RNA molecule or a functional protein, although some genes have no known function. This thesis uses the bacterial 16S rRNA barcode gene, which encodes the RNA component of the 30S subunit of the bacterial ribosome, for taxonomic classification over five different taxonomic ranks. Barcode gene (or marker gene) sequences are short, specific, standardized regions of biological sequences that serve as genetic markers for species. Their high mutation rates readily produce highly discriminative characteristics, which makes them favorable for classifying and differentiating between species. The 16S rRNA gene is highly desirable and efficient as a genetic marker because it exists in almost all bacterial species, its function has not changed over time, and its 1,500 bp sequence is large enough for efficient classification.

The order and structure of the components in a biological sequence play an essential role in what the sequence is and does. Therefore, to classify different bacterial classes efficiently, the interactions and positions of the biological components in sequences must be known, which is a central challenge in biological sequence classification. Much related research has tried to find an efficient sequence representation, paired with a Deep Learning classification model, that can capture these aspects of a biological sequence. However, the representations used encoded little or no positional information about a sequence.
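To make the point about positional information concrete, the following is a minimal sketch and not the thesis's method: the abstract does not name the representations it compares, so k-mer counting and one-hot encoding are assumed here as two commonly used examples. The function names and the toy sequences are hypothetical. The sketch shows two different sequences that share the same 2-mer counts, so a count-based representation cannot tell them apart, while a position-preserving encoding can.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed nucleotide alphabet for this illustration


def kmer_counts(seq, k=2):
    """Count overlapping k-mers; where each k-mer occurs is discarded."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def one_hot(seq):
    """One row per base, one column per nucleotide; row order keeps position."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


# Two toy sequences (hypothetical) built from the same 2-mers in a different order.
seq_a = "ACGCA"
seq_b = "GCACG"

same_counts = kmer_counts(seq_a) == kmer_counts(seq_b)   # count view is identical
same_one_hot = one_hot(seq_a) == one_hot(seq_b)          # positional view differs

print("identical 2-mer counts:", same_counts)
print("identical one-hot encoding:", same_one_hot)
```

A count-based representation is compact and independent of sequence length, but it pays for that by dropping order; the one-hot matrix keeps order at the cost of a size that grows with the sequence.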

This thesis studies the efficiency of different commonly used representation methods and of CNN models with different hyperparameters, in order to identify the most efficient among them. A model is then proposed for efficiently representing sequences and the positional information of their components. The sequences are represented


Other data

Title: Developing a DNA Sequence Classification Model Using Deep Learning Methods
Other Titles: تطوير نموذج تصنيف تسلسل الحمض النووى باستخدام اساليب التعلم العميق
Authors: Marwah Ahmad Mohamed Sobhy Ahmad Helaly
Issue Date: 2020



Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.