Developing High Performance Arabic Speech Recognition Engine

Hamzah Ahmed Abdurab Alsayadi

Abstract
Speech recognition systems play an important role in human–machine interaction. Many systems exist for Modern Standard Arabic (MSA) speech; however, systems for dialectal Arabic speech remain limited. Arabic has a set of vocalization marks called diacritics, which play an essential role in the meaning of words and in their articulation: changing a diacritic can change the meaning of a sentence. However, the presence of these marks in the corpus transcription affects the accuracy of speech recognition. In addition, Arabic has many properties, some of which, such as its syntax and phonology, are well suited to building automatic speech recognition (ASR) systems, while others are unsuitable for developing speech systems. Importantly, most available data are in non-diacritized form, vary in dialect, and are morphologically complex; moreover, the Arabic dialects lack a standard structure. Arabic ASR systems that handle diacritics can be integrated with other systems better than those that do not. There are two approaches to automatic speech recognition: i) conventional ASR based on traditional methods; ii) end-to-end ASR based on deep learning methods. In this thesis, we build a high-performance Arabic speech recognition system using both conventional and end-to-end ASR approaches, and we present Arabic ASR systems for diacritized MSA, non-diacritized MSA, and dialectal Arabic. The thesis comprises conventional Arabic ASR and end-to-end Arabic ASR approaches as follows:
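The distinction the abstract draws between diacritized and non-diacritized transcriptions can be illustrated with a small preprocessing sketch. This is not taken from the thesis; it is a minimal example, assuming the standard Unicode range U+064B–U+0652 covers the short-vowel marks (tashkeel) to be stripped:

```python
import re

# Arabic diacritic (tashkeel) code points: Fathatan through Sukun (U+064B-U+0652).
# Hypothetical helper for illustration, not from the thesis.
DIACRITICS = re.compile(r'[\u064B-\u0652]')

def strip_diacritics(text: str) -> str:
    """Remove Arabic short-vowel marks, yielding a non-diacritized transcription."""
    return DIACRITICS.sub('', text)

diacritized = "كَتَبَ"          # "kataba" (he wrote), fully diacritized
print(strip_diacritics(diacritized))  # prints "كتب"
```

Training on text processed this way discards the vowel information that, as the abstract notes, disambiguates words that share the same consonant skeleton.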
Conventional Arabic ASR: in this approach, the overall system is a combination of seven acoustic models based on the Gaussian mixture model (GMM), subspace GMM (SGMM), and deep neural network (DNN) for diacritized Arabic. Acoustic features are Mel-frequency cepstral coefficients (MFCC) transformed with linear discriminant analysis (LDA); these features are used to train and evaluate all models. After training, the GMM model is adapted with two discriminative techniques, namely maximum mutual information (MMI) and minimum phone error (MPE), to build new models from the main acoustic and GMM features. An SGMM is then trained on the main acoustic and GMM features, and one adaptation technique, boosted MMI (bMMI), is used to adapt it and produce a further model. Finally, we employ DNN models based on the main acoustic and GMM features; after training, the DNN model is adapted with MPE to build a new model.
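The front end of the pipeline above starts from MFCC features. As a rough illustration of what that stage computes, here is a minimal NumPy sketch (framing, power spectrum, mel filterbank, log, DCT); the parameter values are common defaults, not the thesis's actual configuration, and the LDA transform and GMM/SGMM/DNN training stages are omitted:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13,
         frame_len=0.025, frame_shift=0.010):
    """Minimal MFCC front end: frame, window, power spectrum, mel filterbank, log, DCT."""
    flen, fshift = int(sr * frame_len), int(sr * frame_shift)
    # Slice the signal into overlapping Hamming-windowed frames.
    n_frames = 1 + max(0, (len(signal) - flen) // fshift)
    idx = np.arange(flen)[None, :] + fshift * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel-spaced filterbank.
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies; keep the first n_ceps.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T

feats = mfcc(np.random.randn(16000))  # one second of noise -> (frames, n_ceps) matrix
```

In the thesis pipeline, matrices like `feats` (with context splicing) would then be projected by LDA before feeding the GMM, SGMM, and DNN models.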


Other data

Title Developing High Performance Arabic Speech Recognition Engine
Other Titles تطوير محرك عالي الأداء للتعرف على الكلام باللغة العربية (Developing a High-Performance Engine for Arabic Speech Recognition)
Authors Hamzah Ahmed Abdurab Alsayadi
Issue Date 2022

Attached Files

File: BB14051.pdf
Size: 771.91 kB
Format: Adobe PDF



Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.