Developing High Performance Arabic Speech Recognition Engine

Hamzah Ahmed Abdurab Alsayadi

Abstract
Speech recognition systems play an important role in human–machine interaction. Many systems exist for Modern Standard Arabic (MSA) speech; however, systems for dialectal Arabic speech remain limited. Arabic has a set of vocalization marks called diacritics, which play an essential role in the meaning of words and in their articulation: changing a diacritic can change the meaning of a sentence. However, the presence of these marks in the corpus transcription affects the accuracy of speech recognition. In addition, Arabic has many properties, some of which, such as its syntax and phonology, are well suited to building automatic speech recognition (ASR) systems, while others are unsuitable for developing speech systems. Importantly, most available data are in non-diacritized form, vary in dialect, and are morphologically complex; moreover, the Arabic dialects lack a standard structure. Arabic ASR systems that handle diacritics can be integrated with other systems better than those that do not. There are two approaches to automatic speech recognition: i) conventional ASR based on traditional methods; ii) end-to-end ASR based on deep learning methods. In this thesis, we build a high-performance Arabic speech recognition system using both conventional and end-to-end ASR approaches, and we present Arabic ASR systems for diacritized MSA, non-diacritized MSA, and dialectal Arabic. The thesis comprises conventional Arabic ASR and end-to-end Arabic ASR approaches as follows:
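The distinction the abstract draws between diacritized and non-diacritized transcriptions can be illustrated with a small preprocessing sketch. This is not taken from the thesis; it is a minimal example, assuming the standard Unicode range U+064B–U+0652 covers the short-vowel marks (tashkeel) to be stripped:

```python
import re

# Arabic diacritic (tashkeel) code points: Fathatan through Sukun (U+064B-U+0652).
# Hypothetical helper for illustration, not from the thesis.
DIACRITICS = re.compile(r'[\u064B-\u0652]')

def strip_diacritics(text: str) -> str:
    """Remove Arabic short-vowel marks, yielding a non-diacritized transcription."""
    return DIACRITICS.sub('', text)

diacritized = "كَتَبَ"          # "kataba" (he wrote), fully diacritized
print(strip_diacritics(diacritized))  # prints "كتب"
```

Training on text processed this way discards the vowel information that, as the abstract notes, disambiguates words that share the same consonant skeleton.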
Conventional Arabic ASR: in this approach, the overall system is a combination of seven acoustic models based on the Gaussian mixture model (GMM), subspace GMM (SGMM), and deep neural network (DNN) for diacritized Arabic. Acoustic features are Mel-frequency cepstral coefficients (MFCC) transformed with linear discriminant analysis (LDA); these features are used to train and evaluate all models. After training, the GMM model is adapted with two discriminative techniques, namely maximum mutual information (MMI) and minimum phone error (MPE), to build new models from the main acoustic and GMM features. An SGMM is then trained on the main acoustic and GMM features, and one adaptation technique, boosted MMI (bMMI), is used to adapt it and produce a further model. Finally, we employ DNN models based on the main acoustic and GMM features; after training, the DNN model is adapted with MPE to build a new model.
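The front end of the pipeline above starts from MFCC features. As a rough illustration of what that stage computes, here is a minimal NumPy sketch (framing, power spectrum, mel filterbank, log, DCT); the parameter values are common defaults, not the thesis's actual configuration, and the LDA transform and GMM/SGMM/DNN training stages are omitted:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13,
         frame_len=0.025, frame_shift=0.010):
    """Minimal MFCC front end: frame, window, power spectrum, mel filterbank, log, DCT."""
    flen, fshift = int(sr * frame_len), int(sr * frame_shift)
    # Slice the signal into overlapping Hamming-windowed frames.
    n_frames = 1 + max(0, (len(signal) - flen) // fshift)
    idx = np.arange(flen)[None, :] + fshift * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel-spaced filterbank.
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies; keep the first n_ceps.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T

feats = mfcc(np.random.randn(16000))  # one second of noise -> (frames, n_ceps) matrix
```

In the thesis pipeline, matrices like `feats` (with context splicing) would then be projected by LDA before feeding the GMM, SGMM, and DNN models.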


Other data

Title Developing High Performance Arabic Speech Recognition Engine
Other Titles تطوير محرك عالي الأداء للتعرف على الكلام باللغة العربية (Developing a High-Performance Engine for Arabic Speech Recognition)
Authors Hamzah Ahmed Abdurab Alsayadi
Issue Date 2022

Attached Files

File: BB14051.pdf
Size: 771.91 kB
Format: Adobe PDF



Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.