ADVANCED TECHNIQUES IN SPEAKER DIARIZATION FOR ARABIC TV BROADCAST

Mohamed Salem Mohamed Elhady

Abstract


Speaker Diarization is the task of answering the question "who spoke, when, and where" in an audio file or set of audio files that contains an unknown number of speakers. Speaker segments are determined in an unsupervised manner. Speaker Diarization was originally proposed as a research topic related to speech recognition; in recent years it has become an independent research topic, with competitions and workshops dedicated to it. In this thesis, we propose advanced techniques in Speaker Diarization for Arabic TV broadcast. We focus on Arabic because it is considered one of the most complex widely spoken languages and the strongest representative of the Semitic languages. Our Speaker Diarization system is composed of two main blocks: a Speech Activity Detector and Speaker Clustering.
In Speech Activity Detection, we tackle the problem of speech/non-speech segmentation. We propose two main enhancements in that area. The first is a phoneme-based speech activity detector, in which we use speech recognition techniques to discriminate between speech and non-speech. The phoneme recognition system we developed achieves an accuracy of 99% in speech detection and over 97.2% in non-speech detection. The second is the use of i-vectors for speech activity detection, an approach based on speaker recognition techniques. We start with a classification experiment on speech and non-speech using an SVM, which achieved 98% in classifying speech and non-speech. These results motivated us to integrate the i-vector technique into our Speech Activity Detection system. We compare the proposed systems with well-known state-of-the-art techniques such as SVM-HMM and GMM-HMM.
The second problem we investigate is Speaker Clustering. We started by developing state-of-the-art Speaker Diarization techniques, which are currently based on i-vectors and cosine-based Hierarchical Agglomerative Clustering (HAC). In this area, we propose an enhanced clustering technique based on i-vectors and agglomerative clustering combined with supervised classification. We experiment with three main classification techniques: SVM, DNN, and Random Forest. We compare the enhanced techniques with state-of-the-art techniques. Results show that the SVM enhancement improves on the state of the art by about 1.7 percentage points, reducing the Diarization Error Rate from 24.4% for the state-of-the-art baseline system to 22.67%.
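As a rough illustration of the cosine-based agglomerative clustering mentioned above, the following minimal sketch groups a handful of toy segment vectors. It is not the thesis implementation: the 3-dimensional vectors stand in for real i-vectors, and the average-linkage rule, the `stop_threshold` value, and the function names are assumptions made for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_segments(vectors, stop_threshold=0.5):
    """Average-linkage agglomerative clustering on cosine similarity.

    Starts with one cluster per segment and repeatedly merges the two
    most similar clusters until the best similarity falls below
    stop_threshold (an assumed, hand-picked value).
    Returns a list of clusters, each a sorted list of segment indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best_score, best_pair = -2.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise similarity between the two clusters.
                scores = [
                    cosine_similarity(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                ]
                score = sum(scores) / len(scores)
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < stop_threshold:
            break  # Remaining clusters are treated as different speakers.
        i, j = best_pair
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Toy vectors standing in for per-segment i-vectors (hypothetical data).
segments = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
]
speakers = cluster_segments(segments, stop_threshold=0.5)
print(speakers)
```

With these toy vectors, segments 0 and 1 end up in one cluster, segments 2 and 3 in another, and segment 4 stays on its own. In a real system the stopping threshold would be tuned on development data, and the abstract's supervised classification stage would operate on top of a clustering step of this kind.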


Other data

Title ADVANCED TECHNIQUES IN SPEAKER DIARIZATION FOR ARABIC TV BROADCAST
Other Titles تقنيات متقدمة في فصل المتحدثين في البث التلفزيوني العربي
Authors Mohamed Salem Mohamed Elhady
Issue Date 2017

Attached Files

File V1596.pdf
Size 570.42 kB
Format Adobe PDF



Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.