A hybrid cross-language name matching technique using novel modified Levenshtein Distance

Medhat, Doaa; Hassan, Ahmed; Salama, Cherif

Abstract


Name matching is a key component of applications such as record linkage and data mining. The process is complicated by data that spans different languages or that was written by people from different cultures. In this paper, we present a new modified Cross-Language Levenshtein Distance (CLLD) algorithm that supports matching names across different writing scripts and handles many-to-many character mappings. In addition, we present a hybrid cross-language name matching technique that combines phonetic matching with the proposed CLLD algorithm to improve the overall F-measure and speed up the matching process. Our experiments demonstrate that this method substantially outperforms a number of well-known standard phonetic and approximate string similarity methods in terms of precision, recall, and F-measure.
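
The abstract does not give the details of CLLD, so the following is only a minimal sketch of the general idea: a Levenshtein-style dynamic program in which a group of one or more characters in one script can align at zero cost with an equivalent group in another script. The function name `cross_script_distance`, the `EQUIV` table, and the Cyrillic/Latin pairs in it are illustrative assumptions, not taken from the paper, and the phonetic stage of the hybrid technique is not shown.

```python
from typing import Set, Tuple

# Hypothetical equivalence table of (source group, target group) pairs.
# Illustrative only: a real table would be far larger and language-specific.
EQUIV: Set[Tuple[str, str]] = {
    ("а", "a"), ("л", "l"), ("е", "e"), ("к", "k"), ("с", "s"),
    ("кс", "x"),                      # two source characters, one target character
    ("н", "n"), ("д", "d"), ("р", "r"),
    ("ш", "sh"), ("ш", "ch"),         # one source character, several spellings
    ("и", "i"), ("и", "y"),
}


def cross_script_distance(src: str, tgt: str,
                          equiv: Set[Tuple[str, str]] = EQUIV) -> int:
    """Edit distance where equivalent character groups align at zero cost."""
    src, tgt = src.lower(), tgt.lower()
    n, m = len(src), len(tgt)
    max_s = max(len(s) for s, _ in equiv)   # longest source group
    max_t = max(len(t) for _, t in equiv)   # longest target group

    # d[i][j] = distance between src[:i] and tgt[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                         # delete all of src[:i]
    for j in range(m + 1):
        d[0][j] = j                         # insert all of tgt[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = src[i - 1] == tgt[j - 1]
            best = min(
                d[i - 1][j] + 1,                        # deletion
                d[i][j - 1] + 1,                        # insertion
                d[i - 1][j - 1] + (0 if same else 1),   # plain substitution
            )
            # Many-to-many step: try every group pair ending at (i, j).
            for a in range(1, min(max_s, i) + 1):
                for b in range(1, min(max_t, j) + 1):
                    if (src[i - a:i], tgt[j - b:j]) in equiv:
                        best = min(best, d[i - a][j - b])
            d[i][j] = best
    return d[n][m]


print(cross_script_distance("александр", "aleksandr"))  # 0
print(cross_script_distance("александр", "alexander"))  # 1 (one inserted "e")
print(cross_script_distance("шишкин", "chichkin"))      # 0
```

With this toy table, alternative Latin spellings of the same name score 0, and each unmatched character adds 1, as in the ordinary Levenshtein distance. In a hybrid setup such as the one the abstract describes, a cheaper phonetic comparison would presumably limit which pairs reach this more expensive step.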


Other data

Title: A hybrid cross-language name matching technique using novel modified Levenshtein Distance
Authors: Medhat, Doaa; Hassan, Ahmed; Salama, Cherif
Keywords: cross language; deduplication; entity resolution; multilingual; name matching; record linkage
Issue Date: 25-Jan-2016
Conference: Proceedings - 2015 10th International Conference on Computer Engineering and Systems, ICCES 2015
ISBN: 9781467399715
DOI: 10.1109/ICCES.2015.7393046
Scopus ID: 2-s2.0-84963536818

