Machine Understanding through Unsupervised Web Semantification

ميشيل نعيم نجيب جرجس


This thesis summarizes our efforts to build three modules in the direction of machine understanding.

The first module, ClassifyWiki, is a framework for building classifiers from any set of Wikipedia pages, at any level of granularity and potentially of any size. We tested the framework on more than 100 entity classes using our Wikipedia-based dataset. Unlike all previous systems, ClassifyWiki does not learn only a fixed set of specific classes; theoretically, it can generate a classifier for any entity class. We report an 83% macro-averaged F1-score using 50 positive training instances.

The second module is WikiTrends. WikiTrends creates a new analytics layer on top of a source of semi-structured and unstructured data, and it can combine any mix of data to present a new understanding of the world. Sample analytics reports include assigning each country some of its unforgettable additions to humanity, tracing the gender battle back to 1000 BC, tracking trending occupations, musical instruments, and film genres, and summarizing the world view in heat maps.

The last module is ASU, a system submitted to the COLING 2016 W-NUT workshop to tackle the Twitter named entity recognition task. The system experimentally demonstrates an incremental approach to designing two LSTM models: one for entity detection, and the other for extracting entities and classifying them into a set of 10 fine-grained classes. The study presents experimentally the effect of adding and removing features in the input representation, along with an analysis of the network design. We report a 39% F1-score for the typed model on the test set and 55% for the non-typed one, placing ASU fifth out of ten participants.
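The abstract reports macro-averaged F1-scores for both ClassifyWiki and ASU. As a reminder of how that metric is computed (this is not the thesis's own code, and the class labels below are hypothetical), a minimal sketch: F1 is computed per class from true positives, false positives, and false negatives, then the per-class scores are averaged with equal weight, so rare classes count as much as frequent ones.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    classes = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct label for this instance
        else:
            fp[p] += 1          # predicted class p wrongly
            fn[g] += 1          # missed the true class g
    scores = []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy example with hypothetical entity classes
gold = ["person", "person", "place", "place", "org"]
pred = ["person", "place", "place", "place", "org"]
print(round(macro_f1(gold, pred), 3))  # → 0.822
```

Because the mean is unweighted, a system must perform reasonably on every class, not just the dominant ones, to score well — relevant when evaluating over 100 entity classes of very different sizes.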

Other data

Other Titles تمكين الحاسب من الفهم عن طريق تحديد دلالات الألفاظ للشبكة العنكبوتية بدون إشراف
Issue Date 2017

File: J2390.pdf (401.67 kB, Adobe PDF)

Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.