TOWARDS MINING WEB CONTENT OUTLIERS

Ayman Hassan Tanira

Abstract


The task of outlier detection is to find the small fraction of data that is exceptional when compared with the rest of a large dataset. Finding outliers in huge data repositories is like finding needles in a haystack. Existing outlier detection algorithms were designed for mining numeric data. They cannot be applied directly to mine outliers from Web datasets, because the Web contains data of many different types, such as text, hypertext, images, video, and audio.
Web content outliers are Web documents whose contents differ from those of the other documents taken from the same category. Mining Web document outliers may lead to the identification of competitors and emerging business trends in electronic commerce, to better-quality results from Web search engines, and to cleaner corpora for Web document classification.
This thesis concentrates on enhancing current approaches for detecting Web document outliers. It introduces a Web document outlier mining system aiming [...] required for identifying the closest neighbors for every document in the collection.

Experimental results on two different datasets with embedded motifs showed that FindWDO with N-grams outperforms similar algorithms in the same domain in terms of accuracy.
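The abstract does not describe FindWDO's internals. Purely as an illustration, the following minimal sketch shows one common way to combine N-gram document profiles with a nearest-neighbour outlier score. It assumes character trigrams, cosine similarity, and a score of one minus the mean similarity to a document's k closest neighbours. It is not the thesis's FindWDO algorithm, and every name and parameter here (`char_ngrams`, `knn_outlier_scores`, `top_outliers`, `k`, `n`, `m`) is hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Hypothetical helper: bag of character n-grams of lower-cased,
    whitespace-normalized text."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(c * b[g] for g, c in a.items() if g in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_outlier_scores(docs, k=2, n=3):
    """Assumed scoring rule: 1 - mean similarity to the k most similar
    other documents. A higher score means a more outlying document."""
    if len(docs) < 2:
        raise ValueError("need at least two documents")
    profiles = [char_ngrams(d, n) for d in docs]
    scores = []
    for i, p in enumerate(profiles):
        # Similarities to every other document, most similar first.
        sims = sorted(
            (cosine(p, q) for j, q in enumerate(profiles) if j != i),
            reverse=True,
        )
        top = sims[:k]  # the k closest neighbours (fewer if the set is small)
        scores.append(1.0 - sum(top) / len(top))
    return scores


def top_outliers(docs, m=1, k=2, n=3):
    """Indices of the m documents with the highest outlier scores."""
    scores = knn_outlier_scores(docs, k, n)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:m]
```

For example, among three finance snippets and one recipe, `top_outliers` would be expected to flag the recipe. Comparing every document with every other one is quadratic in the collection size, which is why finding the closest neighbours dominates the cost of this kind of approach.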


Other data

Title: TOWARDS MINING WEB CONTENT OUTLIERS
Other Titles: نحو التنقيب فى المحتوى خارج السياق على الويب (English translation: Towards Mining Out-of-Context Content on the Web)
Authors: Ayman Hassan Tanira
Issue Date: 2007

Attached Files

File: ص2324.pdf
Size: 1.11 MB
Format: Adobe PDF


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.