Frequent Itemset Mining for Big Data

Ahmed Farouk Abd El_Aziz

Abstract


There are many highly optimized serial algorithms for frequent-itemset mining, and such algorithms remain an active research area because of their many applications.

Serial algorithms, however, cannot cope with the ever-growing data sets that occur in many of these application areas, because the serial performance of processors has almost stopped increasing in recent years.

Instead, processors and complete systems are becoming increasingly parallel, so huge data sets can be mined in a reasonable time only when the available parallelism is used effectively.

We carried out many experiments on very large data sets. On such data sets, our algorithms were always superior in terms of runtime.

We present a comparative study of three algorithms: the well-known Parallel Buddy Prima and Parallel Apriori algorithms, and our proposed algorithm (an enhancement of Parallel Buddy Prima).

This algorithm efficiently removes the shortcoming of the previously proposed Parallel Buddy Prima algorithm. The calculation of the frequent 1-itemsets is distributed among the client nodes, which gives better performance.
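The distribution of the frequent 1-itemset calculation can be illustrated with a minimal sketch. It is not the thesis implementation: the client nodes are simulated as in-memory partitions, and the names `local_item_counts`, `frequent_1_itemsets`, and `min_support` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def local_item_counts(partition):
    """Work done on one (simulated) client node: count, for each item,
    how many transactions of the local partition contain it."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # set() so an item counts once per transaction
    return counts


def frequent_1_itemsets(partitions, min_support):
    """Merge the per-node counts and keep the items that reach min_support."""
    total = Counter()
    for partition in partitions:
        total += local_item_counts(partition)
    return {item: n for item, n in total.items() if n >= min_support}


# Two simulated client nodes, each holding part of the transaction database.
partitions = [
    [["a", "b"], ["a", "c"]],
    [["a", "b", "c"], ["b"]],
]
result = frequent_1_itemsets(partitions, min_support=3)
```

In a real deployment each call to `local_item_counts` would presumably run on its own client node, with only the small count tables sent back to be merged.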

Several algorithms [27] [28] have been proposed so far to mine all frequent itemsets in a transaction database in parallel. The proposed algorithms differ from one another in how they handle the candidate sets and in how they calculate the support count of each candidate set. Our algorithm calculates support counts by creating a model structure of the data set at each client node. This structure is built according to transaction size and the prime-multiple representation value of each transaction. In this way we avoid a full scan each time we need to calculate the support count of a specific candidate set.
Calculating the support count of a specific candidate set therefore does not require a full scan of the data set. Instead, scanning starts at a specific position in our structure, determined by the size of the candidate set and the value of its prime-number representation, which must be less than or equal to the prime-number value of any transaction it is compared with.
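One way to picture such a structure is the sketch below. It rests on assumptions the abstract does not spell out: each item is assigned a distinct prime, a transaction is represented by the product of its items' primes, and a candidate is contained in a transaction exactly when the candidate's product divides the transaction's product. The names `build_index`, `support_count`, and `prime_of` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from math import prod


def build_index(transactions, prime_of):
    """Group transactions by size; within each size, store the prime
    products of the transactions in ascending order."""
    by_size = defaultdict(list)
    for transaction in transactions:
        items = set(transaction)
        by_size[len(items)].append(prod(prime_of[i] for i in items))
    for products in by_size.values():
        products.sort()
    return by_size


def support_count(index, candidate, prime_of):
    """Count the transactions containing the candidate without a full scan."""
    items = set(candidate)
    k = len(items)
    c = prod(prime_of[i] for i in items)
    count = 0
    for size, products in index.items():
        if size < k:
            continue  # a smaller transaction cannot contain the candidate
        start = bisect_left(products, c)  # skip products smaller than c
        count += sum(1 for p in products[start:] if p % c == 0)
    return count


# Assumed item-to-prime mapping and a toy data set for one client node.
prime_of = {"a": 2, "b": 3, "c": 5, "d": 7}
transactions = [["a"], ["b"], ["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c", "d"]]
index = build_index(transactions, prime_of)
ab_support = support_count(index, ["a", "b"], prime_of)
```

For the candidate `["a", "b"]`, the two transactions of size 1 are never visited, and within each remaining size group the scan begins at the first product that is at least 6. Products grow quickly with transaction length, so a practical version would need to consider the cost of large-integer arithmetic.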

The best case for our algorithm occurs when the transactions of the smallest size outnumber the large transactions, because the total number of scans is then reduced.


Other data

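Title: Frequent Itemset Mining for Big Data
Other Titles: Improving the Performance of Data Mining Methods by Finding Frequent Patterns in Very Large Databases (تحسين الأداء في طرق التنقيب عن البيانات بإيجاد الانماط المتكررة في قواعد البيانات الضخمة)
Authors: Ahmed Farouk Abd El_Aziz
Issue Date: 2015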

