NOVEL CLASSIFICATION FEATURE SETS FOR SOURCE CODE PLAGIARISM DETECTION OF JAVA FILES

Eman Hosam Adel El-Sayed

Abstract


In programming learning environments, the pressure of delivering many assignments makes plagiarism the easiest way out. Plagiarism threatens the learning process and undermines fair evaluation. Fast, automatic, and accurate detection of source code plagiarism is therefore essential. This research proposes novel classification feature sets for detecting whether a Java file is plagiarized. The proposed feature sets use histograms to summarize the similarity matrix of function signatures, and compare the lexical code similarity of each individual class pair. To test their effectiveness, a source code plagiarism dataset of 12K Java files was used. The results show a 4% improvement in F-Measure. Re-annotating the dataset improves F-Measure by 7.5%.
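The abstract names two feature families without defining them precisely. As a hedged illustration only, the Java sketch below shows one way such features could be computed; it is not the thesis's implementation. Everything in it is an assumption: the class `PlagiarismFeatures` and its methods `tokens`, `jaccard`, `similarityMatrix`, `histogram`, and `maxClassPairSimilarity` are hypothetical, token-set Jaccard similarity stands in for whatever similarity measure the thesis uses, the histogram uses equal-width bins, and taking the maximum over class pairs is just one possible way to aggregate per-pair scores.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not the thesis's method) of two feature families:
 * (1) a histogram summarizing the pairwise similarity matrix of function
 * signatures from two Java files, and (2) lexical similarity of class pairs.
 * Assumed: token Jaccard similarity, equal-width bins, max aggregation.
 */
final class PlagiarismFeatures {

    private PlagiarismFeatures() {}

    /** Splits text into word tokens (letters, digits, underscore), dropping punctuation and whitespace. */
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Jaccard similarity |A ∩ B| / |A ∪ B| of two token sets; 1.0 when both are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** Builds the |itemsA| x |itemsB| matrix of pairwise lexical similarities. */
    static double[][] similarityMatrix(List<String> itemsA, List<String> itemsB) {
        double[][] m = new double[itemsA.size()][itemsB.size()];
        for (int i = 0; i < itemsA.size(); i++) {
            Set<String> ta = tokens(itemsA.get(i));
            for (int j = 0; j < itemsB.size(); j++) {
                m[i][j] = jaccard(ta, tokens(itemsB.get(j)));
            }
        }
        return m;
    }

    /**
     * Summarizes a matrix of values in [0, 1] as a normalized histogram with
     * {@code bins} equal-width bins, so the feature vector has a fixed length
     * regardless of how many functions each file contains. A value of exactly
     * 1.0 is placed in the last bin.
     */
    static double[] histogram(double[][] matrix, int bins) {
        double[] h = new double[bins];
        int count = 0;
        for (double[] row : matrix) {
            for (double v : row) {
                int b = Math.min((int) (v * bins), bins - 1);
                h[b]++;
                count++;
            }
        }
        if (count > 0) {
            for (int k = 0; k < bins; k++) h[k] /= count;
        }
        return h;
    }

    /** Highest lexical similarity over all (class in A, class in B) pairs; an assumed aggregation. */
    static double maxClassPairSimilarity(List<String> classesA, List<String> classesB) {
        double best = 0.0;
        for (double[] row : similarityMatrix(classesA, classesB)) {
            for (double v : row) best = Math.max(best, v);
        }
        return best;
    }
}
```

For example, `histogram(similarityMatrix(sigsA, sigsB), 10)` turns signature lists of any lengths into a 10-value vector that could be passed to a standard classifier together with the class-pair score. Summarizing the matrix this way is presumably what makes it usable as a classification feature, since the raw matrix size changes from one file pair to the next.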


Other data

Title: NOVEL CLASSIFICATION FEATURE SETS FOR SOURCE CODE PLAGIARISM DETECTION OF JAVA FILES
Other Titles (Arabic): Novel Feature Sets for Classifying and Detecting Plagiarism in Java Programs
Authors: Eman Hosam Adel El-Sayed
Issue Date: 2021

Attached Files

File: BB10012.pdf, Size: 1.06 MB, Format: Adobe PDF


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.