Sarah Ahmed Soliman


A constantly increasing amount of medical data is produced and captured electronically in everyday clinical practice. Automated knowledge discovery techniques, which employ data mining and machine learning, can provide decision support for clinicians and discover new, relevant patterns in silos of electronic patient data. However, the collected heterogeneous medical data lacks structural, functional, and semantic interoperability. A decision tree, one of the most widely used data mining techniques, was proposed to address these issues and was tested experimentally with the Weka data mining tool.
In this thesis, the thrombosis dataset collected at Chiba University and released at the third European conference on knowledge discovery in 1999 is presented.
The topics investigated and experimentally proved in the thesis allow us to conclude that:
• The knowledge discovery in databases (KDD) methodology, in comparison with other data mining application methodologies, outlines for the first time a detailed process model specific to the issues and constraints of the medical domain. To achieve that, the initial KDD process addresses medical data pre-processing, data transformation, data discretization, and data evaluation.
• C4.5 was implemented as the data mining technique in order to improve accuracy and extract more predictive rules. The gain ratio was calculated for each attribute, and the attribute with the highest ratio was taken as the root node of the tree.
• The 10-fold cross-validation method was used to test and evaluate the model on the dataset.
• The accuracy of the created predictive model was increased by applying the KDD methodology: accuracy reached 90.89%, and the error rate was reduced accordingly.
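The root-node selection by gain ratio described above can be illustrated with a minimal sketch. It is not the thesis's Weka implementation: it assumes discrete attributes only and omits C4.5's handling of continuous attributes, missing values, and pruning. The attribute names (`marker`, `group`) and the toy rows are hypothetical and are not taken from the thrombosis dataset.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute with respect to the class labels."""
    total = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Information gain = entropy before the split - weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels):
    """Return the attribute name with the highest gain ratio (the root candidate)."""
    return max(rows[0].keys(), key=lambda a: gain_ratio([r[a] for r in rows], labels))

# Hypothetical toy data: "marker" separates the classes perfectly, "group" does not.
rows = [
    {"marker": "high", "group": "a"},
    {"marker": "high", "group": "b"},
    {"marker": "low", "group": "a"},
    {"marker": "low", "group": "b"},
]
labels = ["pos", "pos", "neg", "neg"]
root = best_attribute(rows, labels)
```

Dividing the information gain by the split information is what distinguishes the gain ratio from plain information gain: an attribute that fragments the data into many small partitions is penalized even if each partition is pure.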

After data preparation, we obtained 407 different cases for use in the C4.5 implementation; 347 of these patients did not have thrombosis and 67 patients had thrombosis in varying degrees. The C4.5 results showed that the classification accuracy of the generated decision tree improved, reaching 90.89%. Nevertheless, the C4.5 algorithm has some weaknesses, such as its memory usage during computation and its tendency to build large decision trees.
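The 10-fold cross-validation behind the reported accuracy can be sketched as follows. This is an illustrative outline, not the evaluation code used in the thesis: stratifying the folds by class is an assumption made here because the classes are imbalanced, and `train_majority` and `predict_majority` are hypothetical placeholders standing in for a real learner such as C4.5. The toy labels are invented for the example.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign each example index to one of k folds with similar class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds

def cross_validate(examples, labels, train_fn, predict_fn, k=10, seed=0):
    """Return overall accuracy: each fold is held out once and the rest is used for training."""
    correct = 0
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train_fn([examples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, examples[i]) == labels[i] for i in test_idx)
    return correct / len(labels)

# Hypothetical placeholder learner: always predicts the majority training class.
def train_majority(examples, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, example):
    return model

# Invented toy data with an imbalanced class distribution.
toy_labels = ["neg"] * 30 + ["pos"] * 10
toy_examples = list(range(len(toy_labels)))
accuracy = cross_validate(toy_examples, toy_labels, train_majority, predict_majority)
error_rate = 1.0 - accuracy
```

The error rate is simply one minus the accuracy, so the reported accuracy of 90.89% corresponds to an error rate of 9.11%. A majority-class baseline like the placeholder above is also a useful reference point when interpreting accuracy on imbalanced data.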
6.2. Future work
Machine learning and data mining are attracting growing interest in the medical field because of the large amount of data now available. Work is moving away from small datasets stored as flat files, which calls for choosing appropriate algorithms and methods capable of prediction.
Future work may concentrate on support vector machines and neural networks, two of the most important classification algorithms, which could be used within ensemble methods to achieve better performance.

Other data

Other Titles Using Machine Learning Methods to Build an Intelligent Therapeutic Diagnosis System
Authors Sarah Ahmed Soliman
Issue Date 2016


Items in Ain Shams Scholar are protected by copyright, with all rights reserved, unless otherwise indicated.