Modification of information gain measure to select the best group of attributes in a data set for a binary decision tree inducer
Date
2015
Publisher
Uva Wellassa University of Sri Lanka
Abstract
Classification is one of the most frequently used techniques in data mining; it is applied to predict the target class of
each case in a data set. Decision tree (DT) algorithms are among the most powerful classification and prediction
methods, and they support sequential decision making for a given data set (Han & Kamber,
2006; Bramer, 2007). The major strengths of DT algorithms are their ability to generate understandable rules, to
handle both numerical and categorical attributes, and to provide a clear indication of which attributes are most salient
for prediction or classification (Kangaiammal, 2013). ID3 and C4.5 are multiway-splitting algorithms developed by J.
Ross Quinlan in 1986 and 1993, respectively. They use entropy, information gain (IG), and gain ratio as attribute
selection measures. These measures can also be used to build a binary decision tree, which reduces the complexity of the
decision tree. If the algorithm finds more than one attribute with equal IG in the data set, it selects the first such
attribute as the splitting node of the tree. This attribute may not be the best one for decision making when it is compared
with the other attributes of equal IG. Therefore, the aim of this study is to improve the IG measure so that it selects the best
attribute in a data set, and to plot a binary decision tree.
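
The tie situation described in the abstract can be illustrated with a minimal sketch. It is not the modified measure proposed in the study, which the abstract does not specify; it only shows the standard entropy and IG computation and how a first-maximum selection rule depends on attribute order. The toy data set, the attribute names "a", "b", "c", the target name "label", and the helper pick_first_max are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Standard IG of splitting `rows` (a list of dicts) on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def pick_first_max(rows, attributes, target):
    """Hypothetical helper: choose the attribute with the highest IG.

    Python's max() returns the first maximal item, so on a tie the
    attribute listed first wins, mirroring the behaviour the abstract
    describes.
    """
    gains = {a: information_gain(rows, a, target) for a in attributes}
    best = max(attributes, key=lambda a: gains[a])
    return best, gains

# Toy data (assumed): "a" and "b" both separate the classes perfectly,
# so their IG values are equal; "c" carries no information.
ROWS = [
    {"a": "x", "b": "p", "c": "m", "label": "yes"},
    {"a": "x", "b": "p", "c": "n", "label": "yes"},
    {"a": "y", "b": "q", "c": "m", "label": "no"},
    {"a": "y", "b": "q", "c": "n", "label": "no"},
]

if __name__ == "__main__":
    print(pick_first_max(ROWS, ["a", "b", "c"], "label"))
    print(pick_first_max(ROWS, ["b", "a", "c"], "label"))
```

Reordering the attribute list changes which of the tied attributes is chosen, which is why a secondary criterion for breaking IG ties may be useful.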
Keywords
Science and Technology, Technology, Information Technology, Database, System