AIxploratorium



Gain RatioYou should have seen the following tree:
Notice this tree is much smaller than the tree produced by Splitting Randomly, as it has only 12 nodes, including 5 internal nodes. It is claiming the MallRats win when:
We can then use this tree to predict the outcome of the game:
Following the relevant path, we see the answer is "Lost", as this game will start at 9pm. Problems with Information GainWhile there are several motivations for using InformationGain as a quality measure, there are also several limitations. One problem is: it tends to prefer attributes with MANY values; ie, it would prefer to split on say the 20valued "OpponentName" attribute (if we had included it here), over the current 2 or 3valued attributes. Indeed, it would further prefer to split on "AudienceSize" field, which could have any integer values {1,2,3,..., 150, 151, ..., 1023, 1024, ... }). Why? The information possible for a kary attribute is between 0 and ln k; this means the values for a binary attribute cannot be larger than ln 2 = 1, while the largest possible value for a 20ary attribute is ln 20 == 4.3, and for a 1024ary attribute is ln 1024 = 10. Note, however, these largerarity attributes are not necessarily better, even though the InformationGain criteria would prefer them. (Indeed, it is likely that "AudienceSize" may in fact be one of the least important attributes!) One trick to avoid this specific problem is to use GainRatio, which basically levels the playing field by penalizing the multiplevalued attributes. (How? See Gain Ratio Note for details.) Now you try it...To try out this criteria: Go back to the applet and change the "Splitting Function" to "Gain Ratio" (Recall that you can change the splitting function using the "Splitting Function" submenu under the "Algorithm" menu). Then choose "Run" from the "Algorithm" menu ("<Ctrl><Alt>R") to see the resulting tree. Or you can "Trace Run", or "Step" or whatever. If you leftclick on any unfinished node , you will now get the Gain Ratio score for the attributes, rather than the Gain score. You can also flip back and forth between these splitting functions while growing the tree. (Again see Manual for details.) 

