Rule Induction Methods

Decision tree algorithms are based on a divide-and-conquer approach to the classification problem. They work in a top-down manner, seeking at each stage an attribute to split on that best separates the classes, and then recursively processing the partitions resulting from the split.
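
To make the split criterion concrete, the following is a minimal sketch (not from the original text) of how information gain can be computed for a candidate attribute; the function names and the toy weather-style data are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy, in bits, of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # Reduction in entropy obtained by splitting on `attribute`.
    # `examples` is a list of dicts mapping attribute names to values.
    base = entropy(labels)
    n = len(labels)
    partitions = {}
    for ex, y in zip(examples, labels):
        partitions.setdefault(ex[attribute], []).append(y)
    remainder = sum(len(part) / n * entropy(part)
                    for part in partitions.values())
    return base - remainder

# Toy data (hypothetical): 'outlook' separates the classes perfectly,
# so the gain equals the full initial entropy of 1 bit.
examples = [{"outlook": "sunny"}, {"outlook": "sunny"},
            {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(examples, labels, "outlook"))  # -> 1.0
```

A decision tree learner would evaluate this gain for every available attribute and split on the one with the highest value.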

An alternative approach is to take each class in turn and try to cover all examples in that class while excluding examples not in the class. This is the so-called covering approach, because at each stage a rule is determined that covers some of the examples.

Covering algorithms operate by adding tests to the rule under construction, always trying to create a rule with maximum accuracy. Whereas a decision tree algorithm chooses an attribute to maximize the separation between the classes (using the information gain criterion), a covering algorithm chooses an attribute-value pair to maximize the probability of the desired classification.
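
This selection step can be sketched as follows; it is a minimal illustration (function name and data layout are assumptions), scoring each attribute-value test by its accuracy p/t, where t is the number of covered examples and p the number of those in the target class.

```python
def best_condition(examples, labels, target_class):
    # Pick the (attribute, value) test maximizing accuracy p/t.
    # Ties on accuracy are broken by larger coverage t.
    best, best_key = None, (-1.0, -1)
    for attr in examples[0].keys():
        for value in {ex[attr] for ex in examples}:
            covered = [y for ex, y in zip(examples, labels)
                       if ex[attr] == value]
            t = len(covered)
            p = sum(1 for y in covered if y == target_class)
            key = (p / t, t)  # accuracy first, coverage as tie-breaker
            if key > best_key:
                best, best_key = (attr, value), key
    return best, best_key[0]

# Toy data (hypothetical): 'outlook = overcast' covers one example,
# and that example is in the target class, so its accuracy is 1.0.
examples = [{"outlook": "sunny"}, {"outlook": "sunny"},
            {"outlook": "overcast"}]
labels = ["yes", "no", "yes"]
print(best_condition(examples, labels, "yes"))
# -> (('outlook', 'overcast'), 1.0)
```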

Figure 1 depicts the basic rule induction method (actually the Prism method, described in I.H. Witten and E. Frank's book "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations").

Figure 1: Basic rule induction method

This method generates only 'correct' rules, as measured by the accuracy formula p/t. Any rule with accuracy less than 100% is incorrect, since it assigns to the target class in question examples that do not really belong to that class. The Prism method continues adding clauses to each rule until the rule is correct. The outer loop iterates over the classes, generating rules for each class in turn and re-initializing E to the complete training set of examples each time. The algorithm always starts with an empty rule, which covers all examples, and then restricts it by adding new conditions (attribute-value pairs) until it covers only examples of the desired target class. At each stage, the condition added to the rule is the best attribute-value pair in terms of p/t (accuracy). If several attribute-value pairs have the same value of p/t, the one with the greatest coverage is chosen.
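
The whole procedure described above can be sketched in a few lines. This is a hedged, simplified reading of Prism, not the book's implementation: attribute and data-structure names are assumptions, and a guard is added for the degenerate case where no attribute is left to test.

```python
def prism(examples, labels):
    # Sketch of the Prism covering algorithm. Returns a list of
    # (conditions, class) rules, where conditions is a dict of
    # attribute -> value tests ANDed together.
    rules = []
    for cls in sorted(set(labels)):
        # Re-initialize E to the full training set for each class.
        E = list(zip(examples, labels))
        while any(y == cls for _, y in E):
            rule = {}            # empty rule: covers all examples
            covered = E
            # Restrict the rule until it is 'correct' (accuracy 100%).
            while any(y != cls for _, y in covered):
                best, best_key = None, (-1.0, -1)
                for attr in covered[0][0].keys():
                    if attr in rule:
                        continue
                    for value in {ex[attr] for ex, _ in covered}:
                        subset = [(ex, y) for ex, y in covered
                                  if ex[attr] == value]
                        t = len(subset)
                        p = sum(1 for _, y in subset if y == cls)
                        key = (p / t, t)  # accuracy, then coverage
                        if key > best_key:
                            best, best_key = (attr, value), key
                if best is None:   # no attribute left to add
                    break
                rule[best[0]] = best[1]
                covered = [(ex, y) for ex, y in covered
                           if ex[best[0]] == best[1]]
            rules.append((rule, cls))
            # Remove the examples this rule covers, then repeat.
            E = [(ex, y) for ex, y in E
                 if not all(ex[a] == v for a, v in rule.items())]
    return rules

# Toy data (hypothetical): 'windy' alone predicts the class here.
examples = [{"outlook": "sunny", "windy": "false"},
            {"outlook": "sunny", "windy": "true"},
            {"outlook": "rain",  "windy": "false"}]
labels = ["yes", "no", "yes"]
print(prism(examples, labels))
```

On this toy data the sketch produces one rule per class, e.g. "if windy = false then yes"; note the tie-break in the inner loop, where 'windy = false' (accuracy 1.0, coverage 2) beats 'outlook = rain' (accuracy 1.0, coverage 1), exactly the coverage rule stated above.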


Links to online tutorials on rule induction techniques


A tutorial on rule induction
by P. Flach
http://macflach.cs.bris.ac.uk/~flach/presentations/IDAHTML/sld001.htm

Rule induction: Ross Quinlan's ID3 algorithm
(part of the Data Mining tutorial)
by P. Ross
http://www.dcs.napier.ac.uk/~peter/vldb/dm/dm.html


© 2001 LIS - Rudjer Boskovic Institute
Last modified: January 20 2006 13:40:22.