Evaluation of models (discovered knowledge)

How to win a classifier contest?

Assume that we are in a contest to design the best classifier on some sample data. The person running the contest must reserve test cases for judging the winner. These cases are not seen by any contestant until the end of the contest, when the classifiers are compared. The classifier that makes the fewest mistakes, i.e., the classifier with the lowest error rate, is declared the winner.

These hidden test cases are a special group: they are used strictly for measuring the true error rate, and during the contest the contestants must proceed with classifier design as if these cases didn't exist. Having a large number of hidden test cases is atypical of most real-world situations. Normally one has a single given set of samples and must estimate the true error rate of the classifier from it. Unless the sample is huge, large numbers of cases will not be available for hiding, and setting cases aside purely for testing reduces the number of cases left for training.
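To make the setting concrete, here is a minimal sketch in Python (scikit-learn) with a synthetic stand-in dataset and an arbitrary classifier choice, neither of which comes from the text: the judge's hidden cases are just a held-out set, and the error rate is the fraction of those cases the classifier gets wrong. It also shows why hiding many cases is costly in practice: every hidden case is one less case for training.

    # Hypothetical data and classifier; only the evaluation logic matters here.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, random_state=0)   # stand-in sample data

    # The judge's hidden cases: unavailable to contestants during design.
    X_train, X_hidden, y_train, y_hidden = train_test_split(
        X, y, test_size=0.5, random_state=0)

    clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    error_rate = (clf.predict(X_hidden) != y_hidden).mean()
    print("error rate on hidden cases:", round(error_rate, 3))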

How should the contestant get the most out of the data? For any classification method, the following steps should be taken to obtain the best results:

  1. Using resampling, i.e., repeated train-and-test partitions, estimate the error rate.

  2. Generate a number of classifiers of different complexities.

  3. Select the classifier complexity with the lowest estimated error rate. Then apply the identical classification method to all of the sample cases (see the sketch after this list).

  4. The particular resampling method that should be used depends on the number of available samples. Here are the guidelines:

    • For sample sizes greater than 100, use cross-validation. Either stratified 10-fold cross-validation or leaving-one-out is acceptable. 10-fold is far less expensive computationally than leaving-one-out and can be used with confidence for samples numbering in the hundreds.
    • For sample sizes less than 100, use leaving-one-out.
    • For very small samples (fewer than 50 cases), in addition to the leave-one-out estimator, the .632 bootstrap (.632B) and 100 stratified 2-fold cross-validations can be computed. Use leaving-one-out except under the following two conditions: use the .632B estimate when the leave-one-out estimate of the error rate is lower than the .632B estimate; similarly, use the repeated 2-fold cross-validation estimate when the leave-one-out estimate is higher than the repeated 2-fold cross-validation estimate.
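Here is the sketch referenced in step 3: a walk through steps 1-3 under assumed choices that the text does not prescribe (decision trees of varying depth stand in for classifiers of different complexities, scikit-learn supplies the resampling, and the data are synthetic). The resampling method is picked according to the size guidelines above.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, random_state=0)   # stand-in sample data
    n = len(y)

    # Guideline: stratified 10-fold CV for samples in the hundreds,
    # leaving-one-out for fewer than 100 cases.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0) if n >= 100 else LeaveOneOut()

    # Step 2: classifiers of different complexities (here, tree depth).
    depths = [1, 2, 3, 5, 8, None]
    estimates = {}
    for d in depths:
        clf = DecisionTreeClassifier(max_depth=d, random_state=0)
        # Step 1: resampled error rate = 1 - mean cross-validated accuracy.
        estimates[d] = 1.0 - cross_val_score(clf, X, y, cv=cv).mean()

    # Step 3: pick the complexity with the lowest estimated error rate,
    # then apply the identical method to *all* of the sample cases.
    best_depth = min(estimates, key=estimates.get)
    final_clf = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X, y)
    print("estimated error by depth:", estimates, "-> chosen depth:", best_depth)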

These resampling techniques provide reliable estimates of the true error rate. Nearly all the cases are used for training, and all cases are used for testing. Because the error estimates are for classifiers trained on nearly all cases, the identical classification method can be reapplied to all sample cases.
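For the very-small-sample guideline, the following sketch computes the three estimates side by side (leaving-one-out, the .632 bootstrap, and 100 stratified 2-fold cross-validations) and applies the selection rule given above. The classifier, the data, and the number of bootstrap replicates (200) are assumptions made purely for illustration.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=40, random_state=0)   # a "very small" sample
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)

    # Leave-one-out error rate.
    e_loo = 1.0 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

    # .632 bootstrap: 0.368 * resubstitution error + 0.632 * out-of-bootstrap error.
    e_resub = (clf.fit(X, y).predict(X) != y).mean()
    oob_errors = []
    for _ in range(200):                          # number of bootstrap replicates (assumed)
        idx = rng.integers(0, len(y), len(y))     # draw n cases with replacement
        oob = np.setdiff1d(np.arange(len(y)), idx)
        if len(oob) == 0:
            continue
        fitted = clf.fit(X[idx], y[idx])
        oob_errors.append((fitted.predict(X[oob]) != y[oob]).mean())
    e_632 = 0.368 * e_resub + 0.632 * np.mean(oob_errors)

    # 100 repetitions of stratified 2-fold cross-validation.
    two_fold = [1.0 - cross_val_score(clf, X, y,
                cv=StratifiedKFold(n_splits=2, shuffle=True, random_state=r)).mean()
                for r in range(100)]
    e_2cv = float(np.mean(two_fold))

    # Selection rule from the guidelines: prefer leaving-one-out, but fall back
    # to .632B or repeated 2-fold CV under the stated conditions.
    estimate = e_632 if e_loo < e_632 else (e_2cv if e_loo > e_2cv else e_loo)
    print(e_loo, e_632, e_2cv, "->", estimate)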

For comparing classifiers and methods, resampling provides an added advantage: using the same data, researchers can readily duplicate the analysis conditions and compare published error estimates with new results. With only a single random train-and-test partition, any divergence from a published result can be explained away as natural variability of the partitions.




