Problem Understanding
Determining objective
The first important step in the whole data mining process is understanding why data mining is needed, i.e. understanding the problem we have to solve. This is the objective of the data mining effort. Problems can be diverse: optimizing the response of customers to a marketing campaign, preventing fraudulent usage of credit cards, detecting hostile logins on computer systems, etc. To solve the problem efficiently we also have to:
- understand the problem perspective, competing objectives and constraints
- uncover important factors influencing the outcome
Define success criteria
Once the problem is defined, it is advisable to define the success criteria: what makes our data mining successful. Criteria can be objective (quantitative): for example, an improved number of detected deviations, an improved response rate of customers to a marketing campaign, or the percentage of correct patient diagnoses. Criteria can also be subjective (qualitative). In that case a domain expert assesses the results of the data mining effort with respect to existing background knowledge about the problem, and the results must contain some new and useful insight into the relationships between domain variables.
Assess situation
Once the problem and the criteria for its successful solution are well defined, we have to assess all important aspects surrounding the problem:
- what expertise or background knowledge do we have about the problem, and do we understand the problem terminology well enough;
- data is the central item in a data mining problem, so we have to be aware of its potential for solving the problem;
- it is good to define specific terminology for the problem (problem domain terminology and related data mining terminology), in order to improve communication between domain experts and data mining experts;
- we must estimate the potential cost (duration) and benefits of the data mining project to be sure that it is feasible.
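The cost and benefit estimate in the last point can be sketched as simple arithmetic. This is a minimal illustration only: the function names, the cost model (person-days times a daily rate plus fixed costs) and all figures are assumptions made for the example, not part of any prescribed method.

```python
def net_benefit(expected_benefit, person_days, cost_per_day, fixed_costs=0.0):
    """Expected benefit minus the estimated total project cost.

    Assumed cost model (hypothetical): effort in person-days times a daily
    rate, plus fixed costs such as software licences or data acquisition.
    """
    total_cost = person_days * cost_per_day + fixed_costs
    return expected_benefit - total_cost


def is_feasible(expected_benefit, person_days, cost_per_day, fixed_costs=0.0):
    """Treat the project as feasible when the estimated net benefit is positive."""
    return net_benefit(expected_benefit, person_days, cost_per_day, fixed_costs) > 0


if __name__ == "__main__":
    # Illustrative figures only; real estimates come from the project team.
    estimate = net_benefit(expected_benefit=50_000, person_days=60,
                           cost_per_day=500, fixed_costs=5_000)
    print("estimated net benefit:", estimate)
    print("feasible:", is_feasible(50_000, 60, 500, 5_000))
```

In practice the benefit side is usually the harder one to estimate, so it may be worth running the same calculation with a pessimistic and an optimistic benefit figure.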
Determine data mining goals
We have determined what the problem is and the criteria for its successful solution. We now have to "translate" the project goals into data mining terms. Data mining goals differ from overall problem solving goals, as illustrated below:
problem solving goal | data mining goal |
increase sales | determine customer properties with respect to their purchasing power |
prevent credit card fraud | find critical patterns for fraudulent card usage or build an accurate algorithm for automatic fraud detection |
The definition of the problem, and the data mining goal derived from it, is directly related to the basic division of data mining problem types (discussed more thoroughly in the Modelling section):
- data description and summarization
- classification
- prediction
- association discovery
- dependency analysis
- segmentation
Outputs of the data mining process differ depending on the techniques used, so once the problem type(s) are defined it is good to describe the intended data mining outputs of the project. Success criteria should also be specified in data mining terminology: we can require a certain level of predictive accuracy (for classification and prediction problems), propensity or lift, or try to define the specific criteria of a domain expert when we are looking for new insight into the problem.
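Two of the quantitative criteria named above, predictive accuracy and lift, can be computed as in the following sketch. It assumes binary outcomes coded as 0/1 and a model that produces a numeric score per case. Lift is taken here as the response rate among the top-scored fraction of cases divided by the overall response rate, which is one common definition. The function names and the sample data are illustrative assumptions.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def lift(actual, scores, fraction):
    """Response rate in the top-scored fraction divided by the overall rate.

    `actual` holds 0/1 outcomes, `scores` holds the model scores for the
    same cases, and `fraction` is the share of cases targeted (0 < fraction <= 1).
    """
    if len(actual) != len(scores):
        raise ValueError("actual and scores must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    overall_rate = sum(actual) / len(actual)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    n_top = max(1, round(len(actual) * fraction))
    # Rank cases from highest to lowest score and keep the top n_top.
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    return top_rate / overall_rate


if __name__ == "__main__":
    # Made-up sample: 10 customers, 3 of whom responded to a campaign.
    actual = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    scores = [0.9, 0.2, 0.8, 0.7, 0.1, 0.3, 0.6, 0.4, 0.2, 0.1]
    predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

    print("accuracy:", accuracy(actual, predicted))
    print("lift in top 20%:", lift(actual, scores, 0.2))
```

A success criterion could then be stated as a threshold on either number, for example a minimum accuracy on held-out data or a minimum lift in the targeted share of customers. The threshold itself has to come from the problem owner.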
Produce a project plan
Finally, we can make a plan. We have to set the major steps to be performed, with deliverables defined at each step. We can also plan which techniques will be used at each stage.
© 2001 LIS - Rudjer Boskovic Institute
Last modified: June 29 2004 11:30:06.