Event Based Data Mining Process Models

Robert L. Grossman
Open Data Partners

September, 2004


Introduction

Since about 1996, there has been broad consensus within the data mining community about the essential steps in a data mining process model. The CRISP-DM (Cross-Industry Standard Process for Data Mining) Process Model (CPM), for example, describes this consensus approach.

During the past three to four years, several essential modifications to this model have emerged:

Event-Based Data Mining Process Model

The term Event-Based Process Model (EBPM) is sometimes used to describe a data mining process model incorporating the components above. Variants of the EBPM are being standardized by the vendor-supported Data Mining Group. Open Data Partners has been very active in the development of standards for event-based data mining process models and the deployment of systems using these process models.

Essential Steps in the Event-Based Data Mining Process Model

In this section, we list some of the main steps in an event-based data mining process model.

Step 1. Problem Identification and Project Design

Deliverable: a statement of the problem, including a metric for measuring success.

Step 2. Build Data Mining Data Mart

Deliverable: a data mart populated with event data. Feature vectors, which are created later in the process, and model scores are usually stored in the data mart as well.
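As an illustration only, the sketch below sets up a tiny data mart with Python's built-in `sqlite3` module, with one table each for raw events, derived feature vectors, and model scores. The table and column names (`events`, `feature_vectors`, `scores`, `entity_id`, and so on) are hypothetical; a production data mart would use whatever schema and database the project settles on.

```python
import sqlite3

# Hypothetical schema: raw events, derived feature vectors, and model scores.
SCHEMA = """
CREATE TABLE events (
    entity_id  TEXT NOT NULL,   -- e.g. an account or customer id
    event_time TEXT NOT NULL,   -- ISO-8601 timestamp
    event_type TEXT NOT NULL,
    amount     REAL
);
CREATE TABLE feature_vectors (
    entity_id    TEXT PRIMARY KEY,
    n_events     INTEGER,
    total_amount REAL
);
CREATE TABLE scores (
    entity_id  TEXT NOT NULL,
    model_name TEXT NOT NULL,
    score      REAL,
    scored_at  TEXT
);
"""

def build_data_mart(events):
    """Create an in-memory data mart and load (entity_id, event_time,
    event_type, amount) tuples into the events table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    conn.commit()
    return conn
```

An in-memory database keeps the sketch self-contained; pointing `sqlite3.connect` at a file path would make the data mart persistent.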

Step 3. Data Exploration and Data Quality Assessment

Deliverable: a short report containing a statistical overview of the data.
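A minimal sketch of the kind of statistical overview such a report might contain, assuming records arrive as Python dicts of numeric fields and that `None` marks a missing value. The function name `summarize` and the particular statistics reported are illustrative choices, not something the process model prescribes.

```python
import statistics

def summarize(records, fields):
    """Return a per-field overview (count, missing, mean, stdev, min, max)
    of a list of dict records. None is treated as missing (an assumption)."""
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        present = [v for v in values if v is not None]
        summary = {"count": len(present), "missing": len(values) - len(present)}
        if present:
            summary.update(
                mean=statistics.fmean(present),
                # Sample standard deviation needs at least two values.
                stdev=statistics.stdev(present) if len(present) > 1 else 0.0,
                min=min(present),
                max=max(present),
            )
        report[field] = summary
    return report
```

Reporting the missing-value count alongside the summary statistics makes data quality problems visible early, before they distort the modeling steps.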

Step 4. Prepare Feature Vectors for Modeling

Deliverable: a DXML file or other description of the process used to define feature vectors from one or more attributes in the data tables.
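The sketch below shows the general idea of turning event data into feature vectors: events are grouped by entity and then aggregated. The event layout `(entity_id, event_type, amount)` and the four features are hypothetical examples; in practice the feature definitions would be recorded in the DXML (or equivalent) deliverable rather than hard-coded.

```python
from collections import defaultdict

def build_feature_vectors(events):
    """Aggregate (entity_id, event_type, amount) events into one feature
    vector per entity. The features computed here are hypothetical."""
    grouped = defaultdict(list)
    for entity_id, event_type, amount in events:
        grouped[entity_id].append((event_type, amount))

    vectors = {}
    for entity_id, rows in grouped.items():
        amounts = [amount for _, amount in rows]
        vectors[entity_id] = {
            "n_events": len(rows),
            "total_amount": sum(amounts),
            "mean_amount": sum(amounts) / len(amounts),
            "n_event_types": len({event_type for event_type, _ in rows}),
        }
    return vectors
```

Keeping the aggregation in a single function makes it easy to apply the same definitions to the learning data and, later, to events arriving in the operational system.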

Step 5. Prepare Data Sets for Modeling

Deliverable: a short report describing how the learning and validation data sets will be prepared.
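One common way to prepare learning and validation sets from event data, sketched here under the assumption that feature vectors are keyed by an entity id, is to split by entity rather than by row so that no entity contributes to both sets. Hashing the id makes the assignment reproducible across runs. The 30% validation fraction and the function names `assign_split` and `split_data` are arbitrary choices for illustration.

```python
import hashlib

def assign_split(entity_id, validation_fraction=0.3):
    """Deterministically assign an entity to 'learn' or 'validate' by
    hashing its id to a number that is roughly uniform in [0, 1)."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "validate" if bucket < validation_fraction else "learn"

def split_data(vectors, validation_fraction=0.3):
    """Split a dict of {entity_id: feature_vector} into learning and
    validation dicts, keeping each entity entirely in one set."""
    learn, validate = {}, {}
    for entity_id, vector in vectors.items():
        if assign_split(entity_id, validation_fraction) == "validate":
            validate[entity_id] = vector
        else:
            learn[entity_id] = vector
    return learn, validate
```

Because the split depends only on the id, rerunning the preparation step after new events arrive keeps previously assigned entities in the same set.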

Step 6. Statistical Modeling

Deliverable: a PMML file and/or rule sets describing the statistical or data mining model.

Step 7. Model Validation

Deliverable: an evaluation of the quality of the model, reporting the lift of the model over random selection using the agreed-upon metric.
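Lift over random can be defined in several ways. The sketch below uses one common definition: the positive rate among the top fraction of records ranked by model score, divided by the overall positive rate. A lift of 1.0 means the model does no better than random selection. The function name `lift_at`, the 0/1 label encoding, and the default top-10% cutoff are assumptions; the project's agreed-upon metric should take precedence.

```python
import math

def lift_at(scores, labels, fraction=0.1):
    """Lift over random in the top `fraction` of records ranked by score.
    labels are 1 for a positive outcome and 0 otherwise."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive labels")
    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate
```

Computing lift on the validation set, not the learning set, gives a fairer estimate of how the model will perform once deployed.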

Step 8. Deployment

Deliverable: a PMML file, rule set, or other agreed-upon deployment mechanism for the operational system.
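As a rough sketch of deployment, the code below loads a minimal PMML regression model and returns a scoring function that an operational system could call on each new feature vector. It handles only a single `RegressionTable` with `NumericPredictor` terms, a small subset of PMML. The embedded model, its field name `x`, and the PMML 4.4 namespace are hypothetical, and a real deployment would normally rely on a complete PMML consumer.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}  # assumed PMML version

# Hypothetical model: y = 1.0 + 2.0 * x
MODEL_XML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" exponent="1" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

def load_scorer(pmml_text):
    """Return a function scoring a dict of inputs with the first
    RegressionTable in the document (NumericPredictor terms only)."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionTable found")
    intercept = float(table.get("intercept", "0"))
    terms = [
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    ]

    def score(record):
        return intercept + sum(c * record[name] ** e for name, c, e in terms)

    return score
```

Parsing the model once and returning a closure keeps per-record scoring cheap, which matters when events arrive continuously.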

Step 9. Reporting, Monitoring, and Quality Assessment

Deliverable: a regular, periodic report on the quality of the model and its effectiveness.
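Monitoring can take many forms. One widely used check, not specified by the process model and shown here only as an example, is the population stability index (PSI), PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the fractions of baseline and recent scores falling in bin i. A value near zero suggests the scored population has not shifted. The equal-width binning and the small `eps` floor, which avoids taking the log of zero, are implementation choices for this sketch.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline score sample (`expected`) and a recent one
    (`actual`), using equal-width bins over the baseline range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range scores
        # Floor each proportion at eps so the log term is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Computing this index for each reporting period, alongside the lift measured on recent outcomes, gives the periodic report a simple signal for when the model may need to be rebuilt.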