EWA Systems

EWA Systems FAQ

1) Why is Data Mining a big deal?

The advent of computers introduced vast computational power to the world, and an even bigger burden of information. It is estimated that the amount of information in the world doubles every 20 months, and that the size and number of databases are increasing even faster. The growing use of automated data gathering devices, such as point-of-sale terminals and remote sensors, has contributed to this explosion of available data.

The rationale behind collecting and storing vast amounts of data is simply that data has value. However, while databases are capable of storing mountains of data, it is the analysis of that data that multiplies its value, by deriving knowledge from the data itself. This analysis is enabled by data mining.

2) What is the Definition of Data Mining?

Put simply, "data mining" is the automatic analysis of data. While statistics asks whether there is support for a certain hypothesis, one hypothesis at a time, data mining reverses the process by asking the data for all of the hypotheses that can be supported.

Data mining extracts information from data, ideally discovering previously unknown facts or models of the data's behavior. Using these facts or models, data mining techniques are capable of predicting future events. Data mining typically consists of a combination of the following tasks, given in alphabetical order:

Association: The discovery of correlations within a data set. For example, when X is high, so is Y, or if someone buys orange juice, they usually also buy milk.

Classification: The discovery of why a categorical variable takes on particular states. For example, humidity and a temperature drop can be used to predict rainy days.

Clustering: The discovery of segments of the data that behave differently from the other data segments. For example, breaking customers down into age groups is a form of clustering.

Outlier Analysis: The discovery of unexpected or out-of-control data points. For example, finding a data value of 999, instead of the usual values from 0 to 2.

Regression: The discovery of why a continuous variable takes on particular states. For example, the relationship of zip code to annual salary.

Time Series: The discovery of how data varies over time. For example, the seasonal cyclic behavior of department store monthly sales figures.
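Two of the tasks above, association and outlier analysis, can be illustrated with a minimal Python sketch. The function names, the shopping baskets, and the sensor readings are all made up for illustration, and the median-based outlier rule is just one simple choice among many, not a method prescribed by this FAQ.

```python
from collections import Counter
from itertools import combinations
from statistics import median


def pair_confidence(baskets):
    """Association: for each ordered pair (a, b), estimate how often a
    basket that contains item a also contains item b."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {pair: n / item_counts[pair[0]] for pair, n in pair_counts.items()}


def find_outliers(values, threshold=5.0):
    """Outlier analysis: flag values that lie far from the median, measured
    in units of the median absolute deviation (an assumed, simple rule)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All typical values are identical; anything different stands out.
        return [v for v in values if v != med]
    return [v for v in values if abs(v - med) / mad > threshold]


# Hypothetical shopping baskets.
baskets = [
    ["orange juice", "milk", "bread"],
    ["orange juice", "milk"],
    ["orange juice", "eggs"],
    ["milk", "bread"],
]
rules = pair_confidence(baskets)

# Hypothetical readings that normally fall between 0 and 2.
readings = [0, 1, 2, 1, 0, 2, 1, 999, 1, 2]
outliers = find_outliers(readings)
```

Here `rules[("orange juice", "milk")]` is the fraction of orange juice baskets that also contain milk, and `outliers` holds the single reading of 999.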

3) What is the Difference between Data Mining and Statistics?

Data mining and statistics address similar questions. Data mining differs from statistics in the approach taken. While statistical processes start with hypotheses, then use the data to prove or disprove them, usually one at a time, data mining starts with the data and attempts to discover all of the hypotheses that can be supported by the data. Often data miners use statistics to confirm their findings.
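The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names and values are invented, Pearson correlation stands in for "support", and the 0.8 cutoff is arbitrary. `check_hypothesis` looks at one relationship chosen in advance, while `mine_relationships` scans every pair of columns and reports the ones the data supports.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def check_hypothesis(data, a, b):
    """Statistics-style: measure one relationship that was chosen up front."""
    return pearson(data[a], data[b])


def mine_relationships(data, min_strength=0.8):
    """Data-mining-style: scan all column pairs, keep the strong ones,
    and rank them by strength."""
    found = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) >= min_strength:
            found.append((a, b, r))
    return sorted(found, key=lambda item: abs(item[2]), reverse=True)


# Hypothetical daily observations.
data = {
    "temperature": [10, 15, 20, 25, 30],
    "ice_cream_sales": [20, 30, 40, 50, 60],
    "umbrella_sales": [5, 3, 6, 2, 4],
}

single = check_hypothesis(data, "temperature", "umbrella_sales")
discovered = mine_relationships(data)
```

Scanning many pairs at once makes it more likely that some pattern appears strong purely by chance, which is one reason mined findings are often confirmed with a conventional statistical test afterwards.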

4) What is the Purpose of Data Mining?

The purpose of data mining is to analyze large volumes of data in order to extract knowledge, some of which may be hidden. This knowledge can then be used to make future actions occur faster, more accurately, and with higher confidence. Businesses benefit from this mined knowledge through increased revenues and lowered costs.

5) What is the Data Mining Process?

  1. Collection and Selection: Data mining starts with collecting the data that is to be analyzed. Like statistical methods, most data mining techniques work best with large volumes of data, which help ensure high-confidence results. On the other hand, too much data may slow the process and clutter the analysis. Determining which data to include requires experience and experimentation.
  2. Preparation: Gathered data are cleansed, manipulated and prepared for analysis. Data manipulation may include adding new variables that roll up or break down other variables. The data preparation required will depend on the data mining algorithm to be used.
  3. Data Mining: The chosen data mining algorithm looks for patterns in the data. Each discovered pattern is typically assigned a certainty value, so that the patterns can be ranked.
  4. Interpretation and Evaluation: The patterns identified by the algorithm are interpreted into knowledge that can be used to support human decision-making or further analysis. The patterns may also be further evaluated using traditional statistical approaches.
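The four steps above can be sketched end to end in Python. Everything here is a hypothetical illustration: the records, the field names, the age-group roll-up, and the use of simple rule confidence as the certainty value are all assumptions chosen to keep the example small, not a description of any particular product or algorithm.

```python
from collections import Counter

# Hypothetical raw records; "note" is irrelevant and one age is missing.
RAW = [
    {"age": 25, "region": "north", "bought": "yes", "note": "n/a"},
    {"age": 31, "region": "north", "bought": "yes", "note": "n/a"},
    {"age": 38, "region": "south", "bought": "yes", "note": "n/a"},
    {"age": 45, "region": "south", "bought": "no", "note": "n/a"},
    {"age": 52, "region": "north", "bought": "no", "note": "n/a"},
    {"age": 61, "region": "south", "bought": "no", "note": "n/a"},
    {"age": None, "region": "north", "bought": "yes", "note": "n/a"},
    {"age": 29, "region": "south", "bought": "no", "note": "n/a"},
]


def select(records, fields):
    """Step 1: keep only the fields chosen for analysis."""
    return [{f: r.get(f) for f in fields} for r in records]


def prepare(records):
    """Step 2: drop incomplete rows and roll age up into an age group."""
    cleaned = []
    for r in records:
        if any(value is None for value in r.values()):
            continue
        row = dict(r)
        row["age_group"] = "under_40" if row.pop("age") < 40 else "40_plus"
        cleaned.append(row)
    return cleaned


def mine(records, target):
    """Step 3: score each 'field=value -> target=outcome' pattern by its
    confidence, then rank the patterns from most to least certain."""
    seen = Counter()
    hits = Counter()
    for r in records:
        for field, value in r.items():
            if field == target:
                continue
            seen[(field, value)] += 1
            hits[(field, value, r[target])] += 1
    patterns = [
        (field, value, outcome, n / seen[(field, value)])
        for (field, value, outcome), n in hits.items()
    ]
    return sorted(patterns, key=lambda p: p[3], reverse=True)


def interpret(patterns, target, min_confidence=0.75):
    """Step 4: turn the strongest patterns into readable statements."""
    return [
        f"{field}={value} -> {target}={outcome} (confidence {conf:.0%})"
        for field, value, outcome, conf in patterns
        if conf >= min_confidence
    ]


selected = select(RAW, ["age", "region", "bought"])
prepared = prepare(selected)
patterns = mine(prepared, target="bought")
findings = interpret(patterns, target="bought")
```

With this made-up data the top-ranked pattern is that customers in the `40_plus` group did not buy. In practice, a finding like this would still be evaluated further, for example with a traditional statistical test, before anyone acts on it.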

Copyright © 2005 by EWA Systems, Inc.