Put simply, "data mining" is the automatic analysis
of data. While statistics asks whether there is support for a certain
hypothesis, testing one hypothesis at a time, data mining reverses the process
by asking the data for all of the hypotheses it can support.
Data mining extracts information from data, hopefully discovering
previously unknown facts or models of the data's behavior.
Using these facts or models, data mining techniques can be used
to predict future events. Data mining typically consists
of a combination of the following tasks, given in alphabetical
order:
Association: The discovery of correlations
within a data set. For example, when X is high, so is Y,
or if someone buys orange juice, they usually also buy milk.
Classification: The discovery of why a categorical
variable takes on particular states. For example, high humidity
and a drop in temperature can be used to predict rainy days.
Clustering: The discovery of segments of
the data that behave differently from the other data segments.
For example, breaking customers down into age groups
is a form of clustering.
Outlier Analysis: The discovery of unexpected
or out-of-control data points. For example, finding a data
value of 999 where the values usually range from 0 to 2.
Regression: The discovery of why a continuous
variable takes on particular values. For example, the relationship
between zip code and annual salary.
Time Series: The discovery of how data varies
over time. For example, the seasonal, cyclic behavior of a department
store's monthly sales figures.
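The association task is commonly scored with two measures: the support of a rule such as "orange juice -> milk" (the fraction of baskets containing both items) and its confidence (support of both items divided by support of the left-hand item). The sketch below is a brute-force check over item pairs only, not the Apriori algorithm or any library's API; the baskets, the function name `pair_rules`, and the thresholds are hypothetical.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules above both thresholds."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def support(itemset):
        # fraction of baskets that contain every item in itemset
        return sum(itemset <= basket for basket in baskets) / n

    rules = []
    for a, b in combinations(items, 2):
        both = support({a, b})
        if both < min_support:
            continue
        # a pair can yield a rule in either direction
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, confidence))
    return rules

# Hypothetical shopping baskets, made up for illustration.
transactions = [
    ["orange juice", "milk", "bread"],
    ["orange juice", "milk"],
    ["milk", "eggs"],
    ["orange juice", "milk", "eggs"],
    ["bread", "eggs"],
]

for lhs, rhs, s, c in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

On this data every basket with orange juice also contains milk, so "orange juice -> milk" gets a confidence of 1.0, while the reverse rule is weaker because some milk buyers skip orange juice.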
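For the classification task, one simple technique is k-nearest neighbors: a new day is labeled with the majority class among the k most similar days seen before. The sketch below assumes a tiny hypothetical training set of (humidity in percent, temperature drop in °C) readings; the readings, labels, and the name `knn_predict` are illustrative, not taken from any particular tool.

```python
import math
from collections import Counter

# Hypothetical training data: ((humidity %, temperature drop °C), label)
training = [
    ((85, 6), "rain"),
    ((90, 8), "rain"),
    ((80, 5), "rain"),
    ((40, 1), "dry"),
    ((35, 0), "dry"),
    ((50, 2), "dry"),
]

def knn_predict(point, data, k=3):
    """Label `point` by majority vote of its k nearest neighbors (Euclidean distance)."""
    nearest = sorted(data, key=lambda row: math.dist(point, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((88, 7), training))  # rain
print(knn_predict((45, 1), training))  # dry
```

The features are used unscaled, so humidity dominates the distance; with real data the features would normally be standardized first.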
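The clustering task can be illustrated with k-means (Lloyd's algorithm), which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This sketch works on a single hypothetical feature, customer age, with k chosen by hand; the ages, the deterministic starting centroids, and the name `kmeans_1d` are assumptions made for illustration.

```python
def kmeans_1d(values, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on one numeric feature."""
    data = sorted(values)
    # Deterministic start: the middle value of each of k equal-sized chunks.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in data:
            # assign each value to its nearest centroid
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # move each centroid to its cluster mean (keep it if the cluster is empty)
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, clusters

# Hypothetical customer ages.
ages = [19, 22, 25, 41, 45, 48, 67, 70, 74]
centroids, groups = kmeans_1d(ages, k=3)
print(groups)  # [[19, 22, 25], [41, 45, 48], [67, 70, 74]]
```

Unlike fixed age bands, k-means lets the data decide where the group boundaries fall, but the result depends on k and on the starting centroids.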
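For outlier analysis, the 999 example can be caught with Tukey's fences: values more than 1.5 interquartile ranges below the first quartile or above the third are flagged. A z-score rule with a cutoff of 3 can miss it on a sample this small, because the 999 itself inflates the standard deviation. The readings and the name `iqr_outliers` below are hypothetical.

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if v < low or v > high]

# Hypothetical readings that normally range from 0 to 2.
readings = [0, 1, 2, 1, 0, 2, 1, 999, 1, 0]
print(iqr_outliers(readings))  # [999]
```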
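Because a zip code is a label rather than a quantity, a regression of salary on zip code typically encodes each zip code as its own indicator (one-hot) variable. With one indicator per zip code and no intercept, the least-squares coefficients are simply the mean salary in each zip code, so the fit reduces to averaging, and R² measures how much of the salary variation the zip code explains. The salaries and the names `fit_zip_means` and `r_squared` are hypothetical.

```python
from collections import defaultdict

def fit_zip_means(rows):
    """Least-squares fit of salary on one-hot zip-code indicators (no intercept).

    With one indicator per zip code, each coefficient is that zip code's mean salary.
    """
    groups = defaultdict(list)
    for zip_code, salary in rows:
        groups[zip_code].append(salary)
    return {z: sum(s) / len(s) for z, s in groups.items()}

def r_squared(rows, model):
    """Fraction of the salary variance explained by the fitted per-zip means."""
    salaries = [s for _, s in rows]
    mean = sum(salaries) / len(salaries)
    ss_total = sum((s - mean) ** 2 for s in salaries)
    ss_resid = sum((s - model[z]) ** 2 for z, s in rows)
    return 1 - ss_resid / ss_total

# Hypothetical (zip code, annual salary) pairs.
rows = [
    ("10001", 70000), ("10001", 80000),
    ("94105", 110000), ("94105", 130000),
    ("60601", 60000), ("60601", 64000),
]
model = fit_zip_means(rows)
print(model)
print(round(r_squared(rows, model), 3))
```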
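For the time series task, one simple way to expose a seasonal cycle is to compute seasonal indices: the average sales for each month of the year divided by the overall average, so an index above 1.0 marks a month that tends to sell above average. The sketch assumes the series starts in January, covers whole years, and has no trend to remove; the sales figures and the name `seasonal_indices` are hypothetical.

```python
def seasonal_indices(monthly_sales, period=12):
    """Average of each position in the cycle, relative to the overall mean."""
    if len(monthly_sales) % period:
        raise ValueError("series must cover whole cycles")
    overall = sum(monthly_sales) / len(monthly_sales)
    cycles = len(monthly_sales) // period
    # monthly_sales[m::period] picks the same month from every cycle
    return [sum(monthly_sales[m::period]) / cycles / overall for m in range(period)]

# Hypothetical monthly sales for two years, January through December.
sales = [
    80, 75, 90, 95, 100, 100, 105, 110, 100, 110, 130, 205,   # year 1
    88, 82, 97, 101, 108, 106, 112, 118, 107, 119, 141, 221,  # year 2
]
indices = seasonal_indices(sales)
print(round(indices[11], 2))  # December index: 1.89
```

Here December sells at roughly 1.9 times the average month and February is the weakest, which is the kind of recurring pattern the time series task looks for.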