|
The term
'Data
Mining'
refers
to the process of retrieving concealed patterns from large amounts
of data so that these patterns may be used to reveal meaningful
information that would otherwise not be known if data mining wasn't
performed. It is especially useful in profiling applications
for marketing, fraud detection, surveillance, scientific studies,
etc.
Although the general concept of mining useful but
hidden information from data through 'manual' mathematical analysis
has been around for centuries, the term 'data mining' is generally
applied today to the use of computers in mining computer-based data.
Efforts
to standardize the data mining process across industries have been
undertaken, e.g.,
the European
Cross Industry Standard Process for Data Mining (1999) and
the
Java Data Mining standard (2004). But since data mining is
applicable to a wide variety of fields and is basically
computational in nature, continuous evolution of whatever standards
are defined is expected.
At any
rate, four methods commonly used for 'data mining' are:
1) classification
of data into pre-defined groups, e.g., separating spam emails from
valid ones;
2) cluster
analysis or clustering, which is the
grouping of data into distinct (but not predefined) sets or
'clusters', in such a way that data in each cluster share some
distinct similarity with each other;
3) regression
analysis, which is a mathematical method for modeling the
relationships between independent and dependent variables in a
dataset; and
4) association
rule learning - a systematic method for determining relationships
between variables in enormous datasets in order to take advantage of
regularities in these relationships, e.g., analyzing supermarket
point-of-sale data to determine which products are frequently
purchased together so that marketing efforts can be improved.
To be an
effective tool, data mining must meet at least the following: 1) the
data mining process must be designed well to uncover the patterns
being searched; 2) the dataset being mined must contain data
that truly represent the domain of interest; 3) the right data
(those that exhibit the patterns being analyzed) from the dataset
must be sampled during the mining process; and 4) the patterns
uncovered from the data mining process must be subjected to a
validation or verification process to ensure that they are
truly representative of the entire population.
Some
examples of data mining applications include:
1)
customer relationship management, wherein customer profiles and data
are being analyzed to determine what each customer is most likely to
buy (so that these products or services can be offered to them) and
which buying channels the customer is most likely to use (so that
the offers can be made through these channels);
2) human
resources development, wherein the credentials and characteristics
of a company's most successful employees are analyzed to help the
company hire the best people;
3)
retail business, wherein product sales data are analyzed to
determine which products must be offered together;
4)
genetic studies, wherein variations in the human DNA sequence are
being analyzed to determine their relationships with various
diseases and illnesses;
5)
education, wherein student behavioral data are being analyzed in
relation to student retention in a university;
6)
drug reaction surveillance, wherein adverse drug reaction reporting
patterns are being analyzed to detect safety issues with
prescriptive drugs.
See Also:
More Industry
Articles
|