Data Mining

The term 'data mining' refers to the process of extracting hidden patterns from large amounts of data so that these patterns can reveal meaningful information that would otherwise remain unknown. It is especially useful in profiling applications for marketing, fraud detection, surveillance, scientific studies, and similar fields.

Although the general concept of mining useful but hidden information from data through 'manual' mathematical analysis has been around for centuries, the term 'data mining' today generally refers to the use of computers to mine data stored in computer systems.

Efforts have been made to standardize the data mining process across industries, e.g., the European Cross-Industry Standard Process for Data Mining (CRISP-DM, 1999) and the Java Data Mining (JDM) standard (2004). However, because data mining applies to a wide variety of fields and is fundamentally computational, any such standards are expected to keep evolving.

Four methods commonly used in data mining are:

1) classification of data into predefined groups, e.g., separating spam emails from valid ones;

2) cluster analysis or clustering, which is the grouping of data into distinct (but not predefined) sets or 'clusters', so that the data in each cluster are more similar to one another than to the data in other clusters;

3) regression analysis, which is a statistical method for modeling the relationship between one or more independent variables and a dependent variable in a dataset; and

4) association rule learning, which is a systematic method for discovering relationships between variables in very large datasets so that regularities in these relationships can be exploited, e.g., analyzing supermarket point-of-sale data to determine which products are frequently purchased together so that marketing efforts can be improved.
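
As a rough illustration of the classification method in item 1, the sketch below trains a tiny naive Bayes spam filter. Naive Bayes is only one of many possible classifiers and is an illustrative choice, not one named in the text; the function names `train_naive_bayes` and `classify` and the four training messages are hypothetical, and a real filter would need far more data and better tokenization.

```python
import math
from collections import Counter

def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return label_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_examples)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words don't zero out the score
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "valid"),
    ("project report attached for review", "valid"),
]
model = train_naive_bayes(training)
print(classify("claim your free prize", model))  # spam
print(classify("monday meeting report", model))  # valid
```

Working with log probabilities instead of multiplying raw probabilities avoids numeric underflow when messages are long.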
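
Clustering (item 2) can be done with many algorithms; k-means is one common choice and is used here purely as an example. The sketch below is a plain-Python version of the standard k-means iteration on 2-D points. The function names `assign` and `k_means`, the sample points, and the caller-supplied starting centroids are assumptions for illustration; practical implementations usually pick starting centroids randomly or with a seeding scheme such as k-means++.

```python
import math

def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, initial_centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; leave empty clusters in place
        updated = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical 2-D data with two obvious groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(points, initial_centroids=[(0, 0), (10, 10)])
print(centroids)  # roughly (1.17, 1.17) and (8.5, 8.83)
print(clusters)
```

Note that k requires the number of clusters to be chosen up front, even though the clusters themselves are not predefined.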
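
For the regression analysis in item 3, the simplest case is fitting a straight line y = a + bx to a single independent variable by ordinary least squares. The sketch below uses the closed-form estimates b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and a = mean_y - b * mean_x; the function name `fit_line` and the four data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: independent variable x, dependent variable y
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f}x")  # y = 0.15 + 1.94x
```

With several independent variables the same idea generalizes to multiple linear regression, which is normally solved with a linear-algebra library rather than by hand.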
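
The supermarket example in item 4 is usually expressed through two measures: the support of a rule A -> B (the fraction of all transactions containing both A and B) and its confidence (the fraction of transactions containing A that also contain B). The sketch below counts every item pair directly, which is only practical for small data; algorithms such as Apriori exist to prune the search on large datasets. The function name `pair_rules`, the thresholds, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical point-of-sale baskets
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
for lhs, rhs, support, confidence in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
# butter -> bread: support=0.50, confidence=1.00
```

Here every basket with butter also contains bread, but not the reverse, which is why only the rule butter -> bread passes the confidence threshold.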

To be effective, data mining must meet at least the following requirements:

1) the data mining process must be well designed to uncover the patterns being sought;

2) the dataset being mined must contain data that truly represent the domain of interest;

3) the right data (those that exhibit the patterns being analyzed) must be sampled from the dataset during the mining process; and

4) the patterns uncovered by the data mining process must be validated or verified to ensure that they are truly representative of the entire population.
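
Requirement 4 is often met in practice with hold-out validation: part of the data is set aside before mining, and any pattern found on the rest is then checked against it. The sketch below assumes labeled (value, label) records and uses a deliberately simple threshold rule as a stand-in for whatever pattern a real mining step would produce; the function names `holdout_split`, `learn_threshold`, and `accuracy`, the 30% test fraction, and the data are all hypothetical.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split them into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def learn_threshold(records):
    """Toy 'pattern': the midpoint between the mean values of the two classes."""
    positives = [v for v, label in records if label]
    negatives = [v for v, label in records if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2

def accuracy(threshold, records):
    """Fraction of records whose label matches the rule 'value > threshold'."""
    return sum((v > threshold) == label for v, label in records) / len(records)

# Hypothetical records: the label is True when the value exceeds 50
records = [(v, v > 50) for v in range(0, 100, 5)]
train, test = holdout_split(records)
threshold = learn_threshold(train)
print(f"training accuracy: {accuracy(threshold, train):.2f}")
print(f"held-out accuracy: {accuracy(threshold, test):.2f}")
```

The held-out score matters more than the training score: a pattern that fits the training data well but scores poorly on unseen records is unlikely to be representative of the wider population.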

Some examples of data mining applications include:

1) customer relationship management, wherein customer profiles and data are analyzed to determine what each customer is most likely to buy (so that these products or services can be offered to them) and which buying channels the customer is most likely to use (so that the offers can be made through those channels);

2) human resources development, wherein the credentials and characteristics of a company's most successful employees are analyzed to help the company hire the best people;

3) retail business, wherein product sales data are analyzed to determine which products should be offered together;

4) genetic studies, wherein variations in the human DNA sequence are analyzed to determine their associations with various diseases;

5) education, wherein student behavioral data are analyzed to understand student retention at a university;

6) drug reaction surveillance, wherein adverse drug reaction reporting patterns are analyzed to detect safety issues with prescription drugs.