Data mining is the process of analyzing data to find interesting knowledge or patterns in it. Large amounts of data may be stored in databases, and to benefit from this data it is necessary to analyze and understand it. Data that we cannot understand or draw meaningful conclusions from is useless. There are various ways of analyzing data. Traditionally, data has been analyzed by hand to discover interesting knowledge, but analyzing data by hand is tedious and has several drawbacks.
Some of the major drawbacks are:
- It is time-consuming,
- It is prone to errors,
- Important information might be overlooked,
- It is not realistic for large databases.
To overcome these problems, automated techniques have been developed to analyze data and find interesting patterns, rules, or other useful information. Data mining techniques extract knowledge from the data and can be used to predict future outcomes, so that decisions can be based on facts.
What is the process for analyzing data?
To perform data mining, a process consisting of seven steps is usually followed. This process is often called the “Knowledge Discovery in Databases” (KDD) process.
- Data cleaning: In this step we remove noise and inconsistencies that may cause problems when analyzing the data.
- Data integration: In this step we combine data from various sources to prepare the data that needs to be analyzed. For example, if the data is stored in multiple databases or files, it may be necessary to integrate it into a single file or database before analyzing it.
- Data selection: In this step we select the data that is relevant to the analysis to be performed.
- Data transformation: In this step we transform the data into a format that can be analyzed using data mining techniques.
- Data mining: In this step we apply data mining techniques to analyze the data and find interesting patterns or knowledge in it.
- Evaluation: In this step we evaluate the knowledge that has been extracted from the data. This can be done using objective or subjective measures.
- Visualization: In this step we visualize the knowledge that has been obtained from the data.
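
The seven steps above can be sketched as a small pipeline. The following Python example is only an illustration under stated assumptions: the purchase records, the function names, and the choice of frequent item pairs as the mining technique are all hypothetical and are not part of the KDD process itself. Support (the fraction of transactions that contain a pattern) is used as one possible objective measure for the evaluation step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: purchase records from two separate sources.
store_a = [
    {"id": 1, "items": "Bread, Milk"},
    {"id": 2, "items": "bread, butter, milk"},
    {"id": 3, "items": None},                  # noisy record: missing value
]
store_b = [
    {"id": 4, "items": "Milk, Butter"},
    {"id": 5, "items": "bread, milk, eggs"},
    {"id": 5, "items": "bread, milk, eggs"},   # duplicate record
]


def clean(records):
    """Data cleaning: drop records with missing items and duplicate ids."""
    seen, result = set(), []
    for rec in records:
        if rec["items"] is None or rec["id"] in seen:
            continue
        seen.add(rec["id"])
        result.append(rec)
    return result


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [rec for source in sources for rec in source]


def select(records):
    """Data selection: keep only the field relevant to the analysis."""
    return [rec["items"] for rec in records]


def transform(item_strings):
    """Data transformation: turn each string into a set of normalized items."""
    return [{item.strip().lower() for item in s.split(",")}
            for s in item_strings]


def mine(transactions, min_support):
    """Data mining: keep item pairs whose support is at least min_support."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def evaluate(patterns):
    """Evaluation: rank patterns by support, highest first."""
    return sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))


def visualize(ranked):
    """Visualization: render each pattern as a simple text bar."""
    return [f"{' + '.join(pair):<16} {'#' * int(support * 10)} {support:.2f}"
            for pair, support in ranked]


records = integrate(clean(store_a), clean(store_b))
transactions = transform(select(records))
ranked = evaluate(mine(transactions, min_support=0.5))
for line in visualize(ranked):
    print(line)
```

With this toy data, four transactions survive cleaning, and the pairs bread + milk (support 0.75) and butter + milk (support 0.50) pass the threshold. In practice each step is usually far more involved, and the steps are often repeated as the results are evaluated.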