You run a local departmental store and you log all the details of your customers in the store database. You know the names of your customers and what items they buy each day.
For example, Alex, Jessica and Paul visit your store every Sunday and buys candle. You store this information in your store database. This is data. Any time you want to know who are the visitors that buy candle, you can query your database and get the answer. This is information. You want to know how many candles are sold on each day of week from your store, you can again query your database and you’d get the answer – that’s also information.
But suppose there are 1000 other customers who also buy candle from you on every Sunday (mostly – with some percentage of variations) and all of them are Christian by religion. So, you can conclude that Alex, Jessica and Paul must be also Christian.
Now the religion of Alex, Jessica and Paul were not given to you as data. This could not be retrieved from the database as information. But you learnt this piece of information indirectly. This is the ”knowledge” that you discovered. And this discovery was done through a process called “Data Mining”.
Now there are chances that you are wrong about Alex, Jessica and Paul. But there are fare amount of chances that you are actually right. That is why it is very important to “evaluate” the result of KDD process.
I gave you this example because I wanted to make a clear distinction between knowledge and information in the context of data mining. This is important to understand our first question – why retrieving information from deep down of your database is not same as data mining. No matter how complex the information retrieval process is, no matter how deep the information is located at, it’s still not data mining.
As long as you are not dealing with predictive analysis or not discovering “new” pattern from the existing data – you are not doing data mining.
No comments:
Post a Comment