|
Data warehouses can be extremely valuable tools for
making intelligent business decisions. However, they
frequently result in very large databases that are
difficult to understand and use.
Justifying the cost of a data warehouse can also be
difficult, since the payoffs are initially hypothetical
and will only be realized by deriving value from the
mass of data that has been accumulated. Without new
insights from the data or tangible benefits from actions
directly attributable to analysis of the data,
frustration and disillusion with the technology (and its
cost) may result. Data warehouse users may feel that
they are "data rich, information poor" or "drowning in
data but lacking information."
The challenge is turning data into information -- and
putting that information into action.
Without information, action is either impossible or
foolhardy.
Data mining can be thought of as the software and
applications technology that turns data into information
and fulfills the promise of data warehousing. Data
mining is really a means of knowledge discovery --
finding a set of patterns that turns data into
information.
Until recently, data mining was the preserve of
specialists -- statisticians or machine learning experts
-- who practiced their art using arcane, often homegrown
tools. Faced with poorly organized data, these experts
expended much of their energy on cleaning up data to get
it into good enough shape to be processed. Now, much of
this cleanup can be performed during the construction of
the data warehouse.
At the same time, a new generation of data mining
tools aimed at the business user, rather than the
expert, has emerged. These tools mask the complexities
of the algorithms and are easy enough to be used by
sophisticated business analysts -- people who know the
business problems being addressed and understand the
data involved in their solutions.
Data mining is a broad technology that can
potentially benefit any functional area in a business
where there is a major need or opportunity for improved
performance and where data analysis can contribute to that
improvement.
Part of the power of data mining is that it not only
solves difficult business problems, but it does so in
ways that are repeatable. The data mining process
involves developing models that can be used to solve the
business problem at hand. Since they are models, they
can be reused on new data. As the data in the warehouse
is refreshed, the models can be re-run on new data and
new results obtained.
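
As a minimal sketch of this reuse, assume a deliberately simple
"model" that records the observed response rate for each customer
segment. The function names, segment labels, and sample records below
are hypothetical illustrations, not part of any particular data mining
product. The point is only that the fitted model is a separate object
that can be applied again whenever the warehouse is refreshed.

```python
from collections import defaultdict


def fit_response_rates(records):
    """Learn the observed response rate for each customer segment.

    `records` is an iterable of (segment, responded) pairs, where
    `responded` is True or False. (Hypothetical record layout.)
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, total]
    for segment, responded in records:
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {seg: resp / total for seg, (resp, total) in counts.items()}


def score(model, segments, default=0.0):
    """Apply an already fitted model to new customers by segment."""
    return [model.get(seg, default) for seg in segments]


# Historical data used once to build the model (made-up values).
history = [
    ("urban", True), ("urban", False), ("urban", True),
    ("rural", False), ("rural", False), ("rural", True),
]
model = fit_response_rates(history)

# Later, the warehouse is refreshed; the same model is simply re-run.
refreshed_customers = ["urban", "rural", "suburban"]
scores = score(model, refreshed_customers)
```

Segments the model has never seen (here "suburban") fall back to a
default score, which is one simple way to flag that the model may need
revising.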
If patterns in the data change significantly over
time (for example, when purchasing propensities shift
with new tastes), the models can be retrained using new data and
can give different results. Thus, after analyzing the
effectiveness of a Thanksgiving promotion, a retailer
can retrain the model to analyze Christmas
promotions. If new types of data are added, the model
can be revised to take the influence of these new
attributes into account.
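
The retraining idea can be sketched as a check-then-retrain step. This
is an illustration under stated assumptions: the model is a simple
majority vote per customer segment, the 0.7 accuracy threshold is
arbitrary, and the promotion data is invented. Real tools would use
richer models, but the control flow is the same.

```python
from collections import Counter, defaultdict


def train(records):
    """Fit a majority-vote model: the most common outcome per segment."""
    outcomes = defaultdict(Counter)
    for segment, bought in records:
        outcomes[segment][bought] += 1
    return {seg: c.most_common(1)[0][0] for seg, c in outcomes.items()}


def accuracy(model, records):
    """Fraction of records whose outcome the model predicts correctly."""
    hits = sum(model.get(seg, False) == bought for seg, bought in records)
    return hits / len(records)


def refresh(model, new_records, threshold=0.7):
    """Keep the model if it still fits the new data; otherwise retrain.

    Returns (model, retrained_flag). The threshold is an assumption.
    """
    if accuracy(model, new_records) >= threshold:
        return model, False
    return train(new_records), True


# Invented promotion results: (segment, bought).
thanksgiving = [
    ("families", True), ("families", True),
    ("students", False), ("students", False),
]
christmas = [
    ("families", True), ("families", True),
    ("students", True), ("students", True),
]

old_model = train(thanksgiving)
new_model, retrained = refresh(old_model, christmas)
```

In this made-up data the Thanksgiving model is right about only half of
the Christmas records, so it falls below the threshold and is retrained
on the newer data.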
This is sometimes called "generalized insight,"
meaning that, unlike insights gained with query or
analytical tools ("specialized insight"), data mining
insights are reusable. This represents a major step
forward in information technology toward the goal of
"continuous insight," where the system will one day
constantly monitor events and automatically adapt to a
new environment.
Companies should make data mining an integral and
continuous part of their business processes. Having
built a model, they can regularly calibrate its accuracy
and revise it when necessary or on a scheduled basis.
Companies can continue to build more sophisticated and
more pinpointed models. For example, they can map
customers into segments and follow and predict their
progress from one segment to another. They can develop
"customer lifetime value" models to guide marketing and
product development efforts. And they can feed results
from one campaign into the development of models for the
next campaign. Data mining becomes a way of life and a
means for staying ahead of the competition.
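
Following customers from one segment to another can be sketched as
estimating transition probabilities from observed moves. The segment
names and the sample moves below are hypothetical, and counting
transitions is only one simple way to approach the problem; it is shown
here as an illustration rather than a recommended method.

```python
from collections import Counter, defaultdict


def transition_probabilities(moves):
    """Estimate P(next segment | current segment) from observed moves.

    `moves` is an iterable of (segment_before, segment_after) pairs.
    """
    counts = defaultdict(Counter)
    for before, after in moves:
        counts[before][after] += 1
    return {
        before: {after: n / sum(c.values()) for after, n in c.items()}
        for before, c in counts.items()
    }


def most_likely_next(probs, segment):
    """Return the most probable next segment, or None if unseen."""
    options = probs.get(segment)
    if not options:
        return None
    return max(options, key=options.get)


# Invented period-over-period observations.
moves = [
    ("new", "occasional"), ("new", "occasional"), ("new", "lapsed"),
    ("occasional", "loyal"), ("occasional", "occasional"),
    ("occasional", "loyal"),
]
probs = transition_probabilities(moves)
prediction = most_likely_next(probs, "new")
```

The resulting table could feed the kind of campaign planning described
above, for example by targeting customers whose most likely next
segment is an undesirable one.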
Data mining is a process that can provide valuable
returns on investment when it draws on a highly detailed,
customer-centric data warehouse to gain new insight into
transactions and behaviors.
|