The untapped potential of time series data mining
By Wayne Thompson, Udo Sglavo and Sascha Schubert
Financial planning and budgeting, supply chain management, retail replenishment and planning – these are just a few of the critical business functions that benefit greatly from data mining, forecasting and time series analysis, three established disciplines of analytics. These three disciplines are used in many industries for many different functions, but now leading organizations are recognizing the impact of combining them to create a more powerful brand of predictive analytics. Before describing the wide array of business advantages that can be gained by integrating data mining, time series analysis and forecasting, let’s look at some definitions.
Data mining is a collection of analytical techniques that automate the search for patterns within a large portfolio of characteristics, yielding relationships that can be used for improved decision-making. For example, based on the characteristics of a customer – such as age, demographics, product portfolio, contact history and others – data mining can be used to identify a set of customers most likely to respond to a specific marketing offer.
Time series analysis and forecasting are used to detect temporal patterns from historical time-dependent data and project the detected patterns (such as trend or seasonality) into the future. For example, time series analysis plays a crucial role in forecasting electricity demand for the utilities industry. Electricity demand follows long-term trends, such as population growth and industrial activity, as well as shorter seasonal cycles for time of year (summer versus winter), day of the week (business days versus weekends) and time of day (peak demand to drive air conditioning on hot summer afternoons, and low demand in the middle of the night). Good software detects and reconciles the various temporal patterns and provides both the long-range and near-term forecasts that utility companies require.
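To make the idea of detecting a pattern and projecting it forward concrete, here is a minimal sketch in Python. It assumes the series can be described by a straight-line trend plus one repeating seasonal cycle of known length, which is far simpler than what commercial forecasting software does. The function names and the demand figures are illustrative, not taken from any product.

```python
from statistics import mean

def fit_trend_and_season(series, period):
    """Fit a least-squares line, then average what is left over
    at each position of the seasonal cycle."""
    xs = list(range(len(series)))
    x_bar, y_bar = mean(xs), mean(series)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, series)]
    # One seasonal offset per position in the cycle (e.g. per quarter).
    seasonal = [mean(residuals[i::period]) for i in range(period)]
    return intercept, slope, seasonal

def forecast(series, period, horizon):
    """Project trend plus seasonal offsets `horizon` steps past the data."""
    intercept, slope, seasonal = fit_trend_and_season(series, period)
    n = len(series)
    return [intercept + slope * t + seasonal[t % period]
            for t in range(n, n + horizon)]

# Made-up quarterly demand: steady growth with a repeating seasonal swing.
quarterly_demand = [101, 101, 103, 107, 109, 109, 111, 115, 117, 117, 119, 123]
next_year = forecast(quarterly_demand, period=4, horizon=4)
```

A series with several overlapping cycles, such as time of day, day of week and time of year, needs a richer model than this single-cycle sketch.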
Time series data mining combines data mining with time series analysis to:
- Extract features of time series data (such as seasonal patterns) for building better predictive models.
- Reduce time series data into fewer dimensions by using data mining methods, such as variable selection and clustering.
- Conduct similarity analysis of time series data (pattern detection) for segmenting data, or validating forecasts of new products.
Some practical applications of time series data mining include:
Marketing – To predict the response of a customer to a specific marketing offer, the recency of contact plays an important role. Customers who have reacted positively to a recent offer are more likely to react positively again. Marketing departments across all industries have been using this insight for more targeted customer relationship management. Advanced predictive models often use proxy variables to capture the temporal aspect of the relationship between historical customer behavior and desired future outcome. For example, analysts often manually summarize transactional data about product usage into a set of time series, such as total monthly air minutes, maximum total air use, and change from previous month for mobile phone use. The time series data is then used as input to the predictive models.
But because this temporal effect is integrated manually, it further complicates the already tedious data preparation. Time series data mining (TSDM) tools automate the data preparation phase so that temporal relationships are included in predictive modeling. This speeds up data preparation and can improve the accuracy of predictive models.
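The summarization step described above, which analysts often perform by hand, can be sketched as follows. This is an illustration only: the record layout (customer ID, ISO date string, minutes) and the function name are assumptions, and a real TSDM tool would handle calendars, gaps and many more features.

```python
from collections import defaultdict

def monthly_usage_features(calls):
    """Roll call records up into per-customer monthly features.

    calls: iterable of (customer_id, 'YYYY-MM-DD', minutes) tuples
    (a hypothetical layout chosen for this sketch).
    """
    monthly = defaultdict(lambda: defaultdict(float))
    for customer_id, date, minutes in calls:
        monthly[customer_id][date[:7]] += minutes  # key by 'YYYY-MM'

    features = {}
    for customer_id, by_month in monthly.items():
        months = sorted(by_month)
        totals = [by_month[m] for m in months]
        features[customer_id] = {
            "monthly_minutes": dict(zip(months, totals)),
            "max_monthly_minutes": max(totals),
            # Compares the last two months that have any calls; a month
            # with no records is simply absent, not counted as zero.
            "change_from_previous_month":
                totals[-1] - totals[-2] if len(totals) > 1 else 0.0,
        }
    return features

# Made-up call records for two customers.
calls = [
    ("a", "2013-01-05", 10), ("a", "2013-01-20", 5),
    ("a", "2013-02-01", 30), ("b", "2013-02-03", 7),
]
usage = monthly_usage_features(calls)
```

The resulting dictionary holds one row of model inputs per customer, which is the form most predictive modeling tools expect.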
Inventory management – Often, time series information is collected on a very granular level in organizations. For example, retailers measure sales of items in a store on the SKU level and in daily time intervals. For stores with thousands of items, this results in a large number of time series with many records because historical data is sometimes collected over many years. This volume of data often makes it difficult to extract information relevant for decision-making. TSDM tools help analysts quickly reduce the dimensionality of the problem under investigation and extract signals from the noise. SKUs with similar sales trends can be combined into segments without losing critical information. Time series analysis techniques, such as smoothing, can help compress detailed information into a picture that makes general patterns easier to spot.
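As a rough illustration of smoothing and segmenting, the sketch below applies a moving average to each SKU's daily sales and then groups SKUs by the direction of the smoothed trend. The three-way split into rising, falling and flat is a deliberately crude stand-in for the clustering methods a TSDM tool would offer, and the function names, window length and tolerance are assumptions.

```python
from collections import defaultdict

def moving_average(series, window):
    """Average over a sliding window; returns len(series) - window + 1 points."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def segment_by_trend(sales_by_sku, window=7, tolerance=0.05):
    """Group SKUs by whether their smoothed sales rise, fall or stay flat."""
    segments = defaultdict(list)
    for sku, series in sales_by_sku.items():
        smooth = moving_average(series, window)
        level = sum(smooth) / len(smooth)
        # Change between first and last smoothed point, relative to the
        # average smoothed level (zero when the SKU never sold).
        change = (smooth[-1] - smooth[0]) / level if level else 0.0
        if change > tolerance:
            segments["rising"].append(sku)
        elif change < -tolerance:
            segments["falling"].append(sku)
        else:
            segments["flat"].append(sku)
    return dict(segments)

# Made-up daily unit sales for three SKUs over two weeks.
sales = {
    "SKU-A": list(range(10, 24)),
    "SKU-B": list(range(23, 9, -1)),
    "SKU-C": [5] * 14,
}
segments = segment_by_trend(sales)
```

Each segment can then be analyzed or forecast as a group instead of SKU by SKU.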
Fraud detection – The similarity analysis provided with TSDM tools works on the most detailed level in order to spot exceptions to average behavior. Credit card providers can use time series data mining to automate the detection of fraudulent behavior in financial transactions. They do this by comparing many detailed transactional time series against a known pattern of abusive behavior. It is often only in looking across the temporal representation of behavior that undesired behavior becomes apparent. The similarity analysis tool can quickly detect behavior over time with known signatures of fraud and create flags for further investigation if similar patterns are detected.
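One simple way to compare a transaction series against a known signature is to slide a window along the series, rescale each window, and measure its distance to the signature. The sketch below uses z-normalized Euclidean distance, which is one common choice and not necessarily what any particular TSDM product uses. The function names, the threshold and the data are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

def z_normalize(window):
    """Rescale to zero mean and unit variance so that shape, not size, is compared."""
    mu, sigma = mean(window), pstdev(window)
    if sigma == 0:
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_matches(series, signature, threshold):
    """Return (start_index, distance) for each window that resembles the signature."""
    m = len(signature)
    target = z_normalize(signature)
    matches = []
    for start in range(len(series) - m + 1):
        d = euclidean(z_normalize(series[start:start + m]), target)
        if d <= threshold:
            matches.append((start, d))
    return matches

# Made-up signature: two small test charges followed by escalating purchases.
signature = [1, 1, 50, 60, 70]
# Made-up daily spend for one card; the same shape appears at twice the scale.
daily_spend = [20, 22, 19, 21, 2, 2, 100, 120, 140, 20, 21]
flags = find_matches(daily_spend, signature, threshold=0.1)
```

Because each window is rescaled before comparison, the pattern is flagged even though the amounts differ from the signature. Whether that is desirable depends on the fraud scenario, and the threshold would need tuning on real data.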
New product forecasting – A never-ending challenge for consumer goods manufacturers and retailers, new product forecasting situations include: predicting entirely new types of products, new markets for existing products (such as expanding a regional brand nationally or globally) and refinements of existing products (such as “new and improved” versions or packaging changes). All of them require a forecast of future sales without historic data for the new product. However, by using techniques such as similarity analysis, an analyst can examine the demand patterns of past new products having similar attributes and identify the range of demand curves that can be used to model demand for the new product.
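A minimal sketch of the analog approach: select past launches that share enough attributes with the new product, then summarize their demand week by week since launch. Matching on a count of shared attribute labels is an assumption made here for simplicity, and the function name and data are illustrative.

```python
def analog_demand_range(history, new_attributes, min_shared=2):
    """Summarize demand curves of past launches similar to a new product.

    history: list of (attributes, weekly_sales) pairs, where attributes is
    a set of labels and weekly_sales is indexed by week since launch.
    Returns one (low, average, high) tuple per week, limited to the
    shortest analog's history.
    """
    analogs = [sales for attrs, sales in history
               if len(attrs & new_attributes) >= min_shared]
    if not analogs:
        return []
    # zip stops at the shortest curve, so every week covers all analogs.
    return [(min(week), sum(week) / len(week), max(week))
            for week in zip(*analogs)]

# Made-up launch histories.
history = [
    ({"snack", "organic", "single-serve"}, [100, 80, 60]),
    ({"snack", "organic"}, [120, 100, 90, 70]),
    ({"beverage"}, [500, 400, 300]),
]
new_product = {"snack", "organic", "family-size"}
demand_range = analog_demand_range(history, new_product)
# → [(100, 110.0, 120), (80, 90.0, 100), (60, 75.0, 90)]
```

The low and high values give the analyst a plausible band for the new product's demand, and the average offers a starting point for the forecast.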
By integrating data mining with time series analysis and forecasting, organizations can take the next step in extracting more value from their data to improve decision-making.
Wayne Thompson (Wayne.Thompson@sas.com) is a SAS analytics product manager. Udo Sglavo (Udo.Sglavo@sas.com) is an analytical consultant for SAS. Sascha Schubert (Schubert@sas.com) is a senior solutions architect for SAS Germany. This article is republished with permission from sascom 4Q.