Analytics Magazine

Text Analytics: Mining for Intelligence

January/February 2011


Deriving meaning from the deluge of documents and purging content chaos.

By Fiona McNeill

Experience is a valuable asset, but how do organizations capture and organize such assets for use in decision-making?

Given that 70 percent to 80 percent of all data is unstructured, one can reasonably assume that organizational decision-making – based on data warehouses, business reporting and the like – has often been based on only 20 percent to 30 percent of the available data. There’s great opportunity for improvement, then, when organizations include the intelligence buried in unstructured data assets in the decision-making process.

This means identifying any unique attributes in text collections and harmonizing them with the structured data, including that which is generated by operational systems. It means ensuring that decisions are based on all the available information. That gives organizations a whole new lease on the intelligence they derive.

You no longer have to encode the characteristics into discrete categories, such as customer complaints or bad reviews on blogs. With today’s text analytics technology, this data can be input in its unstructured form, and the technology directly and consistently classifies the essence of the material. This puts context back into the numbers.

I have a condition I call “formaphobia”; I have a really hard time filling in boxes to describe me, my life, feelings and needs. I’d be glad if I never had to check off another box! Creating inputs as boxes was the result of limitations in recording information defined by the relational systems that we are all familiar with. But no longer does the technology have to drive how you receive input from your customers, constituents and operations. Now you can examine, utilize and decipher from the text what you need, and in the case of text mining, find new items that you never even knew existed.

How Organizations Turn Text into Gold

Technology plays a strategic role in decision-making. A business analytics framework integrates all the required elements needed to make decisions and intelligently improve operations.

Text analytics technologies allow new, numeric representations of text to be embedded into traditional statistical and forecasting models. From there, results of this new, previously unknown insight are delivered to the end user/information consumer within the reporting capabilities of the framework.
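To make "numeric representations of text" concrete, here is a minimal sketch of one of the simplest such representations, a term-document count matrix. It is an illustration only, not a description of any particular product; the function names and the two sample service notes are hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] is the number of times
    vocabulary[j] occurs in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical free-text service notes.
notes = [
    "Battery failed after two weeks",
    "Great battery life, great screen",
]
vocabulary, rows = term_document_matrix(notes)
```

Each row is now an ordinary numeric vector, so it could sit alongside structured fields such as product line or region as extra input columns to a statistical or forecasting model.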

All of this can be invisible to the information consumer; they could be looking at a spreadsheet or their corporate dashboard and see the forecasted volume of positive and negative sentiment being expressed across different Web channels, for example.

Effective text analytics solutions – in fact, the only sustainable, semantic-driven “smart” ones – are most successful when you don’t even know they are there. What you do know is that your searches find relevant, accurate information, and you see reductions in warranty claims, workplace injuries and customer complaints. Performance is as predicted, and your subject matter experts are freed to do even more intelligent tasks for your organization.

Keep in mind that circumstances change over time. In fact, customer intelligence marketing is designed to do exactly that – change behavior. If your methods are effective, then sentiment will adjust, new words will be used to describe you relative to your competitors, and public perception will change. As such, these systems are not static. They must be dynamic, editable and flexible to change with your business.

Three Keys to Success

Text analytics is designed to extract and decipher the meaning held within documents from repositories, blogs, tweets, customer communication systems, claims and service notes, to name a few. The extraction associated with content categorization and ontologies is a historically focused activity – looking at what was already written and transcribed, and deciding what the core meaning is within that particular text. When we extend to sentiment analysis and text mining, we discover completely new things that never would be identified by looking at each document in isolation. Semantics are important.

Three keys to success in this regard:
1. With smart organizational cultures, semantic-driven implementations are successful when end-user employees don’t even know the technologies are there – the capabilities are embedded into business processes. No longer are we building little-used “field of dreams warehouses.” These are intelligent systems, implemented in operational reporting and activity systems.

2. Semantics is fundamentally about encoding knowledge, recording it and processing it in a meaningful way. In that codification, the text analytics technology documents how people think about and interpret the written word. It’s not Y2K encoding – we have all learned from that. This uses dynamic, learning structures designed to identify concepts and topics and change over time.

3. Organizations need to take a strategic approach. Semantic technology must be open to new rule definitions and adapt to new knowledge the organization derives – refining rules, including new topics. It needs to change with the organization. This means more than just getting through the materials faster. While that is part of the challenge, review of each document in and of itself will not give the insight that you get from examining the entire collection and mining it for new discovery.

Text Analytics Yield Big Wins

A manufacturer with a rich acquisition history had over a million product numbers across four different brands – they had been through a lot of change. For them, getting through that material was like reading 500 copies of “War and Peace.” Text analytics helped them make sense of the data – defining well-organized concepts, capturing key identifiers and predicting which categories any particular product belongs to with 95 percent accuracy.

In the medical field, text data research that examined each patient’s entire unit of care, as well as collections of patient records, has revealed that the frequently asserted link between diabetes and obesity may not hold true per se. It is at the threshold between normal and obese where the peak of diabetes occurrence is happening – not simply in the obese classification of the body mass index (BMI) spectrum.

A large insurance company used text analytics in evaluating workplace injury claims and found 600 completely new cases of threatening situations (new concepts or categories, if you will) from free-form, handwritten text that would not have been discovered by using code matching.

These examples also help explain the difference between mining and extraction. Looking for known items and concepts and extracting them from text via machine learning (semantic technologies) are advancements associated with text analytics. But it is not mining. Finding things that you already know is not discovery. Identifying completely new concepts or previously unknown associations by examining collections of documents is what mining is all about. Identifying new patterns and word combinations and isolating emerging issues – now that is mining.
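The distinction can be sketched in a few lines of code. The example below is a deliberately crude illustration under stated assumptions: the concept list, stopword list and claim notes are all hypothetical, and a simple document-frequency count stands in for the statistical and machine learning techniques a real text mining system would use.

```python
from collections import Counter
import re

# Hypothetical: concepts the organization already codes for.
KNOWN_CONCEPTS = {"burn", "fall", "cut"}
# Hypothetical: a tiny list of words too common to be informative.
STOPWORDS = {"a", "the", "and", "on", "from", "after", "near"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_known(documents, known=KNOWN_CONCEPTS):
    """Extraction: count documents mentioning concepts we already look for."""
    found = Counter()
    for doc in documents:
        found.update(set(tokenize(doc)) & known)
    return found


def mine_candidates(documents, known=KNOWN_CONCEPTS,
                    stopwords=STOPWORDS, min_docs=2):
    """Mining (crudely): surface terms outside the known list that recur
    in at least min_docs documents across the whole collection."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)) - known - stopwords)
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)


# Hypothetical free-form claim notes.
claims = [
    "Worker suffered a burn near the press",
    "Fumes from solvent caused dizziness",
    "Dizziness reported after solvent spill",
    "Slip and fall on the loading dock",
]
known_hits = extract_known(claims)
new_terms = mine_candidates(claims)
```

Here extraction finds only "burn" and "fall", the items it was told to look for. The mining step surfaces "dizziness" and "solvent", which appear in no existing code and only stand out when the collection is examined as a whole.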

There is an art and a science to understanding text, in deriving meaning from the deluge of materials and, by consequence, purging the content chaos. Because the goal is to mirror the human rationalizing process, how we interpret materials using machine learning and statistical techniques requires refinements from human subject-matter experts. Experience is a valuable asset, and dynamically encoding it so the entire organization can systematically and consistently draw upon it can produce an army of experts.

As global product marketing manager, Fiona McNeill oversees the product marketing of text analytics at SAS. During her 12-year tenure, she has helped organizations derive tangible benefit from their strategic use of technology. In addition to working with a wide range of industries, McNeill has defined product strategy and corporate relationships at SAS. Before joining SAS, McNeill was a member of IBM Global Services.



