

Analytics Magazine

Text Analytics: Mining for Intelligence

January/February 2011


Deriving meaning from the deluge of documents and purging content chaos.

By Fiona McNeill

Experience is a valuable asset, but how do organizations capture and organize such assets for use in decision-making?

Given that 70 percent to 80 percent of all data is unstructured, one can reasonably assume that organizational decision making – based on data warehouses, business reporting and the like – has often been based on only 20 percent to 30 percent of the available data. There’s great opportunity for improvement, then, when organizations include the intelligence buried in unstructured data assets in the decision-making process.

This means identifying the unique attributes in text collections and harmonizing them with the structured data, including data generated by operational systems. It means ensuring that decisions are based on all the available information, and it opens a whole new level of derived intelligence to organizations.

You no longer have to encode characteristics into discrete categories, such as customer complaints or bad reviews on blogs. With today’s text analytics technology, this material can be taken in as unstructured data and directly, consistently classified by its essence. This puts context back into the numbers.

I have a condition I call “formaphobia”: I have a really hard time filling in boxes to describe myself, my life, my feelings and my needs. I’d be glad if I never had to check off another box! Inputs took the form of boxes because of the limits that the familiar relational systems imposed on how information could be recorded. But technology no longer has to drive how you receive input from your customers, constituents and operations. Now you can examine the text itself, decipher from it what you need and, in the case of text mining, find new items you never even knew existed.

How Organizations Turn Text into Gold

Technology plays a strategic role in decision-making. A business analytics framework integrates all the required elements needed to make decisions and intelligently improve operations.

Text analytics technologies allow new, numeric representations of text to be embedded into traditional statistical and forecasting models. From there, this previously unknown insight is delivered to the end user, the information consumer, within the reporting capabilities of the framework.
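The idea of a numeric representation can be sketched in a few lines of plain Python. This is a toy bag-of-words example under my own assumptions, not SAS’s implementation: each document becomes a row of term counts that a downstream statistical or forecasting model could consume.

```python
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into bare word tokens."""
    return [w.strip(".,!?").lower() for w in text.split()]

def vectorize(docs):
    """Map each document to term counts over a shared vocabulary."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({w for counts in tokenized for w in counts})
    # Each row is the numeric representation of one document;
    # Counter returns 0 for words the document does not contain.
    return vocab, [[counts[w] for w in vocab] for counts in tokenized]

docs = ["Great product, works well", "Product failed, very poor"]
vocab, matrix = vectorize(docs)
```

Production systems use far richer representations (weighted terms, concepts, topics), but the principle is the same: once text is a row of numbers, it can sit alongside transactional data in any model.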

All of this can be invisible to the information consumer; they could be looking at a spreadsheet or their corporate dashboard and see the forecasted volume of positive and negative sentiment being expressed across different Web channels, for example.

Effective text analytics solutions – in fact, the only sustainable, semantic-driven “smart” ones – are most successful when you don’t even know they are there. What you do know is that your searches find relevant, accurate information, and you see reductions in warranty claims, workplace injuries and customer complaints. Performance is as predicted, and your subject matter experts are freed to do even more intelligent tasks for your organization.

Keep in mind that circumstances change over time. In fact, customer intelligence marketing is designed to do exactly that – change behavior. If your methods are effective, then sentiment will adjust, new words will be used to describe you relative to your competitors, and public perception will change. As such, these systems are not static. They must be dynamic, editable and flexible to change with your business.

Three Keys to Success

Text analytics is designed to extract and decipher the meaning held within documents from repositories, blogs, tweets, customer communication systems, claims and service notes, to name a few. The extraction associated with content categorization and ontologies is a historically focused activity – looking at what was already written and transcribed, and deciding what the core meaning is within that particular text. When we extend to sentiment analysis and text mining, we discover completely new things that would never be identified by looking at each document in isolation. Semantics are important.

Three keys to success in this regard:
1. With smart organizational cultures, semantic-driven implementations are successful when end-user employees don’t even know the technologies are there – the capabilities are embedded into business processes. No longer are we building little-used “field of dreams” warehouses. These are intelligent systems, implemented in operational reporting and activity systems.

2. Semantics is fundamentally about encoding knowledge, recording it and processing it in a meaningful way. In that codification, the text analytics technology documents how people think about and interpret the written word. It’s not Y2K encoding – we have all learned from that. This uses dynamic, learning structures designed to identify concepts and topics and change over time.

3. Organizations need to take a strategic approach. Semantic technology must be open to new rule definitions and adapt to new knowledge the organization derives – refining rules, including new topics. It needs to change with the organization. This means more than just getting through the materials faster. While that is part of the challenge, review of each document in and of itself will not give the insight that you get from examining the entire collection and mining it for new discovery.

Text Analytics Yields Big Wins

A manufacturer with a rich acquisition history had over a million product numbers across four different brands – they had been through a lot of change. For them, getting through that material was like reading 500 copies of “War and Peace.” Text analytics helped them make sense of the data – defining well-organized concepts, capturing key identifiers and predicting which category any particular product belongs to with 95 percent accuracy.

In the medical field, examining across a patient’s entire unit of care and across collections of patient records, text data research has revealed that the frequently asserted link between diabetes and obesity may not hold true per se. The peak of diabetes occurrence falls at the threshold between normal and obese, not simply in the obese classification of the body mass index (BMI) spectrum.

A large insurance company used text analytics in evaluating workplace injury claims and found 600 completely new cases of threatening situations (new concepts or categories, if you will) from free-form, handwritten text that would not have been discovered by using code matching.

These examples also help explain the difference between mining and extraction. Looking for known items and concepts and extracting them from text via machine learning (semantic technologies) are advancements associated with text analytics. But it is not mining. Finding things that you already know is not discovery. Identifying completely new concepts or previously unknown associations by examining collections of documents is what mining is all about. Identifying new patterns and word combinations and isolating emerging issues – now that is mining.
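A toy sketch in plain Python (hypothetical claim notes of my own invention, not the insurer’s actual system) makes the distinction concrete: extraction matches concepts already in a known dictionary, while mining counts co-occurrences across the whole collection to surface associations no dictionary anticipated.

```python
from collections import Counter
from itertools import combinations

notes = [
    "customer threatened staff after claim denied",
    "claim denied and customer threatened to sue",
    "routine claim processed without issue",
]

# Extraction: look up concepts you already know, note by note.
known_concepts = {"claim", "customer"}
extracted = [sorted(known_concepts & set(n.split())) for n in notes]

# Mining: count word pairs across the whole collection to surface
# recurring associations that no single note, read alone, reveals.
pair_counts = Counter()
for n in notes:
    words = set(n.split()) - known_concepts
    pair_counts.update(combinations(sorted(words), 2))
emerging = [pair for pair, count in pair_counts.items() if count > 1]
```

Here extraction dutifully finds “claim” and “customer” in each note, but only the collection-level counts reveal that “denied” and “threatened” keep appearing together, the kind of previously unknown association the insurance example describes.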

There is an art and a science to understanding text, in deriving meaning from the deluge of materials and, by consequence, purging the content chaos. Because the goal is to mirror the human rationalizing process, how we interpret materials using machine learning and statistical techniques requires refinements from human subject-matter experts. Experience is a valuable asset, and dynamically encoding it so the entire organization can systematically and consistently draw upon it can produce an army of experts.

As global product marketing manager, Fiona McNeill oversees the product marketing of text analytics at SAS. During her 12-year tenure, she has helped organizations derive tangible benefit from their strategic use of technology. In addition to working with a wide range of industries, McNeill has defined product strategy and corporate relationships at SAS. Before joining SAS, McNeill was a member of IBM Global Services.




Using machine learning and optimization to improve refugee integration

Andrew C. Trapp, a professor at the Foisie Business School at Worcester Polytechnic Institute (WPI), received a $320,000 National Science Foundation (NSF) grant to develop a computational tool to help humanitarian aid organizations significantly improve refugees’ chances of successfully resettling and integrating into a new country. Built upon ongoing work with an international team of computer scientists and economists, the tool integrates machine learning and optimization algorithms, along with complex computation of data, to match refugees to communities where they will find appropriate resources, including employment opportunities.

Gartner releases Healthcare Supply Chain Top 25 rankings

Gartner, Inc. has released its 10th annual Healthcare Supply Chain Top 25 ranking. The rankings recognize organizations across the healthcare value chain that demonstrate leadership in improving human life at sustainable costs. “Healthcare supply chains today face a multitude of challenges: increasing cost pressures and patient expectations, as well as the need to keep up with rapid technology advancement, to name just a few,” says Stephen Meyer, senior director at Gartner.

Meet CIMON, the first AI-powered astronaut assistant

CIMON, the world’s first artificial intelligence-enabled astronaut assistant, made its debut aboard the International Space Station. The ISS’s newest crew member, developed and built in Germany, was called into action on Nov. 15 with the command, “Wake up, CIMON!,” by German ESA astronaut Alexander Gerst, who has been living and working on the ISS since June 8.



INFORMS Computing Society Conference
Jan. 6-8, 2019; Knoxville, Tenn.

INFORMS Conference on Business Analytics & Operations Research
April 14-16, 2019; Austin, Texas

INFORMS International Conference
June 9-12, 2019; Cancun, Mexico

INFORMS Marketing Science Conference
June 20-22, 2019; Rome, Italy

INFORMS Applied Probability Conference
July 2-4, 2019; Brisbane, Australia

INFORMS Healthcare Conference
July 27-29, 2019; Boston, Mass.

2019 INFORMS Annual Meeting
Oct. 20-23, 2019; Seattle, Wash.

Winter Simulation Conference
Dec. 8-11, 2019; National Harbor, Md.


Advancing the Analytics-Driven Organization
Jan. 28–31, 2019, 1 p.m.– 5 p.m. (live online)


CAP® Exam computer-based testing sites are available in 700 locations worldwide. Take the exam close to home and on your schedule:

For more information, go to