Text Analytics: Mining for Intelligence
Deriving meaning from the deluge of documents and purging content chaos.
By Fiona McNeill
Experience is a valuable asset, but how do organizations capture and organize such assets for use in decision-making?
Given that 70 percent to 80 percent of all data is unstructured, one can reasonably assume that organizational decision-making, based on data warehouses, business reporting and the like, has often rested on only 20 percent to 30 percent of the available data. There’s great opportunity for improvement, then, when organizations bring the intelligence buried in unstructured data assets into the decision-making process.
This means identifying any unique attributes in text collections and harmonizing them with the structured data, including the data generated by operational systems. It means ensuring that decisions are based on all the available information. That gives organizations a whole new source of derived intelligence.
You no longer have to encode the characteristics of material such as customer complaints or bad blog reviews into discrete categories. With today’s text analytics technology, this material can be taken in directly as unstructured text, and its essence classified consistently. This puts context back into the numbers.
I have a condition I call “formaphobia”: I have a really hard time filling in boxes to describe myself, my life, my feelings and my needs. I’d be glad if I never had to check off another box! Designing inputs as boxes was a consequence of the limits that the relational systems we are all familiar with placed on how information could be recorded. But technology no longer has to dictate how you receive input from your customers, constituents and operations. Now you can examine the text, decipher what you need from it and put it to use, and, in the case of text mining, find new items you never even knew existed.
How Organizations Turn Text into Gold
Technology plays a strategic role in decision-making. A business analytics framework integrates all the elements needed to make decisions and intelligently improve operations.
Text analytics technologies allow new, numeric representations of text to be embedded into traditional statistical and forecasting models. From there, the resulting insight, new and previously unknown, is delivered to the end user (the information consumer) through the framework’s reporting capabilities.
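The article does not say how these numeric representations are built. As a minimal sketch, assuming a simple bag-of-words (term count) scheme, the Python below turns a free-form note into numeric columns that sit next to ordinary structured fields, ready to be fed to a statistical or forecasting model. The records, field names and helper functions are hypothetical, invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    # Every distinct token in the collection, in a fixed (sorted) order,
    # so that each word maps to exactly one numeric column.
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def vectorize(text, vocabulary):
    # Term-count vector: how often each vocabulary word appears in the text.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical records: one structured field plus a free-form note.
records = [
    {"customer_id": 1, "monthly_spend": 120.0, "note": "Billing error again, very unhappy"},
    {"customer_id": 2, "monthly_spend": 80.0, "note": "Happy with the new service"},
]

vocab = build_vocabulary(r["note"] for r in records)

# Each model-ready row = structured column(s) followed by the text columns.
rows = [[r["monthly_spend"]] + vectorize(r["note"], vocab) for r in records]

print(vocab)
for row in rows:
    print(row)
```

Real systems usually go further, weighting terms and reducing the number of columns before modeling; raw counts are used here only to keep the idea visible.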
All of this can be invisible to the information consumer; they could be looking at a spreadsheet or their corporate dashboard and see the forecasted volume of positive and negative sentiment being expressed across different Web channels, for example.
The most effective text analytics solutions, and in fact the only sustainable, semantic-driven “smart” ones, are the ones you don’t even know are there. What you do know is that your searches find relevant, accurate information, and you see reductions in warranty claims, workplace injuries and customer complaints. Performance is as predicted, and your subject-matter experts are freed to do even more intelligent tasks for your organization.
Keep in mind that circumstances change over time. In fact, customer intelligence marketing is designed to do exactly that: change behavior. If your methods are effective, sentiment will shift, new words will be used to describe you relative to your competitors, and public perception will change. These systems therefore cannot be static; they must be dynamic, editable and flexible enough to change with your business.
Three Keys to Success
Text analytics is designed to extract and decipher the meaning held within documents from repositories, blogs, tweets, customer communication systems, and claims and service notes, to name a few. The extraction associated with content categorization and ontologies is a historically focused activity: it looks at what was already written and transcribed and decides what the core meaning of that particular text is. When we extend into sentiment analysis and text mining, we discover completely new things that would never be identified by looking at each document in isolation. Semantics matter.
Three keys to success in this regard:
1. In organizations with smart cultures, semantic-driven implementations succeed when end-user employees don’t even know the technologies are there, because the capabilities are embedded into business processes. No longer are we building little-used “field of dreams warehouses.” These are intelligent systems, implemented in operational reporting and activity systems.
2. Semantics is fundamentally about encoding knowledge, recording it and processing it in a meaningful way. In that codification, the text analytics technology documents how people think about and interpret the written word. This is not Y2K-style static encoding; we have all learned from that. Instead, it uses dynamic, learning structures designed to identify concepts and topics and to change over time.
3. Organizations need to take a strategic approach. Semantic technology must be open to new rule definitions and adapt to the new knowledge the organization derives, refining rules and adding new topics. It needs to change with the organization. This means more than just getting through the materials faster. While that is part of the challenge, reviewing each document on its own will not give the insight you get from examining the entire collection and mining it for new discoveries.
Text Analytics Yields Big Wins
A manufacturer with a long history of acquisitions had more than a million product numbers across four different brands; it had been through a lot of change. For them, getting through that material was like reading 500 copies of “War and Peace.” Text analytics helped them make sense of the data by defining well-organized concepts, capturing key identifiers and predicting the category to which any particular product belongs with 95 percent accuracy.
In the medical field, research that examined text data across a patient’s entire unit of care, and across collections of patient records, has revealed that the frequently asserted link between diabetes and obesity may not hold true per se. The peak in diabetes occurrence falls at the threshold between normal and obese, not simply in the obese classification of the body mass index (BMI) spectrum.
A large insurance company used text analytics to evaluate workplace injury claims and found, in free-form handwritten text, 600 completely new cases of threatening situations (new concepts or categories, if you will) that code matching would not have discovered.
These examples also help explain the difference between mining and extraction. Looking for known items and concepts and extracting them from text via machine learning (semantic technologies) is an advancement associated with text analytics, but it is not mining. Finding things you already know about is not discovery. Mining is about identifying completely new concepts or previously unknown associations by examining collections of documents. Identifying new patterns and word combinations and isolating emerging issues: now that is mining.
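To make the distinction concrete, here is a toy Python contrast under loudly stated assumptions: “extraction” is reduced to matching a hypothetical list of known terms in each note, and “mining” to counting word pairs that recur across the whole collection. Real text mining relies on far richer statistics; the claim notes, the known-term list, the stopword list and the `min_count` threshold are all invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Distinct lowercase words in one note.
    return set(re.findall(r"[a-z]+", text.lower()))


# Hypothetical workplace-injury claim notes.
notes = [
    "Worker slipped on wet floor near loading dock",
    "Forklift reversed near loading dock without alarm",
    "Wet floor in break room, worker slipped",
    "Forklift alarm broken, near miss at loading dock",
]

# Extraction (hypothetical): look only for terms we already know about.
KNOWN_TERMS = {"slipped", "fall"}


def extract(note):
    return tokenize(note) & KNOWN_TERMS


# Mining (toy version): count word pairs that co-occur in notes across the
# whole collection, keeping frequent pairs that contain no known term.
STOPWORDS = {"on", "near", "without", "in", "at", "the"}


def mine(collection, min_count=2):
    pair_counts = Counter()
    for note in collection:
        words = sorted(tokenize(note) - STOPWORDS)
        pair_counts.update(combinations(words, 2))
    return {
        pair: n
        for pair, n in pair_counts.items()
        if n >= min_count and not set(pair) & KNOWN_TERMS
    }


for note in notes:
    print(extract(note), "<-", note)
print(mine(notes))
```

Extraction finds nothing in the two forklift notes, because no known term appears in them, while the collection-level count surfaces the recurring forklift and alarm pairing: a pattern that no single note reveals on its own.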
There is an art and a science to understanding text, to deriving meaning from the deluge of materials and, as a consequence, purging content chaos. Because the goal is to mirror the human reasoning process, interpreting materials with machine learning and statistical techniques requires refinement by human subject-matter experts. Experience is a valuable asset, and dynamically encoding it so the entire organization can systematically and consistently draw upon it can produce an army of experts.
As global product marketing manager, Fiona McNeill oversees the product marketing of text analytics at SAS (www.sas.com). During her 12-year tenure, she has helped organizations derive tangible benefit from their strategic use of technology. In addition to working with a wide range of industries, McNeill has defined product strategy and corporate relationships at SAS. Before joining SAS, McNeill was a member of IBM Global Services.