Data visualization: The big picture of big data
By Nana S. Banerjee
The 1977 film “Powers of Ten” portrays the universe as an arena of both continuity and change. The short documentary, written and directed by Charles and Ray Eames and selected by the Library of Congress as “culturally, historically or aesthetically significant,” begins with a 1-meter-distant shot of a man lying by a picnic setting and steadily moves out until it reveals the very edge of the known universe. Then, at a rate of 10 to the tenth meters per second, the film rushes us back toward Earth, to the reclined man’s hand and further down to the level of a carbon atom on his skin. That fascinating journey into the macro and then the micro demonstrates visually the importance of scale and, in a metaphysical sense, the importance of visualization itself.
The importance of data visualization becomes more obvious when viewed within the context of how the human brain works. Much has been written in recent years about the processes of the brain and how understanding those processes can provide profound insights. In his best-selling 2012 book “Thinking, Fast and Slow,” Nobel laureate Daniel Kahneman introduces the terms System 1 and System 2, differentiating between the information processing that occurs in the subconscious and conscious minds. System 1 addresses the functions that are uncontrolled and effortless. System 2 comprises functions that are controlled and require effort to engage. In action, System 1 allows us to instantaneously recognize facial expressions – visual processing. In contrast, System 2 allows us to make complex decisions or apply reason.
A little more than a decade before the release of Kahneman’s book, Danish science writer Tor Nørretranders, in his book “The User Illusion: Cutting Consciousness Down to Size,” converted the “bandwidth of human senses” into computer terms, explaining just why data visualization (a manifestation of System 1) is perhaps the most powerful form of data interpretation. Nørretranders demonstrates that, in the “language of the mind,” the sense of sight operates an order of magnitude faster than the sense of touch (which has a bandwidth similar to that of a network of computers), which in turn operates an order of magnitude faster than the sense of smell. The sense of smell, in its turn, operates an order of magnitude faster than the sense of taste (which has a bandwidth similar to that of a calculator)!
Figure 1: Natural log of relative operating speeds. Source: “The User Illusion: Cutting Consciousness Down to Size,” by Tor Nørretranders (Penguin Press Science)
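The ladder of sensory bandwidths described above can be sketched numerically. The bits-per-second figures below are illustrative assumptions chosen to match the article’s one-order-of-magnitude-per-step chain, not exact values from Nørretranders’ book:

```python
import math

# Illustrative sensory bandwidths in bits/second (assumed values,
# each one order of magnitude apart, echoing the chain in the text).
bandwidth_bps = {
    "sight": 10_000_000,
    "touch": 1_000_000,
    "smell": 100_000,
    "taste": 10_000,
}

# Natural log of each sense's speed relative to taste, in the spirit
# of Figure 1 ("Natural log of relative operating speeds").
relative = {
    sense: math.log(bps / bandwidth_bps["taste"])
    for sense, bps in bandwidth_bps.items()
}

for sense, value in sorted(relative.items(), key=lambda kv: -kv[1]):
    print(f"{sense:>6}: ln(relative speed) = {value:.2f}")
```

Each step down the chain subtracts ln(10) ≈ 2.30, which is why Figure 1 plots the senses as evenly spaced rungs on a log scale.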
Realizing how quickly we understand and internalize what we see is at the foundation of what makes data visualization such an important aspect of how we analyze information and make better decisions. Understanding the mechanisms behind that processing gives us a powerful toolkit for designing effective visualizations to suit any context – whether the result is a simple, static bar chart or something vastly more complex, multidimensional and interactive. As such, the science behind data visualization ranges from the fundamentals of how we literally see to the complexities of cognitive psychology.
Combining the science with the art – how best to portray the intent of any particular visualization – winds up somewhere on a curve between presentation and exploration. The difference between the two can be described as the difference between presenting a known story in a data set using analysis and exploring a not-yet-understood data set through visual examination. Henry David Thoreau said, “It’s not what you look at that matters, it’s what you see.” With data visualization, the significance of the quote is quite literal. Data visualization is a fully formed discipline that requires multiple skills – among them knowledge of statistics; ideas of space, design and typography; and deep subject matter expertise in the sector being served.
From Theory to Practice
For any company that deals with a titanic amount of data, data visualization is, and will remain, an absolutely fundamental tool. Verisk Analytics is a prime example – collecting and maintaining highly granular data on several billion insurance policies and claims, credit card and debit card transactions, real estate, health services, government and human resources. While many consumer-centric firms have long been skilled in collecting information, they now generate and acquire exponentially growing, disparate and complex quantities of data – and depend on that data in many ways for their very survival in today’s marketplace.
Much of the talk today about data management and analysis and its effect on how business gets done targets the science of analytical modeling. In that pursuit, many firms indeed have come far, and yet they still have farther to go. Large, well-capitalized firms (such as banks, insurers, retailers) spend considerable resources and energy collecting and storing data, not just because they produce a lot of it but more likely because the regulatory environment mandates storing much of it. These firms haven’t spent nearly enough effort aggregating their data across functional silos, integrating internal data with third-party data, analyzing the data, and distributing the resulting insights to people who can take action on it.
As an example from the retail sector, imagine that a retailer is looking to assess the effectiveness of a particular promotional campaign at its retail stores through the holiday season. The management team at the retailer would invariably want to know: Do we know the baseline sales at our stores and our competitor stores before the promotional period? Let’s say maybe. Do we know how shoppers at our stores respond to promotional offers in the regular season? That’s another maybe. Do we know what the weather was like and if it played a role in affecting shopper turnout at our stores during the campaign period? That’s one more maybe. Would all those pieces of information come together at the same time and be presented to management in a manner that’s easy to analyze? That’s highly unlikely. And that’s a great example of where visualization becomes so helpful.
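The “bring it all together” step that the retail example calls highly unlikely can be sketched in a few lines: join baseline sales, campaign sales and weather by store and date, then compute the promotional lift that management would want to visualize. All store names, dates and figures below are hypothetical:

```python
# Hypothetical inputs keyed by (store, date): baseline-season sales,
# campaign-period sales, and observed weather.
baseline = {("Store A", "2013-12-14"): 1000, ("Store B", "2013-12-14"): 800}
campaign = {("Store A", "2013-12-14"): 1400, ("Store B", "2013-12-14"): 780}
weather = {("Store A", "2013-12-14"): "clear", ("Store B", "2013-12-14"): "snow"}

# Join the three sources and compute percentage lift vs. baseline.
rows = []
for key in baseline:
    lift = (campaign[key] - baseline[key]) / baseline[key]
    rows.append({
        "store": key[0],
        "date": key[1],
        "lift_pct": round(100 * lift, 1),
        "weather": weather.get(key, "unknown"),
    })

for row in rows:
    print(row)
```

The point of the sketch is not the arithmetic but the join: once the three “maybes” live in one table, a chart of lift against weather answers the management team’s question at a glance.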
As consumers of information, we’re all demanding visualization in our own way. We’ve started to reject the culture of sound bites and nonsynthesized statistics that agenda-driven interest groups have inundated us with over the last two decades. Visualization allows us to map information in a way that leads to better decision-making – easier and faster. The 2012 InformationWeek Business Intelligence, Analytics and Information Management Survey, conducted in late 2011, indicated that nearly half (45 percent) of the 414 respondents cited “ease-of-use challenges with complex software/less-technically savvy employees” as the second-biggest barrier to adopting business intelligence/analytics products – fractionally behind the biggest barrier, “data quality problems,” cited by 46 percent of respondents.
Figure 2: A detailed inundation map of the New England coastline shows the surge footprint of Hurricane Sandy in blue. The orange circles represent clustered locations, with the largest circles containing the most individual locations. Where the blue and orange overlap, the map illustrates where Sandy had the greatest impact. Such maps help companies determine the extent of floods and resulting losses. Source: AIR Touchstone® zoomed-in surge
Mother Nature’s Infographics
Catastrophe risk management has come a long way in its 25-year history, and the sophistication of those analytics goes well beyond the numbers in a database. The end result has been fast, intuitive insight into what drives risk.
Looking back to Superstorm Sandy, healthcare officials in New York City, in advance of the storm, were trying to decide whether to evacuate hospitals. In the end, many chose not to move patients before the storm. Unfortunately, numerous hospitals were then catastrophically flooded, and patients had to be moved during the worst of the deluge.
Certainly, myriad factors go into assessing a situation like that, but as analytics and their visualization become increasingly sophisticated, they’ll be able to help risk-bearing organizations, including insurers and local authorities, develop appropriate prescriptions for mitigating risk – by providing the contextual detail for better-informed decisions.
Today’s advanced climate models are capable of effectively projecting the impact of storms as they get closer to coastlines or geographic regions. Such models can assess the total number of homeowners expected to be affected, when an event is expected to worsen, and when it will be safe for insurance personnel to move into the area. Visualization models enable the decision-maker or assessor to evaluate locations at the individual building level. That capability facilitates a preplanning process and allows companies to communicate proactively with policyholders so they can take loss control measures – such as boarding windows, reducing the chance of fire, and so on – to mitigate damage. Such models also allow insurers to readily project and visualize the impact of fallen trees on power lines serving a group of policyholders.
Given the complexity of climate change and the inherent difficulty of assimilating evolving, interdependent data, our dependence on sophisticated and constantly improving visualization capabilities will only grow.
The Sight in Business Insight
Unquestionably, the tried-and-true bar, line and pie charts have served us well. But when the complexities of relationships are more nuanced and the data becomes more unstructured, visual analytics need to become more dynamic, multidimensional and customized.
For lenders and insurers, visualization can help identify a range of data issues quickly – from a high-level view of exposure location to exposure composition and completeness, including breakdowns by profile of the entities at risk (customers, businesses, properties, vehicles and so on). Visual link analysis technology helps discover critical, previously hidden connections within data. Seeing those connections – within proprietary data, in data from external sources or through a combination of sources – provides insight and knowledge to make decisions. The technology finds all data elements applicable to a question and draws a picture of the connections among those elements, revealing previously invisible relationships. The contextual approach provides a multidimensional understanding of profitability, customer behavior, and industry trends.
Data integrity can be a significant problem for large organizations, especially where multiple, complex databases are involved. Mapping techniques often find thousands of errors in a fraction of the normal time. Mapping also surfaces red flags in claims data. Fraud investigators at financial institutions often use visual link analysis to assist in their inquiries. For example, a money-laundering investigator monitors each check, credit card or ATM withdrawal over a specific threshold, and the technology helps instantly flag irregular patterns, revealing potential sources of fraud or money laundering. Seeing those connections – within company data, in data from external sources or through a combination of sources – can give claims investigators the insight and knowledge to help make better decisions.
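The threshold-monitoring and link-analysis workflow described above can be sketched in miniature: flag transactions over a limit, then build the map of account-to-account connections that a link-analysis tool would draw. The account IDs, amounts and the threshold are hypothetical, and real systems use far richer rules:

```python
from collections import defaultdict

# Hypothetical reporting threshold and transaction log.
THRESHOLD = 10_000
transactions = [
    {"id": 1, "from": "acct-1", "to": "acct-2", "amount": 9_500},
    {"id": 2, "from": "acct-1", "to": "acct-3", "amount": 12_000},
    {"id": 3, "from": "acct-3", "to": "acct-2", "amount": 15_500},
]

# Step 1: flag every transaction over the threshold.
flagged = [t for t in transactions if t["amount"] > THRESHOLD]

# Step 2: build an adjacency map over flagged activity -- the
# "picture of the connections" a visual link-analysis tool renders.
links = defaultdict(set)
for t in flagged:
    links[t["from"]].add(t["to"])
    links[t["to"]].add(t["from"])

print("flagged ids:", [t["id"] for t in flagged])
print("acct-3 connects to:", sorted(links["acct-3"]))
```

Even in this toy version, the graph view reveals something the raw list does not: acct-3 sits between two flagged flows, which is exactly the kind of previously invisible relationship the text describes.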
Visualization is also useful in insurance for commercial fleet and personal auto policyholders. Telematics programs use sensors to capture factors as simple as distance (vehicle miles traveled) and as sophisticated as camera-based recording. Devices transmit and store the resulting data for immediate or deferred analysis, meaningful interpretation and visualization. Although the use of telematics data and visualization is in its early stages, the usage-based insurance (UBI) opt-in rate is expected to increase to 20 percent over the next five years, according to one recent industry poll. Other polls consistently show that two-thirds of consumers are open to telematics-based insurance policies, especially if there’s the potential for premium discounts. Among newer consumers of vehicle insurance — Gen Y/Millennial drivers — the use of telematics and visualization technology is almost expected.
While throughout history there has always been that rare breed with the innate ability to quickly make sense of disparate sources of information and data, the mortals among us are blessed to be living at a time replete with the data and tools to make those connections for us in a fraction of the time – enabling us not only to make better business decisions but perhaps even to see the as-yet unforeseen.
Dr. Nana Banerjee is a group executive of Verisk Analytics. He serves as president of Argus Information and Advisory Services and as chief analytics officer of Verisk Analytics.