Images & videos: really big data
Sizing up the potential impact of prescriptive analytics driven by proliferation of images and video.
By Fritz Venter and Andrew Stein
The human brain simultaneously processes millions of images, movement, sound and other esoteric information from multiple sources. The brain is exceptionally efficient and effective in its capacity to prescribe and direct a course of action and eclipses any computing power available today. Smartphones now record and share images, audio and video at a rapidly increasing rate, forcing our brains to process more.
Technology is catching up to the brain. Google’s image recognition in “Self-taught Software” is working to replicate the brain’s capacity to learn through experience. In parallel, prescriptive analytics is becoming far more intelligent and capable than predictive analytics. Like the brain, prescriptive analytics learns and adapts as it processes images, video, audio, text and numbers to prescribe a course of action.
The Future is Now
Google is working on simulating the human brain’s ability to compute, evaluate and choose a course of action using massive neural networks.
The science of image and video analytics has scaled with advances in machine vision, multi-lingual speech recognition and rules-based decision engines. Interest is intense in prescriptive analytics driven by real-time streams of rich image and video content. Consumers with mobile devices are driving an explosion of location-tracked image and video data. Falling costs have democratized cloud-based high-performance computing. Andrew McAfee and Erik Brynjolfsson, writing in Harvard Business Review in October 2012, called this “Big Data: The Management Revolution.”
Image analytics is seen as a potential solution to social, political, economic and industry issues. Thirty years of Moore’s law (named for Intel co-founder Gordon E. Moore) and of the disruptive innovation described by Harvard Business School’s Clayton Christensen have created the current experience-driven generation, which is fully aware of technology’s potential to solve issues plaguing these global domains.
On the consumption side, mobile consumption of video is growing dramatically. Bandwidth is no longer a concern. Prescriptive analytics is poised to deliver relevant video to viewers – beyond Netflix’s algorithm for recommending DVDs to rent based on viewing interests.
Figure 1: Fast-growing consumption of mobile video.
Image Analytics: Technology Process
Image analytics is the automatic algorithmic extraction and logical analysis of information found in image data using digital image processing techniques. Bar codes and QR codes are simple examples; more interesting examples are as complex as facial recognition and position and movement analysis.
Today, images and image sequences (videos) make up about 80 percent of all corporate and public unstructured big data. As growth of unstructured data increases, analytical systems must assimilate and interpret images and videos as well as they interpret structured data such as text and numbers.
An image is a set of signals sensed by the human eye and processed by the visual cortex in the brain, creating a vivid experience of a scene that is instantly associated with concepts and objects previously perceived and recorded in one’s memory. To a computer, an image is either a raster image or a vector image. Simply put, a raster image is a sequence of pixels with discrete numerical values for color; a vector image is a set of color-annotated polygons. To perform analytics on images or videos, the geometric encoding must be transformed into constructs depicting physical features, objects and movement represented by the image or video. These constructs can then be logically analyzed by a computer.
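To make the raster/vector distinction concrete, here is a minimal Python sketch. The tiny images, the dictionary layout and the helper names (count_pixels, polygon_area) are illustrative assumptions, not part of any particular image format or library.

```python
# Raster: a grid of pixels, each holding discrete numeric color values (R, G, B).
raster_image = [
    [(255, 0, 0), (255, 0, 0), (0, 0, 255)],
    [(255, 0, 0), (0, 0, 255), (0, 0, 255)],
]

# Vector: a list of color-annotated polygons, each given by its vertices.
vector_image = [
    {"color": (255, 0, 0), "vertices": [(0, 0), (2, 0), (0, 2)]},
    {"color": (0, 0, 255), "vertices": [(2, 0), (3, 0), (3, 2), (1, 2)]},
]


def count_pixels(raster, color):
    """Count the raster pixels that exactly match a color."""
    return sum(pixel == color for row in raster for pixel in row)


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


red_pixels = count_pixels(raster_image, (255, 0, 0))      # how much red, in pixels
red_area = polygon_area(vector_image[0]["vertices"])      # how much red, as polygon area
```

Either encoding answers a purely geometric question ("how much red?"); neither says what object the red region depicts, which is the gap the transformations described here have to close.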
The process of transforming big data (including image data) into higher-level constructs that can be analyzed is organized in progressive steps that each adds value to the original information in a value chain (see Figure 2) – a concept developed by Harvard professor Michael Porter. Prescriptive analytics leverages the emergence of big data and computational and scientific advances in the fields of statistics, mathematics, operations research, business rules and machine learning.
Figure 2: Value chain of transformations.
Prescriptive analytics is essentially this chain of transformations whereby structured and unstructured big data is processed through intermediate representations to create a set of prescriptions (suggested future actions). These actions are essentially changes (over a future time frame) to variables that influence metrics of interest to an enterprise, government or another institution. These variables influence target metrics over a specified time frame. The structure of the relationship between a metric and the variables that influence it is called a predictive model. A predictive model represents detected patterns, time series and relationships among sets of variables and metrics. Predictive models of key metrics can project future time series of metrics from forecasted influencing variables.
The first step in the prescriptive analytics process transforms the initial unstructured and structured data sources into analytically prepared data. Although there are parallels with standard data-warehousing/ETL, this step differs from that approach in that it must contend with the complexities of pre-processing unstructured data such as narrative text files, images, videos and sound, in addition to structured data such as databases.
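The relationship between a metric, its influencing variable and a prescription can be sketched in a few lines of Python. This is a deliberately simple illustration that assumes a single variable and a straight-line relationship fitted by ordinary least squares; the data (foot traffic and weekly sales) and the function names are hypothetical.

```python
def fit_linear_model(xs, ys):
    """Fit metric = intercept + slope * variable by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def project(model, forecast_xs):
    """Project future metric values from forecasted variable values."""
    intercept, slope = model
    return [intercept + slope * x for x in forecast_xs]


def prescribe(model, target_metric):
    """Suggest the variable value expected to reach a target metric."""
    intercept, slope = model
    return (target_metric - intercept) / slope


# Hypothetical history: store foot traffic (variable) and weekly sales (metric).
foot_traffic = [100, 120, 140, 160]
weekly_sales = [210, 250, 290, 330]

model = fit_linear_model(foot_traffic, weekly_sales)
projected_sales = project(model, [180, 200])
needed_traffic = prescribe(model, target_metric=400)
```

Here project plays the role of the predictive model, and prescribe inverts it to suggest a change to the variable. A real system would handle many variables, constraints and uncertainty rather than a single fitted line.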
Predator drones gather intelligence via video and image reconnaissance.
Defense and Security Driving Demand
The need to analyze data and proactively prescribe actions is pervasive in nearly every vibrant growth industry, government and institutional sector. This has created a vacuum, or demand, for prescriptive analytics systems. Defense and security, as well as healthcare, are particularly good examples of industries that are driving demand for such systems.
The defense industry has pushed the envelope for image processing, and it is reflected in the storage that is being procured by government. GovWin Consulting reports that “Defense agencies are the largest spenders on a per-agency basis at the federal level for electronic data storage.” The Army, Navy and Air Force, along with the Department of Defense, account for 58.4 percent of all federal spending for storage. GovWin indicates that the drivers for this spend are “big data and full motion video.”
The proliferation of captured data of interest to defense and security comes from four clear sources.
- Predator drones gathering intelligence via video and image reconnaissance at reduced risk as they seek out hostile scenarios.
- In-place surveillance cameras increasingly prevalent in public places, managed by federal, state and local governments.
- Stationary commercial and institutional surveillance mounted in public places of business, the workplace, hospitals and schools.
- Consumer-created image and video shared on YouTube, Facebook, Twitter, blogs and other online social media sharing/publishing sites.
While the demand drives proliferation, it also presents a conflict between safety and privacy. People value surveillance as a resource when a child is taken or a loved one goes missing. On the other hand, people see it as an invasion of privacy during everyday activities. Likewise, people value sharing their personal photos with family and friends, but they are concerned that their images and videos may be anonymously processed and analyzed to identify criminal activity. Where is the ethical line of “too much” drawn? And, do younger generations have the same privacy-loss perspective?
Major cities around the world, from London to Las Vegas, have cameras installed so densely that it’s nearly impossible to move about the city without being recorded. Keeping up with the installation statistics is almost impossible. Easy-to-deploy, consumer-installed cameras are ubiquitous. This rate of adoption for security video capture makes an accurate assessment of how much video is being recorded difficult. We just know it is BIG.
Is all this surveillance, coupled with the potential of video/image analytics, helping? Research published in the Journal of Experimental Social Psychology suggests that increased surveillance increases our propensity to be Good Samaritans rather than reducing crime. Eric Jaffe describes this as a way to “reverse the bystander effect” in a recent article. In the end, surveillance and image analytics do provide data that can help officials pursue criminal activity and justice, albeit ex post facto.
How does cost drive the demand for video and image analytics? People expect the nation’s defense and security effort to be cost-effective. This means that the country will move to a smaller but more educated fighting force and at the same time increase the use of remote sensing, observation and monitoring tools. Simply put, this means more image and video capture – or surveillance – everywhere.
Healthcare a Perfect Domain
Parametric response mapping
The complexity of healthcare makes it a perfect domain to explore the potential for prescriptive analytics and imaging. Healthcare has been a pioneer in capturing rich imaging information and has built databases to develop a variety of statistical medical norms. The next step is to use image analytics to provide real-time insight to healthcare providers during diagnosis and treatment.
The advances in medical science come fast, and physicians have a difficult time keeping up with new procedures, treatments and pharmacology while they care for patients. Whether a routine office visit, serious disease or an emergency, prescriptive analytics integrated in medical workflow promises to improve the standard of care and speed of diagnosis, treatment and recovery.
It’s happening now. In Science Business, Alan Kotok wrote about University of Michigan researchers who adapted computed tomography image analytics to diagnose chronic obstructive pulmonary disease (COPD).
Advanced medical decision-support systems (MDDS) link massive knowledge bases to multiple clinical databases. These in turn are linked to a patient’s data. These complex systems have varying schemata, comparative image banks and discipline vocabularies – even local languages. Image analytics reduces varying subjective interpretation and human error, thereby accelerating the process of treatment and recovery.
With an image analytics system that can accurately process and prescribe action, it’s possible to envision real-time patient monitoring systems with rules-based analysis and caregiver notification.
The increasing role of algorithmic diagnosis and treatment creates the perfect opportunity to integrate images with prescriptive analytics. For example, medical professionals at the German company Medal and the Institute for Algorithmic Medicine in Houston, Texas, curate and credential a digital knowledge base of medical algorithms and computational procedures for medical treatment and administration. Integrating image analytics with such technology in a prescriptive analytics system holds potential to make faster and more informed decisions, streamline costs and broadly improve the quality and economics of healthcare.
The Road Ahead
Looking further ahead, several trends, opportunities and issues for video and image analytics will certainly emerge. For example:
- Other industries are already forging strategies for video and image analytics. Consumer and marketing research is one good example. Expect global firms such as GfK, Nielsen, Acxiom and Symphony IRI to reinvent survey-based research to add tone captured in the video of a panelist. Video and image analytics will generate a deeper understanding in both staged and impromptu marketing research. The oil and gas industry is considering what proactive action could be possible by analyzing video and image feeds during drilling and fracking processes (see related story).
- The demand for talent in this area will increase, creating the “job of the future.”
- Look for additional technology breakthroughs involving 3D image and video analytics, breakthroughs that will exponentially increase the potential for prescriptive analytics.
- Where applicable, ethics and the social effect of image and video analytics on people, groups and systems must be considered. Jay Stanley, a senior policy analyst with the ACLU, addresses the topic in “Video Analytics: A Brain Behind the Eye?” and explores the moral question of machines interpreting human activity – predictively and prescriptively.
- Finally, expect to see many purpose-built solutions that can be leveraged across industries, cultures, domains and other boundaries through a common image and video-processing platform for prescriptive analytics.
Fritz Venter (firstname.lastname@example.org) is the director of technology at AYATA, a prescriptive analytics software company headquartered in Austin, Texas, where he is responsible for product and intellectual property development, as well as delivery of solutions to customers. He has 20 years of industry experience and is finishing his Ph.D. in pattern matching.
Andrew Stein (email@example.com) is the chief advisor at the Pervasive Strategy Group located near Chicago, where he fuels creative vision for sustainable analytics-based strategies for continuous innovation. He can be found sharing disruptive innovative ideas on his blog, www.SteinVox.com.
By Fritz Venter and Andrew Stein
An introductory definition of image analytics is a transformation from images and videos to analytically prepared data. For the purpose of this introduction to image analytics, we define an image as the rendering of a still (non-moving) scene and a video as the rendering of a scene containing a still or panning background segment and moving foreground segments. Note that by implication, a video is also a sequence of images (also called a sequence of frames).
More specifically, the objective of image analytics is to bring an unstructured rendition of reality in the form of images and videos into a machine analyzable representation of a set of variables. A variable is represented by a series of values related to an entity (such as sales, Peter’s emotions, customer sentiment, etc.). Each such value is time stamped, making it possible to treat a variable as a time series.
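A variable in this sense can be sketched as a small Python data structure. The class name, field names and the timestamps below are illustrative assumptions; only the idea of an entity with time-stamped values comes from the definition above.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Variable:
    """A named variable tied to an entity, holding time-stamped values."""
    entity: str
    name: str
    observations: list = field(default_factory=list)  # (timestamp, value) pairs

    def record(self, timestamp, value):
        """Store one time-stamped value."""
        self.observations.append((timestamp, value))

    def time_series(self):
        """Return the observations ordered by timestamp."""
        return sorted(self.observations, key=lambda obs: obs[0])


emotion = Variable(entity="Peter", name="Emotion_of_Peter")
# Hypothetical detections, recorded out of order on purpose.
emotion.record(datetime(2013, 1, 2, 9, 30), "Angry")
emotion.record(datetime(2013, 1, 2, 9, 0), "Happy")

series = emotion.time_series()
```

Because every value carries its own timestamp, detections can arrive in any order and still be read back as a time series.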
In computer science and engineering, the detection of objects, faces, movement and so on in images goes by many labels, including image processing and computer vision. A deep discussion of this history and approach is beyond the scope of this article. This article will cover the basics; for ambitious readers, we suggest “Computer Vision Central.”
Specific transformations are used for image analytics. Figure 1 frames the steps followed by an image analytics system in transforming images and videos into an analytically prepared dataset (a set of time series, one per variable).
At this level, image analytics continues to be a set of transformations on image-input that add value and create a rich set of time series as analytically prepared data output. The first transformation step segments images into structured elements and prepares them for feature extraction – i.e., the identification of low-level features in the image. The second transformation step is the detection of relationships between these features, variables and time. The third transformation step is the extraction of variables with time-stamped values.
Segmentation and Feature Extraction
Images and videos are segmented using algorithms and digital processing techniques known as image segmentation. Segments are spatially relevant regions of image or video scenes that have a common set of features. These can be color distributions, intensity levels, texture, moving and stationary parts of a video scene and other criteria.
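As a rough illustration of segmenting by intensity level, the following Python sketch labels connected regions of bright pixels in a tiny grayscale image. Thresholding followed by a flood fill is only one simple approach, chosen here for brevity; the function name, the threshold and the sample image are assumptions made for the example.

```python
from collections import deque


def segment_by_threshold(gray, threshold):
    """Label 4-connected regions of pixels brighter than the threshold.

    Returns a grid of labels (0 = background) and the number of segments.
    """
    rows, cols = len(gray), len(gray[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if gray[r][c] <= threshold or labels[r][c]:
                continue  # background pixel, or already part of a segment
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the bright region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and gray[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label


# Hypothetical 3x4 grayscale image with two bright regions on a dark background.
gray_image = [
    [200, 210, 10, 10],
    [205, 10, 10, 220],
    [10, 10, 10, 230],
]
labels, segment_count = segment_by_threshold(gray_image, threshold=128)
```

Each non-zero label marks one spatially connected region sharing a common feature (here, brightness above the threshold), which matches the definition of a segment given above.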
There are numerous published image segmentation algorithms, each with a specific purpose and deep technical application. These techniques process a gray scale or color version of an image to identify edges, boundaries, regions, movement and many other important criteria. Popular image segmentation algorithms include:
Feature extraction is next in the process. To assist in the detection of higher-level characteristics, low-level features are extracted and stored with each instance. Vast research in this domain has culminated in many algorithms in the following categories:
These examples and an entire library of image segmentation and feature extraction algorithms have been credentialed at the Computer Science Department of the University of California Berkeley as a benchmark.
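One low-level feature that is easy to picture is a color histogram, which summarizes the color distribution of a segment. The sketch below is a minimal Python version under stated assumptions: pixels are (R, G, B) tuples with values from 0 to 255, and the function name and bin count are illustrative choices.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Return a normalized histogram over coarse RGB bins.

    Each channel is split into bins_per_channel equal ranges, giving
    bins_per_channel ** 3 bins in total.
    """
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        counts[index] += 1
    total = len(pixels)
    return [count / total for count in counts]


# Hypothetical segment: three reddish pixels and one bluish pixel.
segment_pixels = [(250, 20, 20), (240, 30, 10), (10, 10, 200), (245, 25, 30)]
histogram = color_histogram(segment_pixels)
```

The resulting list of bin fractions can be stored with an instance as part of its feature vector and compared against the histograms of other instances.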
Relationships among variables, features and time
To detect relationships between variables, features and time, an artificial intelligence sub-discipline known as Machine Learning is combined with Applied Statistics to create the relationship “intelligence” that is the core of the image analytics process. The relationships among variables, features and time in image analytics are represented as a predictive model. Before a predictive model can be created, a set of instances is extracted from all the given images and/or all the given videos being analyzed.
An instance is:
that is atomic with respect to the granularity of the respective image analytics domain. This definition and settings defining the boundaries of instances form the input to the algorithm that extracts all instances from all given images and/or videos.
From a predictive modeling point of view, three sub-sets of all extracted instances are of interest:
A machine learning or statistical modeling algorithm trains a predictive model based on the set of annotated training instances. Modeling algorithms are based on known techniques including neural networks, support vector machines, function learning, Bayesian networks, regression and many more. Test instances are used to calculate the accuracy of a predictive model created by a modeling algorithm. The training process is often repeated with different sets of training and test instances and/or algorithm parameters until the accuracy of the predictive model is at an acceptable level. After the predictive model has been trained, it is used to classify predicted instances in a process described in more detail below.
The purpose of the annotated training instances is to establish an association among low-level image/video features extracted from instances, variable entities, variable values and time. A human or automatic/algorithmic supervisor can perform the task of annotating training instances. The supervisor adds the name of a variable to the list of variables contained in the annotation of the instance if the entity representing the variable is matched positively on the instance. Such a positive match occurs when a pattern of low-level features (also called a feature vector) that characterizes the respective entity (also called an entity matching pattern) is matched, using a feature vector comparison function, to a set of features extracted from the instance (refer to feature extraction above).
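The train, test and classify cycle can be sketched with a very simple stand-in model. The example below uses a nearest-centroid classifier rather than any of the techniques named above, purely because it fits in a few lines; the two-number feature vectors and all function names are hypothetical.

```python
def train_nearest_centroid(training_instances):
    """Train on (feature_vector, label) pairs; return one centroid per label."""
    sums, counts = {}, {}
    for features, label in training_instances:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, test_instances):
    """Fraction of annotated test instances the model classifies correctly."""
    correct = sum(classify(model, f) == label for f, label in test_instances)
    return correct / len(test_instances)


# Hypothetical feature vectors: [mouth_curvature, brow_angle].
training_instances = [
    ([0.9, 0.1], "Happy"), ([0.8, 0.2], "Happy"),
    ([0.1, 0.9], "Angry"), ([0.2, 0.8], "Angry"),
]
test_instances = [([0.85, 0.15], "Happy"), ([0.15, 0.7], "Angry")]

model = train_nearest_centroid(training_instances)
test_accuracy = accuracy(model, test_instances)
predicted_value = classify(model, [0.7, 0.3])  # an unannotated "predicted instance"
```

If test_accuracy were too low, the loop described above would repeat with different training and test sets or different parameters before the model is applied to predicted instances.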
For example, if the entity associated with the variable “Emotion_of_Peter” is Peter, then a user can manually annotate all training instances containing Peter’s face with the entity “Peter.” As another example, an annotation algorithm can also match an entity-matching pattern containing a number of low-level features, such as the shape of a human face characterizing Peter’s face and a color histogram containing large bins of red skin color (because Peter’s complexion is red) on all training instances to automatically annotate the training instances that contain Peter’s face.
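Automatic annotation by entity matching can be sketched as follows. Cosine similarity is used here as one possible feature vector comparison function, and the 0.95 threshold, the three-number feature vectors and the function names are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Compare two feature vectors; 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def annotate(instances, entity_patterns, threshold=0.95):
    """For each instance, list the variables whose entity pattern matches."""
    annotations = []
    for features in instances:
        matched = [
            variable
            for variable, pattern in entity_patterns.items()
            if cosine_similarity(features, pattern) >= threshold
        ]
        annotations.append(matched)
    return annotations


# Hypothetical entity matching pattern for Peter's face
# (e.g. face-shape score, background score, red skin-tone bin).
entity_patterns = {"Emotion_of_Peter": [0.9, 0.1, 0.4]}
instances = [
    [0.88, 0.12, 0.41],  # features close to Peter's pattern
    [0.1, 0.9, 0.2],     # features of some other scene
]
annotations = annotate(instances, entity_patterns)
```

Only the first instance is annotated with the variable, because only its features are close enough to the entity matching pattern.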
In the next annotation step, the values of the variables listed in the annotation of each training instance are detected in the respective instance using an approach similar to the entity matching described above. For example, the supervisor can give training instances annotated with the variable “Emotion_of_Peter” the categorical value “Happy” when they contain a happy facial expression, or “Angry” when they contain an angry face. Such a value can also be detected automatically by an algorithm using a value matching pattern, a pattern of low-level features associated with a value of the variable. For example, a value matching pattern may describe a certain number of spectators wearing the color orange in an image of a football stadium based on a number of features such as:
Finally, training instances are time stamped for time series analysis purposes. We annotate training instances associated with a certain time value with the string or numerical representation of the respective time value. For example, assuming we are using annual quarters for the time granularity, we can annotate training instances containing snow and winter scenes as “Q1,” instances containing spring blossoms and flowers as “Q2,” instances containing a lot of summer green and blue skies as “Q3,” and instances containing red, brown and orange fall foliage as “Q4,” and so on. A time annotation algorithm uses a time-value matching pattern associated with every discrete time value or time value range to annotate the training instances automatically.
The image analytics system now understands how to analyze a given set of input images and/or videos based on the predictive models created in the prior step. The final step of the image analytics process is to create the analytically prepared data. This final output is created by using these predictive models to predict a time series of variable values for every variable from the remaining set of instances called predicted instances.
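The final assembly of analytically prepared data might look like the Python sketch below. It assumes each predicted instance has already been reduced to a (time label, variable, predicted value) triple, and it resolves multiple predictions for the same period by majority vote; both the triple layout and the voting rule are illustrative assumptions rather than a described method.

```python
from collections import Counter, defaultdict


def build_time_series(predicted_instances):
    """Turn (time_label, variable, value) triples into one series per variable.

    When several instances fall in the same period, the most frequent
    predicted value is kept for that period.
    """
    votes = defaultdict(Counter)
    for time_label, variable, value in predicted_instances:
        votes[(variable, time_label)][value] += 1

    series = defaultdict(dict)
    for (variable, time_label), counter in votes.items():
        series[variable][time_label] = counter.most_common(1)[0][0]

    # Order each variable's points by time label (Q1, Q2, ...).
    return {variable: dict(sorted(points.items()))
            for variable, points in series.items()}


# Hypothetical model output for four predicted instances.
predicted_instances = [
    ("Q2", "Emotion_of_Peter", "Angry"),
    ("Q1", "Emotion_of_Peter", "Happy"),
    ("Q1", "Emotion_of_Peter", "Happy"),
    ("Q1", "Emotion_of_Peter", "Angry"),
]
prepared_data = build_time_series(predicted_instances)
```

The output is a set of time series, one per variable, which is the form a downstream predictive or prescriptive step would consume.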