Analytics Magazine

How big data is changing the oil & gas industry

November/December 2012

The advent of the “digital oil field” helps produce cost-effective energy while addressing safety and environmental concerns.

Finding and producing hydrocarbons is technically challenging and economically risky.

By Adam Farris

Everyone needs it, few know how we get it, and many feel compelled to slow down efforts to find and produce oil. One of the primary assets of successful, thriving societies is a low-cost energy source. What drives low cost? Supply greater than demand! What drives supply? Finding supplies in sufficient quantities so producing oil and gas is economically viable. Finding and producing hydrocarbons is technically challenging and economically risky. The process generates a large amount of data, and the industry needs new technologies and approaches to integrate and interpret this data to drive faster and more accurate decisions. Doing so will lead to safely finding new resources, increasing recovery rates and reducing environmental impacts.

The term “big data” has historically been regarded by the oil and gas industry as a term used by “softer” industries to track people’s behaviors, buying tendencies, sentiments, etc. However, the concept of “big data” – defined as increasing volume, variety and velocity of data – is quite familiar to the oil and gas industry.

The processes and decisions related to oil and natural gas exploration, development and production generate large amounts of data. The data volume grows daily. With new data acquisition, processing and storage solutions – and the development of new devices to track a wider array of reservoir, machinery and personnel performance – today’s total data is predicted to double in the next two years.

Many types of captured data are used to create models and images of the Earth’s structure and layers 5,000-35,000 feet below the surface and to describe activities around the wells themselves, such as machinery performance, oil flow rates and pressures. With approximately one million wells currently producing oil and/or gas in the United States alone, and many more gauges monitoring performance, this dataset is growing daily.

The oil industry recognizes that great power and imminent breakthroughs can be found in this data by using it in smarter, faster ways. However, resistance to changing workflows and analysis approaches remains in place, as it has for the last 30 years. How does the industry bridge the vocabulary and cultural gap between data scientists and technical petroleum professionals? Ideas, applications and solutions generated outside the oil and gas industry rarely find their way inside. Other industries seem to have bridged this gap, but experts in the broader technology industry see the oil industry as a “no man’s land” for new-age entrepreneurs, even as major technology providers (e.g., GE, IBM and Microsoft) spend billions trying to enter it.

Breaking into the oil and gas industry is difficult for analysts, but the need and potential for reward are great. Nine of the top 10 organizations in Fortune’s Global 500 are oil and gas companies. More than 20,000 companies are associated with the oil business, and almost all of them need data analytics and integrated technology throughout the oil and gas lifecycle.

Throughout the 1990s, the oil and gas industry focused on data integration, i.e., how do we get all the data in one place and make it available to the geoscientists and engineers working to find and produce hydrocarbons? Since the turn of the century, technology development has mainly focused on software that integrates across the major disciplines to speed up old workflows. The industry has had many amazing technical professionals, but the idea of a “data scientist” is new, and should be considered alongside the petrophysical, geophysical and engineering scientists. The next decade must focus on ways to use all of the data the industry generates to automate simple decisions and guide harder ones, ultimately reducing risk and resulting in finding and producing more oil and gas with less environmental impact.

Figure 1: Simple seismic acquisition diagram (left) and a processed, interpreted 3D seismic Earth model (right).

Technically Complex, High Risk

Despite its astronomical revenues, the profit margin of the oil and gas majors is 8 percent to 9 percent. Finding and developing oil and gas while reducing the safety risk and environmental impact is difficult. The layers of hydrocarbon-bearing rock are deep below the Earth’s surface, with much of the world’s hydrocarbons locked in hard-to-reach places, such as in deep water or areas with difficult geopolitics.

Oil is not found in big, cavernous pools in the ground. It resides in layers of rock, stored in the tiny pores between the grains of rock. Much of the rock containing oil is tighter than the surface on which your computer currently sits. Further, oil is found in areas that have structurally “trapped” the oil and gas – there is no way out. Without a structural trap, oil and gas commonly “migrates” throughout the rock, resulting in lower pressures and uneconomic deposits. All of the geological components play an important role; in drilling wells, all components are technically challenging.

Following are three big oil industry problems that consume money and produce data:

1. Oil is hard to find. Reservoirs are generally 5,000 to 35,000 feet below the Earth’s surface. Low-resolution imaging and expensive well logs (after the wells are drilled) are the only options for finding and describing the reservoirs. Rock is complex for fluids to move through to the wellbore, and the fluids themselves are complex and have many different physical properties.

2. Oil is expensive to produce. The large amount of science, machinery and manpower required to produce a barrel of oil must be deployed profitably, taking into account cost, quantity and market availability.

3. Drilling for oil presents potential environmental and human safety concerns that must be addressed.

Finding and producing oil involves many specialized scientific domains (i.e., geophysics, geology and engineering), each solving important parts of the equation. When combined, these components describe a localized system containing hydrocarbons. Each localized system (reservoir) has a unique recipe for getting the most out of the ground profitably and safely.

Finding It

To locate oil, geologists and petrophysicists use data that indicates the type of rock in nearby wells, as well as seismic data, produced by sending sound waves deep into the subsurface, which bounce back to receivers. The data collected from wells has higher resolution but is accurate for only a small area (10 feet) around the well, so data interpretation techniques are used to extrapolate between wells. This requires scientists to insert considerable “interpretation” into the analysis. Geophysics is the study of the Earth, but in the oil business, geophysicists focus primarily on seismic data. They use this data to create a subsurface picture, but the resolution is several hundred meters at best. Over time, many breakthroughs in finding new deposits of oil and gas have occurred through the combination of geology, petrophysics and geophysics, ultimately developing better models of the Earth’s subsurface, but this area still has the most uncertainty. Find a way to paint a clearer, more accurate image of the subsurface and you’ve found the Holy Grail of the oil and gas industry.

To date, 3D seismic data has been the industry’s most impactful scientific breakthrough. This data vastly improves the picture of the Earth’s subsurface and removes the need to drill a multi-million dollar hole, with very little data, to “explore” what is in the rock. Seismology has rightfully received the most research attention (billions of dollars yearly), aimed at better tuning data acquisition and processing in an effort to get a clearer image.

R&D spending in geophysics centers around four main categories – acquisition, processing, interpretation and hardware optimization – all rich in the three Vs of big data (volume, variety and velocity). One raw seismic dataset is usually in the hundreds of gigabytes, resulting in terabytes once the numerous and expensive processing and interpretations are finished. These processing algorithms calculate many billions of data points with each run, and hundreds of these runs occur globally every day, all for one goal – create a clear, accurate picture of the Earth’s subsurface and identify all of the major components of the hydrocarbon systems.

With the lower seismic data resolution, data from other existing nearby wells is used to enhance the overall Earth picture. Well log data is captured on every well and is interpreted to generate specific information about the rock that was cut and the fluids that exist. While the resolution of this data is one to two feet, it has limited accuracy as mentioned earlier. The problem: lots of data, different scales, different types, with a critical need to get the most clarity possible through integration.
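One simple way to put fine-scale well-log samples and coarser data on a common scale is to average the log values into larger depth blocks. The sketch below is an illustration only: the function name, the sample gamma-ray values and the block size are hypothetical, and real upscaling workflows are considerably more involved than a plain block average.

```python
def upscale_log(depths_ft, values, block_ft):
    """Average fine-scale log samples into coarse depth blocks.

    depths_ft: sample depths in feet (hypothetical data)
    values:    one log reading per depth sample
    block_ft:  thickness of each coarse block, in feet
    Returns a dict mapping the top depth of each block to the mean reading.
    """
    blocks = {}
    for depth, value in zip(depths_ft, values):
        # Top of the block this sample falls into
        top = int(depth // block_ft) * block_ft
        blocks.setdefault(top, []).append(value)
    return {top: sum(vs) / len(vs) for top, vs in sorted(blocks.items())}


# Hypothetical one-foot samples, averaged into two-foot blocks
coarse = upscale_log([5000, 5001, 5002, 5003], [40, 60, 100, 120], block_ft=2)
print(coarse)
```

A plain average is the simplest possible choice here; it hides thin layers, which is exactly the loss of detail the text describes when moving from well scale to seismic scale.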

Finally, while much time is spent on using captured data to make the subsurface clear, equivalent time is spent on acquisition machinery and techniques. Seismic acquisition crews, well-logging crews and other services deploy machinery on and in the Earth, taking readings that are processed – often manually – to produce the best possible picture used for planning, locating and drilling wells, and as the guide for economic planning and reporting.

Figure 2: Raw well-log (left) and processed well log images showing the rock type (right).

Producing It

Producing oil and gas involves drilling and completing the wells, connecting them to pipelines and then keeping the flow of the hydrocarbons at an optimum rate, all integrally related to the subsurface environment. The path to optimizing production is dependent on the type of rock and structure of the reservoir. These decisions depend heavily on models created in the exploration phase described earlier.

Today, every well that’s drilled uses extensive machinery, measurement devices and people – all of which produce video, image and structured data. This area is probably the fastest growing in terms of the volume, variety and velocity of data being captured. Improving drilling and completion operations can significantly reduce costs. For reference, the widely publicized shale oil wells typically cost between $7 million and $10 million each. Offshore wells can cost tens or hundreds of millions of dollars. The cost goes up as the seafloor and the reservoirs deepen – requiring more technology to do it safely and successfully.

The average finding and development (F&D) cost of operators is between $7 and $15 a barrel (the wide range dependent on geographies and geologies). If there is enough oil to make the economics work, they will proceed; if there are smaller pockets of oil, development costs must be lower in order for the economics to make sense.

Some simple math: If oil companies found a one million barrel oil reservoir (at a price of $100 per barrel), that seems like good money, right? $100 million! However, to find it, you have to shell out up to $30 million for the acquisition, processing and interpretation of the seismic data. The operators must then produce it, and the cost of land, drilling and getting oil to market is significant. Land can cost as much as $30,000 per acre for access and often requires 120 acres for one well. Typical deals involve thousands of acres, plus drilling costs of between $5 million and $10 million for U.S. onshore wells and up to $100 million for offshore drilling. If you are a major integrated oil and gas company, your profit on $100 million will be $1 million to $12 million (a bit higher for independent operators). Many will lose money overall. Analytical approaches that impact the success rate of finding or reducing the cost to develop and produce oil and gas can make energy more affordable, safer and environmentally conscious.
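The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. The revenue, seismic, land and drilling figures come from the text; the number of wells (three) is purely an assumption for illustration, and the sketch leaves out operating, transport, royalty and tax costs, so it overstates what an operator would actually keep.

```python
# Figures quoted in the article
BARRELS = 1_000_000
PRICE_PER_BARREL = 100            # dollars
SEISMIC_COST_MAX = 30_000_000     # "up to $30 million"
LAND_COST_PER_ACRE_MAX = 30_000   # "as much as $30,000 per acre"
ACRES_PER_WELL = 120
DRILLING_COST_RANGE = (5_000_000, 10_000_000)  # U.S. onshore, per well

GROSS_REVENUE = BARRELS * PRICE_PER_BARREL


def upfront_costs(wells, drilling_cost_per_well,
                  seismic_cost=SEISMIC_COST_MAX,
                  land_cost_per_acre=LAND_COST_PER_ACRE_MAX):
    """Seismic + land + drilling only; every other cost is ignored."""
    land = wells * ACRES_PER_WELL * land_cost_per_acre
    drilling = wells * drilling_cost_per_well
    return seismic_cost + land + drilling


def remaining_after_upfront(wells, drilling_cost_per_well):
    """Gross revenue minus the upfront costs listed above."""
    return GROSS_REVENUE - upfront_costs(wells, drilling_cost_per_well)


# ASSUMPTION: three onshore wells, a number chosen only for this example
WELLS = 3
for cost in DRILLING_COST_RANGE:
    left = remaining_after_upfront(WELLS, cost)
    print(f"drilling at ${cost:,} per well leaves ${left:,}")
```

Under these assumptions, three wells at the top of the drilling range bring the listed costs to $70.8 million of the $100 million in gross revenue, before any of the omitted costs are counted.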

Figure 3: Simple well diagram (left) and an onshore drilling rig in south Texas (right).


The integration and mining of data produced in the hydrocarbon finding and producing process offers amazing potential for answering some of the big questions facing the oil industry. The biggest question: Where is more oil? The next biggest question: How do we get substantially more out of the ground safely, with minimal environmental impact? The less sexy, but possibly more relevant question is: How do we use data that has such potential to unlock these answers?

The oil and gas industry can learn much from the data, yet it is generally used the same way it was historically used. The industry must look at broader areas and functioning of individual components to create different views and perspectives. For example, Drillinginfo, a leading data and intelligence provider of upstream data for oil and gas decisions, has begun to break the barriers of geography and discipline to consider many potential variables and, based on thousands of wells, create a statistically predictive model for a given area’s producibility.

Most analysis in oil and gas is done within technical disciplines and within a relatively small geographical study area. This would be easier if the Earth’s properties were not so variable. Depositional systems vary greatly based on rock types and the different layers caused by ancient rivers, mountains, plains and deserts. How do we learn between reservoirs when all of the drilling environments are seemingly different systems? Typically, the data is gathered and stored in different databases, file cabinets and various geoscientists’ and engineers’ desks. Grains of similar rock behave the same everywhere else on Earth. While they are never laid down the exact same way, the lessons learned in one area could be extrapolated or applied to another area. Today, this process is very manual and labor intensive.

Data science will help the oil and gas industry learn more about each subsystem and inject more accuracy and confidence in every decision, ultimately reducing risk. Big data analytics will be key. While the concept is still in its infancy as far as the oil and gas industry is concerned, here are some possible near-term big data analytical solutions:

  • Integration over a wide variety of large data volumes – incorporating all relevant information for finding additional hydrocarbons, and identifying the data and the best known technologies to produce it – for that particular system (the recipe).
  • Make daily operational data relevant to reduce operating costs and improve recovery rate.
  • Decision management – take into consideration all “knowns” and local conditions and quickly identify if or how to proceed.

In the oil and gas industry, all data is critical, but not all pieces are critical for every decision. So how do we break down every decision, identify every potential piece of contributing data and quickly sift through to a decision?

Other industries are embracing big data analytics, but the oil and gas industry is just now getting the concept. The oil and gas industry has dealt with big volume, variety and velocity, but must start thinking beyond self-made boundaries to truly capture the benefit awaiting.

In the past decade, oil and gas technology focused on the “finding” half of the equation and benefitted from imaging breakthroughs in the healthcare industry. The magnetic resonance imaging (MRI) technology that doctors use to see inside of humans without a knife has also proved useful to see into the rock from inside the wellbore – identifying useful rock and fluid properties. For the “producing” side of the equation, the digital oil well, digital oil field and data formatting standards have led to operators putting gauges and data gathering devices on everything possible in the field. In the continuum of descriptive, predictive and prescriptive analytics, the oil and gas industry is just learning how to use this data to make decisions. The descriptive portion (per device) is being embraced. Capturing this data allows very large fields to be managed from a “central command center” rather than by people physically checking every well. In a field with thousands of wells, where one person once checked five to 10 wells per day, personnel costs and the cost of optimizing production have been vastly reduced while production has increased. One of the first implementations of the digital oilfield was Occidental Petroleum’s Elk Hills field in Bakersfield, Calif.

A decade ago, the term “digital oilfield” meant installing digital gauges and transmitting devices for production rates and pressures (instead of manual readings). Field personnel could target wells that were down or having problems. In the next stage, companies displayed this data in one room and then on the same big computer screen as an “earthmodel” built by the geoscientists and engineers. Today’s stage is a rather linear extension from this hardware-centric integration solution, with dashboards and software making digital monitoring and operations more effective, but more can be done.

Soon we will not just capture data and view it, which still requires experienced personnel to make a large number of decisions. We will have smarter solutions, with built-in intelligence, so computers can make simple decisions, while indicating a set of potential outcomes to the user in more difficult situations, helping with faster decisions based on best practices. Ultimately, costs for these operations will be cut and production will go up.
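As a rough illustration of the kind of simple decision a computer could make from gauge data, the sketch below flags wells whose latest production rate is zero or has fallen well below its recent average. Everything in it is hypothetical: the function name, the well identifiers, the sample rates and the 50 percent drop threshold are assumptions for this example, not an industry rule or any vendor's method.

```python
from statistics import mean


def flag_wells(readings, drop_fraction=0.5):
    """Flag wells that look down or in decline.

    readings:      dict of well id -> list of daily rates, oldest first
    drop_fraction: ASSUMED threshold; a well is "declining" when its
                   latest rate is more than this fraction below the
                   average of its earlier readings
    """
    flagged = {}
    for well_id, rates in readings.items():
        if len(rates) < 2:
            continue  # not enough history to compare against
        latest = rates[-1]
        baseline = mean(rates[:-1])
        if latest == 0:
            flagged[well_id] = "down"
        elif baseline > 0 and latest < (1 - drop_fraction) * baseline:
            flagged[well_id] = "declining"
    return flagged


# Hypothetical daily rates (barrels per day) for three wells
sample = {
    "well-A": [120, 118, 121, 119],
    "well-B": [95, 97, 96, 0],
    "well-C": [200, 198, 202, 80],
}
print(flag_wells(sample))
```

A rule like this only covers the descriptive end of the spectrum; the harder cases the text mentions would still go to experienced personnel, with the flagged list telling them where to look first.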

Soon we will have automated interpretation systems that learn while the user interprets seismic data and that will begin completing more of the interpretations as they improve in understanding the user’s selection methods. First versions of this concept exist today. The next stage of development must focus on merging this knowledge with other data types to identify what is productive and what is not. Further, the economics, necessary machinery, personnel and environmental risks should be considered as the seismic work is done. This is possible! We have the data! We have the analytical expertise. They’ve just never been introduced to each other. The bottom line: The oil and gas industry is ripe for big data analytics, whether through new software, new middleware, data handling solutions or data manipulation tools. The primary focus areas should be finding it, producing it and operations.

Conclusion: It Must Happen

While we have plenty of data and challenges to undertake, how do we bridge the gap between the two? One way the oil industry tries is through venture capital funds. Both Shell and Chevron have their own such funds to create an outlet to explore new ideas. While not enough, the majors have begun to embrace analytics and others are finding ways to follow.

The oil and gas industry needs more cross-fertilization. As oil and gas companies awake to the potential of analytics, many jobs will be created for data scientists, opening a portal for new applications and ideas to enter the industry.

Many technology providers exist in the industry. The successful ones from the past decade must embrace big data analytics to succeed in the future. This is challenging enough, but it will also require a mindset change. Few are poised to do so, but such companies will have the strategic data and intelligence to train new applications to be smarter and provide more complete solutions than stranded static data in empty point software products.

The oil and gas industry has an opportunity to capitalize on “big data” analytics solutions. Now the oil and gas industry must educate “big data” on the types of data the industry captures in order to utilize the existing data in faster, smarter ways that focus on helping find and produce more hydrocarbons, at lower costs in economically sound and environmentally friendly ways.

Adam Farris is senior vice president of business development for Drillinginfo. Based in Austin, Texas, Drillinginfo is the second largest oil and gas data and analysis marketplace in the country. Farris has 15 years of management and engineering experience in the oil and gas industry. He’s a member of the Society of Petroleum Engineers and the Society of Exploration Geophysicists.
