Analytics Magazine

Corporate profile: Data science at Monsanto

Data-driven analytics in agriculture delivers benefits for farmers, consumers and the planet.

By Shrikant Jarugumilli

Monsanto breeding lab: St. Louis, Mo., facility. Photos courtesy of Monsanto.

The Food and Agriculture Organization (FAO) of the United Nations estimates that the global population will reach 9.1 billion by 2050 and that world food production will need to increase by 70 percent to ensure adequate food security. The FAO also estimates that in order to meet today’s needs, agriculture utilizes 70 percent of the planet’s available fresh water. These two realities mean that in the future we will have to increase food production while decreasing resource usage – including land and water. At Monsanto we are leveraging plant biotechnology, plant breeding and chemistry in conjunction with the power of data science to develop better agricultural products that optimize field production for farmers, to increase the sustainability of global agriculture, and to run our internal business operations more efficiently.

Founded in 1901 in St. Louis, Mo., Monsanto provides agricultural products and operates in 60 countries across six continents, with more than 350 facilities, over 20,000 full-time employees and net sales of $14.6 billion in 2017. Monsanto’s mission is to develop integrated solutions enabling farmers to grow crops while using energy, water and land more efficiently.

Monsanto has invested heavily in technologies that are enabling pervasive application of data science across the organization. These investments include high-throughput sequencing technologies, sensors, mobile and cloud computing, and technical platforms for collaborative data science initiatives. In many ways, Monsanto has transformed itself from a “seeds and traits” company to a digital agriculture company and is operating with velocity, teaming, focus, reproducibility and manageability as the key drivers to achieve this transformation. Monsanto’s products include agricultural seeds in high-performing germplasms, biotechnology-derived traits that enhance plant productivity, crop protection and digital agriculture tools. Backed by the power of data science, these products are offered as integrated solutions to address the full spectrum of challenges a farmer must overcome to deliver a successful crop.

From R&D to Harvest: Journey of a Strong Crop

Our seed innovations begin in R&D, then are rigorously tested to ensure compliance with regulatory and industry standards, are manufactured, and then distributed by our supply chain before being sold to farmers. The foundation of a successful seed product is the right genetics bred into elite germplasm for crop qualities such as good yield, strong stalks and roots, and disease tolerance. Often, these products are genetically modified with multiple biotechnology-derived traits to give them above- and below-ground pest protection, to make them resistant to one or more herbicides, or to make them more tolerant to drought conditions. Finally, seeds may be coated with treatments that help the plants succeed underground by giving them mildew or below-ground pest resistance, or that help their roots better utilize nutrients in the soil.

During the growing season, the farmer will be able to control yield-robbing weeds and pests by applying appropriate crop protection products. Throughout the growing season, Monsanto’s on-farm digital agriculture tools, commercialized by our subsidiary The Climate Corporation, guide the farmer to maximize yield using insights gained by the analysis of multiple data sets that help inform the more than 40 decisions for crop management that are made between planting and harvest. These include decisions about product planting (placement and density) and water and fertilizer use (what parts of the field and in what amount).

All of these offerings are based on understanding the needs of customers, which vary from field to field, and are powered by the application of data science across the company.

Machine Learning Drives Plant Breeding Transformation

We have applied machine-learning algorithms to transform our plant breeding pipeline. By leveraging 15 years of data from a corn R&D pipeline, we have moved early product testing from the field to the lab, which is helping our researchers more precisely predict how thousands of seeds will perform in the field. This approach enables us to evaluate roughly five times more corn varieties without having to plant them in fields each testing season, and it accelerates the delivery of new products by a year. In addition to the development of a machine-learning model, this effort required the creation of entire systems of analytics to optimize genotyping lab throughput, ensure data quality, impute full-genome genetic information from partial observations and guide advancement decisions based on the genotypic data.

Genetic data drives many critical decisions that advance products through Monsanto’s breeding pipelines. A key example is genome-wide selection (GWS), which provides performance predictions based on genetics. Executing a GWS model requires observing (genotyping) a large number of genetic locations (markers) for each seed, which is cost prohibitive at the scale we require. Fortunately, knowledge about a seed’s ancestors and siblings can be used to reduce the number of direct observations needed. For a corn seed, we typically observe around a thousand genotypes directly. Using a method known as genotype imputation, we can commonly produce high-quality, quantified estimates for another 50,000 markers.
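To make the idea of genotype imputation concrete, here is a minimal sketch in Python. It is an illustration only, not Monsanto's method: it assumes two fully genotyped inbred parents, an offspring that inherits each marker from one parent or the other, and markers listed in chromosome order. The function name `impute_from_parents` and the confidence score are hypothetical. Observed markers where the parents differ reveal which parent a chromosome segment came from, and unobserved markers between them are filled in from that parent.

```python
from bisect import bisect_left


def impute_from_parents(parent_a, parent_b, observed):
    """Impute an offspring's unobserved markers from two inbred parents.

    parent_a, parent_b: allele lists, one entry per marker, in chromosome order.
    observed: {marker_index: allele} for the markers genotyped directly.
    Returns one (allele, confidence) pair per marker. The confidence is a
    heuristic score in [0, 1], not a calibrated probability.
    """
    # Observed markers where the parents differ reveal the parent of origin.
    origins = []
    for idx in sorted(observed):
        if parent_a[idx] == parent_b[idx]:
            continue  # uninformative: both parents carry the same allele
        if observed[idx] == parent_a[idx]:
            origins.append((idx, "A"))
        elif observed[idx] == parent_b[idx]:
            origins.append((idx, "B"))
    positions = [idx for idx, _ in origins]
    parents = {"A": parent_a, "B": parent_b}

    result = []
    for m in range(len(parent_a)):
        if m in observed:
            result.append((observed[m], 1.0))
        elif parent_a[m] == parent_b[m]:
            result.append((parent_a[m], 1.0))  # same allele from either parent
        elif not origins:
            result.append((None, 0.0))  # nothing to infer from
        else:
            k = bisect_left(positions, m)
            left = origins[k - 1] if k > 0 else None
            right = origins[k] if k < len(origins) else None
            if left is None or right is None or left[1] == right[1]:
                # One flank, or both flanks agree (toy: ignores double crossovers).
                origin, conf = (left or right)[1], 1.0
            else:
                # Flanking markers disagree: a crossover lies between them.
                # Favor the nearer flank, weighting by relative distance.
                p_left = (right[0] - m) / (right[0] - left[0])
                if p_left >= 0.5:
                    origin, conf = left[1], p_left
                else:
                    origin, conf = right[1], 1 - p_left
            result.append((parents[origin][m], conf))
    return result


# Hypothetical eight-marker chromosome with only two markers genotyped directly.
parent_a = list("AAGGTTCA")
parent_b = list("ACGTTACG")
observed = {1: "A", 7: "G"}
for marker, (allele, conf) in enumerate(impute_from_parents(parent_a, parent_b, observed)):
    print(marker, allele, round(conf, 2))
```

Real imputation methods use genetic map distances, larger pedigrees and probabilistic models, but the principle is the same: a small number of direct observations plus knowledge of relatives yields scored estimates for many more markers.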

IoT Enables Global Field Testing Excellence

Spring planting: Monsanto’s Monmouth, Ill., facility. Photos courtesy of Monsanto.

Monsanto’s Internet of Things (IoT) platform collects data from sensors deployed in fields, on various equipment, and on machinery within processing facilities, yielding a high volume and variety of data. At peak, the IoT platform can collect hundreds of millions of data points per day, providing invaluable information across thousands of field trials globally.

In addition to this high-velocity data, we have petabytes of genomic data for our seeds coupled with hundreds of millions of acres of associated environmental information to form a rich backbone for data modeling to determine which seed genetics will be most successful in each location. Data scientists apply predictive, prescriptive and cognitive analytical techniques to these data sets. These analytical techniques coupled with our elastic cloud computing capabilities are enabling the derivation of never-before-possible insights that are delivering tremendous value to our customers.

Smarter Supply Chain and Customer Experience

Monsanto leverages a variety of optimization and simulation techniques to increase the efficiency of operational activities and drive innovations and inventions throughout the company. These applications can be classified into two categories: planning for our facilities and planning for our field operations. Our facilities include laboratories, greenhouses, seed manufacturing and seed packaging units. For efficient utilization of these facilities, we have developed several optimization and simulation models for determining medium- and long-range plans. The models that support field operations (planting, in-season and harvest operations) are tailored to the business functions of R&D and supply chain. These models need to be adapted for technological, regulatory and other constraints specific to each region.

It is often difficult to predict seed production yield of new products from their years of available research trial data because research trials and seed production are performed in different fields. The ability to classify environmental zones within fields and relate them across fields allows for accurate seed production yield estimates derived from research trials. The historical analysis of trial data has made it evident that yield performance depends on the interaction between the crop genotype and environmental factors such as topography, soil and climate conditions at the location where the crop is planted, that is, the zones within the field.

By applying advanced geospatial modeling and mathematical optimization in our supply chain, we can better understand, model and predict the impact that genetics, environmental conditions and agronomic practices have on crop yield. Understanding this relationship helps to mitigate product supply risk by reducing the variability between how much we need to produce and how much is produced by fields. Furthermore, such insights enable Monsanto to reduce its seed production footprint, supporting our ongoing commitment to operating as sustainably as possible.

This work has resulted in a significant increase in seed production yield just by planting a product in the best zones within our existing seed production network. The yield increase can be used either to reduce production acres or the level of uncertainty within existing production acres.
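The idea of placing each product in its best zone can be sketched as a small assignment problem. The Python example below is a hedged illustration, not the system described in this article: the function `best_assignment`, the yield figures and the one-product-per-zone rule are all assumptions made for the sketch, and it uses brute-force enumeration where a production system would use a mathematical optimization solver with many more constraints.

```python
from itertools import permutations


def best_assignment(predicted_yield):
    """Assign each product to a distinct zone to maximize total predicted yield.

    predicted_yield[p][z] is a hypothetical predicted seed yield for
    product p when planted in zone z. Returns (zones, total), where
    zones[p] is the zone chosen for product p.
    """
    n_products = len(predicted_yield)
    n_zones = len(predicted_yield[0])
    best_zones, best_total = None, float("-inf")
    # Brute force: try every way of giving each product its own zone.
    for zones in permutations(range(n_zones), n_products):
        total = sum(predicted_yield[p][z] for p, z in enumerate(zones))
        if total > best_total:
            best_zones, best_total = zones, total
    return list(best_zones), best_total


# Hypothetical yields for three products across three zones.
yields = [
    [50, 65, 55],
    [60, 62, 58],
    [45, 70, 52],
]
zones, total = best_assignment(yields)
baseline = sum(yields[p][p] for p in range(3))  # product p planted in zone p
print(zones, total, baseline)
```

With these made-up numbers, matching products to zones raises total predicted yield from 164 to 185 on the same acres, which is the kind of gain that can be taken either as fewer production acres or as lower supply uncertainty.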

The supply chain optimization system drives an integrated, optimized solution through the Plan, Source, Make and Deliver functions. Past process improvement initiatives focused on improving the siloed processes that are required to deliver supply chain excellence. This effort led to very efficient groups within the supply chain system, but often at the expense of end-to-end efficiency, because optimizing each function in isolation left the supply chain as a whole suboptimal.

The goal of the new system was to drive an integrated approach to customer satisfaction through supply chain excellence utilizing optimization, thereby allowing for the true integration of the processes, constraints and costs that challenge the business. This initiative required not only new systems to support the operations research implementation, but also a new business mindset. Potential cost increases and process changes within one function, which appear suboptimal when viewed in a silo, needed to be understood in terms of their effect on the overall efficiency of the end-to-end supply chain.
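A toy example shows why a choice that looks worse inside one function can be better end to end. The Python sketch below is purely illustrative: the plant names, per-unit costs and the two functions `siloed_plan` and `integrated_plan` are hypothetical and cover only the Make and Deliver steps.

```python
def siloed_plan(make_cost, deliver_cost):
    """Make picks its cheapest plant first; Deliver then ships from that plant."""
    plant = min(make_cost, key=make_cost.get)
    return plant, make_cost[plant] + deliver_cost[plant]


def integrated_plan(make_cost, deliver_cost):
    """Pick the plant that minimizes the combined Make plus Deliver cost."""
    plant = min(make_cost, key=lambda p: make_cost[p] + deliver_cost[p])
    return plant, make_cost[plant] + deliver_cost[plant]


# Hypothetical per-unit costs for serving one sales region.
make_cost = {"plant_x": 10, "plant_y": 12}
deliver_cost = {"plant_x": 9, "plant_y": 4}

print(siloed_plan(make_cost, deliver_cost))      # ('plant_x', 19)
print(integrated_plan(make_cost, deliver_cost))  # ('plant_y', 16)
```

In this sketch the Make function's cost rises from 10 to 12 under the integrated plan, yet the total cost falls from 19 to 16.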

As key areas of the business underwent a procedural transformation and education effort to drive efficiencies, the optimization and systems groups were developing a global model for the supply chain process that could be easily adapted through data-driven functionality. The system provided a scalable and generic model that businesses of various sizes and complexity could use to apply data-driven analytics.

We are also utilizing data science for customer relationship management (CRM). CRM applications range from understanding our customers’ needs and predicting their behavior and purchasing patterns to setting optimal pricing strategies and allocating the sales force.

Adopting a Data Science Mindset

Field data: Monsanto has developed a new scientific approach to analyze data layers in the billions. Photos courtesy of Monsanto.

Strong support from leadership and cross-functional collaboration among domain experts from business units, data scientists and IT teams are enabling these successful outcomes across the company. To accelerate transformation and to further strengthen collaboration, we have established company-wide data science best practices and are developing a training curriculum to attract, retain and develop top-notch data science talent.

With the advancement of digital tools, the responsibilities of our workforce are also evolving. We are preparing for an ag workforce that’s ready to leverage the full potential of data science in two ways: 1) driving young talent toward STEM careers in ag through education initiatives, and 2) expanding the skill sets of experienced ag professionals (e.g., specialists in plant breeding and biotechnology).

We see data science, AI and other digital tools as an enabling force for human innovation and productivity toward beneficial outcomes for people and the planet. Our commitment to people development, data stewardship and storage excellence, and collaboration with innovators across sectors is driving the success of our digital transformation. Over the last three years Monsanto has matured as a data science company and, given the exciting discovery and development happening across technologies, that progress shows no signs of slowing down. We anticipate continued growth as a company that operates with data science at its core.

Shrikant Jarugumilli leads the Operations Research team within IT-Analytics, Products & Engineering at Monsanto. He is Monsanto’s representative to the INFORMS Roundtable. The author acknowledges input from Adrian Cartier, Seed Production & Customer Experience Data Science Lead; Barry Surber, Global Supply Chain Quantitative Modeling Technologist; Naveen Singla, Data Science Center of Excellence Lead; and Nathan Vanderkraats, Genomics Data Science Lead.



