

Analytics Magazine

Five-Minute Analyst: Rainfall and reference years

By Harrison Schramm

This installment comes from a discussion I’ve been having with longtime friend and U.S. Naval Academy classmate Cara Albright. Her problem revolves around determining the “most representative” year of precipitation (rain) data from a large set. The original question – how to incorporate data from years that include “leaps” (i.e., Feb. 29) – started us down an interesting path. This is a fun story about collaboration and thinking about problems.

To make this concrete, consider a graph of two separate years’ raw rainfall data (Figure 1). From this plot, it is unclear what the best method for measuring the “distance” between these two years would be.

Figure 1: Graph of two separate years’ raw rainfall data.


One current approach to this problem is to measure the similarity of the years “pointwise.” Now, those of us who have been alive for a few years (or have seen “The Pirates of Penzance”) know that not every year is the same; most years have 365 days, but roughly a quarter of years have 366. The approaches to dealing with the problematic Feb. 29 are:

  1. Ignore it, thus throwing away ~0.3 percent of the data.
  2. Lump it in with March 1.

Neither of these is particularly satisfactory to us. Instead of measuring the distance pointwise (hourly or daily), which is highly sensitive to “breakpoints,” we propose to measure the difference between cumulative precipitation curves, normalized to 365 days (thus overcoming the leap year problem).

To measure the difference between years, we follow a simple process of re-normalizing the data to a 365-day “standard year.” We then sum the squared differences between the two years. For those who prefer math over words, we do this:

[Equation: the sum of squared differences between the two years’ cumulative precipitation, each re-normalized to a 365-day “standard year.”]

The year with the smallest total distance to all other years, i.e., the minimum of the summed distances, is the “reference” year.
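The process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the article (which used R): the function names are hypothetical, each year is assumed to be a list of 365 or 366 daily totals, and linear interpolation is assumed as the way to map a leap year onto the 365-day standard year.

```python
def cumulative_on_standard_year(daily, n_points=365):
    """Resample a year's cumulative precipitation onto a standard year.

    `daily` holds 365 or 366 daily totals. The running total is read off
    at n_points equally spaced positions, so a leap year is compressed
    onto the same axis as a common year (assumed: linear interpolation).
    """
    cum = [0.0]  # running total at the end of each day, starting from zero
    for amount in daily:
        cum.append(cum[-1] + amount)
    n_days = len(daily)
    out = []
    for k in range(1, n_points + 1):
        pos = k * n_days / n_points  # position on this year's own day axis
        lo = int(pos)
        if lo >= n_days:
            out.append(cum[n_days])  # end of year: the full annual total
        else:
            frac = pos - lo
            out.append(cum[lo] + frac * (cum[lo + 1] - cum[lo]))
    return out


def distance(curve_a, curve_b):
    """Sum of squared differences between two standard-year curves."""
    return sum((a - b) ** 2 for a, b in zip(curve_a, curve_b))


def reference_year(years):
    """Return (best_year, totals) for a dict of year -> daily totals.

    totals[y] is the summed distance from year y to every other year;
    the reference year is the one with the smallest such sum.
    """
    curves = {y: cumulative_on_standard_year(d) for y, d in years.items()}
    totals = {
        y: sum(distance(c, other) for z, other in curves.items() if z != y)
        for y, c in curves.items()
    }
    return min(totals, key=totals.get), totals
```

Calling `reference_year({1999: [...], 2000: [...], 2001: [...]})` returns the year whose cumulative curve sits closest, in the summed squared sense, to all the others. Leap years need no special handling beyond the resampling step.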


Our current data set consists of 100 years of rainfall data from Philadelphia, as shown in Figure 2. We determine the “representative year” for the period from 1965 to the year chosen; in other words, the 1993 point is 1989-1993, 2000 is 1989-2000 and so on. Using this “moving right-hand reference approach,” we see the years chosen as depicted in Figure 3.

Figure 2: Cumulative rainfall over 25 years.


Figure 3: 1973 chosen as the most frequently representative year.


1973 is chosen most frequently as the representative year, based on minimum distance and a normalized year length. One might argue, as we did, that this approach tends to favor years whose total rainfall is closest to average. To overcome this minor difficulty, we may simply normalize the rainfall over the year as well, scaling the total for the year to 1. This “variance only” approach produces the graph shown in Figure 4.
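Scaling each year’s total to 1 can be illustrated with a short Python sketch. The function name is hypothetical and the code is an assumption about how such a normalization could be written, not the authors’ implementation. It covers only the rescaling of the total; mapping leap years onto the 365-day standard year is a separate step.

```python
def unit_total_curve(daily):
    """Cumulative fraction of the year's rain that has fallen by each day.

    Dividing the running total by the annual total makes every curve end
    at 1, so only the timing of the rain (not the amount) distinguishes
    one year from another.
    """
    total = sum(daily)
    if total == 0:
        raise ValueError("year has no recorded precipitation")
    curve = []
    running = 0.0
    for amount in daily:
        running += amount
        curve.append(running / total)
    return curve


# Two made-up years with the same timing but different totals.
dry_year = [0.0, 1.0, 0.0, 3.0, 0.5]
wet_year = [2 * amount for amount in dry_year]

dry_curve = unit_total_curve(dry_year)
wet_curve = unit_total_curve(wet_year)
```

In this toy example `dry_curve` and `wet_curve` coincide, which is the point of the normalization: a year is no longer favored merely because its total is close to average.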

Figure 4: Variance only approach.


This tends to favor 1956 and, later, 1991 as representative years. A plot of these two candidates is shown in Figure 5:

Figure 5: Plot of two candidates.


In conclusion, we have applied a bit more than five minutes’ worth of analysis in this installment. More important than the results is that the basic ideas of calculus and statistics, which we don’t always use in everyday practice, continue to echo far beyond our basic schooling.

Technical note: This analysis made ample use of the R base function approxfun(), which interpolates between values of a given empirical data set. This made numerical integration quite straightforward.
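For readers without R at hand, the idea of turning data points into an interpolating function and then integrating numerically can be sketched in Python. This is a loose analogue written for illustration, not a port of approxfun(): the names `make_interpolator` and `integrate` are hypothetical, the x values are assumed to be strictly increasing, and the trapezoid rule is an assumed choice of integration method.

```python
from bisect import bisect_right


def make_interpolator(xs, ys):
    """Return a function f(x) that linearly interpolates (xs[i], ys[i]).

    Assumes xs is strictly increasing. Unlike approxfun() with its
    default settings, which yields NA outside the data range, this
    sketch raises ValueError there.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")

    def f(x):
        if x < xs[0] or x > xs[-1]:
            raise ValueError("x outside the data range")
        # Index of the right-hand end of the segment containing x.
        i = min(bisect_right(xs, x), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        return ys[i - 1] + (ys[i] - ys[i - 1]) * (x - x0) / (x1 - x0)

    return f


def integrate(f, a, b, steps=1000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


# Made-up cumulative totals at three time points.
curve = make_interpolator([0.0, 1.0, 2.0], [0.0, 10.0, 30.0])
area = integrate(curve, 0.0, 2.0)
```

Once two years’ cumulative curves are available as functions on a common axis, the squared difference between them can be integrated the same way, for example `integrate(lambda t: (f(t) - g(t)) ** 2, 0.0, 1.0)` for two hypothetical interpolators `f` and `g` defined on the fraction of the year elapsed.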

Harrison Schramm, CAP, PStat, is a principal operations research analyst at CANA Advisors, LLC, and a member of INFORMS.




