# Five-Minute Analyst: Voter fraud

## No, it’s not about the presidential election. It’s about a model car contest for kids.

### By Harrison Schramm

This article is a true story about detecting voting fraud in a charitable auction, using no tools save a pencil, paper and a smartphone. The setup is as follows: A group of kids entered model cars in a contest in which the cars were voted on by the other contestants. The cars were judged on attributes such as color, creativity and dangerousness. Each participant was given a strip of 10 tickets and told to vote for “one car per category.” When I arrived at the event, I was asked to tally the votes. The organizer, having no idea what he was about to unleash on the problem, assured me that my judgment was “absolute,” with a telling wink that said, “expect foolishness,” but did not elaborate.

As I started tallying up the votes by hand (40 participants x 10 tickets each = 400 tickets total), I realized that some of the votes were off … that there were way more tickets for some of the cars than there should have been. But how could I adequately prove (to myself) that there was cheating going on?

T-37 Cockpit: The clock is on the upper left of the center instrument console.

### Pilots and Statisticians

Before all this happened, I was a Navy pilot. I took my first round of training flying the T-37 “Tweety Bird” at the 37th Flying Training Squadron, Vance AFB, in Enid, Okla. I had an instructor, a former fighter pilot, who was very fond of saying: “If you don’t know what to do, wind the clock, and by the time you’re finished, something useful will probably come to you.”

Years later, teaching and then practicing statistics, I had a similar mantra for my students: “If you are working a problem and you don’t know what to do, you should compute the marginal distributions, and by the time you are finished, something useful may come to you.”

Figure 1: Votes per category: something strange going on.

In this instance, I took my own advice and computed the marginals by hand (see Figure 1; the figure was produced by computer, but I assure you the process was the same).
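For readers who would rather let a computer do the counting, here is a minimal sketch of the same tally. The ticket list is made-up illustration data (not the contest results), and the `(car, category)` ticket layout and the function name `marginals` are assumptions of this sketch.

```python
from collections import Counter

# Hypothetical tally: one (car_number, category) pair per ticket.
# These tickets are invented for illustration only.
tickets = [
    (3, "color"), (3, "creativity"), (17, "strangest"),
    (17, "strangest"), (8, "color"), (17, "fastest looking"),
]


def marginals(tickets):
    """Return (votes per car, votes per category) for a list of tickets."""
    by_car = Counter(car for car, _ in tickets)
    by_category = Counter(category for _, category in tickets)
    return by_car, by_category


by_car, by_category = marginals(tickets)
print("votes per car:", dict(by_car))
print("votes per category:", dict(by_category))
```

Each marginal simply ignores one of the two labels on the ticket and counts by the other, which is exactly what the pencil-and-paper tally does.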

Now, because we know that there were 40 participants with 10 tickets each, we infer that $E[C] = 40$. Once the marginals are computed, it’s an easy exercise to compute the variance by hand using the definition, $\mathrm{Var}[C] = E[C^2] - E[C]^2$, which can easily be done on a smartphone. We arrive at a standard deviation, $s = 9.5$. A purely statistical approach would be to suspect any car that received more than $2s = 19$ votes above the mean.
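The arithmetic can be sketched in a few lines. The counts below are hypothetical, chosen so the mean comes out to 40; they are not the contest data, and the labels `cat0` through `cat9` and the helper names are assumptions of this sketch.

```python
import math


def mean_and_std(counts):
    """Population mean and standard deviation via Var[C] = E[C^2] - E[C]^2."""
    n = len(counts)
    mean = sum(counts) / n
    mean_sq = sum(c * c for c in counts) / n
    variance = mean_sq - mean ** 2
    # Guard against tiny negative values from floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


def flag_two_sigma(labels, counts):
    """Return the labels whose count exceeds the mean by more than 2 s."""
    mean, s = mean_and_std(counts)
    return [lab for lab, c in zip(labels, counts) if c > mean + 2 * s]


# Hypothetical marginal counts: nine ordinary values and one inflated one.
labels = [f"cat{i}" for i in range(10)]
counts = [38] * 9 + [58]

mean, s = mean_and_std(counts)
flagged = flag_two_sigma(labels, counts)
print("mean:", mean, "std:", s, "flagged:", flagged)
```

The two-sigma rule is shown here only because the text describes it; whether it is the right test for this problem is a separate question.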

Figure 2: Plot of the distribution of votes for “strangest car.”

This approach is, of course, wrong.

Let’s recall the original question, which is to find where participants had pathologically voted for themselves. Because each participant has only 10 tickets, an excess of 19 tickets could only arise if more than one kid colluded, a very unlikely scenario. We identify “strangest” and “fastest looking” as potential areas for cheating, with “strangest” being the most interesting. Figure 2 shows a plot of the distribution of votes for “strangest car.” (The solid line indicates the average number of votes in this category.)

Figure 3: An abnormally high number of votes for “strange” car No. 17.

Interestingly, while car No. 17 received attention for having the most votes in a single category, it did not have the most votes overall. In fact, car No. 17 did not receive more than an average number of votes.

By using the conditional distribution of votes for “strangest,” we see that car No. 17 has an abnormally high number of votes.
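A minimal sketch of that conditional view follows: fix one category, count votes per car within it, and compare each car with the category average. The ticket data, the five-car field and the function names are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter


def conditional_counts(tickets, category, cars):
    """Votes per car given the category, including cars with zero votes."""
    counts = Counter(car for car, cat in tickets if cat == category)
    return {car: counts[car] for car in cars}


def excess_over_mean(counts):
    """Each car's vote count minus the average count in the category."""
    mean = sum(counts.values()) / len(counts)
    return {car: n - mean for car, n in counts.items()}


# Hypothetical tickets: five cars, with car 5 collecting 10 "strangest" votes.
cars = [1, 2, 3, 4, 5]
tickets = (
    [(1, "strangest")] + [(2, "strangest")] * 2 + [(3, "strangest")]
    + [(4, "strangest")] + [(5, "strangest")] * 10
    + [(1, "color")] * 4 + [(2, "color")] * 3
)

strangest = conditional_counts(tickets, "strangest", cars)
excess = excess_over_mean(strangest)
most_suspicious = max(excess, key=excess.get)
print(strangest)
print("largest excess: car", most_suspicious, "by", excess[most_suspicious])
```

A car can stand out sharply in one category while its total across all categories looks unremarkable, which is why the conditional counts are more revealing here than the overall marginal.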

Do I think that car No. 17 received an abnormally high number of votes from one source? I’ll let you be the judge.

### An Interesting Observation

Votes for car No. 17: Note the ticket serial numbers.
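One way to make the serial-number observation concrete, assuming the point of the photo is that many of the tickets came off a single strip and so carry consecutive numbers (my reading, not something the tally itself proves), is to look for the longest run of consecutive serials among a car's votes. The serial numbers and the function name below are hypothetical.

```python
def longest_consecutive_run(serials):
    """Return the longest run of consecutive integers among the serials."""
    ordered = sorted(set(serials))
    if not ordered:
        return []
    best = current = [ordered[0]]
    for s in ordered[1:]:
        # Extend the run if s follows the previous serial; otherwise restart.
        current = current + [s] if s == current[-1] + 1 else [s]
        if len(current) > len(best):
            best = current
    return best


# Hypothetical serial numbers on the tickets cast for one car.
serials = [501, 120, 502, 503, 77, 504, 505]
run = longest_consecutive_run(serials)
print("longest run:", run)
```

A long run suggests that one strip supplied most of the votes; a short run is what one would expect if the tickets came from many different voters.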

You will notice that the number of votes increases with car number. The participants were handed tickets at the entry of the judging line, which was in front of car No. 1, and as they neared the end of the line, they found themselves voting for the “later” cars. Interestingly, as the saying goes, a rising tide floats all boats, and being later in the judging did not affect the distribution of prizes.

A final thought: This type of fraud was easy to catch because it was poorly executed. Had the owner of car No. 17 adopted a more moderate strategy, such as “stuffing” the box with only five votes, he might have won and his fraud gone undetected. This type of padding can be detected by statistical methods, but not ones that are likely to be employed by hand on a Sunday afternoon.

Harrison Schramm (Harrison.schramm@gmail.com), CAP, PStat, is a principal operations research analyst at CANA Advisors, LLC, and a member of INFORMS.

