
Analytics Magazine

Five-Minute Analyst: The force is strong with correspondence analysis


By Harrison Schramm and Matt Powers

“I am one with the data and the data is with me”
– Chirrut Imwe

This article is going to do two things I’ve never done before: the first is to include a co-author, and the second is to write about the same topic using (almost) the same data. To recap, in “The Force Awakens,” Kylo Ren fears that he will succumb to the light because he is not as dark as his hero, Darth Vader. We considered this problem in July 2016 using “Darkside Envelopment Analysis.” We repeat the data in Table 1 (spoiler alert), slightly updated to reflect the events of “Rogue One.”

Our previous work “shot first” by using data envelopment analysis implemented in MS Excel’s standard Simplex LP solver to maximize the ratio of “goods” to “bads” for each force practitioner’s achievements. To complete our training, we must unlearn, and move from mathematical optimization to correspondence analysis (CA), in this case wielding R package “ca,” an elegant weapon for a more civilized age. In this, we will create a biplot of achievements and failures, with Vader as the reference (Figure 1).

Figure 1: Correspondence analysis biplot featuring blue achievement/failure points and red force practitioner points. The black lines are Euclidean distances between non-Vader practitioners and Vader (red, near center). Increased distance implies increased dissimilarity. Ren’s Vader-distance (2.08) is the greatest of the non-Vader candidates.

By this metric, Luke is the most Vader-like. It also suggests that Ren’s journey to the dark side is not yet complete. CA indicator score analysis of the data, separated into achievements and failures, suggests that Vader is not necessarily the dark standard to which Ren should aspire. There is another.

“Make ten lines of code feel like a hundred!”
– Cassian Andor

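| Achievements | Vader | Ren | Luke | Palpatine |
| --- | --- | --- | --- | --- |
| Planet-sized objects destroyed | 1 | 4 | 1 | 0 |
| Force Choking / Lightning Lifting | 5 | 2 | 1 | 2 |
| Aerial Victories | 3 | 0 | 4 | 0 |
| Planets Conquered | 2 (Hoth, Cloud City) | 0 | 1 | 10 (Chancellor) |

| Failures | Vader | Ren | Luke | Palpatine |
| --- | --- | --- | --- | --- |
| Major Stations Lost | 2 | 1 | 1 | 1 |
| Temper-tantrums | 1 | 2 | 1 | 0 |
| Computer Drives Unrecovered | 2 | 1 | 0 | 0 |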

Table 1: Achievements and failures contingency table of Vader, Ren, Luke and Palpatine.

These indicator scores are calculated in three steps:

  1. Transform data into a contingency table.
  2. Use R’s ca package to create biplot row/column coordinates.
  3. Perpendicularly project column points onto row point lines and measure point-intercept distances to/from segment endpoints, using a custom R script that performs the calculations on the coordinates made available by the ca package.
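
As a rough sketch of steps 1 and 2, the snippet below encodes Table 1 as a contingency matrix and derives correspondence analysis coordinates from the singular value decomposition of the standardized residuals. It uses Python with NumPy rather than R's ca package, so treat it as a stand-in for our workflow, not a copy of it. The function name `correspondence_analysis` and the choice of principal coordinates are assumptions, and the distances it prints are not expected to reproduce the 2.08 quoted in Figure 1, since the biplot's scaling is not stated here.

```python
import numpy as np

practitioners = ["Vader", "Ren", "Luke", "Palpatine"]
categories = [
    "Planet-sized objects destroyed",
    "Force choking / lightning lifting",
    "Aerial victories",
    "Planets conquered",
    "Major stations lost",
    "Temper tantrums",
    "Computer drives unrecovered",
]

# Step 1: Table 1 as a contingency matrix
# (rows = achievements and failures, columns = practitioners).
counts = np.array([
    [1, 4, 1, 0],
    [5, 2, 1, 2],
    [3, 0, 4, 0],
    [2, 0, 1, 10],
    [2, 1, 1, 1],
    [1, 2, 1, 0],
    [2, 1, 0, 0],
], dtype=float)


def correspondence_analysis(table):
    """Return singular values plus row and column principal coordinates."""
    P = table / table.sum()            # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # independence model
    residuals = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(residuals, full_matrices=False)
    k = min(table.shape) - 1           # at most this many non-trivial dimensions
    U, sv, Vt = U[:, :k], sv[:k], Vt[:k, :]
    rows = (U / np.sqrt(r)[:, None]) * sv
    cols = (Vt.T / np.sqrt(c)[:, None]) * sv
    return sv, rows, cols


# Step 2: biplot coordinates for categories (rows) and practitioners (columns).
sv, row_coords, col_coords = correspondence_analysis(counts)

# Euclidean distance from each practitioner to Vader in the first two dimensions.
vader = practitioners.index("Vader")
plane = col_coords[:, :2]
dist_to_vader = np.linalg.norm(plane - plane[vader], axis=1)
for name, d in zip(practitioners, dist_to_vader):
    print(f"{name:10s} {d:.2f}")
```

The squared singular values sum to the total inertia (the chi-square statistic divided by the grand total), which is a convenient sanity check on any CA implementation.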

This problem has the interesting (and surprisingly common) characteristic that the data fields are not inherently ordinal. We might all agree that “destroying a planet (if you’re a Sith) or Death Star (for Jedi) is really good and that losing a Death Star is really bad,” but how do aerial victories compare to force choking and/or lightning lifting? Aerial victories are achievable by half-witted, scruffy-looking nerf herders, while force choking can punish a disturbing lack of faith.

We can create a more nuanced analysis by considering the CA indicator score analysis of achievements with multiple perpendicular projections. We will start by calculating Vader’s achievement CA indicator score set (see Figure 2).

Figure 2: Vader’s projections onto all six possible achievement lines. The ratio of point intercept distances to achievement line distances combines with weight differences to compute an overall CA indicator score for each practitioner.
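
The geometry behind these projections can be sketched in a few lines of Python. This is an illustration only: the coordinates are invented, the helper name `project_onto_line` is hypothetical, and the block stops at the intercept distance and its ratio to the line length, without attempting the weighted score itself.

```python
import math
from itertools import combinations


def project_onto_line(p, a, b):
    """Perpendicularly project p onto the line through a and b.

    Returns the foot of the perpendicular, the signed intercept distance
    measured from a, and that distance as a fraction of the length |ab|.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("a and b must be distinct points")
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    foot = (a[0] + t * dx, a[1] + t * dy)
    return foot, t * math.sqrt(length_sq), t


# Hypothetical biplot coordinates, invented for illustration only.
achievements = {
    "planets destroyed": (0.0, 0.0),
    "force choking": (4.0, 0.0),
    "aerial victories": (0.0, 2.0),
    "planets conquered": (4.0, 2.0),
}
practitioner = (1.0, 3.0)

# Four achievement points give six possible lines, one per pair.
projections = {
    (i, j): project_onto_line(practitioner, achievements[i], achievements[j])
    for i, j in combinations(achievements, 2)
}

for (i, j), (foot, distance, ratio) in projections.items():
    print(f"{i} -> {j}: intercept distance {distance:.2f}, ratio {ratio:.2f}")
```

A ratio below 0 or above 1 means the perpendicular lands outside the segment between the two achievement points; how such cases should be scored is not specified here, so the sketch simply reports them.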

The general formula for calculating a single score S via projection onto line (i,j) is:

equation

where R is the ratio of the intercept distance d* to the length of the line being projected onto, and w_i and w_j are the weights assigned to achievements i and j. Applying this to our previous data, we get Table 2. Table 3 compares three final indicator score calculation methods.

| Practitioner | Achievement Score | Failure Score |
| --- | --- | --- |
| Vader | 12.44 | 5.61 |
| Luke | 9.82 | 2.38 |
| Ren | 6.70 | 5.87 |
| Palpatine | 5.60 | 1.57 |

Table 2: Force practitioner CA achievement and failure scores, sorted by achievement scores.

| Practitioner | Achievement/Failure Ratio | Normalized Difference | CA Score Difference |
| --- | --- | --- | --- |
| Luke | 4.13 | 0.84 | 7.44 |
| Palpatine | 3.57 | 0.41 | 4.03 |
| Vader | 2.22 | 0.20 | 6.83 |
| Ren | 1.14 | -0.84 | 0.83 |

Table 3: Force practitioner indicator score comparisons, sorted by achievement/failure ratios.
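
Two of the three columns in Table 3 follow directly from Table 2, and the short Python check below recomputes them: the ratio is the achievement score divided by the failure score, and the CA score difference is the achievement score minus the failure score. The normalized difference is left out because its normalization is not spelled out here. The variable names are ours, chosen for illustration.

```python
# Table 2 scores: practitioner -> (achievement score, failure score).
scores = {
    "Vader": (12.44, 5.61),
    "Luke": (9.82, 2.38),
    "Ren": (6.70, 5.87),
    "Palpatine": (5.60, 1.57),
}

# (name, achievement/failure ratio, score difference), rounded to two decimals
# and sorted by ratio in descending order, as in Table 3.
table3 = sorted(
    (
        (name, round(ach / fail, 2), round(ach - fail, 2))
        for name, (ach, fail) in scores.items()
    ),
    key=lambda row: row[1],
    reverse=True,
)

for name, ratio, diff in table3:
    print(f"{name:10s} ratio={ratio:.2f} difference={diff:.2f}")
```

The resulting order (Luke, Palpatine, Vader, Ren) and values match the ratio and difference columns of Table 3.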

This analysis agrees broadly with our previous work, but introduces a different way to consider these types of data sets.

Harrison Schramm (Harrison.schramm@gmail.com), CAP, PStat, is a principal operations research analyst at CANA Advisors, LLC, and a member of INFORMS. Matt Powers is an operations research analyst working in the Tidewater, Va., area. In addition to Star Wars, his research interests focus on international cooperation.

A technical note: Exploratory factor analysis of failures loads the same latent variable onto unrecovered computer drives and major stations lost, thereby confirming the relationship between increased station vulnerability and computer drive security, while adding quantitative context as to why many Bothans (and others) died to retrieve the information on those drives.

A personal note: In the coming year, I don’t plan to have any regular co-authors, but would like to start bringing in some of the many padawans I’ve met along the way. It is my sincerest hope that eventually the students will become the masters.
