
Analytics Magazine

Crowdsourcing – Using the crowd: curated vs. unknown

By Ben Christensen

You’ve heard about the success of crowdsourcing from colleagues, from industry news, maybe even from personal experience. With nearly half the world’s population online, it makes sense to tap into this tremendous resource to collect large quantities of data by breaking the collection down into micro-tasks that the enormous crowd of Internet users will complete for you at pennies per task. But you’ve probably also heard of crowdsourcing gone wrong, such as the British government agency that put the task of naming its new research vessel out to the crowd and ended up with Boaty McBoatface [1]. So you have a human annotation task – a data refinement or evaluation task that requires human input – and you’re trying to decide whether to take your chances with crowdsourcing. Where do you start? The first thing you need to know is that you have options; it’s not Boaty McBoatface or nothing.

“When should I use crowdsourcing, and when should I use a curated crowd?” This is the question anyone interested in staffing human annotation tasks should be asking, but many don’t because they don’t even know there are two different options. So let’s start there – defining the options. Assuming you need human annotation, for example for search relevance evaluation, there are two ways you can gather the necessary humans to do that work: 1) crowdsourcing, where the task is made available to a large crowd without any management beyond a very limited set of task instructions and possibly a simple screening test; or 2) curated crowds, where a smaller group is selected to complete the task accurately according to quality guidelines.

When to Use a Traditional Crowdsourcing Model

The power of crowdsourcing is in its numbers. You can accomplish a lot quickly because many hands make light work. A hundred thousand people can do quite a bit more than a hundred can. The cost is less because crowdsourcing typically pays only a few pennies per task. Most members of the crowd aren’t trying to make a living – they’re just trying to make a few extra bucks in their spare time. There’s usually little overhead involved in crowdsourcing because the crowd looks after itself. You put the task out there, and if it’s interesting enough and pays enough, the crowd will get it done.

This model works well for simple tasks that require little explanation and even less expertise. For example, you can ask the crowd to choose which of two images contains a dog, to tell you whether a business listing is or isn’t a restaurant, or to transcribe words from images [2] and have a reasonable amount of success. Opinion-based tasks where you are looking for a wide variety of responses are also well-suited for crowdsourcing: which image do you like better, what’s the best Italian restaurant in your town, or how would you word this request for a voice-activated personal assistant.

Crowdsourcing Challenges

With the advantages of traditional crowdsourcing come a few limitations. First, quality control is minimal, so you must rely on clear instructions, automated understanding checks and high overlap to get data you can trust. Overlap is important because there will always be noise in the crowd – bad data that you have to identify and sift out – so you’ll likely pay for at least five members of the crowd to complete each task. Some members of the crowd will try to game the system, using bots to do their work for them, so you’ll need to account for this with screening tests on top of your high overlap.
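To make the overlap idea concrete, here is a minimal sketch of one common way to aggregate overlapping judgments: a simple majority vote. The task IDs, labels and the five-way overlap are hypothetical, and real platforms often use more sophisticated aggregation than this.

```python
from collections import Counter

def aggregate_votes(judgments, min_agreement=3):
    """Majority-vote aggregation for overlapping crowd judgments.

    judgments: dict mapping task_id -> list of labels from multiple workers.
    Returns task_id -> winning label, or None when no label reaches
    min_agreement votes (i.e., the crowd was too noisy to trust).
    """
    results = {}
    for task_id, labels in judgments.items():
        label, count = Counter(labels).most_common(1)[0]
        results[task_id] = label if count >= min_agreement else None
    return results

# Five-way overlap on two hypothetical image-classification tasks:
votes = {
    "img-001": ["dog", "dog", "dog", "cat", "dog"],   # clear majority
    "img-002": ["dog", "cat", "cat", "dog", "bird"],  # too noisy to keep
}
print(aggregate_votes(votes))  # {'img-001': 'dog', 'img-002': None}
```

Note that the noisy item is discarded rather than guessed at, which is exactly why you end up paying for five judgments to keep one trustworthy data point.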

The second limitation is your lack of control over task completion. The crowd can get a lot done quickly, but it will only get your task done quickly if it wants to. This means if your task is more difficult or less exciting than the shiny new task another team is offering, then you’re going to have to incentivize the crowd with higher payment to work on your task. The crowd has made no commitment to you and may not be motivated to make a deadline.

Finally, if you’re looking for data in smaller markets, you may be out of luck – crowdsourcing is huge in the United States and a few other countries, but the same doesn’t hold true globally.

The power of crowdsourcing is in its numbers. Many hands make light work.
Photo courtesy of Kheng Ho Toh

The Alternative: Using a Curated Crowd

Curated crowds, on the other hand, are all about quality. With this solution, offered by a small number of specialized crowd data solution providers, you have a group of people who are dedicated, if not specifically to your task, then to similar tasks. People in curated crowds become experts in search relevance evaluation, social media evaluation or whatever type of human annotation tasks they work on. This is not simply a case of counting on their accumulated experience to ensure quality, although that experience does play a large role. The key to quality is constant checks and balances. They are held to quality metrics, receive quality feedback, and are removed from your task if they don’t deliver the required quality.

This means that you can use very little overlap, paying for each judgment only one to three times instead of five or more times, because you can trust the data each person delivers. Curated crowd providers also monitor productivity and throughput, ensuring that the crowd meets their weekly, daily and hourly commitments so that you have the data you need when you need it. And if you’ve chosen a good vendor, then the manager will also be an invaluable resource, leveraging years of experience to partner with you in building out tasks and guidelines based on your needs.

With this higher level of quality and productivity management comes a cost. Curated crowds cost more than crowdsourcing because this work is typically a primary source of income. You also pay for the quality oversight that you don’t have in crowdsourcing. Keep in mind, though, that lower overlap mitigates these costs because you aren’t paying for each collected data point multiple times. Apart from the financial cost, curated crowds also require more of a commitment from you in exchange for the greater commitment you get. The curated crowd will be happiest and will keep their skills sharpest when you provide work for them consistently.
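The overlap trade-off is easy to put in back-of-the-envelope terms. The rates below are purely illustrative assumptions, not real market prices: a crowd judgment at 5 cents with five-way overlap versus a curated-crowd judgment at 15 cents with two-way overlap.

```python
def collection_cost_cents(num_items, overlap, cents_per_judgment):
    """Total cost, in cents, of collecting num_items data points
    when each item is judged `overlap` times."""
    return num_items * overlap * cents_per_judgment

# Illustrative rates only -- real prices vary by task, market and vendor.
crowd = collection_cost_cents(10_000, overlap=5, cents_per_judgment=5)
curated = collection_cost_cents(10_000, overlap=2, cents_per_judgment=15)
print(f"crowd: ${crowd / 100:,.2f}, curated: ${curated / 100:,.2f}")
# crowd: $2,500.00, curated: $3,000.00
```

Under these assumed rates, the curated crowd's threefold per-judgment premium shrinks to a 20 percent difference in total cost once the lower overlap is factored in.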

That said, if the natural ebb and flow of your need for human annotation necessitates more flexibility, there are alternatives, such as sharing a flexible curated crowd with other teams running similar tasks.

Use Cases: Search and Social

Major worldwide search engine providers have been using curated crowd solutions for years. Curated crowds are used for search relevance evaluation, local search result validation, query classification, spam identification and countless other tasks that require more attention than what traditional crowdsourcing provides. By using this model, these search engine providers gather high-quality data they can trust to accurately measure the success of their current algorithms, compare their search engine against competitors, and test out new iterations before launching.

Social media network providers have more recently come to appreciate the value of the curated crowd. While it’s common for these providers to poll their users regarding their experience with the site, the social feed, the ads and the search functionality, the data gathered from these traditional crowd-based methods is limited and uncontrolled. By contrast, engaging a curated crowd that is able to provide targeted feedback on specific aspects of the social feed, filtering their subjective experience through a set of objective criteria, has produced much more useful data. Social media providers who take advantage of this model are able to leverage the resulting data to improve their social feed algorithms, their ads and their search features in order to create a user experience that stands out above their competitors.

Choosing the Right Option

So when should you use crowdsourcing and when should you use a curated crowd? Crowdsourcing is great for simple tasks that can be adequately explained in two or three sentences. You’ll get a lot done quickly, but be prepared to raise the pay rate if you have a tight deadline and the crowd doesn’t find your task sexy enough. On the other hand, if you have a more complex task, particularly if it’s a longer-term or ongoing task that dedicated people can build expertise on over time, then a curated crowd is for you.

Either way, be sure you fully understand your options so that you can make the best choice for your business. And if you’re trying to name a boat, you might want to limit the vote to names you won’t be embarrassed to paint on the side of that brand new vessel.

Ben Christensen, director of content relevance operations at Appen, has been managing crowd-based search, eCommerce and social evaluation work since 2008. He has a master’s degree in library and information science from the University of Washington. Appen is a global language technology solutions provider with capability in more than 180 languages and 130 countries, serving companies, automakers and government agencies.


  2. This is exactly what the Gutenberg Project does:
