

Analytics Magazine

Crowdsourcing – Using the crowd: curated vs. unknown

By Ben Christensen

You’ve heard from colleagues, from industry news, maybe even from personal experience of the success of crowdsourcing. With nearly half the world’s population online, it makes sense to tap into this tremendous resource to collect large quantities of data by breaking the collection down into micro-tasks that the enormous crowd of Internet users will complete for you at pennies per task. But you’ve probably also heard of cases of crowdsourcing gone wrong, cases like the British government agency that put the task of naming its new research vessel out to the crowd and ended up with Boaty McBoatface [1]. So you have a human annotation task – a data refinement or evaluation task that requires human input – and you’re trying to decide whether to take your chances with crowdsourcing. Where do you start? The first thing you need to know is that you have options; it’s not Boaty McBoatface or nothing.

“When should I use crowdsourcing, and when should I use a curated crowd?” This is the question anyone interested in staffing human annotation tasks should be asking, but many don’t because they don’t even know there are two different options. So let’s start there – defining the options. Assuming you need human annotation, for example for search relevance evaluation, there are two ways you can gather the necessary humans to do that work: 1) crowdsourcing, where the task is made available to a large crowd without any management beyond a very limited set of task instructions and possibly a simple screening test; or 2) curated crowds, where a smaller group is selected to complete the task accurately according to quality guidelines.

When to Use a Traditional Crowdsourcing Model

The power of crowdsourcing is in its numbers. You can accomplish a lot quickly because many hands make light work. A hundred thousand people can do quite a bit more than a hundred can. The cost is less because crowdsourcing typically pays only a few pennies per task. Most members of the crowd aren’t trying to make a living – they’re just trying to make a few extra bucks in their spare time. There’s usually little overhead involved in crowdsourcing because the crowd looks after itself. You put the task out there, and if it’s interesting enough and pays enough, the crowd will get it done.

This model works well for simple tasks that require little explanation and even less expertise. For example, you can ask the crowd to choose which of two images contains a dog, to tell you whether a business listing is or isn’t a restaurant, or to transcribe words from images [2] and have a reasonable amount of success. Opinion-based tasks where you are looking for a wide variety of responses are also well-suited for crowdsourcing: which image do you like better, what’s the best Italian restaurant in your town, or how would you word this request for a voice-activated personal assistant.

Crowdsourcing Challenges

With the advantages of traditional crowdsourcing come a few limitations. First, quality control is minimal. With so little oversight, you must rely on clear instructions, automated understanding checks and high overlap to get data you can trust. Overlap is important because there will always be noise in the crowd – bad data that you have to identify and sift out – so you’ll likely pay for at least five members of the crowd to review each result. Some members of the crowd will try to game the system, using bots to do their work for them, so you’ll need to account for this with screening tests on top of your high overlap.
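
As a rough illustration of how overlap filters out that noise, majority voting over redundant judgments can be sketched as follows. The overlap factor of five, the agreement threshold and the labels are assumptions for the sketch, not figures from any particular platform:

```python
from collections import Counter

def aggregate_judgments(judgments, min_overlap=5, min_agreement=0.6):
    """Majority-vote a list of crowd labels for one item.

    Returns the winning label, or None if there are too few
    judgments or agreement is too weak to trust the result.
    """
    if len(judgments) < min_overlap:
        return None  # not enough overlap yet; keep collecting
    label, votes = Counter(judgments).most_common(1)[0]
    if votes / len(judgments) < min_agreement:
        return None  # crowd is split; flag the item for review
    return label

# Five workers judged the same image; one noisy answer is outvoted.
print(aggregate_judgments(["dog", "dog", "cat", "dog", "dog"]))  # dog
```

Real platforms layer screening tests and worker trust scores on top of this, but the basic idea is the same: pay for redundant judgments, then keep only the answers the crowd agrees on.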

The second limitation is your lack of control over task completion. The crowd can get a lot done quickly, but it will only get your task done quickly if it wants to. This means if your task is more difficult or less exciting than the shiny new task another team is offering, then you’re going to have to incentivize the crowd with higher payment to work on your task. The crowd has made no commitment to you and may not be motivated to make a deadline.

Finally, if you’re looking for data in smaller markets, you may be out of luck – crowdsourcing is huge in the United States and a few other countries, but the same doesn’t hold true globally.

The power of crowdsourcing is in its numbers. Many hands make light work.
Photo courtesy of Kheng Ho Toh

The Alternative: Using a Curated Crowd

Curated crowds, on the other hand, are all about quality. With this solution, offered by a small number of specialized crowd data solution providers, you have a group of people who are dedicated, if not specifically to your task, then to similar tasks. People in curated crowds become experts in search relevance evaluation, social media evaluation or whatever type of human annotation tasks they work on. This is not simply a case of counting on their accumulated experience to ensure quality, although that experience does play a large role. The key to quality is constant checks and balances. They are held to quality metrics, receive quality feedback, and are removed from your task if they don’t deliver the required quality.

This means that you can use very little overlap, paying for each judgment only one to three times instead of five or more times, because you can trust the data each person delivers. Curated crowd providers also monitor productivity and throughput, ensuring that the crowd meets their weekly, daily and hourly commitments so that you have the data you need when you need it. And if you’ve chosen a good vendor, then the manager will also be an invaluable resource, leveraging years of experience to partner with you in building out tasks and guidelines based on your needs.

With this higher level of quality and productivity management comes a cost. Curated crowds cost more than crowdsourcing because this work is typically a primary source of income. You also pay for the quality oversight that you don’t have in crowdsourcing. Keep in mind, though, that lower overlap mitigates these costs because you aren’t paying for each collected data point multiple times. Apart from the financial cost, curated crowds also require more of a commitment from you in exchange for the greater commitment you get. The curated crowd will be happiest and will keep their skills sharpest when you provide work for them consistently.
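
To see how lower overlap offsets a higher per-judgment price, consider a hypothetical comparison. All of the rates below are invented for illustration; real pricing varies widely by task, market and vendor:

```python
def cost_per_item(rate_per_judgment, overlap):
    """Total paid per collected data point when each item
    is judged `overlap` times."""
    return rate_per_judgment * overlap

# Assumed rates: $0.05/judgment crowdsourced at 5x overlap,
# vs. $0.15/judgment from a curated crowd at 2x overlap.
crowd = cost_per_item(0.05, 5)    # $0.25 per item
curated = cost_per_item(0.15, 2)  # $0.30 per item
print(f"crowdsourced: ${crowd:.2f}, curated: ${curated:.2f}")
```

Under these assumed rates the per-item gap is small, which is the paragraph’s point: paying each judgment once or twice instead of five times recovers much of the curated crowd’s higher per-judgment cost.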

That said, if the natural ebb and flow of your need for human annotation necessitates more flexibility, there are alternatives, such as sharing a flexible curated crowd with other teams running similar tasks.

Use Cases: Search and Social

Major worldwide search engine providers have been using curated crowd solutions for years. Curated crowds are used for search relevance evaluation, local search result validation, query classification, spam identification and countless other tasks that require more attention than traditional crowdsourcing provides. By using this model, these search engine providers gather high-quality data they can trust to accurately measure the success of their current algorithms, compare their search engine against competitors, and test new iterations before launching.

Social media network providers have more recently come to appreciate the value of the curated crowd. While it’s common for these providers to poll their users about their experience with the site, the social feed, the ads and the search functionality, the data gathered from these traditional crowd-based methods is limited and uncontrolled. By contrast, a curated crowd can provide targeted feedback on specific aspects of the social feed, filtering subjective experience through a set of objective criteria, and has produced much more useful data. Social media providers who take advantage of this model leverage the resulting data to improve their social feed algorithms, their ads and their search features, creating a user experience that stands out from their competitors’.

Choosing the Right Option

So when should you use crowdsourcing and when should you use a curated crowd? Crowdsourcing is great for simple tasks that can be adequately explained in two or three sentences. You’ll get a lot done quickly, but be prepared to raise the pay rate if you have a tight deadline and the crowd doesn’t find your task sexy enough. On the other hand, if you have a more complex task, particularly if it’s a longer-term or ongoing task that dedicated people can build expertise on over time, then a curated crowd is for you.

Either way, be sure you fully understand your options so that you can make the best choice for your business. And if you’re trying to name a boat, you might want to limit the vote to names you won’t be embarrassed to paint on the side of that brand new vessel.

Ben Christensen, director of content relevance operations at Appen, has been managing crowd-based search, eCommerce and social evaluation work since 2008. He has a master’s degree in library and information science from the University of Washington. Appen is a global language technology solutions provider with capability in more than 180 languages and 130 countries, serving companies, automakers and government agencies.


  2. This is exactly what the Gutenberg Project does:

