In a small apartment in a small town in northeastern Mississippi, Sarah Marshall sits at her computer, clicking bubbles for an online survey, as her 1-year-old son plays nearby. She hasn't done this exact survey before, but the questions are familiar, and she works fast. That’s because Marshall is what you might call a professional survey-taker. In the past five years, she has completed roughly 20,000 academic surveys. This is her 21st so far this week. And it’s only Tuesday.
Marshall is a worker for Amazon’s Mechanical Turk, an online job forum where “requesters” post jobs, and an army of crowdsourced workers complete them, earning fantastically small fees for each task. The work has been called micro labor, and the jobs, known as Human Intelligence Tasks, or HITs, range wildly. Some are tedious: transcribing interviews or cropping photos. Some are funny: prank calling someone’s buddy (that’s worth $1) or writing the title to a pornographic movie based on a collection of dirty screen grabs (6 cents). And others are downright bizarre. One task, for example, asked workers to strap live fish to their chests and upload the photos. That paid $5 — a lot by Mechanical Turk standards.
Mostly, Marshall is a sort of cyber guinea pig, providing a steady stream of data to academic research. This places her squarely inside a growing culture of super-savvy, highly experienced study participants.
Sarah Marshall makes about 50 percent of her income from Mechanical Turk, mostly doing research studies. She works at home while watching her son. Photo by Mike Fritz.
As she works, she hears a rustling noise. “Grayson, are you in my garbage can?”
In the kitchen, the trash can’s on its side. Her son has liberated an empty box of cinnamon rolls and dumped the remaining contents on the floor. She goes to him, scoops him up and carries him back to the living room, where he circles the carpet, chattering happily as she resumes typing.
“I’m never going to be absolutely undistracted, ever,” Marshall says, and smiles.
Her employers don’t know that Marshall works while negotiating her toddler’s milk bottles and giving him hugs. They don’t know that she has seen studies similar to theirs maybe hundreds, possibly thousands, of times.
Since its founding in 2005, Mechanical Turk has become an increasingly popular way for university researchers to recruit subjects for online experiments. It’s cheap, easy to use, and the responses, powered by the forum’s 500,000 or so workers, flood in fast.
These factors are such a draw for researchers that, in certain academic fields, crowdsourced workers are outpacing psychology students — the traditional go-to study subjects. And the studies are a huge draw for many workers, who tend to participate again and again and again.
These aren’t obscure studies that Turkers are feeding. They span dozens of fields of research, including social, cognitive and clinical psychology, economics, political science and medicine. They teach us about human behavior. They deal in subjects like energy conservation, adolescent alcohol use, managing money and developing effective teaching methods.
“Most of what’s happening in these studies involves trying to understand human behavior,” said Yale University’s David Rand. “Understanding bias and prejudice, and how you make financial decisions, and how you make decisions generally that involve taking risks, that kind of thing. And there are often very clear policy implications.”
As the use of online crowdsourcing in research continues to grow, some are asking the question: How reliable are the data that these modern-day research subjects generate?
The early adopter
In 2010, the researcher Joseph Henrich and his team published a paper showing that a randomly selected American undergraduate was more than 4,000 times more likely than a randomly selected person from outside the West to be the subject of a research study.
But that pales next to the output of Mechanical Turk workers. The typical “Turker” completes more studies in a week than the typical undergraduate completes in a lifetime, according to research by Rand, who surveyed both groups. Among those he surveyed, the median traditional lab subject had completed 15 academic studies in total, an average of one per week. The median Turker, on the other hand, had completed 300, an average of 20 per week.
“Which is just crazy,” Rand said. “And for a lot of experiments, that’s a big problem.”
David Rand, director of Yale University’s Human Cooperation Laboratory, presents at the PopTech Conference in Camden, Maine, in 2012. Photo by Thatcher Cook
Rand, a young, energetic behavioral economist who accessorizes his suit jacket with gray Converse sneakers and orange-striped socks, works on the second floor of a beautiful cathedral-like building on Yale’s main campus. Behind his desk are shelves of robot toys, a nod to his pre-professor days, when he fronted an electro-punk band called Robot Goes Here. “I actually had a record deal,” he said.
Rand was an early proselytizer for Mechanical Turk. In fact, he authored the first study in his field that encouraged scientists to tap Turkers for surveys. At the time, he gave talks to fellow researchers, telling them recruiting via Mechanical Turk could be done more quickly, cheaply and easily and could be “just as valid as other types of experiments.”
That was in 2010. Since then, his early enthusiasm has been tempered with caution. He’s been following the forum for nearly a decade and has come to believe that it has some serious limitations. First, there’s the question of dropout rates. Turkers are more likely to drop out mid-study, and that can skew the results. Then there’s the question of environmental control. Read: There is none. In the lab, it’s easy to monitor survey takers; not so online. Who’s to say they’re not watching reality television while working, or drinking a few beers on the job? To guard against this, researchers test a worker’s focus by planting “attention checks” in their surveys. “Have you ever eaten a sandwich on Mars?” a question might read. Or: “Have you ever had a fatal heart attack?” But the attention-check questions are often recycled, and experienced workers spot them immediately. (“Whenever I see the word vacuum, I know it’s an attention check,” Marshall has said.)
But it’s the loss of gut intuition among experienced workers that concerns Rand the most.
A person’s gut response to a question is an important measurement in many social psychology studies. It’s common to compare the automatic, intuitive part of the decision-making brain with the part that’s rational and deliberate. But a psychologist testing for this among professional survey takers may very well be on a fool’s errand.
Katie Hays, 28, makes about $200 a week Turking in Biloxi, Mississippi, and has observed a change in her own survey performance. “It’s hard to reproduce a gut response when you’ve answered a survey that’s basically the same 200 times,” she said. “You kind of just lose that freshness.”
The humans behind the machine have essentially become machines.
Recently, Rand recruited a group of college students and Turkers, 5,831 subjects in all, to perform a series of experiments testing whether cooperative behavior that’s successful in daily life spills over into what’s known as a Public Goods game. In the game, participants were given a small amount of money and asked to make a choice: how much cash to keep and how much to contribute to a common pool that benefits the other players. Rand then made the players more or less inclined to rely on their gut responses by forcing them either to answer quickly or to stop and consider their choices. Among college students, the pattern was clear. When forced to answer quickly, they were more likely to contribute to the common good; the more time they had to deliberate, the more they hoarded for themselves. But experienced Turkers behaved differently. Among them, the impulse to share had largely disappeared, likely because they knew the game and had learned that sharing was not a good strategy. The study was published in the journal Nature Communications in April 2014.
There are two critical points here as they relate to Mechanical Turk. The first is that frequent Mechanical Turk workers arrive already fluent in these experiments. They know how to play the game. The second, and perhaps more important, is that the impulses they carry over from daily life no longer show up when they play.
“If you’re running social psychology studies on Turk, watch out, because [the subjects] have gotten experienced, and that can change effects,” Rand said. “So if you run my experiment on Turk right now, you won’t get any effect. Which sucks for me.”
To be clear, extreme experience isn’t always a problem, Rand said. Some psychological effects are so robust that no amount of experience will override them, said Jesse Chandler, a researcher at the University of Michigan’s Institute for Social Research. Take the Stroop effect, which involves naming the color a word is printed in when that color doesn’t match the one the word spells out. When the word “red” is printed in green, for example, it takes longer to override the automatic reading of the text and choose green.
“It’s a really powerful effect,” Chandler said. “No amount of practice, no amount of awareness will completely override that happening.”
And some puzzles with tricky instructions can benefit from a professional survey taker — someone who’s fluent in the game, said Rand, who notes that he still uses Mechanical Turk often and considers it a great tool, provided that you understand its limitations.