We need to talk about Mechanical Turk: what 22,989 hypothesis tests tell us about publication bias and p-hacking in online experiments

Abel Brodeur, Nikolai Cook, Anthony Heyes

Research output: Working paper/Preprint

Abstract

Amazon Mechanical Turk is a very widely used tool in business and economics research, but how trustworthy are results from well-published studies that use it? Analyzing the universe of hypotheses tested on the platform and published in leading journals between 2010 and 2020, we find evidence of widespread p-hacking, publication bias, and over-reliance on results from plausibly under-powered studies. Even ignoring questions arising from the characteristics and behaviors of study recruits, the conduct of the research community itself substantially erodes the credibility of these studies' conclusions. The extent of the problems varies across the business, economics, management, and marketing research fields (with marketing especially afflicted). The problems are not getting better over time and are much more prevalent than in a comparison set of non-online experiments. We explore correlates of increased credibility.
Original language: English
Publisher: SSRN
Number of pages: 57
DOIs
Publication status: Published - 12 Aug 2022

Keywords

  • online crowd-sourcing platforms
  • Amazon Mechanical Turk
  • p-hacking
  • publication bias
  • statistical power
  • research credibility
