Abstract
Count data is prevalent in many areas of linguistics, for example when counting words, syntactic constructions, discourse particles, case markers, or speech errors. The Poisson distribution is the canonical distribution for characterising count data with no upper bound or an unknown one. Given the prevalence of count data in linguistics, Poisson regression has wide utility in every subfield of linguistics. However, in contrast to logistic regression, Poisson regression is surprisingly little known. Here, we make a case for why linguists need to consider Poisson regression, and give recommendations for when Poisson regression is more appropriate than logistic regression. This tutorial first introduces readers to the foundational concepts needed to understand the basics of Poisson regression, and then gives a hands-on walkthrough using the R package brms. We discuss a dataset in which Catalan and Korean speakers change the frequency of their co-speech gestures as a function of politeness context. This dataset also involves exposure variables (the incorporation of time to deal with unequal observation intervals) and overdispersion (excess variance). Altogether, we hope that more linguists will consider Poisson regression for the analysis of count data.
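As a rough illustration of the two technical notions named in the abstract, a Poisson regression with an exposure term can be sketched as follows. The symbols are assumptions introduced here and are not taken from the article: $y_i$ is a count (such as the number of gestures), $t_i$ is the observation time, $x_i$ is a predictor (such as politeness context), and $\phi$ is a shape parameter. The negative binomial shown in the last line is one common way of accommodating overdispersion, not necessarily the approach the authors take.

```latex
\begin{align}
  y_i &\sim \mathrm{Poisson}(\lambda_i),
      \qquad \mathbb{E}[y_i] = \mathrm{Var}[y_i] = \lambda_i \\
  % Exposure: log(t_i) enters as an offset (coefficient fixed at 1),
  % so the linear predictor models a rate per unit of time.
  \log \lambda_i &= \log t_i + \beta_0 + \beta_1 x_i
      \;\Longleftrightarrow\;
      \frac{\lambda_i}{t_i} = \exp(\beta_0 + \beta_1 x_i) \\
  % Overdispersion: the variance exceeds the mean.
  y_i &\sim \mathrm{NegBinomial}(\mu_i, \phi),
      \qquad \mathrm{Var}[y_i] = \mu_i + \frac{\mu_i^2}{\phi} > \mu_i
\end{align}
```

Under this sketch, the Poisson model is recovered as $\phi \to \infty$, which is why excess variance in the data signals that the plain Poisson assumption may be too restrictive.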
Field | Value |
---|---|
Original language | English |
Article number | e12439 |
Number of pages | 23 |
Journal | Language and Linguistics Compass |
Volume | 15 |
Issue number | 11 |
Early online date | 16 Nov 2021 |
Publication status | Published - Nov 2021 |