Abstract
Studying second language (L2) development longitudinally at the level of the individual learner is generally preferred to inferring individuals’ development from aggregated data. An important question in this context is how many data points are needed per learner to reliably identify their longitudinal development. This study examines whether changes in accuracy and complexity measures can be identified in the longitudinal development of the writings of individual learners and investigates the number of writings necessary to identify such changes. It specifically examines whether the necessary sample size varies as a function of (i) the specific index of measurement; (ii) whether the analysis incorporates the specific values of the numerator and denominator when the index of measurement is a ratio or proportion; and (iii) whether the statistical model incorporates potential nonlinearity in development.
Data were drawn from a large-scale learner corpus consisting of learners’ writings submitted to an online school. Due to the nature of the data, it is possible to track the development of individual learners, and teachers’ feedback can be exploited to identify errors in the corpus. The subcorpus analyzed in the study included 40 writings each by 54 learners, totaling 2160 writings. The following complexity and accuracy measures were targeted: mean sentence length, subordinate clauses per T-unit, mean clause length, the measure of textual lexical diversity (MTLD), and the target-like use (TLU) scores of articles, plural -s and past tense -ed.
Statistical models revealed that the necessary sample size varies widely across different measures and depends on the amount of information incorporated into the model. This study points out the differences in the frequencies of the components of the indices (e.g. all vs selected words) as a potential reason for differences in the necessary sample size, and further highlights the importance of weighting individual data points appropriately in statistically modeling the indices.
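The point about weighting can be illustrated with a minimal sketch. It is not the chapter's statistical model: the `Writing` class, its field names, and the counts are hypothetical, and the index is treated simply as a proportion of correct uses over contexts. The sketch contrasts averaging per-writing proportions (every writing counts equally) with pooling numerators and denominators (each writing counts in proportion to its denominator).

```python
from dataclasses import dataclass


@dataclass
class Writing:
    # Hypothetical fields: numerator and denominator of a proportion index
    correct: int   # e.g. target-like uses in one writing
    contexts: int  # e.g. contexts in which the form was relevant


def unweighted_mean(writings):
    """Average the per-writing proportions, ignoring denominator size."""
    scores = [w.correct / w.contexts for w in writings if w.contexts > 0]
    return sum(scores) / len(scores)


def weighted_mean(writings):
    """Pool numerators and denominators, so each writing is weighted
    by its number of contexts."""
    total_correct = sum(w.correct for w in writings)
    total_contexts = sum(w.contexts for w in writings)
    return total_correct / total_contexts


# Invented counts: two writings with very few contexts, one with many
writings = [Writing(1, 1), Writing(9, 20), Writing(0, 2)]
print(unweighted_mean(writings))
print(weighted_mean(writings))
```

In this toy data, the writings with one or two contexts produce extreme proportions (1.0 and 0.0) that pull on the unweighted average as strongly as the writing with 20 contexts does, whereas the pooled estimate is dominated by the writing that carries the most information.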
Original language | English |
---|---|
Title of host publication | Usage-based dynamics in second language development |
Editors | Wander Lowie, Marije Michel, Audrey Rousse-Malpat, Merel Keijzer, Rasmus Steinkrauss |
Publisher | Multilingual Matters |
Chapter | 2 |
Pages | 29-49 |
Number of pages | 21 |
ISBN (Print) | 9781788925235 |
Publication status | Published - 14 Jul 2020 |