On the sample size required to identify the longitudinal L2 development of complexity and accuracy indices

Akira Murakami

English Language and Linguistics

Research output: Chapter in Book/Report/Conference proceeding › Chapter (peer-reviewed) › peer-review

Abstract

Studying second language (L2) development longitudinally at the level of the individual learner is generally preferred to inferring individuals’ development from aggregated data. An important question in this context is how many data points are needed per learner to reliably identify his/her longitudinal development. This study examines whether changes in accuracy and complexity measures can be identified in the longitudinal development of individual learners’ writing, and investigates how many writings are necessary to identify such changes. Specifically, it examines whether the necessary sample size varies as a function of (i) the index of measurement; (ii) whether, when the index is a ratio or proportion, the analysis incorporates the actual values of the numerator and denominator; and (iii) whether the statistical model allows for potential nonlinearity in development.
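
Point (ii) can be made concrete with a small calculation. A proportion such as 0.5 may come from 1 correct use out of 2 contexts or from 20 out of 40, and once only the ratio is stored the two are indistinguishable. The sketch below is illustrative only: it assumes a simple binomial model with the textbook standard error sqrt(p(1 - p)/n), which is not necessarily how the chapter models the data, and `proportion_with_se` is a hypothetical helper.

```python
import math


def proportion_with_se(successes, trials):
    """Return the observed proportion and its binomial standard error.

    Illustrative assumption: each context is an independent Bernoulli trial.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    se = math.sqrt(p * (1 - p) / trials)
    return p, se


# Two hypothetical writings with the same ratio but different denominators.
short_p, short_se = proportion_with_se(1, 2)    # 1 of 2 contexts correct
long_p, long_se = proportion_with_se(20, 40)    # 20 of 40 contexts correct
print(f"{short_p:.2f} +/- {short_se:.3f}")  # 0.50 +/- 0.354
print(f"{long_p:.2f} +/- {long_se:.3f}")    # 0.50 +/- 0.079
```

Both writings report the same ratio, but under this assumption the first carries about 4.5 times the uncertainty (sqrt(40/2) ≈ 4.47), a difference that a model fitted to the bare ratios cannot see.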

Data were drawn from a large-scale learner corpus consisting of learners’ writings submitted to an online school. Because of the nature of the data, the development of individual learners can be tracked, and teachers’ feedback can be exploited to identify errors in the corpus. The subcorpus analyzed in the study included 40 writings each by 54 learners, totaling 2160 writings. The following complexity and accuracy measures were targeted: mean sentence length, subordinate clauses per T-unit, mean clause length, the measure of textual lexical diversity (MTLD) and the target-like use (TLU) scores of articles, plural -s and past tense -ed.
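
Two of the targeted indices have less obvious definitions than the length-based ones. The sketch below assumes the widely used definition of the TLU score (usually credited to Pica, 1983), namely correct suppliance in obligatory contexts divided by the sum of obligatory contexts and suppliance in non-obligatory contexts, and a simplified MTLD in the spirit of McCarthy and Jarvis with the conventional type-token ratio (TTR) threshold of 0.72. The chapter's exact operationalization, tokenization and error coding are not given in this abstract, so the function names and details here are hypothetical; MTLD implementations also differ on small points such as whether the cut-off uses < or <=.

```python
import math


def tlu_score(correct_in_obligatory, obligatory_contexts, suppliance_in_non_obligatory):
    """Target-like use: correct suppliance in obligatory contexts divided by
    (obligatory contexts + suppliance in non-obligatory contexts)."""
    denominator = obligatory_contexts + suppliance_in_non_obligatory
    if denominator == 0:
        raise ValueError("TLU is undefined for a text with no relevant contexts")
    return correct_in_obligatory / denominator


def _mtld_pass(tokens, threshold=0.72):
    """One directional MTLD pass: count 'factors', i.e. stretches of text
    whose running TTR falls to the threshold. Tokens are assumed to be
    already tokenized and normalized (e.g. lowercased)."""
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            factors += 1.0           # a full factor is complete; start a new one
            types, count = set(), 0
    if count > 0:                    # partial factor for the leftover segment
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    # With no factors at all (e.g. a very short text) MTLD is undefined.
    return len(tokens) / factors if factors > 0 else math.nan


def mtld(tokens, threshold=0.72):
    """Average of a forward and a backward pass over the token list."""
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    return (forward + backward) / 2


print(tlu_score(8, 10, 2))  # 8 / (10 + 2), about 0.667
print(mtld(["a"] * 10))     # 2.0
```

Note that the TLU denominator counts only the contexts of one selected morpheme, whereas length-based indices are computed over all words; this is the kind of frequency difference the abstract points to.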

Statistical models revealed that the necessary sample size varies widely across measures and depends on the amount of information incorporated into the model. This study points to differences in the frequencies of the components of the indices (e.g. all words vs selected words) as a potential reason for the differences in necessary sample size, and further highlights the importance of appropriately weighting individual data points when modeling the indices statistically.
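
The point about weighting can be illustrated with a toy example. Assume, purely for illustration, that each writing contributes a proportion (say, a TLU score) and that a straight-line trend over time is fitted by weighted least squares, with each writing weighted by its denominator, i.e. its number of contexts. This is a simplification chosen for transparency, not the chapter's model; `wls_slope` is a hypothetical helper, and the five data points are invented for the example rather than taken from the corpus.

```python
def wls_slope(x, y, w):
    """Slope of a weighted least-squares straight line through (x, y)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return num / den


# Hypothetical learner: five writings, (correct uses, contexts) per writing.
counts = [(1, 1), (12, 20), (14, 20), (16, 20), (0, 1)]
time = [1, 2, 3, 4, 5]
ratios = [c / n for c, n in counts]
contexts = [n for _, n in counts]

unweighted = wls_slope(time, ratios, [1] * len(ratios))  # every writing counts equally
weighted = wls_slope(time, ratios, contexts)              # weight = number of contexts
print(f"unweighted slope: {unweighted:+.3f}")  # -0.180
print(f"weighted slope:   {weighted:+.3f}")    # +0.042
```

The two single-context writings (1/1 and 0/1) dominate the unweighted fit and turn the upward trend in the three well-attested writings into an apparent decline; weighting by the number of contexts lets the 20-context writings drive the estimate.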

Original language	English
Title of host publication	Usage-based dynamics in second language development
Editors	Wander Lowie, Marije Michel, Audrey Rousse-Malpat, Merel Keijzer, Rasmus Steinkrauss
Publisher	Multilingual Matters
Chapter	2
Pages	29-49
Number of pages	21
ISBN (Print)	9781788925235
Publication status	Published - 14 Jul 2020

Cite this

@inbook{91378dcb94324d60b24615a23c43f71e,

title = "On the sample size required to identify the longitudinal L2 development of complexity and accuracy indices",

author = "Akira Murakami",

year = "2020",

month = jul,

day = "14",

language = "English",

isbn = "9781788925235",

pages = "29--49",

editor = "Wander Lowie and Marije Michel and Audrey Rousse-Malpat and Merel Keijzer and Rasmus Steinkrauss",

booktitle = "Usage-based dynamics in second language development",

publisher = "Multilingual Matters",

}

On the sample size required to identify the longitudinal L2 development of complexity and accuracy indices. / Murakami, Akira.
Usage-based dynamics in second language development. ed. / Wander Lowie; Marije Michel; Audrey Rousse-Malpat; Merel Keijzer; Rasmus Steinkrauss. Multilingual Matters, 2020. p. 29-49.

TY - CHAP

T1 - On the sample size required to identify the longitudinal L2 development of complexity and accuracy indices

AU - Murakami, Akira

PY - 2020/7/14

Y1 - 2020/7/14

UR - https://www.multilingual-matters.com/page/detail/UsageBased-Dynamics-in-Second-Language-Development/?k=9781788925235

M3 - Chapter (peer-reviewed)

SN - 9781788925235

SP - 29

EP - 49

BT - Usage-based dynamics in second language development

A2 - Lowie, Wander

A2 - Michel, Marije

A2 - Rousse-Malpat, Audrey

A2 - Keijzer, Merel

A2 - Steinkrauss, Rasmus

PB - Multilingual Matters

ER -
