A corpus-based developmental investigation of linguistic complexity in children's writing

Yaling Hsiao*, Nicola J. Dawson, Nilanjana Banerji, Kate Nation

*Corresponding author for this work

    Research output: Contribution to journal, Article, peer-review


    Abstract

    Writing proficiency is associated with linguistic complexity. We used measures of linguistic complexity to investigate the development of children's narrative writing using a large corpus of short stories (N > 100,000) written by children aged 5–13 in the UK. Linguistic complexity was assessed using both lexical (N = 30) and syntactic (N = 14) measures. Most measures were associated with age, with writing by older children showing greater lexical density, sophistication, and diversity than writing by younger children. Older children also used longer sentences, T-units, and clauses, and the density of smaller syntactic units inside larger units was also higher. Principal Component Analysis identified a number of dimensions associated with complexity, with the first two dimensions capturing nearly 50% of variance. Lexical diversity was mainly represented on the first dimension and syntactic complexity on the second. Across the age range, there was wider variation in syntactic complexity than in lexical diversity, suggesting that syntactic development is subject to more individual differences than the ability to use a diverse set of lexical items. Our findings quantify the nature and content of children's writing through mid-childhood, and we discuss the utility of analysing children's writing using a computational, data-driven approach.
    Original language: English
    Article number: 100084
    Number of pages: 14
    Journal: Applied Corpus Linguistics
    Volume: 4
    Issue number: 1
    Early online date: 7 Jan 2024
    Publication status: Published - Apr 2024

    Keywords

    • Lexical development
    • Grammatical development
    • Writing development
    • Corpus analysis
    • Linguistic complexity
    • Principal component analysis
