RESEARCHING LEXICAL THRESHOLDS AND LEXICAL PROFILES ACROSS THE COMMON EUROPEAN FRAMEWORK OF REFERENCE FOR LANGUAGES (CEFR) LEVELS ASSESSED IN THE APTIS TEST: N. OWEN, P. SHRESTHA + S. BAX

ABSTRACT

This project uses automated analysis software (www.textinspector.com) to research the lexical and metadiscourse thresholds, and lexical and metadiscourse profiles, of test-takers' writing in the British Council's Aptis Writing test, benchmarked to the Common European Framework of Reference for Languages (CEFR). A large set of Aptis writing responses (n=6,407), representing 65 countries, together with their score data, was analysed in terms of lexis and metadiscourse use. Measures and datasets used in the analysis include standard readability measures, the British National Corpus (BNC), the Corpus of Contemporary American English (COCA), the English Vocabulary Profile (EVP), the Academic Word List (AWL), and a bespoke corpus of metadiscourse markers. The purpose of the research is to enhance the validation argument for the Aptis test through large-scale profiling of candidates' writing performance. The findings reveal that lexical complexity in Aptis writing responses changes systematically as learners' CEFR level increases. Of the 110 Text Inspector metrics used in the study, 26 were significant across all CEFR boundaries, including measures of text length (sentence, token and type counts) and metrics of lexical sophistication (syllable count and number of words with more than two syllables). Fourteen of the 26 metrics represent vocabulary use. One metric of lexical diversity (voc-d) was also significant across all thresholds. The study also explores the utility of these metrics for use in an automated scoring engine. Twenty metrics were used to build an ordinal logistic regression model, which was trained on a stratified subset of the data. This model was then used to predict the CEFR band of a testing subset in which nationality data were held constant. The results show that lexical use metrics from the Cambridge Learner Corpus (CLC) were the most successful at identifying CEFR level, and the model was most successful at identifying A1 and C-level responses.
However, the model failed to differentiate accurately between A2, B1 and B2 responses, suggesting that other variables not accounted for in this study, such as organisational features, play a significant role in human judgements. The paper concludes with recommendations for rater training on the basis of the findings.

Authors

Nathaniel Owen is an Associate member and former Research Associate at the Open University. He holds a PhD in language testing from the University of Leicester and has published articles in peer-reviewed journals and book chapters in edited volumes on subjects including language testing, research methods and widening participation in higher education. He has experience of teaching English as a foreign language in Spain, the UK and Australia, with expertise in teaching EAP and exam preparation courses. He previously worked for the examination board Cambridge Assessment and has managed internationally funded research projects with Educational Testing Service, in addition to the British Council. He joined Oxford University Press in February 2020 as a Senior Research and Validation Manager.

Prithvi N. Shrestha, an award-winning author (British Council ELTons finalist 2019), is Senior Lecturer in English Language at the Open University, UK. He has led or co-led a number of funded research projects. He has published over 40 research outputs, including a research monograph (Dynamic Assessment of Students' Academic Writing, Springer, 2020) and an edited volume, covering academic writing assessment in distance education, language assessment, English language education in developing countries, English medium instruction and mobile learning. His research is informed by Systemic Functional Linguistics and sociocultural theory.
Professor Stephen Bax was Professor of Modern Languages and Linguistics and Director of Research Excellence in the School of Languages and Applied Linguistics at The Open University (UK). His research included work on the application of new technologies in language education, and he was awarded the 2014 TESOL Distinguished Researcher Award for his research using eye-tracking technology to investigate reading. He was responsible for developing Text Inspector, an online tool for analysing lexis in text, which was shortlisted for the British Council's international ELTons award for Digital Innovation in 2017. Stephen was the PI of this project until he sadly passed away in 2017. We express our condolences to his friends and family. We all miss him terribly.

ACKNOWLEDGEMENTS

We wish to thank the anonymous referees, Richard Spiby and Carolyn Westbrook for their many useful suggestions, and Dana Therova for her assistance in data management in the preparation of this manuscript.

CONTENTS

1. BACKGROUND
   1.1 Rationale
   1.2 Aims
2. REVIEW OF LITERATURE
   2.1 Lexical and metadiscourse thresholds and profiles
   2.2 Investigating text using automated analytical tools
   2.3 Automated analysis of learner written data
   2.4 Researching metadiscourse use in learner written data
   2.5 Aligning the Aptis test to the CEFR
3. RESEARCH QUESTIONS
4. RESEARCH DESIGN AND METHODOLOGY
   4.1 Materials
   4.2 Data cleaning
   4.3 Manual analysis
   4.4 Data analysis
   4.5 Research question 1
   4.6 Research question 2
5. FINDINGS AND DISCUSSION
   5.1 Research question 1
   5.2 Basic statistics
   5.3 Lexical diversity
   5.4 Lexical profiles
   5.5 Academic lexis
   5.6 Metadiscourse profiles
   5.7 Summary of findings
   5.8 Research question 2
      5.8.1 Constructing the model
      5.8.2 Evaluating the model
6. DISCUSSION
7. CONCLUSIONS AND RECOMMENDATIONS
   7.1 Limitations to the present study
REFERENCES
APPENDIX 1: Writing task 2 specifications and scale descriptors
APPENDIX 2: Writing task 3 specifications and scale descriptors
APPENDIX 3: Writing task 4 specifications and scale descriptors
APPENDIX 4: Sample characteristics
APPENDIX 5: Metrics analysed with Text Inspector
APPENDIX 6: Metadiscourse markers analysed using Text Inspector
APPENDIX 7: Descriptive statistics for significant findings
APPENDIX 8: Kruskal-Wallis test results for all metrics across CEFR thresholds
APPENDIX 9: Percentages of metadiscourse markers used across CEFR bands

List of tables

Table 1: Hyland's categories of metadiscourse markers (Hyland, 2004, pp. 109–111)
Table 2: Score distribution of sample
Table 3: Data cleaning procedure
Table 4: Stratified sample for replication analysis
Table 5: Metrics contributing to lexical profiles in Aptis writing responses
Table 6: Summary of EVP tokens used by test-takers at different CEFR levels of the Aptis writing test
Table 7: Summary of EVP types used by candidates at different CEFR levels of the Aptis writing test
Table 8: Summary of statistically significant BNC tokens and types used by candidates at different CEFR levels of the Aptis writing test
Table 9: Summary of statistically significant COCA tokens and types used by candidates at different CEFR levels of the Aptis writing test
Table 10: Summary of statistically significant AWL tokens and types used by candidates at different CEFR levels of the Aptis writing test
Table 11: Significant differences in metadiscourse tokens used across CEFR thresholds
Table 12: Significant differences in metadiscourse types used across CEFR thresholds
Table 13: Model parameter estimates and intercepts
Table 14: Comparison of ratings between human raters and the regression model

List of figures

Figure 1: Number of sentences by CEFR level
Figure 2: Token count by CEFR level
Figure 3: Type count by CEFR level
Figure 4: Syllable count by CEFR level
Figure 5: Average number of words >2 syllables by CEFR level
Figure 6: Voc-d score by CEFR level
Figure 7: Percentage of metadiscourse by CEFR level

1. BACKGROUND

This study falls under the Aptis 2016 Call for Research Proposals, in the category of Test Development and Validation, specifically: "Studies investigating the usefulness of applying automated analysis techniques to investigate lexical thresholds and lexical profiles across the Common European Framework of Reference for Languages (CEFR) levels assessed in Aptis." In line with this category, this report details a large-scale investigation of the value of automated analyses, using the advanced TextInspector.com tool together with a concordancing tool, to research the lexical and metadiscourse thresholds, and lexical and metadiscourse profiles, of test-takers' writing in the British Council's Aptis Writing test, benchmarked to the CEFR.
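To illustrate the kind of surface measures discussed in this report (sentence, token, type and syllable counts, and the number of words of more than two syllables), the sketch below computes rough equivalents for a single response. It is illustrative only: Text Inspector's actual algorithms are not public, and the regex tokenisation and vowel-group syllable heuristic used here are simplifying assumptions, not the tool's method.

```python
import re

def lexical_metrics(text: str) -> dict:
    """Rough surface-level lexical metrics for one writing response.

    A naive approximation of the kinds of counts Text Inspector reports;
    the heuristics here are simplifications, not the tool's algorithms.
    """
    # Sentence count: split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Tokens: alphabetic word forms (apostrophes kept), lower-cased.
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    types = set(tokens)

    def syllables(word: str) -> int:
        # Count each run of consecutive vowels as one syllable.
        return max(1, len(re.findall(r"[aeiouy]+", word)))

    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "syllables": sum(syllables(w) for w in tokens),
        "words_over_2_syllables": sum(1 for w in tokens if syllables(w) > 2),
    }
```

In practice, measures such as voc-d go further than a raw type-token ratio by estimating diversity in a way that is robust to text length, which matters here because text length itself varies systematically with CEFR level.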