MERLIN : An Online Trilingual Learner Corpus Empirically Grounding the European Reference Levels in Authentic Learner Data


Since its publication in 2001, the Common European Framework of Reference for Languages (CEFR) has gained a leading role as an instrument of reference for language teaching and certification. Nonetheless, there is a growing concern about CEFR levels being insufficiently illustrated in terms of authentic learner data. Such concern grows even stronger when considering languages other than English (cf., e.g., Hulstijn 2007, North 2000). In this paper, we present the MERLIN project that addresses this need by illustrating and validating the CEFR levels for Czech, German, and Italian. To achieve its goal, we are developing a didactically motivated online platform to enable CEFR users to explore authentic written learner productions that have been related in a methodologically sophisticated and rigorous way to the CEFR levels. By making a significant number of learner productions freely accessible and easily searchable in a form that is richly annotated with linguistic characteristics and learner error types, the platform will assist teachers, learners, test developers, textbook authors, teacher trainers, and educational policy makers in developing a more comprehensive conceptualization of CEFR levels based on authentic learner data. In the first, methodology-oriented part of this paper, we explain how the learner textual data were collected, re-rated, transcribed, double-checked and prepared for additional manual and automatic processing. We then illustrate the indicators we built to analyze L2 productions. Indicators were derived through (a) linguistic analyses of the performance samples, (b) the operationalization of the CEFR scale descriptors, (c) the study of relevant literature on SLA and language testing, (d) textbook analyses and (e) a questionnaire study. This study allowed us to devise a harmonized annotation schema taking into account both common and language-specific features (e.g., gender/article in German, reflexive possessive pronouns in Czech, pronoun particles in Italian). In the second, application-oriented part, we explain how, by offering a large corpus of freely accessible empirical material, the project helps provide a fine-grained characterization of the CEFR levels and how it serves language teaching and learning. MERLIN thereby aims at responding to the suggestions of the Council of Europe itself, which solicits the development of supplementary tools for illustrating the CEFR levels ( Furthermore, we explain how the platform enables the targeted users to retrieve authentic information about the relationship of the CEFR levels to a wide spectrum of well-defined, user-need-oriented L2 challenges. MERLIN users, such as teacher or learners, can thus compare their students’ or their own performances and get a clearer picture of their strengths and weaknesses. In the third, research-oriented part, we situate MERLIN with regards to two current topics in Second Language Acquisition: validation of CEFR scales and natural language processing for learner language. [This publication reports on work from the MERLIN project, funded by the European Commission (518989-LLP-1-2011-DE-KA2-KA2MP). It only reflects the views of the authors and the Commission cannot be held responsible for any use which may be made of the information contained therein.