NEST – a corpus in the brooding box

Graedler, A-L. http://www.helsinki.fi/varieng/series/volumes/13/graedler/

This paper describes the design and compilation of data for the Norwegian-English Student Translation corpus (NEST). Still in its early stages, the prospective corpus will contain translations from Norwegian into English produced by students of English at Norwegian colleges and universities. A brief discussion of learner translation corpora is followed by an outline of the principles and procedures applied in collecting the texts, and by a description of the contributing students and the source texts for translation. Finally, some samples from the collection of student translations are given as an illustration and an indication of possible future research applications.

1. The conception of the project

Traditionally, translation into the foreign language has been a component of advanced language study programs in many countries (see e.g. Rydning 1994). In Norway, several universities and colleges include translation into English in the courses they offer to advanced students. The idea of creating an electronic corpus of student translations was conceived during the teaching of a translation course at the University of Oslo, the aim of which was [1] to improve the students’ proficiency in English, to raise their awareness of important differences between Norwegian and English usage, to present and discuss some central translation problems, and to improve the students’ competence as translators (ENG1102 Translation and Practical Exercises, An Introduction).

This paper describes the initial stages of the design and compilation of the Norwegian-English Student Translation corpus (NEST), a project still in progress. Section 2 briefly discusses the place of learner translation corpora in the broader context of learner corpora. Section 3 outlines some considerations relevant to the data collection, and section 4 describes the present situation with regard to contributing students and collected texts. At the time of writing, the search interface for the corpus is not yet in place. For this reason, any comprehensive corpus linguistic analysis of the data gathered so far, as well as any meaningful comparison with data from other corpora, is hardly feasible at this stage. However, as an illustration, section 5 presents some samples of the raw data in the present corpus material.

2. Why build a NEST?

Easy access to electronic resources during the past two decades has greatly facilitated research into learner language. Most learner corpora are “collections of texts produced by foreign or second language learners” (Granger 2004: 124), compiled for the purpose of analyzing language learners’ interlanguage, often within the framework of contrastive interlanguage analysis or computer-aided error analysis (Granger 2002: 12). Learner translation corpora are “multiple translation corpora […] containing translations done by trainees rather than professional translators” (Castagnoli 2008: 36). As a subcategory of learner corpora, learner translation corpora can serve various functions. They can provide a useful pedagogical resource for teachers and students involved in a translation course, making it possible to track student progress and to identify individual or collective problems of both a linguistic and a translation-related nature (see e.g. Bowker & Bennison 2003); they can provide research data that supplement existing learner corpora, or translation corpora of texts produced by professional translators; and they can serve as a window into the process of translation, allowing researchers to uncover specific features of various translation types, such as the translation of language for specific purposes (e.g. the ongoing compilation based on the Norwegian national translator’s exam [Translatoreksamen], TK-NHH Translatorkorpus; for a survey of learner translation corpora, see Castagnoli 2008: 37–42).

The primary aim of NEST, the corpus project described in this paper, is to provide supplementary data for researchers interested in learner language at an advanced level. The corpus will enable a comparison of the relatively free output found in corpora such as the Norwegian component of the International Corpus of Learner English (argumentative essays) with learner output produced under the more constrained conditions of a translation task (cf. Kobayashi & Rinnert 1992). As a multiple translation corpus, NEST will also provide data for research on variation and choice in learner translation (cf. research on the Multiple Translations project, in Johansson 2007: 197–198).

3. Feathering the NEST: Corpus compilation and design

After being put on ice for a few years, the NEST project was revived in 2008 and reported to the Norwegian Social Science Data Services. Requests for cooperation were sent to all Norwegian institutions of higher education that offer programmes in English, and several teachers signaled interest in the project. [2] From the fall semester of 2008, students taking part in translation courses at the University of Oslo, Sogn og Fjordane University College and the University of Tromsø began contributing texts to NEST.

3.1 Collection criteria

As one of the aims of NEST is to supplement existing learner corpora, its collection criteria need to be comparable, if not fully consistent, with those of other corpora in some significant respects. Most learner corpora allow linguistic data to be correlated with potentially relevant extralinguistic factors, and NEST accordingly contains some information about the contributing students’ background in addition to the translated texts.
All the student translators submit a simple questionnaire with information about the following variables: [3]

• Sex
• Year of birth
• Nationality
• Period of residence in Norway (non-Norwegian citizens)
• General language background:
  ◦ First language (L1)
  ◦ Home language, if different from L1
  ◦ Main language of instruction during primary and secondary school
  ◦ Competence in other foreign languages besides English (level of proficiency: Excellent/Good/Fair/Poor)
• General study background:
  ◦ No. of years of university level studies
  ◦ Type of educational institution (university or college)
  ◦ Type of education/degree aimed at (BA, MA, Teacher Education, other)
• Background in English:
  ◦ No. of years of English in secondary school
  ◦ Final grade in English at secondary school (written + oral)
  ◦ No. of semesters of university level English studies
  ◦ No. of completed ECTS credits in English
  ◦ Preferred standard variety of written English (British English or American English)
• Time spent in an English-speaking environment (length of stay, where and when)
• Background in translation:
  ◦ No. of semesters of university level translation studies
  ◦ No. of completed ECTS credits in translation
  ◦ Work experience as a translator (nature of the work, languages involved, for how long)

In addition, translation teachers are likely to find it appealing to be able to track the progress of individual learners or groups of learners. Since such tracking requires longitudinal data, the original intention was that all students contributing to NEST would translate a set text at the beginning and again at the end of a teaching term, in addition to any other texts they produced as part of their ordinary translation course work.
The set text was a magazine article written for a general readership, and was intended to capture some common linguistic challenges faced by Norwegian students of English. Unfortunately, persuading students and teachers to devote time to administering and translating texts beyond the requirements of their courses proved difficult. As the project lacked the resources to offer any kind of compensation, the idea of a common translation for all contributing students was therefore abandoned.

Differences in production constraints, such as the time allocated to the task, whether or not the students had access to dictionaries and other translation aids, and the effect of teaching that targets specific translation problems, can all be regarded as relevant variables. Ideally, these should either be kept constant or at least be documented so that they can be taken into account in an analysis of the data. However, since the corpus material is being gathered from several different student groups and educational institutions, consistent information about these variables has proved difficult to obtain. Apart from the questionnaire and the set text, no qualitative requirements are thus imposed on the inclusion of texts in the corpus. Rather, priority has been given to collecting a substantial number of texts, from which subsets may be extracted if desired.

Other unforeseen problems arose at the outset of the data collection. Firstly, translation into the foreign language has recently been removed from many of the higher-level English programs at Norwegian colleges and universities, leaving relatively few courses from which texts could be collected. Secondly, even getting students (or their teachers) to submit copies of the texts they produced as part of their course requirements turned out to be a problem, and after six months of meagre results, the project was almost buried.
Despite these initial setbacks, and thanks to the cooperation of a few interested teachers, the prospective corpus now contains data from more than 100 students, amounting to around 120,000 words. [4] The collection of texts will continue until December 2011.