Are the Current Listening Comprehension Tests Reliable and Valid?


This study was conducted to examine the construct validity of three listening comprehension tests: the listening part of the TOEFL, dictation, and dictation-translation. For this purpose, listening comprehension, the trait under investigation, was analyzed comprehensively, and the three test types were produced and administered according to their theoretical backgrounds. The TOEFL is based on a general proficiency theory, whereas the dictation is produced according to the pragmatic theory of testing. To see whether these tests are valid measures of listening comprehension, the three tests were administered to seventy-four Iranian university students who were majoring in English. Various statistical operations such as correlation, reliability estimates and t-tests were performed on the data. The computations revealed significant differences among the tests. To account for these differences, the self-report data that the students were asked to provide were examined. The survey indicated that test methods had a sizable effect, and factors such as memory, concentration, speed, vocabulary, chance, reading trait, and topic were reported to play a significant role in the process of test taking, to the extent that they strongly influenced the subjects' performances. Hence, the construct validity of these tests is questioned, as they do not deal specifically with the skills involved in listening comprehension. The washback validity of these tests is also questioned, as the source(s) of the errors the students make cannot be detected and, as a result, no specific instruction can be fed back into the teaching program. The present research suggests that in order to make valid tests, we must base them on the skills involved in the process of listening comprehension. We would then be in a better position to detect students' errors and improve teaching strategies accordingly.
Keywords: Validity, Construct validity, Washback validity, Listening comprehension, Dictation
Introduction
Listening comprehension has received considerable attention in the fields of applied linguistics, psycholinguistics and second language pedagogy during the last two decades (Ur, 1984; Underwood, 1989; Rost, 1990; Alderson & Wall, 1993; Flowerdew, 1994). Results of this large body of research have shown that listening is not a passive process in which the listener simply receives a spoken message, but rather a complex cognitive process in which the listener constructs meaning using both linguistic and non-linguistic knowledge. Richards (1983) proposes that the following microskills are involved in understanding what someone says to us. The listener has to:
- retain chunks of language in short-term memory.
- discriminate among the distinctive sounds in the language.
- recognize stress and rhythm patterns, tone patterns, and intonational contours.
- recognize reduced forms of words.
- distinguish word boundaries.
- recognize typical word order patterns.
- recognize vocabulary.
- detect key words, such as those identifying topics and ideas.
- guess meaning from context.
- recognize grammatical word classes.
- recognize basic syntactic patterns.
- recognize cohesive devices.
- detect sentence constituents, such as subject, verb, object, prepositions and the like.
In testing listening comprehension, the main concern is to apply a test that truly represents the individual's abilities in listening comprehension. If we want to develop and use language tests appropriately, for the purposes for which they are intended, we must base them on clear definitions of both the abilities we wish to measure and the means by which we observe and measure these abilities (Bachman, 1990, p. 81).
Our lack of knowledge of how listening comprehension works suggests that there is an urgent need for research into the listening process and the best ways of testing it. The three listening comprehension tests used in this research are the listening part of the TOEFL, dictation, and dictation-translation. Oller (1979) argued that dictation, when constructed and administered according to the theory of pragmatic testing, meets three stringent construct validity criteria: (a) it satisfies the requirements of a theory; (b) it typically shows strong positive correlations with tasks that meet the same theoretical requirements; (c) the errors generated by the dictation procedure correspond to the kinds of errors learners make in real-life language use. Winter (2000) maintains that 'validity' is not a single, fixed or universal concept, but rather a contingent construct, inescapably grounded in the processes and intentions of particular research methodologies and projects.
Method
Participants
The subjects who participated in the experiments were 74 Iranian university students at Khorasgan Islamic Azad University. This research was conducted during their eighth term of study at the university. They were randomly sampled from a total of 138 students to take part in the experiments.
Material
1. The listening part of the TOEFL used in this study consisted of 50 items in three sections. In the first section of the test, a short sentence was heard only once. The subjects were asked to read the four choices in their test book and decide which was the closest in meaning to the sentence they had heard. In the second section, short conversations between two speakers were heard. At the end of each conversation, a third person asked a question about what had been said; the question was heard only once. The subjects then read the four possible answers in the test book and decided which one was the best answer to the question that had been heard.
In the third section of the test, short talks and conversations were heard, each followed by some questions. The subjects received one point for each correct answer; hence, the total score for this test was 50.
2. The dictation used in this study was taken from VOA news. It was based on Oller's arguments for pragmatic tests. It was validated through correlation with the listening part of the TOEFL. The subjects were instructed how to take the dictation. Unusual names or expressions were removed from the dictation passage. The dictation was read three times: first at normal reading pace; second, with pauses at predetermined points; and third, at normal reading pace again. Students did not write anything during the first reading. They listened and worked out what the passage as a whole was about. During the second reading, students wrote down what they had heard during each pause; words and phrases were not repeated. The third reading, without pauses and at normal speed, provided an opportunity for quick proofreading. The material chosen for dictation was unified. The pauses came after every seven or eight words, with the structure of each sentence serving as a guide. This dictation had 250 words. It was scored based on the model offered by Oller (1979, p. 282).
3. The dictation-translation used in this research was also taken from VOA (Voice of America) news. It consisted of eight sentences with 250 words. The subjects were supposed to write the meanings of the sentences they heard in their mother tongue. They received 6 to 7 points for each correct translation. The maximum total score was 50.
Apparatus
A language laboratory and related cassettes.
Procedure
The listening part of the TOEFL was the first test that the subjects received. The subjects were informed of the type of questions that they were supposed to answer on their answer sheets. The dictation was the second test, which the students received a week after the first one.
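The validation step mentioned under Material, correlating dictation scores with scores on the listening part of the TOEFL, can be illustrated with a minimal sketch of the Pearson product-moment correlation. The text does not say which correlation coefficient was used, so Pearson's r is an assumption here; the function name `pearson_r` and the scores below are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the two means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each test
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one score list has no variance")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five test takers (illustrative only, NOT study data)
toefl_listening = [30, 35, 40, 45, 50]
dictation = [20, 28, 31, 39, 42]

r = pearson_r(toefl_listening, dictation)
print(f"r = {r:.3f}")
```

A coefficient close to +1 would indicate that test takers are ranked similarly by both tests. On its own it would not show that the two tests measure the same construct.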
Subjects were informed that they would hear the dictation three times. The first time, they listened attentively and wrote nothing. They were told that the dictation would be read at a conversational rate. The second time, the dictation was read with pauses. During the silent periods the subjects were allowed to write. Finally, the dictation was read again at a conversational rate for proofreading. The procedure for administering the dictation-translation was the same as for the dictation. The only difference was that the subjects were supposed to translate the material into their mother tongue. It should be noted that, after each test, the subjects were asked to give a detailed description of the process of test taking and to provide an account of the strategies they had used and the advantages and disadvantages of each test type as far as they were concerned.
Results
Reliability coefficients for the three tests: