Models for reality: new approaches to the understanding of educational processes


Much of the dissatisfaction with existing quantitative explanations in education and the social sciences arises from an over-simplification of real life. This talk will argue that in recent years new quantitative methodologies have been developed that provide powerful tools for studying social structures and processes. These 'multilevel' models, which attempt to describe the complexity of the real world, have begun to yield important research insights and to provide a rational basis for the critique of certain contemporary educational policies. The talk will describe the essential features and potentialities of these procedures in a non-technical fashion.

The truth about beauty

General relativity: a theory too beautiful not to be true.
(Albert Einstein)

While Einstein seems to have been right, so far, about the truth of general relativity, his justification for it seems less secure. The warrant for general relativity is to be found in empirical evidence, not in mathematical aesthetics. Our emotional feelings about our models of reality may colour our beliefs and the way we act upon those beliefs, and that has often been a useful guide, at least in the natural sciences. Yet such beliefs need to be judged by a careful study of the real world. In this talk I do not wish to argue against elegance, but I do wish to question the notion that elegance, or at least simplicity, equates to truth. In the social sciences, and especially in education, descriptions that have possessed elegant simplicity have often also been wrong. There is a range of such activity that runs from the early exponents of intelligence measurement, through the vast industry associated with most of modern psychometric modelling, through to the simplistic world of educational league tables.

I will argue that in order to describe the complex reality that constitutes educational systems we require modelling tools with a comparable level of complexity. I also wish to argue that, while we need continually to elaborate our models, we will almost certainly remain a long way from perfect descriptions; the journey is important, even though we may never arrive at our destination. It follows that we should also strive to provide some way of knowing how far we may be from a complete description; in other words, we require a measure of our ignorance as well as a description of our knowledge.

I am, of course, talking about quantitative models. I will have little to say about non-quantitative explanations and models, yet it does seem to me that one reason for the unfortunate gulf between the exponents of quantitative and non-quantitative educational understandings is that exponents of the latter tend to view the former as simplistic and reductionist. As will become clear, I have a certain sympathy with that view, because I believe it has some justification in reality. One of my main purposes, however, is to demonstrate that quantitative models do not need to oversimplify reality in the way that they often do, and to suggest that they can begin to provide usefully detailed descriptions of the world, and thus perhaps prepare the ground for a reconciliation of research methodologies.

To begin, I shall illustrate my general theme by looking at the way particular models of mental testing have come to dominate certain areas of educational assessment. I will then illustrate some consequences of this by looking at a recent international comparative study of adult literacy.
Following that, I will describe the work which has occupied most of my own research time for the last 15 years, and attempt to show how it is leading many people to think about education, and indeed many other areas of social and biological science, in new and powerfully constructive ways that begin to capture some of the complexity to which I have referred.

Item response models

Item response models are too good not to be true.
(A leading psychometrician)

Unlike general relativity, the evidence for this assertion is decidedly lacking. Psychometrics has enjoyed a highly privileged status within education as a mathematically based discipline which seeks to provide a formal structure for making statements about mental abilities and student achievements. To be sure, it has achieved a certain level of technical sophistication, but on close examination this sophistication resides in the dexterity required to do the computing necessary to obtain decent numerical results from the models used to describe the data. The models themselves, as in the above quotation, have remained at a surprisingly simple level of description; so much so that they stand little chance of adequately representing the complexity of the real world. Unlike many parts of the physical world, the social world does not, in my view, lend itself to description in terms of simple formulae. Nevertheless, what Stephen Gould once referred to as 'Physics envy' (Gould, 1981) does seem to motivate many psychometricians, and hence the above quotation.

To illustrate what I mean, I shall explore a recent important survey of adult 'literacy': the International Adult Literacy Survey (IALS) (Murray et al., 1998), supported by the OECD and carried out in nine countries, involving interviews with about 3,000 adults in each. Like most major international comparative studies, this one was dominated intellectually by psychometric practice from the United States; in the case of IALS, its design was actually based upon three major U.S. literacy surveys, which influenced the aims and content. From the outset it was decided that there were three functional literacy proficiency 'domains': Prose literacy, Document literacy and Quantitative literacy (numeracy) (Murray et al., 1998). For each participant in the study, a proficiency 'score' for each of these domains was estimated from responses to a set of tests or tasks. These scores then formed the basis of international comparisons. A considerable controversy arose towards the end of the study: one country (France) withdrew completely after it emerged that it had the lowest scores on all three domains, and the EU subsequently set up a project to re-evaluate the results.

As with all international comparisons of competence or achievement, a fundamental issue is whether it makes much sense to use a single common set of tasks in a variety of very different cultural, educational and social settings. IALS itself addresses this issue in a very limited study comparing the 'difficulties' of some tasks in their French and English translated versions. One conclusion is that the necessities of translation make tasks more or less difficult in different contexts. Similar findings about the incommensurability of translated materials have been obtained by others (see Goldstein, 1995, for a summary), and the use of common measuring instruments therefore raises the issue of who is advantaged and who is disadvantaged in the process. I shall not go into this issue here, except to remark that it is perhaps the most important one yet to be addressed in the field of international comparisons. My concern, rather, is with the way in which the crude psychometric steamroller squeezes such considerations to the periphery of technical appendices which, I suspect, few will ever read.

For each proficiency in IALS there is an 'item response model', a modern-day variant of factor analysis, which makes the really simple assumption that the responses to the constituent tasks are all determined by just one underlying 'factor', or ability, or proficiency: call it what you will. Interpreting this literally, some 10% of the tasks were excluded on the grounds that they did not fit such a model. A proficiency score for each individual was then calculated using a weighted average of the responses (correct/incorrect) to each remaining task.
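To make the single-factor assumption concrete, here is a minimal sketch; the notation is mine rather than that of the IALS reports, whose scaling procedure is more elaborate. In the simplest item response model, the Rasch model to which I return below, the probability that person \(i\) answers task \(j\) correctly is taken to depend on just one proficiency parameter \(\theta_i\) and one task difficulty \(b_j\):

\[
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}.
\]

Every response to every task, in every country, is assumed to be governed by the single number \(\theta_i\), and the reported proficiency 'score' is in effect an estimate of \(\theta_i\) derived from a weighted sum of correct responses, \(\sum_j w_j x_{ij}\).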
This produces just three numbers for each individual describing their literacy proficiency, with the assumption that these are directly comparable across countries. The psychometric model has formed a Procrustean framework that excludes or downweights those components which do not fit its simplistic assumptions. Nor is this an isolated example: this kind of psychometric reductionism is hugely popular among those devising computerised testing procedures, where the use of such models greatly simplifies the whole exercise and creates an appearance of precision and objectivity.

Another interesting example of the attraction of such simplistic approaches occurred in the late 1970s, when the government's Assessment of Performance Unit (APU) was concerned to determine whether average standards of achievement were changing over time. For those who are interested, the episode has been fully documented by Caroline Gipps and myself in a study of the APU carried out in the early 1980s (Gipps and Goldstein, 1983). The NFER at that time was promoting the so-called 'Rasch model', the simplest of the item response models, as the technique that could answer this question. After much debate, and also a great deal of technical obfuscation, the APU, albeit reluctantly, accepted that there was no defensible way of measuring absolute trends over time. To summarise: it turns out that there is no objective way to separate 'real' changes in student performance from changes in test difficulty (a short algebraic illustration of why is given at the end of this section). This earlier debate bears a striking similarity to current government proposals, which advocate specific targets to be reached at Key Stage 2 over the next few years; one can only hope that those politicians willing to stake their careers on such notions are familiar with recent history.

At this point I suppose the obvious question to ask is: why have such simple-minded models persisted when more complexity can be introduced? A complete answer to that would, I think, make an interesting historical study, but let me make some interim suggestions, which will also lead me on to the main concern of this talk.
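A short note on the identification problem mentioned above, in the notation of the earlier sketch; this is my gloss on the argument rather than a derivation taken from the APU debate itself. In the Rasch model only the difference between proficiency and difficulty enters the response probability, so adding the same constant \(c\) to every proficiency and every difficulty leaves all predicted responses unchanged:

\[
\frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
= \frac{\exp\{(\theta_i + c) - (b_j + c)\}}{1 + \exp\{(\theta_i + c) - (b_j + c)\}}.
\]

When the tasks themselves change between testing occasions, the data therefore cannot tell us whether performance has risen by \(c\) or the new tasks have become easier by \(c\); the model offers no objective way to fix the constant.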