Research - Wynne Harlen
Wynne Harlen, OBE, PhD,
ASE President for the Year 2009.
She has been a science educator and researcher and member of the ASE throughout her working life. She was Sydney Jones Professor of Science Education at Liverpool University 1985-1990 and then Director of the Scottish Council for Research in Education until 1999. She now has an honorary position as Visiting Professor at the University of Bristol although mainly working from her home in Scotland as a consultant to various UK and international projects. Her publications include 25 research reports, over 150 journal articles, contributions to 37 books and 27 books of which she is author or co-author.
2. A Classical Start
3. Research or evaluation?
4. Research or monitoring?
5. Research or development?
6. Research as learning?
7. Research as a profession
8. Not all research is good research
9. Is reviewing research research?
10. Finally ...
Reflections on a personal journey in research
Wynne Harlen OBE.
Visiting Professor at the University of Bristol and Educational consultant
Most of my 51 years of activity in science education have been spent in some kind of research. This is an account of some of those activities - by no means all, for I have made no mention of international work which has involved conducting studies in a dozen or so countries outside the UK. It picks out a few key projects and studies which have been significant stages of my journey so far and along the way poses some questions about the concept of research and how activities such as evaluation, monitoring, reviewing and scholarship are related to it. No answers are offered to these questions but the real examples may cause others to reflect on the nature of the activities we bundle into the term ‘research'.
My first piece of research was perhaps the only one where I had complete freedom to decide the topic. Thereafter it became more and more the case that in order to obtain funding for research the topic had to be one within an area determined by the funding body or, in the case of evaluation of innovations, quite narrowly defined to answer specific questions. That first research, however, was not funded by anyone else; it was conducted for a master's degree in education. At the time I was a science lecturer in a teacher education college (which, like most, has now become part of a university). It was the 1960s, when Piaget's descriptions of children's ideas about the world around were causing great excitement. Piaget's work was with children who had no formal education in science but I wanted to know whether, and if so how, experience of some science activities would change children's ideas. In those pre-national curriculum - and almost pre-primary science - days it was not difficult to find schools where the children were untouched by science. It is worth recounting briefly the story of my first steps in research because it provides specific examples of several features of small scale research and raises a number of questions about methodology.
In line with the dominant preference in the 1960s for a ‘scientific' approach to educational research, an experimental design was chosen. Classes involved in the study were divided randomly into two halves, one to be the ‘experimental' group and one to be the ‘control', with measures of science concepts being applied both before and after a period of science activities. In this typical controlled design the main decisions about methods concerned the measures to be used as evidence of effect and the science experiences between the pre- and post-treatment assessments. A review of the literature (actually a summary - see later) on what had been done before in researching the effect of science activities on primary school children found very little ‘hard' evidence and the obvious way to fill that gap was to use tests so that results could be quantified. I devised tests specifically for this work, the ideas for the items originating mainly from Piaget's work. Each item was a short practical problem, chosen so that success depended on a certain level of concept development. Two complete forms of the test were created so that one could be used as the pre-test and the other as the post-test. Following correct procedures for this type of methodology, the test items were carefully trialled to obtain information to ensure sufficient reliability and validity.
Administration of the 46 practical items to individual children was helped by using trainee teachers, who learned a great deal about children's ideas during this face-to-face experience. From the results (which, incidentally at this pre-computer time, were analysed using edge-punched cards, a knitting needle, and a hand-turned calculating machine) two parallel tests were selected. Both the control and experimental halves of the class in one school were tested before a series of science activities which I provided for the experimental half of the class on five consecutive Monday afternoons. Meanwhile the control half was divided into small groups, each working with a teacher trainee. Then both groups were again tested. Complete results were obtained for 34 pupils (17 in each group).
The first step in the analysis of an experiment of this kind is the test of the ‘null hypothesis'. This is a statement that there is no significant difference between the experimental and control groups. In line with the Popperian model of scientific ideas (Popper 1968) hypotheses are never accepted and so the reverse of what is expected is formulated as a null hypothesis. In this double negative world of null hypothesis testing, only if the null hypothesis is rejected can there be any claim of a real difference between the groups, and an alternative hypothesis can be suggested to explain the difference and then tested. When I carried out the statistical test of the null hypothesis on the results of my study, the laborious calculations showed that the null hypothesis could not be rejected; that is, there was no statistically significant difference between the groups either at the start or the end.
A second experiment was set up, with older primary children and a longer period of science activities (12 hours). This time, as the children were more able to write, I modified the test for group administration (with each event being demonstrated) and used the same form for pre- and post-test. Again the evidence did not allow the null hypothesis to be rejected, leading to the conclusion that there was no significant difference between the groups associated with the difference in treatment. There were some interesting gender differences in the results, but otherwise I was left having to explain a ‘no difference' result.
Such a result is not uncommon in small scale research and raises a number of questions about this design and methodology. For example, the outcome of the statistical tests of difference depends on the size of the sample as well as the size of any difference between the two groups being compared. So the same difference between groups that is not large enough to be statistically significant with only 17 in each group might have been sufficient to be called significant with 170 in each group. When numbers are in the thousands, even small differences become statistically significant. But does this make them more educationally significant; that is, do the very small differences that can be detected in very large samples have any real meaning?
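The dependence of statistical significance on sample size can be sketched with a simple two-sample t statistic. The figures below are invented for illustration (the original scores are not reported here): the same mean difference and spread give a t value below the conventional two-tailed 5% critical value with 17 pupils per group, but well above it with 170 per group.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for the difference between two group means."""
    return (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Hypothetical scores: the same mean difference (1.5) and spread (sd 4.0)
t_small = welch_t(10.0, 4.0, 17, 8.5, 4.0, 17)    # 17 pupils per group
t_large = welch_t(10.0, 4.0, 170, 8.5, 4.0, 170)  # 170 pupils per group

# Two-tailed 5% critical values are roughly 2.04 (df about 32) and 1.97 (df about 338)
print(round(t_small, 2))  # 1.09 - null hypothesis cannot be rejected
print(round(t_large, 2))  # 3.46 - the identical difference is now 'significant'
```

Because the t statistic grows with the square root of the sample size, multiplying the groups by ten multiplies t by just over three, which is the arithmetic behind the question of whether such differences are educationally, and not merely statistically, significant.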
Several factors could have accounted for the ‘no significance' result in my study. One obvious factor is the sensitivity of the tests: might there have been changes that were not detected? Another is the duration of the science work: is it reasonable to expect a change after a few weeks, especially when the activities were provided by a visitor to the classroom on one afternoon a week? Associated with this was the nature of the activities. Although the intention had been to base these on some topic of interest and familiarity to the children, the necessary discussion with the class teacher had not taken place. Then there was the difference between the experiences of the experimental and control classes during the ‘treatment' weeks. These had to be different, but not so different that one could say the result was a foregone conclusion because children could not learn what they had not been taught.
All these points have general application to experimental designs. Trying to avoid them by controlling more of the variables leads to tighter and tighter specification of treatment and conditions, which, in some cases, means that the treatment is so different from normal classroom conditions that the relevance of the results becomes questionable. Not all educational research has to be directly relevant to the classroom, although it can be argued that it should be capable of leading to improvement in students' learning, even if this is through curriculum or policy changes that have a delayed impact. However, the problems of interpreting results from experimental designs have also led in a different direction, away from tight control and measurement of outcomes to collecting a range of different kinds of information from and about those involved in either changed or existing situations. Testing null hypotheses changed to answering research questions.
Following my master's research I was for many years involved in the evaluation of curriculum development. The 1960s and early 1970s were the heyday of large scale curriculum development, particularly in science, but also in mathematics, reading and the humanities. I began by evaluating the Nuffield Junior Science Project in 1966 and in 1967 became the evaluator in the Science 5/13 team for six years. Curriculum evaluation was a new activity. How it was defined and how it related to research evolved through practice, mainly in the UK and the USA. In the UK evaluation was conceived as a form of research in which the problem under study was determined for the evaluator (MacDonald 1976), whereas in research the problem could be chosen by the researcher - although as noted earlier this choice is more restricted in some funded research. The freedom of the evaluator was further constrained by the purpose of the evaluation - formative or summative. In the US the meaning of evaluation has been associated with the assessment of students. This was a consequence of conceiving evaluation as ‘the process of determining to what extent the educational objectives are actually being realised by the programme of curriculum and instruction' (Tyler 1949: 105). Tyler goes on to say that since the objectives are the intended changes in students, then evaluation is ‘the process for determining the degree to which these changes in behaviour are actually taking place'. However, this narrow interpretation was challenged in relation to formative evaluation where it was recognised that ‘much of the work of formative evaluation must focus on the study of how the programme functions when it is tried out' (Lindvall and Cox 1970: 4) as well as on whether the programme achieves its desired goals.
One of my first tasks as evaluator of Science 5/13 was to help in clarifying the project's objectives. It may seem surprising that what the project aimed for was not already specified, but this was the first large scale attempt to introduce science into the primary school and it was necessary to see what was possible rather than setting what may be unrealistic goals. More generally, the value of formulating goals and objectives was a matter of debate at that time (e.g. Atkin 1968, Eisner 1969, Popham 1969). The Science 5/13 team worked closely with teachers, frequently visiting classes, sometimes trying out and developing ideas with children, often meeting groups of teachers to discuss the help they would like with science and the possible form this might take. Ideas about the objectives for children's learning emerged from this and, of course, from study of the literature and existing practice (resulting, for instance, from the preceding Nuffield project).
The evaluation throughout had a formative role; it was concerned to provide information to help in production of materials for teachers which took the form of units of work on various topics such as ‘Toys', ‘Working with Wood', ‘Change', ‘Ourselves'. The units were produced and trialled in batches. I began on the evaluation of the first set of four units by using an experimental methodology but, learning from previous experience, added the systematic collection of other data as well as data about change in children's knowledge and understanding of the science in the units. So, in addition to pre- and post-testing of trial and control groups, data were collected by questionnaire from teachers, by classroom observation and by discussion with children. Putting all this together in a cluster analysis - by that time using a computer program (albeit one that had to run overnight in an enormous machine that had to be fed by data punched on cards) - revealed patterns of responses or conditions that enabled the identification of sets of circumstances which were associated with the more successful and less successful use of the units.
As to the results of the tests which, as previously, had been expensive and time-consuming to produce, trial and administer, I wrote in a report on the evaluation:
It has to be admitted that the value of the test results in revising the units was not very great. The testing had been useful in indicating overall achievement of stated objectives and had thus served a useful purpose. As a guide to making revisions to the units, however, the test results could only point to certain sections of the units where there had been little progress towards achieving the objectives, but they could not suggest why there had been no progress or indicate ways in which these sections might be changed.
The really useful information for revising units came from the analysis of teachers' report forms and questionnaires, from the classroom observations and from talking with pupils. Being able to identify groups of responses or items of information occurring together brought an order to these data which might otherwise have remained difficult to interpret. The cluster analysis, rather like a factor analysis, identified the main dimensions in the data and the items of information that defined these dimensions. It was then possible to see which circumstances and aspects of the units were associated with satisfaction or dissatisfaction.
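The way a cluster analysis brings order to such data can be sketched in miniature. The item names and response data below are invented, and single-linkage merging on a simple matching distance stands in for whatever algorithm the overnight program actually used; the point is only how items of information that occur together fall into interpretable groups.

```python
from itertools import combinations

# Hypothetical evaluation data: 1 = reported in that trial classroom, 0 = not.
# Each row is an item of information; each column one of six classrooms.
items = {
    "frequent teacher-pupil discussion": [1, 1, 1, 0, 0, 0],
    "pupils raised own questions":       [1, 1, 0, 0, 0, 0],
    "unit judged satisfying":            [1, 1, 1, 0, 0, 1],
    "equipment hard to obtain":          [0, 0, 0, 1, 1, 1],
    "unit judged unsatisfying":          [0, 0, 0, 1, 1, 0],
}

def distance(a, b):
    """Simple matching distance: fraction of classrooms where two items disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def single_linkage(data, threshold=0.4):
    """Repeatedly merge the clusters whose closest members lie within `threshold`."""
    clusters = [[name] for name in data]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if min(distance(data[a], data[b])
                   for a in clusters[i] for b in clusters[j]) <= threshold:
                clusters[i] += clusters.pop(j)
                merged = True
                break
    return clusters

print(single_linkage(items))
```

Run on these invented data, the items describing teacher-pupil contact and satisfaction fall into one cluster and those describing difficulties into another, which is the kind of pattern that made it possible to see which circumstances were associated with more and less successful use of the units.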
Because the project was lengthy, with the production of units spread over time, I was in the fortunate position of being able to learn from the first trials about the evaluation instruments themselves. There were four sets of trials and from one to the next the questionnaires and observation schedules were revised and streamlined. By the time of the second set of trials it was clear that, although the information from tests was not without value, it did not repay the effort of producing the tests compared with the information from other sources. Thus in the third and fourth trials more attention was given to obtaining information about how the materials were being used in the classroom. Among the items most highly weighted in the key groups identified in earlier trials were those relating to contacts between teachers and pupils and information coming directly from talking to the pupils about their work.
So there was a significant shift in the focus of the evaluation and in my own conception of the process of curriculum evaluation. Throughout the various trials the results underlined the value of gathering and combining information about different aspects of the use and effect of the units. Information from any one source was inadequate on its own; each became useful when supported by evidence from other sources. The change in methodology and the reasons for it provided by this 6-year long evaluation became the focus of my PhD dissertation. I faced opposition from my supervisor, who was in thrall to the experimental model, but was encouraged by the work of fellow evaluators who were creating new evaluation methodology, such as the ‘Illuminative approach' of Parlett and Hamilton (1976).
Although this is a lesson from my past it is one that is still very much relevant to the present. Forty years on I am now involved in formative evaluation of new science curricula in several countries, where IBSE (Inquiry Based Science Education) is being fervently embraced. The advice I give is not to fall for what seems the logical step of seeing whether the students' learning has changed but to find out more about whether, and if so how, they are experiencing the activities and thinking that is likely to lead to the intended learning. Only then, at the stage of full implementation, is it worth assessing outcomes. Another way of putting this is that, as scientists, we should ensure that the independent variable is in place before measuring the dependent variable.
Skipping over the next four years, which I spent using the experience from Science 5/13 evaluation to devise professional development materials for teachers (Harlen et al 1978), I joined a team led by Paul Black at what is now King's College London (then Chelsea College Centre for Science and Maths Education) to work on the Assessment of Performance Unit (APU). The work involved deciding what to assess (this still being the pre-national curriculum era) and creating a bank of test items, including practical tasks, to assess the identified concepts and processes. Samples of items from the bank were administered annually to random samples of pupils, together with questionnaires for pupils and their teachers. The project was identified as one of ‘monitoring', raising another question about the definition and boundaries of research.
If we take the view that: ‘Educational researchers aim to extend knowledge and understanding in areas of educational activity and from all perspectives including learners, educators, policymakers and the public' (BERA 2004: 3) then APU-style monitoring would be included within this activity. It certainly provided knowledge through surveys of samples of pupils at ages 11 (in mathematics, English language and science), at age 13 (science and foreign languages) and at age 15 (in mathematics, English language, science and design and technology). Surveys in mathematics were conducted annually from 1978 to 1982 and then again in 1987. In English they took place from 1979 to 1982 and in 1988 and in science from 1980 to 1984. In the surveys small but representative samples of the age group took different tests, which were then combined to provide the overall picture. The range of items included was extensive and covered the curriculum far more validly than a single test taken by all pupils (as in using national tests to monitor trends from the mid-1990s).
The life-time of the APU science surveys was too short to do more than establish a good base-line; it was terminated when the decision was taken to use national tests to monitor pupils' achievement over time. Overall levels of performance were stable over the five years of the surveys in science, but there was more interest in differences in performance for sub-categories and for groups of items within them. Six main categories were assessed, including three involving children using equipment and manipulating real objects. Performance in the category ‘Using graphical and symbolic representation' was higher than in other categories and within this category performance fell from using tables, through using bar charts to using graphs and grids (DES, Welsh Office and DENI 1988). Gender differences were particularly illuminating and informative for classroom practice. Differences for individual items related to content and form (girls being more willing to write extended answers) but persisted at the main category level only for ‘Applying science concepts' (boys higher) and ‘Planning investigations' (girls higher). Boys were also rated at a higher level for willingness to undertake investigations. The science team made sure that useful results were made available to teachers by publishing short reports for teachers as well as the longer technical reports for the DES (as it was then).
In relation to extending understanding, the APU was the springboard for further work on children's own ideas in science (already begun by researchers in New Zealand, the USA and the UK), for patterns in pupils' answers revealed fascinating evidence of their thinking and reasoning. APU questions requiring application of concepts provided evidence of pupils' misconceptions and directly fed into the Children's Learning in Science (CLIS) project led by Rosalind Driver. Work on primary pupils' ideas followed when Paul Black and I were successful in obtaining funding from the Nuffield Foundation for the Science Processes and Concepts Exploration (SPACE) project.
Rather than use data from test items or special tasks, the SPACE team set out to find out the ideas the children were using in the course of normal work. To come close enough to children for this to happen required collaboration with teachers, who thus became members of the extended research team. Groups of teachers worked with members of the team on particular topics, but before embarking on work with children there was a two-day workshop for teachers on each topic. This gave teachers a chance to clarify for themselves their own ideas about the topic, for in some cases the teachers shared the ‘misconceptions' of children. Teachers were able to explore and investigate the topics using the equipment and materials they would use with the children. The teachers' experience of helping the research by collecting children's ideas created considerable excitement. Paying attention to children's own ideas opened new relationships with learners and gave access to children's thinking that many had not realised existed or mattered. After the initial elicitation of ideas there followed a period of activities designed to help children to develop their ideas. For the purposes of the research a stratified random sample of children from the classes involved was interviewed by researchers before and after the classroom intervention, whilst the teachers helped in collecting ideas less formally from all the children.
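Drawing a stratified random sample of the kind used for the interviews can be sketched as follows. The class list and the strata (invented prior-attainment bands) are hypothetical; the SPACE reports would specify the actual stratification used.

```python
import random

def stratified_sample(pupils, strata_key, per_stratum, seed=1):
    """Draw the same number of pupils at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = {}
    for pupil in pupils:
        strata.setdefault(strata_key(pupil), []).append(pupil)
    return {band: rng.sample(group, per_stratum)
            for band, group in strata.items()}

# Hypothetical class list: (name, prior-attainment band), 10 pupils per band
pupils = [(f"pupil{i}", band) for i, band in enumerate(
    ["high"] * 10 + ["middle"] * 10 + ["low"] * 10)]

sample = stratified_sample(pupils, strata_key=lambda p: p[1], per_stratum=3)
```

The point of stratifying is that each band is represented by design rather than by chance, so the interviewed children reflect the spread of the whole class even in a small sample.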
In a separate project at the same time, the attempt to improve practice in primary science focused on ‘the use of process skills by children in developing their scientific ideas about the world' (Schilling et al 1990: ix). This was the Science Teachers' Action Research (STAR) project (co-directed by Maurice Galton and me) which, as in the case of SPACE, worked closely with teachers as co-researchers and was intended not just to investigate practice but to improve it. It was clear from earlier curriculum change projects that the impact of projects developed at a distance from teachers was limited by the ‘willingness of teachers to commit themselves to changes which they themselves did not help to create' (Schilling et al 1990: vii). Indeed the creation of change in any aspect of education is bedevilled by this problem of creating, on a large scale, ownership by the teachers who have to make the changes (e.g. Gardner et al 2008). So in STAR we involved teachers in first finding out what was going on when their pupils were engaged in science activities, using instruments devised by the research teams based at the Universities of Liverpool and Leicester. These instruments indicated what to look for as indicators of pupils using process skills (or inquiry skills as we would now call them), what questions to ask to elicit use of these skills and the kinds of opportunities that would encourage their use. The second stage was then for teachers to decide on and implement changes, after which the assessment materials were used again to investigate the impact of the changes.
Involvement in these projects was, for some teachers, an ideal way of becoming familiar with research. They could observe and participate in research procedures without the responsibility of carrying through a complex programme. It also provided the incentive for teachers to undertake their own research, sometimes for a higher degree. Whilst individual research can ‘extend knowledge and understanding' of teachers' own practice, if it is to add to the wider knowledge and understanding of education it needs to take place within a learning community. The analogy is with learning at any age and stage, where learners develop their ideas not only through individual exploration but through communication of their ideas to others. Interaction with others can lead to reformulation of ideas in response to the meaning that others give to experiences. The thinking of each individual is stimulated by peer evaluation. In the context of a research team, interaction and peer evaluation are built in, but this combined thinking still needs to be shared with the wider community through presentation at conferences, discussion in seminars, writing papers and less formal communications such as blogging. For lone researchers these activities are vital and should take place not only at the end of a research project but throughout. Stenhouse's definition of research, as ‘systematic inquiry made public' (1975: 142), should perhaps be interpreted as requiring communication during the process as well as of the product.
The SPACE and STAR projects were only two of the studies in primary science during the time when I was professor of science education at the University of Liverpool (1985-90). This was a time of great change in education following the Education Reform Act of 1988 which heralded the creation of the national curriculum and national testing. Science, as part of the curriculum core, was given close attention particularly in relation to assessment and to providing professional development for primary teachers since primary schools were, for the first time, all required to teach science. The need to make changes quickly had two consequences; first a focus on developing materials to support classroom activities and professional development of teachers; second to reduce the time-scale for any research that was associated with the changes. For almost the next decade, long-term research projects, such as the APU, SPACE and STAR, became rare (and not just in science).
My motivation for moving from Liverpool to be director of the Scottish Council for Research in Education (SCRE) was in part the hope of giving more time to research. What it meant in practice was giving more time to research-related activities, such as writing proposals for funding, rather than the basic work of ‘adding to knowledge and understanding'. A research organisation such as SCRE has to earn its keep - or most of it - through conducting funded research. The tendency during the 1990s for research contracts to become shorter and shorter put pressure on senior staff to find new contracts to ensure continued employment for researchers. This meant responding whenever possible to calls for tender. Even with a reasonably large research staff, matching contracts to interests was often not possible and some researchers had to become ‘jacks of all trades'. The problem of short-termism is not only one for independent research institutions but also for universities, although the ability to move staff between teaching and research helps to ease personnel management. Rushing from one contract to another also means that potentially valuable opportunities to add value to a project, by deeper reflection on how its findings are enhanced by those from other projects, remain unfulfilled aspirations. The underlying problems caused by short-term contracts have to some extent been alleviated by funders such as the ESRC supporting extended programmes of research around a theme, but at the same time this reduces funding for research not in the selected areas.
It may not be a coincidence that at the same time as educational research was suffering from a proliferation of short uncoordinated projects, the quality of much educational research was attracting severe criticism from Hargreaves (1996, 1998) and the Commission on the Social Sciences (2003). Although not explicit, the criticisms no doubt referred to studies of specific innovations employing limited procedures for gathering evidence (often questionnaires only), using opportunistic samples with low return rates, which add little to the understanding of the particular events or to wider issues or to research methodology. Hargreaves particularly took issue with the non-cumulative nature of the corpus of educational research:
A few small-scale investigations of an issue which are never followed up inevitably produce inconclusive and contestable findings of little practical value to anybody. Replications, which are more necessary in the social than the natural science because of the importance of contextual and cultural variations, are astonishingly rare.
(Hargreaves, 1998, p 126)
It is easy for such research to be dismissed as unsound as a basis for decision-making either by policy-makers or practitioners. But not all educational research is of this kind, nor can all educational research meet the theoretical requirements needed to make findings and conclusions dependable. Thus it is important for potential users to know what weight can be given to research evidence.
Some view of quality in research is necessary, some criteria that can be used by researchers for self-evaluation and by users of research findings in judging how dependable the results and conclusions are. Educational research is not alone, of course, in attracting criticism of poor quality. One only has to read Ben Goldacre's column in the Guardian (‘Bad Science') to realise this, although often his target is the way research results are reported by the media. As president of BERA, 1993/4, I failed in my attempt to engage fellow researchers in a debate about quality. There was resistance to the whole idea that research was anything other than good, based on a defensive suspicion that the outcome would be used to downgrade some methodologies. I am pleased to see that, although there are strongly held and different views about the criteria for assessing education research (Hammersley 2008, Furlong and Oancea 2005), the debate is now taking place.
In the matter of using educational research for informing policy and practice, the issue is not so much the quality as the relevance and use of an appropriate methodology, for a study may provide good evidence for addressing one question while having much less weight in relation to a slightly different one. There is an important role for research reviews and syntheses in bringing together the best evidence relating to a particular issue from apparently unrelated studies. Even if individual studies are regarded as too small in scale or have other deficiencies, if several are providing evidence in agreement with each other, the combined message is more dependable than the findings of any one.
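One standard way of combining evidence in this spirit, though not one this account itself specifies, is inverse-variance (fixed-effect) pooling, in which each study's effect estimate is weighted by its precision. With the invented figures below, no single study reaches the conventional 5% level on its own, yet the pooled estimate does, illustrating how agreement across small studies yields a more dependable combined message.

```python
import math

def pool_fixed_effect(estimates):
    """Fixed-effect (inverse-variance) pooling of independent effect estimates.
    Each entry is (effect, standard_error); more precise studies get more weight."""
    weights = [1.0 / se**2 for _, se in estimates]
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical small studies: each z = effect/se falls short of 1.96
studies = [(0.30, 0.20), (0.25, 0.25), (0.35, 0.22)]
effect, se = pool_fixed_effect(studies)
print(effect / se > 1.96)  # True - the combined evidence clears the threshold
```

This simple model assumes the studies estimate the same underlying effect; as the passage notes, the heterogeneity of educational research often makes that assumption, and hence the pooling, far from straightforward.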
After leaving SCRE and full-time employment I undertook four extensive reviews of research on various aspects of assessment using the procedures for the systematic reviewing of educational research developed by the EPPI-Centre. Similar systematic reviews of science education have been conducted by a group based at the University of York, where seven reviews have been completed on topics including the impact of ICT, STS and small-group discussion (http://eppi.ioe.ac.uk/cms/Default.aspx?tabid=444&language=en-US). A systematic research synthesis differs from a summary of research by attempting to bring together the findings from several studies rather than treating them each as separate. It also differs from a narrative or conventional review in conducting and recording searches exhaustively, and in having two researchers independently screen studies for relevance of focus and methodology, extract data and judge the weight of the evidence each provides for addressing the review question. There remain many areas of difficulty in combining results from studies using very different methodologies (both qualitative and quantitative). The heterogeneity of research in education, and in social science research in general, presents problems in relation to synthesis. Very rarely do studies address exactly the same question; replications, common in medical and scientific research, are rare, as pointed out by Hargreaves, quoted earlier.
Looking to the future and reflecting on the past, it is hardly possible to say that we need ‘more of this' and ‘less of that' kind of research, whether thinking of topic or of methodology. We will continue to need research of all kinds - relating to policy, classroom (real or virtual) practice, the process of learning at macroscopic and microscopic (brain cell) levels, and assessment. Some of this will be best tackled by large teams, some by individuals; more, I hope, will be long-term and international. What seems more certain, however, is that we need high quality research and for education to become a research-literate profession. As with other ‘literacies' this means having a general understanding of the nature of research activity, knowing its strengths and limitations and being able to distinguish good from not-so-good research. This would enable all members of the profession to use research selectively and effectively, as indicated in the concept of ‘scholarship'. Reviewing has a role in creating awareness of the varied nature of educational research, in recognising its limitations and thus in developing ‘research literacy'. As such it can improve research and the peer-review process. It deserves a key position in the training of educational researchers.
Atkin, J. M. (1968) Behavioural objectives in curriculum design: a cautionary note, The Science Teacher, 35 (5)
BERA (2004) Revised Ethical Guidelines for Educational Research. Southwell: British Educational Research Association.
Commission on the Social Sciences (2003) Great Expectations: The Social Sciences in Britain. London: Commission on the Social Sciences.
Eisner, E.W. (1969) Instructional and expressive educational objectives: their formulation and use in curriculum, in AERA Monograph 3, Instructional Objectives. Chicago: Rand McNally
Furlong, J. and Oancea, A. (2005) Assessing Quality in Applied and Practice-Based Educational Research: a Framework for Discussion. Oxford Department of Educational Studies www.bera.ac.uk/pdfs/Qualitycriteria.pdf
Gardner, J., Harlen, W., Hayward, L. and Stobart, G. (2008) Changing Assessment Practice. www.assessment-reform-group.org
Hammersley, M. (2008) Troubling criteria: a critical commentary on Furlong and Oancea's framework for assessing educational research, British Educational Research Journal, 34 (6) 747-762
Hargreaves, D. (1996) Teaching as a research-based profession: possibilities and prospects. Teacher Training Agency Annual Lecture. London: TTA
Hargreaves, D. (1998) A new partnership of stakeholders and a national strategy for research in education, in J. Rudduck and D. McIntyre (Eds) Challenges for Educational Research. London: Paul Chapman Publishing, 114-138
Harlen, W. (1975) Science 5-13: a Formative Evaluation. London: Macmillan Education.
Harlen, W., Darwin, A. and Murphy, M. (1977) Match and Mismatch: Raising Questions. Edinburgh: Oliver and Boyd
Lindvall, C.M. and Cox, R.C. (1970) AERA Monograph 5. The IPI Evaluation Program. Chicago: Rand McNally
Macdonald, B. (1976) Evaluation and the control of education, in D.A. Tawney (Ed) Curriculum Evaluation Today: Trends and Implications. London: Macmillan Education
Parlett, M. and Hamilton, D. (1976) Evaluation as illumination, in D.A. Tawney (Ed) Curriculum Evaluation Today: Trends and Implications. London: Macmillan Education
Popham, W. J. (1969) Objectives and instruction, in AERA Monograph 3, Instructional Objectives. Chicago: Rand McNally
Popper, K.R. (1968) The Logic of Scientific Discovery. London: Hutchinson
Schilling, M., Hargreaves, Harlen, W. with Russell, T. (1990) Assessing Science in the Primary Classroom: Written Tasks. London: Paul Chapman Publishing
Stenhouse, L. (1975) An Introduction to Curriculum Research and Development. London: Heinemann
Tyler, R.W. (1949) Basic Principles of Curriculum and Instruction. Chicago: University of Chicago Press
Recent books
(2003) (with Macro, Reed and Schilling) Making Progress in Primary Science, 2nd Edition: Handbook for Inservice and Preservice Course Leaders. London: Routledge Falmer
(2003) (with Macro, Reed and Schilling) Making Progress in Primary Science Study Book. London: Routledge Falmer
(2003) Enhancing Inquiry through Formative Assessment. San Francisco: Exploratorium
(2004) (with Qualter) The Teaching of Science in Primary Schools. 4th Edition. London: David Fulton. 5th edition to be published in 2009
(2006) Teaching, Learning and Assessing Science 5-12. 4th Edition London: Sage
(2006) (Editor) ASE Guide to Primary Science Education. Hatfield: The Association for Science Education
(2006) (with Allende) Report of the Working Group on International Collaboration in the Evaluation of Inquiry-Based Science Education (IBSE) Programs. Santiago, Chile: Fundación para Estudios Biomédicos Avanzados de la Facultad de Medicina.
(2007) Assessment of Learning. London: Sage
(2008) (Editor) Student Assessment and Testing. 4 volumes. London: Sage
Recent journal papers
Harlen, W. (2003) Ability grouping in schools: does it matter? The Psychology of Education Review, 27 (1) 10-11
Harlen, W. (2003) The inequitable impacts of high stakes testing, Education Review, 17 (1) autumn, 43-50
Harlen, W. and Deakin Crick, R. E. (2003) Testing and motivation for learning, Assessment in Education, 10 (2) 169-207
Harlen, W. (2003) Question of understanding: AfL defined, Curriculum Briefing, 2 (1) 9-11
Harlen, W. and Doubler, S. (2004) Can teachers learn through enquiry on-line? Studying professional development in science delivered on-line and on-campus, International Journal of Science Education, 26 (10) 1247-1267
Harlen, W. and Deakin Crick, R. E. (2004) Opportunities and challenges for using systematic reviews of research for evidence-based policy in education, Evaluation and Research in Education, 18 (1&2) 54-71
Harlen, W. (2005) Trusting teachers' judgment: research evidence of the reliability and validity of teachers' assessment used for summative purposes, Research Papers in Education, 20 (3) 245-270
Harlen, W. (2005) Teachers' summative practices and assessment for learning - tensions and synergies, The Curriculum Journal, 16 (2) 207-223
Harlen, W. (2007) Criteria for evaluating systems for student assessment, Studies in Educational Evaluation, 33 (1) 15-28
Harlen, W. (2008) Constructivism, inquiry and the formative use of assessment in primary school science education, Didaktikens Forum, 5 (1) 99-115
Harlen, W. (2009) Improving assessment of learning and for learning, Education 3-13
Published: 05 Feb 2009