The Association for Science Education

P3.2 AFL Assessment Ideas



This article explores the principles, beliefs and practices associated with assessment in educational settings. Assessment helps teachers monitor pupil learning. This shows them how effective their teaching has been and, more importantly, helps them work out what the next steps in learning need to be for individuals and groups of children. Assessment is not an “add-on” but an integral part of teaching and learning that unites and drives the teaching-learning process. Assessment lies at the heart of good teaching and learning. This is recognised in the 2008 Primary Framework: “Day-to-day assessment is a natural, integral and essential part of effective learning and teaching. Teachers and children continually reflect on how learning is progressing, see where improvements can be made and identify the next steps to take.”

Standards that are particularly relevant to this section are Q14 and Q24-28.

Keywords: Formative assessment, Summative assessment, Reliability, Validity, Bias


1.0 Introduction
2.0 Purposes of Assessment
3.0 History
4.0 Formative Assessment
5.0 Formative v Summative
6.0 Technical Issues - Reliability, Validity and Bias
7.0 Making Use of Assessment Data
8.0 References

1.0 Introduction

Assessment is a social process - as teachers we come to the classroom with a view about what quality means in terms of science education, and our role is both to convey these ideas of quality to our pupils and, at the same time, to discern how well they are performing in line with our expectations.

2.0 Purposes of Assessment

The question of why we assess is vitally important. Our assessment systems provide information for:

  • Support of Learning (formative)
  • Certification, Progress & Transfer
  • Accountability

For the classroom teacher, it requires reaching a balance between the formative and summative purposes so that you can both support and report on learning.

3.0 History

There has not been a great change in what is assessed in school science for the last 50 years but there have been four radical technical changes in the examination systems.

Assessing subjects separately
In 1951 the General Certificate of Education changed the system by requiring each school subject to be assessed separately. Prior to this, the School Certificate was an amalgamation of tests in different subjects, just as the International Baccalaureate is in schools today.

Introduction of coursework 
This resulted from the Certificate of Secondary Education (CSE) qualifications of the 1960s and 1970s and continued into the General Certificate of Secondary Education (GCSE) qualifications that were introduced in 1986.

Modular Syllabuses
These occur in both GCSE and A level qualifications and raise the question of the relationship of performance in a modular examination to that achieved in a terminal examination as well as concerns over fragmentation of the curriculum.

Vocational Qualifications
The introduction of work-based and work-related qualifications challenged the system to consider issues of comparability and equity.

All of the above have to be considered within the context of the changes in the educational system over this time. When GCE was introduced at the start of the 1950s, the external examination system provided for fewer than one-fifth of pupils in that year group. Today, the examination system provides for all 16-year-olds in the UK. A second major influence has been the rise in high-stakes testing, particularly in England and Wales since the introduction of the National Curriculum in 1988. This has led to children being tested more often, which has demotivated and stressed many learners. It has also encouraged "teaching to the test", which has narrowed the range of teaching strategies that teachers use. While there has been some respite through the reduction in national testing, many of these effects remain within the education system.

4.0 Formative Assessment (Assessment for Learning)

A review of the research literature by Paul Black and Dylan Wiliam in 1998 established that there is strong evidence that standards of learners' performance can be raised by improvements in the quality of formative assessment. They also established that practices in formative assessment at that time were generally weak. These findings were reported briefly in "Inside the Black Box" together with recommendations for action.

Since January 1999, the research group at King's College London have been working with schools to explore how the ideas that are suggested in the research literature can be transformed into useful practical knowledge through the efforts of teachers. Their role has been to provide ideas and support for teachers who wanted to improve their formative assessment practices. The results of their first project, with 24 science and mathematics teachers and later with 12 teachers of English, have been positive and very encouraging. In July 2002 they published a second booklet for teachers, "Working Inside the Black Box", and since then various subject-specific booklets and a booklet for primary teachers. The findings of the King's work form an important part of two DfES initiatives to improve learning - the Assessment for Learning Strategy for Key Stage 3 and the Primary Strategy.

The Assessment Reform Group's 1999 publication "Beyond the Black Box" summarises assessment for learning as:

  • embedded in the teaching and learning experience
  • sharing learning goals with the learners
  • helping pupils to know and to recognise the standards they are aiming for 
  • pupils being engaged in self-assessment
  • providing feedback which leads the learner to recognise the next steps and how to take them
  • underpinned by the belief that every learner can improve
  • both the teacher and learner reviewing and reflecting on the assessment information

Assessment for Learning provides feedback to children about their learning. This takes place within the learning experience as children discuss ideas, hear what others think and are challenged about their own thinking, as well as in the feedback they get about their work from their teacher. Through talk children can begin to construct new ways of thinking about a topic or a skill, and so learning takes place. In his booklet Towards Dialogic Teaching, Robin Alexander (2004) argues that:

"Children, we now know, need to talk, and to experience a rich diet of spoken language, in order to think and to learn. Reading, writing and number may be the acknowledged curriculum ‘basics’ but talk is arguably the true foundation of learning." (p. 5)

Alexander also categorised the different types of talk that he witnessed in an international comparison of primary classrooms as:

  • Rote - drilling of facts, ideas & routines 
  • Recitation - questions designed to elicit recall or work out answers from clues in the question
  • Instruction/exposition - giving information & explaining facts, principles & procedures
  • Discussion - exchange of ideas with a view to sharing information and solving problems
  • Dialogue - seeking common understanding through questioning and discussion which guide and prompt, reduce choices, minimise risk & error and expedite ‘handover’ of concepts & principles (p. 33)

While teachers may need to use all of these categories of talk in their classrooms, it is clear that a formative approach requires the talk to be centred around the two latter categories of discussion and dialogue. What is essential in this process is that the children believe that their thinking is valued by the teacher, so that they feel able to discuss their understanding and misunderstandings openly.

There are also times when teachers wish to give feedback on particular pieces of work that the children produce. This provides ideas and guidance for improvement. Grades and marks detract from the guidance principles behind formative feedback, whereas a comment gives the pupil information about how they can improve their work. In many cases, the feedback comment relates back to the description of quality that the children discussed before they attempted the task. In this way, children work towards success or quality by considering the criteria as their work progresses. Used over an extended period, this helps children develop the skills of self-assessment, which is the main drive behind a formative approach to assessment.

5.0 Formative v Summative Assessment

Important characteristics of formative assessment | Important characteristics of summative assessment
Mainly about improvement                          | Mainly about accountability
Looks forward                                     | Looks backwards
Favours descriptive feedback                      | Favours tests and scores
Informs on quality                                | Samples knowledge
Can lead to improvements in learning              | If overused can have a negative impact

Harrison & Howard, 2009

It is important to recognise the dilemmas that face teachers and schools as they try to strike a balance that foregrounds assessment for learning while maintaining assessment structures for accountability and reporting purposes. The question for teachers is how to create classrooms in which the above factors come to fruition: where classroom evidence informs the next steps in learning, where feedback guides students towards areas they need to improve or develop, and where learners are able to take an active role in their learning through self-assessment practices. While many teachers believe that they should respond formatively to the evidence that they collect in their classrooms, they find this difficult to achieve because they feel confronted by large classes, an overburdened curriculum and an assessment environment that emphasises monitoring and accountability. Their concern is that they cannot find the time to respond to the assessment information that they are collecting.

In those classrooms where teachers do manage to focus on a formative approach, children have made large learning gains (Black & Wiliam, 1998; Wiliam et al., 2003).

6.0 Technical Issues - Reliability, Validity and Bias

All assessments are fallible. They are simply a sample of what can be captured of understanding or skill at that moment in time by that examiner or teacher with that particular assessment tool or strategy. Examination boards and teachers need to show that they have been as rigorous as possible in making these judgements for there to be confidence in the results. There are two main criteria for quality to ensure this confidence - reliability and validity.

Reliability
For an assessment to be reliable, its results must reflect the capability of the person being assessed. Reliability is a measure of how close the result on a particular occasion is to the ‘true score’ (the average of several similar tests on that same aspect of learning). In other words, it gives us an idea of the error within the assessment. So, if the assessment has 85% reliability, then someone whose true capability is 50 per cent will typically score between about 45 and 55 per cent. The reliability of many of our external examinations in the UK is around 70%.
The main threats to reliability are:

  • question sampling 
  • variation in performance on different occasions 
  • marker variation
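
The link between a reliability figure and a band of likely scores can be made concrete with a short calculation. The sketch below uses the standard error of measurement from classical test theory, SEM = SD x sqrt(1 - reliability), which is one common way of turning a reliability coefficient into a score band. The article does not say how its 45 to 55 band was derived, so the spread of scores (ASSUMED_SD = 13 percentage points) and the choice of a band one SEM either side of the true score are assumptions made purely for illustration, and the function names are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(true_score, score_sd, reliability, n_sem=1.0):
    """Band of n_sem standard errors either side of the true score."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return (true_score - n_sem * sem, true_score + n_sem * sem)


# ASSUMPTION: the article gives no spread of scores; 13 percentage
# points is a hypothetical standard deviation chosen for illustration.
ASSUMED_SD = 13.0

# True capability of 50 per cent on an assessment with 85% reliability.
low, high = score_band(50.0, ASSUMED_SD, 0.85)
print(f"reliability 0.85: about {low:.0f} to {high:.0f} per cent")

# The same pupil on an assessment with 70% reliability.
low70, high70 = score_band(50.0, ASSUMED_SD, 0.70)
print(f"reliability 0.70: about {low70:.0f} to {high70:.0f} per cent")
```

Under these assumed values the first band comes out at roughly 45 to 55 per cent, and the band widens as reliability falls. If measurement error is roughly normally distributed, a band of one SEM would be expected to contain the observed score on about two occasions in three, so a score outside the band is not unusual.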

Validity
Validity is a measure of the extent to which the assessment captures what it is supposed to measure. It takes into account the breadth of the assessment (have all topics been covered?), the types of question (have all the skills been measured?) and any obstacles that may have prevented learners from revealing what they were capable of, such as complex language preventing access or ambiguity obscuring the intention of the assessment.

Bias
Bias can be thought of as a validity issue. It poses the question: does the assessment favour one group's performance over another's? In the 1980s, many UK examinations discarded multiple-choice questions since it was felt that boys performed better on these types of question than girls with similar capabilities. Markers can introduce bias too, since poor handwriting, spelling or literacy may cause them to give lower marks to a pupil who has similar capability in that aspect of learning to a pupil who produces neat work in which spelling and grammar are accurate.

7.0 Making Use of Assessment Data

The widespread use of assessment data now plays a vital role in school policies and practices in England. Analysis of assessment data is used extensively to foster school improvement conversations between head teachers and teachers, as well as with other interested parties such as school inspectors and local school improvement partners. Since its launch in 2006, RAISEonline, a web-based reporting and analysis tool for schools, has played a key part in this drive to make better use of education data to support school improvement.

8.0 References

Alexander, R. (2004) Towards Dialogic Teaching: Rethinking classroom talk. Cambridge: Dialogos.
Black, P., Harrison, C., Lee, C., Marshall, B. & Wiliam, D. (2002) Working inside the black box: Assessment for learning in the classroom. London: nferNelson.
Black, P. & Harrison, C. (2004) Science inside the black box: Assessment for learning in the science classroom. London: nferNelson. ISBN 0 7087 1444 7.
Black, P., Harrison, C., Osborne, J. & Duschl, R. (2004) Assessment of Science Learning 14-19. London: Royal Society.
Black, P., Harrison, C., Lee, C., Marshall, B. & Wiliam, D. (2004) Assessment for Learning: Putting it into practice. Milton Keynes: OUP.
Black, P.J. & Wiliam, D. (1998) Assessment and Classroom Learning. Assessment in Education 5 (1), 7-74.
Braund, M. & Driver, N. (2005) Pupils’ perception of practical science in primary and secondary school: implications for progression and continuity of learning. Educational Research 47, 177-91.
Brooks, R. & Tough, S. (2006) Assessment and Testing: Making space for teaching and learning. London: IPPR.
Bunyan, P. (1998) Comparing pupil performance in key stages 2 and 3 science SATs. School Science Review 74 (266), 39-42.
Crooks, T. (1988) The Impact of Classroom Evaluation on Students. Review of Educational Research 58 (4), 438-481.
DeBoo, M. & Randall, A. (2001) Celebrating a Century of Primary Science. Hatfield: ASE.
Entwistle, N. (1992) The Impact of Teaching on Learning Outcomes in Higher Education. Sheffield: CVCP, Staff Development Unit.
Millar, R. & Osborne, J. (1998) Beyond 2000: Science education for the future. London: King’s College.
Murphy, P. & Beggs, C. (2003) Children’s attitudes towards school science. School Science Review 84 (308), 109-116.
Nott, M. & Wellington, J. (1999) The state we’re in: issues in key stages 3 and 4 science. School Science Review 81 (294), 13-18.
Pell, T. & Jarvis, J. (2001) Developing attitude to science scales for use with children of ages from five to eleven years. International Journal of Science Education 23 (8), 847-862.
Pollard, A., Triggs, P., Broadfoot, P., McNess, E. & Osborn, M. (2000) What Pupils Say: Changing Policy and Practice in Primary Education. Journal of Educational Change 4 (1), 87-93. Netherlands: Springer.
Schagen, S. & Kerr, D. (1999) Bridging the gap? National curriculum from primary to secondary school. Slough: NFER.
Topping, K.J. (2009) Peer assessment. Theory Into Practice 48 (1), 20-27.
