Ofsted’s national director for schools, Mike Cladingbowl, has written a document pithily titled Why do Ofsted observe individual lessons and how do they evaluate teaching in schools? It does exactly what it says on the tin. It is a pragmatic, common-sense statement of intent.
The highlight, for me, is this – “inspectors should not give an overall grade for the lessons” they observe as part of a school inspection because it would be “nonsensical to suggest that an inspector could give a definitive validation of a teacher’s professional capacity” based on such short-term and superficial evidence ... in much the same way as “we would not expect a surgeon to be judged on a single 25 minute observation of their work”. Sceptical? Read on...
Instead of grading lessons, Mr Cladingbowl argues, inspectors should “provide feedback to individuals on what they have observed” but ensure that feedback “does not seem to constitute a view about whether the teacher is a ‘good’ teacher or otherwise, or if they ‘taught a good lesson’ or otherwise”.
Inspectors should not grade lesson observations, Mr Cladingbowl says, because an observation is just one piece of the jigsaw. Evidence gleaned from an observation contributes towards a wider evaluation of “the quality of education provided by the school” and is not used to evaluate “individually or collectively, the performance of teachers”.
If Ofsted has stopped grading individual teachers and lessons, then surely school leaders should follow?
Are lesson observations a relic?
As a result of high-stakes, graded lesson observations, teachers tend to do one of two things:
1. They over-plan, over-teach and proffer a showcase lesson which bears no relation to their everyday practice. Sometimes, this “showcase” lesson is better than their “normal” lessons because an incredible amount of time and effort has been invested in its conception. Sometimes, however, this showcase is not as good as their normal lessons because, having dedicated so much time to the planning and preparation of it, they are less willing to deviate from their intricate plans and respond to what’s happening right in front of them.
2. They become stressed by the experience of being watched and so underperform. They are nervous and stilted, pressured and pained. They lose touch with the classroom dynamics and with that sixth sense which tells them when students need the pace to slow or quicken and when the work is too hard or too easy. They try to spin too many plates at once and, far from providing an entertaining circus act, they end up with something more like a Greek wedding.
In short, high-stakes, graded lesson observations do not allow observers to see the teacher as they would normally teach. Even if the teacher is brave enough to teach a normal lesson and does not succumb to the natural stress of observation, the very presence of an observer in the room, particularly an inspector or senior leader with a clipboard, inevitably alters the classroom dynamic. This is known as the Hawthorne effect: people change their behaviour when they know they are being watched.
And that’s not all. High-stakes, graded lesson observations are also ineffective because, like other methods of evaluation, they are limited in what they can tell us about the complex process of learning. Let me explain...
Learning is not always visible, and so we mistake poor proxies for learning itself. We see students engaged in discussion or listening attentively to the teacher and assume this means they’re learning. But do we really know?
Learning is complex: it takes place over time and is the result of a series of cognitive processes. And what is learning anyway? Surely it is, at least in part, the ability to retain and recall information at a later date? How can we possibly observe this in 20 minutes to an hour? By observing a lesson, we can see the information as it goes in, but we would need to see it as it comes out, too, in order to be sure it has been learnt.
Quite apart from an observer’s tendency to mistake poor proxies for learning, observations are unreliable because there is always going to be an element of human error, much of it subconscious. Put crudely, if a teacher isn’t teaching in the way you would, you are less likely to look favourably upon their teaching.
Observers have strong emotional responses to particular behaviours and styles, and these responses are hard to override. This is partly why observers rarely concur with each other’s judgements: every observer is looking for something different and is responding differently to what they see.
Professor Paul Black said that “good quality assessment is inevitably the child of a union between reliability and validity”. This is no mean feat. As Professor Dylan Wiliam added: “One cannot validate an assessment; one can only validate a particular interpretation of assessment data ... the onus is always on the users of the assessment information to establish that the inferences they make are warranted.”
Concerns about the validity of classroom observation are not new, of course. Writing in 1981, Scriven said that “using classroom visits to evaluate teaching is not just incorrect, it’s a disgrace (because) the visit alters the teaching ... the number of visits is too small to be an accurate sample ... visitors are not devoid of personal prejudices ... (and) nothing observed in the classroom can be used as a basis for any conclusion about the merit of the teaching”.
The Measures of Effective Teaching (MET) Project, set up by the Gates Foundation to find out how evaluation methods can be used to identify the skills that make teachers effective, quotes statistics which show that, when one observer judges a lesson to be outstanding, there is between a 51 and a 78 per cent chance that a second observer will disagree. If the lesson is judged to be inadequate, there is a 90 per cent chance that a second observer will disagree.
Professor Wiliam, meanwhile, says that for a lesson observation judgement to achieve a reliability of 0.9, a teacher would have to be observed teaching at least six different classes by at least five independent observers.
In a University and College Union (UCU) research project that lasted over a year and included an online survey, focus groups and semi-structured interviews involving thousands of teachers across the country, Dr Matt O’Leary of the University of Wolverhampton found that only 33.7 per cent of the circa 4,000 respondents “agreed” or “strongly agreed” that graded lesson observations were essential for improving the quality of teaching, only 29.5 per cent felt they were essential for CPD, and a mere 20 per cent felt they helped to raise standards. By contrast, a majority of respondents (67.4 per cent) agreed or strongly agreed that graded observations should no longer be used as part of an organisation’s assessment.
Are lesson observations a requisite?
The case against graded observations is, I believe, a strong one. But I’m not suggesting we should stop observing lessons altogether. In fact, I think walking into lessons to see what’s happening is important. By observing the classroom environment, for example, we can make judgements about the rapport the teacher has established with students, we can make judgements about how well the teacher manages behaviour and uses resources, and we can make judgements about the ways in which students are grouped. Lesson observations also allow us to see the ways in which transitions are handled and learning is organised.
Dr O’Leary, in his book Classroom Observation (2014), concludes that “despite some of the shortcomings associated with qualitative and quantitative observation instruments, one cannot deny that each has a role to play in researching classroom behaviour and/or teacher-learner interactions ... yet by itself observation may only provide a partial view of teaching and learning, thus highlighting the importance of gathering information from other sources ... to form a well-rounded judgement”.
So observations have their place, but they should be ungraded and their remit made clearer from the outset. To return to Dr O’Leary’s UCU project, 81.2 per cent of respondents agreed or strongly agreed that ungraded lesson observations were more effective at improving teaching than graded ones, and 81.3 per cent said ungraded observations helped with CPD. A majority (76.6 per cent) agreed or strongly agreed that ungraded observations should replace graded ones.
Participants in Dr O’Leary’s research project spoke highly of formative models of observation, particularly peer observations, which had the capacity for feedback and feed-forward. They said peer observations also gave observees more scope to negotiate the focus of an observation.
I think most of us now accept that grading students’ work can lead to what Professor Carol Dweck calls the “comparison effect” because it doesn’t necessarily show the progress students have made. Grading work can lead high-attainers to grow complacent and low-attainers to grow despondent.
Moreover, grading work encourages students to focus on the mark they have just achieved and not on what they need to do in order to make further progress.
We accept that formative assessment works best for our students because it provides them with clear direction and focus; it concentrates learners’ minds on what they need to do next in order to improve.
Surely, what’s good for the goose is also good for the gander? Surely we teachers, too, must start assessing the quality of our teaching in a formative, not summative, way. And that means moving away from high-stakes, graded observations and instead engaging in developmental observations aimed at helping colleagues to improve and refine their teaching.
After all, in order to win you must first embrace failure. Or, as Michael Jordan famously said: “I’ve failed over and over and over again in my life. And that is why I succeed.”
If we are to improve the quality of teaching in our schools, then our evaluative system should be formative, peer-based and multi-dimensional.
Observations of teaching should identify what a teacher needs to do in order to improve rather than simply report their level of current practice against arbitrary criteria. In order to be formative, observations should be conducted without fear or favour. They should be led by the observed teacher rather than the observer, and should focus on one particular aspect of their teaching at a time.
In his UCU report, Dr O’Leary argues in favour of “more peer-based models of observation” which offer “the potential to redress some of the power imbalance associated with top-down, deficit models of observation” and which “encourage a greater sharing of practice and professional dialogue that can be mutually beneficial for observer and observee”.
Observations of teaching like those described above should form only part of the overall judgement of a practitioner’s effectiveness and, indeed, of a school’s effectiveness. They should be triangulated with other sources of data such as work scrutiny (an evaluation of students’ work, assessment records, and planning information), and attendance, progress and attainment data.
In fact, as many sources of information as are available should be used to help form a fair, accurate picture of a teacher’s effectiveness. That way, each school will know it has reliable data on which to act, and each practitioner will know they are being judged fairly and will not be penalised for taking a risk in a lesson observation that didn’t pay off, or be branded inadequate simply for having a bad day. And this data should be used in conjunction with our professional judgement.
Dr O’Leary makes the case for a multi-dimensional model of assessing teaching competence and performance. “The current reliance on graded observations,” he argues, is “an inequitable and reductive practice (and there is an overwhelming demand to) make the process of teacher assessment more inclusive by extending it beyond the lens of lesson observation and drawing on other sources of evidence such as student feedback, peer-review, student achievement data, etc.”
Lesson observations should continue to play an important role in the way our schools judge the quality of teaching and, moreover, in the way we support teachers to improve their practice. But observations should be formative and their remit made clearer. And observations should be just one of a multitude of data sources used to evaluate teaching and learning. They should help teachers identify areas of practice they need to improve in a way that eradicates fear and promotes risk-taking.