Teacher marking must be verified through quality assurance procedures, but what do research and experience tell us about how to conduct internal moderation effectively? Erin Miller considers different approaches
Fair moderation? Research suggests that teachers tend to be harsher on students they do not know, whereas they will defend the marks they have given to their own students - Adobe Stock

The objective of internal moderation processes is “to develop a shared understanding of standards of achievement and the qualities that will denote evidence of these standards” (Adie et al, 2013).

Sounds simple enough, but as all teachers know, moderation can quickly become a complicated and tricky job.

Internal moderation is required across all key stages as it is incumbent upon all teachers to ensure that they are accurately applying assessment criteria to their students’ assessed work, whether this is qualification-related or not. Without this accuracy, it is impossible to monitor progress.

Moderation is also a powerful tool for developing teachers’ formative assessment skills and is instrumental in ensuring we understand the progress of our students.

So, how can we conduct moderation in a way that increases these positive outcomes? Below you will find some advice and guidance related to the process of internal moderation. The advice is likely most applicable to the moderation of assessments in essay-based subjects (statistical moderation is not explored).

 

Use benchmarked samples

Benchmarking should take place before marking begins. The process of collating benchmarked samples is invaluable in helping teachers to mark accurately as it ensures they have examples of different grades.

Not all teachers need to be involved in benchmarking. If it is possible to obtain them, externally benchmarked samples are more beneficial because no teacher in the department has a personal stake in the work – using external samples therefore saves time and reduces the risk of bias.

Ideally, teachers ought to have access to benchmarked samples before they begin marking, so that they have a point of comparison for their own marking. Benchmarked samples ought to be updated regularly to reduce the risk of teaching being skewed towards just one or two samples of work.

 

Be clear: Consensus moderation or verification moderation?

Consensus moderation is where a group of teachers independently mark a sample of work and then “convene as a group, individually present their decisions and rationales, and deliberate them until consensus is reached” (Sadler, 2013).

This approach is useful when encountering new assessment criteria as it allows for a fuller discussion and a shared understanding of them.

This can produce constructive discussion where you are able to understand the thought processes of all teachers in a faculty. However, this approach is likely to be more time-consuming, so ought to be used where it will be most valuable. Also, while it is in essence a more democratic process, it is important to remember that someone needs to have the final word.

Verification moderation is where a piece of work, which has already been marked, is passed to another colleague and that colleague verifies the mark. If the colleague disagrees, they can state this and the teacher can adjust the marks accordingly, or refer to the head of department. This tends to be less time-consuming, but potentially at the cost of productive discussion and fostering a shared understanding.

Verification moderation is a useful way for experienced colleagues to support less-experienced colleagues.

Naturally, consensus and verification moderation are not mutually exclusive and each can be adapted to use elements of the other – but do try to be clear on the method you wish to use, so that you set expectations for the level of discussion you want to achieve and how consensus will be reached.

 

Would ranking be a better method for moderation?

Moderation often becomes a comparative exercise (Crisp, 2017). So, rather than using the traditional methods of consensus or verification moderation, departments could use comparative judgement as a tool to moderate.

If ranking alone is being used to moderate, it is better suited to in-house assessments, where there is less pressure to meet the precise standards outlined in a mark scheme than there is for assessments that will go on to be externally moderated.

Comparative judgement has the same goal as moderation: accurate assessment of student work. It is worth exploring the concept to see how comparing student responses to determine which is better can create a reliable ranking of all responses (you can find out more by searching online).
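For readers curious about the mechanics, the core idea can be sketched in a few lines. This is a minimal illustration only, not a recommended tool: it assumes teachers' judgements are recorded as (winner, loser) pairs of sample labels, and it ranks samples by the share of comparisons they won. Dedicated comparative judgement platforms use more sophisticated statistical models than a simple win rate.

```python
from collections import Counter

def rank_scripts(judgements):
    """Rank scripts by the share of pairwise comparisons they won.

    judgements: a list of (winner, loser) script labels, one pair per
    'which response is better?' decision made by a teacher.
    This is an illustrative sketch, not a real moderation tool.
    """
    wins = Counter()
    appearances = Counter()
    for winner, loser in judgements:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Win rate approximates quality when each script is compared a
    # similar number of times against a spread of other scripts.
    return sorted(appearances, key=lambda s: wins[s] / appearances[s],
                  reverse=True)

judgements = [("A", "B"), ("A", "C"), ("B", "C"),
              ("A", "D"), ("C", "D"), ("B", "D")]
print(rank_scripts(judgements))  # ['A', 'B', 'C', 'D']
```

Even this toy version shows the appeal: no teacher ever assigns a mark directly, which sidesteps some of the bias attached to marking one's own students.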

 

Remote or in-person moderation?

Moderation does not always need to involve all teachers being physically present.

Instead of gathering everyone together, ask each teacher to scan and send a sample of an assessment. Collate these samples into a single document or form so that teachers can read them in their own time and submit their marks and feedback. Following this, you can send out the results of the remote moderation.

Teachers have fed back to me that they have appreciated where I have followed up with a short table listing the comments, the range of marks, and the average mark given to each sample.

This approach has the benefit of leaving teachers in control of their time. Additionally, it means that teachers will not be unduly influenced by a discussion taking place live in the room. It also yields clearer records for teachers to refer back to.
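If the samples and marks are collected digitally, the short follow-up table of mark ranges and averages can be produced automatically. The sketch below is purely illustrative: the function name and the input format (a dictionary mapping each sample to the list of marks teachers submitted) are assumptions, not part of any particular platform.

```python
def summarise_marks(marks_by_sample):
    """Build a summary row (range and mean) for each moderated sample.

    marks_by_sample: dict mapping a sample label to the list of marks
    submitted by teachers, e.g. {"Sample 1": [14, 15, 13]}.
    Illustrative only; comments would be collated alongside this.
    """
    rows = []
    for sample, marks in marks_by_sample.items():
        rows.append({
            "sample": sample,
            "range": f"{min(marks)}-{max(marks)}",
            "mean": round(sum(marks) / len(marks), 1),
        })
    return rows

print(summarise_marks({"Sample 1": [14, 15, 13],
                       "Sample 2": [9, 12, 10]}))
```

A wide range on a sample is itself useful information: it flags exactly which piece of work most needs discussion at a follow-up meeting.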

 

Make use of AI tools in the moderation process

Many educational institutions are now using Copilot, ChatGPT or other AI tools to mark or grade student work.

While there are numerous practical and ethical considerations surrounding the use of AI in marking – not least data-protection issues surrounding the use of student information – a more appropriate use may be to use it as a tool in moderation.

As teachers, we have a professional responsibility to have a detailed knowledge of our students’ progress, so work must be reviewed by us in the first instance, but using AI as a second pair of “artificial eyes” to corroborate or critique our own marking may be a useful process.

It has been highlighted that grading using AI has the benefit of consistency (Kumar, 2023). As the technology advances, its consistency and accuracy may well surpass those of human grading.

However, AI is not presently sufficiently equipped to detect nuance in student work, so it should be used with the necessary caution, even when just using it in the moderation process.

Interestingly, Chai et al (2024) found that students prefer the idea of AI grading their assessments, so they might be interested to hear that it is being used in the moderation process (and how).

Again, we must emphasise that all student data should be anonymised before it is put into online AI tools in order to avoid falling foul of GDPR rules.
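As a purely illustrative sketch of what such anonymisation might involve, the function below (a hypothetical helper, not a complete solution) swaps known student names for neutral placeholders before any text leaves your systems. A simple roster-based substitution like this cannot catch every identifying detail, so a human check is still essential before anything is submitted to an online tool.

```python
def anonymise(text, student_names):
    """Replace known student names with neutral IDs.

    student_names: the class roster. Longer names are replaced first
    so that a name contained inside another (e.g. 'Ali' in 'Alia')
    is not partially substituted.
    NB: this naive matching misses nicknames, misspellings and other
    identifiers (form groups, addresses), so it is a first pass only.
    """
    for i, name in enumerate(sorted(student_names, key=len, reverse=True),
                             start=1):
        text = text.replace(name, f"Student {i}")
    return text

print(anonymise("Ali handed this in late.", ["Ali"]))
```

Keeping the roster-to-placeholder mapping locally also lets you re-identify the feedback afterwards without the AI tool ever seeing a name.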

 

Share the moderation process

As noted by Crisp (2017), more experienced moderators do not always feel that moderation meetings are useful for them. However, their presence is beneficial for newer moderators, so always ensure that the moderation is shared with at least one experienced member of staff.

Where possible, invite teaching assistants to observe the moderation process. Not only will this be valuable professional development for them, but it may also give a fresh perspective on the meaning of terms in mark schemes.

Experienced teachers may find that, over time, they have developed very fixed ideas about what certain terms mean in practice, e.g. “insightful”.

Naturally, students ought not to be involved in the final process of moderation – however, giving students an insight into how moderation takes place can help them understand how they are being assessed.

 

Final thoughts

Ultimately, moderation will always be a challenging area for departments. The stakes are often high, assessment criteria are so often vague, and absolute consistency is near impossible. And of course, human bias is perhaps the greatest flaw of moderation processes.

Indeed, in feeding back about moderation, teachers have noted that they “tended to be slightly harsher on students that they did not know because they had not seen the work progressing, and that they tended to defend the marks they had given to their own students” (Crisp, 2018).

Therefore, attaining a perfect system for internal moderation is impossible. However, by considering and implementing some of the ideas in this article, you may be able to facilitate moderation meetings which foster consistency in assessment by cultivating a shared understanding of criteria and helping teachers to identify the learning needs of a cohort.

At the very least, you may be able to avoid those lengthy moderation meetings which risk becoming tense affairs as they fail to reach a genuine consensus.

 

Further information & references

  • Adie, Lloyd & Beutel: Identifying discourses of moderation in higher education, Assessment & Evaluation in Higher Education (38,8), 2013.
  • Chai et al: Grading by AI makes me feel fairer? How different evaluators affect college students’ perception of fairness, Frontiers in Psychology (15), 2024.
  • Crisp: The judgement processes involved in the moderation of teacher-assessed projects, Oxford Review of Education (43,1), 2017.
  • Crisp: Insights into teacher moderation of marks on high-stakes non-examined assessments, Research Matters, UCLES, 2018: www.cambridgeassessment.org.uk/Images/476534-insights-into-teacher-moderation-of-marks-on-high-stakes-non-examined-assessments-.pdf 
  • Kumar: Faculty members' use of artificial intelligence to grade student papers: A case of implications, International Journal for Educational Integrity (19,9), 2023: https://doi.org/10.1007/s40979-023-00130-7 
  • Sadler: Assuring academic achievement standards: From moderation to calibration, Assessment in Education: Principles, policy & practice (20,1), 2013.