
Removing flaws in how exams are marked, graded and appealed

The reliability of GCSE and A level marking and grading must be a focus for the ongoing Curriculum and Assessment Review if we are to remove flaws and unfairness from the current system, says Dennis Sherwood
Accurate grades? Ofqual told the Education Select Committee that 96% of GCSE grades and 98% of A level grades are “accurate plus or minus one grade”, implying that 4% of GCSE grades and 2% of A level grades are two or more grades wrong - Adobe Stock

The Curriculum and Assessment Review (DfE, 2024) is important. Its findings will determine educational policy for years to come.

We have to wait until the autumn to find out what those findings will be, but we have had the interim report (DfE, 2025).

As discussed in SecEd’s insightful summary, the emphasis of the report is on the curriculum, so my purpose here is to focus on some key aspects of assessment that have so far received less attention.

 

Setting the scene

I note that those who are hoping that GCSEs are to be scrapped will be disappointed, for the report does not mince its words: “We are clear that traditional examined assessment should remain the primary means of assessment across GCSEs.”

Furthermore, the authors declare that “externally set and marked exams are an important way to ensure fairness as part of our national qualification system”, fairness that underpins “the trust that stakeholders have in these qualifications”.

Although the report identifies some areas where the rougher edges of exams might be softened – such as the possibility of reducing the “overall volume of assessment” – there can be no doubt that formal exams are here to stay. And where there are formal exams, there are marks, grades, appeals and all the other attributes of “traditional examined assessment”.

The report makes few observations about “traditional examined assessment”, but perhaps looking into that more deeply is still on the to-do list, as suggested by the report’s conclusion that in the coming months “we will … conduct further analysis of assessment at key stages 1 to 4 and consider any necessary improvements”.

The extent to which assessments require “any necessary improvements” depends on their current fitness-for-purpose, and so, with that in mind, and thinking specifically of assessments relating to GCSE, AS and A level exams in England, let me ask you a question…

 

What do fair and trustworthy assessments look like?

My answer to this question identifies three features (I will add a fourth later).

  • Marking must be of high quality.
  • Grades must be reliable.
  • The appeals process must be easily accessible, and act quickly to correct any errors.

As I am about to show, in my view all three are wanting, casting doubt on their current fitness-for-purpose and making them prime candidates for “necessary improvements”.

Let me also note that none of these received mention in Ofqual’s recently published grading toolkit for schools or the accompanying video (Ofqual, 2025).

 

How good is the quality of marking?

All exam-based assessments depend on the quality of the underlying marking, and the exam boards have extensive quality control procedures to check that marking, as it happens, is “within tolerance” – meaning that the mark given by whoever is doing the marking is within a defined number of marks (the “tolerance”) of the “definitive” mark that a subject senior examiner gives to the same answer.

This recognises that there can be slight, but acceptable, differences in academic opinion, within limits.

When marking takes place, a mark that is different from the senior examiner’s “definitive” mark, but within “tolerance”, is accepted as legitimate, but if the mark is “out-of-tolerance”, the exam board intervenes accordingly.

In principle, the marks that contribute to the determination of the candidate’s grade should all be “within tolerance”. But in practice, a few “rogue marks” might get through. To protect against that, Ofqual’s rules for “reviews of marking” and “reviews of moderation” require that an exam board, following a grade challenge, must search for “marking errors”, these being marks that have slipped through the quality control net and so are “out-of-tolerance”.
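The “within tolerance” test described above can be sketched as a simple check. This is a minimal illustration, not the exam boards’ actual procedure, and the tolerance of 3 marks is an assumed figure for the example only:

```python
# Sketch of the "within tolerance" check described above.
# The tolerance value is illustrative, not an actual Ofqual figure.

def within_tolerance(examiner_mark: int, definitive_mark: int, tolerance: int) -> bool:
    """An examiner's mark is legitimate if it differs from the senior
    examiner's "definitive" mark by no more than the tolerance."""
    return abs(examiner_mark - definitive_mark) <= tolerance

# Suppose the definitive mark is 74 and the tolerance is 3 marks:
print(within_tolerance(72, 74, 3))  # True  -> accepted as legitimate
print(within_tolerance(69, 74, 3))  # False -> "out-of-tolerance": a marking error
```

Any mark from 71 to 77 inclusive would pass this check, even though only 74 is the “definitive” mark – which is the nub of the problem discussed below.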

To my mind, a fit-for-purpose assessment system would have very few marking errors. The number discovered and corrected following a challenge is therefore an important metric of the quality of the original marking and the effectiveness of an exam board’s quality control.

However, as I discussed in detail in my last SecEd article in February – Exam marking consistency: Ofqual must publish the data – the number of marking errors discovered is alarmingly high.

Do read that article for more detail, but for the summer 2024 exams in England, Ofqual’s data (Ofqual, 2024) shows that, on average, about 6 GCSE grade challenges in every 10 resulted in the discovery of a marking error, and about 8 in every 10 for AS and A level.

Summer 2024 is not exceptional. Furthermore, since a marking error can be discovered only if a grade is challenged, this raises profound questions about how many marking errors might have gone undetected in the nearly 95% of GCSE, AS and A level grades that were not challenged in 2024.

 

How reliable are grades?

In another SecEd article last year – GCSEs and A levels: Reliable to one grade either way? – I examined the reliability of grades, with particular reference to Ofqual’s acknowledgement in 2020 before the Education Select Committee that grades “are reliable to one grade either way” (UK Parliament, 2020).

This statement is validated and enhanced on page 20 of exam board OCR’s September 2024 report – Striking the Balance – where we read that: “Ofqual is often quoted as saying that GCSEs are only accurate within one grade anyway. The truth is slightly more nuanced…”

OCR’s report cites academic research quantifying the “more nuanced truth” that grades can be wrong by at least two grades. That research dates from 2010, but there is a more recent source: in 2020, Ofqual informed the same meeting of the Education Select Committee that 96% of GCSE grades and 98% of A level grades are “accurate plus or minus one grade”, implying that 4% of GCSE grades and 2% of A level grades are two or more grades wrong (UK Parliament, 2020).

In the context of more than five million GCSE grades awarded each year, 4% represents some 200,000 grades.

 

How fair is the appeals process?

The impact of poor quality marking and unreliable grades would be less severe if the appeals process were easily accessible and were to right any wrongs quickly.

To many students, the fee is a barrier to access. As regards righting the wrongs, the current appeals process has a fatal flaw, in place since 2016 when the appeals process was changed to allow a script to be re-marked only if a “marking error” is discovered.

Despite being portrayed by Ofqual as “fair” (Ofqual, 2016), the “marking error” test is anything but.

The unfairness originates in the fact, as already noted, that any mark within “tolerance” of the subject senior examiner’s “definitive” mark, is legitimate.

As a consequence, the total mark for a script might be, say, 72/160, whereas the corresponding “definitive” total mark might be, say, 74/160. If grade 3 corresponds to “all marks from 70 to 79 inclusive”, then the candidate’s certificate shows a grade 3 regardless. But if grade 3 is specified as from 64 to 73 and grade 4 from 74 to 83, then the candidate has lost out on a grade 4.

To be clear, if this candidate requests a review of marking, no marking errors will be discovered, for there are none – all the marks are within “tolerance” (I expand on this issue in a 2023 SecEd article Can GCSE and A level exam grades be trusted?)

The originally awarded “non-definitive” grade 3 is therefore confirmed, even though the candidate would have been awarded the “definitive” grade 4 had the script been marked by a senior examiner, or any other examiner who happened to think like a senior examiner.
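The boundary effect in this example can be made concrete with the article’s own numbers (grade 3 from 64 to 73, grade 4 from 74 to 83 – illustrative boundaries, not real ones):

```python
# Illustration of the boundary effect described above, using the
# article's example numbers. Boundaries are illustrative only.

def grade(total_mark: int) -> int:
    """Map a total mark to a grade under the illustrative boundaries:
    grade 3 for 64-73, grade 4 for 74-83."""
    if 64 <= total_mark <= 73:
        return 3
    if 74 <= total_mark <= 83:
        return 4
    raise ValueError("mark outside the illustrated range")

awarded = grade(72)       # the total mark actually given
definitive = grade(74)    # the senior examiner's "definitive" total
print(awarded, definitive)  # 3 4 -> within tolerance, yet a grade apart
```

A two-mark difference that the quality control process treats as legitimate is enough, at the boundary, to cost the candidate a grade.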

I believe this to be deeply unfair, and certainly “requiring improvement”.

 

The fourth feature of a fair system

This leads to my fourth feature defining what a fair and trustworthy assessment should look like: The assessment should do no harm.

But to my mind the current system does, especially when you consider the high stakes of our assessment system and the impact on our students’ wellbeing – as the interim report (DfE, 2025) does.

That’s all very concerning, yet there is a further instance of the infliction of irreversible harm.

Take my example of the script marked 72/160, resulting in a grade 3. If this happens to be for GCSE English, then because of that grade 3 – which should really have been a grade 4 – the candidate is forced to re-sit, may be denied progression opportunities, and is branded a failure.

If this were a rare event, we might accept it. However, according to data from the Joint Council for Qualifications (JCQ, 2024), for the summer 2024 exams in England, a total of 782,022 certificates were awarded for GCSE English, of which 22.4% were grade 3 (about 175,000 students).

In November 2018, Ofqual published a report containing a chart showing “the probability of being awarded the ‘definitive’ grade”. We are not given exact numbers, but you can determine that this probability for English language is about 61%. That’s an aggregate over GCSE, AS and A level, but it’s the best number available and it implies that, for 61% of those 175,000 candidates awarded certificates showing grade 3, that grade 3 is “definitive”. But for the remaining 39% – that’s about 68,000 students – grade 3 is “non-definitive”. This implies that their “definitive” grade – the grade they truly merit – is something other than grade 3.

There is no information as to what that true “definitive” grade might have been for these students. Perhaps it’s lower, perhaps it’s higher.

I think it is fair to assume that a good number (probably about one-half) of those 68,000 should have had grade 4 – with the sting-in-the-tail that these errors cannot be corrected under the current rules for reviews of marking.

That is just one example of the harm done by the unreliability of grades and the unfairness of the appeals process in the current system.

 

Suggestions for ‘necessary improvements’

Let me take this opportunity to table some ideas for “necessary improvements” that I think would deliver significant benefit. First, two immediate changes to the appeals process:

  1. To refund the fee if a mark (not a grade) is changed, so giving the exam boards an incentive to improve the quality of marking.
  2. To scrap the requirement that a script can be re-marked only following the discovery of a “marking error”, and to re-instate the rule that a “review of marking” triggers a re-mark by a subject senior examiner. This would remove a great injustice, ensuring that a challenge always results in the identification of the “definitive” grade, albeit only for those who request a “review of marking”.

Second, as regards the reliability of grades, the simplest solution is for every exam certificate to include a disclaimer: “Ofqual warning: Unless the grade shown is the result of a re-mark by a subject senior examiner, the grades on this certificate are reliable, at best, only to one grade either way.”

It may seem a harsh proposal, but this is the truth of our system – a truth that all users of a certificate should know.

There are also many ways to deliver assessments that approach 100% reliability as closely as we might like, where the measure of “reliability” is the probability that the originally awarded assessment will be confirmed, and not changed, following a fair re-mark either by a subject senior examiner or by any other qualified examiner.

Some of these are summarised in my blog for the Higher Education Policy Institute (Sherwood, 2019) or in the latter chapters of my book Missing the Mark (2022).

I hope I have demonstrated that there is an urgent need for implementing “necessary improvements” to the current GCSE, AS and A level assessment system. I await the final report of the Curriculum and Assessment Review with great interest…

 

Further information and resources