What really went wrong with exams 2020?

I write this in the time between the release of A Level and GCSE results and media attention is at fever pitch. What went wrong? Who’s to blame? How can we put it right? There are all kinds of questions to be answered and much better mathematicians than I can analyse the flaws in the system used, but to my mind the simple answer is ‘nothing went wrong – the system operated exactly as it was designed to operate. This is how it ALWAYS works.’ And how it always works is what’s wrong. Nothing ‘went’ wrong, it just IS wrong. All the time.

Let me explain. The reliability of the algorithm Ofqual devised (having rejected help from, you know, real experts) was roughly 60% – meaning around 60% of students would probably get the right grade and 40% wouldn’t. We’d surely not accept that? If the chances of your plane crashing were 40%, would you get on it? Of course not. But the weird thing here is that we ALWAYS accept it. Ofqual decided this was acceptable because in a normal year you would expect around 40% of young people to “underperform”. The algorithm’s reliability was roughly in line with the reliability of putting young people into an exam.

Young people underperform for a number of reasons. The room is too hot; they have a panic attack; their Dad/Grandad/Dog died (forget mitigating circumstances – as far as exams go, there’s a time limit on grief). Maybe they forgot to look on the back page of the exam paper and missed that 15-point question (looking at you, son). Maybe their girlfriend dumped them that week (looking at you, husband). Maybe it was a bad day for hayfever. All these factors conspire to ensure that around 40% of young people are disadvantaged every year – not because they weren’t capable of success or because they didn’t know the content, but because they had a bad day. What’s our response? “Them’s the breaks. Tough luck.”

Even once they’ve left the exam hall there are circumstances working against them. Ofqual’s own analysis of the 2017 and 2018 exam papers showed a 50% unreliability factor in the marking of English and History papers. But no-one changed the marks unless a child had stumped up the cash to pay for a re-mark. The fact is that the system relies on things going wrong in order to maintain the appearance of rigour. We can’t have too many passing after all – what would that say about our standards? I’m not sure, but I do know what it says about our morality.

It has taken an absurd situation to make these glaring inequalities obvious. In effect, what Ofqual unwittingly wrote into its algorithm at an individual, human level was a random allocation of a dead dog/breakup/hot day/panic attack – and this is clearly crazy. It explains why teacher grades were not ‘optimistic’ or ‘generous’ but more likely to be a real reflection of what a student could achieve and what they knew. If anything, what those teacher grades showed us was how badly we’ve been underestimating our young. Will there have been a couple of centres that pushed their luck? Probably. 40% of them? No way – not when they knew that their results would be compared to the last three years’ performance. That was the moderating control on the system – fair or not.

As a society we’ve accepted maths as truth when in fact it is as prone to error as anything else if misused. We use numbers like cataracts, clouding our view of harsh realities. “Just 4% of entries were reduced by 2 grades or more,” trumpet ministers and their messengers as a sign of success. Just 4%. Doesn’t sound like much, does it? But that’s 28,720 entries – and some poor souls will have had more than one of their exams downgraded by 2 grades. Let that sink in. 28,720. Their teachers must be really rubbish at guessing grades, right? Well, no. There are other important and mathematically incompetent glitches that almost beggar belief – Alex Weatherall’s Twitter thread showing how students ended up with Us instead of Cs is a clear example of the rampant injustice in the algorithm. You can link to it here –
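A back-of-the-envelope check on those headline figures (a sketch; the ~718,000 total is simply what the quoted numbers imply, not an official count):

```python
# Ministers' claim: "just 4%" of entries were downgraded by 2 or more grades,
# and elsewhere that 4% is said to correspond to 28,720 entries.
downgraded_two_plus = 28_720
share = 0.04

# Total number of entries implied by those two figures:
implied_total = downgraded_two_plus / share
print(f"{implied_total:,.0f} entries in total")  # 718,000 entries in total
print(f"{downgraded_two_plus:,} of them downgraded by 2+ grades")
```

A small percentage of a very large cohort is still a very large number of young people.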

These anomalies should have triggered a red flag for Ofqual, but they didn’t. They should be triggering an immediate and automatic adjustment now, without appeal. But they’re not. And here we come to the second problem we always have: a belief that the ‘system’ is infallible. The mathematician Hannah Fry, in her brilliant book “Hello World”, writes: “using algorithms as a mirror to reflect the real world isn’t always helpful, especially when the mirror is reflecting a present reality that only exists because of centuries of bias.” She argues that the two things we need to look out for when designing an algorithm that impacts directly on human life are accountability (for example, for bias) and morality (both the morality of the system itself and human concepts of morality – for example, around fairness). This algorithm failed spectacularly on both counts. But the exam system has been failing on both counts for years, and that failure stems from the centuries of bias in our system: from concepts of the ‘deserving and undeserving poor’ to flawed notions of meritocracy – a flaw that couldn’t have been more beautifully or ironically exemplified by the Harrow-educated hereditary peer Lord Bethell in one of the most ill-judged tweets of all time:

These inherent biases have led to the unfortunate, but no doubt unintended, inequalities between privately and state-educated pupils in the adjustments to results:

It’s not that Ofqual went out to deliberately benefit the private sector. It’s just that they didn’t think through how their decision not to moderate small cohorts would impact on those outcomes. They didn’t consider how centuries of bias and assumption have created a system that would impact on their mathematics. In the same way that the last Labour government didn’t think through the impact of league tables on house prices. Or in the way that successive governments haven’t thought through the impact of Ofsted/Performance Related Pay on behaviours. Not thinking through is endemic in our system – it’s not a new thing – we’re just seeing it in a new light.

If we had had modular exams, coursework and AS results in place, of course it would have been far easier to predict what the ‘real’ outcomes might be (by that I mean the outcomes that would best keep the perception of fairness intact, because as they stood they were also prey to the same biases and gaming). But we got rid of those. Why? Because another endemic problem in our society is the belief that people are out to cheat their way to the top. Teachers are out to cheat. Pupils and parents – middle-class ones – are out to cheat (working-class parents, on the other hand, are just feckless and irresponsible and their children need a firmer hand than others). With all these cheats and feckless people around, the system is designed to catch them out. It’s actually the opposite of a meritocracy. We were so obsessed with cheating that we made coursework so bureaucratic and joyless that even teachers were glad to see it go. We saw the idea of giving children second chances as ‘cheating’. “It’s unfair to those who didn’t have a bad day!” we cried. Lewis Carroll couldn’t make it up.

And now we’re suggesting that we can’t give these young people the grades they deserve – the grades they’d get on a normal, good, non heartbroken/anxiety-ridden/grief-stricken day. Because it’s not fair to the students who came before them. Let’s apply that logic to other situations shall we?

“We can’t end slavery because it’s not fair to all the slaves who didn’t get to see freedom.”

“We can’t make seat belts mandatory because it’s not fair to all those who went through the windscreen before them.”

Extreme examples I know. But not ending an injustice because it’s not fair to people who have previously suffered it is the most stupid reason I can think of for inaction. We need to give those young people their grades AND we need to use this lesson to prompt us to reform the system so that it doesn’t happen either in covert or overt form to other children. That’s one heck of a hill to climb but the view will be worth it.

What do we want? Young people who go out into the world with a sense of justice – a feeling that they had an opportunity to show what they could do (academically, socially, practically and morally) and that those achievements are celebrated? Or a system that looks the same year on year, deliberately set up to make sure that ‘enough’ children fail to make it seem robust? I know what I’d choose. What we’ve seen this week is an education system that has prioritised the system over the child. It’s been an ugly display, but frankly, I’m glad it’s out in the open and we can finally see it for what it is.

26 thoughts on “What really went wrong with exams 2020?”

  1. This year’s fiasco has revealed the flaws in our exam system. It’s time to move to graduation at 18 via multiple routes including GCSEs and A levels with moderated coursework which is part of normal classroom life as well as vocational exams, extended essays, extra-curricular activities, community engagement, work experience, sports, theatre etc. Such a graduation system would work FOR the students who, after all, are the ones who are supposed to benefit.

  2. Spot on! One of the most worrying things is the lack of acknowledgement from Government that there is anything wrong.

  3. “What we’ve seen this week is an education system that has prioritised the system over the child. It’s been an ugly display, but frankly, I’m glad it’s out in the open and we can finally see it for what it is.”

    Beautifully said.

  4. An excellent analysis of the impact of the 2020 algorithm. In your table you point out that this had the effect of increasing the difference in achievement of A grades and above between independent and other schools by between 1.1% and 4.4%. Clearly very unfair – but look at the actual difference in achievement of A grades and above between independent schools (48.6%) and secondary comprehensives (21.8%). Action this day would be nice!

  5. This article is generally a fair assessment, except for the analysis of outcomes by centre type. The problem here is that you’re looking at the absolute difference between 2019 and 2020 results, rather than the relative difference.

    Between 2019 and 2020 we have seen a general increase in the number of students being awarded grade As (as your table of data shows). If this increase in the number of students achieving As was unbiased with respect to centre type, we would likely expect the proportion of students achieving As to increase by roughly the same relative amount.

    So if we look at the change in 2019 to 2020 on a relative basis rather than absolute basis we get the following results:
    Comprehensive: 10.1% (ie 21.8 is 10.1% higher than 19.8)
    Selective: 3.3%
    Independent: 10.7%
    Sixth form: 1.5%
    Academy: 7.2%
    Other: 14.8%

    This does show quite a different picture. “Other” now shows the biggest change (although “Other” is a very small category). If we compare independent to the 2 biggest categories, comprehensive and academy, we see that comprehensive now has a very similar increase to independent, and whilst independent is still quite a bit ahead of academy, the difference is certainly less stark than your analysis done on an absolute basis.
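A quick sketch of that relative-change arithmetic (only the comprehensive baseline – 19.8% rising to 21.8% – is quoted in this comment; the other 2019 baselines are not reproduced here):

```python
def relative_change(old_pct, new_pct):
    """Percentage increase of new_pct over old_pct."""
    return (new_pct - old_pct) / old_pct * 100

# Comprehensive schools: 19.8% of entries at grade A and above in 2019,
# 21.8% in 2020 (the figures quoted above).
print(f"absolute: {21.8 - 19.8:.1f} percentage points")  # absolute: 2.0 percentage points
print(f"relative: {relative_change(19.8, 21.8):.1f}%")   # relative: 10.1%
```

The same 2-point absolute rise looks very different against a 19.8% baseline than it would against a 44% one – which is the whole point of comparing on a relative basis.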

    When you say “They didn’t consider how centuries of bias and assumption have created a system that would impact on their mathematics”, I think that’s unfair. They will almost certainly have done analysis the same as, or very similar to, what I have just done here (ie on a relative basis, not absolute), and therefore likely come to the conclusion that there are not huge, stark differences between centres.

    1. Thanks. It wasn’t my analysis – it was Ofqual’s own analysis so if they couldn’t even get that right then I have little faith in your belief that they will have considered the consequences of their formula – consequences that were foreseen by the RSS and other respected experts.

      1. OK, fair enough. In which case if this is their own analysis I agree it further adds to the charges of incompetence to be laid at Ofqual’s feet.

  6. As clear as mud! But that’s me, not you – you’ve written an excellent blog with evidence and clarity (for those clever enough). I’m afraid that I just can’t see the reason for algorithms and fiddling with results so that they look neater on a graph or in data systems. When I retired, results in my subject, art, were still mostly based on students’ coursework, and there were moves to remove the exam portion so that it was all teacher-marked and then board-moderated in situ. I always found this to be fair and the best way. Changing a grade/mark to fit an invisible pattern is abhorrent in my opinion.
    Thank you for the work you have done to help me and others understand.

    1. That is the issue. Grade inflation will mean results this year will be worthless if they do what Scotland did and increase by 14%. That is basically giving all pupils results 1.5 grades better this year. Whereas in a normal year the results are marked/moderated with the identity of the pupil and even the school hidden from the marker, this year they have had to take the school’s past record into account. That in itself can bring in some bias. However, everyone is willing to condemn the system, but no-one has come up with a better system or suggestion other than just giving every pupil inflated results. Now on to teacher-assessed grades. Teachers’ pay/promotion is based, incorrectly in my opinion, on getting results, so does anyone genuinely believe that any teacher is going to under-estimate their pupils’ work? The answer is no, and hence rampant grade inflation. It is fine having a go at Ofqual, but I am yet to see anyone come up with a system that does not involve giving out highly inflated qualifications. Scotland’s results, for example, are up 14%, which makes this year’s results about 1.5 grades higher across the board and therefore totally suspect. Now we have Labour on the Tories’ backs in England, but in Wales we have Labour supporting the system and Plaid Cymru attacking them. None of them are coming up with a solution other than to give out qualifications like confetti.

      1. No it doesn’t. They’re not worthless at all – they are the grades that they ought to have got and that reflect their capability. We have to get past this mindset. And for future employers of graduates? They don’t even look at them. The kids who were going on to apprenticeships or work? What possible harm could it do except to leave our society with fewer NEETs – especially at a time when we’re facing a huge recession. This concept of worth/grade inflation etc is one of the key reasons why this flawed system has been allowed to continue. We have to think afresh.

      2. If grades were awarded according to whether pupils met set criteria, the problem of grade inflation would be solved. Yes, the proportion of grades awarded would fluctuate from year to year (down as well as up), but if the grade criteria remained stable then cries of grade inflation would ring hollow. A driving test is a simple version – you meet the criteria for safe driving or you don’t. Now imagine a transport secretary deciding that a proportion of candidates would ‘fail’ irrespective of whether they met the criteria. The flaws in the system would immediately be apparent.

        Re teacher-assessment: there would need to be an independent moderating system in place to weed out over-generous or over-negative marking by individual teachers and ensure consistency between schools. Coursework could then form part of any exam and be a useful fall-back in the case of pupils not being able to take exams.

        There needs to be a decoupling of exam results and school accountability. Linking the two encourages dubious practices such as teachers overmarking work, too much teacher help, ‘gaming’ and off-rolling. Exams should primarily be for the benefit of young people, not politicians or schools themselves.

        Finally, we should move towards graduation at 18 via multiple routes (see my comment above).

  7. You are completely on the money! The algorithm used, the reliance on deeply flawed and biased historical data, and the ignoring of teacher predictions are at the root of the problem. What is even more remarkable, though, is the decision by Cambridge International to ignore AS results in awarding grades to students in the international sector. Despite having the luxury of externally verified data, generated through their own assessment systems, they overlooked AS grades (and the raw marks awarded at this level) and based their grading process on a procedure similar to that used in England. This is despite the fact that AS results always make up a considerable proportion of the final A Level grade. Young people around the world have been effectively dehumanised by deeply flawed mathematical systems that any social scientist would reject as unreliable. And in the case of the international sector, individual students do not have the right to appeal. Appeals can be made by schools, but only against the grades awarded to a whole cohort of students within a subject area. This effectively means that schools would have to risk appealing grades that have been correctly awarded as well as those that they wish to challenge. In any other year, parents and students would have the right to appeal individual grades, as is their basic right, but not in 2020. Not in the year when students have suffered due to Covid-19 and school closures. The effects on the mental health and wellbeing of these young people have been completely ignored by examination boards and policy makers.

  8. Excellent analysis. Thank you. This issue has actually been known since early July, when IB grades came out using the same flawed algorithm. There are over 4,500 IB students in the UK and 170,000 worldwide, a high percentage of whom saw massive, unfathomable and illogical grade drops – also losing university places and scholarships. This group are still waiting for the outcomes of appeals and public acknowledgment.

    1. It was always going to be even more complex for IB given that some schools were in lockdown, some weren’t, some had short lockdowns, some very lengthy ones. But there’s no excuse for a long appeals process – that should have been sorted months ago.

  9. Excellent analysis. Showing that the Emperor has no clothes and has been naked for quite a while is long overdue!

  10. This is a great article, thanks for sharing your insights. From an epistemological perspective, this might just go down very well with my IB students!
