August is traditionally the silly season for newspapers, and there are a couple of educational stories they regularly treat us to. Firstly, there’s the grand announcement of the results, which gives columnists the perfect win-win situation. If the number of successful students goes down, then clearly educational standards are falling. And when the numbers go up – well, it’s obvious the exams are easier and hence standards must be falling.
It’s not just the writers who get busy; the photographers have an even better time. It’s truly remarkable just how many successful students are amazingly attractive young women. Judging by the photographs, only about 1% of successful candidates are boys or less attractive females.
There’s another regular August story, and this one’s more complicated and more serious as well. Each year the boundary points at which levels are awarded are liable to change slightly. So a mark of 75 may receive a different level this year from the one awarded last year. This can cause immense pain to teachers and pupils, and is liable to be interpreted by columnists as political manipulation, and by teachers as “arbitrary” or “capricious”. Now at the Key Stage 2 / 11-year-old level this is an area I know something about, and I presume the same principles apply elsewhere.
The basic factor is that it’s relatively easy to set examinations that test the syllabus or programme of study in a valid way. However, it’s quite impossible to set two examinations on the same syllabus on which pupils will perform identically, and here’s why. We’ll simplify our syllabus so that it consists of a single element: children must master their multiplication facts. On a certain date in May every child will be tested, and to make it fair they’ll all be tested on the same statement. Let’s say we ask them 6×9. Their papers are sent off for marking, the results are analysed, and in August we find that perhaps 85% of children have been successful.
But what do you do next May? You could ask next year’s pupils 6×9 again, and perhaps 93% of children are successful. But in the meantime parents have been practising their children on 6×9; publishers have been bringing out 6×9 games, worksheets, and practice cards; schools have been putting on 6×9 practice sessions; and a thousand YouTube channels show 6×9 rhymes and raps.
So next May you decide to ask 7×8 instead, and this time 87% pass. But we don’t have much idea why this is. Is it because all the extra practice has meant children know their tables better this year? Teachers will claim the higher pass rate is down to their skill and commitment, while the government will claim a triumph for their enlightened policies. But perhaps the whole cohort is of slightly different ability. And it’s certainly true that 7×8 won’t present exactly the same level of difficulty as 6×9 to every child – some may find it easier to remember, and others harder.
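The confound can be made concrete with a toy model. The sketch below (in Python, with entirely made-up numbers) uses a simple logistic rule, loosely in the spirit of Rasch-style item response theory, where the chance of success depends only on the gap between cohort ability and question difficulty. None of this is how boards actually model questions; it just shows why a pass rate on its own can’t separate “stronger cohort” from “easier question”.

```python
import math

def expected_pass_rate(cohort_ability, item_difficulty):
    """Expected proportion of correct answers under a toy logistic model:
    success depends only on the gap between cohort ability and question
    difficulty. Both parameters are in arbitrary illustrative units."""
    return 1 / (1 + math.exp(-(cohort_ability - item_difficulty)))

# A stronger cohort facing a harder question...
a = expected_pass_rate(cohort_ability=2.0, item_difficulty=0.1)
# ...and a weaker cohort facing an easier question...
b = expected_pass_rate(cohort_ability=1.6, item_difficulty=-0.3)

# ...give exactly the same pass rate (about 87%), because only the gap
# matters -- so the rate alone cannot tell us which story is true.
print(f"{a:.3f} {b:.3f}")  # prints "0.870 0.870"
```

The point of the example is simply that two quite different explanations are observationally identical at the level of a single pass rate, which is why the extra evidence described below is needed.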
And of course in the real exam there’s not just one question but dozens, sampling dozens of syllabus skills. So while we can be pretty sure we’re setting an exam that is fair and valid, we really can’t say that a mark of 60% indicates the same level of performance as it did last year.
So just how do we ensure that a grade from last year is comparable to the same grade this year? This is vitally important and it’s a hugely sensitive issue. It’s also fiendishly difficult, and the boards use every method they can think of; many aren’t particularly watertight in themselves, but they do offer pointers. There may be an anchor test, taken by a random selection of the age group; the anchor test stays the same year after year, so it gives an indication of how each cohort compares to the last. Another process is that some of last year’s candidates sat this year’s paper immediately before they sat their own test, so we can reason that their performance on the two tests will be similar. Of course the statisticians will be at work as well. There are likely to be other processes involved that I don’t know about, but one I have experienced is where a panel of the most expert authorities sits down, examines a selection of papers at this year’s borderlines, and compares them with borderline papers from previous years.
All of these are partial indicators only, and they all have disadvantages, but when they’ve all been taken into account it may be necessary to decide that one or more of last year’s boundaries should be adjusted by a point or two. I’m as sceptical about politicians as you are, but I’ve been given every assurance that this judgment is made on educational grounds and nothing else whatsoever. You can at least be certain that no examination board ever adjusts boundaries without a huge amount of thought and effort, and you can be 100% sure this is never done in an “arbitrary” or “capricious” manner.