Top aggregated marks for Tiffin Boys

Eleven Plus (11+) in Surrey (Sutton, Kingston and Wandsworth)


SunlampVexesEel
Posts: 1245
Joined: Fri Jul 06, 2007 9:31 pm

Post by SunlampVexesEel »

Have a look at...

http://www.nfer.ac.uk/research-areas/as ... sation.cfm
An important consequence of this is that, in whatever month pupils were born, roughly the same proportion will achieve the specified pass mark. This is because pupils are, in effect, only being compared with other pupils of the same age as themselves.
In terms of standardisation, the assumption is that the distribution of scores is normal - see the humpy (bell-shaped) diagram. The standard deviation reflects how peaked or spread out the diagram is. If the mean standardised score is 100 and the standard deviation is 15, then approximately 34% of the population will have a score in the range 100 to 115.
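That 34% figure can be checked directly from the normal cumulative distribution function. A quick sketch, assuming the mean of 100 and standard deviation of 15 quoted above:

```python
import math

def normal_cdf(x, mean=100.0, sd=15.0):
    """Cumulative probability of a Normal(mean, sd) distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Proportion of candidates expected to score between 100 and 115,
# i.e. within one standard deviation above the mean.
p = normal_cdf(115) - normal_cdf(100)
print(round(p, 4))  # ≈ 0.3413, i.e. roughly 34%
```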

http://apm.sagepub.com/cgi/reprint/14/4/387.pdf gives detail of a fitting approach to age standardisation; it is from an NFER person.

If you look at the diagram in Figure 4 you can see that in the sample data the raw scores of the younger group are worse; NFER's conclusion is that this is because they are younger rather than less smart. If no adjustment were made it would not be possible for a very young candidate to score highly.

In fact if you look at the 50th percentile you can see the young ones were scoring 10 raw points less than the old ones... but again, the NFER assumption is that this is because they are younger.

Therefore to remove the bias the younger ones are given an adjusted mark. It is not an advantage; it is the removal of a disadvantage.

Regards
SVE
Animis opibusque parati
oesmugso
Posts: 39
Joined: Tue Mar 04, 2008 4:37 pm

RE: Standardisation

Post by oesmugso »

Thanks SVE for addressing the issue I had raised. The problem was in how I interpreted what you were stating: I understood it as meaning that no adjustment is made. I happen to agree with the idea in general, as it's somewhat intuitive. Whether anyone thinks it IS an advantage to be in the younger group or the older group would depend on their evaluation of the number of marks affected this way and on the merits of the arguments for and against. Anyway, we are agreed now.

I only just discovered the edit option; so it is possible to amend the posted contents. Therefore I'll now spew out in the following my meanderings on the process & why, which I referred to earlier. I had already found the link you supplied and used that for much of what I will say. It is very long and you need to be interested in the subject to want to read it. So go ahead everyone & anyone - shoot me down or add any salient points you can think of. I can then edit the post to reflect whatever useful info is given and correct errors as we go along. SVE, I certainly hope you're game! Presentation I'm sure will take a battering.

Without warning, I will edit it every time I spot an error of spelling or grammar or to make it more readable.
oesmugso
Posts: 39
Joined: Tue Mar 04, 2008 4:37 pm

RE: Standardisation Process - What is & why

Post by oesmugso »

Sutton Grammar - "... allowance is included in their mark to reflect their younger age. ... The allowance is NOT fixed in advance but is worked out by the NFER by comparing the raw marks of ALL the children who have taken the tests, according to the month of their birthday."

Tiffin : "The raw scores have been standardised to take account of his age .."

Some may find the following references explain everything whilst others may not:
http://www.nfer.ac.uk/research-areas/as ... sation.cfm
http://www.nfer.ac.uk/nfer/index.cfm?25 ... 891278C934


STANDARDISATION
=============
Why do they do this? In Tiffin's case we know they set 2 papers, VR & NVR. The theory is (and it's intuitive that this is right) that it's not correct simply to combine scores across multiple papers which may have varying degrees of difficulty and different numbers of questions, and leave matters there. For example, 80% may be the top raw mark in NVR whilst 98% may be the top raw mark in VR; a raw score of 79% in NVR then ought to reflect much better ability than the same score in VR. After standardisation, raw marks of 80% in NVR and 80% in VR will most likely end up with a much better combined standardised mark than raw marks of 70% in NVR and 90% in VR, even though the average marks are the same.
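A minimal sketch of that effect, using made-up cohort marks (the numbers below are purely illustrative, not real test data): if each paper is standardised against its own cohort before the scores are summed, a candidate who holds up on the harder, more tightly-clustered paper comes out ahead of one with the same raw average.

```python
import statistics

def std_score(x, cohort_marks):
    """Standardised score: 100 + 15 * (raw - cohort mean) / cohort sd."""
    mean = statistics.mean(cohort_marks)
    sd = statistics.pstdev(cohort_marks)  # population standard deviation
    return 100 + 15 * (x - mean) / sd

# Hypothetical cohorts: NVR is harder, so marks are lower and clustered;
# VR is easier, so marks are higher and more spread out.
nvr = [80, 79, 75, 70, 68, 65, 60, 55, 50, 48]
vr  = [98, 95, 90, 90, 85, 80, 70, 60, 50, 42]

# Candidate A: 80 in both papers. Candidate B: 70 NVR, 90 VR.
# Both have the same raw average (80), but A wins after standardisation.
a = std_score(80, nvr) + std_score(80, vr)
b = std_score(70, nvr) + std_score(90, vr)
print(round(a, 1), round(b, 1), a > b)
```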

It is commonly accepted that a very large number of traits in the world are distributed "Normally". This sounds vague, so it is probably better to use the alternative name of Gauss, who first introduced the idea, but the "Normal" distribution describes exactly what the pattern amongst the population is EXPECTED to be. The distribution of the number of boys on given scores is thought to be "Normally" distributed - so we then force the scores to be, by adopting this idea. The Normal distribution is shown by the bell-shaped graph that most of us have seen at some time in our lives. [My take is that when the raw scores are compiled we'll perhaps not have a Normal distribution, and certainly not one matching that being used. The assumption is almost as if to say, "Of course the scores are distributed normally; the reasons they are not must be down to an imperfection in the exams or the marking, and we're going to adjust all the scores so that we do have the shape we were expecting!"] Worth noting that for 2 Normal distributions to be the same, the means (= averages) and standard deviations must match.

STANDARD DEVIATION
===============
Try these ideas -
It's a measure of how close to the average score you can expect the mark achieved to be for a randomly selected entrant to the exam.
OR
The lower the sd, the more entrants' scores are clustered around the average score, as opposed to a higher sd, which means the scores are more spread/dispersed across the spectrum of possible scores.
OR
Another way of looking at it is to say that a low sd means the average mark is a good representation of what a student might have actually achieved.
OR
The bell shape is thinner in the middle (low sd) or fatter (bigger sd).

A quick example:
(a) 6 boys score 40%, 1 scores 60%, 3 boys score 100%; average is 600/10 = 60 --- [sd ≈ 26.8]

Whereas
(b) Scores of 55%, 60%, 65%, 62%, 58%, 61%, 57%, 60%, 57%, 65% also yield an average of 60% --- [sd ≈ 3.2]

The simple average really does not tell the story.
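The two sets can be checked directly with Python's statistics module (using the population standard deviation, i.e. dividing by the number of entrants):

```python
import statistics

a = [40] * 6 + [60] + [100] * 3                # set (a)
b = [55, 60, 65, 62, 58, 61, 57, 60, 57, 65]   # set (b)

# Same average, very different spread.
print(statistics.mean(a), round(statistics.pstdev(a), 1))  # 60 26.8
print(statistics.mean(b), round(statistics.pstdev(b), 1))  # 60 3.2
```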

THE PROCESS
=========
One of the bodies who do this, e.g. NFER, in exercises involving educational attainment usually use the Normal distribution given by an SD of 15 and an average of 100 (they think we find it easier to work with 100!). These settings also dictate a fixed range for the standardised scores: the minimum possible raw mark of 0 is mapped to 70, and the maximum score (whatever it is) is mapped to 140. So with all this in place, we have a predefined maximum, minimum, average and spread. All of this is assumed every year as far as we know, until someone notices a different trend I guess.

Then using the raw marks, the average and standard deviation are calculated. The first is simply the sum of all scores divided by the number of entrants sitting. The second is the square root of (the sum of the squares of each mark's difference from the average, divided by the number of entrants). If we plot the real raw marks against the number of entrants we ought to get something that looks like a Normal distribution. We might not! Regardless, we then map points on our real graph to points on the theoretical god/NFER-given graph. The act is as if we were to pick up our graph and pull/push it to be superimposed. The two end points are easy enough to imagine. The middle point - the average score - again is easy. The rest of the points are translated into standardised scores using the formula which NFER have provided (lucky for me), and lucky for all of us, I'm not going to attempt to derive it here:

y = 100 + 15(x - A)/s, where y is the standardised score, s is the real standard deviation of the actual results, A is the average raw score and x is the actual raw mark. (Mainly I've rewritten what was supplied to make it match the X & Y axes, and added brackets for clarity.)

This is actually an approximation to what really happens. NFER: "..the age-standardised scores are calculated in a much more statistically complex way, although the effect is similar to computing sets of scores using the above equation for pupils of the same age (to the nearest month)."
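As a sketch of the simple equation above (not NFER's "more statistically complex" real procedure), with the 70-140 truncation mentioned earlier applied, and with made-up cohort figures for the average and standard deviation:

```python
def standardised_score(x, avg, sd):
    """y = 100 + 15*(x - avg)/sd, truncated to the 70-140 range described above."""
    y = 100 + 15 * (x - avg) / sd
    return max(70, min(140, round(y)))

# Hypothetical cohort: average raw mark 55, standard deviation 12.
print(standardised_score(55, 55, 12))  # 100: scoring the average gives 100
print(standardised_score(79, 55, 12))  # 130: 24 marks above average, * 15/12
print(standardised_score(0, 55, 12))   # 70: the floor, even for a blank paper
```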

Repercussions:
(i) If you scored the average mark your standardised score is 100.

(ii) If the real standard deviation is also 15, the standardised score y = 100 + (Raw Mark - Average Raw Mark), i.e. 100 plus how much he beat the average by. If this were the case, the standardisation process would only introduce a straight shift of (100 - average raw mark) for every entrant.

(iii) For those who like looking at the ridiculous: if every score were the average, s would be 0 and the formula would give 0/0, which we need to take as 0 (so everyone gets 100).

(iv) Rewriting the expression as (15x/s) - (15A/s) + 100, we can see that the second term is fixed for everyone in the standardisation process, as is the 100. The only variable is 15 times his own raw mark divided by the real standard deviation. So if we set the first term aside, the process is nothing more than adding or subtracting a fixed number to the raw mark. What's interesting is the first term, i.e. (15 x the mark / standard deviation).

(v) If the boy only puts his name down OR gets everything wrong, he'll still get 70.

(vi) As there are 12 groups - I think the grouping is by the date of the exam rather than strict calendar months - the results will show at least 12 boys with scores of 140 for each of the subjects.

Let's examine the effect of having different std. deviations:

Std Dev of 10 (the real test produced marks more tightly clustered than the normal distribution expectation) : for every mark more than someone else you get 1.5 std. marks

Std Dev of 12.5 (not as tightly as the first one but still bunched up): for every mark more than someone else you get 15/12.5 = 1.2 std. marks.

Std Dev of 15 - we've already seen that the adjustment is the same for all entrants - a fixed number.

Std Dev of 17.5 - the marks are a bit better spread now than Normal, so every one raw mark advantage you had is reduced to 6/7 of a std. mark.

Std Dev of 20 - marks even more widely spread, so that every one raw mark advantage is only worth 3/4 of a std. mark.

So now we can see that the effect of standardisation is to widen the difference (in absolute numbers not percentage wise) when the results are closely clustered and narrow the difference when the results are spread more evenly. The process emphasises what position you achieved in each paper rather than the score achieved.
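The multipliers in the list above can be generated directly: the standardised worth of one raw mark is just 15 divided by the cohort's standard deviation.

```python
# How much one extra raw mark is worth in standardised marks: 15/sd.
# Below the Normal SD of 15 each raw mark counts for more than one
# standardised mark; above it, for less.
worth = {sd: 15 / sd for sd in (10, 12.5, 15, 17.5, 20)}
for sd, w in worth.items():
    print(f"sd={sd}: one raw mark is worth {w:.3f} standardised marks")
```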
WP
Posts: 1331
Joined: Thu Jan 03, 2008 9:26 am
Location: Watford, Herts

Post by WP »

SunlampVexesEel wrote:Have a look at...

http://www.nfer.ac.uk/research-areas/as ... sation.cfm
An important consequence of this is that, in whatever month pupils were born, roughly the same proportion will achieve the specified pass mark. This is because pupils are, in effect, only being compared with other pupils of the same age as themselves.
In terms of standardisation; the assumption is the distribution is normal see the humpy diagram. The standard deviation reflects the humpiness of the diagram. If the mean standardised score is 100 then approx 34% of the population will have a score in the range of 100 to 115.

http://apm.sagepub.com/cgi/reprint/14/4/387.pdf gives detail of a fitting approach to age standardisation; it is from an NFER person.
Not arguing with the principle of age standardization, but quibbling about details: we still don't know how they do it, and there may be some odd effects. The NFER page says they use a complex procedure that approximates the bucketing procedure you described. The Schagen article describes four variants; it says the last one is rarely used, but we don't know which of the other three is used. And each of them produces different artifacts.

Figures 3, 4 and 5 show the three methods fitted to the same collection of test scores. The first one shows a clear ceiling effect: because the top candidates obtain almost perfect scores regardless of age, the older ones are not ahead of the younger ones by the same margin as in the mid-range scores. Because this model uses the same age adjustment across the ability range, these older children cannot reach the highest standardized scores. There's a similar effect at the bottom; almost everyone gets some questions right. (Of course it could be argued that this is unimportant: the test is used to select the top 25%, so the results at the extremes don't matter.)

The other methods show reduced ceiling effects, but they still assume that there's a smooth relationship between age and score, and in this data there isn't. For some reason, children aged 10 years and 9 months have almost invariably done substantially better than children one month older, so they will get higher standardized scores than if they were treated as an independent bucket. Children are competing not just against those of the same age, but those across the age range in a complex way.
SunlampVexesEel
Posts: 1245
Joined: Fri Jul 06, 2007 9:31 pm

Post by SunlampVexesEel »

WP wrote:Children are competing not just against those of the same age, but those across the age range in a complex way.
I have to agree with you to some extent there, in that the adjustment for each bucket isn't actually independent but is fitted using one of the methods described; the fit, though, is very close to what it would be if the interpolation were not used.

What is very clear from the diagram is that, as you pointed out, the top performers tend to perform well regardless of age (and therefore receive less adjustment) but those nearer the middle perform very differently by age. But in the case of Tiffins only the top are being selected; the age effect is in reality even more limited.

Regards
SVE
Animis opibusque parati
WP
Posts: 1331
Joined: Thu Jan 03, 2008 9:26 am
Location: Watford, Herts

Post by WP »

SunlampVexesEel wrote:What is very clear from the diagram is that, as you pointed out, the top performers tend to perform well regardless of age (and therefore receive less adjustment) but those nearer the middle perform very differently by age. But in the case of Tiffins only the top are being selected; the age effect is in reality even more limited.
That means that for a super-selective like Tiffin it is particularly important which method is used, because for two of the three methods (the parallel ones) the age adjustment used for the high-scorers is determined by the age effect on the whole population. They could also mitigate the ceiling effect by using much harder tests, so that the more able are more spread. (I don't know whether they do.)
oesmugso
Posts: 39
Joined: Tue Mar 04, 2008 4:37 pm

Post by oesmugso »

It looks like WP & SVE are at the scary end of all this stuff. If I may pose a question please:

My son and 3 boys from his school have admitted to guessing 10 or more questions out of 80 in the NVR paper which each year (in recent times) boys are reporting as very hard. Two passed with over 240 and 2 are reasonably high in the waiting list. If you make the tests harder still, my thinking is the impact of good guessers v bad guessers comes into play even more. How wrong am I in saying, "I can't believe you do all this sophisticated maths on smoothing things out and allow the exam to become a lottery"? Perhaps more questions in the same time with wrong answers penalised? Hmm, that doesn't sound too fair either!!