Standardisation

Discussion of the 11 Plus


sycamore
Posts: 686
Joined: Mon Oct 06, 2008 2:41 pm
Location: South Wilts

Post by sycamore »

mattsurf wrote:I had assumed that NFER consider the whole population who take the 11+; if in one region only 700 take the test then the sample size per birth month is less than 60 - this is far too small a sample to standardise with any degree of accuracy. If we consider every child who takes the NFER test (maybe 25,000 kids - this is a total guess), then there are about 2,000 children per birth month and the sample is statistically valid to better than 97%
But that would only work if the same test were being used in more than one area. Does anyone know if this is the case?

In Wilts, where fewer than 1000 children sit the test, a 3% sampling error is a bit of a worry when the pass mark is so close to the average (317/420).
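
To put rough numbers on that worry, here is a minimal Python sketch (the 15-point standard deviation is assumed purely for illustration; it is not an NFER figure) showing how the standard error of a group's mean score shrinks as the group grows:

Code:

import math

# Standard error of a group's mean score: sd / sqrt(n).
def standard_error(sd, n):
    return sd / math.sqrt(n)

SD = 15  # assumed spread of raw scores - an illustrative figure only
for n in (60, 700, 2000):  # group sizes mentioned in this thread
    print(f"n = {n:>4}: standard error of the mean = {standard_error(SD, n):.2f} points")

With only 60 children per month, a month group's mean can drift by a couple of points from year to year, which matters when the pass mark sits close to the average.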

Maybe that's why they have a HT review? :?
mike1880
Posts: 2563
Joined: Sat Sep 27, 2008 10:51 pm

Post by mike1880 »

Different exams are standardised against different groups. Some are standardised against the candidates themselves, others are standardised against control groups.

Mike
Looking for help
Posts: 3767
Joined: Thu Dec 18, 2008 11:12 am
Location: Berkshire

Post by Looking for help »

mike1880 wrote:Different exams are standardised against different groups. Some are standardised against the candidates themselves, others are standardised against control groups.

Mike
Hi Mike,
I am very interested - can you explain further?
I thought standardisation was simply to take out any advantage an older child would have over a younger child?
So if you have a very bright cohort, or one where the majority had intensive tuition, then the results would not necessarily be indicative only of GS ability? Am I correct in this :?:
Thanks
LFH
Charlotte67
Posts: 893
Joined: Thu Oct 11, 2007 8:59 am
Location: Cloud 9

Post by Charlotte67 »

sycamore wrote: In Wilts where fewer than 1000 children sit the test a 3% sampling error is a bit of a worry when the pass mark is so close to the average (317/420).
Ah, but the point is that the children are only measured against others born in the same month, ie about 60 children (not 700). According to mattsurf's figures, the risk of sampling errors has got to be much higher than 10%! As mattsurf says,
if in one region only 700 take the test then the sample size per birth month is less than 60 - this is far too small a sample to standardise with any degree of accuracy.
I felt sufficiently nervous about this (last year) to contact NfER and ask the question - they insisted that no external data is used. Hang on a mo and I'll hunt down their reply...
KenR
Posts: 1506
Joined: Fri Mar 17, 2006 6:12 pm
Location: Birmingham

Post by KenR »

You might be interested in a post I put on the B/Ham forum a while ago about standardisation. It explains why the maximum standardised score can be higher than 141 in some cases:-
OK, I've done some further research and found some interesting information - it seems that although the maximum scores for normal 11+ and CAT tests are 140/141, occasionally the maximum can be higher for other types of test. It looks as though this comes down to the distributional characteristics of more difficult tests, such as the Univ of Durham CEM tests.

If you take a look at the following DCSF Research Paper:-

http://www.dcsf.gov.uk/research/data/up ... /RR665.pdf

In particular, take a look at Table 32 on pages 55-56: you will see that it gives the statistics for various types of CAT and MidYIS tests, including the min and max scores, and that whereas the CAT max scores are all 141, some of the MidYIS scores go up to 163, 165 or 168. There are also some charts at the back of the paper that show the distributions.

This is explained in the paper as follows:-

Quote:

Closer analysis of KS2 matched to CAT/MidYIS scores

This analysis was based on KS2 results matched to either CAT or MidYIS data for 84374 pupils assessed at KS2 in 2001/02 and 2002/03 and assessed by CAT3 or MidYIS in Year 7 the following September. This data was also matched to PLASC data from 2002 to 2004.

CAT3 data was drawn from 5 LEAs and MidYIS data from 2 LEAs. In total, the sample contained matched KS2 to CAT records for 44668 pupils and matched KS2 to MidYIS records for 39710 pupils.

Both CAT and MidYIS tests varied in central tendency and dispersion. Standardised age scores (SAS) achieved in MidYIS tests tended to be over 100; whilst standardised age scores on CAT tests were closer to 100. This was likely to be due to variations in the ability of pupils tested: KS2 results achieved by the MidYIS cohort tended to be higher. Whilst scores achieved in the three component tests of the CAT battery ranged from 69 to 141, standardised age scores achieved in component MidYIS tests varied in range, with less dispersion noted in vocabulary scores. The table below summarises these findings:


So in summary, some of the KE Foundation papers are so difficult that you get a much lower mean and a very wide distribution of scores, and in those cases the maximum age-standardised score can be higher. I have to say that I didn't know that - I stand corrected!
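
As a rough illustration of that point (all figures invented, and the usual mean-100/sd-15 scaling assumed), a harder paper with a lower mean and a wider spread maps its top raw mark to a higher standardised age score:

Code:

# Standardised age score on the usual mean-100, sd-15 scale.
def sas(raw, mean, sd):
    return 100 + 15 * (raw - mean) / sd

# Invented figures: an easier test with a high mean and a narrow spread,
# and a harder test with a low mean and a wide spread.
print(sas(raw=78, mean=60, sd=10))  # 127.0 - comfortably inside a 141 cap
print(sas(raw=75, mean=35, sd=12))  # 150.0 - the top of a hard paper can pass 141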
Regarding the standardisation process, this can be done in two ways:-

(1) Against the candidate cohort in the particular exam - this is the normal mechanism for most bespoke 11+ exams.
(2) Against a statistical national sample, by the exam compiler. This is sometimes done where the 11+ exam is a packaged product, such as the Moray House tests that were used in Warwickshire until a couple of years ago.

A characteristic of the latter standardisation process is that individual candidates' scores tend to be higher, as only pupils who think they have a realistic chance of passing the test are entered. Often many pupils get scores close to 140 or 141 in this scenario.
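
A minimal sketch of the difference between the two mechanisms, with invented scores and an assumed national norm of mean 45 and sd 12:

Code:

from statistics import mean, stdev

cohort = [52, 58, 61, 63, 67, 70, 74]  # invented raw scores for an able, self-selected cohort

# (1) Standardised within the cohort itself: the cohort mean is forced to 100.
m, s = mean(cohort), stdev(cohort)
within = [100 + 15 * (x - m) / s for x in cohort]

# (2) Standardised against an assumed national norm (mean 45, sd 12):
# the same able cohort now averages well above 100.
national = [100 + 15 * (x - 45) / 12 for x in cohort]

print(round(mean(within)), round(mean(national)))  # 100 vs roughly 123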

I should just add one caveat - the Buckingham 11+ is different, in that the average score post-standardisation is about 111 rather than 100.

Hope this helps
Charlotte67
Posts: 893
Joined: Thu Oct 11, 2007 8:59 am
Location: Cloud 9

Post by Charlotte67 »

Found it!

As you can see it doesn't answer the exact questions being asked here, but it does give some insight into the workings of NfER. Now that I know more, my question appears rather naive :oops:
Janet from NfER wrote:1) Regarding age standardisation: Because each exam is individually standardised, does this mean that you have a roughly equal number of children from each month "passing"? If so, this does not seem fair on children born in, say, April, who would normally be successful but in one particular year have very strong competition. Is no historical data used at all?

The standardisation is based only on the group of children that take the paper. As the age standardisation is based on when the children were born, to the nearest completed month, all candidates have the same chance of obtaining a place if they score above the pass mark and there is a place available for them. For example, if a school has 120 places they will select up to 120 children who are at the top of the scale being used. Historical data is not meaningful in this context, since new tests are specially constructed each year.
I really don't think she answered my question - but then I probably didn't explain myself well enough (no change there then :roll: )
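
Janet's description boils down to something like the following sketch (the data layout and function names are invented, and real standardisation tables are more sophisticated than a raw z-score): each child is compared only with children born in the same completed month, and the school then takes the top 120 standardised scores.

Code:

from collections import defaultdict
from statistics import mean, stdev

# children: a list of (birth_month, raw_score) pairs - invented layout.
# Assumes at least two children per month so the spread is defined.
def standardise_by_month(children):
    by_month = defaultdict(list)
    for month, raw in children:
        by_month[month].append(raw)
    scored = []
    for month, raw in children:
        group = by_month[month]
        z = (raw - mean(group)) / stdev(group)  # compared only within the month
        scored.append((month, raw, 100 + 15 * z))
    return scored

def offer_places(children, places=120):
    # e.g. a school with 120 places takes the top 120 standardised scores
    scored = standardise_by_month(children)
    return sorted(scored, key=lambda c: c[2], reverse=True)[:places]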
Reader
Posts: 76
Joined: Tue Oct 07, 2008 10:03 am

Post by Reader »

I think she is saying that the marks are generally standardised over the whole group e.g. the younger ones will have extra marks. I don't think that any child actually has marks taken away. I also don't think that children are just compared with those in their 'month group' - it is an overall scale simply to redress the balance of less learning time/maturity at the younger end.
mattsurf
Posts: 230
Joined: Mon Apr 28, 2008 11:44 am

Post by mattsurf »

Charlotte67 wrote:Found it!

As you can see it doesn't answer the exact questions being asked here, but it does give some insight into the workings of NfER. Now that I know more, my question appears rather naive :oops:
Janet from NfER wrote:1) Regarding age standardisation: Because each exam is individually standardised, does this mean that you have a roughly equal number of children from each month "passing"? If so, this does not seem fair on children born in, say, April, who would normally be successful but in one particular year have very strong competition. Is no historical data used at all?

The standardisation is based only on the group of children that take the paper. As the age standardisation is based on when the children were born, to the nearest completed month, all candidates have the same chance of obtaining a place if they score above the pass mark and there is a place available for them. For example, if a school has 120 places they will select up to 120 children who are at the top of the scale being used. Historical data is not meaningful in this context, since new tests are specially constructed each year.
I really don't think she answered my question - but then I probably didn't explain myself well enough (no change there then :roll: )
I don't think they answered your question, and reading their answer I am a little worried: if there are only 60 kids in each month being standardised, the standard error is greater than 15%, which seems much too great. This is on top of the standard error from splitting the kids into age groups (i.e. the standard error on a sample of 700 is about 5%), assuming that birthdays are evenly distributed across the year - which they aren't; I think more kids are born in spring and fewer in autumn. What this means is that in a sample of 700 kids I suspect the number per month ranges between about 55 and 63, and 55 samples gives an error of about 18%.
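
A quick simulation of that last point (even monthly birth rates are assumed here, which if anything understates the seasonal skew mentioned above), using 1/sqrt(n) as a crude proxy for the relative sampling error in a month group:

Code:

import math
import random

random.seed(1)

N, TRIALS = 700, 1000  # candidates per year, simulated years
smallest = N
for _ in range(TRIALS):
    counts = [0] * 12
    for _ in range(N):
        counts[random.randrange(12)] += 1  # even birth rates assumed
    smallest = min(smallest, min(counts))

print(f"smallest month group over {TRIALS} simulated years: {smallest}")
print(f"crude relative error 1/sqrt(n) for that group: {1 / math.sqrt(smallest):.0%}")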
mattsurf
Posts: 230
Joined: Mon Apr 28, 2008 11:44 am

Post by mattsurf »

Reader wrote:I think she is saying that the marks are generally standardised over the whole group e.g. the younger ones will have extra marks. I don't think that any child actually has marks taken away. I also don't think that children are just compared with those in their 'month group' - it is an overall scale simply to redress the balance of less learning time/maturity at the younger end.
That's not what the NFER site says - it is explicit that marks are standardised only against children born in the same month. No child is given extra marks or has marks taken away.

I assumed that many children take the same test - probably many thousands - so there are several thousand in each age group, and therefore the standardisation process is actually a really good way of scoring the test.

The only problem which could arise is if there are insufficient children to provide a good sample size - however, I cannot believe that this is the case; standardisation seems to be a really well-thought-out process and would not overlook an issue like that.
Charlotte67
Posts: 893
Joined: Thu Oct 11, 2007 8:59 am
Location: Cloud 9

Post by Charlotte67 »

Reader wrote:I think she is saying that the marks are generally standardised over the whole group e.g. the younger ones will have extra marks. I don't think that any child actually has marks taken away. I also don't think that children are just compared with those in their 'month group' - it is an overall scale simply to redress the balance of less learning time/maturity at the younger end.
I was assured that the children are ONLY compared with those of the same age, to the nearest month. No children are given 'extra marks'; this is a statistical process. In theory, younger children could need to score higher in some years...