THE USE OF AUDIENCE DATA
Contents
Introduction
Basic glossary
1. Minimum number of cases for analyzing a target
1.1. What is the minimum number of cases according to international guidelines?
1.2. What is recommended when a target has few cases in the sample?
2. Queries on targets with few cases in the sample
2.1. How should analysis variables be used in queries on targets with few cases in the sample?
2.2. What kinds of aggregates can reduce data variability?
3. Reach and Frequency during the panel rotation period
3.1. What variations can be expected in Reach and Frequency results during an advertising campaign?
3.2. What is recommended to reduce variation in post-evaluation and planning?
4. The Relative Standard Error variable in the MSS
4.1. Use and interpretation of the Relative Standard Error variable
Introduction
Basic glossary
In the target-creation module of the Media Quiz and Media Smart Station software, a color code (red, yellow, and green) identifies the number of cases behind a target in a specific analysis: red = fewer than 30 cases (do not use), yellow = 30 to 49 cases (use with caution), and green = 50 cases or more (more stable analysis). See Annex 2 for an illustration.
When creating a television-consumption target, it is inadvisable to use too many demographic variables: the more demographics are selected, the fewer cases remain, which increases the variability of the data.
Integrating individuals who are not part of the target but have similar demographic characteristics can help stabilize the results without unduly affecting the target audience. Otherwise, wider confidence intervals must be accepted.
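As a sketch of why small targets need wider intervals, the standard error of a rating can be computed by treating it as a simple random-sample proportion. This is a simplification for illustration only: real panel errors are larger because of weighting and design effects.

```python
import math

def rating_ci(rating_pct, n_cases, z=1.96):
    """95% confidence interval for a rating, treating it as a simple
    random-sample proportion (a lower bound for real panel error)."""
    p = rating_pct / 100.0
    half = z * math.sqrt(p * (1 - p) / n_cases) * 100  # half-width, rating points
    return rating_pct - half, rating_pct + half

# A 5-point rating looks very different with 200 cases vs. 35 cases:
print(rating_ci(5.0, 200))  # roughly (2.0, 8.0)
print(rating_ci(5.0, 35))   # roughly (-2.2, 12.2): the interval crosses zero
```

With 35 cases the interval is wider than the rating itself, which is the statistical reason behind the red/yellow/green thresholds.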
2.1. How should analysis variables be used in target queries with few cases in the sample?
Setting a priority among the different analysis variables helps minimize or contain the error, since the error grows as analyses approach the minimum expression of each variable:
Target (minimum allowed by the software: 30 cases)
Vehicle (minimum expression: 1 channel)
Time band (minimum expression: 1 minute)
Period (minimum expression: 1 day)
Once the main variable of the analysis has been chosen, say a program, it is advisable to strengthen the rest:
Target: if men 19+ ABC+ in Guadalajara are chosen and this target has fewer than 200 cases in the sample, it can be strengthened by adding women 19+ ABC+ in Guadalajara, provided the program belongs to a genre for general audiences. If the genre were more specific, the target could be strengthened with the adjacent age segments and/or socioeconomic levels, up to the point of integrating other domains.
Vehicle: if the program is broadcast on more than one channel, those channels would be included.
Period: allows querying the program's audience on a specific day and analyzing it as many times as it was broadcast.
2.2. What kinds of aggregates can reduce data variability?
Several markets follow practices that minimize or contain the standard error of sample estimates by applying averages. For example, Nielsen carried out an analysis comparing minute-by-minute ratings against ratings computed with other types of aggregates, and reached the following conclusion:
Averaging in different ways reduces the standard error relative to the specific one-minute rating.
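The mechanics behind this conclusion can be sketched as follows. The formula assumes the averaged minutes are statistically independent, which overstates the gain since adjacent minutes are correlated; take it as an upper bound on the reduction.

```python
import math

def se_of_average(rating_pct, n_cases, k_units):
    """Standard error (in rating points) of an average of k rating estimates,
    assuming independent units (optimistic: adjacent minutes are correlated)."""
    p = rating_pct / 100.0
    se_single = 100 * math.sqrt(p * (1 - p) / n_cases)
    return se_single / math.sqrt(k_units)

n = 300
print(se_of_average(2.0, n, 1))   # single minute
print(se_of_average(2.0, n, 30))  # half-hour average: sqrt(30) ~ 5.5x smaller
```

In practice the reduction is smaller than sqrt(k), but the direction is the same: aggregated ratings are more stable than point-in-time ratings.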
3. Reach and Frequency during the panel rotation period
Rotation lowers Reach relative to what the same campaign would normally achieve on a stable panel.
GRPs are not significantly affected; consequently, Frequency tends to rise steadily as a result of the drop in Reach.
Users should treat the following average drops in Reach and rises in Frequency as an order of magnitude of the observed movements, not as precise adjustment rates, since the various targets analyzed showed larger or smaller differences.
- The drop in Reach tends to grow the longer a campaign runs, for example:
-1.5% average Reach in one-month campaigns
-6.4% average Reach in three-month campaigns
-14.7% average Reach in six-month campaigns
- The rise in Frequency tends to grow the longer a campaign runs, for example:
+1.6% average Frequency in one-month campaigns
+6.8% average Frequency in three-month campaigns
+17.3% average Frequency in six-month campaigns
See Annex 5.
Total individuals with guests, individuals 19+ with guests, individuals 19+ excluding DE with guests, individuals 4-12 with guests, housewives, women ABC+C 19-44 with guests, women CD+ 19-44 with guests, men ABC+C 19-44 with guests, men CD+ 19-44 with guests, individuals ABC+C 19-44 with guests, individuals CD+ 19-44 with guests, housewives ABC+C, housewives CD+, individuals 4-12 ABC+C with guests, individuals 4-12 CD+ with guests, individuals 13-18 ABC+C with guests, individuals 13-18 CD+ with guests, total households.
Annex 1
International guidelines

Source               Minimum sample to create or analyze a target   Optimal minimum sample
IBOPE AGB            30                                             50
Nielsen US           30                                             50
MRC (US standard)    50                                             75
GGTAM

Optimal minimum sample is understood as the number of cases required to consider a sample robust.
Annex 2
Creating targets in MSS
Target creation in the MSS software is tied to a file called semáforos (traffic lights) that determines the number of cases required to create them. The definitions are as follows:
Green: the target meets the optimal number of cases (>= 50 cases)
Yellow: the target meets the minimum number of cases (30 to 49 cases)
Red: the target does NOT meet the number of cases required to be created or evaluated (< 30 cases)
In the analysis, targets flagged yellow still display data, while those flagged red do not.
Annex 3
These guidelines are statistically grounded in the size of the standard errors. The following chart compares the standard error of two rating estimates (1 point and 5 points), computed for sample sizes ranging from targets with 5 cases up to targets with 100 cases.
[Chart: y-axis: standard error; x-axis: sample size]
In short, as the sample size grows, the standard errors (red and blue lines) shrink and the gap between them narrows. However, a level is reached where the reduction becomes marginal even as the sample size keeps increasing.
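The two curves in the chart follow from the same proportion-based standard-error formula; a sketch:

```python
import math

def se_points(rating_pct, n):
    """Standard error of a rating, in rating points, under simple random sampling."""
    p = rating_pct / 100.0
    return 100 * math.sqrt(p * (1 - p) / n)

# Standard error of a 1-point and a 5-point rating at increasing sample sizes:
for n in (5, 10, 30, 50, 100):
    print(n, round(se_points(1.0, n), 2), round(se_points(5.0, n), 2))
# Beyond roughly 50 cases, each additional case buys much less error
# reduction, matching the diminishing-returns shape of the chart.
```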
Annex 4
The following chart presents a theoretical exercise to verify the minimum number of cases in the sample that could be established as optimal, given the rating sizes the industry regularly works with.
[Chart: standard error (0.0%-6.0%) versus direct rating points (0.1%-11.0%), one line per sample size (500, 400, 300, 200, 100, 50, and 30 cases); a vertical line marks the 1.8-point average rating (AVG), and distances A and B compare adjacent lines]
Each line represents a number of cases in the sample and shows how the standard error behaves across rating sizes from 0.1% to 11%. About the analysis:
The blue box frames the zone containing 80% of the rating sizes of the 17 targets (individuals) the industry regularly works with, and the vertical red line marks the average of that 80% (1.8 rating points).
The separation between the lines grows roughly cumulatively from one line to the next. Two distances are illustrated: between the dark blue line (500 cases) and the purple line (200 cases) there is an error distance produced by dropping 300 cases from the sample (A). That distance is slightly smaller than the distance between the purple line (200 cases) and the light blue line (100 cases) (B), even though the sample drops by only 100 cases, one third of the reduction in (A). As a result, errors become more significant as the sample size shrinks toward levels such as 50 and 30 cases.
This exercise shows that, for the rating sizes the industry currently works with (blue box), targets of between 200 and 500 cases carry very similar standard errors, which supports greater certainty in granular analyses, that is, daily queries and minute-level time bands. With fewer than 200 cases, and especially fewer than 100, standard errors become more significant because sample sizes do not act proportionally.
Minute-level rating sizes from January 1 to September 20, 2012, for the targets total individuals with guests, individuals 19+ with guests, individuals 19+ excluding DE with guests, individuals 4-12 with guests, housewives, women ABC+C 19-44 with guests, women CD+ 19-44 with guests, men ABC+C 19-44 with guests, men CD+ 19-44 with guests, individuals ABC+C 19-44 with guests, individuals CD+ 19-44 with guests, housewives ABC+C, housewives CD+, individuals 4-12 ABC+C with guests, individuals 4-12 CD+ with guests, individuals 13-18 ABC+C with guests, individuals 13-18 CD+ with guests; channels Proyecto 40, Cadena Tres, C2, C5, C7, C9, C13, TVP, Foro TV. Total: 28 cities.
Annex 5
A panel-rotation simulation exercise was carried out using all intab households from January 1 to June 30, 2012 (6 months of data). In each daily base the identification numbers of 20 households were replaced; for example, in the January 1 base the identifiers of 20 households were replaced, on January 2 the 20 from January 1 plus another 20 new ones were changed, and so on until the identifiers of every household in the panel had been changed by the last day of June.
Once these household-identifier changes were executed, Reach and Frequency were calculated for several advertising campaigns from different sectors lasting one month, three months, and six months. The figures were calculated for the 17 individual targets most used by the industry.
The results, averaged across all targets queried, are shown in the following charts:
[Charts: decrease in Reach due to rotation; increase in Frequency due to rotation]
It is important to note that the decrease in Reach and increase in Frequency stem from the way continuous-panel matrices are built within the calculation: for each case in the sample, a weighted expansion factor is determined based on the continuity each case has within the selected analysis period.
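A toy version of this exercise (all parameters hypothetical, chosen only to show the mechanism) illustrates why continuity weighting lowers Reach under rotation: households rotated in have less tenure in the period, hence lower exposure probability, and the tenure-weighted Reach falls relative to a stable panel.

```python
import random

random.seed(7)
PANEL_SIZE, DAYS, SWAPS = 600, 90, 20  # hypothetical panel; 20 IDs swapped per day as in Annex 5
P_SEE_AD = 0.02                        # hypothetical daily chance a household sees the spot

def campaign_reach(rotate):
    """Continuity-weighted Reach (%) over the period (toy expansion-factor logic)."""
    ids = list(range(PANEL_SIZE))
    tenure = {i: 0 for i in ids}       # days each current household has been intab
    exposed = set()
    next_id = PANEL_SIZE
    for _ in range(DAYS):
        if rotate:                     # replace 20 household identifiers this day
            for slot in random.sample(range(PANEL_SIZE), SWAPS):
                old = ids[slot]
                del tenure[old]
                exposed.discard(old)
                ids[slot] = next_id
                tenure[next_id] = 0
                next_id += 1
        for hh in ids:
            tenure[hh] += 1
            if random.random() < P_SEE_AD:
                exposed.add(hh)
    total_w = sum(tenure.values())     # weight each household by its continuity
    reached_w = sum(tenure[hh] for hh in ids if hh in exposed)
    return 100.0 * reached_w / total_w

stable = campaign_reach(rotate=False)
rotated = campaign_reach(rotate=True)
print(round(stable, 1), round(rotated, 1))  # rotated Reach comes out lower
```

Since total exposures (GRPs) are barely affected while distinct reached households shrink, Frequency (GRPs / Reach) rises, matching the direction reported in the Annex.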
Annex 6
Measuring Small Audiences:
The Challenge to Audience Measurement Systems
________________________________________
Tony Twyman
BARB: Broadcasters Audience Research Board and CRCA: Commercial Radio Companies Association, United Kingdom.
Steve Wilcox
RSMB Television Research Limited, United Kingdom.
Introduction
All developments in electronic and broadcast media lead towards more stations, more choice, more targeting and the fragmentation of audiences. The creation of yet more small audience channels does not,
however, eliminate the appeal of mass audience channels. These remain as mass audience advertising media, with a continuing demand for the spot by spot assessment of campaigns which has been a key feature
of people meter measurement systems.
The newer and smaller media are likely to sell their advertising in a radically different way from the mass
media, with packages of spots, even packages of stations, replacing the old single spot unit of advertising
measurement. They will, however, often be selling to advertisers using the mass media and wanting comparability between the meanings of assessments for their different campaigns.
The challenge to the current style of national measurements systems is how to accommodate the ever
widening range of audience sizes which it is expected to measure. This is a feature of a recent paper presented by Read and Johnson (1997) in which they discuss the development of the next British audience
measurement specification. The core of the problem is that the smaller the audience, the larger the relative
size of sampling error. This implies potential increases in sample sizes of a scale which exceeds the likely
expansion of advertising revenue and research funding available. Paralleling the diverging advertising demands on research systems, the broadcasting programme makers also will have different requirements,
according to the varying nature of their programming.
It is important to recognise, however, that there is not just one small audience problem but a number
with different potential research solutions and even in some cases no conventional solution.
The Measurement of Sampling Error
Key to any discussion of the measurement of small audiences is a realistic appreciation of the extent of
sampling errors involved. We will be referring to and summarising extracts from work in the United Kingdom and elsewhere.
First, however, we must make clear what we mean by sampling error. In the purest sense the term sampling
error is used to mean the deviation of behaviour of a randomly selected sub-sample with no response bias,
from the behaviour of the total population. Practical situations are far from this. Samples are not purely
random and there is response bias but we still need to understand the variability of the data.
In using the term sampling error we want to express the degree of variability which any measurement or
comparison between measurements is subject to when there is no real change in the behaviour which it
is intended to measure.
With panels there are a number of factors which contribute to the amount of statistical variability:
When panels are initially recruited, the sample will be biased through:
differential non-response biases inherent in the system,
chance features of the particular sample selected which remain as a sample bias, changing over time as panel membership changes.
Comparisons between measurements at different times involves many of the same individuals
and according to the degree of correlation between their behaviour at those times, there is a
reduction in sampling error. This correlation diminishes over time, however, as people get older,
change their social life, their work life and their interests.
The need to balance known demographic imbalances involves weighting which can significantly decrease effective sample size and increase sampling error.
Types of Small Audience Situations
In this paper we seek to identify the range of small audience situations, the likely data requirements and
how they might be researched. We are not starting with a clean slate here. In most countries with broadcasting systems sufficiently developed to generate small station problems there will already be sophisticated people meter systems.
These have been designed initially to measure mass audiences but have been progressively expanded and
adapted to report on smaller audiences. Research solutions have to be considered:
within existing systems,
by expanding and adapting existing systems,
by creating entirely new research sources.
Viewing by Smaller Sub-Groups to Larger Stations
This is essentially a problem of sample size. Viewing by smaller sub-groups to larger stations is a situation
which regularly occurs within existing systems and even for mass audience channels. There appears to be
a law whereby, whatever the sample size, the number of sub-groups reported expands to include many for
which the sample size is inadequate.
Within the current BARB system the sample sizes and sampling errors shown in Table 1 for the largest
regional panel illustrate the point.
TABLE 1:
SAMPLE SIZES AND SAMPLING ERROR

                          Sample Size   8:30pm TVR   Interval
All Individuals                  1204          9.6      ±24%
Adults                            996         11.1      ±22%
Men                               459         11.0      ±28%
Women                             537         11.2      ±26%
Housewives                        530         12.7      ±22%
Housewives with Children          154          9.8      ±49%
Women ABC1                        296          8.9      ±42%
Men 16-34                         164          5.9      ±70%
Women AB                          129          4.8      ±88%
Men 16-24                          62          7.8      ±88%
Children                          208          2.5     ±105%
These sampling errors assume the panel to be perfectly balanced. In reality the sampling errors are up to
20% larger for the actual panel which is weighted to correct for demographic profile imbalances. So these
sampling errors represent the best that could be achieved given the total number of homes available. It is
salutary to note that these sampling errors are for a peak time rating on the largest commercial TV station
in the United Kingdom.
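As a rough cross-check (not the method used to produce Table 1), the simple random-sampling formula reproduces the order of magnitude of these intervals while underestimating them, since it ignores panel clustering and weighting:

```python
import math

def rel_interval(tvr, n, z=1.96):
    """Relative 95% half-width (%) of a TVR under simple random sampling."""
    p = tvr / 100.0
    return 100 * z * math.sqrt(p * (1 - p) / n) / p

print(round(rel_interval(9.6, 1204)))  # ~17% vs the published ±24%
print(round(rel_interval(2.5, 208)))   # ~85% vs the published ±105%
```

The gap between the naive figure and the published one is exactly the panel design effect the paragraph above describes.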
Any attempt at optimising the choice of individual spots on the smaller sub-groups is clearly a waste of time.
One common response to statistics such as these is that practices should change and that trading should
be based on larger more reliable sub-groups and/or that optimisations and appraisal should be in terms
of schedules rather than individual spots. The statistical reliability of this approach is discussed in a later
section.
Another approach under test in the United Kingdom for regional sub-groups is that of modelling or factoring from the network panel.
The principle is that a regional panel measures the main audience categories directly. Sub-group audiences
are then factored by applying the relationship between the sub-group and the main category found at that
time on the larger network panel, to the directly measured regional main category audience. This factor is
derived after weighting the network data to match the demographic profile of the region. A summary of
this is provided in a later section.
This approach can be used to reduce sub-group variability equivalent to increasing effective sample size by
between 50% and 100%. This is an increase beyond the levels of affordability, but even that is not enough
where the market is trying to trade on sub-groups with samples of fifty.
This approach we believe could be used in the United Kingdom and help to make sub-group data more
reliable. It is, however, viable only in conditions such as in the United Kingdom where there is a broad framework of consistency in programming across the regions within the network and few marked deviations
of programming style at the regional level. We have not found evidence that regional variations in pro-
Two-thirds of the satellite channels reported by BARB never achieved any rating as high as 1% in this particular week.
If audiences are expressed in terms of numbers of viewers, however, they take on a reality which belies
their statistical basis. For example, a programme with a rating of 0.5% could easily lose all its audience from
one week to the next purely as a result of sampling error. Amongst housewives in satellite homes (there
are 6.6 million in the population) this represents a drop in the audience from over 30,000 to nothing at all.
How can you lose 30,000 viewers from one week to the next? Because they represent small ratings with
big sampling errors, the variability looks implausible.
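The 30,000-viewer arithmetic follows directly from the same error formula; the 500-case sub-panel size below is an assumption for illustration, not a figure from the paper:

```python
import math

universe = 6_600_000   # housewives in satellite homes (from the text)
rating, n = 0.5, 500   # 0.5% rating; assumed sub-panel of 500 housewives

p = rating / 100
viewers = round(universe * p)                          # ~33,000 viewers
half_width = 1.96 * math.sqrt(p * (1 - p) / n) * 100   # 95% half-width, rating points
print(viewers, round(half_width, 2))
# The interval 0.5 +/- 0.62 rating points includes zero, so the whole
# audience can "disappear" week on week through sampling error alone.
```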
What is the solution? This depends on the purpose for which audience research data are needed.
Programming: it is certainly possible to see which are the most successful programmes even from highly
unstable small ratings data for small stations. Judgement is considerably improved by managing several
weeks data together.
The precision of assessment for programme audiences is less than for large rating channels but the need
for more subtle distinctions may be less. The differences between programmes may even become more
deviant because they may be less affected by competition.
Buying and selling advertising: here any attempt to work on an individual spot time may be a waste of
time. Improved assessments may be made by:
using data averaged over time,
assessing whole schedules either within a channel but more realistically across a number of
channels.
For advertisers on niche channels representing a special market, e.g. a computer channel, advertisers may
well wish to get an idea of the best times to advertise. They are, however, more likely to buy a schedule
and compare the direct response with campaigns in other media. It is possible that the more specialised
the market, the less precise audience estimates are required.
Viewing to stations with restricted universes
With panels covering the whole television universe, some restricted universes may be represented with
only small sample sizes and there may be difficulties in representing their characteristics. Restricted universes in this sense occur in a number of ways.
Limited regional coverage
In the United Kingdom some cable franchises have small catchment areas. Within BARB, cable as a whole
is represented by a panel which is a specially weighted sub-set of the main panels. This reports separately
on cable stations which have a wide geographical coverage. Small regional cable franchises may, however,
wish to know the patterns of viewing to the station mix which they offer and their own local cable services. Their coverage by a network people meter sample is negligible. It would be possible but not economically viable to recruit a special people meter panel for the area. Instead, in the United Kingdom, the Cable
Research Group have commissioned, outside BARB, periodic two-week paper diary studies using diary
formats not unlike those used for much radio research. Some of this work is described in a later section.
This kind of situation is likely to increase for the future.
Most regional television structures end up with regions that vary in size. This often means that the smaller
stations would not have an adequate sample based upon proportionate regional sampling. The solution is
usually disproportionate geographical sampling or a federation of regional panels.
Whilst strict statistical logic would demand equal sized panels everywhere the money at risk argument
often leads to compromise whereby larger areas may be capped off at a certain limit and smaller areas
boosted up. The United Kingdom is an example of this illustrated by three of the thirteen regional areas
(see Table 2).
TABLE 2:
EXAMPLE OF REGIONAL AREAS WITHIN THE UNITED KINGDOM

             Percent of Population   Percent of Meters   Meters
London                      20.2%               11.7%      525
North East                   5.3%                6.1%      275
Border                       1.2%                2.2%      100
One of the problems is that there is sometimes a tendency to treat all the areas as having the same currency available. Thus the sample size for Border hardly warrants pursuit of spot by spot buying, certainly
not for sub-groups, but it sometimes happens. Possible remedies include selling by schedules or aggregated ratings and factoring discussed elsewhere.
An example of equal sized regional panels is Belgium with two equal panels of 750 households for each
of the Flemish and French speaking parts. Paradoxically, although the national sample is about a third of
the United Kingdom's, the actual panels used for trading are larger. A regional programming and trading
structure is one situation where, even with mass audience channels, the use of people meter panels leads
to statistical strain. It seems likely, however, that there is a general trend towards trading television advertising in larger units which may ease this.
Services based upon new technology
The advent of satellite transmissions was a past example of this. There it was possible to recruit a special
sample of satellite receivers who had a vastly increased range of programme choice compared with terrestrial reception. In the United Kingdom there was initially a separate panel but ultimately a specially
weighted sub-sample of the main people meter panels was used. This currently provides a sample of
around 1200 households.
We expect that digital television will be measured in the same way in the United Kingdom. This could initially involve special extra panels for satellite digital, terrestrial digital and even digital cable services since
there is very likely to be much initial overlap between these modes of reception. Such panels could be merged
with analogous panels when the universe is large enough.
These new developments present special small audience problems:
a. Universes. It is easy to define access to equipment but harder to measure it when it starts
from zero and may rise rapidly and erratically. Within the broadest definition of having the
reception equipment, however, there is the added complication of subscription packages incorporating different channels. These are subject to an additional variability from take-up and
churn within the variability of the equipment universe. Universes have been generally obtained from some independent survey source. An establishment survey, for example, for a slowly
changing terrestrial source can provide:
a reliable estimate of universe size,
a profile of demographic and often other characteristics of the universe,
a source of households for panel recruitment.
With new services such as digital transmissions the new problems are that the universes are:
initially very small,
dispersed through the population,
changing very rapidly; for individual stations up and down,
highly complex in terms of combinations of channels received.
These characteristics mean that no representative sample is likely to be large enough, affordable on a continuous basis nor even able to be processed quickly enough. This means that some
alternative approach is necessary. In practice broadcasters will have exact databases showing
who is paying for and receiving what on a near-daily basis. It would be logical to use this. The
objection is sometimes raised that broadcasters might inflate the figures and/or be able to
detect the identity of the panel home. It will be necessary to counter this by some form of
independent auditing and access to the database. It will also be necessary to create new legal
safeguards and protection against interference with panel homes. The use of a broadcaster's database does not uniquely create this problem; it is there from the moment that the broadcaster gets into a direct one-to-one on-going relationship with the households in the audience.
Ultimately intelligent digital decoders will be able to record station viewing data in great detail
on large samples. This will need some individual viewing data from smaller samples modelled
onto it. This would solve both the universe and the small audiences problem for marketing
strategies based on type of household rather than type of individual.
b. Audience fragmentation. New technologies bring more choice and greater fragmentation. Digital television is likely to extend the range of channels from the thirty or more of
satellite into the hundreds. Different channels will show the same films at different times
to provide a near video-on-demand service. Necessarily, most audiences will be very small,
a further extension of the issues discussed earlier. This means that for all but a few channels
the assessment of individual spot ratings will be pointless. We would expect to see television
planning and assessment based upon aggregated data probably involving selling of schedules
comprising many small ratings spread across a range of channels.
Once again the reliability of schedule audience measurement becomes of crucial importance.
c. Panel structure. As services develop those which can afford people meter panels will probably do so but initially, with relatively small sample sizes. It will also be necessary to control
panel membership in terms of:
combinations of channels received,
novelty effects, i.e. length of ownership.
This will require complex weighting reducing effective sample sizes even more and exacerbating the problems of fragmentation discussed above. Only data aggregated across channels
and/or times will be robust.
When choice gets this complex and with the development of electronic programme guides where programmes can be chosen without channel awareness, the option of using alternative techniques such as
paper diaries and recall will no longer exist.
The industry will therefore have to get used to using audience measurement data for small audiences from
small people meter panels in a responsible way using aggregated data, at least until the intelligent decoder
is able to give precise set range data on large samples.
Programming needs will vary according to the nature of the channel. Even with very small share channels
it is possible to see which are the most popular programmes, particularly if schedules are consistent and
weeks averaged together. BARB currently publishes Programme Top Tens for many small share channels,
which are robust, in the sense of similar programmes appearing week after week. Any subtlety in terms
of small differences between audiences would, however, be impossible. Programmers wishing to fine tune
programmes or schedules would probably gain more from qualitative research among viewers to their
programmes.
Ethnic or language minorities
Ethnic or minority language groups are likely, by definition, to have low representation on general representative samples. There are likely to be special sampling problems in that such groups are both clustered
but not exclusively confined within any geographical boundary. Universe measurement and sample selection probably require large-scale surveys and some allowance made for differential non-response. Even
then the sample may not be adequate to provide reliable data for channels servicing these groups.
One solution is a separate people meter panel. This occurs in the United Kingdom for Welsh speakers,
measuring audiences to S4C. There is no separate panel for Gaelic speakers in Scotland. Response to programmes in Gaelic is studied through qualitative audience appreciation studies. The Gaelic channel in the
Republic of Ireland has Gaelic speakers as a possible audience sub-group on the main panel. In Germany
foreigners have been excluded from the main television panel universes but may now be represented by
a separate panel.
Whether an ethnic minority or language channel has a separate panel is largely a matter of economics.
Cable and the development of digital services will make niche channels possible for smaller ethnic groups.
The limited data available from mainstream panels may mean that alternative techniques have to be used.
Viewing to minority interest stations, intermittent interest channels
The multiplication of choice will give rise to channels which have a very restricted niche appeal but one
which is not identifiable by region, language or ownership of equipment. The channel would be based upon
interest in a topic such as natural history, or history. An intermittent interest channel would be a weather
or traffic channel. These stations are essentially general in potential appeal but likely to achieve a low reach and share. They suffer from the general small channel problems and the solutions lie in aggregation as
already discussed.
For minority stations where there is a marked minority appeal, there may be problems not only of sample
size but also of panel bias. The chance of over or under representation of a minority interest group could
stay with the panel for some time. Here alternative techniques with larger independent samples may help.
Sampling errors for small audience measurement
Small Audience Measurements
In the United Kingdom, the BARB TV people meter system currently reports audiences to five national
terrestrial channels, one local terrestrial channel (S4C in Wales), thirty-eight channels delivered by satellite
and cable (this number is constantly changing) and five cable exclusive channels. Typically, the national
terrestrial channels account for the following audience shares (see Table 3).
TABLE 3:
AUDIENCE SHARES

BBC1        30%
BBC2        11%
ITV         33%
Channel 4   10%
Channel 5    3%
Of the thirty-eight satellite/cable channels, only two account for more than 1% of all viewing. The full
distribution is as follows (see Table 4).
TABLE 4:
DISTRIBUTION OF SATELLITE/CABLE CHANNELS

Share     Number of Channels
1-1.5%    2
0.5-1%    6
0-0.5%    30
In total, the five cable exclusive channels account for less than 0.5% of all viewing, as does the local terrestrial channel S4C.
A large number of cable exclusive channels are not reported by the BARB system because the data is not
considered to be sufficiently robust. (These are catered for outside the BARB system.) The arrival of digital
TV later this year will generate yet another small audience measurement requirement.
The small channel shares are partly due to the large numbers of channels available and partly because only
34% of the population have access to cable or satellite and only 13% are cable exclusive.
In the United Kingdom, the national terrestrial channels are also commonly reported on a regional basis,
in terms of either the twelve BBC editorial regions or sixteen ITV areas. This is another key dimension
resulting in small audiences. For example, if 10% of the population live in a particular ITV area, then the
share of all TV viewing by the whole national population which is accounted for by viewing to the ITV
station broadcasting in that area is only 3.3% (i.e. 10% of the national ITV share of 33%). Effectively this
is another example of a restricted availability channel because only 10% of the population has access to
that particular regional ITV station.
The last dimension which results in audience fragmentation is the need to report on demographic categories, ranging from simple male/female splits to very tightly defined age groups. For example, a 10%
penetration sub-group's viewing to an ITV station in an area containing 10% of the population would only
account for 0.33% of the total national population's viewing.
Of course it is not normal practice to report such fragmented audiences as percentages of the total national population base, so the percentages are not normally seen as such small numbers. However, this
way of presenting the audiences is a useful lead-in to the consideration of sampling errors and the relative
reliability of the various audience measurements.
Sampling Error Study
BARB and RSMB have recently completed the first phase of a major study of the sampling errors associated with the various audience measurements produced by the TV people meter panel in the United
Kingdom. This is considered to be an essential contribution to the sample design component of the future
audience measurement specification. The theory has been developed to allow the calculation of sampling
errors for many different audience measurements and to compare the performance of perfectly balanced
proportionate and disproportionate designs and to assess the effect of weighting used to correct for the
usual panel imbalances that exist within an operational system.
The calculation of sampling error
Several papers have been written concerning the components of sampling error and the methodology for
their calculation (eg. Schillmoeller, 1992; Boon, 1994 and Twyman and Wilcox, 1996).
The calculation of sampling error takes account of the variability in the audience measurement between
individuals, the sample size, clustering within households and weighting. These factors and their effects can
be different for each measurement, for each channel, for each demographic category and each area base.
It was necessary to consider the whole range of different audience measurements because some will have
smaller sampling errors than others and therefore may be more useful in the small audience situation:
average ratings and channel share for all-time, time segments (day-parts), quarter hours and
individual minutes;
channel reach;
programme, commercial break and individual commercial spot ratings;
reach and frequency analysis;
daily, weekly and four week averages;
change over time, from month to month and from year to year.
The actual analyses were based on a limited number of channels using two ITV area panels. For the purposes of this paper, the results have been interpreted to provide approximate sampling errors for a number
of hypothetical situations.
All sampling errors have been converted to 95% confidence intervals. This means there is a 5% chance
that the audience measurement estimate is more than one confidence interval from the true value of the
audience measurement.
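To make the arithmetic behind these intervals concrete, here is a minimal Python sketch of the simple random sampling approximation for a single minute rating (TVR). This is a deliberate simplification: real panel intervals, such as those in Table 5, are wider because channel share is a ratio measurement and because household clustering and weighting inflate the error.

```python
import math

def tvr_ci95_pct(tvr, n):
    """Approximate 95% confidence interval for a single minute TVR,
    expressed as a percentage of the rating. Simple random sampling
    approximation: real panel intervals are wider due to household
    clustering and weighting."""
    se = math.sqrt(tvr * (100 - tvr) / n)  # s.e. in rating points
    return 2 * se / tvr * 100              # +/- as % of the rating

# A TVR of 1 measured across 8600 panel adults
print(round(tvr_ci95_pct(1, 8600)))  # 21
```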
Sampling errors for proportionate panel designs
Because network based channel share encapsulates the extent of each small audience situation, a useful
start point is to consider the sampling errors on channel shares for the types of viewing situations described in earlier.
TABLE 5:
CHANNEL SHARE SAMPLING ERRORS - ALL ADULTS 16+, NETWORK

    Channel      Share         Single Minute   Single Day   4 Week Average
    BBC1         30.0%             ±10%           ±1.4%         ±0.9%
    BBC2         11.0%             ±19%           ±2.3%         ±1.2%
    ITV          33.0%              ±9%           ±1.2%         ±0.8%
    Channel 4    10.0%             ±20%           ±2.1%         ±1.2%
    Channel 5*    3.0%             ±30%           ±5%           ±3%
    Satellite    1.0-1.5%          ±50%           ±8%           ±5%
    Satellite    0.5-1.0%          ±65%           ±10%          ±6%
    Satellite*   Under 0.5%        ±90%           ±15%          ±8%
    S4C*          0.3%             ±90%           ±15%          ±8%
    Cable only*  Under 0.25%      ±100%           ±20%          ±10%
Sampling errors have not yet been calculated for Channel 5, S4C nor any of the cable channels, so interpolations have been made. It should also be noted that sampling errors have not been calculated for any
satellite channels with less than 0.1% share. The single minute sampling error relates to peak-time.
The BARB panel in the United Kingdom is currently nearly 4500 homes with 8600 adults. If this were to
be of a proportionate design and perfectly balanced (i.e. no weighting were required) then the shares of
viewing would have the following sampling errors. These are shown for a single minute, a single day and
then for a four week average in Table 5.
At the next level of fragmentation (see Table 6), consider sampling errors for an average 10% penetration
demographic sub-group or a 10% penetration geographical region. The network based channel shares are
all divided by ten, and the sampling errors increase as the sample size decreases - i.e. they are multiplied by √10 ≈ 3.2.
TABLE 6:
CHANNEL SHARE SAMPLING ERRORS - 10% SUB-GROUP OR REGION

    Channel      Share          Single Minute   Single Day   4 Week Average
    BBC1          3.0%              ±32%           ±4%           ±3%
    BBC2          1.1%              ±61%           ±7%           ±4%
    ITV           3.3%              ±29%           ±4%           ±3%
    Channel 4     1.0%              ±64%           ±7%           ±4%
    Channel 5*    0.3%              ±96%           ±16%          ±9%
    Satellite    0.10-0.15%        ±160%           ±25%          ±16%
    Satellite    0.05-0.10%        ±208%           ±32%          ±19%
    Satellite*   Under 0.05%       ±288%           ±47%          ±25%
    Cable only*  Under 0.025%      ±320%           ±63%          ±32%
For highly targeted channels this table must be interpreted carefully, because it would not be normal practice to analyse average demographic sub-groups. More often than not the key target sub-group
would account for a large proportion of the channel's total audience. In this situation the percentage
sampling error will not increase as much as the decrease in sample size suggests, because the sub-group
has higher viewing levels. In fact we can hypothesise that if the whole of a channel's audience is attributed
to one key demographic sub-group (e.g. 16-34 year olds and a young music channel), then the percentage
sampling error for the sub-group is the same as for all adults.
This can be demonstrated theoretically, using single minute ratings rather than channel share in order to
keep things simple:

    Network sample = 8600 adults; single minute TVR = 1
    Sampling error = √(1 × 99 / 8600) = 0.11 rating points
    95% confidence interval = 2 × 0.11 = 0.21 rating points, i.e. 21% of the rating

    Sub-group sample = 860 adults; if the whole audience comes from the sub-group, its TVR = 10
    Sampling error = √(10 × 90 / 860) = 1.02 rating points
    95% confidence interval = 2 × 1.02 = 2.04 rating points, i.e. 20% of the rating

which is almost exactly the same and therefore almost totally independent of the sample size.
Empirical evidence for alternative audience measurements, and with a more sophisticated sampling error
calculation, is not always so consistent. However, the relationship seems to be proved in terms of orders of
magnitude - certainly the sub-group would not have a sampling error 3.2 (= √10) times as large.
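The order-of-magnitude argument above can be checked numerically, under the same simple random sampling assumption as the worked example:

```python
import math

def ci95_pct(tvr, n):
    """95% c.i. as a percentage of the rating, simple random sampling."""
    return 2 * math.sqrt(tvr * (100 - tvr) / n) / tvr * 100

network = ci95_pct(1, 8600)   # all adults: TVR of 1 on 8600
subgroup = ci95_pct(10, 860)  # 10% sub-group owning all the viewing: TVR of 10 on 860
print(round(network), round(subgroup))  # 21 20 - nearly identical
```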
In order to complete the series of audience share sampling error tables, Table 7 is for an average 10%
penetration demographic sub-group within a 10% penetration region:
TABLE 7:
CHANNEL SHARE SAMPLING ERRORS - 10% SUB-GROUP IN A 10% REGION

    Channel      Share   Single Minute   Single Day   4 Week Average
    BBC1         0.3%        ±100%          ±14%           ±9%
    BBC2         0.1%        ±190%          ±23%          ±12%
    ITV          0.3%         ±90%          ±12%           ±8%
    Channel 4    0.1%        ±200%          ±21%          ±12%
The original network based channel shares are all divided by 100 and the confidence intervals are now ten
times as big as those shown in the national/all adults table.
Individual spot ratings vs. schedule averages
Having used channel share and average demographic sub-groups to demonstrate in principle how large
sampling errors can be in small audience situations, it is important now to consider real demographic sub-groups and the key audience measurements used in the buying and selling of advertising. Real demographic sub-groups do not have average levels of variability nor average levels of clustering within households.
The key audience measurements relate to individual commercial spots and whole advertising schedules.
First consider the sampling errors for the ratings to a selection of individual minutes broadcast on ITV and
Channel 4, shown in table 8. The sample base is the London ITV area panel which comprised 530 homes,
delivering 459 men but only sixty-two men aged 16 - 24 years. The analysis period is November 1996. This
illustrates the small audience measurement situations arising from restricted areas, small demographic
groups and times of low viewing.
TABLE 8:
SAMPLING ERRORS FOR INDIVIDUAL MINUTE RATINGS

                            All Men           Men 16-24
    Channel   Time      TVR    95% c.i.    TVR    95% c.i.
    ITV       7:45am    1.3      ±39%      2.5      ±95%
    ITV       1:45pm    5.1      ±21%      3.8      ±68%
    ITV       8:30pm   11.0      ±14%      7.8      ±44%
    CH4       7:45am    0.5      ±81%      0.0        -
    CH4       1:45pm    0.4      ±77%      0.0        -
    CH4       8:30pm    5.0      ±23%      0.0        -
All the sampling errors are large, even for the peak-time ITV All Men rating. The sampling error for men
aged 16-24 years is huge. The zero Channel 4 ratings actually emphasise the small sample problem - even
a Men 16-24 rating of 5% (as achieved within All Men) would be the result of only three individual panel
members viewing.
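The "three panel members" figure is worth making explicit, since counting the panel members behind a rating is the simplest way to see how thin these measurements are:

```python
# Number of panel members behind a rating = sample size x rating / 100
panel_men_16_24 = 62   # London panel, November 1996
rating = 5.0           # a TVR of 5
print(round(panel_men_16_24 * rating / 100))  # 3 viewers
```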
By averaging over time, even within a continuous panel, there will be significant reductions in sampling
error. This fundamental theory originally expounded in a report prepared by Arbitron (1974) has been demonstrated in several published papers (e.g. Wilcox and Reeve, 1992). For example, average ratings over
four consecutive Mondays have the sampling errors shown in Table 9.
TABLE 9:
SAMPLING ERRORS FOR AVERAGE RATINGS - FOUR MONDAYS

                            All Men           Men 16-24
    Channel   Time      TVR    95% c.i.    TVR    95% c.i.
    ITV       7:45am    1.2      ±24%      1.6      ±57%
    ITV       1:45pm    3.5      ±15%      2.1      ±45%
    ITV       8:30pm   13.2       ±5%      7.1      ±19%
    CH4       7:45am    0.4      ±39%      0.1      ±88%
    CH4       1:45pm    1.0      ±13%      0.4      ±43%
    CH4       8:30pm    3.1      ±10%      1.3      ±33%
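The reduction from averaging can be bounded with a quick sketch. The sqrt(k) factor assumes the k days are statistically independent; day-to-day correlation in panel viewing prevents this in practice, so the observed reductions in Table 9 are somewhat smaller than the bound:

```python
import math

def averaged_ci(single_day_ci, k):
    """Best-case 95% c.i. for a k-day average rating, assuming the
    daily measurements are independent. Day-to-day correlation in
    panel viewing makes real reductions smaller than this."""
    return single_day_ci / math.sqrt(k)

# A single-spot c.i. of +/-39% (ITV 7:45am, All Men) over four Mondays
print(round(averaged_ci(39, 4), 1))  # 19.5 (the observed value was +/-24%)
```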
TABLE 10:
SAMPLING ERRORS FOR TOTAL SCHEDULE RATINGS

                                All Men               Men 16-24
    Schedule   Channel    Total TVRs  95% c.i.   Total TVRs  95% c.i.
    I          ITV             33       ±15%          39        ±25%
               Channel 4       19       ±18%          15        ±51%
               Satellite        4       ±26%           2        ±67%
               Total           55       ±12%          56        ±29%
    II         ITV            148        ±6%          87        ±19%
               Channel 4       71        ±9%          54        ±24%
               Satellite       14       ±18%          16        ±52%
               Total          233        ±5%         157        ±16%
    III        ITV            140        ±7%         116        ±18%
               Channel 4       36        ±9%          28        ±24%
               Satellite       10       ±19%          28        ±33%
               Total          186        ±6%         173        ±15%
    IV         ITV            254        ±6%         134        ±20%
               Channel 4      171        ±6%         148        ±21%
               Satellite       36       ±17%          55        ±28%
               Total          461        ±5%         337        ±15%
    V          ITV            174        ±7%          88        ±26%
               Channel 4       64        ±9%          32        ±29%
               Satellite       17       ±20%          21        ±58%
               Total          256        ±6%         141        ±21%
Although there is some variability in the relationship with the single minute rating sampling errors - to
be expected with real data and small samples - on average the percentage sampling errors are halved. We
believe that many broadcasters are already using such averages for planning purposes.
This principle can be extended to whole schedules where in general we will find even greater reductions in
sampling errors. Table 10 shows results for five schedules broadcast in November 1996, again based upon
the London panel.
For All Men and for any schedule with a reasonable number of ratings, the sampling errors have reduced to
a more manageable level. However, there does seem to be a plateau beyond which additional ratings will
not result in further sampling error reductions. For All Men the minimum 95% confidence interval seems
to be 5% whilst for Men 16-24 it is about 15%. This is approximately in line with their relative sample sizes
although Men 16-24 are also more variable as a group.
However, the basic principle is clearly demonstrated: schedule total ratings have much smaller sampling
errors than do individual commercial spot ratings.
Sampling errors for different schedule structures
A key question we have asked is how much the sampling error for schedule total ratings is dependent upon
the composition of the schedule, i.e. is the schedule total ratings percentage sampling error high if the individual spots in the schedule have low ratings and therefore high percentage sampling errors? For example, is
the sampling error for a schedule of ten spots with an average rating of 20% the same as for a schedule of
twenty spots with an average rating of 10%? The principle can be demonstrated with some simplistic theory.
For a schedule of s spots, each with a rating of p, and assuming statistical independence between spots,
the sampling error of the schedule total TVRs based on a sample of n is:

    s.e.(total TVRs) = √( s · p(100 - p) / n )

Expressed as a percentage of the total TVRs (s·p), the 95% confidence interval is 2 · s.e.(total TVRs) / (s·p).
For schedules all delivering 200 total TVRs:

    Single Spot   Single Spot   Number     Total   Total TVRs
    TVR           % s.e.        of Spots   TVRs    95% c.i.
    20              2.2%           10       200      ±5.9%
    10              3.2%           20       200      ±6.3%
     5              4.7%           40       200      ±6.4%
     1             10.7%          200       200      ±6.6%
     0.5           15.2%          400       200      ±6.6%
So in theory the total ratings sampling errors are independent of the size of the ratings which make up
the schedule. Certainly the variations in the percentage sampling errors are nothing like the variations in
the single spot percentage sampling errors. In practice the equality of the schedule total ratings sampling
errors will depend upon correlations in viewing between spots.
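Under the independence assumption the near-constancy of the schedule total confidence interval can be reproduced directly. The sketch below uses a base of 459 (the London All Men sample described earlier), with which the tabulated values appear consistent; it is a simplification, ignoring the clustering and weighting effects of a real panel:

```python
import math

def schedule_ci95_pct(spots, tvr, n):
    """95% c.i. of schedule total TVRs as a percentage of the total,
    assuming statistical independence between spots (simple random
    sampling; correlated viewing between spots would change this)."""
    total = spots * tvr
    se_total = math.sqrt(spots * tvr * (100 - tvr) / n)
    return 2 * se_total / total * 100

# 200 total TVRs delivered by very different schedule shapes
for spots, tvr in [(10, 20), (20, 10), (40, 5), (200, 1), (400, 0.5)]:
    print(spots, "spots:", round(schedule_ci95_pct(spots, tvr, 459), 1))
```

This reproduces the narrow ±5.9% to ±6.6% band shown above, despite the single-spot percentage sampling errors varying by a factor of seven.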
We can get a feel for whether or not this works in practice by comparing the ITV and Channel 4 components of each schedule. On average, single ITV ratings are about three times as high as single Channel 4
ratings. In Table 12 the schedule components have been ranked according to total ratings delivered.
TABLE 12:
SAMPLING ERRORS FOR ITV AND CHANNEL 4 SCHEDULES

                          All Men
    Schedule   Channel   Total TVRs   95% c.i.
    IV         ITV           254        ±6%
    V          ITV           174        ±7%
    IV         CH4           171        ±6%
    II         ITV           148        ±6%
    III        ITV           140        ±7%
    II         CH4            71        ±9%
    V          CH4            64        ±9%
    III        CH4            36        ±9%
    I          ITV            33       ±15%
    I          CH4            19       ±18%
Although the evidence is not exactly in line with the hypothesis that schedules with equal total ratings
have equal sampling errors, it is certainly not the case that a schedule of low rating/high percentage
sampling error spots will have correspondingly high sampling error for the total ratings. But what about a
restricted availability channel?
To generate the same impacts, a restricted availability channel with a 20% penetration would need to generate
ratings five times as large within its own universe, i.e. 1,000 ratings in total. The equivalent schedule structures
and theoretical sampling errors, now based upon a sample of only ninety-two men, are shown in Table 13.
TABLE 13:
SAMPLING ERRORS FOR SCHEDULES ON RESTRICTED AVAILABILITY CHANNELS

    Single Spot   Single Spot   Number     Total   Total TVRs
    TVR           % s.e.        of Spots   TVRs    95% c.i.
    100              -             10      1000        -
     50             2.4%           20      1000      ±4.7%
     25             4.2%           40      1000      ±5.7%
      5            10.5%          200      1000      ±6.4%
      2.5          15.1%          400      1000      ±6.5%
Even with the ridiculously high 50 rating spots, the order of magnitude of the sampling errors is preserved.
This time the empirical evidence shown in Table 14 is very thin, with only two satellite schedules coming
close to the terrestrial channels total ratings levels.
TABLE 14:
SAMPLING ERRORS FOR SATELLITE SCHEDULES

                            All Men
    Schedule   Channel    Total TVRs   95% c.i.
    IV         Satellite      36         ±17%
    V          Satellite      17         ±20%
However, these seem to fit in with the hypothesis of equal sampling errors for equal schedule total ratings.
The last hypothesis considered is: Do schedules with equal impacts have the same sampling error for total
ratings irrespective of the sample size of the demographic sub-group analysed? This is analogous to the
restricted availability channel situation.
Again we can get a feel for whether or not this works in practice by re-examining the schedule sampling
errors. This time the Men 16-24 total ratings are multiplied by 62/459 (the ratio of the sample sizes) to
form percentages of the All Men universe (equivalent to a comparison of schedule total impacts) before
ranking according to the total ratings delivered, as shown in Table 15.
TABLE 15:
SAMPLING ERRORS FOR MAIN CATEGORIES VS. SUB-GROUPS

    Schedule   Channel     Category    Total TVRs   95% c.i.
    I          Total       All Men         55          ±9%
    IV         Total       Men 16-24       46         ±15%
    III        Channel 4   All Men         36          ±9%
    IV         Satellite   All Men         36         ±17%
    I          ITV         All Men         33         ±15%
    III        Total       Men 16-24       23         ±15%
    II         Total       Men 16-24       21         ±16%
    IV         Channel 4   Men 16-24       20         ±21%
    I          Channel 4   All Men         19         ±18%
    V          Total       Men 16-24       19         ±21%
    IV         ITV         Men 16-24       18         ±20%
    V          Satellite   All Men         17         ±20%
    III        ITV         Men 16-24       16         ±18%
    II         Satellite   All Men         14         ±18%
    II         ITV         Men 16-24       12         ±19%
    V          ITV         Men 16-24       12         ±26%
    III        Satellite   All Men         10         ±19%
    I          Total       Men 16-24        8         ±29%
    II         Channel 4   Men 16-24        7         ±24%
    IV         Satellite   Men 16-24        7         ±28%
    I          ITV         Men 16-24        5         ±25%
    I          Satellite   All Men          4         ±26%
There are some exceptions but in general the hypothesis holds. More importantly, it is certainly not the
case that Men 16-24 sampling errors are 2.7 (= √(459/62)) times as big.
channels. In these situations it is important to understand the sampling errors involved so that the best
use of existing panel data is made.
The sampling errors associated with audience measurements of individual minutes or commercial spots
are often huge. In the context of advertising schedules, any attempt at optimising the choice of individual
spots is often unjustified.
However, it is well known that the total ratings for whole schedules have much lower sampling errors
than the individual spots within a schedule. In fact it is broadly true that schedules with equal impacts
have equal sampling errors irrespective of the size of the individual spot ratings or the sample size of the
sub-groups analysed. Of course there are limiting situations in which this equality breaks down, but these
would correspond to unusually heavy advertising on an individual channel.
Undoubtedly this finding will be useful in many situations. However, it cannot be allowed to generate
complacency. In practice, a schedule on a low rating, restricted availability channel would never generate
total impacts for a sub-group which were equal to those for a main category on a high rating, national
channel.
Within the existing panels, there are measurements which provide a significantly more robust basis on which
to trade advertising air time. However, in many cases there is still no substitute for increased sample size.
Alternative measurement methods for low penetration channels
Choices
A recent paper (Franz 1997) points out that small rating stations may get neglected in media planning
because of their low representation on people meter panels. He suggests using independent samples
collecting data, say monthly, and capable of being aggregated to large numbers of respondents in a year.
One advantage is that a large sample size made up of independent samples reduces bias which may be
significant for small stations on a permanent panel. Out-of-home viewing may also be included in the
measurement. He advocates normalising the viewing levels to panel levels so that data can be used comparably (presumably separating out-of-home viewing). The use of data aggregated over time would mean
that the data would be for strategic planning and the panel data would provide some tactical information.
The techniques listed by Franz for a strategic television monitor are:
personal interviews: paper or CAPI,
computer aided telephone interviews,
self-completion diaries.
We have a case history to report from the United Kingdom using self-completion diaries, not in the continuous way suggested by Franz but as periodic snapshots.
The factoring principle assumes that the ratio of sub-group to main category viewing in a small area is the
same as at network level:

    s / m = S / M

where:
s = sub-group rating, small area;
m = main category rating, small area;
S = sub-group rating, network;
M = main category rating, network.
By re-arranging the above formula, we can derive the basic factoring model for estimating sub-group ratings in small sample areas:

    s = m × S / M
The model is improved by weighting the network panel to the small area panel profile before calculating
the network ratings S and M.
This approach inevitably leads to results which are more stable than actual small sample based data, because all the components on the right hand side of the formula are based upon larger samples. However, if
the principle is not valid, then results will be biased. The purpose of the evaluation is to examine the tradeoff between improved stability and potential bias.
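The factoring model itself is a one-line computation; the values below are purely hypothetical, chosen to show the mechanics:

```python
def factored_rating(m_small, s_network, m_network):
    """Basic factoring model: estimate the small-area sub-group rating
    from the small-area main category rating, scaled by the network
    sub-group/main-category ratio (s = m * S / M)."""
    return m_small * s_network / m_network

# Hypothetical: the main category rates 20 in the area; at network level
# the sub-group rates 8 against a main category rating of 16
print(factored_rating(20.0, 8.0, 16.0))  # 10.0
```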
Theoretical Reductions in Variability (Sampling Error)
The sampling error of a factored rating will be a function of the sampling errors of all three components of
the factoring formula and their correlations. The mathematics of the theory are quite tortuous, but if we
make some fairly well justified assumptions then the following relatively simple formula can be derived for
the relationship between the sampling errors for factored and actual ratings:

    s.e.(factored) / s.e.(actual) ≈ √( ns / nm )

where:
ns = small area sub-group sample size;
nm = small area main category sample size
TABLE 16:
THEORETICAL REDUCTIONS IN VARIABILITY

    Main Category   Sampling Error Reduction   Equivalent Sample Increase
    29%                       27%                       x1.9
    55%                        8%                       x1.2
    36%                       28%                       x1.9
    24%                       28%                       x1.9
    14%                       48%                       x3.7
Factored and published ratings are calculated for every quarter hour, for every channel, for every day, for
every week and input into the analysis of variance procedure. After allowing for as many known systematic
variations as possible (e.g. the daily quarter hour ratings pattern, the differences between channels) and
their interactions, the analysis of variance procedure calculates a residual variance. This is taken to be
the average variability for any particular quarter hour measurement and is used to compute the associated
coefficient of variation for a typical quarter hour rating. This is analogous to the percentage sampling error
for a quarter hour rating. Then we can calculate the percentage reduction in this coefficient of variation
for factored ratings against published ratings. Example results for London are shown in Table 17.
TABLE 17:
PRACTICAL REDUCTIONS IN VARIABILITY - LONDON

    Main Category   Sampling Error Reduction   Equivalent Sample Increase
    29%                       17%                       x1.5
    55%                        6%                       x1.1
    36%                       14%                       x1.4
    24%                       25%                       x1.8
    14%                       35%                       x2.4
So in terms of change over time, the reductions in variability are still worthwhile if not so large. The full
benefits of factoring will only be realised in comparisons between regions when the initial recruitment
sampling error component is also relevant.
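Because sampling error scales as 1/sqrt(n), a fractional reduction r in sampling error is worth a sample-size multiplier of 1/(1-r)^2. This sketch reproduces the equivalent sample increases shown in Table 17:

```python
def equivalent_sample_increase(reduction):
    """Sample-size multiplier implied by a fractional reduction in
    sampling error, since error scales as 1/sqrt(n)."""
    return 1.0 / (1.0 - reduction) ** 2

# London sampling error reductions from Table 17
for r in (0.17, 0.06, 0.14, 0.25, 0.35):
    print("x", round(equivalent_sample_increase(r), 1))
```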
Potential Bias
In practice it is very difficult to determine whether or not factored results are biased. The prediction error
will be a mixture of model bias and random error which are difficult to untangle due to the large sampling
errors associated with the actual sub-group measurements. All we can do is compare factored and actual
results at various levels of detail and search for exceptional differences. If exceptional differences are
always at times when the regional programming is different to the network, then there may be a problem.
Otherwise we have to judge the relative credibility of factored and actual results. Remember that in many
cases factoring is designed to replace unbelievable and erroneous results with more credible audience
measurements - by definition these would be different.
At the highest level of aggregation, Table 18 compares factored and actual four week all-time average ratings for Total TV and ITV in London in March 1996.
All these differences between factored and actual ratings are within sampling error. The largest differences
are for Women AB. However, at this time the actual data showed that Women AB viewing in London was
20% lower than in the whole network. Although viewing levels in London are expected to be lower, the
factored data seems to provide a more credible result.
TABLE 18:
FOUR WEEK AVERAGE RATINGS - FACTORED VS. ACTUAL - LONDON

    Total TV    ITV
    +1%         +1%
    +1%         +3%
    -3%         -4%
    +7%         +8%
    -2%         -2%
At greater levels of detail, the differences between factored and actual ratings are obviously greater but still within
sampling error. Another way to evaluate the factoring model is by examining exceptional differences at
the quarter hour level.
For example, amongst Men 16-24 in London, the biggest difference between factored and actual quarter
hour ratings was on BBC1 at 7:45pm on Thursday 21st March. At this time the actual rating was 13%
and the factored rating was 26%. The first thing to note is that the same programme was being shown in
London and across the whole network - this is not a bias caused by inconsistent programming. To put this
exceptional difference into context, Table 19 shows the actual and factored ratings in adjacent weeks:
TABLE 19:
EXCEPTIONAL DIFFERENCE - LONDON - MEN 16-24, THURSDAY 7:45-8:00PM

                Week 1   Week 2   Week 3   Week 4
    Actual        21       21       13       21
    Factored      25       30       26       27
In this case, the factored rating provides a more credible result in relation to the adjacent weeks, a finding
repeated for all the exceptions examined so far. However, it should be noted that our examination of exceptions has been based upon factoring from a reduced network panel which may minimise programme
schedule effects.
Summary
The factoring approach to small area sub-group audience measurement is still under test in the United
Kingdom. The advantages are significant in terms of reduced variability because factoring is equivalent to
adding between 50% and 100% to the current panel sample sizes but at virtually no additional cost.
Across a wide range of sub-groups and difficult areas, we have so far found no evidence of bias in the
factoring model. Analysis of exceptions always points to more credible factored results and factoring is no
worse during times of inconsistent programming between a region and the network. The potential disadvantages are that unforeseen changes in regional programming policy could disrupt the factoring principle
and that unfactored sub-group data would always be available to support any criticism of factored results.
The issue of potential bias is still under investigation and if the results are positive, then factoring could
provide a real solution to the small area sub-group audience measurement problem.
similar for the same number of rating points no matter whether arising from a single spot or a schedule.
Measurement of small audiences can therefore become as reliable as for large audiences when the small
audiences are combined together.
10. So for the fragmented audiences of the future, research systems have to change, and so must the ways
in which the research is used. Research users cannot go on looking at smaller audience forms by turning
up the magnification of a limited microscope and seeing an ever more blurred picture.
References
References
American Research Bureau Inc., New York. Arbitron replication: a study of the reliability of broadcast ratings, 1974.
Boon, A.K. den. The reliability of television audience ratings, in: ARF/ESOMAR Worldwide Electronic and
Broadcast Audience Research Symposium, 1994, Paris, France.
Franz, G. How to catch small fish: approaches to the measurement of small reach stations, in: ASI 1997
European Television Symposium, 1997, Budapest, Hungary.
Read, S. and Johnson, J. Audience measurement in the 21st Century, in: ASI 1997 European Television
Symposium, 1997, Budapest, Hungary.
Schillmoeller, E.A. Audience estimates and stability, in: ARF/ESOMAR Worldwide Broadcast Audience Research Symposium, 1992, Toronto, Canada.
Twyman, T. and Wilcox, S. The variability of audience measurement data and how to live with it, in: ARF/
ESOMAR Worldwide Electronic and Broadcast Audience Research Symposium, 1996, San Francisco, USA.
Wilcox, S. and Reeve, B. Statistical efficiencies in the new UK television audience measurement panels, in:
ARF/ESOMAR Worldwide Broadcast Audience Research Symposium, 1992, Toronto, Canada.