
Archaeometry 50, 1 (2008) 142–157

doi: 10.1111/j.1475-4754.2007.00359.x

ON STATISTICAL APPROACHES TO THE STUDY OF CERAMIC ARTEFACTS USING GEOCHEMICAL AND PETROGRAPHIC DATA*


M. J. BAXTER and C. C. BEARDAH


School of Biomedical and Natural Sciences, Nottingham Trent University, Clifton, Nottingham NG11 8NS, UK

I. PAPAGEORGIOU
Department of Statistics, Athens University of Economics and Business, 76 Patission Str., 10434 Athens,
Greece

M. A. CAU
Institució Catalana de Recerca i Estudis Avançats (ICREA)/ERAUB, Departament de Prehistòria, Història
Antiga i Arqueologia, c/ de Baldiri Reixac s/n 08028 Barcelona, Spain

P. M. DAY
Department of Archaeology, University of Sheffield, Northgate House, West Street, Sheffield, S1 4ET, UK

and V. KILIKOGLOU
Laboratory of Archaeometry, Institute of Materials Science, NCSR Demokritos, Aghia Paraskevi, 15310 Attiki,
Greece

The scientific analysis of ceramics often has the aim of identifying groups of similar
artefacts. Much published work focuses on analysis of data derived from geochemical or
mineralogical techniques. The former is more likely to be subjected to quantitative statistical
analysis. This paper examines some approaches to the statistical analysis of data arising
from both kinds of techniques, including mixed-mode methods where both types of data are
incorporated into analysis. The approaches are illustrated using data derived from 88 Late
Bronze Age transport jars from Kommos, Crete. Results suggest that the mixed-mode
approach can provide additional insight into the data.
KEYWORDS: CERAMICS, GEOCHEMICAL, LATE BRONZE AGE, MIXED-MODE,
MULTIVARIATE ANALYSIS, PETROGRAPHIC, THIN SECTIONS
University of Oxford, 2008

INTRODUCTION

The scientific analysis of archaeological ceramics is often undertaken with the aim of identifying
groups of similar artefacts. Much published work focuses on the analysis of data derived from
either geochemical or mineralogical techniques. Geochemical data lend themselves naturally
to analysis by quantitative statistical methods. Since the mid-1970s numerous papers have
been published on the use of multivariate analysis for the purpose of grouping such data (e.g.,
Bieber et al. 1976; Glascock 1992; Beier and Mommsen 1994; Baxter and Buck 2000).
*Received 8 November 2005; accepted 20 September 2006
Mineralogical data (and the focus in this paper is on those produced by thin-section
petrography) are less frequently recorded in a manner that invites quantitative analysis, and
studies in which both geochemical and mineralogical data are used, and combined, in a quantitative
way are comparatively rare. The aim in the present paper is to investigate some possible
approaches to what we shall call mixed-mode analysis, in which both kinds of data are studied
jointly in a quantitative fashion. The motivation is that the joint study of both forms of data is
likely to be more informative than separate study.
The main idea is to undertake a series of analyses that give different weights to the two
types of data. At one extreme only the thin-section data are used, while at the other only chemical
data are used. If these tell the same story, there is no real need to combine them in an analysis.
If apparently different patterns are observed using the two different data types then it is possible
that using them in combination will show interpretable patterning in the data, not readily seen
using either type separately.
The methods developed will be illustrated on a sample of 88 transport jars found in excavations
at Kommos, Crete. In the concluding section we review what has been achieved and consider,
briefly, other approaches to analysis that might be adopted.
One of the referees for this paper questioned the validity of combining petrographic and
geochemical data in a single analysis and concluded that perhaps it was acceptable, but it is
also dangerous. One reason for this suggestion (we comment on other reasons later) is that it
may be preferable to investigate the petrographic and geochemical data separately, and test the
results against each other investigating, in particular, the reasons for any discrepancies. In fact,
as noted in the discussion, our approach allows for this if it is the preferred option, and the
issue is discussed in Beardah et al. (2003). The focus in the present paper is on what we have
termed mixed-mode analysis, but in the light of the referee's comment we should emphasize
that we present it as an approach to analysis rather than the approach.
PRINCIPLES OF STATISTICAL ANALYSIS

Notation and general principles


Let Xc be an n × p data matrix describing the chemical composition of n artefacts. If thin sections
are available for each artefact, using the methods described in Cau et al. (2004), these can be
coded in the form of an n × q data matrix Xm. The q variables are binary, taking on the values
1 or 0, which reflect the presence or absence of qualities of the thin section deemed relevant
to the purpose of the analysis in mind.
This method of coding thin sections is a flexible approach that could be undertaken in more
than one way. In Cau et al. (2004), and here, a set of primary variables relating to the technology,
rock types and rock-forming minerals is initially defined. These are categorical variables with
variable k having L_k levels, from which L_k dummy variables corresponding to the levels can
be defined. This is done for each variable in turn, giving rise to an n × q matrix, where q = Σ_k L_k.
Each row of this matrix consists of 0s and 1s, the 1s corresponding to features observed in the
thin section. Full details are given in Cau et al. (2004). The system is not prescriptive, as
researchers are at liberty to define those variables considered to be most appropriate to their
problem.
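
The coding step can be sketched in a few lines of Python. This is only an illustration of the general idea of expanding categorical variables into 0/1 indicator columns; the variable names and levels below are hypothetical and are not those of the coding system in Cau et al. (2004).

```python
import numpy as np

def dummy_code(records, levels):
    """Expand categorical thin-section variables into 0/1 indicator columns.

    records: list of dicts, one per thin section, mapping variable -> observed level
    levels:  dict mapping variable -> list of its L_k possible levels
    Returns (X_m, names), where X_m has shape (n, sum of L_k over variables).
    """
    # One output column per (variable, level) pair, in a fixed order
    names = [(var, lev) for var, levs in levels.items() for lev in levs]
    X = np.zeros((len(records), len(names)), dtype=int)
    for i, rec in enumerate(records):
        for j, (var, lev) in enumerate(names):
            if rec.get(var) == lev:
                X[i, j] = 1  # feature observed in this thin section
    return X, names

# Hypothetical variables and levels, for illustration only
levels = {
    "temper": ["fine", "medium", "coarse"],
    "dominant_rock": ["limestone", "basalt"],
}
records = [
    {"temper": "fine", "dominant_rock": "basalt"},
    {"temper": "coarse", "dominant_rock": "limestone"},
]
X_m, names = dummy_code(records, levels)
```

Here q = 3 + 2 = 5, so X_m is a 2 × 5 matrix of 0s and 1s, one row per thin section.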
The same referee who queried the merits of combining petrographic and geochemical data
in a single analysis also wondered, at this point, whether this liberty allowed researchers to
choose variables that supported their preconceived ideas about the petrographic typology. The
question is a legitimate and interesting one. The possibility exists, but is not necessarily an evil
one. If the quantification supports a typology determined by other means, it can be argued
that it is doing its job, while allowing a more formal comparison of typologies suggested by
different methods (petrographic and chemical). That, as the referee has also suggested, this
might result in a combination of the data types resulting in adjusting geochemical groupings
so that they may be more in line with predetermined typological groupings, resulting in a spurious
validity being assigned to the groupings, is also a valid point. We would simply reiterate
here that, while we are focusing on mixed-mode analysis, our philosophy does allow for the
comparison of separate petrographic and geochemical analyses. The quantification of the
petrographic data, however undertaken, can be viewed as one way of facilitating this.
The previous paragraph is an attempt to discuss an important issue in a reasonably general
way. As far as the present study is concerned, the original petrographic classification, and
subsequent coding of the data, were undertaken by different individuals. While the coding was
undertaken with knowledge of the classification, it was based on a system developed independently for a different data set (Cau et al. 2004), and judged to be suitable for the purpose
to hand. That is, the petrographic attributes used were not specifically chosen to, in the referee's
words, 'best support the pre-determined typological classification', although as our results show
they do support the classification well. While the purpose of the present paper is primarily to
derive and illustrate methodology, researchers tempted by it do need to give careful thought to
the issues raised above and in the concluding paragraph of the introduction.
The data matrix Xc may be analysed by standard methods such as principal component
(PCA) or cluster analysis, usually after transformation and/or standardization of the variables.
If Zc denotes the n × r matrix of scores on the first r principal components, the usual hope is
that two- or three-dimensional plots based on a subset of the columns of Zc will reveal interpretable
structure in the data.
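
As a minimal sketch of how a score matrix such as Zc might be computed, the following Python function standardizes the columns and takes the leading principal component scores from a singular value decomposition. The optional log transformation and the synthetic data are assumptions made for illustration; they are not the specific choices made for the Kommos data.

```python
import numpy as np

def pca_scores(X, r=2, log_transform=False):
    """Return the n x r matrix of scores on the first r principal components."""
    X = np.asarray(X, dtype=float)
    if log_transform:
        X = np.log10(X)  # optional transformation; requires positive values
    # Standardize each variable to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: scores are U * s
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r] * s[:r]

# Synthetic stand-in for a chemical data matrix (10 artefacts, 4 elements)
rng = np.random.default_rng(0)
X_c = rng.lognormal(size=(10, 4))
Z_c = pca_scores(X_c, r=2, log_transform=True)
```

The columns of Z_c can then be plotted against each other to look for structure.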
The data matrix Xm may be treated in an essentially similar way, allowing for its binary
nature, either by using correspondence analysis (CA) which can be thought of as a weighted
form of PCA, or by the direct application of PCA, which is equivalent to classical metric multidimensional scaling (MDS). Other forms of (non-metric) MDS are also available. In practice
it can be useful to compare different methods, since they can emphasize different (interpretable)
structures in the data. Any single analysis results in a matrix of scores, Zm, which can be used
in the same way as Zc for identifying structure in the data.
Two possible approaches to mixed-mode analysis are described. In the first a matrix of
scores Zα is obtained, where α reflects the relative weight given to the two kinds of data.
For α = 0, Zm is obtained, while as α increases through whole-number values Zα → Zc. In the
second approach a matrix of scores Zβ is obtained, which behaves in a similar way to Zα as β
varies smoothly from 0 to 1.
Comparing different analyses
Principles will be discussed first before some important practicalities are noted. Most simply,
and informally, two- or three-dimensional plots based on (subsets of) the r components may
be compared visually.
More formal comparisons may be undertaken using some form of Procrustes statistic. Such
statistics measure how close two r-dimensional configurations of data are after rotating,
reflecting and rescaling the data to match the configurations as closely as possible. For two
sets of scores, Zi and Zj, one such statistic, developed by Sibson (1978), is defined as
$$\gamma = 1 - \frac{\{\operatorname{tr}[(Z_i^T Z_j Z_j^T Z_i)^{1/2}]\}^2}{\operatorname{tr}(Z_i^T Z_i)\,\operatorname{tr}(Z_j^T Z_j)},$$

where tr(.) is the trace and T the matrix transpose operator. Other Procrustes statistics could be
used; however, this one has the merit of symmetry, as Zi and Zj can be interchanged without
affecting the result. It takes values between 0 and 1, with 0 arising for identical configurations.
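
A possible computation of this statistic is sketched below in Python. It relies on the fact that the trace of the matrix square root of Zi^T Zj Zj^T Zi equals the sum of the singular values of Zi^T Zj. The column centring applied first is an assumption of this sketch; it has no effect on principal component scores, which are already centred.

```python
import numpy as np

def procrustes_gamma(Zi, Zj):
    """Symmetric Procrustes statistic: 0 for configurations that match exactly
    after rotation, reflection and rescaling; at most 1."""
    Zi = np.asarray(Zi, dtype=float)
    Zj = np.asarray(Zj, dtype=float)
    # Centre each configuration (an assumption of this sketch)
    Zi = Zi - Zi.mean(axis=0)
    Zj = Zj - Zj.mean(axis=0)
    # tr[(Zi^T Zj Zj^T Zi)^(1/2)] is the sum of singular values of Zi^T Zj
    sv = np.linalg.svd(Zi.T @ Zj, compute_uv=False)
    return 1.0 - sv.sum() ** 2 / (np.trace(Zi.T @ Zi) * np.trace(Zj.T @ Zj))

# A configuration and a rotated, rescaled, shifted copy of it
A = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 3.0], [2.0, 1.0]])
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 3.0 * A @ R + 5.0
gamma_same = procrustes_gamma(A, B)    # numerically zero

# An unrelated random configuration for comparison
rng = np.random.default_rng(0)
C = rng.normal(size=A.shape)
gamma_other = procrustes_gamma(A, C)   # somewhere between 0 and 1
```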
The main practical consideration is that both visual and formal comparisons can be badly
affected by outliers present in either the chemical or mineralogical data sets. The sensible
approach here is to identify obvious outliers from either analysis, remove any such outliers
from both analyses, repeat the statistical analysis, and proceed in an iterative fashion until outliers
are judged not to be a problem. The subset of outliers, if any, identified in this way should not,
of course, be ignored, but should be considered separately when substantive interpretation of
the data is attempted.
Mixed-mode analysis: first approach
Our first approach to mixed-mode analysis rests on the idea of defining a dissimilarity coefficient
between cases using all the available data, and subjecting the resulting dissimilarity matrix to
some form of MDS. This requires that dissimilarity between cases be defined. A seminal
paper in this regard is Gower (1971). Let d(i, j) be the dissimilarity coefficient between cases
i and j and let m = (p + q). Kaufman and Rousseeuw (1990) generalize Gower's coefficient by
defining

$$d(i, j) = \frac{\sum_{k=1}^{m} \delta_{ij}^{(k)} d_{ij}^{(k)}}{\sum_{k=1}^{m} \delta_{ij}^{(k)}} \in [0, 1],$$

where $d_{ij}^{(k)}$ is the contribution of variable k to d(i, j) and $\delta_{ij}^{(k)}$ is the weighting of variable k and
depends on the variable type (for details concerning computation, see the appendix).
This is a fairly general definition, and for present purposes we specialize to the case where
variables are binary or continuous. For continuous data,

$$d_{ij}^{(k)} = \frac{|x_{ik} - x_{jk}|}{r_k}$$

and $\delta_{ij}^{(k)} = 1$, where $r_k$ is the range of variable k, so that the contribution of the variable is
between 0 (identical) and 1 (most different). Here, $x_{ik}$ is the value of variable k for case i.
Binary variables may be treated symmetrically or asymmetrically. In the former case, 0–0
and 1–1 matches are treated as equally indicative of similarity; in the latter case, 0–0 matches
are not regarded as indicative of similarity. In an asymmetric treatment, which is the one used
here, the fact that two thin sections do not, for example, include a particular rock type is not
regarded as indicative of similarity, whereas the fact that they do is regarded as evidence
of similarity. Thus, define $d_{ij}^{(k)}$ to be 0 if $x_{ik} = x_{jk}$ and 1 otherwise, and define $\delta_{ij}^{(k)} = 1$ unless
$x_{ik} = x_{jk} = 0$, in which case it is equal to 0.
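
The definitions above can be sketched as a short Python function that combines range-scaled continuous variables with asymmetrically treated binary variables. The function name and the toy data are hypothetical, and the sketch assumes every continuous variable has a non-zero range; it is not the software used for the analyses reported here.

```python
import numpy as np

def gower_dissimilarity(X_m, X_c):
    """Gower-type dissimilarity for binary (asymmetric) plus continuous data."""
    X_m = np.asarray(X_m, dtype=int)
    X_c = np.asarray(X_c, dtype=float)
    n, p = X_c.shape
    ranges = X_c.max(axis=0) - X_c.min(axis=0)   # r_k, assumed non-zero
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Weight is 0 only when a feature is absent from both sections
            delta = (X_m[i] + X_m[j]) > 0
            d_bin = (X_m[i] != X_m[j]).astype(float)
            d_con = np.abs(X_c[i] - X_c[j]) / ranges
            numerator = (delta * d_bin).sum() + d_con.sum()
            denominator = delta.sum() + p        # continuous weights are all 1
            D[i, j] = D[j, i] = numerator / denominator
    return D

# Toy data: 3 cases, 3 binary features, 2 continuous variables
X_m = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
X_c = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
D = gower_dissimilarity(X_m, X_c)
```

For cases 1 and 2 of the toy data, the third binary feature is absent from both and is ignored, so the dissimilarity is (1 + 1.5)/(2 + 2) = 0.625.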
Notwithstanding suggestions to the contrary in the first edition of Shennan's (1988) text,
Gower's coefficient does not seem to have been widely used in published archaeological
applications (Baxter 2003, 94). One possible reason for this is that such analyses tend to be
dominated by the binary data, Xm, at the expense of the continuous data Xc. For the situation
to which we have specialized, a possible way round this potential problem is to generalize the
definition of d(i, j) as follows. Define

$$B = \sum_{k=1}^{q} \delta_{ij}^{(k)} d_{ij}^{(k)}$$

and

$$C = \sum_{k=1}^{p} d_{ij}^{(k)},$$

which are the contributions to the numerator of d(i, j) of the binary and continuous variables.
Now generalize the definition of d(i, j) to

$$d_\alpha(i, j) = \frac{B + \alpha C}{b + \alpha p} \in [0, 1],$$

where b is the number of binary variables for which $\delta_{ij}^{(k)} = 1$, and α is a weighting factor that
has the value 1 in the original definition.
For practical applications it is necessary to decide on a suitable value of α and calculate the
$d_\alpha(i, j)$. Given software that allows calculation of Gower's coefficient, or generalizations of it,
the following simple method has proved to be effective. Use the notation
Xm + Xc = [Xm | Xc]
to refer to the partitioned data matrix of the original data. Analysis of this corresponds to using
α = 1 in the analysis. To give more weight to the chemical data, the idea is to augment this
data matrix with copies of Xc so that, for example,
Xm + 2Xc = [Xm | Xc | Xc]
would correspond to a choice of α = 2.
Rather than attempting to determine an optimal value of α, we have found it useful to
examine a series of views for different values. For a sufficient (relatively small) number of
copies of Xc, the analysis is essentially that of the chemical data only; for α = 0, analysis is of
the mineralogical data only. Computational aspects of this approach are discussed in the
appendix.
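
The equivalence between weighting the chemical contribution and augmenting the data matrix with copies of Xc can be checked numerically. The Python sketch below uses a hypothetical function and toy data; it evaluates the weighted coefficient directly and compares it with the unweighted coefficient applied to [Xm | Xc | Xc]. Continuous variables are assumed to have non-zero ranges.

```python
import numpy as np

def weighted_gower(X_m, X_c, alpha=1.0):
    """Weighted Gower-type dissimilarity (B + alpha*C) / (b + alpha*p)."""
    X_m = np.asarray(X_m, dtype=int)
    X_c = np.asarray(X_c, dtype=float)
    n, p = X_c.shape
    ranges = X_c.max(axis=0) - X_c.min(axis=0)    # assumed non-zero
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            delta = (X_m[i] + X_m[j]) > 0          # joint absences get weight 0
            B = np.sum(delta & (X_m[i] != X_m[j])) # binary contribution
            b = delta.sum()                        # binary variables in play
            C = np.sum(np.abs(X_c[i] - X_c[j]) / ranges)
            denom = b + alpha * p
            # Guard: denom is 0 only if alpha = 0 and both rows are all zeros
            D[i, j] = D[j, i] = (B + alpha * C) / denom if denom > 0 else 0.0
    return D

X_m = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
X_c = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])

D_alpha2 = weighted_gower(X_m, X_c, alpha=2)                     # direct weighting
D_copies = weighted_gower(X_m, np.hstack([X_c, X_c]), alpha=1)   # [Xm | Xc | Xc]
D_petro = weighted_gower(X_m, X_c, alpha=0)                      # petrography only
```

Doubling the columns of Xc doubles both C and p, which is exactly the effect of setting α = 2, so D_alpha2 and D_copies agree.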
Another practical concern is that it is to be expected that, for data matrices of the size and
type being used, no single low-dimensional projection of the data will reveal all the structure
present. This suggests that outliers and groups revealed in initial analysis of the data should
be removed, and analysis repeated to reveal further structure in the data. This will be
referred to as iterative analysis, or peeling off of the more obvious outliers and structure in
the data.
Mixed-mode analysis: second approach
The mixed-mode analysis described above relies upon the ability to calculate (for example,
via Gower's coefficient) a measure of the dissimilarity between cases when variables are of
differing type. The resulting dissimilarity matrix is then subjected to some form of MDS. An
alternative weighted mixed-mode analysis can be undertaken by separately calculating
dissimilarities between cases on the basis of (a) the continuous chemical data and (b) the
mineralogical binary data, before combining this information as described below.
Let Dc be the dissimilarity matrix for the chemical compositional data and Dm that for the
coded mineralogical binary data, and assume that these are scaled so that entries lie between
0 and 1. We can now form a new dissimilarity matrix, Dβ, using

$$D_\beta = \beta D_c + (1 - \beta) D_m.$$

Here, β is a mixing parameter that lies between 0 and 1. If β = 0, then Dβ = Dm and the
dissimilarities are based only on the mineralogical data. On the other hand, if β = 1, then Dβ = Dc
and the dissimilarities are based only on the chemical compositional data. Weighted mixed-mode
analyses can be performed by choosing intermediate values of β. Values of β close to 0 assign
greater weight to the binary data, whilst values close to 1 favour the chemical compositional
data; a value of β = 0.5 would give the two analyses equal weight. For display purposes, the
resulting dissimilarity matrices are subjected to some form of MDS. In principle, we prefer to
display results for a sequence of values as β is changed from 0 to 1, rather than attempting to
identify an optimum value of β, but see the final section for further discussion of this.
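
A minimal Python sketch of this second approach follows: two dissimilarity matrices scaled to [0, 1] are mixed with weight β and each mixture is passed to classical (metric) MDS. The synthetic data, the Euclidean and simple-matching dissimilarities used as stand-ins, and the function names are all assumptions made for illustration; other dissimilarities and other forms of MDS could be substituted.

```python
import numpy as np

def mixed_dissimilarity(D_c, D_m, beta):
    """Return beta * D_c + (1 - beta) * D_m for 0 <= beta <= 1."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    return beta * np.asarray(D_c, dtype=float) + (1.0 - beta) * np.asarray(D_m, dtype=float)

def classical_mds(D, r=2):
    """Classical metric MDS: scores from the double-centred squared dissimilarities."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    G = -0.5 * J @ (D ** 2) @ J
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1][:r]           # r largest eigenvalues
    lam = np.clip(evals[order], 0.0, None)        # drop any negative eigenvalues
    return evecs[:, order] * np.sqrt(lam)

# Synthetic stand-ins for chemical (continuous) and petrographic (binary) data
rng = np.random.default_rng(1)
X_c = rng.normal(size=(6, 3))
X_m = rng.integers(0, 2, size=(6, 4))

D_c = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=2)
D_c = D_c / D_c.max()                                        # scale to [0, 1]
D_m = np.abs(X_m[:, None, :] - X_m[None, :, :]).mean(axis=2) # simple matching, in [0, 1]

# One two-dimensional view per value of the mixing parameter
scores = {beta: classical_mds(mixed_dissimilarity(D_c, D_m, beta), r=2)
          for beta in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

Plotting the entries of scores side by side gives the sequence of views from petrography only (β = 0) to chemistry only (β = 1).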
The approach just described is essentially a specialized version of the method discussed in
unpublished work of Neff et al. (1988). They, in turn, ascribe the methodology to Romesburg
(1984). So far as we are aware, it has not been illustrated in published archaeometric applications.
EXAMPLES

To illustrate, data from a set of 88 Late Bronze Age transport jars found in excavations at
Kommos, Crete, will be used. Using neutron activation analysis, all samples were analysed for
the elements Sm, Lu, Yb, Na, Ca, Ce, Th, Cr, Hf, Cs, Rb, Sc, Fe, Co, Eu, La, As, Sb, U and Tb.
For reasons of precision, the last four of these were not used in statistical analysis. Additionally,
four samples had incomplete chemical information for the elements used, and have been omitted
in all the analyses to follow.
Initially, on the basis of typological evidence, the jars were (separately and independent of
statistical analysis) classified as Cretan (34 samples) and imported material (54). The latter
group was classified as Canaanite (32) or Egyptian (22). Subsequently, on the basis of the thin
sections the Cretan material was divided into 10 fabric groups, and the imported material into
12. Of these 22 groups, nine consist of a single specimen, and a further six of two specimens
only. For the purposes of the quantitative analyses to be described here the thin-section data
were coded in binary form, as discussed in the first part of the previous section, and described
fully in Cau et al. (2004), using a coding system similar to that given there.
In what follows all the data are analysed, to illustrate our general approach. It proves possible
to separate out most of the Cretan samples from the imports. A more detailed analysis is then
undertaken of the imported material. It should be emphasized that while some discussion of
the archaeological import of our analyses is necessary, the prime emphasis is on illustrating
methodological matters. A full archaeological discussion will be published elsewhere.
Table 1 is intended as a succinct reminder of the methodologies used.
Example 1: analysis of all the data
Figure 1, based on a correspondence analysis (CA) of the petrographic data only, shows four
clear petrographic outliers. Two of these correspond to singleton fabric groups identified in the
original interpretation of the thin sections, the remaining two (closely associated) outliers
corresponding to a similarly identified group of two cases. All are Canaanite samples. The
Table 1 In approach 1, a modified version of Gower's similarity coefficient is used in which the value of α determines
the relative weight given to the geochemical data. For α = 0, the analysis is based on the petrographic data alone; α = 1
notionally gives equal weight to both forms of data but, for reasons noted in the text, values greater than 1 may
sometimes be preferred. What constitutes 'Large' may depend on the data set; in our application values of about 5
or 6 ensured the dominance of the geochemical data. In approach 2, separate dissimilarity matrices are computed for
the two kinds of data, which are then combined with a weighting determined by β, where 0 ≤ β ≤ 1

                  Petrography   Mixed (equal weight)   Chemistry
Approach 1 (α)    0             1                      Large
Approach 2 (β)    0             0.5                    1

Figure 1 A three-dimensional component plot based on correspondence analysis of the Kommos petrographic data,
using both Cretan and imported material. The main aim is to identify four petrographic outliers, omitted from
subsequent analyses. Key to samples: o, Cretan; , Canaanite; +, Egyptian.

Cretan material separates out fairly well from the imported material, apart from three samples
that seem similar to Egyptian samples. A PCA of the standardized chemical data (not illustrated)
identified four clear chemical outliers, one of which was also a petrographic outlier. All four
chemical outliers correspond to samples identified as loners in the original interpretation of the
thin sections. The seven outliers (three petrographic, three chemical, and one both a petrographic
and chemical outlier) were subsequently omitted from all analyses, to allow formal comparisons
between different plots.
Thus, Figures 2 and 3 repeat the CA and PCA analyses after this omission, and are labelled
according to whether the samples are Cretan, Canaanite or Egyptian. In the petrographic analysis
of Figure 2, separation between the three groups is reasonably good, although the boundary
between the Canaanite and Egyptian samples would be difficult to identify without prior
knowledge of the identifications. Two of the Cretan samples seem more akin to Egyptian samples,
while one Egyptian sample is firmly located within the Canaanite group. We note, again in
response to a request for clarification from a referee, that the labelling 'Canaanite' and so on
was initially undertaken without reference to the petrography, but there is some overlap in
Figure 2 A two-dimensional component plot based on correspondence analysis of the Kommos petrographic data,
using both Cretan and imported material and omitting the four outliers identified in Figure 1. A further three outliers,
identified in analysis of the chemical data, have also been omitted. Key to samples: o, Cretan; , Canaanite;
+, Egyptian.

Figure 3 A two-dimensional component plot based on PCA of the Kommos chemical data, using both Cretan and
imported material and omitting seven chemical or petrographic outliers as in Figure 2. Key to samples: o, Cretan;
, Canaanite; +, Egyptian.

the petrographic classification of the Canaanite and Egyptian samples, so that clear separation
is not to be expected in the statistical analysis of the petrographic data.
In the chemical analysis of Figure 3, all but two of the Cretan samples separate out clearly
from all but two of the imported samples. The separation between the Egyptian and Canaanite
material is less good than for the petrographic analysis. The broad patterns evident in the two
Figure 4 A two-dimensional component plot based on mixed-mode analysis (α = 1) of the Kommos chemical and
petrographic data, omitting chemical and petrographic outliers. Key to samples: o, Cretan; , Canaanite; +, Egyptian.

analyses are similar, but the detail is not, as evidenced by visual comparison of Figures 2 and 3,
and the fact that γ = 0.60. This suggests that it is worth attempting a mixed-mode analysis.
Figure 4 shows the results from the first approach to mixed-mode analysis, using α = 1. The
second approach to mixed-mode analysis that was outlined produced similar results for β = 0.5
and is not illustrated. Experimentation with values of α > 1 and β ≠ 0.5 did not produce any
additional insights into the data. Arguably, the results are more satisfactory than for the chemical
and petrographic analyses alone. All the Cretan material now separates from all but two of the
imports (which remain located within the Cretan group on inspection of higher-order components).
The Egyptian material is possibly slightly better separated from the Canaanite material, one
sample apart, than in the petrographic analysis, although this is a fine judgement to make.
To summarize the analysis so far, treating the division into Cretan, Canaanite and Egyptian
material as given, we have shown that the mixed-mode approach to analysis is slightly more
successful at recovering these distinctions than the separate analysis of chemical or (quantitatively
formulated) petrographic data. This is after peeling off some obvious outliers, and removing
them from both the chemical and petrographic data sets to facilitate comparisons.
Example 2: analysis of the imported material only
With the exception of between two and four cases, the Cretan material is convincingly separated
from the imported material, and for the purposes of further illustration we shall now concentrate
on the latter, with a view to seeing how well the Egyptian and Canaanite samples can be
separated, and whether there are subgroups within them.
On the basis of the original thin-section analysis, the imported material was classified into
12 fabric groups. Of these, five groups consisted of only one sample and two groups of only
two samples. The remaining five fabric groups contained between three and 20 samples. Four
of the fabric groups were divided into subgroups, consisting in some cases of just two or three
samples. Three samples had incomplete chemical information.
Figure 5 A two-dimensional component plot based on correspondence analysis of the Kommos petrographic data
for the imported material omitting outliers. Numbers identify Canaanite fabric groups and E indicates an Egyptian
origin.

Initial data analysis using the petrographic data confirmed that most of the very small
groups were indeed distinctive and, in the peeling-off procedure previously described, after
three iterations most of the singletons and doubleton groups and subgroups were removed
from the analysis.
A similar analysis based on the chemical data largely identified the same outliers as the
petrographic analysis. The main exception to this generalization is that a fabric group of two
samples was clearly chemically (though not petrographically) distinct. After removing all outliers
in advance of further analysis the four largest groups were left, along with one doubleton, and
a single survivor from a fabric group of three that had been classed in a subgroup of its own.
In Figures 5–8, Egyptian samples have been labelled with an E (these include the doubleton
noted above), while Canaanite samples are labelled 1–5 according to their original classification
based on the thin-section analysis. Some of the finer distinctions made in the original classification
have been suppressed but, where appropriate, we will note these in our commentary. Canaanite
group 1 consists of specimens that were grouped with the bulk of the Egyptian specimens in
the original thin-section analysis. The singleton labelled 5 in the plots (the single survivor
referred to above) has been retained.
Two points should be emphasized here. The first is that the coding system used for quantitative
analysis is designed to reflect the properties of the thin sections used to define the fabric
groups. It is therefore to be hoped that this analysis will identify the more obvious features of the
fabric grouping, such as singletons. That it does so suggests that the approach to quantification
used is sufficiently sensitive to be combined with the chemical data in a mixed-mode
approach. We shall also see shortly that it is capable of suggesting subgroups not identified in
the original thin-section analysis.
The second point is that the emergence of some samples as clear petrographic outliers is
only obvious after having peeled off the Cretan data. This is illustrative of the fact that
low-dimensional plots will initially be dominated by the more obvious structure in the data, subtler
features only being evident after obvious structure is stripped out.
Figure 6 A two-dimensional component plot based on PCA of the Kommos chemical data for the imported material
omitting outliers. Numbers identify Canaanite fabric groups and E indicates an Egyptian origin.

Figure 7 A two-dimensional component plot based on mixed-mode analysis (α = 1) of the Kommos imported
material omitting outliers. Numbers identify Canaanite fabric groups and E indicates an Egyptian origin.

Figures 5–7 are similar to Figures 2–4. In Figure 5, based on the quantified petrographic
data, the Egyptian samples that separate out to the right of the plot form two subgroups (not
separated in the thin-section classification). Intermingled with the larger of these is a Canaanite
sample. All these samples were originally classified into the same subgroup. Egyptian and
Canaanite samples (coded 1) that do not plot to the right all belong to different groups or
subgroups.
Figure 8 A two-dimensional component plot based on mixed-mode analysis (α = 3) of the Kommos imported
material omitting outliers. Numbers identify Canaanite fabric groups and E indicates an Egyptian origin.

The Canaanite fabric group 3 of five samples (only three show, as there are coincident
plotting positions) and the singleton fabric 5 plot coherently at the top-centre. The other main
concentration to the left of the plot mixes samples from the other two larger Canaanite groups,
2 and 4.
The chemical analysis of Figure 6, as might be expected, does less well at separating out the
fabric groups defined on the basis of thin-section analysis. The Egyptian material, with the
exceptions noted for Figure 5, plots coherently at the bottom of the plot, along with both
the Canaanite samples associated with the Egyptian samples in the original petrographic
classification. Three of four group 4 samples plot fairly well together, as do eight of nine
group 2 fabrics, and the separation is better than in the petrographic analysis of Figure 5.
Group 3 does not plot coherently.
Figure 7 shows the results from our first mixed-mode approach with α = 1. It does a better
job at distinguishing the Egyptian from the Canaanite material, while suggesting that the main
Egyptian group could be divided into two. Only one Egyptian sample, from a separate subgroup
in the original classification, plots well away from the main concentration. The picture is
otherwise fairly similar to Figure 5, with one group 3 sample plotting much further away from
the main group 3 concentration.
None of the plots examined so far convincingly isolate fabric groups 2 and 4 from each
other and the remaining groups, but we observed that three of the four samples from group 4
did separate out in Figure 6. This suggests that increasing the weight given to the chemical
data might effect better separation, and Figure 8 shows the results of using the first
mixed-mode approach with α = 3. This, more so than the other plots examined, shows most of groups
2 and 4 plotting coherently and separately. There are individual samples within these fabric
groups, and others that are clearly petrographically and/or chemically very distinctive from
their fellows as originally identified.
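As a rough illustration of how a weight on the chemical data can enter a Gower-type dissimilarity, the following Python sketch may help. It is not the S-PLUS implementation used for the analyses reported here. The function name `gower_dissimilarity`, the `chem_weight` argument, and the treatment of the petrographic variables as nominal codes are all assumptions made for illustration.

```python
import numpy as np

def gower_dissimilarity(chem, petro, chem_weight=1.0):
    """Weighted Gower-type dissimilarity for mixed data (illustrative sketch).

    chem        : (n, p) array of continuous chemical variables
    petro       : (n, q) array of coded petrographic variables,
                  treated here as nominal (an assumption)
    chem_weight : weight on each chemical variable; each petrographic
                  variable keeps weight 1 (an assumption)
    """
    chem = np.asarray(chem, dtype=float)
    petro = np.asarray(petro)

    # Range-scale each continuous variable so that every
    # per-variable difference lies between 0 and 1.
    ranges = chem.max(axis=0) - chem.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute zero difference
    d_chem = np.abs(chem[:, None, :] - chem[None, :, :]) / ranges

    # Nominal variables: 0 if two samples share a code, 1 otherwise.
    d_petro = (petro[:, None, :] != petro[None, :, :]).astype(float)

    # Weighted average of the per-variable dissimilarities.
    total_weight = chem_weight * chem.shape[1] + petro.shape[1]
    return (chem_weight * d_chem.sum(axis=2) + d_petro.sum(axis=2)) / total_weight
```

Under these assumptions, a weight of 0 reproduces a purely petrographic dissimilarity, and increasing the weight moves the result towards a purely chemical one.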
Analyses were also undertaken using our second mixed-mode approach, and these produced
essentially similar results to those for our first approach. We comment on this in a little more
detail in our concluding discussion.
DISCUSSION AND CONCLUSION

The main aim of this paper has been to propose, illustrate and evaluate what we have termed
a mixed-mode approach to the quantitative analysis of chemical and petrographic data
obtained in ceramic provenance studies. A novel feature of our approach, we think, is our
advocacy of the merits of examining several different views of the data, rather than trying to
identify an optimal view that captures all the interesting features in the data. These different
views correspond to the different weighting given to the chemical and petrographic data,
with at one extreme only the chemical data being used, and at the other extreme only the
petrographic.
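Each view referred to above is a low-dimensional ordination of a dissimilarity matrix. A minimal sketch of one such ordination, classical (metric) MDS, is given below in Python. The function name `classical_mds` is hypothetical, and the sketch is offered only as an illustration of the general technique, not as the code used for the figures.

```python
import numpy as np

def classical_mds(d, n_dims=2):
    """Classical (metric) MDS of a symmetric dissimilarity matrix d.

    Returns an (n, n_dims) array of coordinates whose inter-point
    Euclidean distances approximate the entries of d.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]

    # Double-centre the squared dissimilarities.
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring

    # Keep the eigenvectors belonging to the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]

    # Negative eigenvalues (non-Euclidean input) are truncated to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

Applying the same function to dissimilarity matrices built with different weightings would give a set of plots that can be compared side by side.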
A perfectly valid approach to mixed-mode analysis, though we have gone further, would be
simply to compare, and synthesize the conclusions from, these two extreme views. This
requires that the petrographic data be quantified, a subject discussed in detail in Cau et al.
(2004). Another feature of our approach is that we advocate an iterative approach to analysis
in which obvious outliers and groups are identified, and then removed from both sets of data
before effecting comparisons.
To evaluate the effectiveness of our methodology, we initially took as given the distinction
between the three main provenance groups. It proved relatively easy to separate the Cretan
material from the rest. Having removed this from the analysis, we then took as given the fabric
groups defined on the basis of qualitative analysis of the thin sections. From this perspective
our approach seems quite successful, and threw up some surprises.
With the caveat that in several of the fabric groups there were single samples that did not
cluster well with others from their group, it proved possible to separate the different groups
fairly well and, inter alia, separate the Egyptian and Canaanite material. This, and this reflects
one of the advantages of our approach, could not be done in a single view. For example, the
petrographic and mixed-mode (weighting parameter set to 1) analyses fail to separate out fabric groups 2 and 4,
whereas analyses that give more weight to the chemical data (e.g., the mixed-mode analysis
with the weighting parameter set to 3 in Fig. 8) do so. In Figure 8 one fabric 4 case, to the middle of the plot, separates
out quite clearly from the other three cases. It is interesting that in a study of comparative
Canaanite material, carried out after the original drafting of this article but independently of it,
this isolated fabric 4 case was reclassified as fabric 2, to which it plots more closely in the figure.
We have retained the original labelling to emphasize that our approach is capable of identifying
anomalies in the classification that may indeed require rectification.
The mixed-mode analysis (weighting parameter set to 1) successfully separates out most of the Egyptian material
from the rest. The petrographic analysis shows some Egyptian samples that separate from the
other Egyptian material, but these belong to different subgroups, information that, for clarity,
has been suppressed in the plots. The chemical analysis shows that the specimens involved
have rather different chemical compositions from the other Egyptian samples. The singleton 5
was most successfully isolated in the mixed-mode analysis.
A feature of the petrographic analysis, not identified in the original classification, was that
the main Egyptian group split clearly into two subgroups. This was also evident in the mixed-mode
analysis, but not in the chemical analysis, where it plotted as a coherent chemical group.
How effective would our analysis have been had we lacked the thin-section classification,
and interpretation in terms of provenance, that has informed our discussion so far? We could
certainly have separated the Cretan from the other material. This might have taken with it two
samples (1 Canaanite, 1 Egyptian) that are, however, clear petrographic outliers with respect
to the other Canaanite and Egyptian samples.
If one now views Figures 5–7 and ignores the labelling, simply looking for pattern in terms
of cluster structure, to our mind the mixed-mode analysis of Figure 7 seems most satisfactory.
The two groups to the right of the plot are clear as, we think, are the groups consisting of 3s
in the top centre, and of 2s and 4s to the left. Remaining interpretation is more subjective, but
we would be inclined to identify 5 and 3 in the top-left as isolated cases and either treat the
remaining six cases as a single group or split it into a group of three cases, and three more isolated
specimens. However this is done, the groups so identified can be labelled and such labelling
applied to other plots. This would immediately show, for example, that in the largest group to
the bottom-right of Figure 7, the two Egyptian samples (in fact from a different fabric group
from the others) separate out on the petrographic plot and are chemically different from each
other, and that the sample labelled 2 is also distinct. Those plots weighted towards the chemical
data also show that the 2s and 4s in the leftmost group separate out.
In other words, starting from the mixed-mode plot of Figure 7 without assuming labelling,
it is possible to identify all the different fabric groups reasonably successfully, and additionally
to split the largest group into two subgroups. Furthermore, it is possible to identify outliers relative
to their presumed group whose classification might be questioned and re-evaluated. As this is
primarily a methodological paper these issues will be pursued elsewhere. The present claim is
that our analyses suggest that the mixed-mode methodology, in which comparison of different
analyses plays an integral part, offers potential advantages compared to the quantitative analysis
of petrographic or chemical data only.
Two approaches to mixed-mode analysis have been suggested. The second of these is apparently
the more sensitive, since it allows finer control over the weighting of the different types of
data, but we have not illustrated it here. One reason is that results for the first approach with its
weighting parameter set to 1 and for the second approach with its parameter set to 0.5 were
very similar. Another reason is that we found that, for parameter values not very different from 0.5,
analyses very quickly became similar to an analysis of either the petrographic data alone or
the chemical data alone, depending on the direction in which the parameter was varied. This is illustrated in
Figure 9, which shows how Sibson's coefficient, measuring the similarity with the petrographic
and chemical analyses, varies with the weighting parameter. Thus, for the analyses reported here, experimenting
with different values of the parameter added little to those analyses reported. Whether or not this will
generally be the case is unclear, and would need more experience to decide.

Figure 9 For the second mixed-mode approach, the graph shows how closely the results compare to the petrographic
(solid line) and chemical analyses (dashed line) alone as the weighting parameter varies, as measured by Sibson's
coefficient, for the two-dimensional plots.
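To make the idea of comparing two ordinations concrete, the sketch below computes a symmetric Procrustes statistic of the kind studied by Sibson (1978). The exact form of the coefficient used for Figure 9 is not restated in this section, so the formula, the function name `procrustes_gamma`, and the convention that smaller values mean closer agreement are assumptions. The statistic is invariant to translation, rotation, reflection and uniform scaling of either configuration.

```python
import numpy as np

def procrustes_gamma(x, y):
    """Symmetric Procrustes statistic between two configurations.

    x, y : (n, k) coordinate arrays for the same n samples.
    Returns a value in [0, 1]; 0 means the configurations match
    exactly after translation, rotation/reflection and scaling.
    Neither configuration may consist of coincident points only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Remove location by centring each configuration.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)

    # The sum of singular values of x'y gives the best achievable
    # agreement over all rotations and reflections.
    singular_values = np.linalg.svd(x.T @ y, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2 / (np.sum(x ** 2) * np.sum(y ** 2))
```

A curve like the one in Figure 9 could then be traced by evaluating the statistic between each mixed-mode plot and the petrographic-only or chemical-only plot over a grid of parameter values.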
Early experiments in the cluster analysis of mixed-mode data notwithstanding (Rice and
Saffer 1982; Phillip and Ottaway 1983), there seem to have been few attempts to apply such
methods in the published archaeometric literature. Our emphasis is on the use of ordination
methods, such as PCA and MDS, rather than cluster analysis, and we also place importance
on an iterative approach to the analysis, and the comparison of different views of the data.
Computational resources have increased considerably since the early period of experimentation,
and our methodology exploits this. Computational aspects, with additional examples, are
described in Beardah et al. (2003) and, briefly, in the appendix to this paper. Ours is an exploratory
approach to data analysis. Recently, Moustaki and Papageorgiou (2005) have developed a
model-based approach to mixed-mode analysis that is applied to archaeometric data identical
in kind to ours. Their methodology is more complex than ours, both mathematically and
computationally, and, being model-based, dependent on distributional assumptions one might
not always wish to make. Such assumptions allow a more formal approach to determining the
number of groups in the data, and assessing goodness-of-fit. Both their examples involve data
sets where the structure is quite clear, using either petrographic or chemical data, and further
experience (and possibly development) is needed to assess how it would handle large data sets
with numerous small groups (including singletons) where the separate chemical and petrographic
analyses tell different stories.
ACKNOWLEDGEMENTS

This work forms part of the GEOPRO Research Network funded by the DGXII of the European
Commission, under the TMR Network Programme (Contract Number ERBFMRX-CT980165).
We are grateful to Hector Neff for access to his unpublished work and Jaume Buxeda i
Garrigós for suggesting the second of the mixed-mode approaches. For permission to sample
and analyse the transport jars from Kommos, we are grateful to J. W. Shaw, J. B. Rutter and
the 23rd Ephorate of Prehistoric and Classical Antiquities, Herakleion, Crete. We are also
grateful to Archaeometry's referees for thorough and constructive comment on the original
version of this paper.
REFERENCES
Baxter, M. J., 2003, Statistics in archaeology, Arnold, London.
Baxter, M. J., and Buck, C. E., 2000, Data handling and statistical analysis, in Modern analytical methods in art and
archaeology (eds. E. Ciliberto and G. Spoto), 681–746, Wiley, New York.
Beardah, C. C., Baxter, M. J., Papageorgiou, I., and Cau, M. A., 2003, Mixed-mode approaches to the grouping of
ceramic artefacts using S-Plus, in The digital heritage of archaeology: CAA2002 (eds. M. Doerr and A. Sarris),
261–5, Hellenic Ministry of Culture, Greece.
Beier, T., and Mommsen, H., 1994, Modified Mahalanobis filters for grouping pottery by chemical composition,
Archaeometry, 36, 287–306.
Bieber, A. M., Brooks, D. W., Harbottle, G., and Sayre, E. V., 1976, Application of multivariate techniques to analytical
data on Aegean ceramics, Archaeometry, 18, 59–74.
Cau, M. A., Day, P. M., Baxter, M. J., Papageorgiou, I., Iliopoulos, I., and Montana, G., 2004, Exploring automatic
grouping procedures in ceramic petrology, Journal of Archaeological Science, 31, 1325–38.
Glascock, M. D., 1992, Characterization of archaeological ceramics at MURR by neutron activation analysis and
multivariate statistics, in Chemical characterization of ceramic pastes in archaeology (ed. H. Neff), 11–26,
Prehistory Press, Madison, WI.
Gower, J. C., 1971, A general coefficient of similarity and some of its properties, Biometrics, 27, 857–71.
Kaufman, L., and Rousseeuw, P. J., 1990, Finding groups in data, Wiley, New York.
Moustaki, I., and Papageorgiou, I., 2005, Latent class models for mixed variables with applications in archaeometry,
Computational Statistics and Data Analysis, 48, 659–75.
Neff, H., Bishop, R. L., and Rands, R. L., 1988, Similarity/distance measures: solutions to the mixed level data problem,
unpublished manuscript.
Phillip, G., and Ottaway, B. S., 1983, Mixed data cluster analysis: an illustration using Cypriot hooked-tang weapons,
Archaeometry, 25, 119–33.
Rice, P. M., and Saffer, M. E., 1982, Cluster analysis of mixed-level data: pottery provenience as an example, Journal
of Archaeological Science, 9, 395–409.
Romesburg, H. C., 1984, Cluster analysis for researchers, Lifetime Learning Publications, Belmont, CA.
Shennan, S., 1988, Quantifying archaeology, Edinburgh University Press, Edinburgh.
Sibson, R., 1978, Studies in the robustness of multi-dimensional scaling: Procrustes statistics, Journal of the Royal
Statistical Society (B), 40, 234–8.
Venables, W. N., and Ripley, B. D., 1999, Modern applied statistics with S-Plus, 3rd edn, Springer, New York.
Venables, W. N., and Ripley, B. D., 2002, Modern applied statistics with S, 4th edn, Springer, New York.

APPENDIX

Computational considerations
All the analyses reported in this paper were undertaken using S-PLUS 2000 for WINDOWS
(Venables and Ripley 1999), and our implementation is described in Beardah et al. (2003).
This system is now obsolete, having been superseded by later versions of S-PLUS, described
in the fourth edition of Venables and Ripley's (2002) book. Were we to start this research anew
we would use the R system, which has similar functionality to S-PLUS with the additional
advantage of being Open Source. Venables and Ripley (2002) is a good guide to R as well as
S-PLUS, though other texts at various levels are increasingly appearing.
All these systems, or packages of functions written for them, allow methods
such as PCA and MDS (metric and non-metric) to be applied with relative ease. For the first
of the mixed-mode approaches described we used the daisy function to calculate the generalized
Gower's coefficient, as described in Kaufman and Rousseeuw (1990). Anyone wishing to
emulate this should be aware that, using chemical (i.e., continuous) data only, daisy defaults to
computing Euclidean distance as a measure of dissimilarity. If used to compute the dissimilarity
matrix Dc described in our second approach, this means that Dc needs to be rescaled, so that
entries lie between 0 and 1, before implementing the approach.
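The rescaling step mentioned above can be sketched in Python as follows. Two points are assumptions rather than statements of the method used: the rescaling is done here by dividing by the largest entry, and the second approach is represented as a convex combination of the chemical and petrographic dissimilarity matrices, whose definition lies outside this section. The function names and the parameter name `alpha` are hypothetical.

```python
import numpy as np

def euclidean_distances(x):
    """Pairwise Euclidean distances between the rows of x."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def rescale_to_unit_interval(d):
    """Rescale a dissimilarity matrix so its entries lie in [0, 1].

    Division by the maximum entry is one simple choice (an assumption).
    """
    d = np.asarray(d, dtype=float)
    largest = d.max()
    if largest == 0:
        raise ValueError("all dissimilarities are zero; nothing to rescale")
    return d / largest

def combine_dissimilarities(d_chem, d_petro, alpha=0.5):
    """Assumed form of a weighted combination of two dissimilarity matrices.

    alpha = 1 uses the chemical matrix only, alpha = 0 the petrographic
    matrix only; both inputs are expected to lie in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * np.asarray(d_chem) + (1.0 - alpha) * np.asarray(d_petro)
```

Without the rescaling, raw Euclidean distances on the chemical variables could dominate a petrographic matrix whose entries already lie between 0 and 1, whatever weighting is chosen.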
Interested readers are welcome to approach the second author for further details and access
to our code, but should be aware that this might need modification for those systems currently
available.
