Methodological and Theoretical Issues in Multimodality

John A. Bateman
2. Methodological and Theoretical Issues in Multimodality
Abstract: The current state of the art in multimodality appears to be reaching a con-
sensus concerning several central methods and perspectives that need to be applied
in its study. This offers an appropriate starting point for the reassessment of some
foundational issues concerning the definition and combination of modalities. This
is an important step to take at this time because, despite a wealth of experience now
gained in this endeavour, core uncertainties remain. This chapter proposes some
clarifications of the notions of semiotic modes, media and genres intended to help
recast issues as more specific empirical challenges requiring detailed analysis, both
corpus-based and experimental. The essential idea is that re-constructing semiotic
modes as theoretically tightly interwoven bundles of material, form and dynamic
discourse semantics provides a suitable foundation both for fine-grained empirical
analysis and for multimodally-aware definitions of media and genres. The chapter
motivates this position in some detail, offering illustrations of the treatment of multi-
modal media and genres that it supports.
1 Introduction: The Need for Method
2 Defining Semiotic Modes
3 Semiotic Modes and ‘Text Dynamics’
4 Media and Genres
5 The Multimodal Description of Text and Image Combinations
6 Conclusions
7 References
1 Introduction: The Need for Method

As is well documented by this handbook, multimodality, both as a range of phenomena
to be investigated and as a field of inquiry, is currently enjoying considerable growth
and increasing recognition. Journals traditionally concerned with issues of combina-
tions of, for example, visual and verbal material – such as Visual Communication or
Text & Image – are now being joined by new journals with multimodality explicitly
part of their respective charters – such as Multimodal Communication (De Gruyter)
and the Journal of Multimodal Communication Studies (Poznań and Warsaw Universi-
ties). An increasing number of valuable introductions to the ‘state of the art’ of multi-
modality are also now available (Machin 2014; Stöckl 2014, this volume; Klug/Stöckl
2015; Żebrowska 2014). Together, these efforts show a striking degree of convergence,
often across quite diverse disciplinary starting points.
Some of the characteristic assumptions and orientations now emerging in this
‘consensus view’ of multimodality include the necessity of paying close attention to
the use of multiple modes by communicators in concrete contexts of production and
reception (Bucher 2011) and a growing awareness of the importance of well articulated
models of text, discourse and textual/discoursal semantics (Stöckl 2006; Bateman/
Wildfeuer 2014a; Klug this volume). In many respects, this can be characterized as
a long overdue re-emphasis of dynamics and use-in-context as central concerns for
both pragmatics and semantics. Many approaches thus describe themselves as prag-
matic or pragma-semantic in orientation, seeking accounts of how the interpreters of
multimodal artefacts and performances go about that interpretation, making refer-
ence to formal, cultural, social and contextual properties as required.
For multimodality as such, however, the primary research issue across all
approaches remains the core question of just how it can be that information in differ-
ent modes operates together – i.e., how do disparate message components with poten-
tially very different properties combine to produce ‘more’ than what can be achieved
in isolation (cf. Lemke 1998; Liu/O’Halloran 2009; Holly 2009). Striking here is that
despite a wealth of experience now gained both within and across disciplines, many
basic questions concerning modalities and their combinations are still only answered
in a programmatic, impressionistic fashion. Moreover, the interrelationships between
characterizations of modes and arguably broader constructs such as media, genre,
materiality, design and many more remain unclear, with proposals cross-cutting one
another and exhibiting an extreme fluidity with respect to how the principal terms
are employed.
The vast majority of multimodal analyses are still found in the form of discus-
sions of individual texts couched as ‘running commentaries’ in which combinations
of different expressive resources are noticed and discussed on a case-by-case basis.
Such discussions are in considerable danger of being ad hoc – a critique made for-
cibly for discourse analyses by Halliday (1994, xvi) and of multimodal discussions
more specifically by Forceville (2007) – primarily due to a lack of appropriate meth-
odological guidance. Moreover, for accounts that do attempt to apply or define more
general frameworks in their analyses of mode interactions – approaches drawing, for
example, on rhetoric (Koch/Schirren this volume), on cognitive models of metaphor
(Forceville this volume), on functional discourse semantics (Liu/O’Halloran 2009;
Royce this volume), on text linguistics (Stöckl this volume; Klug this volume), on
formal discourse semantics (Wildfeuer 2012; 2013b) as well as more traditional (but
still very relevant) applications of semiotics as such (e.g., Nöth this volume) – only
limited contact has been achieved to date with the empirical validation and gener-
ation of testable predictions essential for progress. Thus, while current approaches
have certainly allowed a host of revealing applications and descriptions of multi-
modal artefacts and performances to be pursued, deficient empirical foundations
continue to render them considerably less revealing than they will need to be in the
future for progress to be made.
Moving beyond more conjectural characterizations requires recasting issues as
empirical questions. However, in many areas of multimodality, we are quite far away
from being able to achieve this. This chapter will argue that two main reasons for the
gap between theory and empirical investigation can be found in, first, a lack of clarity
in the central theoretical constructs employed and, second, corresponding weak-
nesses in the methodologies for analysis that are available. The chapter therefore
attempts to contribute to applied semiotics and, more specifically, to methodology
in applied semiotics. We need to establish analytic guidelines that encourage even
individual text analyses to feed into more general bodies of results and to encourage
subsequent empirical probing. In this, we are entirely in agreement with Stöckl (this
volume) when he cites Björkvall’s observation that multimodality “[is] still very much an emerging
field and there is both room and need for methodological development” (Björkvall
2012, 18).
This is a rather more pressing issue than the host of current research into multi-
modality might lead one to believe. There are, in fact, considerably fewer guidelines
for directing practical investigation along productive lines than commonly assumed.
Whereas for linguistic materials we have detailed and specific models of the phenom-
ena involved and their interrelationships, outside of the linguistic system accounts
quickly become schematic and ‘gappy’. Analysis then in turn becomes opportunistic
and ‘running commentaries’ are a natural outcome. For more revealing and reliable
analyses of any object of investigation, more structured approaches to characterizing
multimodal artefacts and processes that complement and augment existing accounts
are necessary. And for this task, more rigorous definitions of central terms such as
semiotic mode are going to be unavoidable (cf. Klug/Stöckl 2015). Providing a tighter
analytic scheme for addressing multimodality of all kinds is then the most specific
and immediate goal of the present chapter. Weaker suggestions that semiotic modes
arise as a product of, or as support for, analysis or that they have flexible and fluid
boundaries are conducive neither to the development of sound methodologies nor to
reproducible analyses.
Achieving more analytic and definitional precision should not only promote more
revealing characterizations of the basic issue of how modes combine but also help sig-
nificantly with occurrences previously seen as problematic when framed more loosely
in terms of multimodality. Such occurrences include alleged cases of hybridity (i.e.,
combinations across genres, media, modes), of modes cross-cutting sensory channels
(e.g., are spoken language and written language a single verbal mode?), and of media
‘embedding’ (e.g., is a dance shown in a photograph appearing in a narrative film
displayed on a smartphone an instance of a semiotic mode of ‘dance’ or not? – and
recursively: of film, of photography etc.) – all classical examples drawn on when argu-
ments in favour of maintaining a certain looseness in definitions of multimodality are
made. Our approach will be quite the reverse: it is only with more precision that such
complex phenomena can be addressed productively.
2 Defining Semiotic Modes

We begin, therefore, by making explicit the basic horizon within which the kind of
semiotic artefacts and performances that we are interested in can appear. For the pur-
poses of the present chapter, this involves all cases where language combines with
visual, acoustic, and other materially-present signifying practices. Such practices are
often characterized in terms of semiotic modes (Kress 2014; Stöckl 2014), a concept
which, as Klug/Stöckl (2015) explain, nowadays typically synthesises at least aspects
of materiality and mediality, ‘codality’, sensory modalities, processing mechanisms,
and socio-cultural conventions. It will require considerable attention to detail to show
how these aspects might work together productively.
For this, we will proceed in terms of an ontological analysis of what constitutes
a semiotic mode. Ontological is used here in the sense of determining what has to be
available for there to be semiotic modalities of the kind that are of interest to us at all
as well as what internal organizations such modalities must exhibit. The resulting
framework will then be used to cast some light on a range of current open issues and
questions in multimodality research and, more specifically, to suggest how certain
analytic methodologies then follow. In addition, since this is necessarily a semiotic
endeavour, we will also note on the way points of connection or overlap with more
traditional semiotic accounts (e.g., Saussure, Peirce, Hjelmslev) in order to aid com-
parison and to make explicit the framework’s connections with necessary semiotic
foundations.
2.1 Ontological Foundations for Semiotic Modalities
In any foundational discussion of semiotic modes, it is important to emphasise the
status of modes as interpretative practices constructed and maintained by commu-
nities of users (Goodman 1969; Bateman 2011). The following observation from Kress
and colleagues brings out the consequences of this particularly well:
[…] the question of whether X is a mode or not is a question specific to a particular community.
As laypersons we may regard visual image to be a mode, while a professional photographer will
say that photography has rules and practices, elements and materiality quite different from that
of painting and that the two are distinct modes. (Kress et al. 2000, 43)
Although, as in the case of verbal language, the particular community involved can
turn out to be quite large, this question must always be posed – at least abstractly –
as an empirical issue. That is: we must first look to see if there appear to be some
expressive resources that are being employed systematically in some specific context
and only then proceed to attempts to identify and characterize those resources. This
leads naturally to questions concerning the communities within which the systematic
practices take place. That the community concerned with verbal language turns out to
have arisen evolutionarily to span the entire species rather than as a more socio-cul-
turally and temporally restricted subgroup makes no difference to this basic method-
ological stance. Indeed, a reoccurring theme in the discussion will be that research
still too often relies on ‘accepted distinctions’ that are themselves in need of more
stringent empirical probing.
A second necessary starting point to be anchored into the account at the outset is
that of the materiality that is employed by the community of users engaged in mean-
ing-making. The materials that can be put to use of this kind are extremely varied, but
require minimally that they are sufficiently ‘controllable’ as to admit of purposeful
articulations – otherwise it would not be possible for them to function as the mate-
rial carriers of ‘semiotically-charged’ distinctions. This control may be exercised not
only by the physical actions of the members of the community – e.g., by using their
vocal cords for producing sounds via manipulations of the shape of the mouth, by
performing particular bodily gestures or movements, or by forming lines in the sand
with fingers, etc. – but by any of the physical-technological processes available to
that community – e.g., by using burned sticks to draw on the wall of a cave, a particu-
lar kind of printing press to produce a newspaper, or a combination of whiteboard,
whiteboard marker, spoken language, gesture, screen and video projectors for an
audiovisual PowerPoint or Keynote presentation.
For want of a better, or more general or neutral term, the material employed for
meaning making will be called the canvas; the material that is actually available
for meaning making is then the virtual canvas (or virtual artefact: Bateman 2008,
16–17, 192) formed by combining physical materiality with corresponding techno-
logical means of articulation. The articulations made are then available for serving
as the ‘physical’ (i.e., perceptible) record of semiotic ‘decisions’ and so can be used
by any member of the relevant community as evidence of those decisions being ‘in
effect’. Comparisons can (and should) be drawn here with Hjelmslev’s ([1943] 1961,
54–55) discussion of expression-purport. Perhaps due to the duality that he wished
to uphold between his expression and content planes, Hjelmslev’s notion is more
tightly bound to semiotic distinctions than our intended use of the virtual canvas.
Even prior to being employed for semiotic reasons, a virtual canvas admits a range of
affordances – i.e., it can be ‘bent’ or ‘cut’ in some ways rather than others. This is seen
as an important contribution of materiality in its own right.
As noted particularly for non-linguistic semiotic modes by Stöckl (this volume),
this emphasis on material also reinstates several of Peirce’s less commonly used
semiotic categories as crucial for our understanding of how semiotic modes operate:
in particular, the qualisign (perceptual qualities) and the sinsign (instances within
which perceptible qualities are manifest) are appropriately placed at the centre of
processes of both producing and interpreting signifying practices. Moreover, and
also connecting more closely to corresponding semiotic foundations (cf. Nöth this
volume; Peirce 1931–1958, §§ 2.275–2.308), such materials do not come readily divided
up according to sensory channels. This is a rather different position to that adopted
in much of the multimodality literature, where distinctions are often drawn along
sensory channel boundaries. This naturally leads to information offerings relying
on vision being characterized as distinct from those relying on sound or touch. Fricke
(2013), for example, elevates this to a categorial distinction between what she terms
narrow and broad multimodality. Broad multimodality is when a number of semiotic
codes are active within a single sensory channel – as traditionally suggested for pic-
tures and texts – while narrow multimodality requires multiple sensory channels. In
this view, face-to-face spoken language is consequently seen as multimodality in the
narrow sense, while illustrated documents, for example, may only be multimodal in
a broad sense. Although not explicitly stated within this line of argument, it does not
take many more turns of the screw to arrive at the suspicion that face-to-face interac-
tion is perhaps to be seen as ‘proper’ multimodality, while other forms may only be
multimodal by extension.
In contrast to this, the framework proposed here seeks to maintain a more open
orientation to all possible forms of multimodality. Individual sensory channels or bio-
physical distinctions between sensory channels are not granted any definitional role
as far as our ontological characterization of semiotic modes is concerned. Ongoing
work on perception and its neuro-cognitive foundations also supports denying tradi-
tional sensory channels theoretical primacy. Strong interactions and interconnections
between sensory channels are observed at very early stages in processing (e.g., Clark
2011; Seeley 2012; Kluss et al. 2012) and so when attention turns to how we use any
information being encountered, assumptions of boundaries between senses become
both theoretically and practically problematic. An example of this is McGurk/Mac-
Donald’s (1976) well known result in spoken language perception that certain aspects
of the acoustic signal and visually accessible lip shapes combine to the extent that
different sounds may be heard, i.e., a visual shape co-determines the perceived acous-
tic event. The previously common restriction of the material relevant for spoken lan-
guage to the audio channel is therefore a considerable simplification – a point already
emphasised by Hjelmslev ([1943] 1961, 103). Broader discussions of synaesthesia and
complex embodied responses to the apparently audiovisual medium of film all move
in similar directions (Sobchack 2004). Thus, semiotic modes as brought into being
by communities of users need not respect sensory compartmentalization. Materiality
may involve any combination of sensory channels and so even individual semiotic
modes may be multisensorial. It is, again, an empirical issue to investigate just which
dimensions of materiality are being drawn on by any particular semiotic mode.
2.2 Shaping Material Articulations
Although links between sensory modalities and semiotic modalities are often drawn,
it is also always accepted as uncontroversial that there is something ‘more’ to a semi-
otic mode that is not exhausted by identifying the sensory channel. Simply impos-
ing some articulations on a material is not then of itself sufficient – the articulations
imposed must be recognisable as instances of reoccurring patterns known to the
community of users involved (i.e., Peircean legisigns). Collections of distinguishable
marks with particular meanings-in-context (e.g., traffic lights, patterns of sticks left
at decision points to indicate which path to follow, etc.) might then be said to make
up ‘sign repertoires’. In the model proposed here, however, rather than adopting the
metaphor of the ‘code book’ or sign catalogue that places prominence on individual
signs, we draw further on linguistic insights and consider sign-vehicles, i.e., particu-
lar physically accessible traces, to be characterizable only in terms of sets of minimal
distinctions – that is, distinctions made in the material must recognizably correlate
with differences between semiotic events that the community of users is concerned
with distinguishing. The description of this collection of distinctions is then ‘wrapped
around’ materiality as a semiotic ‘stratum’ in its own right; we return to the overall
architecture of this model below.
This ‘negative’ definition of the signs of signification goes back to Saussure
([1915] 1959) rather than Peirce and is developed further in the ‘algebraic semiotics’
of Hjelmslev ([1943] 1961). ‘Marks’ made in some material do not then correspond
directly to referents; it is only distinctions between marks that support the recogni-
tion of distinctions between semiotic categories. This means that we can characterize
any non-material semiotic contribution in terms of paradigmatic and syntagmatic
axes of organization – i.e., paradigmatic systems of choice together with a syntag-
matic organization for re-expressing, or ‘re-coding’ paradigmatic selections in struc-
tural configurations. These structural organizations typically provide both constitu-
ency and structural dependencies – structural complexity is thus intrinsically part
of the model and is always a possibility. Furthermore, following Halliday (e.g., 1978,
128–129), paradigmatic distinctions can be organized into hierarchies of more or less
specific, but nonetheless abstract, semiotic choices. The purpose of the structural
configurations is then to leave traces in distinctions drawn in material form, while
the paradigmatic description provides an organizational structure for the ‘space’ of
semiotic decisions available within any semiotic resource. Folding this arrangement
back into the Peircean categorization above, this means that we can also consider the
paradigmatic organization as a characterization of the organization of legisigns (cf.
Bateman 2013, 261–263) – a modelling alternative that has received rather little atten-
tion in the Peircean tradition previously.
A further consequence of Kress and colleagues’ observation above, however, is
that not all semiotic modes as employed by recipients are equally finely articulated in
terms of their syntagmatic and paradigmatic organizations. This means that it is also
often helpful to apply a topology over such organizations characterized in terms of
the continuum drawn by Kress/van Leeuwen (2001, 113) between lexically-organized
semiotic resources and grammatically-organized semiotic resources. Lexically-organ-
ized semiotic resources consist of collections of signs with little additional organi-
zation – the distinguishable signs may be simply ‘listed’. In contrast to this, gram-
matically-organized semiotic resources place their distinguishable signs within a
productive system of meaning potential. It is this that provides the power to compose
simpler signs into complex signs employing structural mechanisms analogous to
those of grammar. Lexically-organized semiotic resources thus exhibit a shallow
paradigmatic organization, whereas grammatically-organized resources may exhibit
considerable depth in paradigmatic organizations supported by correspondingly
complex syntagmatic structures enabling entire complexes of semiotic choices not
only to be deployed but also to be reliably recognised.
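
The contrast between shallow and deep paradigmatic organization can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the class Choice, the function depth and both example resources are hypothetical names introduced here, and the mood system is a simplified stand-in for the kind of hierarchy of increasingly specific choices described above, not an analysis proposed in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class Choice:
    """One option in a paradigmatic system; sub-options make it more specific."""
    name: str
    options: list["Choice"] = field(default_factory=list)


def depth(choice: Choice) -> int:
    """Count the levels of choice from this node down to its most specific option."""
    if not choice.options:
        return 1
    return 1 + max(depth(option) for option in choice.options)


# Hypothetical lexically-organized resource: the signs are simply listed.
signal_lights = Choice("signal", [Choice("red"), Choice("amber"), Choice("green")])

# Hypothetical grammatically-organized resource: choices open up further choices.
clause_mood = Choice("mood", [
    Choice("indicative", [
        Choice("declarative"),
        Choice("interrogative", [Choice("polar"), Choice("wh")]),
    ]),
    Choice("imperative"),
])

print(depth(signal_lights))  # 2
print(depth(clause_mood))    # 4
```

On this toy measure, the listed repertoire stays flat however many signs are added, whereas the second resource gains depth with each further subdivision. Depth alone says nothing about the syntagmatic structures that would be needed to make such selections recognisable in material form.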
2.3 Using Material Articulations
The last fundamental ingredient for our definition of semiotic modes is provided by
a further abstract stratum of discourse semantics. The task of discourse semantics
within any semiotic mode is to relate particular deployments of ‘semiotically-charged’
material to their contexts of use and the communicative purposes they can take up.
Thus: the discourse semantics of a semiotic mode provides the interpretative mecha-
nisms necessary for relating the particular forms distinguished in any semiotic mode
to their contexts of use and for demarcating the intended range of interpretations
of those forms. Such interpretations can vary with respect to just how tightly con-
strained they are intended to be, stretching from the very specific to rather more
abstract ‘guidelines’ for interpretation. Although individual semiotic modes can vary
with respect to just how much work they make their discourse semantics do, we nev-
ertheless consider an ordering of some directions for interpretation as definitional for
the kind of artefacts or performances with which we are concerned.
Many traditional models or approaches to multimodality have posited a more
direct relationship between signs (formed out of some material) and meanings for
those signs. And, for some very simple semiotic modes at the lower bound of what
we are defining here as semiotic modes at all, this may be adequate. In such cases
we have formally trivial interpretative requirements where the meanings of the distin-
guishable signs may be characterized independently of particular contexts of use. For
example, within a particular culture, a red traffic light will always mean stop; we do
not need to consult some text history of the sequences of reds and greens that have
occurred to make this assignment of meaning – although, of course, it may well be
possible for different communities of users to construct other meanings, such as, for
example, ‘red means speed up and try and get by in any case’ (cf. Kress/van Leeuwen
2001, 8–9). Such alternatives are also, however, generally independent of text context,
i.e., previous sequences of reds and greens.
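
The formal triviality of such a case can be sketched as a plain lookup. In the following Python fragment the table CODES, the community labels and the function interpret are all hypothetical names chosen for illustration. The point shown is only that the meaning assigned varies with the community of users but never with the preceding sequence of signs.

```python
# Hypothetical sign repertoires for two communities of users.
CODES = {
    "law_abiding": {"red": "stop", "green": "go"},
    "impatient": {"red": "speed up", "green": "go"},
}


def interpret(sign: str, community: str, history: list[str]) -> str:
    """Look up the meaning of a sign; `history` is accepted but deliberately ignored."""
    return CODES[community][sign]


# The same sign receives the same meaning whatever came before it ...
print(interpret("red", "law_abiding", []))
print(interpret("red", "law_abiding", ["green", "red", "green"]))

# ... although a different community may assign a different meaning.
print(interpret("red", "impatient", ["green"]))
```

A semiotic mode with a discourse semantics could not be reduced to a table of this kind, since the history argument (and the wider context of use) would have to take part in the interpretation.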
The code-based view appears in many accounts as a general model for semiotic
systems as such, even though its separation of signs from use gives rise to a host of
problems and misconceptions – probably the most damaging of which has been the
separation of code and inference (for further discussion, see Tseng/Bateman 2012;
Bateman/Wildfeuer 2014a). Those who believe (correctly) in the importance of infer-
ence mechanisms in interpreting meaning may take (incorrectly) the notion of code to
exclude such processes. This has in turn led to doubts about the applicability of semi-
otic approaches to a broad variety of non-verbal media. In contrast, the definition
proposed here insists on a more indirect relationship between material traces and
attributions of meaning which always involves notions of inference and, moreover,
distinct kinds of inference depending on the levels of semiotic abstraction involved.
This naturally focuses attention more on cases where compositionality, in its tradi-
tional linguistic sense, constitutes a major principle of organization within both the
mid-level and discourse semantic semiotic strata. Cases in which materially mani-
fested distinctions construct a ‘lexicon’ with relatively fixed ‘meanings’ which do not
operate compositionally (i.e., a code without inference) are of less concern precisely
because they do not offer sufficient means for more complex meaning making.
We therefore consider the presence of a discourse semantics stratum to be the
hallmark of semiotic modes ‘proper’. Without a discourse semantics, a semiotic mode
can only be effective within very particular contexts of use with little possibility of
extension – we might speak in such cases of semiotic proto-modes analogously to Hal-
liday’s (1978, 121) description of the semiotically simpler earlier phases of language
in children as protolinguistic. In contrast to this, the additional stratification provided
by a discourse semantics allows semiotic configurations to generalize across different
contexts by providing guidance schemes for contextual interpretations.
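
The three-way bundle of material, form and discourse semantics, together with the proto-mode distinction, can be summarised as a data structure. This is a minimal sketch under stated assumptions: the names SemioticMode, is_proto_mode and arrow_semantics are hypothetical, modelling a discourse semantics as a function from a form and a context to an interpretation is a simplification made for illustration, and treating traffic lights as lacking a discourse semantics altogether is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: a discourse semantics maps a form and its context of use
# to a contextual interpretation.
DiscourseSemantics = Callable[[str, dict], str]


@dataclass
class SemioticMode:
    material: str                 # the (virtual) canvas being articulated
    forms: frozenset[str]         # distinctions recognised by the community
    discourse_semantics: Optional[DiscourseSemantics] = None

    @property
    def is_proto_mode(self) -> bool:
        """True when no discourse semantics stratum is present."""
        return self.discourse_semantics is None


def arrow_semantics(form: str, context: dict) -> str:
    """Hypothetical guidance scheme: one form, different readings by context."""
    if context.get("genre") == "flowchart":
        return "next step"
    return "direction of travel"


lights = SemioticMode("coloured lamps", frozenset({"red", "green"}))
arrows = SemioticMode("printed page", frozenset({"arrow"}), arrow_semantics)

print(lights.is_proto_mode)  # True
print(arrows.is_proto_mode)  # False
print(arrows.discourse_semantics("arrow", {"genre": "flowchart"}))  # next step
print(arrows.discourse_semantics("arrow", {"genre": "map"}))        # direction of travel
```

The sketch leaves out the inference mechanisms that a discourse semantics would actually require. It shows only where the additional stratum sits and why its absence confines a resource to fixed contexts of use.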
In previous accounts, much of the work that we assign here to the discourse
semantics stratum has been assumed to be part of the task of general accounts of
pragmatics or communication-in-context. This has led to the contributions of poten-
tially diverse discourse semantics not receiving the close scrutiny they demand in
the operation of individual semiotic modes and their combinations. It also leaves
the contextually-driven production of ‘differing meanings’ for ‘fixed signs’ an open
problem with few proposals for explicit mechanisms. The situation is in fact entirely
analogous to that in the study of discourse and text, where it has also taken sub-
stantial work to reveal how particular mechanisms of discourse construction can be
beneficially recast in the form of dynamic semantics rather than being left to generic
pragmatics or problem solving (cf. Kamp 1981; Wildfeuer 2013a). The essential notion
of discourse semantics that we build on is that it is possible to isolate a particular
class of inferences which function specifically to mediate between compositionally
constructed semantic specifications and more abstract contextual or individual
knowledge (Asher/Lascarides 2003; Wildfeuer 2013a). This means that discourse in
the sense we intend with discourse semantics operates on a local, individual text-ori-
ented level (Martin 1992). It is then complementary to the broader kinds of discourse
that, following Foucault, are taken to operate within (and thereby define) cultures at
large (Kress/van Leeuwen 2001; Klug this volume) – we emphasize, therefore, that we
see discourse semantics as additional to pragmatic inferences and discourse ‘in the
large’, not as a replacement for such processes.
The inclusion of a stratum of discourse semantics is fundamental for our frame-
work and has consequences at every level, both for description and method. Con-
sider, for example, the longstanding debate as to whether images can ‘stand alone’
as autonomous communicative artefacts without verbal support (cf. Barthes 1964,
10–11). Although the denial of this possibility is a cornerstone of many ‘logocentric’
approaches, Nöth (this volume) shows how the adoption of a Peircean perspective
reveals any blanket rejection to be ill founded. Following Peirce’s characterization
of signification, the question reduces to the types of signs that would be necessary to
allow pictures to function ‘on their own’. According to traditional wisdom, images – at
least pictorial images – are iconic: i.e., they signify by virtue of resemblance. But then,
in order to function communicatively, they need to be embedded in propositions or
arguments that fix their intended communicative role.
As Nöth explains, in Peircean terms in order to move the interpretations of pic-
tures towards particular statements, assertions, etc. (and hence for them to function
‘autonomously’ for the communicative purposes generally attributed to language), it
is necessary for them to be assigned contributions ‘within’ more complex sign con-
figurations. Thus, an image by itself may not ‘explicitly’ communicate whether it is
a reference to some particular in the world (i.e., its indexicality status is not established)
or whether the image is intended as an assertion, a promise, an example, and so on
(i.e., its dicent status as a proposition that may contribute to an argument is unclear).
Nöth then argues that the extra information necessary for making such commitments
can be provided from many sources, including the knowledge of recipients. Explicit
verbal information is not then required.
In the framework given here, the question is similarly seen as ill founded: any
semiotic mode can be autonomous – indeed, that is part of the definition of mode since
the possibility of ‘autonomous’ use is intrinsic to the nature of a discourse semantics.
Moreover, it is precisely the discourse semantics of any semiotic mode that is gen-
erally responsible for allowing the ‘growth’ of information or signification potential
characterized in the Peircean account by an assignment to different sign ‘types’, or
ways of being signs. This means that each semiotic mode may, by definition, involve a
full range of semiotic distinctions as these have generally been conceived, including,
for example, iconic, indexical and symbolic signs (Nöth this volume). Questions of
resemblance, of indexicality and of their symbolic nature are then necessarily sec-
ondary since, again in general, they can in any case only be derived by application of
the corresponding discourse semantics. Thus, the ‘decision’ that something is to be
interpreted as an index, i.e., as a reference to an entity in the world, or as a diagram,
graph, etc. is primarily a discourse decision. Discourse semantics in general offer
explicit models of the mechanisms that are responsible for managing this process.
The invisibility of discourse semantics in semiotic accounts hitherto has given
rise to substantial, if sometimes productive, ambiguities in traditional semiotic anal-
yses. It is, however, the presence of a discourse semantics that can now provide the
mechanisms necessary for the context and knowledge-based resolution of such ambi-
guities. We see this development as being entirely compatible with Peirce’s notion of
semiosis. The explicit treatment of discourse semantics now fills in this notion with
more formalised principles that can more readily be operationalized for empirical
exploration.
This returns us finally to several essential issues of method. First, discourse
semantics are defined in such a way as to provide (more or less modular) theories
of the domains they cover. These theories can be richly structured and so provide
the necessary organization for defining combinations and correlations across modes
in terms of structured mappings of the kind used in metaphor theory (cf. e.g., Force-
ville this volume) or cognitive blending (cf. Fauconnier 1997); we address the formal
underpinnings of such mechanisms in some detail in Kutz et al. (2014) and will make
several references to such structural mappings in our examples below. Second, the
inclusion of discourse semantics opens up the possibility of allowing the artefacts
and performances analysed to contribute more to their analysis themselves – pre-
cisely because discourse semantics incorporate the crucial operations of textuality.
We consider textuality to occur when artefacts and performances provide more or less
explicit cues for guiding their own interpretation, both at the more general level of
text types and genres and at the very specific level of how the text unfolds from clause
to clause (cf. Kesselheim 2011; Bateman/Kepser/Kuhn 2013; Stöckl this volume). This
is then a more refined and focused methodological contribution to analysis than is
possible when talking simply of pictures (images) and words, etc. Indeed, the func-
tioning of textuality in this sense was the original motivation of including explicit
specification of discourse semantics in accounts of verbal language (cf. Kamp 1981).
We now extend this basic insight to semiotic artefacts and performances in general.
2.4 Semiotic Modes Defined: Ramifications and Consequences
Our basic account of semiotic modes has now provided a location for fine-grained
detail concerning the workings of a discoursal component without separating the
semiotic modes themselves from their supporting materiality. For these components
to work together, we arrange them analogously to the view of the linguistic
system proposed, for example, within systemic-functional socio-semiotics – i.e., again
following principles proposed by Hjelmslev, each semiotic mode is itself seen as a
stratified system. First, a material substrate must be fixed as an essential component
for any semiotic mode; this material may itself stretch over diverse sensory channels.
Second, a mid-level, ‘mediating’ stratum provides more (i.e., grammar-like) or less
(lexicon-like) compositionally functioning structural possibilities capable of drawing
‘functionally’-motivated differentiations in form. Third and finally, ‘above’, or ‘sur-
rounding’ these levels of semiotic abstraction, we place our more abstract stratum
of (local) discourse semantics, which operates abductively on the descriptions of the
lower levels of abstraction; this means that, in contrast to the rather different appro-
priation of the word stratum in Kress/van Leeuwen (2001, 4), we retain the semiotic
sense of strata as tightly and formally interrelated descriptions at different hierarchi-
cally ordered levels of abstraction.
Fig. 1: Abstract definition of a semiotic mode. All semiotic modes combine three semiotic ‘strata’:
material substrate, technical features (abbreviated as ‘form’) and discourse semantics.
The stratified model as a whole is depicted graphically in figure 1: working ‘upwards’
in abstraction here we see, first, materiality; second, form organized along the para-
digmatic and syntagmatic axes organizing the ‘technical features’ of the mode; and
third, discourse semantics. We now take this model, derived from both functional
linguistics and formal approaches to discourse, and apply it across all semiotic modes
regardless of their specific materialities.
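The three-stratum definition can also be written down as a simple data structure. The following is only a minimal sketch under our own assumptions: the class name `SemioticMode`, its field names and the example values for typography are hypothetical illustrations, not part of the model's formal specification, and the discourse semantics stratum is reduced here to a list of named rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SemioticMode:
    """Illustrative three-stratum record for one semiotic mode (hypothetical names)."""

    name: str
    # Stratum 1: material substrate, possibly spanning several sensory channels.
    material: Set[str] = field(default_factory=set)
    # Stratum 2: 'form', i.e. technical features organized paradigmatically
    # (alternative choices) and syntagmatically (patterns of combination).
    paradigmatic: Dict[str, List[str]] = field(default_factory=dict)
    syntagmatic: List[str] = field(default_factory=list)
    # Stratum 3: discourse semantics, reduced here to named defeasible rules.
    discourse_rules: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True only if all three strata have some content."""
        has_form = bool(self.paradigmatic) or bool(self.syntagmatic)
        return bool(self.material) and has_form and bool(self.discourse_rules)


# Hypothetical example values, chosen only for illustration.
typography = SemioticMode(
    name="typography",
    material={"visual"},
    paradigmatic={"weight": ["regular", "bold"]},
    syntagmatic=["letter sequence along a baseline"],
    discourse_rules=["letter repetition may suggest vowel lengthening"],
)
```

The `is_complete` check reflects the claim that a candidate description lacking any one stratum does not yet describe a semiotic mode in the sense intended here.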
By virtue of the three semiotic strata, all semiotic modes necessarily involve quali-
ties of perception, lexicogrammatical organization, and discourse mechanisms. These
distinctions are not then simply terminological – different facets of multimodality so
defined may be distinguished more precisely according to the differing mechanisms
that apply. For example, the lower two semiotic strata are related semiotically by
realization, or ‘manifestation’ – that is: the patterns of a semiotic mode are realised
in material features. This is quite different to the formal relations that hold between
these levels and the stratum of discourse semantics, which operate in terms of defea-
sible rules of interpretation. These formal properties help us subsequently to make
particular modelling decisions rather than others when attempting to characterize
the semiotic behaviour of some body of data. Whenever we find signs of abductive
reasoning at work, we know we have to consider locating those phenomena at least at
the semiotic stratum of discourse semantics. This helps us distinguish different inter-
acting components of explanations more appropriately within a complete descrip-
tion.
The decomposition of semiotic modes also allows us to clarify the status of the
various types of artefacts or performances covered. As an example we can consider
the pre-theoretical notions of a semiotic mode of language and a semiotic mode of
pictures. These are now termed pre-theoretical because the work of exploring to what
extent single semiotic systems are operative or not still needs to be done, rather than
assumed. Many researchers point out that language and pictures appear to differ with
respect to how ‘close to perception’ they are (cf. Sachs-Hombach 2003, 73); others
have made comments building on Goodman’s (1969, 136, 153) notion of the density of
semiotic systems – dense schemes, such as pictures, take ‘every’ distinction present
in material to be significant, others, such as language, impose an abstracting frame
over variation so that many unique events can be grouped together for the purpose of
meaning making (cf. Koch 1971). Where we can now go further is: (a) making explicit
that the degrees of density relied upon are defined by the semiotic modes employed –
which is similar to Goodman’s invocation of abstract symbol systems, and (b) recognizing that, for
some semiotic modes (generally precisely those that are ‘closer to perception’) the
actual material units that will be selected are partially a result of discourse interpre-
tation, rather than being an ‘input’ to such interpretation (cf. Bateman/Wildfeuer
2014a, 190–192).
Particularly this latter property, made possible by the addition of a discourse
semantics, changes considerably how we conduct analyses of multimodal artefacts
and performances and characterize how the contributions of distinct semiotic modes
may combine. In Bateman/Wildfeuer (2014b, 376–378), for example, we present a
discourse semantic analysis of an illustration of the conventionality of signs used in
comics by McCloud (1994, 128). In several panels, McCloud shows how essentially
the same slightly curved lines can be used in one case to indicate smoke from a pipe
and, in another, to show the unpleasant smell of a pile of garbage. The fact that the
signs being used are very similar goes further than simply indicating conventional
decoding – what is necessary is that readers attempt to find discourse interpretations
(in the form of specifiable discourse structures) which succeed in maximizing the dis-
course’s overall coherence.
The presence of the curvy lines in the panel and their spatial location sets dis-
course hypotheses concerning what is most likely related to what – the acceptance
of a discourse relation then establishes abductively a plausible way of binding the
information into the growing discourse. This is completely typical of the operation of
discourse semantics and how it can function to pick out signs from what is potentially
on offer in the material. In the case of the curvy lines, if there were no convincing
discourse interpretation, it would be possible for them to be seen as adding depth to
the background or even not to be seen at all. Such interpretations may also be subject
to empirical investigation, for example by exploring the allocation of attention within
the image during perception by means of eye-tracking.
The bundling according to three semiotic strata is therefore intended to charac-
terize differences between semiotic modes more clearly, to allow exploration of their
properties, to provide improved recognition criteria and, last but in the context of
multimodality anything but least, to formalise their combinations and interactions.
This in turn helps us to move away from isolated statements of interrelationships and
towards more systemic characterizations of the workings of semiotic mode combina-
tions as wholes.
3 Semiotic Modes and ‘Text Dynamics’

The description of semiotic modes now introduced has repeatedly mentioned the
central role of discourse semantics for the account. The notion of dynamic semantics
as the foundation for this discourse semantics has also been emphasized. This is fully
in line with the emerging consensus view that semiotic modes be examined in their
concrete contexts of use. In this section, we show this further by describing how the
incorporation of mechanisms for dynamic discourse interpretation takes on more of
the work of characterizing combinations of semiotic modes and their meanings than
can be appropriately covered without such mechanisms.
Many semiotic-oriented approaches to multimodality proceed by calling for
investigations of the ‘semiotic resources’ that individual modalities offer. Kress/
van Leeuwen (2006 [1996]) have been particularly prominent in promoting such a
view, although it is common in most accounts that take their lead from socio-semi-
otics (Halliday 1978). When this is approached as providing static descriptions of
resources, however, those descriptions can readily become skewed in the following
fashion. Semiotic resources are first organized as static classifications. Then, when
the analyst is confronted with a multimodal artefact or performance to analyse, it will
be noted that particular combinations of properties seem to be doing semiotic work.
These combinations are subsequently included in the description of the modes that
are assumed to be operative. However, since there is no model of dynamics, these
properties are actually back-imported to form part of the description of the modes
that appear to be using them. The consequence of this is that the contents of semi-
otic modes are progressively widened in order to cover the many combinations of
resources that occur in real instances of multimodal artefacts and performances. This
descriptive widening inevitably leads to cases of overlap and fuzzy boundaries, as
well as single sets of resources apparently serving roles in different modes, since they
arise out of the fact of semiotic resources co-occurring in use. The dynamicity of com-
bining distinct modalities is replaced by modalities where the work of combination
has already been ‘smuggled in’.
A further consequence of this has been criticized at length by Bucher (2011).
When considering combinations of semiotic modes in any artefact or performance, if
the meanings of any elements being combined themselves depend on the particular
multimodal context in which they appear, then characterizations that focus on the
individual modalities involved will simply fail to address the core research question of
how meaning arises in multimodal contexts. The step of relating elements has already
presumed that the elements related ‘have’ the meanings that they are accorded by
virtue of their mutual occurrence – however, since those elements may quite possibly
not have had those meanings outside of that context, any such account is circular.
In short, the combination of meanings is assumed to be a fact of grammar to be read
off the co-occurrence of elements rather than a result of the formation of discourse
hypotheses.
A conclusion that is in many ways similar can be drawn from Stöckl’s (2004) particularly
detailed and useful characterization of semiotic modes and their interrelations.
Stöckl classifies modes drawing on several contributing perspectives, including the
sensory channels involved, medial variants of semiotic modes (e.g., written, spoken),
peripheral modes (e.g., modes depending on the existence of other modes – such as
the dependence of typography on written language or of intonation on spoken lan-
guage), as well as the internal organization of modes in terms of the structural con-
figurations they rely upon (sub-modes) and the perceptual qualities that allow them
to be produced and recognised (features). Several of these correlate with the semiotic
strata we have introduced – for example, features correspond to the distinguishable
properties of materials, while sub-modes correspond to the particular patterns that
semiotic modes impose on that material, “the building blocks of a mode’s grammar”
(Stöckl 2004, 14–15), within our middle semiotic stratum; we return to the question
and position of medial variants below when we have introduced media and their place
in the model as a whole. Then,
While it is certainly true that modes have their individual characteristics (semiotically, semanti-
cally and cognitively) which pre-determine how they can be deployed in a textual structure, the
dynamics of meaning-making must be given due emphasis. (Stöckl 2004, 27)
[…] text is the locus where all modes, sub-modes and features are realised. So it is the dynamics
of text production and reception, the complex chain from discourse to design to production and
distribution (Kress/van Leeuwen 2001, 1–23) that determines how we deploy modal resources
and how they in turn are construed in reception. (Stöckl 2004, 15)
Drawing on the multiply-stratified view of semiotic modes presented here will now
help us develop a more explicit account of how this might operate. It achieves this by
splitting descriptions across, on the one hand, more static components of semiotic
modes – i.e., the resources which define them individually – and, on the other hand,
more dynamic components that operate to combine meanings.
A crucial foundation stone for this process is provided by the link we maintain
with materiality as an inalienable part of each and every semiotic mode. Whenever
accounts relax this link, problems for characterizing the dynamics of multimodal
meaning making arise precisely because a critical source of communication across
modes is removed. Given this, it is interesting how many accounts are still willing to
countenance according materiality only a secondary role. Many authors, including
Kress and van Leeuwen, suggest that it is natural, almost definitional, for semiotic
modes to “los[e] their tie to a specific form of material realization” (Kress/van Leeuwen
2001, 22) – a position more reminiscent of the views of Saussure and Hjelmslev, where
the acceptance of materiality was marginal at best (cf. Hjelmslev [1943] 1961, 105),
than would be expected from Kress and van Leeuwen’s claim of a renewed focus on
materiality. Some semiotic traditions then go further and work with an explicit dis-
tinction between multimodality and multicodality (Weidenmann 1995; Dölling 2001):
the former corresponds to the physical material and its perception through sensory
channels; the latter picks up the non-material, semiotic contribution. Discussions
of multimodality and hybridity then address questions of whether single codalities
might be used across different (sensory) modalities, of how distinct codes may be
combined or hybridise, and so on.
This superficially more precise formulation obscures crucial differences between
semiotic modes, however. When materiality is factored out of the equation, the chal-
lenge of combining or relating semiotic codes reduces to a purely formal operation
of aligning distinct semiotic resources. Considered abstractly, it is always possible to
construct formal correspondences across such codes (cf. Kutz et al. 2014) but, sepa-
rated from their materiality, there is little guidance for just which correspondences
are appropriate or necessary for any particular multimodal text at hand and which
not. In contrast to the ‘separationist’ view, therefore, we take the position that the
contribution of materiality must be accepted in its own right – after all, many aspects
of perception have evolved precisely in order to give meaning of a rather direct nature
to our perceptual experiences (cf., e.g., Matthen 2005); this should not then be over-
looked or downplayed when attempting comprehensive accounts of multimodality.
We therefore consider it more plausible that semiotic modes will always bring with
them the ‘textural’ resistances of their materialities. Combining semiotic modes must
then respond to the issue of matching and reconciling differences in material affor-
dances. This promises interesting new sources of insight for just which combinations
of modes may ‘work’ and which may not, and why (cf., e.g., Björkvall/Karlsson 2011).
Methodologically, the constant co-presence of materiality when attempting anal-
yses employing semiotic modes insists that we always anchor our descriptions first
and foremost in the material distinctions that can be motivated by the assumption of
semiotic modes. We can suggest something of the consequences of this by briefly con-
sidering the analysis of a case similar to the curvy lines interpretation from McCloud
that we passed over briefly above, combined with some other plausible cases of semi-
otic modes. Figure 2 shows a constructed composite ‘image’ inspired by some discussions
of multimodality and advertisements in the literature (cf. Forceville 1996) that
will support the discussion. The point of this simple example will be to show how
the analysis of an artefact should be driven by the artefact itself (i.e., be bottom-up)
as far as possible, and how in addition this also plays an important role in helping
select both the top-down interpretations using discourse semantics and the dynamic
construction of meaning during discourse interpretation.
Fig. 2: A constructed image combining several potential sources of semiotic interpretation
We will not overly problematize the issue of which semiotic modes apply in the
present case; some further examples discussed below will bring this methodological
step out more clearly. What must be emphasised already here, however, is that we
at no point consider a methodological question such as what semiotic mode is this
representation? as a sensible place to begin. Our analytic approach must be more cir-
cumspect, first looking for evidence of semiotic modes that may be being deployed in
the material under analysis. Assuming a semiotic mode to hold is then, as suggested
above, an abductive hypothesis in its own right. Then, under the assumption that
some semiotic modes apply, we proceed with the analysis under that assumption and
attempt to maximise the coherence of the object under study. It is generally possible –
and in practice very likely – that there will be multiple semiotic modes at work in any
artefact or performance we are investigating and it will be their shared materiality
that guides their combination.
In the present case, then, we can probably safely assume via the visual over-cod-
ings present that there is some use of written language (since the letter forms are
visually salient) as well as some use of pictorial representations (since there is distri-
bution of information spatially that is not motivated, or ‘claimed’, by the assumption
of written language). Both of these, in this case trivial, assumptions remain abductive
hypotheses since they could turn out to be wrong. They are ‘working explanations’ for
the material form and are consequently always preliminary in character.
This notion of ‘so far unclaimed’ information in the artefact is very important
for driving analyses: whenever there is unclaimed information, this is an explicit
indication (also methodologically) that there is more interpretative work that needs
to be done. We can start with the word apparently given in the lower portion of the
‘image’. Given the assumption that this is a piece of written language, we can apply
the semiotic resources of typography. We can note in passing that the occurrences in
the typographical representation are not entirely motivated by, for example, the need
to spell out the word cheese – there are too many ‘e’s. One of the semantic interpretations
of typography is that a correlation can be constructed with the sound system of
spoken language. Now the sound system has various properties that the typograph-
ical system does not have. Some of these revolve around properties of continuous
sounds such as vowels: for example, vowel length – length is a continuous physical
quality. One communicative goal might be then to represent this continuous physical
quality with the non-continuous resources of typography. Several solutions could be
derived: for example, stretching the visual representation so that spatial extent cor-
relates with (e.g., in Peircean terms, is a metaphor for) temporal extent. Such a distor-
tion, although certainly sometimes used, may come at the cost of legibility.
The present solution is to employ repetition of the written form of the vowel. There
is then the discourse hypothesis that the number of letter forms for the vowel will cor-
relate with the length of the pronounced vowel sound. This is a possible hypothesis
for anyone familiar with typography and the linguistic sound system and relies only
on fairly mundane commonalities in the materialities employed (a ‘more is more’ metaphor).
Discussions of this and similar techniques in phonology and morphology
have long been framed in linguistic discussions in terms of iconicity (Jakobson 1965);
here we go back to the origins of this terminology to draw more explicitly on Peirce’s
three distinct types of iconic signs (cf., e.g., Hiraga 1994). Most spatial ‘deformations’
of typography can consequently be characterized in this fashion: the general dis-
course task is to hypothesize other domains with which the spatial properties of the
visual typographic representation can be placed in correspondence. Moreover, such
domains may equally well be under the control of further semiotic modes, which pro-
vides a strong material basis for mappings between the resources co-deployed. The
kind of relation posited here is then rather different to notions of hybridity or mixing
commonly considered. For example, both Mitchell (2005, 261) and Krämer (2006, 80),
albeit in rather different ways, talk of actual uses of visuality and language mixing
their contributions in various proportions or ratios. Visuality and language are seen
as theoretically distinct conceptual poles, but then mix in use (cf. Stöckl 2004, 27).
The position here is that it is unnecessary (and often confusing) to see semiotic modes
as mixing in this way. Even when deployed in the service of concrete artefacts or per-
formances, the semiotic modes maintain their distinctive contributions. The combi-
nation of information that occurs is then a product of the operation of the distinct
discourse semantics applying and does not require a mixing of the semiotic modes
themselves – whatever that might mean.
Nevertheless, if the design decision of the present case relating repeated ele-
ments to length were used in some community with sufficient regularity, then it
could also become part of the semiotic resource of typography directly – in this case,
the repeated ‘e’s would be read as lengthening of the corresponding vowel without
recourse to external information. In this case we have an extension of the mid-level
semiotic stratum since the pattern has become part of the ‘code’ rather than being
a discourse hypothesis. Empirical investigation might then reveal that for different
readers (for example, by age or reading habits) or in different genres, the degree of
association between this typographical property and its interpretation turns out to
be different, thus providing evidence for the semiotic placement of the interpretative
procedures being deployed.
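The ‘more is more’ correspondence between repeated letter forms and vowel length can be made concrete with a small sketch. It rests on our own simplifying assumptions: the function names are hypothetical, the spelling `cheeeese` is an invented stand-in rather than a transcription of figure 2, and the ‘surplus’ of repetitions is simply the number of letter forms left unclaimed by the standard spelling.

```python
from itertools import groupby


def letter_runs(word):
    """Collapse a word into (letter, run length) pairs, e.g. 'ee' -> ('e', 2)."""
    return [(ch, len(list(grp))) for ch, grp in groupby(word.lower())]


def surplus_repetitions(written, standard):
    """Return (letter, extra count) for each run longer than the spelling requires.

    Returns None if the written form is not a simple 'stretching' of the
    standard spelling, in which case this hypothesis does not apply.
    """
    w_runs, s_runs = letter_runs(written), letter_runs(standard)
    if [c for c, _ in w_runs] != [c for c, _ in s_runs]:
        return None
    return [(c, nw - ns) for (c, nw), (_, ns) in zip(w_runs, s_runs) if nw > ns]


# Illustrative input only: two letter forms are not claimed by the spelling.
print(surplus_repetitions("cheeeese", "cheese"))
```

Whether such a surplus is read as lengthening via a discourse hypothesis or directly as part of the typographic ‘code’ is exactly the empirical question raised above; the sketch itself does not decide between the two placements.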
Moving on with the current example, the typographic system does not, however,
lay any claim to the fact that in the artefact under discussion the baseline of the
written word is not a straight line. This is still ‘unclaimed’ information. Returning to
a pictorial interpretation of the artefact there are again several hypotheses that can be
made. In the general case, as we will see below, these hypotheses are influenced con-
siderably by genre concerns. For the present, we will simply note that one potential
interpretation is that a Gestalt form reminiscent of a face or a smiley is on hand. Now,
if this discourse interpretation is hypothesized, then certain correlations between,
and segmentations of, visual qualities abductively follow. In particular, the two round
elements (actually originally designed to depict plates of spaghetti) and the curved
form below them are mapped to two eyes and a smiling mouth respectively. Again,
there is no necessity in this assignment: if a reader does not see the connection then
the correlations are not constructed. Whether or not this Gestalt reading is followed is
again something that can be empirically investigated in concrete cases of reception.
Our concern must be to set out the space of discourse interpretations possible and
what follows when particular paths are taken up rather than others. If the connection
is made here, then the correlations will hold – it is at this point that the indeterminacy
collapses.
If we assume in the present case, therefore, that this hypothetical interpreta-
tion is being followed, then the typographically expressed curve of the word cheese
receives an explanation as the upwardly curving line of a mouth. As before, this is an
abductive discourse hypothesis attempting to find the best explanation (Peirce) of the
data on hand and, in other contexts of interpretation, quite different readings of the
slope might be made preferable. There is no need to predefine any particular meaning
for the slope – indeed, there is no need to predefine it as a relevant visual property of
the image at all as its relevance (or not) is established during discourse interpretation.
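The methodological role of ‘unclaimed’ information can be sketched as a simple bookkeeping procedure. This is a deliberately crude illustration under stated assumptions: all feature and hypothesis names are hypothetical labels for the constructed example, and counting the features that a hypothesis claims is only a stand-in for maximizing discourse coherence, not the defeasible discourse semantics itself.

```python
# Hypothetical hypotheses and the material features each would 'claim'.
CLAIMS = {
    "written language": {"letter forms"},
    "vowel lengthening": {"repeated e"},
    "face gestalt": {"round element left", "round element right", "curved baseline"},
    "background depth": {"curved baseline"},
}
# All material features noted in the artefact (here: everything claimable).
FEATURES = set().union(*CLAIMS.values())


def unclaimed(features, accepted, claims):
    """Features not yet explained by any accepted hypothesis."""
    claimed = set()
    for hypothesis in accepted:
        claimed |= claims[hypothesis]
    return features - claimed


def best_next_hypothesis(features, accepted, claims):
    """Pick the hypothesis that claims most of what is still unclaimed."""
    rest = unclaimed(features, accepted, claims)
    candidates = [h for h in claims if h not in accepted and claims[h] & rest]
    if not candidates:
        return None  # nothing left to explain, or nothing on offer explains it
    return max(candidates, key=lambda h: len(claims[h] & rest))


accepted = {"written language", "vowel lengthening"}
print(sorted(unclaimed(FEATURES, accepted, CLAIMS)))
print(best_next_hypothesis(FEATURES, accepted, CLAIMS))
```

Any hypothesis selected in this way remains defeasible: a reader who does not see the face leaves the curved baseline unclaimed or assigns it to some other reading.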
As usual in such cases, then, the meaning of the artefact largely resides in the
connections constructed abductively during discourse hypothesis formation. These
connections offer an explicit representation of the commonly claimed meaning mul-
tiplication at work in multimodal artefacts (cf. Lemke 1998), characterizing mode
combinations as dynamic processes of text construction and reception rather than
as static or pre-given inter-relationships across codes or semiotic resources. The
curve, for example, does not have any such meaning as mouth or smile outside of
its context of use in this concrete text; nevertheless the mechanisms by which these
meanings emerge during dynamic discourse interpretation are not erratic or tied to
specific cases. There is also no need to add these possibilities to the description of
some individual semiotic mode of typography; the mechanisms of discourse interpretation
are sufficient for establishing the iconic mappings (in the Peircean sense of
structural relationships) that are required during the process of maximizing discourse
coherence.
We will see several more cases of the top-down role of discourse semantics for
perception and segmentation below. This example again emphasizes, however, that
any organization of semiotic modes around sensory channels is going to be of limited
relevance during analysis – information may be combined as required by the dis-
course semantics of any mode, regardless of which sensory modalities that involves.
The issues during analysis are more concerned with the more fine-grained claims
that any mode makes of the material being deployed. Less fine-grained selections
of material – whether these are the four (language, image, music, noise) of Stöckl
(this volume) or the five (written language, spoken language, static images, dynamic
images, sound) of Schmitz (this volume) – will always then tend to leave gaps when
confronted with actual artefacts or performances to be analysed.
4 Media and Genres

Semiotic modes never occur on their own, outside of a context of use; similarly, they
never occur without being used for some communicative purpose. When studying
semiotic modes in use, therefore, we need to have a suitable framework that charac-
terizes how they are embedded into contexts of use and into the concrete artefacts or
performances within which they operate. Within the current framework, these tasks
are managed primarily in terms of media and genres, which we now introduce.
4.1 Media and Communicative Forms
When examining any artefact or performance, it was suggested above that any number
of semiotic modes may be operative and careful empirical analysis is necessary to dis-
tinguish them. This follows from the fact that materials (both actual and virtual) are
able to support a host of simultaneously co-varying dimensions. This does not mean,
however, that we must always start from scratch – it is certainly possible to determine
likely constraints both on the semiotic modes that may apply and on their precise
manner of application. Perhaps the most prominent source of such constraint is the
medium within which the artefact or performance is couched. Medium-specificity as
such is then also a necessary component of multimodal analysis – that is: just which
modes may be operative and how they are combining may exhibit medium-specific
properties and so knowing more about the medium can help guide subsequent empir-
ical investigation.
That certain semiotic modalities regularly combine and others not is itself a
socio-historically constructed circumstance, responding to the uses that are made of
some medium, the affordances of the materialities being combined in that medium
and the capabilities of the semiotic modes involved. Under this view, a medium is best
seen as a historically stabilised site for the deployment and distribution of some selec-
tion of semiotic modes for the achievement of varied communicative purposes. For
example: books are a medium, traditionally mobilizing the semiotic modes of written
text, typography, page layout and so on. When we encounter a book, we know that
certain semiotic modes will be likely, others less likely, and others will not be possible
at all (at least not directly – more on this below). This relationship between media and
semiotic modes is suggested graphically in figure 3 and, in the suggestive phrasing of
Winkler (2008, 213), we can consider any medium as a biotope for semiosis.
Fig. 3: Relation between semiotic modes and media
Media have a further range of interesting properties that are useful when we wish to
reflect on their development and application over time as well as helping directly with
the questions and problems of multimodality. The first property we discuss follows
straightforwardly from, on the one hand, the capability of semiotic modes to be multi-
sensorial and, on the other, the possibility that particular media may not provide full
sensorial access to the options a semiotic mode in principle spans. This situation is
quite common and is often linked with technological developments where the deficits
of a new medium are more than counterbalanced by new capabilities (consider, for
example, the respective introductions of the printing press, the telephone and the
web).
It is this potential mismatch between the material of the semiotic mode and that
provided by some medium that gives rise to the phenomenon of medial variants used
by Stöckl (2004) in his characterization of semiotic modes introduced above. It also
covers the possibility of broad re-use of particular techniques or mechanisms, such as
representational pictures that may be drawn, painted, sketched with a mouse, etc. –
all very different media but sharing sufficient overlap in the distinctions expressed to
allow transfer and alignment of semiotic mode use. In such situations we will refer to
media as being depictive, adopting this term as a generalization of its more common
usage with respect to pictorial representations. The nature of pictorial depiction is
still subject to considerable controversy and debate; we will not engage with this dis-
cussion here, however, even though much would be gained by applying the view of
semiotic modes we have presented. A useful bridge is provided by Newall (2003), who
observes:
Pictures regularly depict other pictures. Paintings or drawings of galleries, studios, and other
interiors, for instance, often depict pictures hanging on walls or propped on easels. (Newall
2003, 381)
Newall then proceeds to discuss what kinds of properties are possible for the depicted
pictures, concluding by and large that those properties are actually (some subset of)
the properties of the depicting artefact, rather than of the artefact being depicted.
Newall’s characterization of depiction then in many places comes suggestively close
to how we have introduced semiotic modes:
A system of depiction, for the purposes of this essay, is a practice that determines the features of
a picture’s surface that bear on its content. […] A particular system of depiction, as I have defined
it, is distinguished by the type of features it determines to be content-bearing. (Newall 2003, 384)
We now generalize beyond the pictorial case and rely upon the full range of material
features that a collection of semiotic modes may mark out for use.
Media can then be said to operate depictively when their (virtual) material –
i.e., the materials formed by the combination of the materials of their contributing
semiotic modes and their technical capabilities – offers sufficient foundation for the
application of semiotic modes from other media. Such cases do not require any exact
equivalence in material form: all that is necessary is the availability of some ‘sub-
slice’ of material distinctions sufficient for attaching or linking into the distinctions
of the depicted semiotic modes. This provides a ready place for the phenomenon men-
tioned at the outset of a dance being shown as a photograph within a film and so on as
well as such nowadays everyday phenomena as reading a digital version of a newspa-
per in a web browser. The virtual canvas created by current web browser technology
is very different to that created by print technology and we are therefore dealing with
quite different media. There is, however, still sufficient commonality to offer material
support that is in many ways overlapping (although there are still many differences,
cf. Bateman et al. 2004).
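The ‘sub-slice’ condition on depiction can be illustrated with a small sketch. This is only a toy model under strong assumptions: material distinctions are reduced to plain string labels, and the names `SemioticMode`, `Medium` and `can_depict`, as well as the inventories of distinctions listed, are invented for illustration and are not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemioticMode:
    """A semiotic mode, reduced here to the material distinctions it relies on."""
    name: str
    required_distinctions: frozenset


@dataclass(frozen=True)
class Medium:
    """A medium, reduced here to the distinctions its (virtual) material offers."""
    name: str
    available_distinctions: frozenset


def can_depict(medium: Medium, mode: SemioticMode) -> bool:
    """True if the medium's material offers a sufficient sub-slice for the mode.

    No exact equivalence of material is required: the mode's distinctions only
    need to be a subset of those the medium makes available.
    """
    return mode.required_distinctions <= medium.available_distinctions


# Hypothetical, heavily simplified inventories of material distinctions.
page_layout = SemioticMode("page layout", frozenset({"2d-position", "size", "colour"}))
spoken_language = SemioticMode("spoken language", frozenset({"pitch", "loudness", "timing"}))

print_page = Medium("print newspaper", frozenset({"2d-position", "size", "colour"}))
web_browser = Medium(
    "web browser",
    frozenset({"2d-position", "size", "colour", "scrolling", "hyperlink"}),
)

print(can_depict(web_browser, page_layout))     # True
print(can_depict(print_page, spoken_language))  # False
```

On this simplified reading, the web browser can depict the page layout of a newspaper because it offers a superset of the required distinctions, even though the two media remain different.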
In a similar vein, Stöckl (2014, 276) suggests differentiating between the medium
(e.g., i-Pad) and communicative forms (cf. Holly 2011, 155) such as newspapers, radio
and e-mail in order to describe this phenomenon. Medium as we are using it here is
then often closer to communicative form in intention than to physical material. This is a
general problem to be faced with content-flexible technical devices: the i-Pad would
not then be a medium in our sense, but more a possible virtual canvas that might be
employed by a variety of media. Stöckl then also prefers to see film, comics, opera,
dance, etc. as medially restricted communicative forms rather than semiotic modes
in their own right. The communicative forms set the medial and situational configu-
rations for the production of multimodal texts, which, according to their text types,
bring different semiotic modes together.
Within our present framework, we consider these largely empirical issues. We
do not yet know whether ‘film’ or ‘comics’, for example, contribute their own semi-
otic modes – although there appear to be good arguments that at least some of the
regularities and specific workings of these media offer good candidates for semiotic
mode status. The identity criterion to be applied is always that of finding an appro-
priate discourse semantics that explains how particular slices of patterns through the
employed material are to be related and contextualized. In all cases, however, there
will also be other semiotic modes that apply over the ‘same’ material, drawing on
different slices in order to carry their own signifying practices.
Discussion concerning the appropriate use and definition of terms like media
continues (cf. Posner 1986; Dürscheid 2005; Schneider/Stöckl 2011) and so we will not
enter into this particular facet of the challenge further here. More important for our
current purposes is not the terminological labelling, but rather the ontological struc-
ture of the model as a whole and the relationships between its parts and processes –
i.e., how the identified components function together and which networks of depend-
encies hold. In this respect, at least the functioning of media or communicative forms
in relation to our definition of semiotic modes should now be clearer. As long as we
maintain the entire structural ensemble of modes-within-media as set out so far, we
will be in a better position to track the kinds of meanings being made, including those
occurring within media depictions. This allows us to avoid assuming hybrids or fuzzy
boundaries that are not present and which are not necessary for characterizing the
multimodal meaning at work.
Depiction thus provides a necessary barrier that prevents treatments of semiotic
modes and their combinations unravelling – that is: just because a photograph is
shown within a film, or a newspaper on an i-Pad, does not mean that we suddenly
stop having the expected and conventionalized properties of photographs or news-
papers and how these media employ semiotic modes to make meanings. There is no
‘combination’ of semiotic modes intrinsically involved in such cases. Although more
creative combinations can then be generated (e.g., the increasingly frequent ‘trick’ in
film of having the contents of a photograph move, or allowing zooming and hyper-
linking on an i-Pad), this alters the medium depicted and in so doing brings different
semiotic modes together in the service of communication.
Other cases of medium depiction are also useful to discuss. One relatively simple
class includes notations. For example, the use of braille as a printed form of rep-
resentation for written language involves a medium that is different to that of regular
print. However, the distinctions that are drawn in the material of that medium are suf-
ficient to cover the distinctions drawn in at least the written alphabetic form of verbal
language and so allow a straightforward transference to another material carrier with
usefully different affordances. Whether or not any further semiotic modes have grown
with respect to this medium would require empirical investigation – involving established
practice and communities of users. Certain correlates of typographic layout, for example,
could be expected to serve a function, as could positions and divisions
(by tactile perception) within the ‘page’ space. There is in this case, then, no reason
not to consider application of many of the semiotic modes related to the use of printed
language to the medium of braille publications. Other examples of notation would be
the use of a light source for Morse code or the representation of music that occurs in
sheet music – again, the question of whether additional semiotic modes specific to
these medial forms have emerged is always an empirical question. For Morse code,
this appears unlikely – for sheet music, quite possibly.
The situation is very different when we compare static image and moving image,
classified as medial variants by Stöckl (2004). Here the semiotic potential of the
media supporting these two kinds of image differs so substantially that there are
almost certainly separate systems (at least in part) in operation. More on the potential
course of semiotic development of spoken and written language is suggested in
Bateman (2011), while the emergence of distinct semiotic modes for film is taken up
in Bateman/Schmidt (2012, 130–144). For our present concerns, we will focus below
more on some of the other uses that can be made of a material supporting visual per-
ception, such as those observed in cases of so-called text-image relations.
4.2 Multimodal Genres
Whereas a collection of modes may regularly be mobilised within a medium, just
what is done with those modes requires a more general level of description still: that
of ‘text type’, or genre. It is possible for an in principle unrestricted range of genres
to be carried within any medium, although here, as elsewhere, there may well be,
and most often are, conventional restrictions or associations that arise during their
use over time. In other words, particular socio-cultural periods will employ different
media for different ranges of genres, not because this is necessary but as part of the
ways in which meaning distinctions are established and signalled in a culture in any
case.
Thus books are a medium, traditionally mobilizing the semiotic modes of written
text, typography, page layout and so on – however, as remarked above, it is possible
for an unrestricted range of genres to be carried within this medium. One such genre
might, for example, be a factual report; others are biographies, school textbooks and
so on. Another rather different medium is that of newspapers: newspapers are dis-
tinct from books in that they have different modes of distribution and consumption,
although the semiotic modes drawn upon largely overlap with those found in books.
Again, there are many genres that may be employed within the newspaper medium
and these, by and large, do not overlap with those occurring in books. It does not
therefore make sense to talk of newspapers or books (or web pages!) ‘being’ genres,
but it does make sense to ask what kinds of genres typically appear in these media.
As argued in Bateman (2014a), information that is inherent in the medium should not
be considered as offering an identifying feature for some presumed variety of genre
in that medium.
Appropriate considerations of genre need to be clear concerning the purpose for
which genre classifications are being pursued. One of the most general such pur-
poses is that set out by Lemke: “Co-generic texts are privileged intertexts for each
other’s interpretation” (Lemke 1999) – in other words, knowing something about the
genre of some text offers useful ways of considering the properties of other, (generi-
cally) related texts and of distinguishing those texts as a family from non(generically)
related texts. Genre-attribution thus brings with it a horizon of expectation (Todorov
1990): given the decision to consider a text in the terms of some genre, the reader/
hearer/viewer should then be in a position to make a variety of predictions concern-
ing how that text is organized, what it is for, and so on. This is then as clearly relevant
for multimodal artefacts and performances as it is for purely textual artefacts. Indeed,
without a genre allocation, it is often not possible to provide a sensible description of
a text, multimodal or not, at all.
Here we restrict our discussion of genre to locate it within our general frame-
work for multimodality. Genres thus define families of artefacts or performances as
being similar in some respects of organization and form. Moreover,
common to most definitions of genre is the presupposition that the families of texts
picked out as generically-related should form a socially significant class – that is, in
order to qualify as a genre there must not only be formal similarities, but also some
recognition in society at large that the genre ‘exists’ and does some specifically recog-
nisable social ‘work’ (cf. Miller 1984; Swales 1990 and many more). This makes genre
much more than a passive classificatory device: a genre that exists in a culture
is considered a relatively stable communicative strategy both for achieving some
relevant social purposes and for allowing its practitioners to display that they are
attempting to achieve those purposes. This adds a psychological or strategic function
to genre use (Bhatia 1993, 13), while others see the entire repertoire of genres that are
available as an effective way of characterizing the discourses constituting a society or
community as a whole (cf. Martin/Rose 2008).
In certain respects, genres resemble the lexicon in that they are constituted by
restrictions in the options that a semiotic system makes available in general. They are,
however, very much more abstract and can call upon restrictions in almost all areas of
the semiotic systems they are built on. For this reason, we characterize genres graph-
ically in figure 4 as a ‘cloud’ of possibility that surrounds and permeates media (and
their contained semiotic modes). Thus genres might, on the one hand, be sufficiently
general that they may be employed across a range of different media – narrative might
be a candidate for such an abstract genre; on the other hand, they might be specific to
particular semiotic modes within particular media.
Fig. 4: Relation between semiotic modes, media and genres
Moreover, for genre theories that allow internal generic structures, or generic stages,
individual components of genres may themselves adopt differing combinations of
semiotic modes, giving rise to internally diverse multimodal genres (Lemke 2005;
van Leeuwen 2005, 80). Subsequently, specific genres may become more general over
time as their social function is found useful, or general genres might become more
specialised, as their social function becomes restricted in application. Again, it is
always an empirical issue just how particular genres develop and change over time
and the purpose of a framework such as that presented here is to provide the theoret-
ical space within which such changes can be described and tracked.

5 The Multimodal Description of Text and Image Combinations
We have now introduced a rather general framework for exploring multimodal phe-
nomena. In this section, we illustrate the abstract framework in use. In particular, we
will address the question of text-image relations, since this is an area that, on the one
hand, is considered by almost all who work on the theory and foundational under-
pinnings of multimodality and, on the other, is a case where a rather straightforward
assumption concerning the nature of the ‘two’ modes being combined proves par-
ticularly prevalent. We will use this to argue that a more differentiating account of
the relationships within and between semiotic modes, as well as their appropriate
embedding in media and use by genres, is crucial for moving towards more adequate
characterizations of the phenomena and mechanisms involved.
One of the most detailed accounts of text-image relations is the classification
network proposed by Martinec/Salway (2005) on the basis of examples from news-
papers, textbooks, advertisements, diagrams, etc. Martinec and Salway go so far as
to claim:
The system may need modifying as our sample of image-text combinations increases; however,
even if the relations that we are writing about can be further subclassified and genre-, or regis-
ter-specific realizations added, we surmise that the outline of the basic system will probably stay
as it is. (Martinec/Salway 2005, 341)
Martinec and Salway’s classification is indeed very general as it draws primarily on
the distinctions found useful in systemic-functional grammar for describing semantic
relations between grammatical clauses; a detailed introduction to this, and several
other schemes for describing text-image relations is given in Bateman (2014b). A more
worrisome possibility not addressed by Martinec and Salway is, however, that the
outline of their system may indeed stay as it is – not because it already characterizes
the data but because it fails to engage with the data it is purporting to describe. In
this case, there may never be evidence that the classification needs to be changed
since it can, due to its generality, always be ‘made to fit’. This is a consideration that
needs to be raised for all general proposals for text-image relation classifications and
is not specific to Martinec and Salway. The methodological issue remains as always
one of making explicit just how categories and classifications are to be motivated and
evaluated.
Even the starting assumption that an investigation of text-image relations has
already clarified just which semiotic modes are being brought together – i.e., ‘lan-
guage’ and ‘image’ – is premature in several respects. For example, we have argued
that semiotic modes may well introduce structured entities (their syntagmatic organ-
ization) with mode-specific discourse semantic relations holding within those struc-
tures. Until this information has been uncovered, it is unlikely we will be in a good
position to state what relations may be holding between the elements placed in struc-
tural configurations. If we consider the contrasting situations of (i) the layout on the
page of elements in a comic or graphic novel, (ii) the layout on the page of elements
in a newspaper, or (iii) the layout on the page of a tourist guide giving us information
about a tourist attraction, it should be clear that very different kinds of discourse
semantic relations apply. In fact: the discourse relations holding for comics and
graphic novels overlap with relations found in other narrative communicative forms
(cf. Cohn 2013b; Bateman/Wildfeuer 2014a); those found in the newspaper employ
the spatial distribution of elements on the page for the expression of, among other
things, news salience (cf. Bateman/Delin/Henschel 2004); while the discourse inter-
pretations of elements related by spatial proximity in the tourist guide are often cap-
tured well by accounts of multimodally extended notions of rhetorical organization, in
particular Rhetorical Structure Theory (RST: Mann/Thompson 1988) as described in
Bateman (2008, 151–163). Within each of the ‘orchestrating’ modes at work here there
will typically be many occurrences of items that draw on written language and items
that are image-like. Whether these are best described with reference to a mode-inde-
pendent classification system is then, at best, an open issue at this time.
We suggest here, therefore, that we would be better off, at least methodologically,
first paying close attention to the explicit identification of both genre and the semiotic
modes that are at work in any objects of analysis. To show this, we will briefly con-
sider analyses of the four contrasting cases of apparent co-occurrences of ‘text’ and
‘images’ shown in figure 5. For the purposes of the present discussion, these examples
are all relatively simple but should nevertheless allow the desired points to be made.
The first two examples are drawn from a sequence of instructions (Ikea) for an item of
home furniture; the third one is from a scientific journal article (Nature 365, Svoboda
et al., 1993); and the fourth one is an adapted version of a comics panel (Fletcher
Hanks, Fantastic Comics # 15, 1941).

Fig. 5: Four examples of text and various kinds of image occurring together
5.1 Visual Procedural Instructions
To begin, a concern might be raised with respect to the first two examples that it is
unclear whether we are dealing with text-image relations at all, since many of the
non-textual components are clearly diagrammatic. This takes us directly to the
assumption that we already know what semiotic modes are relevant and which not.
Instead, we choose to move the entire discussion to the level of semiotic modes and
address this question very differently: we do not assume a priori that we already know
what is to be allocated to text or to image (cf., e.g., Bateman 2014b, 12–18 and Ell-
eström 2014, 2). The question of what semiotic mode(s) apply is prior. As suggested
above, this is also necessary in order even to segment the relevant units of analy-
sis – to talk of text and image wherever they might be found in the visual field is
too weak, as different semiotic modes might be doing quite different things with the
available ‘space’ of the canvas. As the artefacts examined become more complex and
other orchestrating modes are mobilised, this becomes an ever more central issue for
effective analysis.
For examples (a) and (b) we clearly need to consider the genre of procedural
instructions in order to situate them appropriately. Moreover, within procedural
instructions there are typically distinct genre stages with differing functions and
properties. Informally we can draw attention to the existence of at least three main
stages: general information, the component parts of the item to be constructed, and
the instructions themselves. For the cases at hand, this is also evidently a multimodal
genre in that the realizations of the stages draw freely on diagrammatic and textual
representations. For the text-image question, therefore, the issue is what relations
we find between such diagrammatic and textual elements in this context of use.
The depiction in (a) is drawn from the genre stage of setting out the component
parts that are required (and so should be present) for the construction to succeed. This
probably already gives us sufficient information to decode the intended meaning of
the text-image combinations present, even without prior knowledge of how such mul-
timodal procedural instructions are constructed. The communicative function of this
genre stage can be glossed (also multimodally) as an assertion of the form “the fol-
lowing components are part of what you are building” plus a ‘table’ showing how the
components appear, their company part numbers, and how many there are of each.
The communicative functions are also fairly self-evident: the appearance is shown so
that the user can find them, their quantity can also help identify the intended parts
as well as clearly help determine whether any are missing, while the part number
might be used for ordering replacements. The instructions evidently assume that all
the parts are present because the official part number information is made nonsalient
by the selection of a (very) small font; we will not discuss this further here.
Considering the relation between the remaining textual element, “10x”, and the
‘image’, we can note that they are placed in proximity to one another and so it is likely
that they are to be interpreted with respect to each other. However, the relations that
come into question are not those of the comic page, newspaper or tourist guide men-
tioned above. Thus, however this particular artefact is working, the significance it is
ascribing to proximity differs from that exhibited in these other cases. In terms of Martinec
and Salway’s classification, the relationship would probably be classified as some
further subtype of elaboration and as the text being subordinate to the image (since it
is the image that is clearly ‘nuclear’). The particular meaning of the “10x” is then that
there are ten of the items so depicted. But this follows directly from the communica-
tive functions to be achieved in this genre stage; there is very little else that could be
intended. Knowing (or hypothesizing) the organization of the genre stage then pro-
vides the necessary information for explicating the text-image relation holding.
This is very similar in form to Bucher’s (2011, 129) critique of Martinec and
Salway’s account – Bucher argues that it is the practical task of communication that
determines multimodal use and coherence and not the presence of general classifi-
cations of possible text-image relations. Thus, relations cannot be worked out on the
basis of the elements, but must be traced back to what communicators are trying to do
with the elements they mobilise (Bucher 2011, 131–132). This indeed appears to be the
case, although here we look to the account of genre to provide information about just
what actions might be relevant. That is: in order to understand the text-image rela-
tion, we need to understand what the genre stage is doing; when we have understood
this, we know the text-image relation. We can then at that stage give a classification in
terms of Martinec and Salway’s (or any other) categories, but those categories would
not have helped us determine the genre stage and so appear to operate more as post
hoc labels than as explanatory mechanisms. Where our framework and that
of Bucher differ, however, is in the presence of the discourse semantics stratum:
whereas Bucher places the task of interpretation in pragmatics and, more specifically,
general theories of action, we retain the guidance aspects of the organization of the
artefacts themselves (i.e., their textuality) as a part of discourse semantics.
This means, quite concretely for analytic method, that we take the proximity rela-
tionship in the artefact as a semiotic resource directing attention to the need to provide
an explanation for that proximity, and the possible explanations are offered by the
relations given by the discourse semantics of the mode. As we shall see when we turn
to the other examples, we consider it beneficial to avoid general problem-solving in
such cases because what can be done with the material distinctions is often already
tightly constrained by what the semiotic mode allows.
We have spent some time on this simplest of examples because already in the
second case, we are confronted with a very similar appearing text-image relation with
a very different semantics. In (b), the “4x” now indicates that the depicted action
has to be performed four times. This information is, however, again available from
the different communicative function of this genre stage. Here the instructions are
concerned with actions, what is being acted on, the tools to be used, the locations
and directions of the actions, etc. rather than static descriptions of parts. It is then
natural that information concerning objects, instruments, locations and frequency or
number be given.
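The contrast between the two readings of an “Nx” annotation can be sketched as a lookup that is keyed on the genre stage rather than on the text-image pair itself. This is a minimal illustration only: the stage labels, the wording of the readings and the function `interpret_annotation` are hypothetical and are not drawn from any existing annotation scheme.

```python
import re

# Hypothetical genre stages of procedural instructions and the reading each
# assigns to a proximate "Nx" annotation. The labels are illustrative only.
STAGE_READINGS = {
    "component_parts": "quantity of the depicted part",
    "instruction_step": "number of repetitions of the depicted action",
}


def interpret_annotation(text: str, genre_stage: str) -> str:
    """Interpret an annotation such as '10x' relative to a genre stage."""
    match = re.fullmatch(r"(\d+)x", text.strip())
    if match is None:
        raise ValueError(f"not an 'Nx' annotation: {text!r}")
    if genre_stage not in STAGE_READINGS:
        raise ValueError(f"unknown genre stage: {genre_stage!r}")
    return f"{int(match.group(1))}: {STAGE_READINGS[genre_stage]}"


print(interpret_annotation("10x", "component_parts"))
print(interpret_annotation("4x", "instruction_step"))
```

The same surface form thus receives different interpretations, and the only input that differs between the two calls is the genre stage.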
The solutions at work here for these communicative tasks are highly convention-
alized – so much so that we may well talk of a semiotic mode of graphical instruc-
tions, with its own rather limited range of discourse relations and lexicogrammatical
building blocks (cf. Schumacher 2013). Moreover, this mode is again an orchestrating
mode since its raw materials are diagrammatic representations, typically drawn in
perspective, plus limited textual annotations. It also includes graphically expressed
information concerning direction of movements (as in the bold circular arrow shown
below the screw), paths of movement (shown by the thin lines passing from screws
through holes), as well as zoom-ins to provide important detail – a semantic relation
that is very frequent in instructions of this kind. Here it is also interesting that the
graphical realization of the zoom-in is almost identical to the form that functions as
speech balloons in comics. In the present semiotic mode and generic stage, however,
it clearly has nothing to do with speech balloons.
In short, therefore, the kinds of relationships between text and image in the first
two cases are very specific and quite distinct from how captions of visuals are generally
used – if the image in (a) were instead being used in an encyclopaedia on tools, for
example, the text would most likely present a linguistic label for the depicted object,
which is not what occurs here; and the second case is different again. Although in
both these cases it may be possible to select a relation from Martinec and Salway, this
classification appears to do rather little work as far as explaining the construction of
multimodal meaning is concerned.
5.2 Scientific Visualizations and Graphs
In depiction (c), we see another genre and yet another semiotic mode at work. The
‘image’ here is a very specific type of diagram with its own further conventions and
restrictions – that is, more precisely stated, the Gestalt form of the visual representation
conventionally motivates the abductive hypothesis that we are dealing with
a graph, which is then a good line of inquiry to follow. The kinds of meanings made within such
representations are again very specific. The labels on the axes, for example, do not
identify what the lines are; they identify what kinds of values and measurement
units apply to values that may appear in the graph body. The numbers aligned with
designated places on the axes are also not only identifying those places but setting up
a correspondence between a continuous space on the graph and a continuous range
of values – values that may be read for points in the graph body by their vertical or
horizontal alignment. With highly specialised tasks of this nature, it is again ques-
tionable to what extent general talk of text-image relations is going to be appropriate.
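The correspondence that axis numbers set up between positions on the graph and a continuous range of values can be made concrete with a short worked example. The sketch assumes a linear axis and two labelled ticks; the function name `axis_value` and the pixel positions used are invented for illustration and do not refer to the graph in figure 5.

```python
def axis_value(position: float, pos_a: float, val_a: float,
               pos_b: float, val_b: float) -> float:
    """Read a value off a linear axis by interpolating between two labelled ticks."""
    if pos_a == pos_b:
        raise ValueError("the two ticks must be at different positions")
    fraction = (position - pos_a) / (pos_b - pos_a)
    return val_a + fraction * (val_b - val_a)


# Hypothetical horizontal axis: tick '0' drawn at 40 px, tick '100' at 240 px.
print(axis_value(140.0, 40.0, 0.0, 240.0, 100.0))  # 50.0
```

The tick labels here do not name objects in the image; they calibrate the space so that any aligned point in the graph body can be assigned a value.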
The graph is also clearly assertional, although ascertaining what precisely it
is asserting is, in this case, helped considerably by the text of the main article: in
fact, the paper wants to claim that the relation between displacement and the output
voltage from the identified device (the interferometer) is linear up to around 150 nm.
We thus find a standard usage of rhetorical relations as covered by, for example, Rhe-
torical Structure Theory (Mann/Thompson 1988) – in this case an evidence relation
needs to be constructed by the reader between the elements of the text and the graph.
However, in order to function as evidence, the graph must be abductively assumed to
be asserting that the relationship between input and output is as shown – if this were
not an assertion, it could not function as providing evidence. As discussed above with
respect to the autonomy of images and Peirce’s characterizations of types of signs,
this argumentative force is only possible when images receive suitable additional
support – and that support comes here again from the discourse semantics (captured
in part by Rhetorical Structure Theory) and its use within a particular genre.
5.3 Visual Narrative: Comics Panels
Finally, in depiction (d), we appear at first glance to have moved back to more famil-
iar text-image territory with two instances of what are often considered as text-im-
age relationships: one with the caption and one with the speech balloon. However,
here again the fact that we are now apparently in the specific communicative form of
comics with their accompanying rather specific semiotic modes raises doubts about
the value of a generic classification. Indeed, the caption in the present case is very
different in function to those, for example, found in newspapers or even in the graph
description from the previous example. This can be verified by the simple procedure
of exploring different placements of the caption with respect to the material it is being
related to. In depiction (c), the caption could be placed anywhere in close proximity to
the graph – the semiotic mode at work would still assign the same labelling function
to graph and caption regardless; in the case of the comics panel, this is not the case.
In essence, the distinctive function of the caption in the comics panel is one of placing
time-indexed descriptions on a temporal path that is as far as possible tied to the reading path.
The caption should not then be read ‘around’ the same time as the panel is examined:
it needs to be read before that panel as preparation for its content. This can also
be corroborated by examining the content of captions that do appear in other spatial
locations with respect to the panel they are connected with: their content changes
according to the precise temporal placement. Thus captions can usefully be seen, at
least in certain respects, as further panels that happen to be expressed using words
rather than images.
This means that, even though the captions appear to be classifiable according
to Martinec and Salway’s scheme (in fact usually as enhancements of time), their
role for the comic and its narrative is even more determinate. These more specific
relations could be added to Martinec and Salway’s network – indeed, some versions
of text-image relation classifications already contain similar categories (cf. Bateman
2014b, 208–211). But it is not in general true that such a temporally specific relation
can be expressed visually by spatial proximity between a text fragment and an image:
it only holds for comics (and perhaps some other strongly temporal communicative
forms: more empirical study would be necessary). To add this possibility to a general
characterization of text-image relations, therefore, adds uncertainty where, in most
cases, there is none. The discourse semantics of the semiotic mode deployed may
already have provided sufficient mechanisms for interpretation.
The other instance of a text-image relation, the speech balloon, is also often
added into classifications of text-image relations as if this were a straightforward
step (cf. Martinec/Salway 2005, 352). It is certainly true that most readers, regardless
of whether or not they are well practiced in the comic book medium, will recognise
speech balloons and their general intent of expressing that someone is saying some-
thing. Indeed, readers who are not familiar with comics may make an interpretation
that the speech balloon is a kind of graphical notation for a verbal expression of the
form the indicated character says X. This does not require the reader to go beyond
the semiotic mode of verbal language, probably augmented with some assumptions
concerning the second-order semiotic mode of typography. A similar reading in terms
of a notation might be assumed for thought bubbles. For such readers, there is no
particular semiotic mode for comics as such and the text-image relationship dissolves
to a shorthand form for connecting speakers and thinkers to the corresponding texts.
Speech balloons are, however, native to the media of comics and graphic novels,
although other media can of course depict them and, in so doing, quote their usage
in comics. Readers sophisticated in the reception of comics may then act differently
to the naive readers posited above. Cohn (2013a), for example, argues that comics
have developed a particular productive system for interfacing textual contributions
and the pictorially-displayed information in panels. To capture this, he classifies such
co-occurrences along two parallel dimensions: root awareness and adjacent aware-
ness. These paradigmatic options are expressed, or realised, in a visual syntagmatic
configuration described as Carrier–Tail–Root. The standard cases of speech balloons
and thought bubbles are then simply realizations drawn from this semiotic resource.
The productivity of the system stems from the fact that the syntagmatic configuration
allows a variety of semantic configurations to be composed, including cases where
the agent involved is absent or where the tail is omitted, as in onomatopoeic uses
of language such as Bang!, etc. or, indeed, captions. The former case would be one
where there is no root awareness (because there is no one producing the sound as a
communicative act in the first place) but adjacent awareness because all in the panel
can hear the indicated noise; the latter case would be neither root awareness nor adja-
cent awareness because the contents of the caption are (in general) non-diegetic and
not accessible to the participants in the storyworld.
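The two dimensions described above can be read as a pair of binary features on a text carrier. The following Python sketch is purely illustrative: the names `Carrier`, `CarrierKind` and `classify` are hypothetical, the example caption text is invented, and the feature values assumed for speech balloons (both kinds of awareness) and thought bubbles (root awareness only) are an assumption about how the standard cases fit the scheme, since the discussion here fixes the values only for sound effects and captions.

```python
from dataclasses import dataclass
from enum import Enum


class CarrierKind(Enum):
    """Hypothetical labels for the four feature combinations."""
    SPEECH_BALLOON = "speech balloon"
    THOUGHT_BUBBLE = "thought bubble"
    SOUND_EFFECT = "sound effect"
    CAPTION = "caption"


@dataclass(frozen=True)
class Carrier:
    """A text carrier in a panel, described by the two awareness features."""
    text: str
    root_awareness: bool      # is the source aware of the content?
    adjacent_awareness: bool  # can others in the panel perceive it?
    has_tail: bool = True     # the tail may be omitted in the visual configuration


def classify(carrier: Carrier) -> CarrierKind:
    """Map the two binary features onto one of the four carrier kinds."""
    if carrier.root_awareness and carrier.adjacent_awareness:
        return CarrierKind.SPEECH_BALLOON   # assumed values, see lead-in
    if carrier.root_awareness:
        return CarrierKind.THOUGHT_BUBBLE   # assumed values, see lead-in
    if carrier.adjacent_awareness:
        return CarrierKind.SOUND_EFFECT     # no root awareness, but audible to all
    return CarrierKind.CAPTION              # neither: non-diegetic content


# Usage: the two cases discussed in the text.
bang = Carrier("Bang!", root_awareness=False, adjacent_awareness=True, has_tail=False)
caption = Carrier("Meanwhile...", root_awareness=False, adjacent_awareness=False, has_tail=False)
print(classify(bang).value)
print(classify(caption).value)
```

Treating the options as two independent features rather than as a flat list of balloon types reflects the productivity claimed for the system: new configurations follow from combining feature values, not from adding further categories.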
This system is then a proposal for a semiotic mode of comics, one that is also
orchestrating in function and which brings quite medium-specific relations to bear –
relations that only partially overlap with the simple assumption that speech balloons
are shorthand for locutions. This emphasizes again that analyses need to address
artefacts and performances together with the diverse user communities that engage
with those materials, as well as with the multimodal genres being performed, since it
is unlikely that these interpretative possibilities are relevant for newspapers,
scientific papers, or procedural instructions, for example.
6 Conclusions
This chapter has set out a differentiated framework for considering the phenomena
typically discussed under the rubric of multimodality. The framework binds together
materiality, ‘lexicogrammatically’ organized technical features of form, and discourse
semantics in order to characterize signifying practices in general. It was suggested
that this degree of differentiation is important in order to avoid conflating rather dif-
ferent processes and to open up analysis methodology so that the detailed styles of
working of particular artefacts, performances, media and genres can be probed more
effectively.
The present account therefore differs in several respects from previous views
of semiotic modes. Working with the framework means, for example, that there are
going to be many, ‘smaller’ semiotic modes operating in a semiotic artefact or performance
than typically revealed by broader, sensory channel views. Assigning priority
to sensory channels when distinguishing modes is not then seen as methodologically
helpful for the central multimodal task of explaining how combinations of
modes function productively to produce meaning. Only when attention is turned to
the micro-level of individual semiotic modes do we find the necessary level of detail
for formalizing and empirically investigating how meanings are being combined and
constructed in multimodal use.
In emphasizing the role of discourse semantics, the approach can also be seen to
favour a more explicit characterization of the textual functioning of multimodal arte-
facts and performances. This contrasts with and complements what may be termed
pragmatic approaches, which tend more to see multimodality as problem-solving in
the context of individual communicative situations – thereby aligning the operation
of multimodality more to Saussure’s notion of parole. While much can be done with
such problem-solving approaches, they do not always lead to appropriate character-
izations of how strongly conventionalized mode combinations operate nor to more
refined accounts of how the artefacts and performances themselves can be structured
as texts to guide interpretation, both within and across modes.
7 References
Asher, Nicholas/Alex Lascarides (2003): Logics of conversation. Cambridge.
Barthes, Roland (1964): Elements of Semiology. London. Translated by Annette Lavers and Colin
Smith.
Bateman, John A. (2008): Multimodality and Genre. A Foundation for the Systematic Analysis of
Multimodal Documents. Basingstoke.
Bateman, John A. (2011): The decomposability of semiotic modes. In: Kay L. O’Halloran/Bradley A.
Smith (eds.): Multimodal Studies. Multiple Approaches and Domains. London, 17–38.
Bateman, John A. (2013): Dynamische Diskurssemantik als allgemeines Modell der Semiose.
Überlegungen am Beispiel des Films. In: Zeitschrift für Semiotik 35 (3–4), 249–284.
Bateman, John A. (2014a): Genre in the age of multimodality. Some conceptual refinements for
practical analysis. In: Paola Evangelisti Allori/John A. Bateman/Vijay K. Bhatia (eds.): Evolution
in Genres. Emergence, Variation, Multimodality. Frankfurt a. M., 237–269.
Bateman, John A. (2014b): Text and Image. A Critical Introduction to the Visual/Verbal Divide.
London/New York.
Bateman, John A./Judy L. Delin/Renate Henschel (2004): Multimodality and empiricism. Preparing
for a corpus-based approach to the study of multimodal meaning-making. In: Eija Ventola/
Cassily Charles/Martin Kaltenbacher (eds.): Perspectives on Multimodality. Amsterdam, 65–87.
Bateman, John A./Matthis Kepser/Markus Kuhn (2013): Film, Text, Kultur – Beiträge zur Textualität
des Films. In: John Bateman/Matthis Kepser/Markus Kuhn (Hg.): Film, Text, Kultur. Beiträge zur
Textualität des Films, Marburg (Textualität des Films 1), 7–29.
Bateman, John A./Karl-Heinrich Schmidt (2012): Multimodal Film Analysis. How Films Mean. London.
Bateman, John A./Janina Wildfeuer (2014a): A multimodal discourse theory of visual narrative. In:
Journal of Pragmatics 74, 180–218.
Bateman, John A./Janina Wildfeuer (2014b): Defining units of analysis for the systematic analysis of
comics. A discourse-based approach. In: Studies in Comics 5 (2), 371–401.
Bhatia, Vijay K. (1993): Analysing Genre. Language Use in Professional Settings. Harlow, U.K.
Björkvall, Anders (2012): Multimodality. In: Jan-Ola Östman/Jef Verschueren (eds.): Handbook of
Pragmatics. Amsterdam, 1–20.
Björkvall, Anders/Anna-Malin Karlsson (2011): The materiality of discourses and the semiotics of
materials. A social perspective on the meaning potentials of written texts and furniture. In:
Semiotica 187 (1/4), 141–165.
Bucher, Hans-Jürgen (2011): Multimodales Verstehen oder Rezeption als Interaktion. Theoretische
und empirische Grundlagen einer systematischen Analyse der Multimodalität. In: Hans-Joachim
Diekmannshenke/Michael Klemm/Hartmut Stöckl (Hg.): Bildlinguistik. Theorien – Methoden –
Fallbeispiele. Berlin, 123–156.
Clark, Austen (2011): Cross modal links and selective attention. In: Fiona MacPherson (ed.): The
Senses. Classic and Contemporary Philosophical Perspectives. Oxford/New York, 375–395.
Cohn, Neil (2013a): Beyond speech balloons and thought bubbles. The integration of text and image.
In: Semiotica 197, 35–63.
Cohn, Neil (2013b): Visual narrative structure. In: Cognitive Science 37 (3), 413–452.
Dölling, Evelyn (2001): Multimediale Texte. Multimodalität und Multicodalität. In: Ernest W.B.
Hess-Lüttich (Hg.): Medien, Texte und Maschinen. Wiesbaden, 35–50.
Dürscheid, Christa (2005): Medien, Kommunikationsformen, kommunikative Gattungen. In:
Linguistik online 22 (1), www.linguistik-online.de/22_05/duerscheid.html.
Elleström, Lars (2014): Media Transformation. The Transfer of Media Characteristics Among Media.
Basingstoke.
Fauconnier, Gilles (1997): Mappings in Thought and Language. Cambridge.
Forceville, Charles J. (1996): Pictorial Metaphor in Advertising. London.
Forceville, Charles J. (2007): Book Review. Multimodal transcription and text analysis. A multimedia
toolkit and coursebook by Anthony Baldry and Paul J. Thibault. In: Journal of Pragmatics 39 (6),
1235–1238.
Fricke, Ellen (2013): Towards a unified grammar of gesture and speech. A multimodal approach.
In: Cornelia Müller et al. (eds.): Body – Language – Communication/Körper – Sprache –
Kommunikation. Berlin/New York (Handbücher zur Sprach- und Kommunikationswissenschaft/
Handbooks of Linguistics and Communication Science (HSK) 38/1), 733–754.
Goodman, Nelson (1969): Languages of Art. An Approach to a Theory of Symbols. London.
Halliday, Michael A. K. (1978): Language as Social Semiotic. London.
Halliday, Michael A. K. (1994): An Introduction to Functional Grammar. 2nd. ed. London.
Hanks, Fletcher (1941): The Stardust Sixth Column: The World Invaders. Fantastic Comics (1939
series) #15 (February 1941), 31–39. New York.
Hiraga, Masako K. (1994): Diagrams and metaphors. Iconic aspects in language. In: Journal of
Pragmatics 22, 5–21.
Hjelmslev, Louis ([1943] 1961): Prolegomena to a Theory of Language. Madison, Wisconsin.
Translated by F. J. Whitfield.
Holly, Werner (2009): Der Wort-Bild-Reißverschluss. Über die performative Dynamik der
audiovisuellen Transkriptivität. In: Helmuth Feilke/Angelika Linke (Hg.): Oberfläche und
Performanz. Tübingen, 93–110.
Holly, Werner (2011): Bildüberschreibungen. Wie Sprechtexte Nachrichtenfilme lesbar machen.
In: Hans-Joachim Diekmannshenke/Michael Klemm/Hartmut Stöckl (Hg.): Bildlinguistik.
Theorien – Methoden – Fallbeispiele. Berlin, 235–256.
Jakobson, Roman (1965): Quest for the essence of language. In: Diogenes 13, 21–37.
Kamp, Hans (1981): A theory of truth and semantic representation. In: Jeroen A.G.
Groenendijk/T.M.V. Janssen/Martin B.J. Stokhof (eds.): Formal Methods in the Study of
Language (Mathematical Centre Tracts Vol. 136). Amsterdam, 277–322.
Kesselheim, Wolfgang (2011): Sprachliche Oberflächen. Musterhinweise. In: Stephan
Habscheid (Hg.): Textsorten, Handlungsmuster, Oberflächen. Linguistische Typologien der
Kommunikation. Berlin/New York, 337–366.
Klug, Nina-Maria/Hartmut Stöckl (2015): Sprache im multimodalen Kontext. In: Ekkehard
Felder/Andreas Gardt (Hg.): Handbuch Sprache und Wissen. Berlin/Boston (Handbücher
Sprachwissen – HSW 1), 242–264.
Kluss, Thorsten et al. (2012): Investigating the in-between. Multisensory integration of auditory and
visual motion streams. In: Seeing and Perceiving 25 (1), 45–69.
Koch, Walter A. (1971): Varia Semiotica. Hildesheim.
Krämer, Sybille (2006): Die Schrift als Hybrid aus Sprache und Bild. Thesen über die
Schriftbildlichkeit unter Berücksichtigung von Diagrammatik und Kartographie. In: Torsten
Hoffmann/Gabriele Rippl (Hg.): Bilder. Ein (neues) Leitmedium? Göttingen, 79–92.
Kress, Gunther (2014): What is mode? In: Carey Jewitt (ed.): The Routledge Handbook of Multimodal
Analysis. 2nd. ed. London, 60–75.
Kress, Gunther et al. (2000): Multimodal Teaching and Learning. London.
Kress, Gunther/Theo van Leeuwen (2001): Multimodal Discourse. The Modes and Media of
Contemporary Communication. London.
Kress, Gunther/Theo van Leeuwen (2006 [1996]): Reading Images. The Grammar of Visual Design.
London/New York.
Kutz, Oliver et al. (2014): E pluribus unum. Formalisation, use-cases, and computational support
for conceptual blending. In: Tarek R. Besold/Marco Schorlemmer/Alan Smaill (eds.):
Computational Creativity Research. Towards Creative Machines (Atlantis Thinking Machines 7),
167–196.
Lemke, Jay L. (1998): Multiplying meaning. Visual and verbal semiotics in scientific text. In: J.R.
Martin/Robert Veel (eds.): Reading Science. Critical and Functional Perspectives on Discourses
of Science. London, 87–113.
Lemke, Jay L. (1999): Typology, Topology, Topography. Genre Semantics. MS University of Michigan.
http://www-personal.umich.edu/~jaylemke/papers/Genre-topology-revised.htm.
Lemke, Jay L. (2005): Multimedia genre and traversals. In: Folia Linguistica XXXIX (1–2), 45–56.
Leeuwen, Theo van (2005): Multimodality, genre and design. In: Sigrid Norris/Rodney Jones (eds.):
Discourse in Action – Introducing Mediated Discourse Analysis. London, 73–94.
Liu, Yu/Kay L. O’Halloran (2009): Intersemiotic texture. Analyzing cohesive devices between
language and images. In: Social Semiotics 19 (4), 367–388.
Machin, David (2014): Multimodality and theories of the visual. In: Carey Jewitt (ed.): The Routledge
Handbook of Multimodal Analysis. 2nd. ed. London, 217–226.
Mann, William C./Sandra A. Thompson (1988): Rhetorical structure theory. Toward a functional
theory of text organization. In: Text 8 (3), 243–281.
Martin, James R. (1992): English Text. Systems and Structure. Amsterdam.
Martin, James R./David Rose (2008): Genre Relations. Mapping Culture. London/New York.
Martinec, Radan/Anthony Salway (2005): A system for image-text relations in new (and old) media.
In: Visual Communication 4 (3), 337–371.
Matthen, Mohan (2005): Seeing, Doing, and Knowing. A Philosophical Theory of Sense Perception.
Oxford.
McCloud, Scott (1994): Understanding Comics. The Invisible Art. New York.
McGurk, Harry/John MacDonald (1976): Hearing lips and seeing voices. In: Nature 264 (5588),
746–748.
Miller, Carolyn R. (1984): Genre as social action. In: Quarterly Journal of Speech 70, 151–167.
Mitchell, W.J.T. (2005): There are no visual media. In: Journal of Visual Culture 4 (2), 257–266.
Newall, Michael (2003): A restriction for pictures and some consequences for a theory of depiction.
In: The Journal of Aesthetics and Art Criticism 61 (4), 381–394.
Peirce, Charles Sanders (1931–1958): Collected Papers of Charles Sanders Peirce. Cambridge, MA.
Posner, Roland (1986): Zur Systematik der Beschreibung verbaler und nonverbaler Kommunikation.
Semiotik als Propädeutik der Medienanalyse. In: Hans-Georg Bosshardt (Hg.): Perspektiven
auf Sprache. Interdisziplinäre Beiträge zum Gedenken an Hans Hörmann. Berlin/New York,
267–313.
Sachs-Hombach, Klaus (2003): Das Bild als kommunikatives Medium. Elemente einer allgemeinen
Bildwissenschaft. Köln Univ., Habil.-Schr.–Magdeburg.
Saussure, Ferdinand de ([1915] 1959): Course in General Linguistics. London. Edited by Charles Bally
and Albert Sechehaye. Translated by Wade Baskin.
Schneider, Jan Georg/Hartmut Stöckl (Hg.) (2011): Medientheorie und Multimodalität. Ein
TV-Werbespot – Sieben methodische Beschreibungsansätze. Köln.
Schumacher, Peter (2013): A pattern language for pictorial assembly instructions (PAIs). In:
Information Design Journal 20 (2), 111–135.
Seeley, William P. (2012): Hearing how smooth it looks. Selective attention and crossmodal
perception in the arts. In: Essays in Philosophy 13 (2), 498–517. Special issue: Aesthetics and
the Senses, edited by Cynthia Freeland.
http://commons.pacificu.edu/cgi/viewcontent.cgi?article=1434&context=eip.
Sobchack, Vivian (2004): What my fingers knew. The cinesthetic subject, or vision in the flesh. In:
Carnal Thoughts. Embodiment and Moving Image Culture. Berkeley/Los Angeles/London, 53–84.
Stöckl, Hartmut (2004): In between modes. Language and image in printed media. In: Eija Ventola/
Cassily Charles/Martin Kaltenbacher (eds.): Perspectives on Multimodality. Amsterdam,
9–30.
Stöckl, Hartmut (2006): Zeichen, Text und Sinn – Theorie und Praxis der multimodalen Textanalyse.
In: Eva Martha Eckkrammer/Gudrun Held (Hg.): Textsemiotik. Studien zu multimodalen Texten,
Frankfurt a. M., 11–36.
Stöckl, Hartmut (2014): Semiotic paradigms and multimodality. In: Carey Jewitt (ed.): The Routledge
Handbook of Multimodal Analysis. 2nd. ed. London, 274–286.
Svoboda, K./C.F. Schmidt/B.J. Schnapp/S.M. Block (1993): Direct observation of kinesin stepping by
optical trapping interferometry. In: Nature 365 (6448), 721–727.
Swales, John M. (1990): Genre Analysis. English in Academic and Research Settings. Cambridge.
Todorov, Tzvetan (1990): Genres in Discourse. Cambridge.
Tseng, Chiao-I/John A. Bateman (2012): Multimodal narrative construction in Christopher Nolan’s
Memento. A description of method. In: Visual Communication 11 (1), 91–119.
Weidenmann, Bernd (1995): Multicodierung und Multimodalität im Lernprozess. In: Ludwig J.
Issing/P. Klimsa (Hg.): Information und Lernen mit Multimedia. Weinheim, 65–84.
Wildfeuer, Janina (2012): Intersemiosis in film. Towards a new organisation of semiotic resources in
multimodal filmic text. In: Multimodal Communication 1 (3), 233–304.
Wildfeuer, Janina (2013a): Formale Zugänge zur Diskursanalyse. In: Zeitschrift für Semiotik 35 (3–4),
393–417.
Wildfeuer, Janina (2013b): Trompeten, Fanfaren und orangefarbene Tage. Zur Intersemiose in Die
fabelhafte Welt der Amélie. In: Lars C. Grabbe/Patrick Rupert-Kruse/Norbert M. Schmitz (Hg.):
Multimodale Bilder. Zur synkretistischen Struktur des Filmischen. Darmstadt, 81–101.
Winkler, Hartmut (2008): Zeichenmaschinen: oder warum die semiotische Dimension für eine
Definition der Medien unerlässlich ist. In: Stefan Münker/Alexander Roesler (Hg.): Was ist ein
Medium? Frankfurt a. M., 211–222.
Żebrowska, Ewa (2014): Multimodal messages. In: Journal of Multimodal Communication Studies 1,
8–15.