
PALM

Plateforme d'analyse linguistique médiévale


Version 0.1

User Manual
February 2013

PALM
Plateforme d'analyse linguistique médiévale
Version 0.1

User Manual
November 2013

Contents

1. Why PALM?
1.1 What does PALM do?
1.2 Why would you want to do this?
1.3 How does PALM lemmatise?
1.4 Why lemmatise?
1.5 What texts can be treated by PALM?
1.6 What is MEDITEXT?
1.7 How do you log in to PALM?

2. MEDITEXT: PALM's internal Library
2.1 Browsing the Library
2.2 Finding out more about a text from the Library
2.2.1 Title
2.2.2 Lemmatised?
2.2.3 Period
2.2.4 Access level
2.2.5 Main language
2.2.6 Country of Origin
2.2.7 Author
2.2.8 Edition
2.2.9 Digitised by
2.2.10 Date
2.2.11 Notes
2.2.12 Add the text
2.3 Viewing a Library Text
2.4 Adding a Library text to your Workspace

3. Managing Your Workspace
3.1 Adding Texts to Your Workspace
3.2 Uploading a New Text
3.2.1 Verse/prose
3.2.2 Field/champ and Text type/type de texte
3.2.3 Title
3.2.4 Main Language
3.2.5 Country of Origin
3.2.6 Period
3.2.7 Access level
3.2.8 Text type
3.2.9 Author
3.2.10 Edition
3.2.11 Language
3.2.12 Digitised by
3.2.13 Date
3.2.14 Notes
3.2.15 Text
3.3 Managing Texts in your Workspace
3.3.2 Text details
3.3.3 View the Text
3.3.4 Modify the Text
3.3.5 Delete the Text
3.3.6 Add to the Library

4. Lemmatising a Text
4.1 The Morphosyntactic Tagging Page
4.2 The Annotator
4.3 Correcting a text in the Annotator
4.4 Correcting an annotation
4.5 Correcting all the instances of a form
4.6 Definition of lemma within PALM
4.6.1 Latin Lemma
4.6.2 Middle French Lemma
4.6.3 Middle English Lemma
4.6.4 Note on word division
4.7 Definition of Part of Speech within PALM
4.7.1 Parts of speech in Latin
4.7.2 Parts of speech in Middle French
4.7.3 Parts of speech in Middle English
4.7.4 Note on Named Entities and Proper Nouns
4.8 Navigating through the text in the Annotator
4.9 Annotating a Corpus

5. Export
5.1 Exporting a Corpus
5.2 Note on Export Formats
5.2.1 TXM

6. Managing Your Account
6.1 Changing User Details
6.2 Changing Your Password

7. Administering PALM and MEDITEXT
7.1 Add a new user
7.2 User Account Management
7.3 Manage the Library

8. Digital linguistic resources provided by PALM
8.1 Electronic Lemma-Form Dictionaries
8.2 Taggers
8.3 Rules
8.4 Development and Performance of PALM : Latin
8.5 Development and Performance of PALM : Middle French
8.6 Development and Performance of PALM : Middle English
8.7 Further technical remarks on the operation of PALM

PALM-MEDITEXT: List of texts

1. Why PALM?

1.1 What does PALM do?

PALM is an online platform which pre-treats medieval texts so that they can be analysed
using software designed for the statistical and semantic analysis of texts in modern
languages (often called textométrie).

Although PALM includes a digitised library of medieval texts called MEDITEXT (see below, section 1.6,
What is MEDITEXT?), this is provided to enable the user to compile text corpora for statistical
and semantic analysis. It emphatically does not offer online editions of these texts, many of
which are in a rough digitised form. Users who need a text of scholarly quality, for example
for purposes of citation, should consult the most recent available edition.

Specifically, PALM provides facilities for the computer-aided annotation of text corpora by
lemma (that is, the standard form of a word as it appears in a dictionary) and by part of
speech.
PALM has been developed for use on texts in late medieval Latin, French and English of
northern French and English origin, but its architecture has been designed to permit the
annotation and the development of resources for texts in any medieval language.
Texts can be uploaded to PALM with little or no mark-up, or from minimally prepared XML-TEI or Word files.
PALM can produce output in a number of formats adapted for use with such
software packages as Hyperbase, Lexico 3, Tramer, Analyse and TXM.
1.2 Why would you want to do this?

PALM's intended users are historians, literary scholars or philosophers who would like to
make use of widely available computer tools for the statistical and semantic analysis of their
late medieval text corpora but have been prevented from doing so by the absence of
standard spelling and by the presence of non-standard vocabulary items in their texts.
For modern languages, many digital tools exist to assist the researcher in tasks ranging from
simple lexical tracking, for example through concordances, to the application of statistical
tools, from the identification of collocations and co-occurrences to sophisticated statistical
methods such as factorial analysis.
Without PALM, a researcher who wishes to apply these tools to late medieval texts must
first lemmatise his or her corpus manually: grouping together both variant spellings and all
inflected forms.
PALM greatly eases the task of lemmatisation, making it as automatic as possible, but also
providing facilities for the manual correction which is inevitably necessary for texts in these
three languages.
1.3 How does PALM lemmatise?

PALM lemmatises:
(1) by applying the linguistic resources it contains: digital form-lemma dictionaries;
taggers trained on annotated corpora; and rules programmed manually for each language;
(2) by providing a user-friendly environment in which the user can correct this annotation
and so create new linguistic resources.
Text corpora annotated in PALM can then be exported in a number of formats which can be
used by widely available text-analysis software designed for standardised modern languages,
such as TXM, Tramer, Analyse, Lexico 3 and Hyperbase.
For a technical discussion of how PALM lemmatises, see below, section 8 Digital linguistic
resources provided by PALM.
1.4 Why lemmatise?

Lemmatisation is useful even for texts in modern languages. It makes it possible to perform
statistical analyses and to follow usage of all the inflections of a verb, for example,
something which can be very important in inflected languages such as French.
Lemmatisation is even more important in treating medieval vernaculars, because of the
absence of standard spelling in these languages.
Lemmatisation makes it possible to group together all the variant spellings of a particular
lemma, and so perform statistical analyses and follow usage in a way which would be
impossible without it.
Even in medieval Latin, where spelling variation is less marked, there are nonetheless a large
number of words, often imported from a vernacular language, which vary in spelling,
particularly in practical contexts close to contemporary legal or economic practice.
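The effect of grouping variant spellings under one lemma can be illustrated with a minimal Python sketch. It is not PALM's own code: the form-lemma entries below are hypothetical examples chosen for illustration, and the function name is an assumption.

```python
from collections import Counter

# Hypothetical form-to-lemma dictionary: illustrative entries only,
# not taken from PALM's own linguistic resources.
FORM_TO_LEMMA = {
    "roy": "roi",
    "roi": "roi",
    "roys": "roi",
    "rois": "roi",
    "royaume": "royaume",
    "roiaume": "royaume",
}


def lemma_frequencies(tokens, form_to_lemma):
    """Count occurrences per lemma; unknown forms are counted under themselves."""
    counts = Counter()
    for token in tokens:
        form = token.lower()
        counts[form_to_lemma.get(form, form)] += 1
    return counts


tokens = ["Roy", "roi", "roys", "royaume", "roiaume", "rois"]

# Without lemmatisation, every spelling is counted separately.
form_counts = Counter(token.lower() for token in tokens)

# With lemmatisation, the variants are grouped under their lemma.
lemma_counts = lemma_frequencies(tokens, FORM_TO_LEMMA)
```

In this sample, each spelling occurs only once when forms are counted separately, whereas the lemma count brings the four spellings of "roi" together, which is what makes frequency-based analysis meaningful.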
1.5 What texts can be treated by PALM?

Users can either import their own texts into PALM or make use of PALM's internal library of
texts, MEDITEXT.
1.6 What is MEDITEXT?

MEDITEXT is a corpus of texts first assembled under the direction of Jean-Philippe Genet and
Claude Gauvard between the 1970s and 2010. It was corrected and expanded as part of the
European Research Council project "Signs and States" between 2010 and 2014. It provides
the basis for PALM's internal Library.
MEDITEXT, and by consequence PALM's internal Library, contains essentially political texts,
by which we mean: texts which are associated with identified political events (speeches,
letters, treatises, poems, sermons, chronicles); texts which deal in general with good or bad
government; and a variety of texts addressed by the king to his subjects or by his subjects to
the king (proclamations, acts, cahiers de doléances, petitions, lettres de rémission).
For the moment, PALM's internal Library contains texts of English origin (in Middle English,
Middle French and Latin) and of (northern) French origin (in Middle French and Latin). We
would however like to include texts of different provenance in the future.
1.7 How do you log in to PALM?

PALM is accessed via the internet at the address <https://palm.tge-adonis.fr/PALM>.


In order to log in to PALM you will first need to apply for a username and password. You can
do this by sending an email to christopher.fletcher@univ-paris1.fr or by using the contact
form on the website.

2. MEDITEXT: PALM's internal Library

When you first log on to PALM, you are presented with a short description of its internal
Library, MEDITEXT. If you want to return to this description, click on "Library" and then
"Presentation". To access MEDITEXT, go to the "Library" menu and click on "Consult the
Library".
Note, however, that PALM does not aim to provide digital editions of the texts it contains. Its
aim is to permit the user to create a corpus of texts, which can then be pre-treated, before
being exported for treatment by software designed for use with modern languages.
2.1 Browsing the Library

There are over 900 text files in PALM's Library. For a complete list of texts, see annex 2 to
this manual. You can browse by short title, by language, by country of origin or by period
(divided into half-centuries). If you already know the file code of a particular text in PALM,
you can also browse by code.
We also intend to provide a search engine to explore the Library, but this option is not yet
active.
2.2 Finding out more about a text from the Library

To find out more about a text, right click on it and select "Details". The "Text Details" screen
then appears, providing basic information about the text.
2.2.1 Short Title
The first field is a short title in a standard format, to make texts easy to identify. Note that this
is not the title of a document in a strict sense, but rather a short name (including the author's
name) to enable the quick location of a text in the Library. For a more precise scholarly
identification of the text, see the field "Edition" below.
The default language for a short title is French, except when the text is widely known under a name
in a different language. If this title is an editorial convention only, for texts in Latin and
French, a French translation is suggested in brackets. For authors widely known in France,
names are given in French. Alternative names of the author in English or Latin are supplied
where appropriate in the field "Author" below.

Note that long texts will be split up amongst several shorter files. Where possible, this
follows the editorial or authorial subdivisions of the text, but sometimes the division is
necessarily arbitrary.
Typical short titles include:
Magna Carta
Gille de Rome, De Regimine Principum, pt. II, bk. 2
Ranulph Higden, Polychronicon, vol. viii, p. 50-100.
Against the King's Taxes
Acceptation par Richard d'York du titre de Protecteur, 17 nov. 1455
More details about this standard form are given later in the manual under "Uploading a New
Text: Title" (section 3.2.3).
2.2.2 Lemmatised?
This field marks whether the text in the Library has already been lemmatised or not.
2.2.3 Period
The period field provides an approximate dating of a text, to make it easy for the user to find
texts of around the same date. Each text is assigned to a half-century period. Where the
period of composition is only known approximately, or when the composition took place
over a number of years, the most probable or most significant period is selected, or if this is
not known the earliest relevant period. So, if you are looking for texts over a certain number
of years, it would be wise to search the period just before and just after the one you are
looking for.
Note that no extensive verification has been undertaken for the dating of texts. Unless
otherwise noted, the date used in the edition has been accepted.
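The half-century logic can be made concrete with a short Python sketch. It assumes, purely for illustration, that periods run from years ending in 00 or 50 (for example 1450-1499); the manual does not state the exact boundaries, and the function names are hypothetical.

```python
def half_century(year):
    """Return the (start, end) years of the half-century containing `year`.

    Assumption: periods begin at years ending in 00 or 50.
    """
    start = year - (year % 50)
    return (start, start + 49)


def periods_to_search(year):
    """The period containing `year`, plus the periods just before and just after."""
    start, _ = half_century(year)
    return [
        half_century(start - 50),
        half_century(start),
        half_century(start + 50),
    ]


# A text dated 1467 falls in 1450-1499; a cautious search also covers
# the neighbouring periods 1400-1449 and 1500-1549.
search = periods_to_search(1467)
```

Searching the neighbouring periods in this way allows for texts whose composition was dated only approximately or spanned several years.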
2.2.4 Access level
PALM contains texts at three levels of access: (3) "User", which can be seen and used by
anybody; (2) "Expert", which can only be seen by advanced users of PALM; and (1)
"Administrator", which can only be seen by the system administrator.
From the point of view of a simple user, only level 3 texts will appear in the Library. You can
however set the access level of your own texts to "Expert" or "Administrator" to restrict
the access of other users of the system (see below, section 3.2, Uploading a New Text).
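The access rule described above can be sketched in a few lines of Python. This is an illustration only, not PALM's implementation: the class and function names are hypothetical, the level assigned to each sample entry is invented, and the sketch assumes that a viewer sees every text whose level number is at least as high as their own, and that owners always see their own texts.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Access levels as numbered in the manual (1 is the most restricted)."""
    ADMINISTRATOR = 1
    EXPERT = 2
    USER = 3


def can_view(viewer, text_level, is_owner=False):
    """Assumed rule: a viewer sees texts at their own level or any less restricted one."""
    return is_owner or viewer <= text_level


def visible_texts(viewer, library):
    """Filter (title, level) pairs down to the titles the viewer may see."""
    return [title for title, level in library if can_view(viewer, level)]


# Hypothetical library entries: the levels are invented for this example.
library = [
    ("Magna Carta", AccessLevel.USER),
    ("Texte en cours de correction", AccessLevel.EXPERT),
    ("Brouillon", AccessLevel.ADMINISTRATOR),
]

user_view = visible_texts(AccessLevel.USER, library)
expert_view = visible_texts(AccessLevel.EXPERT, library)
```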
2.2.5 Main language


This is the language of the majority of the text, since medieval texts often contain phrases,
sentences or entire passages in different languages, and in extreme cases can be written in
several languages at once.
2.2.6 Country of Origin
A marker of origin, given in French and as far as possible appropriate to the period of
composition: for example, "France" and "Angleterre" for the kingdoms of the later middle ages
where most of the texts in the Library were written.
2.2.7 Author
This field identifies alternative versions of an author's name, especially where he is known in
several languages, or second and third authors.
2.2.8 Edition
This field is designed to enable the user to identify and locate the edition or other source
used for the digitisation, including manuscripts. French citation standards are followed,
although the title and the author's name are cited in the same language as in the edition.
Normally only the place of publication is given, except where the name of the publisher could
help to identify the precise edition.
For examples of PALM's citation style, see below, section 3.2.10 (Uploading a New Text:
Edition).
2.2.9 Digitised by
The name of the person or the people who digitised, corrected and uploaded the text onto
PALM.
2.2.10 Date
If the text is dated, the date will be marked. For the form used, see below section 3.2.13.
2.2.11 Notes

This field is provided for technical notes or comments when the other fields prove
insufficient (for example, on the provenance of the text, its dating, or the manuscript from
which the edition is drawn).
2.2.12 Add the text
After the details of the text there is a button "Add the text" which enables you to transfer it
to your workspace for further treatment, for example as part of a corpus for lemmatisation
and export (see further below, section 4).
2.3 Viewing a Library Text

You can access the text itself whilst browsing the library. Right click on the text title, and
select "View the text". You can then browse through the text by short extracts. The purpose
of this facility is not to provide a digital edition of the text, but to enable you to examine it
before selecting it for pre-treatment and export.
2.4 Adding a Library Text to Your Workspace

Once you have examined the details of the text and/or viewed the text itself, you may then
wish to add it to your workspace for further treatment and (ultimately) for export. There are
a number of ways to add a Library text to your workspace. You can either right click on the
text from the browsing screen ("Library" > "Consult the Library") and then select "Add text to your
workspace", or click the button "Add text" at the bottom of the "Details" or "View the
text" pages.
Once you click on "Add text", the text is transferred to your workspace for further treatment, in the
same way as if you had uploaded it yourself.

3. Managing Your Workspace

As well as preparing corpora of medieval texts for use with software designed for modern
languages, PALM provides a system of corpus management which enables you to assemble a
corpus ready for export.
To access this system, select the option "Manage your Workspace" from the menu
"Workspace". If you have not yet added a text to your workspace, for example from PALM's
internal Library, it will be empty.
Once you have finished assembling a corpus in your Workspace, you can then lemmatise it
using the menu "Morphosyntactic Tagging" (section 4) before passing to "Export" (section 5).
3.1 Adding Texts to Your Workspace

You can start creating a corpus either by adding texts to your workspace from PALM's Library
(as described above, Adding a Library Text to Your Workspace, section 2.4), or by uploading
your own texts.
3.2 Uploading a New Text

To upload one of your own texts to your Workspace within PALM, select the option "Add a
text" from the menu "Workspace". You will then be presented with a form which asks you to
specify certain details for your text for ease of retrieval. The following fields can be
completed (compulsory fields are marked with an asterisk *).
3.2.1 Verse/prose*
Is the text in verse ("vers") or prose ("prose")? If the text is a mixture of both, select the form in
which the majority of the text is presented.
3.2.2 Field/champ and Text type/type de texte*
These two entries serve to identify the general nature of the text.
The "field" of the text refers to the socio-literary context of its production, as discussed in
Annex 1.
The "text type" is a less rigorous system of classification than "field", aimed at helping the
user find texts of a particular type (act or letter, sermon, political poem, etc.). Text type makes
no pretension to offer a universal system of genre, and the options offered are derived from
the kind of corpus of political texts for which PALM was designed.
Both "field" and "text type" may seem rather subjective. One might argue whether Saint
Augustine's De Ciuitate Dei contra Paganos is a religious or a political text, for example. It is
hoped that each user will use their best judgement, aiming to help future users group
similar texts together.
3.2.3 Title*
This should be a short title which enables the user to find a text quickly. The default
language is French, except when the text is widely known under a name in a different
language. If this name is an editorial convention only, a French translation can be
suggested in brackets. For authors widely known in France, names are given in French.
Note that long texts will be split amongst several shorter files. Where possible, this follows
the editorial or authorial subdivisions of the text, but sometimes the division is necessarily
arbitrary.
Some examples:
Tractatus de regimine principum ad regem Henricum Sextum
On the Times [Sur les maux du temps]
Deux poèmes sur la mort de Piers Gaveston
Adam Orleton, Apologia (1/2)
Augustin d'Hippone, De Ciuitate Dei contra Paganos, Liber XV
John Russell, Sermon "In corpore multa quidem sunt membra...", 1484
John Kemp, Discours d'ouverture du Parlement, nov. 1450
3.2.4 Main language*
Identify the main language of the text in this box. If your text contains more than one
language, you can add extra languages further down the form.
3.2.5 Country of Origin*
Where possible, identify the country or region of origin at the moment of composition.
3.2.6 Period*
A general marker of the time the text was composed, in half centuries. If your text is not
dated precisely, choose either the most likely half-century, or the earliest. If your text was
composed over a number of years, choose the earliest period.
3.2.7 Access level*
All texts in PALM are assigned a level of access which will apply if the text is included in the
Library. Level 3 ("User") denotes general access: all users can read it. Level 2 ("Expert") texts
can only be read (and seen) by accredited experts. Only PALM's administrator (and
you, whilst it is in your corpus) will be able to read level 1 ("Administrator") texts.
3.2.8 Text type

A further opportunity to specify the nature of your text, in an open field rather than a preset menu.
3.2.9 Author
The known or deduced author of the text. Click on "+" to add more than one author, or
where the author is known under different names in different languages (Latin, French,
English...).
For literary texts or for those where one could expect to have an author, but where none is
known, you can specify that such a text is "Anonyme". This is not necessary for texts created
by institutions where questions of authorship are less helpful. In this case, you should just
leave the box blank.
We understand this is not the normal diplomatic practice, which tends to identify the author
as the person in whose name a document is issued, but for historical reasons we prefer to
avoid what for our texts is often a misleading identification (King John as the author of
Magna Carta, Henry III as the author of the declarations of his baronial opponents, etc.).
3.2.10 Edition*
Please fill in this box. It serves, as it were, as a footnote, enabling the user to identify and
locate the edition used. French citation standards are followed, although the title and the
author's name are cited in the same language as in the edition. Normally only the place of
publication needs to be supplied, except where the name of the publisher is necessary to
identify the precise edition.
If the title of a short text or poem in a larger edition is already given in the short title, there is
no need to repeat it here, although page references should be supplied.
Some examples (as for "Details" in PALM's Library):
Aegidius Romanus, De Regimine Principum, Rome, 1607.
Rotuli Parliamentorum, éd. J. Strachey et al., Londres, 1767-77, vol. V, p. 16-17.
The political songs of England : from the reign of John to Edward II, éd. et trad. Th. Wright, Londres, 1839, p. 258-261.
Ptolomaeus lucensis [Bartholomeo Fiadoni], De Regimine Principum, dans Thomas Aquinas, Opuscula philosophica, éd. R.M. Spiazzi, Turin, 1954, p. 280-358.
Londres, British Library, Royal MS 8.B.xxiii, ff. 9-10v.
Lille, Archives du Nord, B 517/11679.
Corpus Thomisticum <http://www.corpusthomisticum.org>.
Click on "+" to identify multiple editions.
3.2.11 Language
An opportunity to identify second or third languages which appear in the text. Click on "+" to
add more languages.
3.2.12 Digitised by
Please enter your name here and the names of those involved in the digitisation of the text.
Use "+" to add additional names.
3.2.13 Date
An opportunity to identify a date more precisely. The following standards should be
followed:
1467            A text self-dated to 1467
[1467]          A text which we can deduce was composed in 1467
[?1467]         A text which may have been composed in 1467
[1215-1258]     A text which was composed during the period 1215-1258
[avant 1327]    A text composed before 1327
[après 1292]    A text composed after 1292
[c. 1340]       A text composed around (circa) 1340
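For users who prepare their metadata outside PALM, the date conventions above can be checked automatically. The following Python sketch is an illustration only: the category labels ("self-dated", "deduced" and so on) and the function name are assumptions made for this example and do not correspond to anything in PALM itself.

```python
import re

# One pattern per date convention. The labels on the right are
# hypothetical names chosen for this sketch.
DATE_PATTERNS = [
    (re.compile(r"^(\d{3,4})$"), "self-dated"),
    (re.compile(r"^\[(\d{3,4})\]$"), "deduced"),
    (re.compile(r"^\[\?(\d{3,4})\]$"), "uncertain"),
    (re.compile(r"^\[(\d{3,4})-(\d{3,4})\]$"), "range"),
    (re.compile(r"^\[avant (\d{3,4})\]$"), "before"),
    (re.compile(r"^\[après (\d{3,4})\]$"), "after"),
    (re.compile(r"^\[c\. (\d{3,4})\]$"), "circa"),
]


def parse_date(value):
    """Return (kind, years) for a date written in one of the forms above."""
    text = value.strip()
    for pattern, kind in DATE_PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, tuple(int(group) for group in match.groups())
    raise ValueError(f"unrecognised date form: {value!r}")


parsed = [parse_date(d) for d in ("1467", "[?1467]", "[1215-1258]", "[c. 1340]")]
```

A value that matches none of the forms raises a `ValueError`, which makes it easy to spot entries that need to be rewritten before upload.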

3.2.14 Notes
An opportunity to add extra details: technical notes, for example, or notes on the particular
nature of a complex edition or manuscript.
3.2.15 Text
Cut and paste your text into this box.
Texts should be inserted with no annotation except pagination. Pagination can be inserted
using either the style <p=1>, <p=2>, etc., or the style <p=1|25>, where 25 refers to the page
number in the edition.
You must insert at least <p=1> at the start of the text for it to be uploaded correctly.
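Before pasting a text into the form, it can be useful to check its page markers. The Python sketch below shows one way of reading the two marker styles; it is not PALM's parser, the function name is hypothetical, and the sample sentence is only a placeholder.

```python
import re

# Matches <p=1> as well as <p=1|25>, where the second number is the
# page number in the printed edition.
PAGE_MARKER = re.compile(r"<p=(\d+)(?:\|(\d+))?>")


def split_pages(text):
    """Split a text into (page, edition_page, content) tuples.

    Raises ValueError if the text does not begin with a marker for page 1.
    """
    stripped = text.lstrip()
    first = PAGE_MARKER.match(stripped)
    if first is None or first.group(1) != "1":
        raise ValueError("the text must start with <p=1> or <p=1|N>")
    markers = list(PAGE_MARKER.finditer(stripped))
    pages = []
    for index, marker in enumerate(markers):
        # A page runs from the end of its marker to the start of the next one.
        if index + 1 < len(markers):
            end = markers[index + 1].start()
        else:
            end = len(stripped)
        edition_page = int(marker.group(2)) if marker.group(2) else None
        content = stripped[marker.end():end].strip()
        pages.append((int(marker.group(1)), edition_page, content))
    return pages


sample = "<p=1|25>Johannes Dei gracia rex Anglie <p=2|26>dominus Hibernie"
pages = split_pages(sample)
```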

Matter inserted between square brackets [ ] will not be taken into account in statistical
analysis. Square brackets can thus be used to insert comments within the text.
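The effect of square brackets on word counts can be pictured with a small Python sketch. It only illustrates the principle: the function name is hypothetical, nested brackets are not handled, and PALM's own tokenisation is certainly more elaborate than splitting on whitespace.

```python
import re

# Any run of text enclosed in square brackets (no nesting).
BRACKETED = re.compile(r"\[[^\[\]]*\]")


def countable_tokens(text):
    """Return the whitespace-separated tokens left once bracketed matter is removed."""
    without_comments = BRACKETED.sub(" ", text)
    return without_comments.split()


sample = "le roy [sic] de France [note: ajout en marge] et son conseil"
tokens = countable_tokens(sample)
```

In this sample the editorial comments disappear from the token list, so they cannot distort frequency counts.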
When you have completed the form and cut and pasted the text, click on the button "Upload"
and the text will be uploaded to your Workspace. A message will appear when the text has
been correctly uploaded.
3.3 Uploading a Text Directly

Texts can be uploaded directly in plain text (TXT) format. To select this option, click on "File"
at the top of the menu "Add a text".
3.4 Managing Texts in your Workspace

You can use your Workspace to check the details of your text by clicking on "Workspace" >
"Manage your Workspace". The Workspace screen allows you to perform a number of
actions on texts which you have selected.
3.4.2 Text details
To see the details of a text in your Workspace, right click and select "Display details".
3.4.3 View the Text
A preview of the text can be displayed, as in PALM's Library. Right click on the text and select
"View the text".
3.4.4 Modify the Text
If you detect errors in either the details of the text or in the text itself, you can correct them
by right clicking on the text in your Workspace, then selecting "Modify the text".
This sends you back to a form similar to that provided to upload a text to your corpus. With
this form you can change the details of a text, or even the text itself, before uploading it
once more to your Workspace.
3.4.5 Delete the Text
If you right click on a text in your Workspace and select "Delete the text", it will be deleted
from your Workspace. Be certain that you want to delete a text: once it is deleted,
there is no way to retrieve it!
3.4.6 Add to the Library
If you are an accredited "Expert" or "Administrator", you can transfer new texts from your
Workspace to PALM's Library by right clicking on the text in your Workspace and selecting
"Add to the Library".
If you have the access level "User", you can submit a text for consideration for inclusion in
PALM's Library by using this same button. After vetting, it may then be included in the
Library.

4. Lemmatising a Text

PALM's primary function is to tag texts by lemma and part of speech so that they can then
be analysed by software designed for modern languages.
To perform this operation, you should first transfer the texts that interest you into your
Workspace, then go to the menu "Linguistic Analysis" > "Morphosyntactic Tagging".
In the future, it is our intention to provide tools to identify collocations and to tag named
entities. For the moment, however, the options "Collocations" and "Named entities" on the
"Linguistic Analysis" menu are not yet active.
4.1 The Morphosyntactic Tagging Page

The "Morphosyntactic tagging" page takes the same form as "Manage your workspace". It lists
the texts in your workspace by code, title, language, country of origin and period.
To lemmatise a text, right click on its title and select "Morphosyntactic tagging". There may
be a short pause, after which you will see the message "The analysis is in progress... Please
wait... This operation may take a few minutes...". You will then be transferred to PALM's
Annotator.
4.2 The Annotator

When you first select a text for morphosyntactic tagging, PALM applies a number of digital
linguistic resources (form-lemma dictionaries; probabilistic taggers trained on annotated
corpora; and manually written rules) in order to identify the lemma and part of speech of
each word (token) in the text. (For a detailed description of these resources, see section 8,
Digital linguistic resources provided by PALM.)
These tools have been developed for use with late medieval texts (roughly, from the mid
thirteenth to the early sixteenth century) in Latin and French of northern French origin, and
in Latin, French or English of English origin. The efficiency of these tools increases when the
new text being analysed is similar to the corpus with which they were developed.
The Annotator enables you to assess the efficiency of PALM's automatic lemmatisation and
to correct it where necessary.
Each word in the text is initially marked in one of three colours. For PALM's default theme
these are green, yellow and red.
Where a word is marked in green, PALM believes it has correctly identified its lemma and
part of speech. To make sure that this is correct, you can move the pointer over the word.
The lemma and part of speech assigned will appear in a small window.
When a word is marked in yellow, PALM has identified a number of possible lemmas and
would like the user to make a manual choice.
When a word is marked in red, PALM has not succeeded in identifying a lemma for this
word.
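The three colours can be thought of as reflecting how many candidate analyses remain for a word. The Python sketch below is a deliberate simplification made under that assumption: PALM also draws on taggers and rules, which are not modelled here, and the dictionary entries, part-of-speech labels and function name are hypothetical.

```python
# Hypothetical Latin form-lemma dictionary: each form maps to a list of
# candidate (lemma, part of speech) pairs. Entries are illustrative only.
CANDIDATES = {
    "rex": [("rex", "N")],
    "est": [("sum", "V"), ("edo", "V")],
}


def colour_for(candidates):
    """Assumed rule: one candidate is green, several are yellow, none is red."""
    if len(candidates) == 1:
        return "green"
    if len(candidates) > 1:
        return "yellow"
    return "red"


def colour_text(forms, dictionary):
    """Pair each form with the colour it would receive under the assumed rule."""
    return [(form, colour_for(dictionary.get(form, []))) for form in forms]


# "wapentagium" stands in for a form that is missing from the dictionary.
coloured = colour_text(["rex", "est", "wapentagium"], CANDIDATES)
```

Under this reading, the yellow and red words are the ones that most need the manual correction described in the following sections.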
4.3 Correcting a text in the Annotator

It is often at the moment of annotation that the user notices errors or inconsistencies in the
text. To correct a word incorrectly entered, right click on the word. You can then choose
"Modify" to correct a form. To add a word which has been omitted, right click on the word
next to it, and choose "Add a new word". To delete a word, right click, and choose "Delete".
4.4 Correcting an annotation

In order to correct an annotation, left click on the word. A pop-up menu will appear, which
can be used to correct the lemma and part of speech, or identify a word in a foreign
language.

If you click on "submit", the word selected will be corrected. Corrected forms are coloured
white in the Annotator.
4.5 Correcting all the instances of a form

It is also possible to correct all the instances of this form within the text. First click on
"Launch the concordances". This will present a concordance of all the instances of this form
within the text. If you are convinced that this form always corresponds to the same lemma,
click on "Correct all", and every instance of this form will be corrected in the same manner.
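The two steps, checking a concordance and then correcting every instance, can be sketched in Python as follows. This is an illustration of the principle rather than PALM's code: the token structure, the function names and the sample phrase are all assumptions.

```python
def concordance(tokens, form, width=2):
    """List every occurrence of `form` with `width` words of context on each side."""
    lines = []
    for index, token in enumerate(tokens):
        if token["form"] == form:
            left = " ".join(t["form"] for t in tokens[max(0, index - width):index])
            right = " ".join(t["form"] for t in tokens[index + 1:index + 1 + width])
            lines.append((left, token["form"], right))
    return lines


def correct_all(tokens, form, lemma, pos):
    """Assign the same lemma and part of speech to every occurrence of `form`."""
    changed = 0
    for token in tokens:
        if token["form"] == form:
            token["lemma"] = lemma
            token["pos"] = pos
            token["corrected"] = True
            changed += 1
    return changed


# Hypothetical token list built from a sample phrase.
tokens = [
    {"form": form, "lemma": None, "pos": None, "corrected": False}
    for form in "le roy et le conseil du roy".split()
]

lines = concordance(tokens, "roy")
changed = correct_all(tokens, "roy", "roi", "N")
```

Inspecting the concordance first matters because a single form may correspond to different lemmas in different contexts, in which case a blanket correction would introduce errors.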
4.6 Correcting from the frequency list

The same operation can also be carried out from the frequency list on the left of the
annotator. Click on a form in this list. You can then launch concordances, and, if you decide
this is appropriate, annotate all the instances of this form in the same way.
4.7 Definition of lemma within PALM

The lemma is often defined as the canonical form of a word as it occurs in the dictionary.
Unfortunately, for late medieval Latin, Middle French or Middle English, there is no single
dictionary which can be used as an authority to define lemmas.
The choice of lemmas used within PALM, which users should also follow for the best
results, therefore needs to be explained.
4.7.1 Latin Lemma
The base list of lemmas in PALM is derived from M. Goullet and M. Parisse, Lexique Latin-Français: Antiquité et Moyen Âge, Paris: Picard, 2006.
Lemmas are in standardised medieval spelling. Diphthongs are not present ("e" rather than "ae"
or "oe"). "U" and "u" are used in place of "V" and "v", and "I" and "i" in place of "J" and "j". Note however
that these standard spellings apply only to lemmas. The texts in PALM contain diphthongs, "v"
and "j", and PALM is adapted to deal with them.
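When typing a Latin lemma by hand, the spelling conventions above can be applied mechanically. The Python sketch below is a naive illustration, not part of PALM: the function name is hypothetical, and the blanket replacement of "ae" and "oe" will be wrong for words in which these letters do not form a diphthong (for example "poeta"), so the result should always be checked against the dictionary.

```python
# Single-letter substitutions: v -> u and j -> i, in both cases.
_LETTERS = str.maketrans({"v": "u", "V": "U", "j": "i", "J": "I"})

# Diphthongs reduced to "e". Naive: applied wherever the letters occur.
_DIPHTHONGS = (
    ("ae", "e"), ("oe", "e"),
    ("Ae", "E"), ("Oe", "E"),
    ("AE", "E"), ("OE", "E"),
    ("æ", "e"), ("œ", "e"),
)


def to_lemma_spelling(form):
    """Rewrite a classical-style spelling in the standardised medieval style."""
    result = form.translate(_LETTERS)
    for diphthong, replacement in _DIPHTHONGS:
        result = result.replace(diphthong, replacement)
    return result


examples = {form: to_lemma_spelling(form)
            for form in ("caelum", "poena", "vita", "justitia", "Judaeus")}
```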
Lemmas have been sub-divided as little as possible on semantic grounds. On the other hand,
in a number of cases it has been necessary to subdivide lemmas. In these cases the lemmas
are numbered in the order they occur in Goullet and Parisse.
Where a lemma is not found in Goullet and Parisse, as can happen in Latin in practical
contexts in British sources (the English common law, for example), lemmas have been taken
from the Dictionary of Medieval Latin from British Sources, ed. R.E. Latham et al., Oxford,
1975-. When lemmas are not found in either of these authorities, it has occasionally been
necessary to propose a new lemma on the basis of the form attested.
4.7.2 Middle French Lemma
As far as possible the lemmas used by PALM correspond to those of the Dictionnaire du Moyen
Français (DMF). This is available online at <http://www.atilf.fr/dmf/>.
From the point of view of PALM, the DMF is an appropriate reference both because it covers
the period of our corpus, and because it distinguishes clearly between homonyms. It is
developed and maintained by the CNRS laboratory ATILF (UMR 7118) at the Université de
Lorraine.

The DMF uses the modern French form of a lemma if it is still in use. If the lemma is no
longer used, the form selected by the DMF is that found in the Altfranzösisches Wörterbuch
of A. Tobler, E. Lommatzsch et al. On account of the large number of texts in our corpus of
English origin, lemmas were also occasionally found which had no equivalent in the DMF. In
this case we have made use of the Anglo-Norman Dictionary (AND), which can be consulted online
at <http://www.anglo-norman.net/>. For lemmas found neither in the DMF (and so also
absent from Tobler and Lommatzsch), nor in the AND, we have occasionally but rarely
had to supply lemmas from Frédéric Godefroy's dictionary. In the very rare case that the
lemma has no equivalent in any of these works, we have supplied our own lemma on the
basis of the form attested.
4.7.3 Middle English Lemma
The lemmas for Middle English are based on the resources created by the Linguistic Atlas of
Early Middle English at the University of Edinburgh (LAEME), augmented from texts in our
corpus and the citations given by the Middle English Dictionary (MED).
In general, we have followed LAEME's practice of using lemmas drawn from Modern English
wherever possible, either adopting theirs or attributing our own for words not attested in
their dictionary. These lemmas have the advantage of being quickly identifiable and their use
greatly speeds up the process of annotation. Where such a lemma is not evident (for
instance if a word has fallen out of use or significantly shifted its meaning) we have used the
headwords given in the MED rather than following LAEME's practice of attributing a lemma
in Old English, Scandinavian or French. While this dual system is not as etymologically
consistent as that of LAEME, it facilitates consultation of the MED and enables users to check
that they are using the appropriate lemma.
To distinguish between homographs which share the same part of speech, we have followed
LAEME in using a series of specifiers following the word {within braces}. This may be a brief
disambiguation of sense, as in:
present{time/space}    N    MED present(e (n.(1))
present{gift}          N    MED present(e (n.(2))

Or it may reference the Old English origins of a word, as in:

lie{licgan}    to lie down           MED lien (v.(1))
lie{leogan}    to tell an untruth    MED lien (v.(2))

In either case, the correct choice should become apparent following a brief consultation of
the definitions and etymologies given in the MED. These annotations serve as additional
identifying tags rather than definitions, and include figurative senses.
Specifiers have been mainly used for verbs, adjectives and nouns where there is a distinct
range of meaning that it is useful to disambiguate, or when words with identical orthography
have different origins. For adverbs, prepositions and other grammatical words, we have
extensively simplified LAEME's system for ease and speed of tagging. Thus our tag on PREP,
for instance, includes:
on{b} (belief); on{c} (condition); on{inv} (invocation); on{i} (illative case); on{m} (manner);
on{p} (place); on{re} (concerning); on{t} (time); on{u} (until)
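
For users who process exported annotations with their own scripts, the brace notation can be split mechanically. The following is a minimal sketch in Python; the function name split_lemma and the regular expression are our own illustration and are not part of PALM.

```python
import re

# Hypothetical helper (not part of PALM): splits a lemma written in the
# brace notation described above, e.g. "lie{licgan}", into the base lemma
# and its optional specifier.
LEMMA_PATTERN = re.compile(r"^(?P<lemma>[^{}]+?)(?:\{(?P<specifier>[^{}]*)\})?$")


def split_lemma(tagged_lemma):
    """Return (lemma, specifier); specifier is None when no braces are present."""
    match = LEMMA_PATTERN.match(tagged_lemma.strip())
    if match is None:
        raise ValueError(f"not a valid lemma: {tagged_lemma!r}")
    return match.group("lemma"), match.group("specifier")


for example in ["present{gift}", "lie{licgan}", "on{inv}", "nevertheless"]:
    print(example, "->", split_lemma(example))
```

A lemma without a specifier is returned with None in second position, so the same function can be applied to every lemma in a text.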

4.6.4 Note on word division


In all of the languages treated in PALM, especially in Middle English and Middle French,
scribes and editors choose to divide words in a variety of different ways.
The choices we have made in annotating PALM's Library, and thus those likely to be
proposed automatically by PALM, reflect a desire to intervene as little as possible in
correcting scribal practice, or even editorial practice.
Thus composite words in manuscripts are not subdivided, with some debatable results for
lemmatisation. For example, in Middle French, the adjective "tresredouté", commonly left as
a unit in manuscripts and in editions, is not separated. Since only a single lemma can be
assigned to a single word unit, it must then be lemmatised as "redouté". This choice can
certainly be criticised, in the sense that an element of meaning inherent in the "très" is lost.
On the other hand, both in Middle French and in Middle English, certain words, particularly
logical operators, are sometimes grouped in a single unit, sometimes divided by spacing.
Thus, for example, "toutefois" or "nevertheless" can equally be "toute fois" and "never the
less". For the purposes of tagging, it has been necessary to analyse each word in such a
group separately. This is necessary since the annotated texts are used for training taggers.
"Never the less" must therefore be tagged adverb-determinant-adjective, rather than
adverb-adverb-adverb, since otherwise every occurrence of "the", for example, could
potentially be tagged adverb.
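
The effect on training data can be illustrated with a small count of the tags each form receives. This is a sketch in Python on invented toy data; it does not reproduce PALM's taggers or any text from the Library, and the function name tag_counts is hypothetical.

```python
from collections import Counter, defaultdict


def tag_counts(annotated_tokens):
    """Count how often each form receives each part-of-speech tag."""
    counts = defaultdict(Counter)
    for form, tag in annotated_tokens:
        counts[form.lower()][tag] += 1
    return counts


# Toy training data, invented for illustration only.
base = [
    ("the", "determinant"), ("king", "common noun"),
    ("the", "determinant"), ("realm", "common noun"),
]

# "never the less" tagged word by word, as recommended above.
word_by_word = base + [("never", "adverb"), ("the", "determinant"), ("less", "adjective")]
# The same phrase tagged as a block of three adverbs.
as_one_phrase = base + [("never", "adverb"), ("the", "adverb"), ("less", "adverb")]

print(dict(tag_counts(word_by_word)["the"]))
print(dict(tag_counts(as_one_phrase)["the"]))
```

With word-by-word tagging, "the" is only ever seen as a determinant. With phrase-level tagging, an adverb reading of "the" enters the training data and could be proposed elsewhere.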
To avoid the loss of grammatical information, however, we are currently developing a
separate system of tagging to account for these compound forms, and to enable such
adverbial phrases to be recognised in texts exported for textometric analysis.
For the moment, in Middle English, forms which are connected by a hyphen (e.g. where-for)
will be treated as a single element for the purposes of lemmatisation.
4.7 Definition of Part of Speech within PALM

The list of parts of speech proposed for each language (Latin, French, English) has been kept
as simple as possible in order to enable speedy annotation.
Within reason, the lists of parts of speech have been composed so as to correspond roughly
between languages, for purposes of comparison, but without artificially imposing the
grammatical rules of one language on another.
4.7.1 Parts of speech in Latin
PALM proposes the following parts of speech for Latin:
Preposition
Conjunction of subordination
Conjunction of coordination
Interjection
Adjective
Punctuation
Pronoun
Proper Noun
Common Noun
Number
Verb
Adverb
4.7.2 Parts of speech in Middle French
PALM proposes the following parts of speech for Middle French:
Ordinal Number
Conjunction of subordination
Conjunction of coordination
Punctuation
Proper Noun
Pronoun
Common Noun
Cardinal Number
Preposition
Adjective
Interjection
Determinant
Verb
Adverb
4.7.3 Parts of speech in Middle English
PALM proposes the following parts of speech for Middle English:
Verbal noun
Ordinal Number
Cardinal Number
Verb
Adjective
Conjunction
Punctuation
Determinant
Interjection
Pronoun
Infinitive Marker
Common Noun
Adverb
Proper Noun
Preposition
Verb+Pronoun
4.7.4 Note on Named Entities and Proper Nouns
Simple, single-word proper nouns (John, Paris) are marked in PALM with the part of
speech proper noun. There are, however, a number of short phrases (Notre-Dame-de-Paris,
St Albans Abbey, Stratford-atte-Bowe, Ashby-de-la-Zouche) which refer to a
named entity.
For the purpose of marking part of speech, the use of proper noun is kept to a minimum.
Notre-Dame-de-Paris is thus marked: (adjective)-(common noun)-(preposition)-(proper
noun). This choice was made notably because our annotated corpora are subsequently used
for the training of taggers. It is, however, our aim in the future to put in place a system of
tagging named entities in texts within PALM.
4.8 Navigating through the text in the Annotator

You can move through the text either by clicking forward and backward, or by choosing a
page to go to.
4.9 Annotating a Corpus

Once you have finished annotating a text, choose the menu Linguistic Analysis >
Morphosyntactic Tagging. You can then annotate the other texts in your corpus, again by
right-clicking on each text and choosing Morphosyntactic Tagging.
If for whatever reason you wish to delete your annotation of a text, right-click on the text
and choose Clear tagging. Make sure you really want to delete your tagging before
selecting this option. There is no way of recovering it!
Once you are satisfied with the annotation of your corpus, you can now proceed to the next
step: Export.

5. Export

5.1 Exporting a Corpus

To export your corpus, choose the menu option Export Corpus.


The export screen permits you to export files from your Workspace, lemmatised or not.
First select the texts which interest you by dragging them from the box on the left (your
Workspace) to the box on the right (files to export). Once you have selected the texts to
export, click on Export Options at the bottom of the page.
You can choose the software package to which you would like to export: Lexico 3,
Hyperbase, Tramer or TXM. Alternatively, you can create a plain text file by selecting Format
TXT.
From the Morphosyntactic menu you can choose whether the exported corpus will include
annotation by lemma and part of speech, by lemma only, or no annotation at all.
You can then choose particular parts of speech which interest you, or choose to select all.
Click submit to confirm your choice.
Now click on the button Export to export your corpus. It will be downloaded to your
computer as a ZIP file.
You will need to extract the files from the ZIP and place them in a folder ready for use
according to the instructions of the software package (Analyse, Hyperbase, Lexico 3, Tramer,
TXM...) which you intend to use.
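
If you prefer to unpack the archive with a script rather than by hand, the step can be sketched in Python with the standard zipfile module. The file name palm_export.zip and the folder name palm_corpus are placeholders of our own; PALM may name the downloaded archive differently.

```python
import zipfile
from pathlib import Path


def extract_export(zip_path, target_dir):
    """Unpack an export archive into target_dir and return the names of the files it holds."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        # Skip directory entries, which end with a slash.
        return sorted(name for name in archive.namelist() if not name.endswith("/"))


if __name__ == "__main__":
    # Both paths are placeholders; adjust them to your own download location.
    archive_path = Path("palm_export.zip")
    if archive_path.exists():
        for name in extract_export(archive_path, "palm_corpus"):
            print(name)
    else:
        print(f"{archive_path} not found; download the export from PALM first.")
```

The extracted folder can then be passed to the import procedure of the software package you have chosen.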
At this point, the ordinary user will have finished using PALM, leaving with a corpus of texts
ready for use with an external software package, annotated by lemma and part of speech.
5.2 Note on Export Formats

5.2.1 TXM
Files exported for use with TXM are in the format XML/w+CSV. You will need to select that
format when importing into TXM.
5.2.2 Lexico 3

<Details for all the export formats which a user will need to know>

6. Managing Your Account

6.1 Changing User Details

To change or correct your name, email or role, go to the menu Workspace and choose My
Account. Click on the relevant field, correct it, then click on Okay.
6.2 Changing Your Password

To change your password, go to the menu Workspace and choose My Account. At the
bottom of the page, type in your old password, then your new password, then your new
password again to confirm it.

7. Administering PALM and MEDITEXT

The menu Administration is provided for users with the access level Expert or
Administrator, that is, those accredited to construct new digital linguistic resources for use
in PALM.
7.1 Add a new user


To add a new user to PALM, select the menu Administration > Add a new user.
This page allows you to set up a new user with their name, email, access level, username
and password.
7.2 User Account Management

The administrator can modify account details by selecting the menu Administration > User
Account Management. Click on a field to modify it, then click on save.
7.3 Manage the Library

A user with Administrator access can manage PALM's Library by selecting the menu
Administration > Manage the Library.
Right-click on a text to consult its details (title, access level, lemmatised flag), view the text,
or delete it.
Texts in the Library cannot be changed directly by the Administrator. They must be
downloaded into your workspace, modified there, and then re-uploaded to the Library.

8. Digital linguistic resources provided by PALM

When a text is first selected for Morphosyntactic tagging, PALM applies three types of
digital resources in order to produce the first, automatic lemmatisation and annotation by
part of speech, which the user can then correct using PALM's Annotator.
8.1 Electronic Lemma-Form Dictionaries

PALM first applies electronic dictionaries consisting of a list of forms and their associated
lemmas.
<A SAMPLE for each language>
These dictionaries are applied to a text word by word; they do not take account of context.
As a result, they cannot distinguish between words with the same form but different lemmas,
even in contexts where a human reader would find no ambiguity.
Consider, for example, these two phrases in modern English:

She saw him immediately.
She put the saw in the shed.
It is clear to a human reader that the first "saw" is a verb with the lemma "see", and that the
second is a noun with the lemma "saw". A form-lemma dictionary alone, however, cannot
distinguish between these cases, since it cannot take account of context. The form "saw"
could refer either to the verb "see" or the noun "saw".
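
The limitation can be reproduced with a toy form-lemma dictionary. The sketch below is in Python; the dictionary contents, the name FORM_LEMMA and the function lookup are invented for illustration and do not reflect the format of PALM's own dictionaries.

```python
# Hypothetical form-lemma dictionary: each form maps to every
# (lemma, part of speech) pair it could represent.
FORM_LEMMA = {
    "she": [("she", "pronoun")],
    "saw": [("see", "verb"), ("saw", "common noun")],
    "him": [("he", "pronoun")],
    "immediately": [("immediately", "adverb")],
    "put": [("put", "verb")],
    "the": [("the", "determinant")],
    "in": [("in", "preposition")],
    "shed": [("shed", "common noun"), ("shed", "verb")],
}


def lookup(sentence):
    """Look up each word on its own, ignoring the words around it."""
    words = sentence.lower().rstrip(".").split()
    return [(word, FORM_LEMMA.get(word, [])) for word in words]


for sentence in ("She saw him immediately.", "She put the saw in the shed."):
    for form, candidates in lookup(sentence):
        if len(candidates) > 1:
            print(f"{sentence!r}: {form!r} is ambiguous -> {candidates}")
```

Because the lookup sees one word at a time, "saw" receives the same two candidate analyses in both sentences.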
For the application of digital form-lemma dictionaries and its limitations, see <REF>
The problem of ambiguity is considerably worsened in late medieval vernaculars with no
standard spelling, since the non-standard spelling greatly increases the number of forms
which can be linked to a particular lemma, and also the number of lemmas which can be
linked to a particular form.
A number of computer methods exist to resolve such ambiguities. One is the training of
Taggers; another is the manual composition of Rules.
8.2 Taggers

Taggers (for example, TreeTagger) are computer applications which are first trained on a
corpus annotated by lemma and part of speech. The trained tagger can then be applied to an
unknown text in the same language as the corpus.
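
The principle of training on an annotated corpus and then applying the result to a new text can be sketched as follows. This Python example uses a deliberately simple technique of our own choosing, namely counting which analysis each form receives after each preceding tag. It is not TreeTagger's algorithm, nor the procedure used in PALM, and the training sentences are invented.

```python
from collections import Counter, defaultdict


def train(annotated_sentences):
    """Count (lemma, tag) analyses per form, with and without the previous tag as context."""
    by_context = defaultdict(Counter)  # (previous tag, form) -> analyses
    by_form = defaultdict(Counter)     # form -> analyses
    for sentence in annotated_sentences:
        previous = "START"
        for form, lemma, tag in sentence:
            by_context[(previous, form)][(lemma, tag)] += 1
            by_form[form][(lemma, tag)] += 1
            previous = tag
    return by_context, by_form


def tag(forms, model):
    """Choose the most frequent analysis in context, falling back to the form alone."""
    by_context, by_form = model
    previous, output = "START", []
    for form in forms:
        candidates = by_context.get((previous, form)) or by_form.get(form)
        lemma, pos = candidates.most_common(1)[0][0] if candidates else (form, "unknown")
        output.append((form, lemma, pos))
        previous = pos
    return output


# Invented training corpus of (form, lemma, part of speech) triples.
corpus = [
    [("she", "she", "pronoun"), ("saw", "see", "verb"), ("him", "he", "pronoun")],
    [("he", "he", "pronoun"), ("put", "put", "verb"), ("the", "the", "determinant"),
     ("saw", "saw", "common noun"), ("away", "away", "adverb")],
]
model = train(corpus)
print(tag(["he", "saw", "the", "saw"], model))
```

Unlike a form-lemma dictionary, this toy model uses the preceding tag, so it can give the two occurrences of "saw" different lemmas. A form never seen in training is returned with the tag "unknown".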
For a discussion of taggers, see <Something presenting how taggers work>
The Taggers used in PALM were trained using texts from Méditext annotated using PALM's
Annotator.
It is thus to be expected that PALM will be more effective on texts which are generically
close to those on which its taggers were developed (political texts, although from a wide
variety of genres) and from similar regional origins (northern France and England).
8.3 Rules

Even after being trained repeatedly on large corpora, taggers nonetheless often exhibit
consistent faults which are corrected by the application of rules.
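
As a general illustration of the idea, and not a description of how rules are written or applied in PALM, a correction rule can be expressed as a condition on a form and its context together with the analysis to substitute. The rule format and the function apply_rules in this Python sketch are hypothetical.

```python
# Hypothetical rule format: if `form` follows a token tagged `previous_tag`,
# replace its analysis with (`lemma`, `tag`).
RULES = [
    {"form": "saw", "previous_tag": "determinant", "lemma": "saw", "tag": "common noun"},
]


def apply_rules(tagged, rules=RULES):
    """Return a corrected copy of a list of (form, lemma, tag) triples."""
    corrected = []
    previous_tag = "START"
    for form, lemma, tag in tagged:
        for rule in rules:
            if rule["form"] == form and rule["previous_tag"] == previous_tag:
                lemma, tag = rule["lemma"], rule["tag"]
                break
        corrected.append((form, lemma, tag))
        previous_tag = tag
    return corrected


# Invented tagger output containing a consistent fault: "saw" after a
# determinant has been analysed as the verb "see".
tagger_output = [("the", "the", "determinant"), ("saw", "see", "verb")]
print(apply_rules(tagger_output))
```

Tokens that match no rule pass through unchanged, so rules of this kind only touch the specific faults they were written for.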
<How this is done in PALM>
8.4 Development and Performance of PALM: Latin


- Composition of dictionaries: procedure
- Taggers
- Number of training runs
- Rate of success. Right/wrong green. % yellow. % red.
- Remarks: faults, in particular.
8.5 Development and Performance of PALM: Middle French

- Composition of dictionaries: procedure
- Number of training runs
- Rate of success. Right/wrong green. % yellow. % red.
- Remarks: faults, in particular.
- Stats for both AN and Francilien MFr
8.6 Development and Performance of PALM: Middle English

- Composition of dictionaries: procedure
- Number of training runs
- Rate of success. Right/wrong green. % yellow. % red.
- Remarks: faults, in particular.
8.7 Further technical remarks on the operation of PALM

Annexe 1 Fields
Annexe 2 PALM-MEDITEXT: List of texts

Annexe 1 Fields

20    Religious
21    Philosophical
22    Philological (including teaching, rhetoric and grammar)
23    Scientific
24    Medical
25    Literary
26    Legal
27    Practical (Everyday Life)
28    Musical
30    Administrative
50    Historical
60    Political
00    Others
