
POLITECNICO DI TORINO

Notes on Cryptography
by Michele Elia

A.A. 2014-2015

This art of ciphering, hath for relative an art of deciphering,
by supposition unprofitable, but as things are, of great use.
Francis Bacon

Preface
Cryptography concerns the principles, protocols, and methods for protecting information against deliberate or accidental modifications (alterations). The protection is achieved by means of transformations, called enciphering, whose aim
is to conceal the information, and by inverse transformations, called deciphering,
to re-obtain the useful information. As Francis Bacon's fine irony points out, the
entire set of procedures is nonsense but, as things stand in this world, it is very
useful.
For millennia, cryptography was mainly used in diplomacy and military affairs,
but cryptographic techniques have recently also been introduced into the bureaucratic, managerial, and economic sides of civil life.
Cryptographic techniques can be applied in data transmission, computation and
storage systems. The design of any information protection system may be seen
as the engineering way to solve the philosophically insoluble conflict between
security and the resources needed to achieve it. Therefore, any cryptographic
system is a compromise between functional requirements, scientific knowledge,
logistics, technological possibilities, and economic costs. The evaluation of an information protection system must relate to the combination of these resources, taking a pragmatic view that excludes both the ingenuousness and the presumption typical of superficial knowledge and improvisation.
In this scenario, cryptography is only one component, although an indispensable
one, of any information protection system. A knowledge of it, even if limited
to basic techniques, is an absolute requisite for professionally and successfully
managing any system that needs security.
These Notes are a schematic collection of the basic principles, axioms, and methods of cryptography. They may constitute a didactic support to the lectures, but
should be supplemented with reference texts and manuals, among which [39], [27], [48], and [59] are suggested.
These Notes grew out of a graduate course in Cryptography I gave for many years at the Polytechnic of Turin. Their writing took more time (and more years) than I would have hoped. I must thank the passionate students whose youthful enthusiasm and interest in the challenging subject of cryptography have stimulated

my teaching, even in obscure times and in very difficult conditions. Unfortunately, their hope of having written notes has been satisfied too late; however, they have my sincere and permanent gratitude for their anonymous, unrewarded, but warm and tangible support. I am indebted to Frances Cooper for her professional and friendly revision of the English, a revision that has also greatly improved the presentation from a logical point of view. I also want to thank Dr. Guglielmo Morgari (of Telsy) for his careful reading and for pointing out many typos and mistakes. The final technical quality owes much to his professional and deep knowledge of the subject. Obviously, any error or questionable viewpoint is my responsibility alone, and due to my own many limitations.

Turin, September 2014


Michele Elia


Texts
J. Hoffstein, J. Pipher, J.H. Silverman, An Introduction to Mathematical Cryptography, Springer, New York, 2008.
N. Koblitz, A Course in Number Theory and Cryptography, Springer, New York, 1987.
F. Fabris, Teoria dell'Informazione, Codici, Cifrari, Boringhieri, Torino, 2001.
R. Mollin, An Introduction to Cryptography, CRC, New York, 2007.
Manuals
A.J. Menezes, P.C. van Oorschot, S.A. Vanstone, Handbook of Applied Cryptography, CRC, New York, 1997.


Contents

1 Cryptography from Art to Science
  1.1 Introduction
  1.2 Information Protection
      1.2.1 The goals of Information Protection
      1.2.2 Aims
      1.2.3 Summary
  1.3 Historical glimpses
      1.3.1 Cryptography from diplomacy to commerce
      1.3.2 From art to science

2 The Shannon theory of secrecy systems
  2.1 Introduction
  2.2 Uncertainty: Entropy and Mutual Information
  2.3 Uncertainty and Secrecy
      2.3.1 Binary message encryption
  2.4 Cryptology
  2.5 Cryptography
  2.6 Steganography

3 Random Sequences and Statistics
  3.1 Introduction
      3.1.1 Sample Spaces
  3.2 Statistical Tests for Binary Sequences
      3.2.1 Linear Complexity Profile

4 Secret-Key Cryptography - Act I: Block ciphers
  4.1 Introduction
  4.2 The role of the secret-key
  4.3 Historical Encryption Systems
      4.3.1 Substitution encryption
      4.3.2 Transposition encryption
      4.3.3 Alberti's disk
      4.3.4 Vigenère cipher
      4.3.5 Hill Cipher
      4.3.6 Francis Bacon Cipher
      4.3.7 One-time pad
      4.3.8 Enigma
  4.4 Block ciphers
      4.4.1 Common structure of block ciphers
      4.4.2 Modes
  4.5 DES
      4.5.1 DES transformations
      4.5.2 Local key generation
  4.6 AES
      4.6.1 Round Transformations
      4.6.2 Local Key generation

5 Secret-Key Cryptography - Act II: Stream ciphers
  5.1 Introduction
      5.1.1 The structure
      5.1.2 Finite State Machines
  5.2 Output functions - Boolean functions
  5.3 Periodic generators and LFSRs
      5.3.1 The mathematics of LFSRs
  5.4 Linear Codes and Binary sequences
      5.4.1 BCH codes
      5.4.2 Goppa codes
  5.5 Nonlinear Feedback Shift Registers
      5.5.1 Clock-controlled LFSR
      5.5.2 Self-Clock-controlled LFSR
      5.5.3 Clock-controlling and puncturing
      5.5.4 LCP of clock-controlled LFSR sequences
  5.6 Encryption with rate less than 1
  5.7 Appendix I - Representation of Finite Fields
  5.8 Appendix II - Linear recurrent equations in Fq
      5.8.1 Generating functions
      5.8.2 Characteristic equation methods
  5.9 Appendix III - Tridiagonal matrices and LFSRs

6 Public-key Cryptography
  6.1 Introduction
      6.1.1 One-way functions
  6.2 The RSA Scheme
  6.3 The Rabin Scheme
  6.4 The El Gamal Scheme
  6.5 The McEliece Scheme

7 Electronic signatures
  7.1 Introduction
      7.1.1 Electronic signature of an electronic document
  7.2 Components of Electronically Signed Documents
      7.2.1 Document
      7.2.2 Standard hash function SHA-1
  7.3 Signature based on RSA
  7.4 Signature based on Rabin scheme
  7.5 Signature based on El Gamal
  7.6 Blind signature
  7.7 Secret Sharing - Shamir

8 Complexity
  8.1 Introduction
      8.1.1 A heuristic view of computational complexity
  8.2 Complexity: the Heart of Cryptography
      8.2.1 One-way functions
  8.3 Arithmetic complexity
      8.3.1 Complexity of product and exponentiation
      8.3.2 Finite field Arithmetics
  8.4 Factorization complexity
      8.4.1 Factorization in Z
  8.5 Discrete logarithm
      8.5.1 Discrete logarithm as one-way function
      8.5.2 Discrete Logarithm Complexity
      8.5.3 Shanks' Bound
  8.6 Searching Unsorted Data (SUD)

9 ECC
  9.1 Introduction
  9.2 Elliptic Curves and Group Law
      9.2.1 Group Law
  9.3 EC over Finite Fields
  9.4 EC Public-key Schemes
  9.5 Arithmetics and complexity in ECC
  9.6 Historical Notes
      9.6.1 The origins

10 Cryptanalysis
  10.1 Introduction
  10.2 Axioms
  10.3 Cryptanalysis of secret-key systems
      10.3.1 Cryptanalysis of classic schemes
  10.4 DES Cryptanalysis
  10.5 Cryptanalysis of Public Key Systems
      10.5.1 Factorization
      10.5.2 Discrete logarithms

11 Cryptography in GSM
  11.1 Evolution of cellular systems
  11.2 GSM
      11.2.1 Origins
      11.2.2 Communication aspects
      11.2.3 Security and Protections
  11.3 Conclusions

12 Steganography
  12.1 Introduction
  12.2 Some historical notes
  12.3 Steganographic channel models
  12.4 Concealment issues
      12.4.1 Examples, Simulation, and Results
  12.5 Conclusions

Chapter 1
Cryptography from Art to Science
Some people are so busy
learning the tricks of the trade
that they never learn the trade.
Vernon Law (Pittsburgh Pirates pitcher)

1.1 Introduction

It is a fact of recent history that, in the last two decades of the twentieth century, a scientific, technological, and cultural revolution swept through the communication systems of high-technology countries. Satellite telecommunications,
cellular telephony, digital television, the Internet and personal computers show
that the convergence of telecommunications and computer technology has overturned the entire world order of Information Technology. This atypical revolution
has had unforeseeable repercussions also on the traditional methods of knowledge production and transmission. However, the effects in these fields will only
be observed in the coming decades, and they will probably turn out to be much
more far-reaching than the highly visible modifications already produced on the
world economy and finance. Commerce is increasingly based on the Internet,
with sometimes disturbing effects on the consolidated systems of dealing and of
handling goods. In the banking world, the traditional branch, thanks to the Internet, has expanded to enter into the homes of net customers, modifying both the
way users relate to the banking system and the inner organization of the banks
themselves.
Whereas in one respect these perhaps irreversible phenomena have improved
the quality of life, they have conversely made the system as a whole more fragile
and more sensitive to any recession. Adversaries of all types, compatriots or foreigners, governmental or private bodies, can sort and scan plain text they have intercepted and selected, based on details of the address, or on convenient key
words present in the message. This improper monitoring activity has been going
on for decades, obviously even before the computer made the job so much easier. The novelty comes from the proportions and the number of customers who

entrust their personal transactions and secrets to fiber optics, to copper cables, or to the ether. The more technologically advanced a country is, the more susceptible it will usually be to interception of electronic traffic. Therefore, protection of information is becoming an unavoidable necessity to assure a society's operative life. The technologies for protecting information have been developed in the discipline known as cryptology. For millennia, cryptology had confidentiality of information as its main objective, but in recent times the technological evolution, together with the creation of a world-wide society with integrated services and global systems of communication, has delegated much more extensive, wider-ranging, and more complex objectives to cryptology. Specifically, the number of services that need some form of information protection is continuously growing. Any list would fail to be complete, but would be topped by the telephone, e-mail, e-commerce, tele-working, remote monitoring, and tele-medicine, and could continue almost indefinitely.

1.2 Information Protection

It does not appear that the definition of a system for protecting information can
be formulated in a definitive manner through authoritative statements. Rather,
security comes from the concurrence of needs, situations, and purposes that contribute to defining the scenario in which information plays the role of principal
actor.
A system for the protection of information depends on:
1. Accuracy of the principles.
2. Robustness of the mathematical procedure used to transform the information.
3. Physical security of the technological equipment that processes the information and the environments where such devices reside.
4. Discipline of employees, by discipline meaning the mental attitude and the
behavioral attention to details that could make even the most technically-secure system vulnerable.
As just noted, security systems bring together many components of a human and
technical nature. Among these, an important role is played by cryptology and
related mathematical techniques.

1.2.1 The goals of Information Protection

The objectives for protecting information against deliberate manipulation in general should respond to four basic questions:
1) What information to protect?

i) The message as such, keeping it confidential;


ii) The integrity of the message, that is guaranteeing it is received correctly by the recipient, whether privately or not;
iii) The authenticity of the message, that is reassuring the recipient about
the identity of the message's author;
iv) The very existence of the message.
2) Why protect the information?
(a) To ensure integrity: Information should be preserved in its original
form. It must not be fraudulently altered and passed off as authentic.
(b) To ensure availability: Information should be usable when required,
without delay or uncertainty.
(c) To ensure confidentiality: The information must be kept as private as
the owner wants. Only authorized persons or entities can have access.
(d) To ensure privacy: it should not be possible to trace the source of information.
3) Against whom to protect the information?
(a) Against opponents determined to steal it;
(b) Against accidental or deliberate destruction;
(c) Against improper or unauthorized use.
4) How to protect the information?
(a) In a physical manner, i.e. endowing physical locations or equipment
with defenses difficult to crack;
(b) In a logical manner, that is by transforming the information so that it
cannot be stolen, understood, or manipulated by any opponent;
(c) In a virtual way, namely by preventing persons from locating the information in real terms.
Although these statements may sound authoritative, it is not in any way possible to give definite and final answers to the above four questions, if such responses even exist. Rather, these questions and their partial answers direct the
presentation of cryptography and related mathematical techniques, to give security managers the most valuable tools that are available at the current state of
knowledge. With reference to how to protect the information, the techniques developed to hide the very existence of the message have had a somewhat more esoteric development than cryptographic techniques proper, and fell into the discipline known as steganography (a word of Greek origin that means "covered writing"). The first recorded use of steganography is in the title of a book by Trithemius. Steganography has recently experienced a great revival, mainly
thanks to the Internet, and a short overview will be given in the last chapter of
these Notes.

1.2.2 Aims

The situation that pits the defender against the attacker has a dual aspect, that
characterizes the two main branches into which Cryptology is partitioned: cryptography/steganography and cryptanalysis.
Cryptography/steganography pursues five main goals:
- To protect against intruders, ensuring that access to the information is reserved to authorized persons, entities, or devices.
- To protect from deliberate destruction or alteration, ensuring the data's integrity, both logical (meaning of the texts) and physical (the supporting paper, magnetic tapes, CD-ROMs, etc.).
- To prevent shadowing (authenticity), namely to ensure recognition of the
source of information.
- To prevent repudiation (signature), to ensure the impossibility of denying
the origin of a message.
- To prevent tracking, ensuring anonymity of the source and route of messages, objects or people.
The purposes of cryptanalysis are operations that may be the converse of the aims
of the above list, namely:
- To determine the contents of a message.
- To destroy a message, i.e. to deliberately prevent communication between
two parties.
- To falsify, that is to send a message as if it were from another author, such as
launching a communication with a party and being accepted as a legitimate
counterpart.
- To deny being the author of ones own message.
- To trace the origin and path of messages, objects, or people.

1.2.3 Summary

The five situations considered above are at the core of modern cryptology, and
can all be incorporated into a mathematical description in the framework of the
Shannon information theory. However, for practical purposes, it has been preferred to develop a discipline that is apparently independent, referring to information theory only for the basic principles. This will be the subject of the following chapters.

1.3 Historical glimpses

The millenary history of cryptology began in ancient Egypt at the Court of the
Pharaohs where, between sphinxes, pyramids, and plots, for millennia the power
game was played. But it was the warrior soul of Greece, with its oligarchic systems, kingdoms, and ambitions of military and cultural domination, that first systematically applied a cryptographic method of which we have any certain knowledge. In the silent palaces of Sparta, King Agis encrypted messages directed to his distant generals in charge of controlling the eastern Mediterranean, by rolling a strip of papyrus helically around a skytale (command baton) and writing his message along the length of the roll. The straightened strip of papyrus with the encrypted message looked like a chaotic set of symbols. To read the message, the general rolled the strip around a baton of the same diameter. Today,
these procedures for exchanging secret messages may move us to smile. Nevertheless, they solved the problem of private communication in an acceptable way,
compatibly with the available technology.

1.3.1 Cryptography from diplomacy to commerce

From the Spartan hegemony on the Aegean sea, through the grandeur of the
Roman Empire, the effervescent political and cultural milieu of the Italian Renaissance, down to the modern supra-national governments, cryptography has
been variously, but almost exclusively, used in affairs of power. The impulse to
its development was almost always given by the exigencies of war. Surely, the
complex needs of the Roman army to exchange secret messages at the time of
Gaius Julius Caesar promoted the invention and diffusion of a method for concealing information that was relatively secure, and at the same time operatively
easy. The cryptographic method known as Caesar's cipher consisted in substituting each letter with the letter three positions onward (in the natural alphabetical
order from A to Z of the letters). For example, the letter A is substituted with D,
B with E, and so on, W being replaced by Z. The last three letters X, Y, and Z are
substituted with A, B, and C, respectively. The rule was very easy, and the number 3 was the secret key for enciphering and deciphering. The decryption operation
to recover the original message from the encrypted text consisted of the inverse
substitution, which can be described similarly: each letter is substituted with the
letter three positions before it. Technically, in jargon, this encryption rule is called
mono-alphabetic, while its generalization is called polyalphabetic substitution.
This general and relatively strong encryption rule (i.e. polyalphabetic substitution) was perfected by Blaise de Vigenère and reported in his Traicté des chiffres, ou secrètes manières d'escrire, published in 1586, where the square table that bears his name appeared, with a certain emphasis, for the first time. However, this table had already been reported in De Furtivis Literarum Notis by Giovanni Battista Della Porta, published in 1563. The Vigenère polyalphabetic enciphering was
long considered impossible to crack. In polyalphabetic encryption, the key consists of an ordered set of numbers (or letters), for example, encrypting with a key
consisting of the numbers 3 and 12, the letters of the text, starting from the first,
are alternately transformed by mono-alphabetic substitution as in the Caesar cipher, with keys 3 and 12.
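As an illustration, the shift encryptions just described can be sketched in a few lines of Python (a sketch of ours, not part of the original Notes; all names are chosen for the example). With the key [3] the function reproduces Caesar's cipher; with the key [3, 12] it reproduces the two-key polyalphabetic example above.

    # Shift encryption over the alphabet A-Z, encoded as numbers 0..25.
    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def poly_shift_encrypt(text, keys):
        """Shift the i-th letter forward by keys[i mod len(keys)] positions."""
        out = []
        for i, c in enumerate(text):
            n = ALPHABET.index(c)  # letter -> number (encoding)
            out.append(ALPHABET[(n + keys[i % len(keys)]) % 26])
        return "".join(out)

    def poly_shift_decrypt(text, keys):
        """Invert the encryption by shifting backwards."""
        return poly_shift_encrypt(text, [-k for k in keys])

    assert poly_shift_encrypt("XYZ", [3]) == "ABC"  # Caesar wrap-around
    msg = "ATTACKATDAWN"
    assert poly_shift_decrypt(poly_shift_encrypt(msg, [3, 12]), [3, 12]) == msg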
The principle of substituting a message with a second message, according to a
rule governed by a secret key, easy to remember, in such a way that only the
person who has the secret key may go from the encrypted message to the original
message, constitutes the essential part of private key encryption. The first treatises about these cryptographic techniques appeared around the sixteenth century in Italy, although the first known manual of cryptography had already been
published in 1379 by one Gabriele de Lavinde of Parma, possibly a cryptographer
who served in the secretariat of Clement VII, the antipope.
A prominent position in the literature on cryptography is occupied by De Componendis Cyfris (1466) by Leon Battista Alberti, a work in which, together with
the principle of polyalphabetic encryption, the first encryption disc is described,
and the concept of cryptanalysis is introduced. In particular, several methods for
attacking encrypted messages are proposed. Cryptanalysis was highly prized, as
testified by many books of the same period, written by scientists of the time who
also acted as court cryptographers, such as Cicco Simonetta with the Sforza family in Milan, Giovanni Soro serving the Venetian Republic, and Giovanni Battista
Argenti, who served the Pope. Interest in the art of secret writing was certainly
stimulated by the quarrelsome nature of the princes of the time, and the typical
Italian taste for political manoeuvring. Meanwhile, connoisseurs of cryptography were artists, scientists, and politicians, working directly in other sectors like,
for example, the mathematicians of the Bologna school. In this cultural environment, the contribution to cryptography made by the great Lombard mathematician Girolamo Cardano was both varied and remarkable. In his work De
Subtilitate libri XXI (1550), Cardano describes, among other subjects, a lock with
rotors that could be unlocked only by a given letter combination (the letters being written on the external side of the rotors). Among eminent Europeans who
were interested in cryptography during the fortunate Renaissance period are men
of vast cultural interests, like the already-cited Leon Battista Alberti, Giacomo
Casanova (who first intuited how to break polyalphabetic encryption), famous
mathematicians like John Wallis and François Viète, and Francis Bacon, philosopher and statesman. In the seventeenth and eighteenth centuries, progress in the
cryptographic field was slow and insignificant. The evolution restarted, quite
suddenly, around the middle of the nineteenth century, sustained this time by the
industrial revolution and by the economic and governmental interests of the great
modern States. In 1863, the important work Die Geheimschriften und die Dechiffrir-Kunst by Friedrich W. Kasiski was published. This book describes in detail how to break, with cryptanalysis, the Vigenère encryption. Despite general works, like the Traité de cryptographie by Lange and Soudart, published in 1925, or the important Manuale di crittografia by General Luigi Sacco, published in 1936,
which is one of the best-known and most interesting cryptography treatises of
the early twentieth century, the true great leap forward toward a mathematical
theory of cryptography only occurred at the end of the second world war. The
most significant progress in the nineteenth century comprised the introduction of encryption machines, the natural evolution of the enciphering discs introduced
in the Renaissance. In fact, the diffusion of cryptography, especially in military
contexts due to the size and composition of the army, entailed using operators
who were not skilled as designers of secret codes, imposing de facto the use of
automatic systems for encrypting and decrypting, that is enciphering machines.
The demands of war gave a big boost to the improvement of such equipment.
Obviously, cryptographic technology had a development parallel to that of principles and techniques for masking (protecting the secrecy of) information. Nevertheless, to become effective, fast, and reliable instruments, encryption machines
needed the mechanical and electrical technology that only became available with
the advance of industrial development. In La Cryptographie militaire (1883), Auguste Kerckhoffs von Nieuwenhof formulated the main principles of cryptology that must be satisfied by any encryption equipment used by the army. In 1891, the Frenchman Étienne Bazeries invented an encryption machine that was used
(at least) until the second world war.
In 1917, the American Gilbert S. Vernam of AT&T invented an encryption machine for teletypes based on the polyalphabetic enciphering of Vigenère, which was adopted by the U.S. Army Signal Corps. Vernam's great innovation was the way that the enciphered text was obtained by combining, bit by bit, two binary sequences, introducing de facto the modern stream ciphers. The encryption machine designed by the Swede Boris Hagelin around 1920 became famous, and
was also used by the American army. The Hagelin enciphering machine was a
competitor of the Enigma encryption machine used by the German army. The
Enigma machine was invented by the German Arthur Scherbius, an electrical engineer, and patented in April 1918. It was composed of four rotating discs (rotors),
whose initial position was part of the secret key. It was first used by the German
post for encrypting telegrams. After long cryptanalytical studies, it was adopted
by the German Navy in 1926. The structure of Enigma was further improved,
until it achieved satisfactory strength against any kind of cryptanalytical attack.
Without management mistakes or partial private information on the secret keys,
it could be considered definitively secure.
Enigma represents the apex of the evolution of the electro-mechanical encryption machines based on rotors, and fully motivates the great efforts made by the
Allies to decrypt its messages. During World War Two, attacks against Enigma
were organized by the English secret services with an extraordinary deployment
of resources. At Bletchley Park, a town 75 km North-West of London, a group of
cryptographers worked for the entire duration of the war, trying to decrypt, with
alternating fortunes and acceptable success, the messages encrypted by Enigma.
In these efforts, the first electronic computers were employed to implement mathematical attack criteria. Attacks were first developed by Polish mathematicians,
and later by a collaboration among famous mathematicians including Alan Turing.
After the second world war, encryption machines continued to be introduced, implemented with the new electronic technologies, but the underlying algorithms

were still based on the old principle of rotors, borrowed from Alberti's disc. To
meet the requirements of the globally-expanding economy, several standardization processes were started. In the 1970s, the most widely-debated system for
private key encryption was the DES (Data Encryption Standard) proposed by the
American National Bureau of Standards, and developed on an initial project by
IBM. DES, and its successor AES (Advanced Encryption Standard) may represent
the last step of the rotor machine development.
In the subsequent evolution, the word machine is still maintained, but it must be understood as a mathematical computing procedure by means of algorithms.
All these machines are commonly known as encrypting machines. They realize
private key encryption and represent the modern variant of the Caesar cipher,
improved with tricks aimed at achieving the perfect encryption system known as the one-time pad. Actually, this system, notoriously used in the fascinating world
of spying, encrypted a message by substituting the message letters by a combination with letters suggested by special positions in the pages of a booklet (pad)
used only one time. The system is practically unbreakable without knowing the
book.

1.3.2 From art to science

Cryptography was treated as an art for centuries. From invisible ink to mechanisms combined with key words to open secret doors. From rings that should
combine to show incredible secret passages, to mysterious combinations of carillon notes that open fabulous strongboxes. From the vaguely cryptic love messages of Cyrano de Bergerac to the beautiful Roxane, to the light signals between lovers in Stendhal's The Charterhouse of Parma, every action, instrument, or event
contributed to make cryptography a mysterious art.
However, the needs of governments of great modern states called for something
more than a reliance on experts, however loyal, or accredited men skilled in cryptography. Thus, it is not entirely by chance that the English philosopher and
statesman Francis Bacon formulated the basic criteria that should be met by good
encryption systems. Bacon made a major contribution to the rise of a scientific
theory of cryptography. But only relatively recently has a complete axiomatic
formulation of cryptography been achieved, by merit of Claude Elwood Shannon with the publication of his paper Communication Theory and Secrecy Systems in 1949. Actually, this paper was already completed in 1945, but was classified material, and only after its declassification could it appear in a publicly-distributed journal. Key in this rigorous description of encryption/decryption
operations was Information Theory, a mathematical theory of communication
also due to Shannon. As a result of this approach, all impossible expectations of
cryptographic protection were abandoned. With Shannon's proof that certainty
in any cryptographic protection does not exist, the dream of perfect secrecy finally waned. All protection is of a probabilistic nature. We may only reduce the
probability of violating a secret system, but we will never achieve the certainty of
absolute inviolability.

The axiomatic formulation made cryptography a discipline similar to mathematics. Today, the prophetic words, pronounced by Adrian A. Albert at the
opening of the 382nd Conference of the American Mathematical Society in 1939,
are astonishingly concrete:
We shall see that cryptography is more than a subject permitting mathematical formulation, for indeed it would not be an exaggeration to state that
abstract cryptography is identical with abstract mathematics.


Chapter 2
The Shannon theory of secrecy systems
There is nothing more difficult to take in
hand, more perilous to conduct, or more uncertain in its success, than to take the lead in
the introduction of a new order of things.
Niccolò Machiavelli

2.1 Introduction

The theoretical foundation of modern cryptography can indisputably be attributed to Claude Elwood Shannon, with his 1949 publication of the paper Communication Theory and Secrecy Systems, in the Bell System Technical Journal [68].
The paper, almost surely completed in 1945, was considered classified material
(a document or information is classified when it is considered important for national security) and it was declassified only four years later, just before its appearance in the open literature. This paper that founded cryptology followed by
one year, if possible, an even more important paper by Shannon, that is A Mathematical Theory of Communication, which appeared in 1948 in the same Bell System
Technical Journal. This paper was determinant for the theoretical and practical
development of telecommunications systems [69]. In it, after having introduced
axiomatically a measure of information, Shannon developed a totally new theory
by means of which to describe all problems concerning the transmission, storage,
and transformation of information. Shannon's measure of information has the character of a physical quantity, analogous to the measure of surfaces, energy, time,
speed, etc.
This definition of a measure of information has enabled the true nature of information to be established, tied to uncertainty and to its discrete character. It has also
permitted a better comprehension of the mechanisms conveying information, and
in cryptography the understanding of the achievable limits of data security.

All information useful to mankind is represented by means of a finite number of symbols; in particular, the binary form, consisting of the use of two symbols typically represented as 0 and 1, is ubiquitous and has been universally adopted.
The signals, or waveforms, used to carry information are continuous in time
and amplitude, that is they are well described by continuous functions. Nevertheless, today, most of them may be seen as digital objects, in the sense that they
are chosen from among a finite set of waveforms. However, many information
signals are still continuous or analog, like audio signals or some broadcast video
signals, and the corresponding telecommunications systems are still said to be
analog systems.
This situation is rapidly changing, as Shannon's information theory is imposing the digital view, its philosophical significance, and its axiomatic formulation, which, being strictly mathematical, is gradually dominating over every naive approach. The fundamental theorems of Shannon's theory are restricted to telecommunications systems; however, two theorems emerging from the theory are of great
importance in many fields of science.
Theorem 2.1. The nature of information is inherently discrete.
The formal proof of this theorem is based on the notion of differential entropy. However, a heuristic argument is the following: the information associated with analog signals is infinite, that is, if we want to describe an analog signal of stochastic character we need an infinite amount of information, while over any
continuous channel conveying an analog signal, the amount of information given
by the received signal about the transmitted signal is finite; that is, channel noise destroys an infinite amount of information. In conclusion, the received information,
being finite, can be described with a finite set of symbols.
The second theorem is more useful in computer science, although its consequences are also important in communications systems and cryptography.
Theorem 2.2. (Data Processing Theorem). Each time digital data are processed (i.e.
transmitted, transformed, or stored), the amount of information carried may only diminish.
Paradoxically, it often occurs that, after processing, the information is apparently more meaningful, because it is in a form that is compliant with what we may perceive or understand. This undoubted advantage for us induces the erroneous impression that the transformations have extracted all the information contained in the raw data. Nevertheless, the information contained in the more friendly data is inevitably diminished. In other words, we have lost something.
It took several decades before the consequences of these theorems produced visible effects on the evolution of telecommunications and computer science. Now that the transition to digital is completed, the value of these theorems is practically only philosophical and cultural, although their in-depth comprehension could still aid all sciences that inevitably deal with information.

[Figure 2.1: Shannon's model of a communication channel: encoder, noisy channel with output E = G(M, N), decoder.]

2.2 Uncertainty: Entropy and Mutual Information

Claude Shannon considered information as a reduction of uncertainty, thus connecting the measure of information to the physical measure of uncertainty, namely
entropy. With admirable mathematical rigor (although Shannon was entering completely new territory) he deduced the measure of information from a set of axioms composed of the axioms that are at the basis of mathematical measure theory, completed by a specific axiom for dealing with information. The resulting measure was the classic entropy, which had been introduced in the nineteenth century for sizing the uncertainty of systems of particles like molecules, atoms, photons, etc.
In the following we recall some results from information theory, as they were
derived by Shannon, which are indispensable to cryptographic uses.
Let $\mathcal{A} = \{a_i\}_{i=1}^N$ be an alphabet of $N$ symbols, and let $\mathcal{M} = \mathcal{A}^m$ be the set of messages of length $m$ over the alphabet $\mathcal{A}$.
Assuming that the messages $M \in \mathcal{M}$ are random events of a stationary stochastic process characterized by a probability distribution $p(M)$, the entropy of the messages $H(\mathcal{M})$ is defined as the average

$$H(\mathcal{M}) = \sum_{M \in \mathcal{A}^m} p(M) \ln \frac{1}{p(M)} \, , \qquad (2.1)$$

and the source entropy, i.e. the entropy of the alphabet $H(\mathcal{A})$, is defined as the limit

$$H(\mathcal{A}) = \lim_{m \to \infty} \frac{1}{m} \sum_{M \in \mathcal{A}^m} p(M) \ln \frac{1}{p(M)} \, . \qquad (2.2)$$
The entropy H(A) is a nonnegative number that evaluates the uncertainty reduction relative to a symbol of the alphabet A as a consequence of its emission.
Shannon's interpretation of the entropy was that H(A) represents the amount of information that, on the average, a symbol of the stream M can support. If

the stochastic process that governs generation of the message $M$ is memoryless and stationary, then it can be shown that the entropy $H(\mathcal{A})$ is a finite sum

$$H(\mathcal{A}) = \sum_{a \in \mathcal{A}} p(a) \ln \frac{1}{p(a)} \, . \qquad (2.3)$$

The entropy $H(\mathcal{A})$, intended as a function of the vector $P = (p(a_1), \ldots, p(a_N))$, with $a_i \in \mathcal{A}$, of dimension $N$, is a concave function that assumes its maximum value $H(\mathcal{A}) = \ln N$ when all symbols are used with the same probability, that is $p(a) = \frac{1}{N}$. In this model we have

$$H(\mathcal{M}) = H(\mathcal{A}^m) = mH(\mathcal{A}) \, .$$
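As a quick numerical check of (2.3) and of the maximum-entropy property, the following Python sketch (ours, not part of the Notes) computes the entropy of a distribution and verifies that the uniform distribution attains $\ln N$:

    import math

    def entropy(p):
        """Entropy (in nats) of a discrete distribution, as in (2.3)."""
        return sum(pa * math.log(1.0 / pa) for pa in p if pa > 0)

    N = 4
    uniform = [1.0 / N] * N
    skewed = [0.7, 0.1, 0.1, 0.1]

    assert abs(entropy(uniform) - math.log(N)) < 1e-12  # maximum ln N
    assert entropy(skewed) < entropy(uniform)           # any bias lowers H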
When a message $M \in \mathcal{M}$ is sent over a real communication channel, it is corrupted by some noise, given by a block $N$ of symbols from the same alphabet $\mathcal{A}$. The received message is $E = G(M, N) \in \mathcal{E}$. The noisy channel may be ultimately seen as a discrete memoryless source of random symbols from the same alphabet $\mathcal{A}$, which corrupt the message $M$. The received message $E$ may be different from $M$; nevertheless, $E$ yields some information on the sent message $M$.
The most important parameter introduced by Shannon for describing these situations was the mutual information $I(\mathcal{M}, \mathcal{E})$, defined as

$$I(\mathcal{M}, \mathcal{E}) = \sum_{M \in \mathcal{A}^m,\, E \in \mathcal{A}^m} p(M, E) \ln \frac{p(M|E)}{p(M)} \, , \qquad (2.4)$$

where $p(M, E)$ is the joint probability distribution of the sent and received messages, and $p(M|E)$ is the corresponding conditional probability distribution.
$I(\mathcal{M}, \mathcal{E})$ represents the amount of information that each received symbol gives, on average, about the transmitted symbol.
Mutual information is non-negative: if it is zero, the channel is useless, that is, instead of transmitting, the symbols may as well be randomly produced at the receiver side. The following equations connect mutual information and entropies:

$$I(\mathcal{M}, \mathcal{E}) = H(\mathcal{M}) + H(\mathcal{E}) - H(\mathcal{M}, \mathcal{E}) = H(\mathcal{M}) - H(\mathcal{M}|\mathcal{E}) \, ,$$

where $H(\mathcal{M}, \mathcal{E})$ and $H(\mathcal{M}|\mathcal{E})$ are, respectively, the joint entropy and the conditional entropy of the input and output symbols of the channel. $H(\mathcal{M}|\mathcal{E})$ may be interpreted as the amount of information that is still needed to specify $M$ completely, when the received symbol $E$ is known.
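To make these relations concrete, the following sketch of ours computes $I(\mathcal{M}, \mathcal{E}) = H(\mathcal{M}) + H(\mathcal{E}) - H(\mathcal{M}, \mathcal{E})$ for an assumed toy joint distribution: a uniform binary input through a binary symmetric channel with crossover probability 0.1.

    import math

    def H(probs):
        """Entropy (in nats) from a list of probabilities."""
        return sum(p * math.log(1.0 / p) for p in probs if p > 0)

    eps = 0.1  # crossover probability of the toy channel
    joint = {(0, 0): 0.5 * (1 - eps), (0, 1): 0.5 * eps,
             (1, 0): 0.5 * eps, (1, 1): 0.5 * (1 - eps)}

    pM = [sum(v for (m, e), v in joint.items() if m == b) for b in (0, 1)]
    pE = [sum(v for (m, e), v in joint.items() if e == b) for b in (0, 1)]

    I = H(pM) + H(pE) - H(list(joint.values()))
    print(I)  # about 0.368 nats

With eps = 0 (a noiseless channel) the script yields $I = H(\mathcal{M}) = \ln 2$, while with eps = 0.5 it yields $I = 0$, the useless channel mentioned above.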

2.3 Uncertainty and Secrecy

In search of perfect secrecy, Shannon conceived the idea of modeling encryption operations as transmission over a very noisy channel, where the messages
$M \in \mathcal{M}$ from an information source are corrupted by noise to such an extent that
they cannot be recognized by any unauthorized observer, but this noise should

[Figure 2.2: Shannon's model of the cryptographic channel: the key K travels over a private channel; the encrypted message E = F(M, K) travels over a public channel and is decrypted as M = G(E, K).]


be reproducible by the intended recipient, in order to allow reconstruction of the valid message. This noisy transformation is specified by a function $F(\cdot,\cdot)$ defined from $\mathcal{A}^m \times \mathcal{A}^k$ into $\mathcal{A}^r$,

$$E = F(M, K) \, , \qquad (2.5)$$

with the constraint of being invertible, i.e. given $E$ and knowing $K$ we must be able to compute $M$ as

$$M = G(E, K) \, , \qquad (2.6)$$

where $G(\cdot,\cdot)$ is a function from $\mathcal{A}^r \times \mathcal{A}^k$ into $\mathcal{A}^m$. The elements of $\mathcal{K} = \mathcal{A}^k$ are called keys, and play the same role as noise in the transmission channel.
The keys thus define the transmission channel, whose aim, in this case, is to modify transmitted messages in such a way that they can only be understood by the
intended recipients, each with their own reading key.
In this context, the mutual information I(M, E) is defined as the average amount
of information that the encrypted messages E give about the original messages
M, when the message key belonging to K is unknown.
The mutual information is defined through the conditional entropy, which is defined as the amount of information that the messages in $\mathcal{A}^m$ can still give once a received (encrypted) message belonging to $\mathcal{A}^r$ is known. We have

$$H(\mathcal{M}|\mathcal{E}) = \sum_{M \in \mathcal{A}^m,\, E \in \mathcal{A}^r} p(M, E) \ln \frac{1}{p(M|E)} \, . \qquad (2.7)$$

In Shannon's model of cryptologic channels, a key role is occupied by the amount of information that the encrypted messages give about the original messages, in the two contrasting hypotheses that the key word is known or not. In other words, the mutual information $I(\mathcal{M}, \mathcal{E})$ and the conditional mutual information, defined as

$$I(\mathcal{M}, \mathcal{E}|\mathcal{K}) = \sum_{M \in \mathcal{A}^m,\, E \in \mathcal{A}^m,\, K \in \mathcal{A}^k} p(M, E|K) \ln \frac{p(M|E, K)}{p(M|K)} \, , \qquad (2.8)$$

are equally of interest in defining the ideal cryptographic situation, which may
be taken as reference.
Definition 2.1. An encryption process is called perfect when the mutual information $I(\mathcal{M}, \mathcal{E})$ is zero, while the conditional mutual information $I(\mathcal{M}, \mathcal{E}|\mathcal{K})$ yields all the information of the original message, that is

$$\left\{ \begin{array}{l} I(\mathcal{M}, \mathcal{E}) = 0 \\ I(\mathcal{M}, \mathcal{E}|\mathcal{K}) = H(\mathcal{M}) \, . \end{array} \right. \qquad (2.9)$$
In other words, assuming that the key is not known, the encryption of a message is perfect when the received message does not give any information on the original message. In terms of entropies, equation (2.9) implies

$$H(\mathcal{E}) = H(\mathcal{E}|\mathcal{M}) \, , \qquad (2.10)$$

which means that the entropy of the encrypted message is not reduced by the
knowledge of the original message. Furthermore,
$$H(\mathcal{E}|\mathcal{K}) - H(\mathcal{E}|\mathcal{M}, \mathcal{K}) = H(\mathcal{M}) \, ,$$

that is, $H(\mathcal{E}|\mathcal{K}) = H(\mathcal{M})$, since $H(\mathcal{E}|\mathcal{M}, \mathcal{K}) = 0$. In conclusion, knowledge of the
key must enable the original message to be obtained from the encrypted message, without any further uncertainty. Operatively, this condition assures that
the decrypting operation is possible, i.e. that the function G(., .) exists. Since the
relation between the key and the encrypted message is described by the equation
(2.5), the entropy of the encrypted messages, knowing the original messages, is
not greater than the entropy of the keys, $H(\mathcal{K}) \geq H(\mathcal{E}|\mathcal{M})$; then, using (2.10), we have the chain of inequalities

$$H(\mathcal{K}) \geq H(\mathcal{E}|\mathcal{M}) = H(\mathcal{E}) \geq H(\mathcal{E}|\mathcal{K}) = H(\mathcal{M}) \, . \qquad (2.11)$$

The interpretation of this last equation is that to achieve perfect secrecy, the entropy of the keys should not be smaller than the entropy of original messages.
Very frequently, messages, keys, and encrypted messages are all obtained by concatenating statistically-independent symbols from the same alphabet A. In this
case we have $H(\mathcal{M}) = mH(\mathcal{A})$ and $H(\mathcal{K}) = kH(\mathcal{A})$; thus the inequality among entropies implies $k \geq m$ for perfect secrecy. We may collect the previous result in a significant statement.
Proposition 2.1. In order to achieve perfect secrecy, it is necessary that the entropy of
the keys not be smaller than the entropy of the original messages. Equivalently in terms
of alphabet symbols, the length of the key should not be shorter than the length of the original message.
Historically, perfect secrecy was achieved by the classical encryption systems
adopted by spies, and known as a one-time pad, where a book (pad) was used
only once with a precise algorithm for specifying which character to use in which
page, for encrypting the characters of the original message.

Short keys. In most cases, the key set $\mathcal{K}$ has a fixed small dimension $k$, while both
m and r may grow indefinitely. Therefore, it is impossible to satisfy the condition
(2.9) for perfect secrecy.
In this fairly common situation, to evaluate the amount of information that the
encrypted stream gives about the plain-text stream, we make two assumptions:
1) the choice of key is independent of message generation, and 2) m is always
greater than k (actually this is not a real constraint, but avoids repeating this
condition each time). It follows that the conditional entropy equals the key entropy, that is

$$H(\mathcal{M}|\mathcal{E}) = H(\mathcal{K}) \, .$$
A large class of encryption schemes consists of cryptographic transformations working symbol by symbol, that is, each symbol $m_i$ in the message is changed into an encrypted symbol $E_i$ as

$$E_i = f(m_i, K_i) \, ,$$

where $K_i$ is a symbol of a secret key stream. Assuming that the symbols $m_i$ are statistically independent in each message, then we have

$$H(\mathcal{M}) = mH(\mathcal{A}) \, ,$$

while the entropy of the keys, assuming that each key is generated by a mechanism that produces $k$ statistically-independent symbols, and the remaining $r - k$ symbols in a deterministic way, starting from the previous $k$ random symbols, is

$$H(\mathcal{K}) = kH(\mathcal{A}) \, .$$

Clearly, we cannot have perfect secrecy, because the key entropy is limited: the mutual information exchanged by the original message and the encrypted message turns out to be

$$I(\mathcal{M}, \mathcal{E}) = (m - k)H(\mathcal{A}) \, ,$$
that is, the information given by the encrypted message about the original message grows linearly with the message length. Nevertheless, this scheme is one
of the most widely used in practice. It is reasonably secure, provided that H(K)
is sufficiently large to prevent a computational evaluation of K, even under the
assumption that plain and cipher text are known.
In the previous arguments, it was tacitly assumed that the entropy of the encrypted message and the entropy of the original message are equal. However, in
some circumstances, it may be profitable to consider encrypted messages having
entropies larger than the entropies of the corresponding original messages; this
situation is quantified by the encryption rate.
Definition 2.2. Let $\mathcal{M} = \mathcal{A}^m$, $\mathcal{K} = \mathcal{A}^k$, and $\mathcal{E} = \mathcal{A}^r$ be the sets consisting of blocks of $m$, $k$, and $r$ symbols from an alphabet $\mathcal{A}$, respectively. The parameters $m$ and $r$ denote the length of the plain and cipher texts, respectively, and $k$ is the length of the key. The ratio

$$\rho = \frac{m}{r}$$

is the net encryption rate, while the ratio

$$\rho^* = \frac{m}{r+k}$$

is called the full encryption rate.


Equation (2.11), consistently with Shannon's source coding theorem [54], implies that $\rho \leq 1$, a necessary condition for the invertibility of the encryption function $F(\cdot,\cdot)$ once the key is known. However, the parameter $\rho^*$ yields a more faithful measure of the loss (or load) deriving from encryption. Perfect secrecy requires that $m = k$, because the key should be of the same length as the message; therefore the full encryption rate (corresponding to the full transmission rate) is 0.5.
Chapter 4 will describe an encryption procedure achieving perfect secrecy without using a secret key of the same length as the plain message, the price to pay being a net encryption rate not greater than 0.3867.

2.3.1 Binary message encryption

The typical encryption operation of binary sequences is the simple and fast binary sum modulo 2,

$$e_i = m_i + k_i \pmod 2 \, .$$

This is the standard encryption operation of stream ciphers, which are machines generating the key stream $k_i$ starting from a short key (the secret key) $K$. This encryption procedure will be described in greater detail in Chapter 4.
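A minimal Python sketch of this operation (ours, not part of the Notes): when the key bits are truly random, of the same length as the message, and used only once, this is exactly the one-time pad of Section 2.3; when they are produced by a generator seeded with a short secret key, it is a stream cipher.

    import secrets

    def xor_encrypt(message_bits, key_bits):
        """Bitwise sum modulo 2: e_i = m_i + k_i (mod 2).
        The same function decrypts, since m + k + k = m (mod 2)."""
        return [m ^ k for m, k in zip(message_bits, key_bits)]

    m = [1, 0, 1, 1, 0, 0, 1, 0]
    k = [secrets.randbits(1) for _ in m]  # random key of the same length

    e = xor_encrypt(m, k)          # encryption
    assert xor_encrypt(e, k) == m  # decryption recovers the message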

2.4 Cryptology

Cryptology is the scientific discipline that deals with the methodological and mathematical aspects of protecting information against deliberate alterations or intrusions. It is sub-divided into two great branches with opposite aims:
Cryptography/steganography aims to develop protection methods of different sorts;
Cryptanalysis aims to break protection systems and methods.
The two branches are closely correlated, and the design of good cryptographic
functions cannot avoid in-depth cryptanalysis. Nevertheless, the mathematical
methods are quite different.
Cryptography deals with methods for protecting information, in particular to
achieve:

1. Confidentiality in any sort of communication, that is to assure the privacy of the exchanged messages.
2. Integrity in any sort of communication, that is to maintain unchanged
a not necessarily secret message against deliberate alterations.
3. Authenticity of the interlocutors, that is to guarantee the identity of the
conversation partners, namely sender and recipient of a message.
4. Non-repudiation of the authorship by the authentic signer, that is to
guarantee the recipient that the signer cannot reject the authorship.
Steganography deals with the protection achieved by hiding the existence of the
message, which may be embedded in innocent information, or in any unsuspected object.
Cryptanalysis deals with problems of offense, that is, with developing attack methods to violate the characteristics of the message protected by cryptography. In particular, typical cryptanalysis actions are:
1. Retrieving the text of a message protected by cryptography, having at
one's disposal only partial information.
2. Altering or destroying plain or encrypted messages.
3. Fraudulently impersonating the legitimate interlocutor.

2.5 Cryptography

Cryptography includes the study of encryption algorithms, and of the protocols used to achieve security objectives. After the publication of Shannon's paper,
and the burst of interest that was immediately raised, the evolution of cryptology
seemed to return to the traditional discretion, and to an evolution without great
steps forward. However, the progress that occurred in error-correcting codes affected cryptography. Most frequently, it was the same people who carried out
different duties in the public domain of communication systems, and in the private elitist branches typical of cryptographic offices.
Probably, the great economic interests handled by computers, automatic bank
counters, and the need for control over information flows, led to increasingly global, fast, and significant applications of cryptography outside of the traditional
military and diplomatic fields.
However, the paper New Directions in Cryptography by Whitfield Diffie and Martin Hellman, which appeared in the November 1976 issue of IEEE Transactions on
Information Theory, was surprising and caused a sensation, especially among experts of secret cryptography. It proposed a new and challenging paradigm to the
cryptographers:
To transfer private information on a public channel without a previous agreement.

The abstract solution of this problem, in Diffie and Hellman's conception, formally introduced the new concept of one-way function, or better, a new way to interpret the traditional cryptographic functions. These traditional cryptographic functions, they said, should exhibit an asymmetric complexity not in the way they are used, but between the modes of application and their cryptanalysis. Diffie and Hellman introduced a function $F(\cdot)$ that should satisfy the following properties:
1) To be easily specified together with its inverse function $F^{-1}(\cdot)$;
2) To be easy to compute (message encryption) $C = F(M)$, given a plain message $M$;
3) To be hard to compute $F^{-1}(\cdot)$ from the sole knowledge of $F(\cdot)$;
4) To be easy to compute the plain message $M = F^{-1}(C)$, given the encrypted message $C$.
Conditions 3) and 4) are not contradictory, because the goal is to easily retrieve the message, having designed both $F(\cdot)$ and $F^{-1}(\cdot)$, but that the uninitiated cannot obtain the inverse function simply from the knowledge of the function $F(\cdot)$.
This problem originated public-key cryptography, which was immediately useful in telecommunication systems, in the management of computing resources, and in many banking operations.
One-way functions. The notion of one-way function is fundamental in public-key cryptography, although the existence of such functions is still questionable, as it has not been formally proved. A large number of systems thus base their security on a mathematical notion whose significance has not been rigorously established.
Many potential one-way functions have been proposed; however, the only surviving candidates are borrowed from elementary number theory. Precisely, the only known putative one-way functions are based on
1. the difficulty of factoring integer numbers;
2. the difficulty of computing the discrete logarithm in certain representations of cyclic groups of prime order;
3. the difficulty of decoding linear error-correcting codes that lack symmetry.
The idea of using the difficulty of computing the discrete logarithm in convenient cyclic groups was introduced by Diffie and Hellman, and is illustrated by the sketch below.
The idea of using the difficulty of factoring was introduced by Rivest, Shamir, and Adleman with the description of the algorithm known as RSA.
Lastly, the idea of using the difficulty of decoding error-correcting codes is due to McEliece.
All these problems are the object of stringent research, but the main objective remains an axiomatic proof of the existence of one-way functions.
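To make the asymmetry of the two directions concrete, the following minimal Python sketch illustrates the discrete-exponentiation candidate; the modulus p and base g are illustrative assumptions, not recommended parameters.

    # Putative one-way function based on discrete exponentiation.
    p = 2**127 - 1          # a Mersenne prime, chosen here only for illustration
    g = 43                  # an arbitrary base, also an assumption

    def forward(x: int) -> int:
        # Easy direction: square-and-multiply takes O(log x) multiplications.
        return pow(g, x, p)

    def discrete_log(y: int) -> int:
        # Hard direction: naive search takes O(p) steps in the worst case.
        acc = 1
        for x in range(p):
            if acc == y:
                return x
            acc = (acc * g) % p
        raise ValueError("no logarithm found")

    y = forward(2**64 + 3)   # computed instantly
    # discrete_log(y)        # infeasible at this size: up to 2**127 iterations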


Secret-key cryptography, despite its limitations, is unavoidable for encrypting large amounts of stored data, or for fast secret data transmission. What is remarkable is that all these enciphering schemes are derived from the historical Caesar's cipher, with the variants imposed by two thousand years of evolution, and by the evolution of technology.
Caesar's cipher consisted of a simple substitution of a letter with the letter three positions forward in the alphabet; the last three letters were substituted with the first three letters of the alphabet, in order. The number 3 was the secret key. The procedure can be mathematically interpreted by encoding the letters with numbers from 0 to 25 and introducing the operations modulo 26, that is, working in the residue ring $\mathbb{Z}_{26}$.
The encryption operation first converts the letters into numbers (encoding operation), then the number 3 is added to each number modulo 26. These numbers may be converted back to letters for transmission; in this event, before decryption the letters are re-converted to numbers. The decryption operation consists in subtracting the number 3 from each number of the encrypted message, and then in converting (decoding) the numbers into letters, as in the sketch below.
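As an illustration, a minimal Python sketch of Caesar's cipher over $\mathbb{Z}_{26}$ follows; the function names are ours, and the shift is generalized from 3 to any key k.

    def caesar_encrypt(text: str, k: int = 3) -> str:
        # Encode each letter as a number in 0..25, add k modulo 26, decode back.
        return "".join(chr((ord(c) - ord('a') + k) % 26 + ord('a'))
                       for c in text.lower() if c.isalpha())

    def caesar_decrypt(text: str, k: int = 3) -> str:
        # Decryption subtracts the key, i.e. adds its negative modulo 26.
        return caesar_encrypt(text, -k)

    assert caesar_encrypt("zebra") == "cheud"
    assert caesar_decrypt("cheud") == "zebra"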
In spite of its apparent simplicity, to state that such a scheme has conserved its validity is not superficial. Caesar's cipher, viewed in mathematical terms, has all the ingredients for defining the encryption of a message, namely, the concept of transformation, the notion of a secret key exchanged in a secure way, and the notion of encoding.
Actually, the source of weakness in Caesar's cipher is the very short key; this awkward limitation was avoided by a method using different keys to encrypt symbols in different positions of the message. The key symbols were taken from a pad, to be used only once, which was kept secret and known only to sender and receiver. This scheme was typically used by spies, and became known as one-time pad encryption. Practically, it achieved perfect secrecy, as shown by Shannon.
Formally, the one-time pad encryption procedure may be described as follows. Let the plain text be a sequence of numbers in $\mathbb{Z}_N = \{0, 1, 2, \ldots, N-1\}$
$$M_1, M_2, \ldots, M_i, \ldots$$
The secret encryption key is a sequence from the same set $\mathbb{Z}_N$, and of the same length
$$K_1, K_2, \ldots, K_i, \ldots$$
The cipher text is obtained by composition on a symbol-by-symbol basis
$$E_i = M_i + K_i \bmod N \quad \forall i\,.$$
The plain text is easily recovered knowing the secret key
$$M_i = E_i - K_i \bmod N \quad \forall i\,.$$

Conversely, assuming that the key is a sequence of equally-probable and statistically independent symbols, it is impossible to obtain the plain text from a knowledge of the cipher text only. We have $p\{E_i = j\} = p\{M_i + K_i = j\}$ for every $j \in \mathbb{Z}_N$, that is
$$p\{E_i = j\} = \sum_{\ell=0}^{N-1} p\{M_i = j - \ell \bmod N\}\, p\{K_i = \ell\} = \frac{1}{N}\,,$$
which means that the symbols in the encrypted message are also equally probable and statistically independent.
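A minimal Python sketch of the scheme, with N = 26 and a short message chosen for illustration:

    import secrets

    N = 26
    message = [12, 4, 19, 12, 4]                     # encoded plain text M_i
    key = [secrets.randbelow(N) for _ in message]    # one-time key K_i

    cipher = [(m + k) % N for m, k in zip(message, key)]    # E_i = M_i + K_i mod N
    recovered = [(e - k) % N for e, k in zip(cipher, key)]  # M_i = E_i - K_i mod N
    assert recovered == message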
Shannon's perfect secrecy is thus possible. However, it is not practical for use by a large number of people in unrestricted scenarios. In the real world, the implementation of cryptographic schemes calls for more pragmatic solutions, and thus some basic principles that should be followed by any good cryptographic system have been formulated.
In 1883, Auguste Kerckhoffs wrote two journal articles on La Cryptographie Militaire, in which he stated six design principles for military ciphers. Most of them are now redundant because of computers, but two are still valid:
1. Encryption and decryption rules should be easy to implement. Actually, this rule had already been formulated hundreds of years before by Francis Bacon.
2. The encryption rule must not be required to be secret, and it must be able to fall into the hands of the enemy without inconvenience. In other words, security should lie entirely in the secrecy of the key. This is now known as Kerckhoffs' principle.

2.6 Steganography

The protection of information achieved by concealing the very existence of the message itself is studied by the discipline of steganography. The term was first
used by Johannes Trithemius (1462-1516) and derives from the composition of two Greek terms meaning, literally, covered writing. History offers several examples of information hiding that have folk appeal; however, in modern times steganography is usually interpreted as hiding information by means of other information. Examples include sending a message by altering the pixels of an image or the bits in digital voice recordings or music, introducing artificial errors in encoded transmissions, or linguistic methods called acrostics. The most famous acrostic author, namely Boccaccio, deserves to be mentioned: he wrote three sonnets, then wrote other poems such that the initials of the successive tercets corresponded exactly to the letters of the sonnets [44].
Today, the more general term information hiding is used to indicate any discipline that is directed, honestly or dishonestly, to goals that are based on concealing, to different degrees, the very existence of the information. In this context, Petitcolas [44] considers four sub-disciplines:
Steganography properly speaking conceals the existence of the message by methods that include both physical and logical techniques. In principle, attacks are not possible, because the existence of the message is not known and is not considered.

Watermarking is used to protect proprietary rights, authorship, or any kind of ownership of a product. It is weaker than steganography, because an attacker expects, or at least suspects, that some protection is active on the product. A simple attack may be limited to making the mark undetectable.
Covert channels are typically used by untrustworthy programs to leak information to their owners while performing a service.
Anonymity is a way to conceal the identity of the partners in a game: for instance, to guarantee the secrecy of the vote in e-voting systems, or to hide the meta-content of a message, or the sender and recipient of a message. The goal may differ depending on whether anonymity should concern the sender, the receiver, or both. Web applications have focused on receiver anonymity, while email applications are more concerned with sender anonymity.


Chapter 3
Random Sequences and Statistics
The very name calculus of probabilities is a paradox. Probability opposed to certainty is what we do not know, and how can we calculate what we do not know?
H. Poincaré, Science and Hypothesis

3.1 Introduction

In this chapter we will consider sequences that may look like random sequences
or, more technically, discrete stochastic sequences. Random sequences, besides
playing a key role in cryptography, have many applications in other fields: in
spread-spectrum techniques, in communication theory, in testing digital devices,
in satellite navigation and localization systems, in computer simulation, to limit
the list to some important applications.
Consider for example the following binary sequences
1) . . . , 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, . . .
2) . . . , 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, . . .
3) . . . , 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, . . .
4) . . . , 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, . . .
We may ask whether they have been generated by some random process, or have a deterministic origin. In fact:
- sequence 1) was generated by a deterministic mechanism;
- sequence 2) was generated by flipping a coin;
- sequence 3) was generated considering the binary representations of the first 6 digits in the decimal part of $\pi$;
- sequence 4) was generated by a primitive linear feedback shift register of length 4.

In general, given a relatively short sequence, it is not easy to establish whether it has been generated by a deterministic or a probabilistic mechanism. More formally, let $A = \{a_0, a_1, \ldots, a_i, \ldots, a_D\}$ be a finite alphabet, and let $S$ be a finite sequence of $N$ symbols taken from $A$: the problem of establishing whether $S$ has been produced by a truly random mechanism is very hard, if not impossible in principle. This statement does not rule out that some sequences may be precisely characterized.
It is highly questionable whether a criterion exists whereby, using $S$ alone, we can decide without any ambiguity the origin of the sequence. In fact, any conclusion can only be probabilistic, as the following situation shows.
Assume that an unknown mechanism produces binary sequences of length $M$, a large number. With the further assumption that the probability $p\{0\}$ of producing 0 is neither 0 nor 1, the probability that the all-zero sequence is generated is different from 0 and 1.
If the mechanism does produce the all-zero sequence, we have three possibilities: 1) the mechanism may be deterministic; 2) the mechanism is probabilistic with $p\{0\} \approx 1$; and 3) a rare event has taken place. However, a motivated guess as to the most probable alternative is not possible.
Several objections may be made to the procedure of considering the generation of the sequence as a single statistical event, and thus considering a set of generated sequences with the aim of doing statistics on the set of generable sequences:
1) the procedure is not statistical on the sequence;
2) the procedure simply displaces the problem to a sequence of sequences, the original issue not being solved;
3) in some practical contexts, it could be very onerous to generate more than a single sequence;
4) in many cases, in particular in cryptography, it makes no sense to check more than one sequence: a sent encrypted message is a unique event, and the two alternatives, the message has been recognized by the adversary and the message has not been recognized by the adversary, are unique and unrepeatable events.
In any case, before making any guess on a given finite sequence, we need a definition of random sequence, in order to know what is under discussion. The most pragmatic and practically usable definition of random sequence is probably that given by David Lehmer.
Definition 3.1 (D.H. Lehmer (1951)). A random sequence is a vague notion embodying the idea of a sequence in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests, traditional with statisticians and depending somewhat on the uses to which the sequence is to be put, [47].

Unfortunately, Lehmer's definition of a random sequence does not exclude paradoxical situations, because the tests are always done on a finite number of instances and suffer from the limits of this kind of statistics. However, a choice must be made, in spite of possible errors. Lehmer's view is unavoidable if we want to obtain conclusions of practical value.
A key role is always played by the sample space, which depends on the uses the sequence is to be put to. The sample space specifies the set with respect to which we want the statistics.

3.1.1 Sample Spaces

A sample space is the set of all possible outcomes of a random experiment. A random variable is a function defined on a sample space. A sample space may be finite or infinite; infinite sample spaces may be discrete or continuous.
We will now look at some important examples of sample spaces.
Drawing a card. The experiment is drawing a card from a standard deck of 52 cards. The cards are of two colors, black (spades and clubs) and red (diamonds and hearts); of four suits, spades (S), clubs (C), diamonds (D), and hearts (H); and of 13 values (2, 3, 4, 5, 6, 7, 8, 9, 10, Jack (J), Queen (Q), King (K), Ace (A)). There are 52 possible outcomes, with the sample space
$$\{2S, 2C, 2D, 2H, 3S, 3C, 3D, 3H, \ldots, AS, AC, AD, AH\}\,.$$
Of course, if we are only interested in the color of a drawn card, or its suit, or
perhaps its value, then it would be natural to consider other sample spaces:
{b, r}
{S, C, D, H}
{2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A} .
Assuming that deck and drawing are fair, so that the probability of drawing a
given card is 1/52, we may easily compute the probability distributions over the
various sample spaces.
Choosing a birthday and the birthday paradox. The experiment is to select a single date during a given year. This can be done, for example, by picking a random person and inquiring about his or her birthday. Disregarding leap years, for simplicity's sake, there are 365 possible birthdays, which may be enumerated as
$$\{1, 2, 3, 4, \ldots, 365\}\,,$$
then the probability that a given person was born on a given day of the year is $\frac{1}{365}$. This probability leads to the so-called birthday paradox, which arises from the question

What is the probability of two given people having been born on the same day?
This probability, $\frac{1}{365}$, is obviously different from the probability $\frac{1}{365^2}$, which is the answer to the question
What is the probability of two given people having been born on the same given day?
The paradox comes from the infrequent event of two people, who are not familiar with statistics, meeting and discovering that they were born on the same day: they believe that the event is very rare, and a motive for wonder. Actually, the probability of the event is $\frac{1}{365}$, a figure that makes the event not so rare in everyday life.
The birthday problem is also generalized as follows:
What is the probability that, among $n$ ($\leq 365$) given people, at least two of them have been born on the same day?
The probability is computed as the complement to 1 of the probability that all the people have been born on different days. The latter probability is computed considering the sample space of $n$-tuples with entries from the set $N = \{1, 2, \ldots, 365\}$, which represents a numbering of the days of the year. The size of the space is $365^n$, whereas the number of favorable events consists of the $n$-tuples with all different entries. This number is obtained considering that the first position may be any number in $N$, the second position may be any number in $N$ different from that in the first position, and so on; thus the total number of such $n$-tuples is
$$365 \cdot 364 \cdot 363 \cdots (365 - n + 1)\,.$$
The resulting probability is
$$p\{\text{collision}\} = 1 - \frac{365 \cdot 364 \cdot 363 \cdots (365 - n + 1)}{365^n}\,.$$
The probabilities for various values of $n$ are reported in the following Table:

n     p{non repetition}   p{repetition}
2          0.997              0.003
5          0.973              0.027
10         0.883              0.117
20         0.589              0.411
22         0.524              0.476
23         0.493              0.507
24         0.462              0.538
40         0.109              0.891
60         0.006              0.994

Sometimes, it is seen as surprising that the collision probability exceeds 0.5 with only 23 people. However, the probability that at least two people among 23 were born on one given day is 0.00186, a value that is of no particular interest.
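The entries of the Table are easily reproduced; a minimal Python sketch:

    def p_collision(n: int) -> float:
        # One minus the probability that n birthdays are all different.
        p_distinct = 1.0
        for i in range(n):
            p_distinct *= (365 - i) / 365
        return 1.0 - p_distinct

    for n in (10, 22, 23, 40):
        print(n, round(p_collision(n), 3))   # 0.117, 0.476, 0.507, 0.891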

Coin tossing. The experiment of tossing a coin, which lands on either one or the other of its two sides, called head and tail, has two possible outcomes. In the case of a single toss, the sample space has two elements, which will be denoted as $\{H, T\}$. Let $p$ be the probability of getting $H$ and $1-p$ the probability of getting a tail; if $p = \frac{1}{2}$ we say that the coin is fair. Consider the case of two experiments. One may toss two indistinguishable coins simultaneously, or one coin twice. The difference is that in the second case we can easily differentiate between the two throws. If two indistinguishable coins are tossed simultaneously, there are just three possible outcomes, $\{H, H\}$, $\{H, T\}$, and $\{T, T\}$. If one coin is tossed twice, there are four distinct outcomes: $HH$, $HT$, $TH$, $TT$. Thus, depending on the nature of the experiment, there are 3 or 4 outcomes, with the sample spaces
$$\{H,H\}, \{H,T\}, \{T,T\} \quad \text{(indistinguishable coins)}$$
$$HH, HT, TH, TT \quad \text{(distinguishable coins)}\,.$$

Repeated throwing yields an example of an infinite discrete sample space, namely the first tail experiment: a coin is repeatedly tossed until the first tail shows up. Possible outcomes are sequences of $H$ that, if finite, end with a single $T$, plus the infinite sequence of $H$s:
$$\{T, HT, HHT, HHHT, \ldots, \{HHH\ldots\}\}$$
This is a space that contains an event (not impossible) whose probability is 0. A random variable is naturally defined as the length of an outcome. It draws values from the set of integer numbers including the symbol of infinity:
$$\{1, 2, 3, 4, \ldots, n, \ldots, \infty\}\,.$$
The sample space may be equipped with the probability distribution $p\{HH\ldots HT\} = p^{\#(H)}(1-p)$ induced by the probabilities of $H$ and $T$.
Rolling dice. The experiment is rolling a cubic die whose faces show the numbers 1, 2, 3, 4, 5, 6 in one way or another; these may be actual digits or arrangements of an appropriate number of dots (e.g., five dots arranged in a quincunx for the number 5). There are six possible outcomes and the sample space consists of 6 elements: $\{1, 2, 3, 4, 5, 6\}$. To each face is associated a probability $p\{i\}$; the die is said to be fair if $p\{i\} = \frac{1}{6}$.
A second experiment is rolling two dice. If the dice are distinct, or if they are rolled successively, there are 36 possible outcomes, i.e. the sample space is:
$$\{11, 12, \ldots, 16, 21, 22, \ldots, 66\}\,.$$

If they are indistinguishable, then some pairs of outcomes, like 12 and 21, become one. There are $\frac{6 \cdot 5}{2} = 15$ such pairs, giving the total number of possible outcomes as $36 - 15 = 21$. In this case, the sample space is
$$\{11, 12, \ldots, 16, 22, 23, \ldots, 66\}\,.$$
When we throw two dice we are often interested not in the individual numbers that show up, but in their sum. The sum of the two top numbers is an example of a random variable, $(a, b) \mapsto a + b$ (where $a, b$ range from 1 through 6), that takes values in the set $\{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$, which is the sample space of a single throw. If the dice are fair, the probability distribution on this sample space is
$$p\{2\} = p\{12\} = \frac{1}{36}\,,\quad p\{3\} = p\{11\} = \frac{2}{36}\,,\quad p\{4\} = p\{10\} = \frac{3}{36}\,,$$
$$p\{5\} = p\{9\} = \frac{4}{36}\,,\quad p\{6\} = p\{8\} = \frac{5}{36}\,,\quad p\{7\} = \frac{6}{36}\,.$$

3.2 Statistical Tests for Binary Sequences

In this section we describe a set of statistical tests suitable for binary sequences. The tests may easily be adapted to random sequences over any alphabet; however, the explanations for the binary case are more direct, and the binary case is the more common one, especially in cryptography and in the cryptanalysis of enciphered messages.
$\chi^2$-test. This test is one of the most important in statistics for checking the fit of empirical distributions against theoretical distributions. The problem is to test whether a given set $S = \{x_1, x_2, \ldots, x_N\}$ of samples comes from a given probability distribution $P = (p_1, p_2, \ldots, p_k)$ over $k$ categories $\{u_1, u_2, \ldots, u_k\}$, specifying also some confidence intervals with their relative probability.
Let $Y_i$ be the number of samples in $S$ that belong to the category $u_i$. If the probability distribution $P$ is true, the expected value $E[Y_i]$ is $N p_i$, so that the difference $Y_i - N p_i$ is expected to be small.
Thus the sum $\sum_i (Y_i - N p_i)^2$ should be small, and could be taken as a statistic; however, this sum is not very independent of the probability distribution $P$, because the differences $Y_i - N p_i$ with $p_i$ small count like the differences with $p_i$ large. It was found that a statistic that suffers less from this effect may be obtained by weighting $(Y_i - N p_i)^2$ with $\frac{1}{N p_i}$. The resulting statistic $U^2$ may be evaluated as
$$U^2 = \sum_{i=1}^{k} \frac{(Y_i - N p_i)^2}{N p_i} = \frac{1}{N} \sum_{i=1}^{k} \frac{Y_i^2}{p_i} - N\,.$$
It can be shown that this statistic tends to the $\chi^2$ statistic with $\nu = k - 1$ degrees of freedom when $N$ becomes large. Thus we may assume that the distribution of $U^2$ is known, and it may be used to compute the confidence intervals.
The $\chi^2$ probability density function is
$$f_{\chi^2}(y) = \frac{y^{\nu/2 - 1}}{2^{\nu/2}\,\Gamma(\nu/2)}\, e^{-y/2}\,, \quad y > 0\,,$$
where $\nu$ is the degree of freedom, i.e. the number of squares of normal random variables that enter in the sum $\chi^2$, [47], and $\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx$ is the gamma function [84].
          p = 1%    p = 5%    p = 50%   p = 95%   p = 99%
ν = 1     0.00016   0.0039    0.4549    3.841     6.635
ν = 2     0.0201    0.1026    1.386     5.991     9.21
ν = 5     0.5543    1.1455    4.351     11.07     15.09
ν = 10    2.558     3.94      9.342     18.31     23.21
ν = 30    14.95     18.49     29.34     43.77     50.89
x_p       -2.33     -1.64     0.00      1.64      2.33

Table I - Confidence intervals of the χ² distribution
The asymptotic expression, for large $\nu$, of the upper extreme of the confidence interval is
$$\nu + \sqrt{2\nu}\, x_p + \frac{2}{3} x_p^2 - \frac{2}{3} + O\!\left(\frac{1}{\sqrt{\nu}}\right)\,,$$
where $x_p$ is given in the last row of Table I.


Example. Two dice have been independently thrown $N = 100$ times, obtaining the following numbers of occurrences of the 6 different faces:

          1    2    3    4    5    6
die 1    10   20   18   22   16   14
die 2    16   14   18   12   18   16

We want to estimate to what extent each single die is fair. The degree of freedom is $6 - 1 = 5$ in both cases. In the case of die 1, the value of $\chi^2$ is 5.6; from Table I we see that this value is neither too big nor too small: a value not larger than 5.6 occurs some 60% of the time, so die 1 seems to be fair. In the case of die 2, the value of $\chi^2$ is 2.24, and a value not larger than this occurs only some 20% of the time; the experimental results are too close to the theoretical distribution, so die 2 is very probably not fair.
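A minimal Python sketch of the $U^2$ computation, checked here against die 1 of the example:

    def chi2_statistic(counts, probs):
        # U^2 = sum_i (Y_i - N p_i)^2 / (N p_i)
        N = sum(counts)
        return sum((y - N * p) ** 2 / (N * p) for y, p in zip(counts, probs))

    print(chi2_statistic([10, 20, 18, 22, 16, 14], [1 / 6] * 6))   # 5.6 for die 1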
Zero-One counting. The test is based on counting the numbers of 0s and 1s. Let a block of $N$ binary symbols be given,
$$b_1, b_2, \ldots, b_N\,, \quad b_i \in \{0, 1\}\,,$$
which is generated by a truly random mechanism, that is:

1. The symbols are statistically independent.
2. The symbols are equiprobable, that is, $p\{b_i = 0\} = p\{b_i = 1\} = \frac{1}{2}$.
With these hypotheses, the probability that a block of $N$ bits contains $n_0 = k$ 0s and $n_1 = N - k$ 1s, given by the binomial distribution, is
$$\binom{N}{k} \frac{1}{2^N}\,.$$
The average number of 0s or of 1s in a block of $N$ symbols is
$$E[n_0] = E[n_1] = \frac{N}{2}\,,$$
while the standard deviation is the same for both statistics:
$$\sqrt{E[(n_0 - E[n_0])^2]} = \frac{\sqrt{N}}{2}\,.$$
Let 99% be the confidence (or probability) level; assume $N = n_0 + n_1$, with $N \geq 100$ to avoid inconsistencies; then, with probability 0.99, both $n_0$ and $n_1$ are included in the interval $I_{99}$
$$\left[\frac{N}{2} - 3\frac{\sqrt{N}}{2}\,,\ \frac{N}{2} + 3\frac{\sqrt{N}}{2}\right]\,.$$
Test: Given a block of $N$ binary symbols, count the total numbers $n_0$ and $n_1$ of 0s and 1s, respectively. The randomness test is passed, with 99% confidence, if both $n_0$ and $n_1$ are included in $I_{99}$.
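A minimal Python sketch of this test:

    from math import sqrt

    def zero_one_test(bits) -> bool:
        # Pass if both counts fall in [N/2 - 3*sqrt(N)/2, N/2 + 3*sqrt(N)/2].
        N = len(bits)
        n1 = sum(bits)
        n0 = N - n1
        dev = 3 * sqrt(N) / 2
        return abs(n0 - N / 2) <= dev and abs(n1 - N / 2) <= dev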
Group counting. This test is based on counting the numbers of occurrences of the patterns of bits, taken $k$ by $k$. Since a pattern $b_0, b_1, \ldots, b_{k-1}$ of $k$ bits can be interpreted as an integer number $m_i = \sum_j b_j 2^j$ lying between 0 and $2^k - 1$, the test is equivalently based on counting the number $N_i$ of occurrences of $m_i$ for every $i$. Assuming that $N$ is sufficiently large to avoid inconsistencies, if the given sequence is a truly random sequence, the numbers $m_i$ are uniformly distributed, and the expected number of occurrences of each of them is $\frac{N}{2^k}$. The standard deviation is the same for every statistic $N_i$:
$$\sqrt{E[(N_i - E[N_i])^2]} = \frac{\sqrt{N(2^k - 1)}}{2^k}\,.$$
The confidence interval is defined in exactly the same way as in the previous case. Let 99% be the confidence (or probability) level; then, with probability 0.99, every $N_i$ is included in the interval $I_{99}$
$$\left[\frac{N}{2^k} - 3\frac{\sqrt{N(2^k - 1)}}{2^k}\,,\ \frac{N}{2^k} + 3\frac{\sqrt{N(2^k - 1)}}{2^k}\right]\,.$$
Test: Given a block of $N$ binary symbols, count the number $N_i$ of patterns corresponding to the same number $m_i$, for every $i$. The randomness test is passed, with 99% confidence, if every $N_i$ is included in $I_{99}$.
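A minimal Python sketch of the test; here the number of k-bit groups read from the sequence plays the role of $N$ in the formulas above (an interpretation we assume):

    from math import sqrt
    from collections import Counter
    from itertools import product

    def group_test(bits, k: int) -> bool:
        M = len(bits) // k                       # number of k-bit groups
        counts = Counter(tuple(bits[i * k:(i + 1) * k]) for i in range(M))
        mean = M / 2 ** k
        dev = 3 * sqrt(M * (2 ** k - 1)) / 2 ** k
        return all(abs(counts.get(pat, 0) - mean) <= dev
                   for pat in product((0, 1), repeat=k))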

Run counting. A run of 0s of length $k$ is a pattern of $k$ consecutive 0s included between two 1s. A run of 1s is defined conversely.
The two statistics are the numbers of runs of equal symbols, namely, the total number of runs of 1s, and the total number of runs of 0s.
Let $X_1, X_2, \ldots, X_N$ be a block of $N$ 0s and 1s, considered as integer numbers. It is convenient to introduce two statistics:
$S$: the number of 1s in the block; it is easily obtained as the sum
$$S = \sum_{i=1}^{N} X_i\,.$$
$R_1$: the autocorrelation function obtained as
$$R_1 = \sum_{i=2}^{N} X_i X_{i-1}\,.$$

It is easily seen that the difference $D = S - R_1$ gives the number of runs of 1s: in fact, a run of length $k$ of 1s is a pattern of the form $01\ldots10$, where we have $k$ consecutive 1s. Its contribution to $R_1$ comes from the pattern obtained as
0 1 1 ... ... 1 0 ...
... 0 1 1 ... 1 1 0
... 0 1 1 ... 1 0 0
where in the third row (the product of the first two) the run of 1s has length $k - 1$. It follows that the difference between the number of 1s in the original pattern and the number of 1s in the pattern obtained by shifting and multiplication is exactly 1 for every run; thus $D$ counts exactly the number of runs of 1s. In a truly random sequence, the symbols $X_i$ are
statistically independent and equiprobable. Thus the joint probability distribution of the pair $S$ and $R_1$ is [45, p. 11]
$$p\{S, R_1\} = \binom{N - S + 1}{S - R_1} \binom{S - 1}{R_1} \left(\frac{1}{2}\right)^N\,.$$
Since we are interested only in the statistic $D$, we must sum over the values of $S$ and $R_1$ whose difference is $D$, obtaining
$$p\{D\} = \binom{N + 1}{2D} \left(\frac{1}{2}\right)^N\,.$$
The average value of $D$ can be computed directly, and turns out to be $\frac{N}{8}$, while its variance is $\frac{N+1}{16}$.
Assuming a confidence level of 99%, the number of runs of 1s is included in the interval $I_{99}$
$$\left[\frac{N}{8} - 3\frac{\sqrt{N+1}}{4}\,,\ \frac{N}{8} + 3\frac{\sqrt{N+1}}{4}\right]\,.$$

Test: Given a block of $N$ binary symbols, count the number $S$ of 1s and the correlation $R_1$, and obtain $D = S - R_1$. The randomness test is passed, with 99% confidence, if $D$ is included in $I_{99}$.
The same test can be done with respect to the runs of 0s.
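A minimal Python sketch for the runs of 1s:

    from math import sqrt

    def run_test(bits) -> bool:
        # D = S - R1 counts the runs of 1s; compare with the interval I99.
        N = len(bits)
        S = sum(bits)
        R1 = sum(bits[i] * bits[i - 1] for i in range(1, N))
        D = S - R1
        return abs(D - N / 8) <= 3 * sqrt(N + 1) / 4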
Up and down runs. Let us consider a sequence of 18 numbers
1  3  10  4  2  7  12  16  15  9  7  6  4  5  6  3  2  12
   +  +   -  -  +  +   +   -   -  -  -  -  +  +  -  -  +
where, below each number, the symbol + or - denotes whether the number is greater or smaller than the previous one. We assume that adjacent numbers are always different, or that the probability of the event that two adjacent numbers are equal is zero. A sequence of $N$ numbers has $N - 1$ changes.
Assuming that the numbers are identically distributed, the number $r$ of runs up plus runs down (in the example above $r = 7$) is asymptotically distributed according to the normal distribution, with average $(2N-1)/3$ and variance $(16N-29)/90$.
We obtain a test of the hypothesis that the numbers in the sequence are uniformly distributed, assuming the statistic $r$ (i.e. the number of runs up and runs down) is normally distributed with the given average and variance.
Test: Given a block consisting of $N$ digits, the randomness test is passed, with 99.7% confidence, if the statistic $r$ is included in the interval
$$\left[\frac{2N-1}{3} - 3\sqrt{\frac{16N-29}{90}}\,,\ \frac{2N-1}{3} + 3\sqrt{\frac{16N-29}{90}}\right]\,.$$
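A minimal Python sketch, verified on the 18-number sequence above (r = 7):

    from math import sqrt

    def updown_test(xs) -> bool:
        signs = [b > a for a, b in zip(xs, xs[1:])]            # the +/- row
        r = 1 + sum(s != t for s, t in zip(signs, signs[1:]))  # runs up and down
        mean = (2 * len(xs) - 1) / 3
        dev = 3 * sqrt((16 * len(xs) - 29) / 90)
        return abs(r - mean) <= dev

    assert updown_test([1, 3, 10, 4, 2, 7, 12, 16, 15, 9, 7, 6, 4, 5, 6, 3, 2, 12])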
Monte Carlo tests. The set of numerical techniques employing random numbers to evaluate integrals or functions otherwise difficult to compute bears the name of Monte Carlo methods.
The randomness tests are defined considering the numerical evaluation, by Monte Carlo techniques, of integrals whose value is exactly known. The discrepancy between the exact and the Monte Carlo evaluation of the integral is a measure of the goodness of the random sequences.
The integral
$$I_f = \int_0^1 f(x)\, dx$$

may be interpreted as the expectation $E[f]$ of a function $f(\xi)$ of a random variable $\xi$ uniformly distributed in the $[0, 1]$ interval. The variance is
$$D^2 = E[(f - E[f])^2] = \int_0^1 f^2(x)\, dx - E[f]^2\,.$$
Given a putative random sequence of $N$ numbers $X_i \in [0, 1]$, we define the sum
$$f_N = \frac{1}{N} \sum_{i=1}^{N} f(X_i)\,,$$
which represents an estimate of the value of the integral $I_f$ since, taking the expectation of $f_N$, we have
$$E[f_N] = \frac{1}{N} \sum_{i=1}^{N} E[f(X_i)] = E[f]\,.$$

Applying the Tschebyscheff inequality, we have
$$|E[f] - f_N| \leq \sqrt{\frac{D^2}{\delta N}}$$
with probability $1 - \delta$. In the following Table we report some functions that can be used in testing the randomness of a sequence:

f(x)      x       (x - 1/2)^2      sin 2πux              -
E[f]     1/2         1/12       sin²(πu)/(πu)   1/2 + sin(2πu)/(2πu)
Test: Given a block consisting of $N$ binary symbols, and assuming a confidence level of 99% (which means that, with probability $1 - \delta = 0.99$), the difference between the exact value and the estimate must be less than
$$|E[f] - f_N| < 10 \sqrt{\frac{D^2}{N}}\,.$$

It is remarked that multiple integrals may advantageously be used to strengthen the test; for example, using double integrals to evaluate the area of a region
$$\int_{\mathcal{D}} dx\, dy\,,$$
where $\mathcal{D}$ is defined by a function $f(x, y) \geq 0$ in the square $0 \leq x \leq 1$, $0 \leq y \leq 1$. Two sequences $x_1, x_2, \ldots, x_N$ and $y_1, y_2, \ldots, y_N$ are obtained by splitting the given sequence under test, of length $2N$, and the following summation is evaluated
$$S_N = \sum_{i=1}^{N} u(x_i, y_i)\,,$$
where $u(x, y) = 1$ if $f(x, y) \geq 0$, and $u(x, y) = 0$ otherwise.
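As a concrete instance, the following minimal Python sketch uses the quarter disk $x^2 + y^2 \leq 1$, whose area is exactly $\pi/4$; the choice of region is ours, for illustration:

    from math import pi

    def monte_carlo_discrepancy(xs, ys) -> float:
        # u(x, y) = 1 inside the quarter disk; the sum estimates pi/4.
        hits = sum(1 for x, y in zip(xs, ys) if x * x + y * y <= 1.0)
        return abs(hits / len(xs) - pi / 4)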



The auto-correlation coefficient. The periodic auto-correlation coefficient $\gamma(\tau)$ of a binary sequence of length $N$ is defined as
$$\gamma(\tau) = \frac{1}{N} \sum_{i=0}^{N-1} (-1)^{a_i} (-1)^{a_{i+\tau}}\,, \tag{3.1}$$
where the index $i + \tau$ should be evaluated modulo $N$.


Since $E[\gamma(\tau)] = 0$ and $E[\gamma(\tau)^2] = \frac{1}{N}$ for $\tau \neq 0$, to test whether a given block of $N$ binary symbols consists of equally distributed and statistically independent symbols, the expression (3.1) is computed. Considering the inequality
$$p\left\{|\gamma(\tau)| \leq \frac{k}{\sqrt{N}}\right\} \geq 1 - \frac{1}{k^2}\,,$$
obtained using the Tschebyscheff inequality, the following test is run.
Test: Given a block consisting of $N$ binary symbols, the randomness test is passed, with 99% confidence, if $\gamma(1)$ is included in the interval
$$\left[-\frac{10}{\sqrt{N}}\,,\ \frac{10}{\sqrt{N}}\right]\,.$$
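A minimal Python sketch of the test at shift $\tau = 1$:

    def autocorrelation_test(bits) -> bool:
        # gamma(1) as in (3.1); pass if |gamma(1)| <= 10 / sqrt(N).
        N = len(bits)
        gamma1 = sum((-1) ** bits[i] * (-1) ** bits[(i + 1) % N]
                     for i in range(N)) / N
        return abs(gamma1) <= 10 / N ** 0.5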
The Kolmogorov-Smirnov test. This test concerns continuous random variables: given a sequence of real random numbers, the test ascertains whether the sequence is associated with a random variable having a given probability density, [47]. However, the test may be adapted to testing the uniform distribution of discrete random variables, in particular binary variables.
A major advantage of the Kolmogorov-Smirnov test is that it is independent of the probability density $f(x)$ to which it is applied, because it is referred to a random variable defined through the cumulative distribution function
$$F(x) = \int_{-\infty}^{x} f(t)\, dt\,.$$
Given $N$ samples $X_1, X_2, \ldots, X_N$ of a random variable putatively with probability density $f(x)$ and cumulative distribution function $F(x)$, let the samples be re-ordered in increasing order, i.e. $X_{i+1} \geq X_i$; then define the two statistics
$$K_N^+ = \sqrt{N} \max_j \left[\frac{j}{N} - F(X_j)\right]\,, \quad K_N^- = \sqrt{N} \max_j \left[F(X_j) - \frac{j-1}{N}\right]\,, \quad j = 1, 2, \ldots, N\,.$$
Note that the random variable $\eta = F(\xi)$ is uniformly distributed in the $[0, 1]$ interval; therefore the probability distributions of $K_N^+$ and $K_N^-$ can be deduced by referring to a random variable with uniform density in the $[0, 1]$ interval.
The test consists of looking at the values $K_N^+$ and $K_N^-$ to determine whether they are sufficiently high or low. These values should be compared with those given in the following Table, borrowed from [47, page 48]. For example, the probability is 50% that $K_{10}^-$ is 0.5426 or less. The entries in the table were computed using an expression that asymptotically is
$$y_p - \frac{1}{6\sqrt{N}} + O\!\left(\frac{1}{N}\right) \quad \text{where} \quad y_p^2 = \frac{1}{2} \ln \frac{1}{1-p}\,.$$
Since in any reasonable test $N$ is large, with probability $p$, $K_N^+$ or $K_N^-$ asymptotically are $y_p - \frac{1}{6\sqrt{N}}$ or less.
          p = 1%   p = 5%   p = 50%   p = 95%   p = 99%
n = 1     0.0100   0.0500   0.5000    0.9500    0.9900
n = 10    0.0291   0.1147   0.5426    1.1658    1.4440
n = 20    0.0381   0.1298   0.5547    1.1839    1.4698
n = 30    0.0435   0.1351   0.5605    1.1916    1.4801

Percentage points of the distributions of $K_N^+$ and $K_N^-$
Although the test has been defined for continuous random variables, it can be applied to test the uniform distribution of binary sequences. The binary stream $x_1, x_2, \ldots, x_N$ may be partitioned into blocks $x_{\ell k+1}, x_{\ell k+2}, \ldots, x_{\ell k+k}$ of $k$ bits, and each block may be considered as the positional representation of an integer base 2
$$X_\ell = \sum_{i=1}^{k} x_{\ell k+i}\, 2^{i-1}\,.$$
The aim is to test the uniform distribution of these integers in the $[0, 2^k - 1]$ interval; equivalently, to look for the uniform distribution in the $[0, 1]$ interval of the rational numbers $\frac{X_\ell}{2^k - 1}$. Note that, in the case of the uniform probability density in the $[0, 1]$ interval, the cumulative distribution function is $F(x) = x$.
Test: Given a sequence consisting of $N = kL$ binary symbols, the sequence is partitioned into $L$ blocks of $k$ symbols, which are interpreted as integers; these are then divided by $2^k - 1$ and sorted into ascending order. The randomness test is passed, with 97% confidence, if both $K_L^+$ and $K_L^-$ are included in the interval
$$\left[1.22 - \frac{1}{6\sqrt{L}}\,,\ 1.52 - \frac{1}{6\sqrt{L}}\right]\,,$$
with $L > 200$.
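A minimal Python sketch of the two statistics for the uniform distribution, where $F(x) = x$ on $[0, 1]$:

    from math import sqrt

    def ks_statistics(samples):
        # K+ = sqrt(N) max_j [j/N - F(X_j)], K- = sqrt(N) max_j [F(X_j) - (j-1)/N]
        xs = sorted(samples)
        N = len(xs)
        k_plus = sqrt(N) * max((j + 1) / N - x for j, x in enumerate(xs))
        k_minus = sqrt(N) * max(x - j / N for j, x in enumerate(xs))
        return k_plus, k_minus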

3.2.1 Linear Complexity Profile

The linear complexity profile, strictly speaking, is not a statistical test. However, it can be seen as a Monte Carlo method, because we compare the actual linear complexity profile with the average (expected) linear complexity profile of a truly random sequence.
Linear complexity is defined by referring to the linear sequences generated by Linear Feedback Shift Registers (LFSRs), [34, 65].

[Figure 3.1: Linear complexity profile, close to a straight line of slope 1/2, for a clock-controlled Fibonacci LFSR of length 22]


Definition 3.2. The linear complexity $\ell(M)$ of a sequence $\mathcal{X}$ of length $M$ is the minimum length of an LFSR that generates $\mathcal{X}$.
An LFSR of length $m$ generates a sequence whose period is not greater than $2^m - 1$: that is, the linear complexity of the sequence is small with respect to its length. In general, given a sequence $\mathcal{X}$ of length $M$, its linear complexity is computed by the Berlekamp-Massey algorithm, which yields the length of the shortest LFSR generating $\mathcal{X}$ [4, 55, 54].
If $\mathcal{X}$ is an m-sequence (i.e. a binary sequence generated by an LFSR of length $m$), then $\ell(M) \leq m$, and $\ell(M) = m$ for every $M \geq 2m$; an interesting question is then:
What is the linear complexity of a genuinely random sequence?

For each sub-sequence of length $n$, starting from the beginning, the approach is to compute its linear complexity $\ell(n)$, a process that yields the linear complexity profile.
Since, if $\mathcal{X}$ is a genuine random sequence, $\ell(M) = \lceil \frac{M}{2} \rceil$ on average, the linear complexity profile is a straight line of slope $\frac{1}{2}$.
Let $\mathcal{X}_n$ be a subsequence of length $n$ of an infinite truly random sequence $\mathcal{X}$, and let $\ell = \ell(\mathcal{X}_n)$ denote the length of the shortest recurrence generating $\mathcal{X}_n$. Let $g_\ell(x) = x^\ell + a_1 x^{\ell-1} + \ldots + a_{\ell-1} x + a_\ell$ be the generator polynomial of $\mathcal{X}_n$.
The linear complexity profile of $\mathcal{X}_n$ is defined as the function $\ell(n; k) = \ell(\mathcal{X}_k)$ for every $k$ from 0 to $n$. In order to compute the expectation of $\ell(\mathcal{X}_k)$, for every $n$, it is necessary to define a probability measure over $\{\mathcal{X}_n\}$, the set of all binary sequences of length $n$. We say that a sequence $\mathcal{X}_n$ is randomly generated if it

is picked at random from $\{\mathcal{X}_n\}$ with probability $p\{\mathcal{X}_n\} = \frac{1}{2^n}$, since $|\{\mathcal{X}_n\}| = 2^n$. This definition is tantamount to considering a sequence $\mathcal{X}_n$ as produced bit by bit, with bit probability $\frac{1}{2}$. Let $c(n; k)$ denote the number of sequences in $\{\mathcal{X}_n\}$ that are generated by a recurrence of order $k$; therefore the expectation of $\ell(\mathcal{X}_n)$ can be written as
$$E[\ell(\mathcal{X}_n)] = \sum_{\mathcal{X}_n} \frac{1}{2^n}\, \ell(\mathcal{X}_n) = \frac{1}{2^n} \sum_k k\, c(n; k)\,.$$
The last summation is easily computed, taking into account the following observations:
- Every generator polynomial of degree $k$ is allowed, [55], including $x^k$, which is assumed to generate the all-zero sequence.
- The sequence $0\cdots01$, composed of $n-1$ zeros followed by a 1, is necessarily generated by a recurrence of degree $n$, [55]; therefore $c(n; n) = 1$, since any other sequence of length $n$ is generated by some LFSR of length less than $n$ [55].
- $c(n; 0) = 1$ is a consequence of the previous observation.
- $c(1; 1) = 1$, since we have only the sequence 1, given that the sequence 0 is generated by a recurrence of order 0; and $c(n; 1) = 2$, $n > 1$, since the recurrence with generator polynomial $x + 1$ generates two sequences, namely the all-zero and the all-one sequences, but by definition the all-zero sequence is generated by a recurrence of order 0, and the sequence $01$ cannot be generated by a recurrence of order 1.
- If $n > 2k$ and $k \geq 0$, then
$$c(n; k) = c(n-1; k) = c(2k; k)\,,$$
because any periodic sequence generated by an LFSR of length $k$ is specified by its first $2k$ digits, and sequences longer than $2k$ are the periodic extensions of some sequence generated by an LFSR of length $k$.
- $c(2; 2) = 1$, and accounts for the sequence $01$.
- $c(3; 2) = 4$ is obtained as the difference
$$c(3; 2) = 2^3 - [c(3; 0) + c(3; 1) + c(3; 3)] = 4\,.$$
Moreover, $c(n; 2) = c(4; 2) = 8$ for every $n \geq 4$, where $c(4; 2)$ is obtained by direct counting, or by repeating the same argument used above for evaluating $c(3; 2)$.
- We have the recurrence $c(2k; k) = 4\, c(2(k-1); k-1)$ because, adding one cell to an LFSR, we have available one more initial condition and one more tap; therefore
$$c(2k; k) = 2^{2k-1}\,, \quad k \geq 1\,.$$
- If $2k > n$, then $c(n; k) = 4^{n-k}$ for every $\lfloor \frac{n}{2} \rfloor + 1 \leq k \leq n - 1$, $n > 2$.


An initial set of values of $c(n; k)$ is reported in the Table:

n\k    0    1    2    3    4    5
1      1    1    -    -    -    -
2      1    2    1    -    -    -
3      1    2    4    1    -    -
4      1    2    8    4    1    -
5      1    2    8   16    4    1

The average (or expectation) of $\ell(\mathcal{X}_n)$ is obtained as
$$E[\ell(\mathcal{X}_n)] = \frac{1}{2^n} \sum_{k=0}^{n} k\, c(n; k) = \begin{cases} \dfrac{n}{2} + \dfrac{4}{18} - \dfrac{3n+2}{9}\, 2^{-n} & \text{for even } n\,,\\[1ex] \dfrac{n}{2} + \dfrac{5}{18} - \dfrac{3n+2}{9}\, 2^{-n} & \text{for odd } n\,, \end{cases}$$

while the mean square deviation $\sigma_n^2$, computed through the average of the squares $E[\ell(\mathcal{X}_n)^2]$, is
$$\sigma_n^2 = \frac{1}{2^n} \sum_{k=0}^{n} k^2\, c(n; k) - E[\ell(\mathcal{X}_n)]^2 = \begin{cases} \dfrac{86}{81} - \dfrac{42n+82}{81}\, 2^{-n} - \dfrac{(3n+2)^2}{81}\, 2^{-2n} & \text{for even } n\,,\\[1ex] \dfrac{86}{81} - \dfrac{39n+80}{81}\, 2^{-n} - \dfrac{(3n+2)^2}{81}\, 2^{-2n} & \text{for odd } n\,. \end{cases}$$
It is immediately seen that the mean square deviation, separately for even and odd $n$, increases monotonically, and in both cases asymptotically approaches $\sigma^2 = \frac{86}{81}$.
Conversely, the average profile asymptotically depends only weakly on the parity of $n$:
$$E[\ell(\mathcal{X}_n)] \approx \begin{cases} \dfrac{n}{2} + \dfrac{4}{18} & \text{for even } n\,,\\[1ex] \dfrac{n}{2} + \dfrac{5}{18} & \text{for odd } n\,. \end{cases}$$
Note that the square deviation of the linear complexity profile of a truly random sequence of length $n$ from the straight line is approximately $\frac{86}{81}\, n$.
Test: Given a block of $N$ binary symbols, construct the linear complexity profile, then compute the mean square distance $D$ from the straight line of equation $y = x/2$. If this value is close to $\frac{86}{81} N$, i.e. the difference $D - \frac{86}{81} N$ is $O(\sqrt{N})$, the test is passed.
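A minimal Python sketch of the linear complexity profile, computed by re-running the Berlekamp-Massey algorithm over GF(2) on every prefix (a simple but non-optimized approach):

    def berlekamp_massey(s):
        # Linear complexity of the binary sequence s over GF(2).
        n = len(s)
        c, b = [1] + [0] * n, [1] + [0] * n  # current and previous connection polynomials
        L, m = 0, 1                          # complexity and steps since last update
        for i in range(n):
            d = s[i]                         # discrepancy
            for j in range(1, L + 1):
                d ^= c[j] & s[i - j]
            if d == 0:
                m += 1
            elif 2 * L <= i:
                t = c[:]
                for j in range(n + 1 - m):
                    c[j + m] ^= b[j]
                L, b, m = i + 1 - L, t, 1
            else:
                for j in range(n + 1 - m):
                    c[j + m] ^= b[j]
                m += 1
        return L

    def linear_complexity_profile(s):
        return [berlekamp_massey(s[:k]) for k in range(1, len(s) + 1)]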


Chapter 4
Secret-Key Cryptography - Act I
Block ciphers
There is nothing makes a man suspect much, more than to know little.
Francis Bacon

4.1 Introduction

Secret-key cryptosystems base the protection of the content of a message on a key, which should be known only to the sender and the legitimate recipient.
Secret-key encryption methods were developed with the practical targets of making encryption and decryption easy and fast. Even in the computer era, these aims of simple and fast encryption operations are still unavoidable, and it is not completely surprising that they can still be achieved by the same, suitably updated, encryption principles.

4.2 The role of the secret key

The secret key controls the transformations whose aim is to conceal the information. Operatively, the key used to encrypt is also used to decrypt; for this reason these methods are also called symmetric key algorithms. The symmetry is perfect in the case of binary stream ciphers, where the encryption operation is the XORing of plain and key streams. The design of these encryption mechanisms tries to satisfy what is known as Kerckhoffs' principle, according to which the security (secrecy) of a system must lie entirely in the secrecy of the key, and not in the secrecy of the encryption algorithms. Two procedures have emerged, seen as different implementations of the stream enciphering principle, namely block ciphers, and stream ciphers proper.
Block cipher method: the stream of text symbols is partitioned into blocks of fixed length, then a transformation is applied to each block of plain text to produce a block of cipher text of the same length. The transformation depends on a secret cipher key, which may be the same or may change from block to block. A classic example is DES (Data Encryption Standard), which was proposed by the U.S. National Bureau of Standards, in the mid-1970s, as an enciphering algorithm to be used for encrypting classified material that had no strategic relevance.

[Figure 4.1: Block cipher principle: E = f(M, K) with secret key K]

[Figure 4.2: Stream enciphering principle: E_i = f(M_i, K_i), with the key stream K_i produced by a stream generator]
Stream cipher method: the stream of plain text symbols is transformed on a symbol-by-symbol basis, as in the Caesar cipher. The key-stream generator, driven by a secret key, produces a stream of the same length as the plain text. The encrypted symbol is obtained by combining a plain-text symbol with a key-stream symbol. The combining function should be, and usually is, a very simple function. In the case of binary streams, the combining function is a simple sum modulo 2 (XORing), which implies the perfect symmetry of the encryption and decryption operations. Probably more than 50% of stream enciphering is done using stream ciphers, because of several advantages, even if stream enciphering has not in general been standardized:
- the encryption operations are much faster than the corresponding operations in block ciphers;
- the algorithm may be proprietary and personalized without patent restrictions, and thus may be kept secret for private use;
- although the design respects Kerckhoffs' principle, not all details are disclosed, so that attacks are more expensive.

4.3 Historical Encryption Systems

In this section we briefly describe some historically important secret-key encryption systems, which still drive today's evolution of many cryptographic techniques of practical importance.
During its secular evolution, many variants of the Caesar cipher have been proposed. Although all the methods share the same basic principle, they have been denominated differently, to indicate more explicitly the transformation they perform.

4.3.1 Substitution encryption

Substitution encryption is an enciphering rule such that letters are substituted with other letters of the same alphabet. The substitution is called monoalphabetic when the same letter is always substituted with the same letter of the same alphabet, whatever the position of the letter in the message.
If the same letter in different positions is substituted with different letters, the encryption scheme is called polyalphabetic.
An example of monoalphabetic substitution is Caesar's cipher. A second example is offered by the simplest use of Alberti's encryption disk.
Monoalphabetic substitution is very weak against statistical attacks based on the frequencies of the letters in texts of a given language. Polyalphabetic substitution is more resistant to statistical attack; nevertheless, if the text is sufficiently long it can be broken, as will be explained in Chapter 10.

4.3.2 Transposition encryption

This is an encryption rule that changes the relative positions of the letters without changing the letters themselves. The main feature of this mechanism is that it is resistant to statistical attacks based on letter frequencies.

4.3.3 Alberti's disk

Leon Battista Alberti, in De Componendis Cyfris, introduced a kind of encrypting machine, Alberti's disk, along with a strong encryption scheme performed using this disk. Alberti's original schemes referred to the Latin language.

[Figure 4.3: Alberti's disk]


Alberti's disk consists of two concentric disks; the external one (red capital letters) is fixed, and the internal one (green lower-case letters) is movable (it may rotate). The disk is partitioned into 24 sectors, each of which intercepts one cell on a border ring of each disk. The cells of the external disk contain 20 characters and 4 numbers, in the order
ABCDEFGILMNOPQRSTVXZ1234.
Note that the letters H, Y, K, and & are missing; U is also missing, but in the Latin language U and V have the same sign, that is, U and V are graphically indistinguishable. These 20 letters compose the alphabet of the plain text.
The cells of the inner disk contain the 24 Latin characters in a random order:
acegklnprtvz&xysomqihfdb.
These 24 letters compose the alphabet of the encrypted message. The correspondence is established by the disk sectors.
The secret key is the inner disk, its initial position, and a sequence of rotations of this disk after three or four letters of the plain message have been encrypted.

4.3.4 Vigenère cipher

Vigenère encryption has a relatively long history, being first devised by Italian cryptographers of the Renaissance. However, Vigenère's description in his Traicté des chiffres gave it his name, although he did not have priority in the invention of the method.
The method, in its more elementary formulation, consists of a block of k Caesar ciphers, each driven by a Caesar key (i.e. an alphabet letter). These k letters constitute the secret key. Vigenère's original description is the following table, where in each line the letters of the alphabet are in the normal order, but shifted so as to begin with the corresponding letter of the key, each line thus making up a Caesar cipher with that initial letter as the Caesar key; in this example the key is ELENA:
EFGHIJKLMNOPQRSTUVWXYZABCD
LMNOPQRSTUVWXYZABCDEFGHIJK
EFGHIJKLMNOPQRSTUVWXYZABCD
NOPQRSTUVWXYZABCDEFGHIJKLM
ABCDEFGHIJKLMNOPQRSTUVWXYZ
For example, Caesar's statement alea iacta est would have been encrypted as
EWINIENERSX.
If the message is sufficiently long, and the length of the key is known, a statistical attack based on the frequency of the letters in a language allows us to break the Vigenère system systematically. The method will be described in Chapter 10.
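A minimal Python sketch of the Vigenère cipher, assuming the common convention that letters are numbered from A = 0 and that each key letter is added modulo 26:

    def vigenere_encrypt(text: str, key: str) -> str:
        # Position i is a Caesar cipher whose key is key[i mod len(key)].
        a = ord('A')
        return "".join(chr((ord(c) - a + ord(key[i % len(key)]) - a) % 26 + a)
                       for i, c in enumerate(text))

    print(vigenere_encrypt("ALEAIACTAEST", "ELENA"))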

4.3.5 Hill Cipher

In [38], Lester Hill proposed the first practical block cipher operating on more than two symbols. The letters of a plain text are first encoded as numbers of $\mathbb{Z}_m$; a possible $m$ is 31, which leaves numbers for special characters like commas, full stops, and blanks. Then a block of $n$ consecutive numbers is considered as a column vector $\mathbf{v}$ and transformed using an invertible matrix $\mathbf{H}$ (the Hill matrix) over $\mathbb{Z}_m$, producing a vector $\mathbf{e} = \mathbf{H}\mathbf{v}$ of encrypted numbers, which may eventually be encoded back to letters. For example, the Latin message
veni,vidi,vici
is first encoded as
21 04 13 13 08 26 21 08 03 08 26 21 08 02 08,
where each number corresponds to the position of the letter in the natural alphabetical order, and 26 stands for a comma. Then the following Hill matrix of determinant 1 in $\mathbb{Z}_{31}$ is used:
$$\mathbf{H} = \begin{pmatrix} 1 & 7 & 5 \\ 2 & 14 & 11 \\ 18 & 8 & 11 \end{pmatrix}$$

to produce, operating on blocks of three numbers, the encrypted message
21 24 16 13 21 08 30 01 00 16 22 02 00 08 08,
which, written using alphabetical characters, is an encrypted message of the form
vyqnvi?baqwcaii,
where the special character ? corresponds to 30.
Decryption is equally easy: knowing the Hill matrix $\mathbf{H}$, its inverse is computed. The encrypted text is re-encoded into numbers, partitioned into vectors of dimension $n$, and the encoded plain text $\mathbf{v}$ is recovered by means of the transformation $\mathbf{v} = \mathbf{H}^{-1}\mathbf{e}$. The strength of the Hill cipher against cryptanalysis will be discussed in Chapter 10.
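A minimal Python sketch of Hill encryption with the matrix printed above; the encoding of letters to numbers is left to the caller, and the output of the printed example is not asserted here:

    def hill_encrypt(numbers, H, m=31):
        # e = H v mod m, applied to consecutive blocks of dimension n = len(H).
        n = len(H)
        out = []
        for s in range(0, len(numbers), n):
            v = numbers[s:s + n]
            out.extend(sum(H[i][j] * v[j] for j in range(n)) % m
                       for i in range(n))
        return out

    H = [[1, 7, 5], [2, 14, 11], [18, 8, 11]]
    print(hill_encrypt([21, 4, 13], H))   # encrypts one block of three numbers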

4.3.6 Francis Bacon's Cipher

Sir Francis Bacon, an English statesman, is also known for his formulation of a set of principles of good encryption systems [53]. He also proposed an encryption scheme that anticipates many modern encryption techniques. Bacon's cipher is a double substitution cipher: the 26 letters of the alphabet are first encoded using blocks of five binary symbols, i.e. 0 and 1; then the 26 letters are partitioned into two sets $A_0$ and $A_1$, and each 0 is substituted with any letter of the set $A_0$, and each 1 with any letter of the set $A_1$. Note that the encrypted message is expanded by a factor 5. For example, the message BACON is first encoded into binary symbols, where each letter is substituted by the positional binary number of five bits corresponding to the numbers from 0 to 25, with 00000 corresponding to A, 00001 corresponding to B, and so on. We partition the 26 letters of the alphabet into the two sets $A_0 = \{a, b, \ldots, m\}$ and $A_1 = \{n, o, \ldots, z\}$; this partition is actually the secret key. The encryption is described by the following table:
B       A       C       O       N
00001   00000   00010   01110   01101
agbjn   bkhjc   bamzf   kpyxa   dntlw
Decryption is performed by substituting the letters with the corresponding bits and then re-interpreting the binary patterns as letters. The secrecy of the partition $A_0$, $A_1$ is mandatory for the security of the system. However, without further transformation, a statistical attack is possible, as will be explained in Chapter 10.

4.3.7 One-time pad

The encryption system known as the one-time pad was mainly used by spies: people who caught our fantasy, and are the protagonists of spy novels, in which names like Mata Hari, Michele Strogoff, or James Bond became famous. Encryption mechanisms impossible to break were a central part of the novel The Key to Rebecca by Ken Follett (1980).
The idea underlying one-time-pad encryption was first described by Frank Miller in 1882, and rediscovered after Gilbert Vernam introduced a binary encryption system for telegraph messages using a punched tape, and Joseph Mauborgne improved it by requiring that the tape should only be used once. The spy's one-time pad was a book (pad) used only once. Each character of the plain text is encrypted by modular addition with a character taken from the pad, chosen according to a specific rule from different positions of different pages. In practical terms, this system achieves perfect Shannon secrecy.

4.3.8 Enigma

The Enigma encryption machine now has only historical interest; nevertheless, it represents an example of a combination of many genial ideas that overcame the technological limits at the beginning of the 20th century, when electronic and mechanical technologies did not yet have the present-day levels of flexibility and reliability.
Figure 4.4 reports a scheme of principle of the machine: when a key corresponding to a message letter is pressed, for example a, an electrical circuit is closed and a letter on a screen lights up, showing the encrypted letter for a. In the figure, a current path is shown which induces the lighting of the letter C.
From Figure 4.4 we may deduce that the letter a will never be enciphered as the letter A, and this occurs for 26 pairs of letters as a consequence of the machine's structure. This fact could be a source of weaknesses facilitating attacks; for this reason, an initial permutation was introduced, which was part of the secret key and changed periodically. Therefore, an attacker could not know which plain letter/encrypted letter pairs should be excluded.

4.4 Block ciphers

The enciphering method consisting of transforming blocks of plain text symbols was widely used after NIST (formerly the National Bureau of Standards) introduced DES in the mid-1970s for encrypting classified material that had no strategic relevance, or reserved material such as bank money transfers or banking information. DES soon became a de-facto standard, not least because it was perfectly matched to the technology of the time.

4.4.1 Common structure of block ciphers

The most widely used block cipher schemes, namely the governmental DES and AES, or the proprietary IDEA, have the same basic structure, shown in Figure 4.5, which was introduced by the famous enciphering machine Enigma. The size of the input buffer is equal to the dimension of the block of symbols to be encrypted at each stage.

[Figure 4.4: Basic Structure of Enigma]


The box following the input buffer performs a permutation on the input symbols; the inverse of this permutation is performed on the output block of symbols before storing in the output buffer, ready for sending. Between these two permutations there are $n$ round blocks that perform the same transformation, although with different parameters, whose values depend on the secret key according to a schedule which is usually known to everybody. The following Table reports the size of the blocks, the number of rounds, and the size of the secret key for the three block ciphers mentioned.
         Block           key                # rounds
Enigma   1 char.         3 char.            3
DES      32 + 32 bits    56 bits            16
AES      8·(4×4) bits    128-192-256 bits   10-12-14

4.4.2 Modes

The original conception of DES was for encrypted blocks of symbols to be transmitted, and then transformed independently of one another. The designed data transform was not dissimilar to a one-way function. Once a one-way function was practically available, it became possible to use such a function in different modes. In particular, five modes have been conceived:

[Figure 4.5: Common structure of standard block ciphers: input buffer, input permutation P, round blocks 1 through n, output permutation P^{-1}, output buffer]

Electronic Code Book (ECB). This mode specifies the basic behavior of a block cipher: blocks of symbols, usually bits, are transformed into blocks of symbols of the same length. The transformation is a one-way function that cannot be inverted without knowing the secret key. Each block is transmitted independently.
It should be remarked that transmission errors are amplified by a factor that is half of the block length. For example, in DES, where each block has length 64 bits, an error that affects an encrypted block appears as 32 errors, on average, in the decrypted text.
Cipher FeedBack (CFB). In this operating mode, a block cipher (i.e. a nonlinear transformation) is used to realize an autonomous finite state machine for generating a random stream. The output block is fed back into the input block at each step, and the output is a symbol (usually a bit or a block of bits) that is combined with the plain text on a symbol-by-symbol basis, as shown in Figure 4.8.
The secret key is also used to specify the initial loading of the input register. This loading may also be chosen randomly and transmitted unprotected on the public channel, in order to avoid the system always restarting from the same point and thus being exposed to a store-and-replay attack. This mode is suitable for noisy channels, since it avoids error propagation.
Cipher Block Chaining (CBC). In this mode, each encrypted block is fed back and combined with the input block before applying the transformation. The result is that each encrypted block depends in some way on all previous input blocks. Typically, with binary blocks, the combination of the plain text block and the fed-back block is a bit-by-bit modulo 2 sum.
At the deciphering side, the concatenation implies an error propagation for a length equal to the source memory, that is, two blocks; as a counterpart, fraudulent alterations possibly introduced during the transmission session appear in the deciphered text as channel errors, and thus integrity is guaranteed (with good probability).

[Figure 4.6: ECB (Electronic Code Book) mode: block enciphering]

[Figure 4.7: CFB (Cipher Feedback) mode: random stream generator]

[Figure 4.8: OFB (Output FeedBack) mode: principle of concatenation]

[Figure 4.9: CBC (Cipher Block Chaining) mode]
Output FeedBack (OFB). In this mode, each encrypted block is produced by combining a block of plain text with the output block of the cipher (usually the combination is an XOR operation on a bit-by-bit basis between the two blocks). The cipher output block is then fed back and used as input for the block cipher. Each encrypted block depends in some way on all previous input blocks. Clearly, the concatenation implies an error propagation within two contiguous blocks after deciphering and, as with CBC, the integrity of the message is guaranteed.
Counter (CTR). This mode is a kind of combination of OFB and CFB. Each input text block is indexed by an integer I, which is used as input to the block cipher; the output is then combined with the text block to produce the cipher-text block. In practice, the evolution of the block-cipher machine is independent of the input sequence of text blocks, and the connection is established by the indexes of the blocks. This makes encryption and decryption much easier, and also faster. Unlike the OFB mode, it is guaranteed that each block is encrypted with a different key block, so a store-and-replay attack is unfeasible. Furthermore, if the combination of plain text and key text is on a bit-by-bit basis, error propagation is avoided.
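To fix ideas, the following minimal Python sketch shows the CTR principle; the function toy_block_cipher is a hypothetical placeholder (it is neither DES nor AES), standing for any keyed one-way block transformation.

    # Minimal sketch of CTR mode (illustrative only).
    # toy_block_cipher is a hypothetical stand-in for a real block cipher
    # such as DES or AES; any keyed pseudorandom transformation would do.

    BLOCK = 8  # block length in bytes (64 bits, as in DES)

    def toy_block_cipher(block: bytes, key: bytes) -> bytes:
        # NOT secure: a placeholder mixing function used only to show the mode.
        state = int.from_bytes(block, "big") ^ int.from_bytes(key, "big")
        for _ in range(4):
            state = (state * 0x5DEECE66D + 11) % (1 << (8 * BLOCK))
        return state.to_bytes(BLOCK, "big")

    def ctr_encrypt(plaintext: bytes, key: bytes) -> bytes:
        out = bytearray()
        for i in range(0, len(plaintext), BLOCK):
            # The counter I indexes the block; the cipher never sees the text.
            keystream = toy_block_cipher((i // BLOCK).to_bytes(BLOCK, "big"), key)
            chunk = plaintext[i:i + BLOCK]
            out.extend(p ^ k for p, k in zip(chunk, keystream))
        return bytes(out)

    # Decryption is the same operation: XORing the keystream twice cancels out.
    key = b"8bytekey"
    msg = b"attack at dawn!!"
    assert ctr_encrypt(ctr_encrypt(msg, key), key) == msg

Since the keystream depends only on the block index and the key, deciphering is the very same XOR operation, as the final assertion checks.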

4.5 DES

DES was the result of a project of the NBS, now NIST, with the aim of defining a standard suitable for applications of cryptography in civil life, business, and commerce. The motivations for a standard recommended by a governmental institution were multiple:
- To encourage the application of cryptographic protection to communication systems needed for non-military applications.

[Figure 4.10: CTR (Counter) mode.]


- To promote studies and public debate about the algorithm, in order to bring possible weaknesses to light.
- To guide the choices of users who do not have sufficient background for a full understanding of the algorithms used.
- To establish a framework for the approval of enciphering machines from different manufacturers, certifying that the algorithms work correctly and honestly.
- To permit all firms to enter the market on the same footing as regards algorithm security, thus competing in terms of costs, reliability, maintenance, etc.
It is observed that some of these requirements are contradictory, and clearly not appropriate for defining security in certain situations, while they are mandatory in other situations. Furthermore, in the mythology of security and of security managers, DES algorithms in software form were not submitted for any type of approval.
The adopted algorithm was proposed by IBM and is described in the FIPS 46 document of 15 January 1977. It is based on the Feistel cipher, an encryption scheme built on an idea of Horst Feistel, a German-born researcher at IBM.
DES represents a skilled implementation of the traditional techniques consisting of permutations and substitutions: it is specified by a function that transforms a block M of 64 bits, the text, and a block K of 56 bits, the key, into a block C of 64 bits, the cipher text. Formally, it is a map from $\mathbb{F}_2^{64} \times \mathbb{F}_2^{56}$ into $\mathbb{F}_2^{64}$:
$$C = \mathrm{DES}(M, K) .$$
It is remarked that, since any block cipher performs an operation typical of encoding, it could in principle be realized through a look-up table, which could be used in cryptanalytic attacks. However, the table is usually prohibitively large: in the DES case, the table should have $2^{56}$ columns, each with $2^{64}$ rows. This dimension of the table prevents any naive exhaustive attack although, in Chapter 10, it will be seen to what extent such an attack is feasible.

In its typical application, DES acts as a block cipher, where each block of 64 bits of plain text is always encrypted using the same secret key of 56 bits.
In the 1980s, two conferences led by NBS discussed the mathematical aspects of DES and the possibility of attacks based on the newly emerging technologies. The relevant conclusions were:
- DES could be accepted for civil applications, in view of the original design and targets;
- the limited length of the blocks, and in particular of the key, was sufficient at that time to prevent exhaustive attacks. However, technological evolution and increasing computational power would induce changes to DES within a decade (a prediction that was verified in about 2000, with the standardization of the Rijndael algorithm).
The DES algorithm transforms a sequence of binary symbols into a binary sequence of the same length. In this context, the standard byte representation of the information is immediately interpreted as blocks of 8 bits. DES performs the following transformation:
A block of 64 bits is changed, by an almost involutory transformation (in the sense that the inverse transformation requires the inversion of the round order) specified by the key of 56 bits, into a block of 64 bits. Since the map is invertible, every transformation identified by the key of 56 bits induces a permutation (which is a product of disjoint transpositions) on the set of $2^{64}$ blocks of 64 bits.
The set of these permutations generates a subgroup $\mathcal{K}$ of the symmetric group $S_n$ of permutations over n objects, with $n = 2^{64}$. Comments on the order of $\mathcal{K}$, which is not less than $2^{56}$, are deferred to Chapter 10.

4.5.1 DES transformations

A DES transformation on a vector of 64 bits is defined considering this vector as the juxtaposition of two vectors of 32 bits each:
$$X = (L \mid R) = (x_1, x_2, \ldots, x_{32} \mid x_{33}, x_{34}, \ldots, x_{64}) .$$
Let k be the key; the image Y = DES(k, X) is obtained by performing a sequence of eighteen transformations, where the first P and the last $P^{-1}$ are fixed permutations of the components of X, and the remaining sixteen transformations are of the same sort, differentiated by local keys computed from k according to a plan detailed below:
$$Y = \mathrm{DES}(k, X) = P \circ \sigma T(K_{16}) \circ \sigma T(K_{15}) \circ \cdots \circ \sigma T(K_2) \circ \sigma T(K_1) \circ P^{-1}[X] .$$
The inverse transformation for deciphering is described by a similar sequence
$$X = \mathrm{DES}(k, Y) = P \circ \sigma T(K_1) \circ \sigma T(K_2) \circ \cdots \circ \sigma T(K_{15}) \circ \sigma T(K_{16}) \circ P^{-1}[Y] ,$$
where the only change concerns the local keys, which are used in reverse order.
The combination $\sigma T(K_i)$ is called a round, and a round is shown in Figure 4.11.
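The inversion property of the round sequence can be illustrated with a small Python sketch of the generic Feistel principle; the round function F below is an arbitrary placeholder, not the actual DES f, and the permutations P, $P^{-1}$ are omitted.

    # Sketch of the Feistel principle underlying DES (not the full standard:
    # F is a hypothetical round function; in DES it is Q(S(E(R) + K))).

    def F(right: int, subkey: int) -> int:
        return ((right * 2654435761) ^ subkey) & 0xFFFFFFFF

    def feistel(block64: int, subkeys) -> int:
        L, R = block64 >> 32, block64 & 0xFFFFFFFF
        for k in subkeys:
            # T(K): L' = L + F(K, R), R' = R; then sigma swaps the halves.
            L, R = R, L ^ F(R, k)
        L, R = R, L                      # undo the final swap, as in DES
        return (L << 32) | R

    subkeys = [0xA5A5A5A5, 0x3C3C3C3C, 0x0F0F0F0F, 0x12345678]
    x = 0x0123456789ABCDEF
    y = feistel(x, subkeys)
    # Deciphering: the same machine with the local keys in reverse order.
    assert feistel(y, subkeys[::-1]) == x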
[Figure 4.11: The DES round: $L_{i+1} = R_i$ and $R_{i+1} = L_i + Q(S(E(R_i) + K_{i+1}))$.]


Initial permutation P (position 1-64 of the output, with the selected input bit below):

 1-16:  58 50 42 34 26 18 10  2 60 52 44 36 28 20 12  4
17-32:  62 54 46 38 30 22 14  6 64 56 48 40 32 24 16  8
33-48:  57 49 41 33 25 17  9  1 59 51 43 35 27 19 11  3
49-64:  61 53 45 37 29 21 13  5 63 55 47 39 31 23 15  7

T and $\sigma$ transformations. Setting
$$(L_0, R_0) = P(X) = (x_{58}, x_{50}, \ldots, x_8 \mid x_{57}, x_{49}, \ldots, x_7) ,$$
the $\sigma$ transformation is defined as
$$\sigma[(L_i, R_i)] = (R_i, L_i) ,$$
while $T(K_j)$ is defined as
$$L_j = L_{j-1} + f(K_j, R_{j-1}) , \qquad R_j = R_{j-1} .$$

The function $f(\cdot,\cdot)$ is a combination of three more elementary functions:
$$f(K_j, R_{j-1}) = Q(S(E(R_{j-1}) + K_j)) ,$$
where

- $E(\cdot)$ is a function that expands the 32 bits of $R_{j-1}$ to 48 bits, which are summed (modulo 2) with the 48 bits of the local key $K_j$ produced from k following a procedure described below.
- $S(\cdot)$ is a function that compresses the word $E(R_{j-1}) + K_j$ of 48 bits to a word of 32 bits. The function $S(\cdot)$ consists of 8 S-boxes working in parallel; it has been the most criticized function of DES.
- $Q(\cdot)$ is a fixed permutation of the 32 bits, defined below.
From the vector
$$U = (x_1, x_2, \ldots, x_{32}) ,$$
the expansion function E generates the vector
$$E(R_j) = (x_{32}, x_1, x_2, x_3, x_4, x_5,\; x_4, x_5, x_6, x_7, x_8, x_9,\; x_8, x_9, x_{10}, x_{11}, x_{12}, x_{13},\; x_{12}, x_{13}, x_{14}, x_{15}, x_{16}, x_{17},$$
$$x_{16}, x_{17}, x_{18}, x_{19}, x_{20}, x_{21},\; x_{20}, x_{21}, x_{22}, x_{23}, x_{24}, x_{25},\; x_{24}, x_{25}, x_{26}, x_{27}, x_{28}, x_{29},\; x_{28}, x_{29}, x_{30}, x_{31}, x_{32}, x_1) .$$
The permutation Q is defined by the following table (output position 1-32, with the selected input bit below):

 1-8:   16  7 20 21 29 12 28 17
 9-16:   1 15 23 26  5 18 31 10
17-24:   2  8 24 14 32 27  3  9
25-32:  19 13 30  6 22 11  4 25

The 8 S-boxes perform a compression from 48 bits to 32 bits. The 48 bits are partitioned into 8 blocks of 6 bits each. The two outer bits of each block (the first and the last) are used as a row address, and the middle four bits as a column address, of the corresponding 4 x 16 S-box matrix, Tables D.1 and D.2, whose elements are integer numbers in the range [0, 15], that is, numbers that correspond to 4 bits. The output consists of 8 blocks of 4 bits, for a total of 32 bits.
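The addressing rule can be made concrete in a few lines of Python; S1 below is the first S-box of Table 4.1, and the worked input is the classical example 011011.

    # Row/column addressing of a DES S-box: 6 input bits -> 4 output bits.
    S1 = [
        [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
        [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
        [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
        [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
    ]

    def sbox_lookup(six_bits: int) -> int:
        assert 0 <= six_bits < 64
        row = ((six_bits >> 5) << 1) | (six_bits & 1)   # outer bits b1, b6
        col = (six_bits >> 1) & 0xF                     # middle bits b2..b5
        return S1[row][col]

    # Example: input 011011 -> row 01 = 1, column 1101 = 13 -> output 5.
    assert sbox_lookup(0b011011) == 5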

4.5.2 Local key generation

The secret key of 56 bits is used to generate the 16 local keys, each of 48 bits, according to the following scheme:
- The key k of 56 bits is expanded to 64 bits by adding parity bits (or possibly dummy bits) in positions 8, 16, . . . , 64.
- The 64 bits of the expanded key are permuted by PC1 and, at the same time, the parity-check bits are discarded, obtaining a block of 56 bits which is partitioned into two blocks of 28 bits, $C_0$ and $D_0$.
- The blocks $C_j$ and $D_j$ are transformed by cyclic shifts according to the sequence

Round: 1 1 2 2 2 2 2 2 1 2 2 2 2 2 2 1 (shifts for rounds 1, 2, . . . , 16)

Table 4.1: The eight DES S-boxes S1-S8 (rows 0-3, columns 0-15)

S1:
14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

S2:
15  1  8 14  6 11  3  4  9  7  2 13 12  0  5 10
 3 13  4  7 15  2  8 14 12  0  1 10  6  9 11  5
 0 14  7 11 10  4 13  1  5  8 12  6  9  3  2 15
13  8 10  1  3 15  4  2 11  6  7 12  0  5 14  9

S3:
10  0  9 14  6  3 15  5  1 13 12  7 11  4  2  8
13  7  0  9  3  4  6 10  2  8  5 14 12 11 15  1
13  6  4  9  8 15  3  0 11  1  2 12  5 10 14  7
 1 10 13  0  6  9  8  7  4 15 14  3 11  5  2 12

S4:
 7 13 14  3  0  6  9 10  1  2  8  5 11 12  4 15
13  8 11  5  6 15  0  3  4  7  2 12  1 10 14  9
10  6  9  0 12 11  7 13 15  1  3 14  5  2  8  4
 3 15  0  6 10  1 13  8  9  4  5 11 12  7  2 14

S5:
 2 12  4  1  7 10 11  6  8  5  3 15 13  0 14  9
14 11  2 12  4  7 13  1  5  0 15 10  3  9  8  6
 4  2  1 11 10 13  7  8 15  9 12  5  6  3  0 14
11  8 12  7  1 14  2 13  6 15  0  9 10  4  5  3

S6:
12  1 10 15  9  2  6  8  0 13  3  4 14  7  5 11
10 15  4  2  7 12  9  5  6  1 13 14  0 11  3  8
 9 14 15  5  2  8 12  3  7  0  4 10  1 13 11  6
 4  3  2 12  9  5 15 10 11 14  1  7  6  0  8 13

S7:
 4 11  2 14 15  0  8 13  3 12  9  7  5 10  6  1
13  0 11  7  4  9  1 10 14  3  5 12  2 15  8  6
 1  4 11 13 12  3  7 14 10 15  6  8  0  5  9  2
 6 11 13  8  1  4 10  7  9  5  0 15 14  2  3 12

S8:
13  2  8  4  6 15 11  1 10  9  3 14  5  0 12  7
 1 15 13  8 10  3  7  4 12  5  6 11  0 14  9  2
 7 11  4  1  9 12 14  2  0  6 10 13 15  3  5  8
 2  1 14  7  4 10  8 13 15 12  9  0  3  5  6 11

From the two blocks considered jointly, $(C_j \mid D_j)$, as a vector of 56 bits, a local key $K_j$ of 48 bits is extracted by means of the table (PC2)

14 17 11 24  1  5  3 28 15  6 21 10 23 19 12  4
26  8 16  7 27 20 13  2 41 52 31 37 47 55 30 40
51 45 33 48 44 49 39 56 34 53 46 42 50 36 29 32
The permutation PC1 is defined by the following table

57 49 41 33 25 17  9  1 58 50 42 34 26 18 10  2
59 51 43 35 27 19 11  3 60 52 44 36 63 55 47 39
31 23 15  7 62 54 46 38 30 22 14  6 61 53 45 37
29 21 13  5 28 20 12  4
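A minimal Python sketch of the whole local-key generation, assuming the key is given directly as a 64-bit integer already containing the parity bits, and using the PC1, shift, and PC2 tables above:

    # Sketch of the DES local-key generation (key schedule).
    PC1 = [57,49,41,33,25,17,9,1,58,50,42,34,26,18,10,2,
           59,51,43,35,27,19,11,3,60,52,44,36,63,55,47,39,
           31,23,15,7,62,54,46,38,30,22,14,6,61,53,45,37,
           29,21,13,5,28,20,12,4]
    PC2 = [14,17,11,24,1,5,3,28,15,6,21,10,23,19,12,4,
           26,8,16,7,27,20,13,2,41,52,31,37,47,55,30,40,
           51,45,33,48,44,49,39,56,34,53,46,42,50,36,29,32]
    SHIFTS = [1,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1]

    def bits_of(x: int, n: int):
        return [(x >> (n - 1 - i)) & 1 for i in range(n)]  # bits b1 ... bn

    def local_keys(key64: int):
        b = bits_of(key64, 64)
        cd = [b[p - 1] for p in PC1]                # 56 bits, parity discarded
        C, D = cd[:28], cd[28:]
        keys = []
        for s in SHIFTS:
            C, D = C[s:] + C[:s], D[s:] + D[:s]     # cyclic left shifts
            keys.append([(C + D)[p - 1] for p in PC2])  # 48-bit local key
        return keys

    ks = local_keys(0x133457799BBCDFF1)
    assert len(ks) == 16 and all(len(k) == 48 for k in ks)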

4.6 AES

AES is the acronym for Advanced Encryption Standard, a block cipher algorithm certified by NIST, the American National Institute of Standards and Technology. The algorithm at the basis of AES is a specialization of the Rijndael algorithm, developed by the Belgian researchers Vincent Rijmen and Joan Daemen in answer to an international competition, launched by NIST in the mid-1990s, with the aim of selecting a valid substitute for the glorious DES, by then outdated, after twenty years of service, due to advances in electronics and computing.
A major limitation of DES was the small and fixed key length of 56 bits, which was vulnerable to exhaustive attacks that could be set up with the ever-increasing power of computers. The Rijndael algorithm was conceived to perform block enciphering with different block lengths, to be chosen from among 128, 192, and 256 bits, and with key lengths chosen independently from among 128, 160, 192, 224, and 256 bits.
The AES standard, like DES, performs block ciphering; further, like DES but with greater flexibility, it also enables other cryptographic functions to be performed, such as the generation of random numbers, the generation of streams, either synchronous or self-synchronizing, HASH functions, and operations at the MAC level.
According to the authors, the design criteria followed in the Rijndael project were:
- strength against any kind of known attack;
- fast and compact code (programs) over a large class of platforms;
- ease of design and implementation.
AES, like DES, performs a sequence of rounds, that is, a sequence of transformations (like Enigma) over a block of bits. Each round consists of four distinct transformations, called layers, that perform a passage from the initial state to the next state.

[Figure 4.12: Round box AES: the state $X = St(i)$ undergoes Subbyte ($X_{ij} \mapsto AX_{ij} + a$), ShiftRows ($X_{ij} \mapsto X_{i,j+i}$), Mixcolumns (each column $\sum_i X_{ij} x^i$ multiplied by $a(x)$ modulo $x^4 + 1$), and AddRoundKey$(X, w)$, producing $St(i+1)$.]


The linear mixing layer aims to achieve ample diffusion of the encrypted bits among successive rounds.
The non-linear layer is a parallel application of S-boxes having optimal non-linearity properties.
The key addition layer is an XOR operation between the round key and an intermediate state.
The binary information in most transformations is considered to be encoded by bytes, which consist of 8 bits written as $\{b_7, b_6, b_5, b_4, b_3, b_2, b_1, b_0\}$. Each byte is interpreted as an element of $GF(2^8)$ via the polynomial representation
$$b_7 x^7 + b_6 x^6 + b_5 x^5 + b_4 x^4 + b_3 x^3 + b_2 x^2 + b_1 x + b_0 = \sum_{i=0}^{7} b_i x^i .$$

Another representation of bytes uses hexadecimal notation for blocks of 4 bits, which are considered base-2 representations of the decimal integers from 0 to 15, represented by the corresponding digits from 0 to 9 and by the letters from a to f for the numbers from 10 to 15. For example, a block of four bits 1100 is interpreted as the decimal number 12, and written as the letter c; two sub-blocks of 4 bits are represented by a pair of hexadecimal symbols, that is, 01101011 is written as 6b.
Inputs and outputs are blocks of 128 bits.
The cipher key k may be of length 128, 192, or 256 bits.

Arrays of bytes: input, output, and key bits for processing are considered as sequences of 16 bytes (or longer, as required by the length of the key)
$$a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8, a_9, a_{10}, a_{11}, a_{12}, a_{13}, a_{14}, a_{15} .$$
The input bits $input_i$ are grouped eight at a time as follows:
$$a_0 = \{input_0, input_1, input_2, input_3, input_4, input_5, input_6, input_7\} .$$
The bits in each byte are then rewritten in reverse order, as shown in the third row of the following table, where the second row shows the numbering of the bytes:

Input bits:     0 1 2 3 4 5 6 7 | 8 9 10 11 12 13 14 15 | ...
Byte number:    0               | 1                      | ...
Bits in a byte: 7 6 5 4 3 2 1 0 | 7 6 5 4 3 2 1 0        | ...

State matrix. St represents the internal state of the machine. This matrix contains the partially transformed block of bits, that is, the output of a round, which is the input of the next round. St is a $4 \times N_b$ array of bytes, where $N_b$ is equal to the input block length divided by 32 ($N_b$ is either 4, 6, or 8). Each byte in the state is denoted $S_{ij}$, with $0 \le i \le 3$ and $0 \le j \le N_b - 1$. The initial state is loaded with the bytes $input_m$, $0 \le m \le 15$, of the input block of 16 bytes (or 24, or 32 bytes), according to the rule
$$S_{ij} = input_m \qquad \text{where } i = m \bmod 4, \; j = \frac{m-i}{4} .$$
Number of rounds. $N_r$ is the number of rounds, which depends on the length $N_k$ of the key, as shown in the following table:

Nk : 128 192 256
Nr :  10  12  14

The transformations performed upon each byte at each round are of 4 types:
1) byte substitution, using tables (S-box);
2) row shifts of the state matrix St by different offsets;
3) data mixing, working by columns of the state matrix St;
4) combination of the round key with the current state St.

4.6.1 Round Transformations

This section contains a detailed description of the above transformations that characterize a round.

Subbyte operates a nonlinear transformation over a byte, considered as an element of $\mathbb{F}_{2^8}$ (the transformation is a simple inversion), followed by an affine transformation considering the image byte b as a vector of 8 bits, according to the rule $Subbyte(b) = Ab + a$, that is
$$\begin{pmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{pmatrix} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{pmatrix} +
\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} ,$$
where the matrix A and the vector a are those specified in the Standard.
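A sketch of the Subbyte computation in Python, under the stated description (inversion in $\mathbb{F}_{2^8}$, with the usual convention that 0 maps to 0, followed by the affine map with constant {63}); the two asserted values are test cases from the FIPS-197 standard.

    AES_MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES field polynomial

    def gf_mul(u: int, v: int) -> int:
        # Carry-less multiplication modulo the AES polynomial.
        r = 0
        while v:
            if v & 1:
                r ^= u
            u <<= 1
            if u & 0x100:
                u ^= AES_MOD
            v >>= 1
        return r

    def gf_inv(u: int) -> int:
        r = 1
        for _ in range(254):      # u^254 = u^(-1) in F_2^8; inv(0) = 0
            r = gf_mul(r, u)
        return r

    def subbyte(b: int) -> int:
        x = gf_inv(b)
        y = 0
        for i in range(8):        # affine map: y_i = x_i+x_{i+4}+x_{i+5}+x_{i+6}+x_{i+7}
            bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
                   ^ (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8))) & 1
            y |= bit << i
        return y ^ 0x63           # plus the constant vector a = {63}

    # Test values from FIPS-197: Subbyte(00) = 63 and Subbyte(53) = ed.
    assert subbyte(0x00) == 0x63 and subbyte(0x53) == 0xED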
The Shiftrow transformation operates in the following way on the rows of St:
- the first row is not changed;
- each following row is cyclically shifted by a value that depends on its index,
$$S_{ij} \rightarrow S_{i, j+i} ,$$
where the column index is taken modulo 4.
Mixcolumn is a transformation operating independently on each column of St. Each column is written as a polynomial $c_j(x)$ of degree 3 with coefficients in $\mathbb{F}_{2^8}$; then $c_j(x)$ is multiplied, modulo the polynomial $x^4 + 1$, by a fixed polynomial $a(x)$ of degree 3 over the same field, relatively prime with $x^4 + 1$, namely
$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\} .$$
AddRoundKey is the operation of adding a round key, working by columns, to the matrix St, according to the rule
$$s'_c = s_c + w_{4r+c} ,$$
where r denotes the round number.
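The Mixcolumn layer acting on a single column can be sketched as follows; xtime is multiplication by {02} in $\mathbb{F}_{2^8}$, and the asserted column is a test vector from FIPS-197.

    # Sketch of Mixcolumn on one column, using a(x) = {03}x^3+{01}x^2+{01}x+{02};
    # multiplication modulo x^4 + 1 makes each output byte a circular
    # combination of the four input bytes.

    def xtime(b: int) -> int:
        # Multiplication by {02} in F_2^8, modulo x^8 + x^4 + x^3 + x + 1.
        b <<= 1
        return b ^ 0x11B if b & 0x100 else b

    def mix_column(col):
        s0, s1, s2, s3 = col
        return [
            xtime(s0) ^ (xtime(s1) ^ s1) ^ s2 ^ s3,   # {02}s0+{03}s1+s2+s3
            s0 ^ xtime(s1) ^ (xtime(s2) ^ s2) ^ s3,
            s0 ^ s1 ^ xtime(s2) ^ (xtime(s3) ^ s3),
            (xtime(s0) ^ s0) ^ s1 ^ s2 ^ xtime(s3),
        ]

    # Test vector from FIPS-197: [db, 13, 53, 45] -> [8e, 4d, a1, bc].
    assert mix_column([0xDB, 0x13, 0x53, 0x45]) == [0x8E, 0x4D, 0xA1, 0xBC]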

4.6.2 Local Key generation

The cipher key K is expanded to generate the local keys that are used in each round. The key expansion generates a total of $4(N_r + 1)$ words: the algorithm needs an initial set of 4 words, and each of the $N_r$ rounds requires 4 words of key data. The resulting key system consists of a sequence of words $[w_i]$ of 4 bytes each, with $0 \le i < 4(N_r + 1)$. Initially, $[w_i]$ is loaded with the key columns as
$$[w_i] = (key[4i],\, key[4i+1],\, key[4i+2],\, key[4i+3]) ;$$
then the subsequent words are generated by the recursion below, which uses the update
$$w_i \leftarrow Subword(Rotword(w_{i-1})) + Rcon[i/4] ,$$
where the function Subword applies the same byte substitution used in the rounds.
In the key expansion, the round key material is a sequence of words $w_i$ of 4 bytes, generated from the key K (assumed of 128 bits) in the following way:
- $w_0$, $w_1$, $w_2$, and $w_3$ are taken directly from the key K;
- the following words are recursively generated as
$$w_i = v_i + w_{i-4} ,$$
where the sum is done bit-by-bit modulo 2 (XOR), and
$$v_i = \begin{cases} w_{i-1} & \text{if } 4 \nmid i \\ Subword(Rotword(w_{i-1})) + R[i] & \text{if } 4 \mid i \end{cases}$$
with the word R[i] defined as $R[i] = (\alpha^{i/4 - 1}, 0, 0, 0)$, where $\alpha = \{02\} \in \mathbb{F}_{2^8}$;
- Rotword transforms a word $[a_0, a_1, a_2, a_3]$ of 4 bytes into the word with the entries cyclically permuted, $[a_1, a_2, a_3, a_0]$;
- Subword performs the Subbyte transformation, defined above, upon each byte of the word.
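The complete recursion for a 128-bit key fits in a short Python sketch; subbyte is recomputed here from inversion and the affine map so that the block is self-contained, and the final assertion checks the word $w_4$ against the key-expansion example of FIPS-197.

    # Sketch of the AES-128 key expansion recursion above.
    def gf_mul(u, v, mod=0x11B):
        r = 0
        while v:
            if v & 1:
                r ^= u
            u <<= 1
            if u & 0x100:
                u ^= mod
            v >>= 1
        return r

    def subbyte(b):
        x, r = b, 1
        for _ in range(254):        # b^254 = b^(-1), with inv(0) = 0
            r = gf_mul(r, x)
        y = 0
        for i in range(8):
            bit = ((r >> i) ^ (r >> ((i+4) % 8)) ^ (r >> ((i+5) % 8))
                   ^ (r >> ((i+6) % 8)) ^ (r >> ((i+7) % 8))) & 1
            y |= bit << i
        return y ^ 0x63

    def expand_key(key: bytes):
        assert len(key) == 16
        w = [list(key[4*i:4*i+4]) for i in range(4)]    # w0..w3 from K
        rcon = 1                                        # alpha^(i/4 - 1)
        for i in range(4, 44):                          # 4(Nr + 1) = 44 words
            v = list(w[i-1])
            if i % 4 == 0:
                v = v[1:] + v[:1]                       # Rotword
                v = [subbyte(b) for b in v]             # Subword
                v[0] ^= rcon                            # add R[i]
                rcon = gf_mul(rcon, 0x02)
            w.append([a ^ b for a, b in zip(w[i-4], v)])
        return w

    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    w = expand_key(key)
    assert bytes(w[4]).hex() == "a0fafe17"   # FIPS-197 test vector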


Chapter 5
Secret-Key Cryptography - Act II
Stream ciphers
If one has really technically penetrated a subject, things that previously seemed in complete contrast, might be purely mathematical
transformations of each other.
John von Neumann

5.1 Introduction

Symbol-by-symbol encryption is attractive because of the simple and fast enciphering rule, but it shifts the security problems from the combining function to the stream generation. The most common stream encryption rule, that is, XORing the binary information stream with a truly random binary stream, was introduced by Gilbert Vernam in 1919, along with the notion of one-time tape (really, this last concept is due to Joseph Mauborgne). However, in practical situations truly random streams are not available, or are managed with great difficulty: they are difficult to generate, difficult to exchange on secure channels, and difficult to destroy after use. Since random stream generation has assumed a prominent role, a large number of structures have been proposed for producing sequences that look random, including the use of block ciphers in cipher-feedback mode (CFB). However, in the following only structures based on LFSRs will be dealt with, because of their flexibility and speed. The general structure and features of stream cipher generators are described as special finite state machines. Due to its key role, the mathematics of LFSRs is recalled at some length. Typically, the encryption systems have rate 1, that is, the length of the encrypted message is equal to the length of the plain message. However, a section of this chapter will be devoted to describing an interesting stream enciphering scheme with rate less than 1.

[Figure 5.1: Stream cipher generator: a periodic FSM feeds an output function that produces the key stream $K_i$; the message symbol $M_i$ is enciphered as $E_i = f(M_i, K_i)$.]

5.1.1 The structure

In this section, a structure of stream ciphers is described that combines efficiency, simplicity, and security in a satisfactory way. In Chapter 3, the features of a truly random sequence were described; they are recalled as follows:
1. The sequence should look random.
2. The sequence should pass a battery of statistical tests, traditional in statistics (with no condition depending on the application).
3. The sequence should have a linear complexity profile of average slope 1/2.
Artificially generated sequences are subject to some further constraints:
1. The sequence should be as long as possible, to prevent a store-and-replay attack.
2. The sequence should look random.
3. The sequence should pass a battery of statistical tests, traditional in statistics, with particular attention to the application in mind.
4. The sequence should have a linear complexity profile of average slope 1/2.

5.1.2 Finite State Machines

Linear feedback shift registers are particular Finite State Machines (FSM) whose evolution is governed by linear functions.
Definition 5.1. An autonomous Finite State Machine (FSM) is an object described by a quintuple of parameters
$$\{S, A, S_o, \mathrm{stat}, \mathrm{out}\}$$
where:
1. S is a finite set of states that characterize the machine; it is usually a set of m-dimensional binary vectors u, which are called states;
2. A is a finite set of symbols, the output symbols;
3. $S_o \in S$ is a fixed state, the initial state;
4. stat is a mapping from S into S, the next-state function; it describes the transition from one state to another during the machine's evolution;
5. out is a mapping from S into A, which specifies the output symbol when the machine is in a given state; it is the output function.
The evolution of an FSM is described by a sequence of states
$$S : u(0), u(1), \ldots, u(n), \ldots ,$$
where $u(n) = \mathrm{stat}[u(n-1)]$, and the output sequence $X : x(0), x(1), \ldots, x(n), \ldots$ is generated as $x(n) = \mathrm{out}(u(n))$.
In autonomous evolution, the sequence S is periodic, with period the minimum integer T such that $u(n+T) = u(n)$ for every n, and the output sequence X has a period N, which is possibly a submultiple of T. If N = T, the function out is said to be non-degenerate.

5.2 Output functions - Boolean functions

Output functions are indispensable to the strength of stream generators against plain-text attacks. Output functions may be realized by many different kinds of mappings, defined in very different algebraic domains. Speed requirements are of particular importance, thus a large class of output functions is realized using Boolean functions, due to their fast computation.
Boolean functions owe their name to their first appearance in Boole's logic. The common logical operations AND, OR, NOT, and exclusive-OR (XOR) are associated to operations with binary variables that indicate the true or false value of a proposition, thus establishing a Boolean algebra. Let Prop.1 stand for Proposition 1, and let X denote its logical value (true or false); then we have

                     Boole logic                  Boolean algebra over F2
Prop.1 AND Prop.2    X AND Y                      x y
Prop.1 OR Prop.2     X OR Y                       x + y + x y
NOT Prop.1           NOT X                        1 + x
Prop.1 XOR Prop.2    (X AND NOT Y) OR (NOT X AND Y)   x + y

Definition 5.2. A Boolean function is a function of Boolean variables that takes binary values in a Boolean algebra.
Equivalently, a Boolean function is defined as a function of binary variables taking values in the Galois field F2.
Given m binary variables $v_1, v_2, \ldots, v_m$, a product of s distinct variables is a monomial of degree s:
$$v_{j_1} v_{j_2} \cdots v_{j_s} , \qquad j_i \in \{1, 2, \ldots, m\} .$$
Binary variables are considered only at first degree (that is, they satisfy the formal identity $x^2 = x$); the number of distinct monomials of degree s is $\binom{m}{s}$.
Boolean functions admit normal forms that are useful for their classification. Two normal forms, called disjunctive and conjunctive normal forms, can be described in Boolean algebra; a third normal form, called the algebraic normal form, is conveniently described in terms of binary variables over F2.
Theorem 5.1 (Disjunctive normal form). Every Boolean function of m variables can be expressed as a union of monomials of the Boolean variables and their negations.
Theorem 5.2 (Algebraic normal form). Every Boolean function of m variables can be expressed as a sum of monomials of binary variables. The number of such functions is $2^{2^m}$.
Proof. Let $f(v_1, \ldots, v_m)$ be a Boolean function of m variables. The theorem is a consequence of the following recursive relation, which is self-explanatory:
$$f(v_1, \ldots, v_m) = v_m f(v_1, \ldots, 1) + (1 + v_m) f(v_1, \ldots, 0) = v_m [f(v_1, \ldots, 1) + f(v_1, \ldots, 0)] + f(v_1, \ldots, 0) .$$
Theorem 5.3. Let $g(v_1, \ldots, v_{m-1})$ be a Boolean function of m - 1 variables; then
$$f(v_1, \ldots, v_m) = v_m + g(v_1, \ldots, v_{m-1})$$
is a Boolean function of m variables that assumes the values 0 and 1 equally often, i.e. $2^{m-1}$ times each.

Any Boolean function of m variables is completely described by a binary vector of dimension $2^m$.
Example. Let $f(v_1, v_2, v_3) = v_1 + v_2 v_3$ be a Boolean function of three binary variables $v_1$, $v_2$, and $v_3$, and consider the integer
$$i = v_1 + 2 v_2 + 4 v_3 ;$$
then, setting $f_i = f(v_1, v_2, v_3)$, the following table can be constructed:

(v1, v2, v3)   i   fi
 0, 0, 0       0   0
 1, 0, 0       1   1
 0, 1, 0       2   0
 1, 1, 0       3   1
 0, 0, 1       4   0
 1, 0, 1       5   1
 0, 1, 1       6   1
 1, 1, 1       7   0

and the function $f(v_1, v_2, v_3)$ is represented by an eight-dimensional binary vector f whose entries $f_i$ are numbered from 0 to 7:
$$f(v_1, v_2, v_3) \leftrightarrow (0, 1, 0, 1, 0, 1, 1, 0) .$$
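The table of the example can be reproduced mechanically; a minimal Python check:

    # Tabulating the Boolean function f(v1, v2, v3) = v1 + v2*v3 over F_2
    # and recovering the 8-entry vector of the example above.

    def f(v1: int, v2: int, v3: int) -> int:
        return (v1 + v2 * v3) % 2          # arithmetic in F_2

    table = []
    for i in range(8):
        v1, v2, v3 = i & 1, (i >> 1) & 1, (i >> 2) & 1   # i = v1 + 2 v2 + 4 v3
        table.append(f(v1, v2, v3))

    assert table == [0, 1, 0, 1, 0, 1, 1, 0]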

5.3 Periodic generators and LFSRs

The periodic component of a stream cipher may be realized as an autonomous Finite State Machine (FSM), which has a periodic evolution of period less than or equal to the number of states. Frequent constituents of such FSMs are Linear Feedback Shift Registers (LFSRs), whose main features will be described in this section.
In general, binary sequences for cryptography are required to have:
1. long period;
2. good randomness properties;
3. strong inversion resistance, as far as their generation is concerned;
4. a good linear complexity profile (LCP), a notion that will be explained below in regard to LFSRs.
Randomness is defined in Lehmer's sense [47] and is commonly evaluated using a battery of randomness tests based on a Monte Carlo principle, as described in Chapter 3.

Inversion resistance is mainly deferred to properties of the output functions. However, it may also be reinforced by LFSR sequences properly modified, as we will see later in this chapter.
The structure of a binary LFSR is described by a matrix that also governs its evolution. Let k be the LFSR length, and let x(n) be a k-dimensional vector defining the LFSR state at step n. The evolution of a binary LFSR is described by a system of linear recurrent equations of the first order over F2 which, in matrix notation, is
$$x(n+1) = M x(n) , \qquad (5.1)$$

where M is a $k \times k$ binary matrix. If M is a singular matrix, some cells of the LFSR can be suppressed and its effective length is less than k; thus, in particular for cryptographic applications, these degenerate cases are not considered.
Any nonsingular binary matrix M characterizes the structure of an LFSR with generator polynomial p(z), the characteristic polynomial of M; however, for hardware/software implementation purposes, three special forms of matrices have mainly been considered: i) the companion matrix of a binary polynomial p(z), which gives a structure also known as the Galois LFSR; ii) the transpose of the companion matrix of a binary polynomial p(z), which gives a structure also known as the Fibonacci LFSR; iii) the tridiagonal matrix having the upper and the lower sub-diagonals filled with 1s, and characteristic polynomial p(z), which gives a structure also known as the Tridiagonal LFSR.
The following schemes show, for the same polynomial $p(z) = z^5 + z^2 + 1$ over F2, a sample of each of the three structures.

[Diagrams: Galois, Fibonacci, and Tridiagonal LFSR realizations of the polynomial $p(z) = z^5 + z^2 + 1$.]
Given a binary polynomial of degree k over F2,
$$p(z) = z^k + a_1 z^{k-1} + a_2 z^{k-2} + \cdots + a_{k-1} z + a_k ,$$
the following matrix is known as its companion matrix:
$$C = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & a_k \\
1 & 0 & 0 & \cdots & 0 & a_{k-1} \\
0 & 1 & 0 & \cdots & 0 & a_{k-2} \\
\vdots & & \ddots & & \vdots & \vdots \\
0 & \cdots & 0 & 1 & 0 & a_2 \\
0 & \cdots & 0 & 0 & 1 & a_1
\end{pmatrix}$$
and describes the structure of a Galois LFSR, while its transpose $C^T$ describes the structure of a Fibonacci LFSR. The structure of a tridiagonal LFSR is described by a binary tridiagonal matrix of the form
$$T = \begin{pmatrix}
d_0 & 1 & 0 & \cdots & 0 \\
1 & d_1 & 1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 1 & d_{k-2} & 1 \\
0 & 0 & \cdots & 1 & d_{k-1}
\end{pmatrix} ,$$
which is uniquely identified by the vector $d^t = (d_0, d_1, \ldots, d_{k-1})$. In Appendix III (i.e. Section 5.9) of this chapter it will be shown how to compute $d^t$ from the characteristic polynomial p(z).

5.3.1 The mathematics of LFSRs

A (binary) sequence $s : (a_0, a_1, \ldots, a_j, \ldots)$ may also be represented as a formal series, called its generating function,
$$S(z) = \sum_{j=0}^{\infty} a_j z^j ,$$
a representation that is more suitable for studying the algebraic structure of s.


As already said, a binary LFSR consists of a linear register and a next-state transition function. A linear register is a concatenation of k cells; the content, or state, of each cell at step n is denoted by a binary variable $x_i(n)$, so that the state of the linear register is represented by a k-dimensional vector X(n).

[Diagram: linear register of length k, with cells $x_1(n), x_2(n), \ldots, x_j(n), \ldots, x_k(n)$.]

The number k of cells is the length of the linear register. The next-state function of an LFSR is described by a $k \times k$ matrix M. The next state X(n+1) is computed as shown in equation (5.1).
Equation (5.1) is a vector recurrence of order 1, and it is equivalent to a linear recurrence of order k (as is detailed in an appendix, Section 5.8, of this chapter), which describes the state evolution of each single cell.
Since the number of states is finite, the evolution of any LFSR over a finite field is periodic, of period T(k). For cryptographic applications we are interested in LFSRs of maximum period; that is, in the binary case, the period should be $2^k - 1$. The period length depends on the matrix M and, subordinately, on the initial state.
An outline of the general theory is presented in Section 5.8 of this chapter. The evolution of LFSRs can be studied via linear recurrent equations of order equal to the LFSR length, or via linear systems of equations of order one, which are identified by matrices of dimension equal to the LFSR length. For definitions and properties, refer to Section 5.8; however, some introductory notions and definitions are in order here.
Definition 5.3. Let k and n be positive integers, and define:
- x(n), a column vector of k unknown functions of n;
- y(n), a column vector of k unknown functions of n;
- u(n), a column vector of k given functions of n;
- A, a $k \times k$ square non-singular matrix.
The linear equation
$$x(n) = A x(n-1) + u(n) \qquad (5.2)$$
defines a linear recurrent system of equations of order 1.


If u(n) is the null vector, the system is said to be a homogeneous linear system, and is written as
$$y(n) = A y(n-1) . \qquad (5.3)$$
It is straightforward to verify that the systems (5.2) and (5.3) are equivalent to linear recurrent equations of order k, complete or homogeneous, respectively. Conversely, any linear recurrent equation is always equivalent to a linear system.
The solutions of (5.2) and (5.3) can be obtained from the characteristic equation of the matrix A, that is, from its eigenvalues, or by means of a generating function technique.
Any matrix A satisfies its characteristic polynomial $p_A(z)$, that is, $p_A(A) = O$. The minimal polynomial $m_A(z)$ of a matrix A is the polynomial of lowest degree such that $m_A(A) = O$; it is a divisor of the characteristic polynomial $p_A(z)$. Two square matrices A and B are similar, possibly over some extension field of their coefficient field, if a nonsingular matrix S exists such that $B = S A S^{-1}$. Similar matrices have the same characteristic and minimal polynomials.
Example. Let us show the state evolution of three LFSRs with the same generator polynomial, different implementations, and the same initial state. Consider LFSRs of length 4 with generating polynomial $g(z) = z^4 + z + 1$ and initial state (0, 1, 1, 0). The three sequences of states, for the Fibonacci, Galois, and Tridiagonal LFSRs respectively, are

Fibonacci:   0110 0011 1001 0100 0010 0001 1000 1100 1110 1111 0111 1011 0101 1010 1101 0110
Galois:      0110 0011 1000 0100 0010 0001 1001 1101 1111 1110 0111 1010 0101 1011 1100 0110
Tridiagonal: 0110 1101 0100 1010 1011 1001 1110 0001 0010 0111 1111 0011 0101 1000 1100 0110

(each line lists the successive states $X(0), X(1), \ldots, X(15)$; the last state coincides with the initial one)

The period is maximal, that is, T(4) = 15, and thus does not depend on the initial state, obviously with the exclusion of the all-zero state.
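A minimal Python sketch of the Fibonacci implementation confirms the maximal period; the tap positions follow the recurrence associated with $g(z) = z^4 + z + 1$.

    # Fibonacci LFSR for g(z) = z^4 + z + 1, verifying the maximal period
    # T(4) = 2^4 - 1 = 15 of the state evolution.

    def fibonacci_step(state):
        # recurrence s(n+4) = s(n) + s(n+1), i.e. g(z) = z^4 + z + 1
        feedback = state[0] ^ state[1]
        return state[1:] + (feedback,)

    state = (0, 1, 1, 0)                  # initial state of the example
    start, period = state, 0
    while True:
        state = fibonacci_step(state)
        period += 1
        if state == start:
            break

    assert period == 15                   # all 15 nonzero states are visited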

5.4 Linear Codes and Binary sequences

The interplay between binary sequences and linear codes is of great importance in cryptography. In particular, a fruitful trick for obtaining good binary (nonlinear) sequences, suitable for cryptographic applications, is code puncturing of linear codes [65]: the operation of puncturing simply excludes (punctures) the entries of the code vector in the same positions of every code word. In general, puncturing a linear code (n, k, d) produces a shorter code (n', k, d') with the same dimension and a smaller minimum distance, i.e. n' < n and d' < d. However, if the length and minimum-distance reductions are not exceedingly large, the resulting code words/sequences are good for cryptography.
It is well known that a block [x(0), x(1), . . . , x(T-1)] of $T = 2^m - 1$ consecutive binary symbols in a sequence X produced by an LFSR with primitive generator polynomial g(x) of degree m is a code word of a dual Hamming code $(2^m - 1, m, 2^{m-1})$ [54]. A code word is generated, symbol by symbol, during the evolution of the LFSR, where $x(n) = x_o^T u(n)$ is a linear function of the LFSR state u(n), $x_o$ being a fixed vector.
Let $\alpha$ be a root of g(x); then $B = \{1, \alpha, \ldots, \alpha^{m-1}\}$ is a basis of $\mathbb{F}_{2^m}$.
Let $B^{\perp} = \{\beta_1, \beta_2, \ldots, \beta_m\}$ be the dual basis of B, that is, a basis that satisfies the conditions
$$\mathrm{Tr}(\beta_i\, \alpha^{j-1}) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
where $\mathrm{Tr}(\cdot)$ is the trace function mapping $\mathbb{F}_{2^m}$ onto $\mathbb{F}_2$, defined as
$$\mathrm{Tr}(\beta) = \beta + \beta^2 + \cdots + \beta^{2^{m-1}} .$$
The elements of X may be written as $x(n) = \mathrm{Tr}(x\, u(n))$, where
$$x = x_1 \beta_1 + \cdots + x_m \beta_m , \qquad u(n) = u_1(n) + u_2(n)\alpha + \cdots + u_m(n)\alpha^{m-1} .$$
An interesting generalization of this scheme with Hamming codes is obtained by considering other codes, for instance dual BCH codes $(2^m - 1, mt, d)$ or dual Goppa codes $(2^m, mt, d)$. In this case, the evolution of an LFSR is used to drive the choice of the element $\gamma$ of $\mathbb{F}_{2^m}$ which identifies a column $g(\gamma)$ of the generator matrix $G = (g(1), g(\alpha), \ldots, g(\alpha^{2^m - 2}))$, used to compute the output bit as the element $x_o^T g(\gamma)$ of a code word that corresponds to the information vector $x_o$. The cost of generating a bit in a code word of a BCH or a Goppa code is normally greater than the cost of generating a bit in a word of a Hamming code. Conversely, an undoubted advantage is the increased difficulty of de-engineering the generated sequence with the aim of finding the initial content u(0) of the LFSR from a piece of X.
As regards the linear complexity: in this situation it is quite easy to compute the state of the LFSR from 2m consecutive symbols of X [55], therefore the linear complexity is m. The choice of the code does not particularly increase the linear complexity of the generated sequence, a goal that can instead be achieved by puncturing the code, an operation that may be realized using clock-controlled LFSRs. The visible result of this expedient is a cancellation of elements from X, an operation that yields a sequence with shorter period but, in general, with a good linear complexity profile. Dual BCH and Goppa codes will be briefly described, since their use will be considered in some proposed stream generators.

5.4.1 BCH codes

The dual of a BCH $(2^m - 1, 2^m - mt - 1, d)$ code, with $d \ge 2t + 1$, is closely connected with a generalized Reed-Solomon code, and can be viewed as a trace code [54, p. 349]. Let $\alpha$ be a root of a primitive polynomial of degree m. Consider a generalized Reed-Solomon code of length $2^m - 1$ and dimension mt, consisting of $2^{mt}$ code words defined as follows [81]:
$$\big( f(1), f(\alpha), f(\alpha^2), \ldots, f(\alpha^{2^m - 2}) \big) ,$$
where f(z) is any polynomial in $\mathbb{F}_{2^m}[z]$ of degree at most mt - 1; then a dual BCH code C, with parameters $(2^m - 1, mt, d)$, has words
$$\big( \mathrm{Tr}(f(1)), \mathrm{Tr}(f(\alpha)), \mathrm{Tr}(f(\alpha^2)), \ldots, \mathrm{Tr}(f(\alpha^{2^m - 2})) \big) .$$
A code word of C can be viewed as a block of bits with length equal to the period of a binary sequence. The linear complexity (see Section 5.5 for formal definition and details), that is, the length of the shortest LFSR that generates any code word, is given by a theorem whose proof is based on the following two lemmas.
Lemma 5.1. Let $s = \{z_0, z_1, \ldots, z_n, \ldots\}$, with $z_n = \mathrm{Tr}(\beta\alpha^n)$, be a sequence generated by the trace function, where $\alpha, \beta \in \mathbb{F}_{2^m}$; then the linear complexity L(s) is m, that is, s is produced by a linear recurrence of order m.
Proof. Let $m(x) = a_m + a_{m-1}x + \ldots + a_1 x^{m-1} + x^m$ be a primitive minimal polynomial of $\alpha$. We have
$$z_n = \mathrm{Tr}(\beta\alpha^{n-m}\alpha^m) = \mathrm{Tr}(\beta\alpha^{n-m}(a_m + a_{m-1}\alpha + \ldots + a_1\alpha^{m-1})) = a_m \mathrm{Tr}(\beta\alpha^{n-m}) + a_{m-1}\mathrm{Tr}(\beta\alpha^{n-m+1}) + \ldots + a_1 \mathrm{Tr}(\beta\alpha^{n-1}) ,$$
which implies
$$z_n = a_m z_{n-m} + a_{m-1} z_{n-m+1} + \ldots + a_1 z_{n-1} .$$

Lemma 5.2. Let $s = \{z_n^{(1)} + z_n^{(2)} + \ldots + z_n^{(r)}\}_{n=0}^{\infty}$ be a sum of r linear recurring sequences whose generating polynomials $m^{(i)}(x)$, $i = 1, \ldots, r$, have the same degree m and are relatively prime. Then s is a linear recurring sequence of order mr.
Proof. The generating polynomial of s is the least common multiple of the generating polynomials $m^{(i)}(x)$, $i = 1, \ldots, r$; by hypothesis it is equal to their product, thus it has degree mr. It follows that s has order mr.
Note that the sum sequence is generated by a polynomial of degree mr, although its period is equal to the largest period of the addend sequences, i.e. it is not greater than $2^m - 1$.
Theorem 5.4. Let $\alpha$ be a primitive element of $\mathbb{F}_{2^m}$, and assume that $s = \{\mathrm{Tr}(f(\alpha^n))\}_{n=0}^{\infty}$ is a sequence generated by the trace function, where $f(z) = \sum_{j=0}^{mt-1} \gamma_j z^j$ is a polynomial of degree mt - 1 with every coefficient $\gamma_j \in \mathbb{F}_{2^m}$ different from zero. If $\alpha^{2j-1}$, for $j = 1, \ldots, t$, are roots of distinct irreducible polynomials of degree m, then s has linear complexity L(s) = mt, that is, s is produced by a linear recurrence of order mt.
Proof. By the hypotheses concerning $\gamma_j$ and $\alpha^{2j-1}$, which characterize BCH codes with designed minimum distance 2r + 1, we can write
$$z_n = \mathrm{Tr}\Big(\sum_{j=1}^{r} \gamma_j\, \alpha^{n(2j-1-2r)}\Big) = \sum_{j=1}^{r} \mathrm{Tr}(\gamma_j\, \alpha^{n(2j-1-2r)}) ;$$
therefore $z_n$ is a sum of r linear recurring sequences of order m with relatively prime generating polynomials. The conclusion immediately follows from Lemmas 5.1 and 5.2.

5.4.2 Goppa codes

Goppa codes $(2^m - 1, mt, d)$ are also closely connected to the generalized Reed-Solomon codes, and can be generated, like the BCH codes, using the trace function [54, Theorem 5, p. 341].
Let G(z) be an irreducible polynomial (the Goppa polynomial) with coefficients in the Galois field $\mathbb{F}_{2^m}$, and let $L = \{\gamma_1, \gamma_2, \ldots, \gamma_{2^m - 1}\} \subseteq \mathbb{F}_{2^m}$ be the ordered location set, with $G(\gamma) \neq 0$ for every $\gamma \in L$. Let $\alpha$ be a primitive element of $\mathbb{F}_{2^m}$; then a dual of a Goppa code has words
$$\left( \mathrm{Tr}\Big(\frac{f(1)}{G(1)}\Big), \mathrm{Tr}\Big(\frac{f(\alpha)}{G(\alpha)}\Big), \mathrm{Tr}\Big(\frac{f(\alpha^2)}{G(\alpha^2)}\Big), \ldots, \mathrm{Tr}\Big(\frac{f(\alpha^{2^m - 2})}{G(\alpha^{2^m - 2})}\Big) \right) .$$
A code word of C can be viewed as a period of a binary sequence. The linear complexity, given in a theorem below, is a direct consequence of the following lemma.
Lemma 5.3. Let G(z) be a polynomial of degree r over F2 that is irreducible over $\mathbb{F}_{2^m}$, and let $\alpha \in \mathbb{F}_{2^m}$ be a primitive element. The linear complexity L(g) of the sequence $g = \{\frac{1}{G(\alpha^n)}\}_{n=0}^{2^m - 2}$ is lower bounded by $\frac{2^m - 1}{m}$.
Proof. The linear complexity of g is given by the smallest M that allows us to write
$$\frac{1}{G(\alpha^n)} = \sum_{i=1}^{M} \alpha^{n k_i} , \qquad n = 0, \ldots, 2^m - 2 , \qquad (5.4)$$
for a suitable choice of the set $k_1, k_2, \ldots, k_M \in \mathbb{Z}$. Let us observe that
$$\frac{1}{G(\alpha^{n_1 2^j})} = \left(\frac{1}{G(\alpha^{n_1})}\right)^{2^j}$$
for every j less than m; therefore every set of m integers $n = n_1 2^j$, $j = 0, \ldots, m-1$, results in a single condition; furthermore, $\frac{1}{G(\alpha^0)} = 1$ implies that M is odd. Therefore equation (5.4) can be seen as a system of $\frac{2^m - 1}{m}$ syndrome equations that must correct $\frac{2^m - 1}{m}$ errors in positions $k_i$; thus $M \ge \frac{2^m - 1}{m}$.

Theorem 5.5. Let f(z) be a polynomial of degree r - 1 with every coefficient $\gamma_j \in \mathbb{F}_{2^m}$ not zero. The limited sequence $s = \{z_n = \mathrm{Tr}(f(\alpha^n)/G(\alpha^n))\}_{n=0}^{2^m - 2}$ generated by the trace function has a linear complexity L(s) that is lower bounded by $\frac{2^m - 1}{m}$.
Proof. Since the conditions of Lemma 5.3 hold, the linear complexity of s is not less than the linear complexity of g.

5.5 Nonlinear Feedback Shift Registers

In cryptography, two notions characterizing a stream are very important, namely its linear complexity profile and its inversion resistance.
Definition 5.4. The Linear Complexity Profile (LCP) of a binary stream of length L is defined as the sequence of lengths of the shortest linear feedback shift registers that generate the subsequence of length n (from the beginning), for every n from 1 up to L.
The inversion resistance of a binary sequence produced by a finite state machine is defined with respect to recovering the initial state: it is the difficulty, evaluated with some measure of complexity, of recovering an initial state from a known piece of the sequence.
As regards inversion resistance, a measure of complexity is needed because "easy to recover" or "hard to recover" are naive notions that should be made precise in the realm of computational complexity.
In this section, the consequences of using linear codes and clock-controlled LFSRs are investigated, and some general conclusions presented.

5.5.1 Clock-controlled LFSR

Let A and B be two $m \times m$ binary matrices. The simplest, although fairly general, form of the state transition function of a clock-controlled binary LFSR is
$$u(n) = [(1 - c(n))A + c(n)B]\, u(n-1) ,$$
where the binary sequence $c = c(0), c(1), \ldots, c(n), \ldots$ is called the control sequence. The sequence c may be generated by an LFSR, or may be extracted from the sequence u(n) itself; in this case the machine evolution is autonomous and the device is called self-clock-controlled. For example, if $c(n) = f(u(n-1))$, then the feedback shift register can be viewed as a nonlinear device.
The state vector of a clock-controlled LFSR can be expressed at step n as a function of the initial state:
$$u(n) = \prod_{i=1}^{n} [(1 - c(i))A + c(i)B]\, u(0) .$$
From now on we will assume that A and B are commuting binary matrices; therefore we have
$$u(n) = A^{C_0(n)} B^{C_1(n)} u(0) , \qquad (5.5)$$
where $C_0(n)$ and $C_1(n)$ denote the number of 0s and 1s in the sequence c up to step n.
The period of the sequence S is the minimum integer N such that u(N+n) = u(n) for every n. Therefore equation (5.5) implies
$$A^{C_0(N+n) - C_0(n)}\, B^{C_1(N+n) - C_1(n)} = I \qquad \text{and} \qquad C_0(N+n) - C_0(n) + C_1(N+n) - C_1(n) = N$$
for any n. In the following we will assume that the initial state u(0) belongs to a periodic cycle, so that we can take n = 0 and write $C_0 + C_1 = N = C_0(N) + C_1(N)$.

Primitive LFSR. Let c be generated by a primitive LFSR of length M. Let A be the companion matrix of a primitive polynomial m(x), and assume B to be a suitable power $A^{\ell_B}$ of A. A classical case is $B = A^2$, which is equivalent to a two-step advance of the LFSR when c(n) = 1.
Under these hypotheses, the period N is defined by the following equations:
$$N = C_0 + C_1 , \qquad N = K_1(2^M - 1) , \qquad C_0 = K_1(2^{M-1} - 1) , \qquad C_1 = K_1 2^{M-1} , \qquad C_0 + \ell_B C_1 = K(2^m - 1) ,$$
where K and $K_1$ are the smallest integers that satisfy the condition
$$K_1[(\ell_B + 1)2^{M-1} - 1] = K(2^m - 1) .$$
If $(\ell_B + 1)2^{M-1} - 1$ and $2^m - 1$ are relatively prime, then $K_1 = 2^m - 1$, and the period is $N = (2^M - 1)(2^m - 1)$.
The situation is completely different for the self-clock-controlled LFSR.

5.5.2 Self-Clock-controlled LFSR

Let c be produced by the relation $c(n) = f_1(u(n-1))$, with $f_1(\cdot)$ a function of the state vector u. Letting $B = A^{\ell_B}$, the equation
$$u(n) = [(1 - f_1(u(n-1)))A + f_1(u(n-1))B]\, u(n-1)$$
shows that a self-clock-controlled LFSR is equivalent to a non-linear feedback shift register, possibly with a shorter period.
Assuming again that the initial state belongs to a periodic cycle, under the above hypotheses the period N is defined by the following conditions:
$$N = C_0 + C_1 , \qquad C_0 + \ell_B C_1 = 2^m - 1 ,$$
where $C_0$ and $C_1$ depend on $f_1(\cdot)$. For instance, in the case $B = A^2$ and $f_1(u(n)) = u_i(n)$, the period is obtained as follows. The computation of $C_1$ is based on the distribution of runs of 1s (recall that a proper run of 1s of length k consists of k consecutive 1s between two 0s) in a sequence generated by a primitive LFSR of length m [34]:
- 1 run of length m;
- 0 runs of length m - 1;
- 1 run of length m - 2;
- $2^{m-k-2}$ runs of length k, for $1 \le k \le m - 3$.


We obtain
$$C_1 = \frac{2^m - 1 + 2\delta}{3} ,$$
where $\delta$ is the remainder of $2^m - 1$ divided by 3, namely $\delta = 0$ if m is even, and $\delta = 1$ if m is odd, and finally
$$C_0 = 2^m - 1 - 2C_1 = \frac{2^m - 1 - 4\delta}{3} , \qquad N = \frac{2}{3}(2^m - 1) - \frac{2}{3}\delta .$$
It is interesting to observe that, when m is even, the control sequence c is perfectly balanced.
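A quick experiment, under the stated assumptions ($B = A^2$ and $f_1(u)$ a fixed state bit), can check the period formula; the Python sketch below simulates the self-clocked register for m = 4, where the theory predicts $N = \frac{2}{3}(2^4 - 1) = 10$.

    # Self-clock-controlled Galois LFSR with B = A^2, control bit
    # c(n) = u_0(n-1), m = 4, primitive polynomial z^4 + z + 1. The initial
    # state must lie on a periodic cycle, as assumed in the derivation above.

    def galois_step(state: int) -> int:
        # One application of A: multiply the state by z modulo z^4 + z + 1.
        state <<= 1
        if state & 0b10000:
            state ^= 0b10011
        return state

    def self_clocked_period(start: int, limit: int = 100):
        state, n = start, 0
        while n < limit:
            c = state & 1                   # f1(u): a fixed bit of the state
            state = galois_step(state)      # apply A ...
            if c:
                state = galois_step(state)  # ... or A^2 when c = 1
            n += 1
            if state == start:
                return n
        return None                         # start was not on a periodic cycle

    # For m = 4 (even), the theory predicts N = (2/3)(2^4 - 1) = 10.
    assert self_clocked_period(0b1100) == 10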

5.5.3 Clock-controlling and puncturing

In the block [x(0), x(1), . . . , x(T-1)] of one period length, any symbol can be identified by the corresponding state $u(n) = \alpha^{\ell(n)}$ of the shift register. Therefore we write
$$[x(0), x(1), \ldots, x(T-1)] = [x(\alpha^{\ell(0)}), x(\alpha^{\ell(1)}), \ldots, x(\alpha^{\ell(T-1)})] ,$$
with the convention that if u(n) = 0 then $\ell(n) = \infty$.
The effect of the clock control is to exclude some states from the sequence S, hence to exclude some $x(\alpha^i)$ from the generated sequence, an operation that corresponds to puncturing the symbols within a period of the sequence, which is equivalent to puncturing the underlying linear code.

5.5.4 LCP of clock-controlled LFSR sequences

A clever dissertation on the importance of the Linear Complexity Profile (LCP) in stream generators is given in [65], and we refer to that analysis to show that clock-controlled shift registers generate sequences with good LCPs.
Let A be the $m \times m$ companion matrix of a primitive irreducible binary generator polynomial $g(x) = x^m + g_1 x^{m-1} + \ldots + g_m$ of a clock-controlled LFSR that produces the binary sequence $x(n) = r^T u(n)$, where r is a fixed m-dimensional vector and u(n) is the LFSR state at step n.
We have already observed that $u(n) = A^{n + \ell_B C_1(n)} u(0)$; furthermore, A admits the following decomposition:
$$A = \sum_{i=0}^{m-1} \sigma_i(\alpha)\, \sigma_i(a)\, \sigma_i(b^T) ,$$
where

1. the root $\alpha$ of g(x) is an eigenvalue of A with column eigenvector
$$a = (1, \alpha, \alpha^2, \ldots, \alpha^{m-1})^T ;$$
2. $\sigma_i$ is the Frobenius automorphism, defined to act componentwise on a vector (or a matrix) as $\sigma_i(z) = z^{2^i}$;
3. let $U = (a, \sigma_1(a), \ldots, \sigma_{m-1}(a))$ be the matrix of conjugate eigenvectors; therefore det(U) = 1, because U is nonsingular and the columns of $\sigma_1(U)$ are a cyclic permutation of the columns of U. Let
$$V = U^{-1} = \begin{pmatrix} b^T \\ \sigma_1(b^T) \\ \vdots \\ \sigma_{m-1}(b^T) \end{pmatrix}$$
be the inverse matrix of U, in which the conjugate rows have been reported. It follows that $b^T a = 1$ and $\sigma_i(b^T)\, a = 0$ for any $i \neq 0$; hence the state of the shift register at step n can be written in the form
$$u(n) = \sum_{i=0}^{m-1} \sigma_i(\alpha^{n + \ell_B C_1(n)})\, \sigma_i(a)\, \sigma_i(b^T u(0)) .$$
Observing that
$$A^h = \sum_{i=0}^{m-1} \sigma_i(\alpha^h)\, \sigma_i(a)\, \sigma_i(b^T) ,$$
the n-th term of the output sequence has the form
$$x(n) = \sum_{i=0}^{m-1} \sigma_i(\alpha^{n + \ell_B C_1(n)})\, [r^T \sigma_i(a)]\, [\sigma_i(b^T u(0))] = \mathrm{Tr}([r^T a][b^T u(0)]\, \alpha^{n + \ell_B C_1(n)}) . \qquad (5.6)$$
Assuming n > m, these equations show that the linear complexity profile L(n) of the sequence x(n) is the same as that of the sequence $\alpha^{n + \ell_B C_1(n)}\, b^T u(0)$, which is given by the smallest integer L(n) for which we can find a set of exponents $k_j$ such that the system
$$\alpha^{\ell_B C_1(t)} = \sum_{j=1}^{t} [b^T u(0)]^{-1} c_j\, \alpha^{t(k_j - 1)} , \qquad t \le n , \qquad (5.7)$$
has a solution with $c_j = 0$ for any j > L(n). However, given that the coefficient matrix $D = (\alpha^{t(k_i - 1)})$ is a Cauchy-Vandermonde matrix, it follows that $[b^T u(0)]^{-1} c_j$ is zero if and only if its corresponding column in D is equal to the column of known terms. This is possible only for one column. Therefore, except when $C_1(n) = kn$, so that we can take $k_j = k$, the number of non-zero coefficients $c_j$ is n. In conclusion, the linear complexity is L(n) = n.
[Figure 5.2: Encryption scheme with rate less than 1: the source output is encoded, masked with the streams of a stream generator and of a random bit generator, and sent over the public channel.]

5.6 Encryption with rate less than 1

A stream enciphering scheme with rate less than 1 leaves at our disposal some redundancy, which permits the introduction of genuine randomness into the enciphering stream. The basic idea is to encode the information using error-correcting codes, and to add a number of errors within the code's error-correcting capability. Consequently, the information is masked with a random key whose length grows with the message length. This situation is general for any alphabet; however, it will be described in detail only for binary stream generators.
A sequence of binary message symbols
$$x_1, x_2, \ldots, x_j, \ldots$$
is partitioned into blocks $\mathbf{x}_i$ of length $k_i(\nu)$, which depends on a random variable $\nu$. Thus, the plain sequence is composed of binary vectors of variable length
$$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_i, \ldots ,$$
each of which will be encoded by means of a linear code $(n_i, k_i, d_i)$ of length depending on the index i; a random pattern of $n_i$ bits and weight not greater than $\lfloor \frac{d_i - 1}{2} \rfloor$ is then added, following the scheme in Figure 5.2, to mask the information (i.e. to perform the encryption).
Let C be a class of codes C of parameters $[n_i(\nu), k_i(\nu), d_i(\nu)]$, where every code, together with its parameters, is considered as depending on a random variable $\nu$. In principle, the class C may contain an infinite number of codes.
Each block of $k_i(\nu)$ bits is encrypted into a block of $n_i(\nu)$ bits, thus the encryption rate could also be a random variable:
$$\rho_i(\nu) = \frac{k_i(\nu)}{n_i(\nu)} .$$
However, it will be assumed that the sub-class of codes used has constant rate $\rho_o$, independent of $\nu$ and i. Assume that the codes $[n(\nu), k(\nu), d(\nu)]$ satisfy the

Gilbert-Varshamov condition [54], that is,
$$\frac{k(\nu)}{n(\nu)} \ge 1 - \frac{1}{n(\nu)} \log_2 \left( \sum_{i=0}^{d(\nu)-2} \binom{n(\nu)-1}{i} \right) , \qquad (5.8)$$
and assume that, asymptotically, equality is achieved. Using the asymptotic form of the bound, we have
$$1 - H_2(\delta) = \rho_o , \qquad (5.9)$$
where $\delta = \frac{d}{n}$, and $H_2(\delta)$ is the binary entropy, defined as
$$H_2(\delta) = -\delta \log_2 \delta - (1-\delta)\log_2(1-\delta) .$$
Let $G_i(\nu)$ be the generator matrix of the code used to encode $\mathbf{x}_i$ at step i, and let $\mathbf{e}_i(\nu)$ be a vector of errors of length $n_i(\nu)$ and Hamming weight not greater than $t = \lfloor \frac{d_i(\nu) - 1}{2} \rfloor$, which is the number of errors that a code with minimum distance $d_i(\nu)$ can correct.
Let $C_{\nu = s_0}$ be the initial code, defined by means of the secret key $K_0$, which is also used to define both the initial state of the stream generator and the value $s_0$ of the random variable $\nu$.
The encryption is performed as follows:
$$\mathbf{r}_i = G_i(\nu)\mathbf{x}_i(\nu) + \mathbf{e}_i(\nu) + \mathbf{v}_i ,$$
where $\mathbf{v}_i$ is a block of $n_i(\nu)$ bits that belongs to a secret stream produced by some stream generator. Its presence is essential for the protocol, because it masks the sequence of code words which, although they belong to codes that are anyway unknown to the attacker, due to their redundancy may offer some weak point of attack. The scope of the sequence $\mathbf{v}_i$ is to make the theoretical performances demonstrated below effective and practically achievable.
In the following analysis, the sequence $\mathbf{v}_i$ is not considered, because the aim is to evaluate the strength introduced by the redundancy and the genuinely random errors. Therefore, in the given model, the (binary) mutual information that a block r of received symbols exchanges with the block x of transmitted information symbols is
$$I(\mathcal{R}, \mathcal{X}) = H(\mathcal{R}) - H(\mathcal{R} \mid \mathcal{X}) ,$$
where $\mathcal{R}$ is the set of receivable blocks, and $\mathcal{X}$ is the set of blocks that are transmitted. Let $\mathcal{E}$ be the set of possible genuinely random block errors; we have
$$H(\mathcal{R}) = \max\{H(\mathcal{X}), H(\mathcal{E})\} , \qquad H(\mathcal{R} \mid \mathcal{X}) = H(\mathcal{E}) .$$
In these hypotheses, since Plotkin's bound certainly holds true, i.e. $\frac{d_i}{n_i} < \frac{1}{2}$, it is certainly $H(\mathcal{X}) \ge H(\mathcal{E})$; hence the mutual information is
$$I(\mathcal{R}, \mathcal{X}) = H(\mathcal{X}) - H(\mathcal{E}) .$$

Further, knowing $K_0$, we have
$$I(\mathcal{R}, \mathcal{X} \mid K_0) = H(\mathcal{X} \mid K_0) - H(\mathcal{E} \mid K_0) = H(\mathcal{X}) ,$$
that is, the mutual information conditioned on knowledge of the secret key turns out to be equal to the entropy of the transmitted information, since the error-correcting code is used within its error-correcting capability and completely removes the genuine random errors.
To recover x from r it is necessary to know the code C and an algorithm for the correction of the errors, which allows us to get e, the error vector, and finally x = r - e. However, the code varies according to a rule $\nu_i = f_i(K_0)$ that is unknown to attackers, who may recover the message only in non-deterministic ways, that is, by trying to infer the code or the random errors.
The most favorable and achievable situation for the defender, which we may consider to be one of optimal or perfect enciphering, occurs when
$$H(\mathcal{X}) = H(\mathcal{E}) .$$
Now, we have
$$H(\mathcal{X}) = k = \rho n , \qquad H(\mathcal{E}) = n H_2\Big(\frac{\delta}{2}\Big) ,$$
where $\frac{\delta}{2} \approx \frac{d}{2n}$ is the rate of correctable errors.
Therefore, perfect encryption is achieved if the condition
$$\rho = H_2\Big(\frac{\delta}{2}\Big)$$
is satisfied. Using the Gilbert-Varshamov inequality, we obtain an upper bound for $\rho$, and thus the maximum encryption rate that allows perfect encryption is
$$1 - H_2(\delta) \le H_2\Big(\frac{\delta}{2}\Big) . \qquad (5.10)$$
Solving equation (5.10) taken with the equality sign, we obtain
$$\delta = 0.15135 ,$$
thus the encryption rate is upper bounded as
$$\rho \le 0.3867 .$$
The meaning of this bound is that of making perfect encryption practically possible without exchanging secret keys of the same length as the message to be protected, as in the case of the one-time pad. The price to pay is that of sending useful information at a reduced rate, not greater than 0.3867.
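The threshold can be recomputed numerically; the following Python sketch solves $1 - H_2(\delta) = H_2(\delta/2)$ by bisection and recovers the two constants quoted above.

    # Solving 1 - H2(delta) = H2(delta/2) by bisection, to check the
    # values delta = 0.15135 and rho <= 0.3867 quoted in the text.
    from math import log2

    def H2(p: float) -> float:
        return -p * log2(p) - (1 - p) * log2(1 - p)

    def g(d: float) -> float:
        return (1 - H2(d)) - H2(d / 2)   # zero at the perfect-encryption point

    lo, hi = 0.01, 0.49                  # g(lo) > 0, g(hi) < 0
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid

    delta = (lo + hi) / 2
    # Prints approximately 0.15136 and 0.38677, i.e. the truncated values
    # 0.15135 and 0.3867 given above.
    print(round(delta, 5), round(H2(delta / 2), 5))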
Obviously, every exhaustive local attack, aimed at identifying the code used locally, has a complexity which is a function of the code length. Furthermore, in the proposed scheme, this attack is made even more difficult by the presence of the pseudo-random stream that masks the genuine random sequence of errors. Further, any attack on the code must resolve two issues:

1. The code length is not known, thus the attacker must analyze all codes of all possible (reasonable) lengths.
2. If the codes are sufficiently long, i.e. of length greater than 500, and are codes of the Goppa class, with generator matrix transformed by a random secret matrix which depends on the secret key $K_0$, the complexity of any known attack is at least equal to the complexity of an attack on the McEliece public-key scheme, with the further issue that the generator matrix is unknown.
As previously observed, the price to pay is a slower transmission rate, which is upper bounded by 0.3867. However, the rate loss is not as large as it could appear at first sight, because Shannon perfect enciphering has net transmission rate 0.5.

5.7 Appendix I - Representation of Finite Fields

A finite field $\mathbb{F}_{2^m}$ admits a faithful representation by $m \times m$ binary matrices [42]. Let $\alpha$ be a root of a binary irreducible polynomial $g(x) = x^m + e_1 x^{m-1} + e_2 x^{m-2} + \ldots + e_{m-1} x + e_m$ of degree m, which has companion matrix
$$A = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
e_m & e_{m-1} & e_{m-2} & \cdots & e_1
\end{pmatrix}$$
Definition 5.5. Let $M(m, \mathbb{Z}_2)$ be the ring of the $m \times m$ matrices with binary entries. The natural representation of $\mathbb{F}_{2^m}$ is a homomorphism $\rho : \mathbb{F}_{2^m} \to M(m, \mathbb{Z}_2)$ with kernel the zero element $0 \in \mathbb{F}_{2^m}(\alpha)$. Therefore $\rho$ is a linear one-to-one mapping, and the image of $\mathbb{F}_{2^m}(\alpha)$ is precisely a subfield of $M(m, \mathbb{Z}_2)$ consisting of matrices obtained as follows:
(a) $\rho(\alpha) = A$;
(b) $\rho(1) = I_m$ and $\rho(0) = O_m$, where $I_m$ and $O_m$ denote the identity and the all-zero matrix in $M(m, \mathbb{Z}_2)$, respectively;
(c) if $x = a_0 + a_1\alpha + \ldots + a_{m-1}\alpha^{m-1}$, $a_i \in \mathbb{Z}_2$, then $\rho(x) = a_0 I_m + a_1 A + \ldots + a_{m-1} A^{m-1}$.
Definition 5.6. Two binary representations $\rho$ and $\rho'$ of the same field $\mathbb{F}_{2^m}$ are called equivalent if there is a matrix U with entries in $\mathbb{F}_{2^m}$ such that $\rho(x) = U \rho'(x) U^{-1}$ for every $x \in \mathbb{F}_{2^m}$.
Definition 5.7. A representation $\rho$ of $\mathbb{F}_{2^m}$ is said to be reducible over some extension field F if there is a matrix Q with entries in F such that $Q\rho(x)Q^{-1}$, $x \in \mathbb{F}_{2^m}$, is a block diagonal matrix; otherwise it is said to be absolutely irreducible.

A binary matrix representation $\rho$ of $\mathbb{F}_{2^m}$ is said to be $\mathbb{Z}_2$-irreducible if no matrix Q exists such that $Q\rho(x)Q^{-1}$, $x \in \mathbb{F}_{2^m}$, is a block diagonal matrix with binary entries; otherwise it is said to be $\mathbb{Z}_2$-reducible.
Proposition 5.1. The natural representation $\sigma$ of $F_{2^m}$, with $\alpha$ a root of $g(x)$, is $Z_2$-irreducible, and is completely reducible (diagonalizable) over $F_{2^m}$.
Proof. If $\sigma$ is reducible over $Z_2$, then the polynomial $g(x)$ necessarily factors over $Z_2$. Therefore the irreducibility of $g(x)$ over $Z_2$ implies the $Z_2$-irreducibility of $\sigma$.
Let $v = [1, \alpha, \ldots, \alpha^{m-1}]^T$ be a column vector; then
$$
\alpha v = A v ,
$$
and the same relation holds for every conjugate $\sigma_i(\alpha) = \alpha^{2^i}$, where $\sigma_i$ is an automorphism of the Galois group of $g(x)$. It follows that, if $U$ is a matrix whose columns are the conjugates of the column $v$, we have
$$
\mathrm{diag}(\sigma_1(\alpha), \sigma_2(\alpha), \ldots, \sigma_m(\alpha)) = U^{-1} A U .
$$
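As a concrete illustration, the following Python sketch (not part of the original notes; the polynomial $g(x) = x^4 + x + 1$ is an assumed example) builds the companion matrix $A$ of Definition 5.5 for $m = 4$ and checks that the representation respects $g(\alpha) = 0$, i.e. that $A^4 = A + I$ over $Z_2$.

    # Minimal sketch: natural matrix representation of F_{2^4} for the assumed
    # example polynomial g(x) = x^4 + x + 1; plain Python arithmetic over Z_2.

    def mat_mul(X, Y):
        # multiply two square binary matrices modulo 2
        n = len(X)
        return [[sum(X[i][k] & Y[k][j] for k in range(n)) % 2
                 for j in range(n)] for i in range(n)]

    def mat_add(X, Y):
        return [[(a + b) % 2 for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

    m = 4
    I = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
    # companion matrix: last row is (e_4, e_3, e_2, e_1) = (1, 1, 0, 0)
    A = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [1, 1, 0, 0]]

    A4 = mat_mul(mat_mul(A, A), mat_mul(A, A))
    assert A4 == mat_add(A, I)     # sigma(alpha)^4 = sigma(alpha) + sigma(1)
    print("A^4 = A + I: the matrices reproduce the arithmetic of F_16")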

Proper subfields are represented by matrices having peculiar properties, as specified in the following theorem.
Theorem 5.6. If $F_{2^m}$ has a proper subfield $F_{2^n}$, then
i) $n \mid m$;
ii) any natural $m$-dimensional representation $\sigma$ of $F_{2^m}$ is equivalent to a representation in which the matrices associated to elements of $F_{2^n}$ are the outer tensor product of the $n \times n$ matrices of a natural representation $\tau$ of $F_{2^n}$ by the $m/n \times m/n$ identity matrix. The characteristic polynomial of $\sigma(x)$, $x \in F_{2^n}$, is the power of exponent $m/n$ of the characteristic polynomial of $\tau(x)$.
Proof. Since $F_{2^m}$ is a normal extension of degree $m/n$ of $F_{2^n}$, $F_{2^m}$ admits a natural $(m/n)$-dimensional representation $\rho$ over $F_{2^n}$. A representation $\sigma$ of $F_{2^m}$ over $Z_2$ is obtained by compounding $\rho$ with $\tau$, namely, the entries of the matrices of $\rho$ are substituted with the matrices of $\tau$, whose entries are in $Z_2$. Clearly, in $\sigma$ the elements of $F_{2^n}$ consist of block diagonal matrices where the same block occurs $m/n$ times. The two representations are thus isomorphic.

5.8  Appendix II - Linear recurrent equations in $F_q$

This appendix is a short review of the most common methods for solving linear recurrent equations.
Let $f(n)$ be a function defined over $Z$ and assuming its values in $F_q$, that is, $f$ is a mapping from $Z$ into $F_q$. Usually, the values of $f$ form a sequence extending from $-\infty$ to $+\infty$
$$
\ldots , f(-m), f(-m+1), \ldots , f(-1), f(0), f(1), \ldots , f(n), \ldots ;
$$
however, in many interesting cases, the sequence is bounded on the negative side
$$
f(-m), f(-m+1), \ldots , f(-1), f(0), f(1), \ldots , f(n), \ldots .
$$
Whatever the definition domain may be, a function $f(n)$ is said to be periodic of period $N$ if $N$ is the smallest integer such that $f(n) = f(n+N)$ $\forall n \geq -m$, where $m$ is possibly $\infty$.
Definition 5.8. Given
- a function $u(n)$ defined for every $n \in Z$, and assuming values in $F_q$,
- a fixed set of $k$ numbers $h_1, h_2, h_3, \ldots, h_k$ which belong to $F_q$, with $h_k \neq 0$,
the recurrent relation
$$
x(n) = \sum_{i=1}^{k} h_i x(n-i) + u(n) \qquad \forall n \in Z \qquad (5.11)
$$
specifies a function $x(n)$, and is called a linear recurrent equation of order $k$ over $F_q$.
If the relation (5.11) is considered only for every $n \geq k$, it is necessary to assign a set of $k$ initial values for the function $x(n)$,
$$
x(1) = a_1 , \; x(2) = a_2 , \; \ldots , \; x(k) = a_k ,
$$
called initial conditions, to start the recurrence. The $k$ values $a_i$ may be chosen independently.
Further, the following notions and definitions are useful:
1. A linear recurrent equation is said to be homogeneous if the function $u(n)$ in (5.11) is identically zero:
$$
y(n) = \sum_{i=1}^{k} h_i y(n-i) \qquad \forall n \geq k , \qquad (5.12)
$$
where $y(n)$ denotes the unknown function.
2. Given (5.11), the equation obtained by setting $u(n) = 0$ $\forall n \in Z$ is called the associated homogeneous equation.
3. Any function that represents every solution of (5.11), for any set of initial conditions, is said to be a general solution.
4. A solution that satisfies a given set of initial conditions is said to be a particular solution.
It is straightforward, using properties of linearity, to check that the general solution of (5.11), for every $n \geq k$, can be written as a sum of two functions
$$
x(n) = y(n) + w(n)
$$
where:
- $y(n)$ is the general solution of the associated homogeneous equation;
- $w(n)$ is any particular solution of (5.11).
It is equally direct, using linearity properties, to prove the following
Proposition 5.2. The set of solutions of (5.12) is a vector space of dimension $k$ over $F_q$, because each solution is specified by assigning $k$ independent initial values of $y(n)$. The total number of distinct initial conditions is thus $q^k$.

5.8.1  Generating functions

The generating function of a sequence is a way of writing the sequence compactly, and turns out to be useful to write the solution of a linear recurrent equation explicitly. The generating function method belongs to the wide class of transformation methods that turn an analytical problem into a problem of an algebraic nature.
Definition 5.9. Let $a(n) \in F_2$, $n = 0, 1, 2, \ldots$, be a sequence, and $z$ be a variable; the sum $A(z)$ of the formal series in $z$ with coefficients $a(n)$
$$
A(z) = \sum_{n=0}^{\infty} a(n) z^n
$$
is said to be the generating function of the sequence $a = \{a(n)\}_{n=0}^{\infty}$.
The solutions of equations (5.11) and (5.12) may be obtained by operating with generating functions, as follows. First, consider the equation (5.12), which is homogeneous; multiply both sides of the equation by $z^n$, and sum the expression for $n$ varying from 0 to infinity:
$$
\sum_{n=0}^{\infty} y(n) z^n = \sum_{n=0}^{\infty} \Big\{ \sum_{i=1}^{k} h_i y(n-i) \Big\} z^n = \sum_{i=1}^{k} h_i \Big\{ \sum_{n=0}^{\infty} y(n-i) z^n \Big\}
$$
$$
= \sum_{i=1}^{k} h_i \Big\{ \sum_{n=i}^{\infty} y(n-i) z^n \Big\} + \sum_{i=1}^{k} h_i \sum_{n=0}^{i-1} y(n-i) z^n
$$
$$
= \sum_{i=1}^{k} h_i \Big\{ \sum_{m=0}^{\infty} y(m) z^{m+i} \Big\} + \sum_{n=0}^{k-1} \Big\{ \sum_{i=n+1}^{k} h_i y(n-i) \Big\} z^n .
$$
Defining the following generating functions
$$
Y(z) = \sum_{n=0}^{\infty} y(n) z^n , \qquad H(z) = 1 + \sum_{i=1}^{k} h_i z^i , \qquad I(z) = \sum_{n=0}^{k-1} \Big\{ \sum_{i=n+1}^{k} h_i y(n-i) \Big\} z^n ,
$$

the previous summations can be rewritten as
$$
Y(z) = Y(z) \sum_{i=1}^{k} h_i z^i + I(z) ;
$$
thus, the solution of (5.12) is obtained as a ratio of polynomials
$$
Y(z) = \frac{I(z)}{H(z)} . \qquad (5.13)
$$

Let $d(z) = \gcd\{I(z), H(z)\}$ be the greatest common divisor of $I(z)$ and $H(z)$; then
$$
I(z) = d(z) I_1(z) \quad \text{and} \quad H(z) = d(z) H_1(z) ,
$$
thus the equation (5.13) simplifies to a ratio of polynomials which are relatively prime
$$
Y(z) = \frac{I_1(z)}{H_1(z)} . \qquad (5.14)
$$
Since $H_1(z)$ is a polynomial over $F_2$, a minimum integer $N$ exists such that $H_1(z)$ divides $z^N - 1$. We can thus write
$$
H_1(z) G_1(z) = z^N - 1 ,
$$
and, multiplying numerator and denominator of equation (5.14) by $G_1(z)$, we have
$$
Y(z) = \frac{I_1(z) G_1(z)}{H_1(z) G_1(z)} = \frac{I_1(z) G_1(z)}{z^N - 1} . \qquad (5.15)
$$
Note that the coefficients of the polynomial $I_1(z) G_1(z)$ give one period of the generated sequence. Further, equation (5.15) used jointly with equation (5.13) yields the following theorem.
Theorem 5.7. Every solution of a homogeneous recurrent equation (5.12) of order $k$ over $F_2$ is periodic of period $N \geq 1$. The period depends on the polynomial $H(z)$ and, subordinately, on the initial conditions:
1) If the initial conditions are the all-zero vector $X(0) = 0$, the period is $N = 1$ and is independent of the polynomial $H(z)$.
2) If $H(z)$ is an irreducible primitive polynomial and $X(0) \neq 0$, the period $N$ is maximal, and is equal to $2^k - 1$.
3) If $H(z)$ is an irreducible non-primitive polynomial and $X(0) \neq 0$, then the period is the minimum integer $N$ such that $H(z)$ divides $z^N - 1$, and does not depend on $X(0)$.
4) If $H(z)$ is a reducible polynomial and $X(0) \neq 0$, the period is a divisor of the minimum integer $N$ such that the polynomial $\frac{H(z)}{\gcd\{H(z), I(z)\}}$ divides $z^N - 1$; in this case the period depends on the initial conditions.
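The periods predicted by Theorem 5.7 are easy to check numerically. The following Python sketch (illustrative, not from the notes) runs the recurrence (5.12) for two assumed polynomials: $H(z) = 1 + z + z^4$, primitive, giving the maximal period $2^4 - 1 = 15$, and $H(z) = 1 + z + z^2 + z^3 + z^4$, irreducible but non-primitive, which divides $z^5 - 1$ and gives period 5.

    # Sketch: measure the period of y(n) = sum_i h_i y(n-i) over F_2.
    def lfsr_period(h, init):
        # h = (h_1, ..., h_k); init = nonzero initial state (y(0), ..., y(k-1))
        state = start = tuple(init)
        n = 0
        while True:
            n += 1
            new_bit = sum(hi & s for hi, s in zip(h, reversed(state))) % 2
            state = state[1:] + (new_bit,)
            if state == start:     # the state map is invertible since h_k != 0
                return n

    print(lfsr_period([1, 0, 0, 1], (1, 0, 0, 0)))  # H(z) = 1+z+z^4: prints 15
    print(lfsr_period([1, 1, 1, 1], (1, 0, 0, 0)))  # H(z) = 1+z+z^2+z^3+z^4: prints 5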
Using the generating function technique we can solve equation (5.11) simply by repeating the same procedure used with (5.12); we obtain
$$
X(z) = \frac{I(z)}{H(z)} + \frac{U(z)}{H(z)} = \frac{I_1(z) G_1(z)}{z^N - 1} + \frac{U(z)}{H(z)} , \qquad (5.16)
$$
where we have set
$$
X(z) = \sum_{n=0}^{\infty} x(n) z^n , \qquad U(z) = \sum_{n=0}^{\infty} u(n) z^n .
$$
Equation (5.16) confirms the previous statement that the general solution of (5.11) can be written as a sum of two functions. More precisely, we have:
1) the general solution of the homogeneous equation is given by the generating function
$$
Y(z) = \frac{I(z)}{H(z)} , \qquad (5.17)
$$
where $I(z)$ is a polynomial of degree $k-1$ whose coefficients are undetermined constants;
2) a particular solution of (5.11), corresponding to the all-zero initial conditions, is given by the generating function
$$
W(z) = \frac{U(z)}{H(z)} . \qquad (5.18)
$$
The method of the generating function may also be directly applied to solve a system of equations. Multiply the equation (5.2) by $z^n$, and sum for $n$ from 0 to infinity:
$$
\sum_{n=0}^{\infty} x(n) z^n = A \sum_{n=0}^{\infty} x(n-1) z^n + \sum_{n=0}^{\infty} u(n) z^n .
$$
Then, setting
$$
X(z) = \sum_{n=0}^{\infty} x(n) z^n \quad \text{and} \quad U(z) = \sum_{n=0}^{\infty} u(n) z^n ,
$$
where $X(z)$ and $U(z)$ are vectors of $k$ entries which are functions of $z$, we may write the equation
$$
X(z) = A x(-1) + z A X(z) + U(z) ,
$$
from which we obtain
$$
(I + zA) X(z) = A x(-1) + U(z) .
$$
Finally, taking the inverse of the matrix $I + zA$, we obtain the solution of (5.2) as
$$
X(z) = (I + zA)^{-1} A x(-1) + (I + zA)^{-1} U(z) . \qquad (5.19)
$$
In the case of the homogeneous equation (5.3), the solution has the same form, with the term containing the vector $U(z)$ missing.
The generating function technique for solving recurrent linear systems of equations is illustrated by the following examples.
Example - 1.  Given the homogeneous system
$$
y(n) = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} y(n-1) ,
$$
a formal solution obtained using (5.19) is
$$
Y(z) = \left( I + z \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right)^{-1} \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} y(-1) .
$$
The computation of the inverse matrix is straightforward
$$
\left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + z \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right\}^{-1} = \frac{1}{1+z+z^2} \begin{pmatrix} 1+z & z \\ z & 1 \end{pmatrix} .
$$
Taking $y(-1) = \begin{pmatrix} a \\ b \end{pmatrix}$, where $a$ and $b$ are the initial conditions, in conclusion we obtain the solution
$$
Y(z) = \frac{1}{1+z+z^2} \begin{pmatrix} b + az \\ a+b+bz \end{pmatrix} .
$$
Example - 2.  Given the complete system
$$
y(n) = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} y(n-1) + \begin{pmatrix} 1 \\ n \end{pmatrix} ,
$$
a formal solution is
$$
Y(z) = \left( I + z \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right)^{-1} \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} y(-1) + \left( I + z \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \right)^{-1} \begin{pmatrix} \frac{1}{1+z} \\ \frac{z}{(1+z)^2} \end{pmatrix} ,
$$
where $\frac{1}{1+z}$ and $\frac{z}{(1+z)^2}$ are the generating functions of the two components of $u(n)$ over $F_2$. Taking $y(-1) = \begin{pmatrix} a \\ b \end{pmatrix}$, in conclusion we obtain the solution
$$
Y(z) = \frac{1}{1+z+z^2} \begin{pmatrix} b + az \\ a+b+bz \end{pmatrix} + \frac{1}{(1+z+z^2)(1+z)^2} \begin{pmatrix} 1+z+z^2 \\ z+z^2 \end{pmatrix} .
$$
Example - 3.  Given the homogeneous system of order 3
$$
y(n) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} y(n-1) ,
$$
a formal solution is obtained using equation (5.19)
$$
Y(z) = \frac{1}{1+z^2+z^3} \begin{pmatrix} 1+z^2 & z & z^2 \\ z^2 & 1 & z \\ z & z+z^2 & 1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} .
$$
Assuming $y^T(-1) = (1, 0, 0)$, the arbitrary constants are easily found: $(c_1, c_2, c_3) = (0, 0, 1)$, thus the corresponding particular solution is
$$
Y(z) = \frac{1}{1+z^2+z^3} \begin{pmatrix} z^2 \\ z \\ 1 \end{pmatrix} ,
$$
whose components are explicitly
$$
y_1(z) = \frac{z^2}{1+z^2+z^3} = z^2 + z^4 + z^5 + z^6 + \ldots
$$
$$
y_2(z) = \frac{z}{1+z^2+z^3} = z + z^3 + z^4 + z^5 + \ldots
$$
$$
y_3(z) = \frac{1}{1+z^2+z^3} = 1 + z^2 + z^3 + z^4 + \ldots .
$$
5.8.2  Characteristic equation methods

A general solution of linear recurrent homogeneous equations may be obtained using the classic method of the characteristic equation. The basic idea exploits the fact that the set of solutions is a vector space of dimension $k$, thus it is identified by a set of $k$ linearly independent particular solutions. The method looks for special solutions of the form
$$
y(n) = \lambda^n , \qquad \lambda \neq 0 . \qquad (5.20)
$$
Let (5.12) be written in the form
$$
y(n) + h_1 y(n-1) + \ldots + h_k y(n-k) = 0 ; \qquad (5.21)
$$
then, substituting the claimed solution (5.20) and assuming $\lambda \neq 0$, we obtain an algebraic equation
$$
\lambda^n + h_1 \lambda^{n-1} + \ldots + h_k \lambda^{n-k} = 0 ,
$$
which should be satisfied for any $n$; therefore $\lambda$ must be a root of the polynomial
$$
ch(z) = z^k + h_1 z^{k-1} + \ldots + h_k . \qquad (5.22)
$$
Let $\lambda_1, \ldots, \lambda_s$ be the distinct roots of $ch(z)$ with respective multiplicities $\ell_1, \ldots, \ell_s$ satisfying the condition $\sum_{i=1}^{s} \ell_i = k$. If $\ell_i = 1$ for every $i = 1, \ldots, s$, then $s = k$, the roots of $ch(z)$ are distinct, and the general solution of (5.21) can be written in the form
$$
y(n) = c_1 \lambda_1^n + c_2 \lambda_2^n + \ldots + c_k \lambda_k^n , \qquad (5.23)
$$
where $c_1, c_2, \ldots, c_k$ are arbitrary constants. The roots $\lambda_i$ belong to some Galois field $F_{2^m}$, thus also the constants $c_h$ belong to the same field. The exponent $m$ is obtained by considering the polynomial $w(z)$ product of the distinct irreducible factors of $ch(z)$; $m$ is thus the minimum integer such that $w(z)$ divides $z^{2^m-1} - 1$.
When the multiplicity $\ell_i$ of a root is greater than 1, then $\ell_i$ linearly independent solutions are needed to form a basis of the vector space of the solutions of (5.21). In this case, besides $\lambda_i^n$ we have the $\ell_i - 1$ solutions
$$
n\lambda_i^n , \; n^2\lambda_i^n , \; \ldots , \; n^{\ell_i-1}\lambda_i^n ,
$$
which, together with $\lambda_i^n$, are equivalent to a set of $\ell_i$ independent solutions
$$
\lambda_i^n , \; n\lambda_i^n , \; n(n-1)\lambda_i^n , \; \ldots , \; n(n-1)\cdots(n-\ell_i+2)\lambda_i^n .
$$
The proof is straightforward, and is done recursively. Assuming $\ell_i > 1$, it is first proved that $n\lambda_i^n$ is a solution of (5.21); substituting the claimed solution into (5.21) and dividing by $\lambda_i$, we obtain
$$
n\lambda_i^{n-1} + (n-1) h_1 \lambda_i^{n-2} + \ldots + (n-k) h_k \lambda_i^{n-k-1} = 0 .
$$
This equation shows that $\lambda_i$ is a root of the derivative of $z^{n-k} ch(z)$ with respect to $z$, that is, $\lambda_i$ is a root of
$$
\frac{d(z^{n-k} ch(z))}{dz} = (n-k) z^{n-k-1} ch(z) + z^{n-k} \frac{d(ch(z))}{dz} ,
$$
which is true because $\lambda_i$ is a multiple root of $ch(z)$. Similarly, if we substitute into (5.21) the solution
$$
n(n-1)\cdots(n-j)\lambda_i^n , \qquad j \leq \ell_i - 2 ,
$$
and divide by $\lambda_i^{j-1}$, we obtain
$$
n(n-1)\cdots(n-j)\lambda_i^{n-j+1} + (n-1)(n-2)\cdots(n-j-1) h_1 \lambda_i^{n-j} + \ldots + (n-k)(n-k-1)\cdots(n-k-j-1) h_k \lambda_i^{n-k-j+1} = 0 ,
$$
which shows that $\lambda_i$ must be a root of the $j$-th derivative of $ch(z)$.
The same method may be directly applied to the first order homogeneous linear system (5.3). In this case, the solution depends on the eigenvalues of the matrix $A$, that is, on the roots of the characteristic equation of $A$.
Since the set of solutions of (5.3) is a $k$-dimensional vector space, given $k$ linearly independent solutions
$$
y^{(1)}(n), \; y^{(2)}(n), \; \ldots, \; y^{(k)}(n) ,
$$
a general solution will be of the form
$$
y(n) = c_1 y^{(1)}(n) + \ldots + c_k y^{(k)}(n) . \qquad (5.24)
$$
Looking for particular solutions of (5.3) of the form
$$
y(n) = \lambda^n v , \qquad (5.25)
$$
from a direct substitution of (5.25) into (5.3) we have
$$
\lambda^n v = A \lambda^{n-1} v ,
$$
which is true if and only if
$$
(\lambda I + A) v = 0 ,
$$
which implies that $\lambda$ must be an eigenvalue of $A$ and $v$ the corresponding eigenvector. If $A$ has $k$ distinct eigenvalues or, in the case of multiple eigenvalues $\lambda_i$ of multiplicity $\ell_i$, if $\ell_i$ associated linearly independent eigenvectors exist, the general solution can be written as
$$
y(n) = c_1 \lambda_1^n v_1 + \ldots + c_k \lambda_k^n v_k , \qquad (5.26)
$$
where the eigenvalues $\lambda_j$, $j = 1, 2, \ldots, k$, are not necessarily distinct.
When an eigenvalue $\lambda_i$ has multiplicity greater than 1, with a number of linearly independent associated eigenvectors less than the multiplicity, the situation is more complex and will not be discussed here.
The procedures are illustrated in the following examples.
Example - 4.  Consider the homogeneous system of order 2
$$
\begin{cases} y_1(n) = y_2(n-1) \\ y_2(n) = y_1(n-1) + y_2(n-1) \end{cases} ,
$$
which in matrix form is
$$
y(n) = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} y(n-1) .
$$
The eigenvalues of the coefficient matrix are roots of the characteristic polynomial
$$
ch(\lambda) = \det(\lambda I + A) = \det \begin{pmatrix} \lambda & 1 \\ 1 & 1+\lambda \end{pmatrix} = \lambda^2 + \lambda + 1 .
$$
Since $\lambda^2 + \lambda + 1$ is an irreducible polynomial over $F_2$, the two distinct roots $\lambda_1 = \alpha$ and $\lambda_2 = 1+\alpha$ belong to $F_4$. The eigenvector associated to $\alpha$ and that associated to $1+\alpha$ are obtained by solving the two linear systems of equations
$$
\begin{cases} \alpha v_1 + v_2 = 0 \\ v_1 + (1+\alpha) v_2 = 0 \end{cases}
\qquad \text{and} \qquad
\begin{cases} (1+\alpha) v_1 + v_2 = 0 \\ v_1 + \alpha v_2 = 0 \end{cases}
$$
respectively. Finally, the general solution of the given system of recurrent equations is
$$
y(n) = c_1 \alpha^n \begin{pmatrix} 1+\alpha \\ 1 \end{pmatrix} + c_2 (1+\alpha)^n \begin{pmatrix} \alpha \\ 1 \end{pmatrix} ,
$$
where, in order for the solution to belong to $F_2$, if $c_1 = a + b\alpha$ then $c_2 = a + b + b\alpha$.
Example - 5.  Consider the recurrent equation of second order over $F_2$
$$
y(n) = y(n-2) , \qquad (5.27)
$$
which has characteristic equation
$$
\lambda^2 + 1 = 0
$$
with double root $\lambda = 1$ in $F_2$: a solution of (5.27) is $y_1(n) = 1^n = 1$.
A second linearly independent solution is of the form
$$
y(n) = n\lambda^n .
$$
Substituting into (5.27) we have
$$
n\lambda^n = (n-2)\lambda^{n-2} ,
$$
which implies
$$
n(\lambda^2 + 1) = 0 \; \Rightarrow \; \lambda = 1 .
$$
In conclusion, a general solution of (5.27) is
$$
y(n) = c_1 + n c_2 .
$$
5.9  Appendix III - Tridiagonal matrices and LFSRs

In this appendix, the connection between linear recurrent equations of order $k$ and linear recurrent systems of order 1 in $k$ equations over $F_2$ is exploited to show when and how tridiagonal matrices with a given characteristic polynomial may be constructed. Material and developments are all taken from [22].
Consider the linear homogeneous equation of order $k$
$$
y(n) = h_1 y(n-1) + h_2 y(n-2) + \ldots + h_k y(n-k) , \qquad h_k \neq 0 . \qquad (5.28)
$$
Setting
$$
y(n) = y_k(n) , \quad y(n-1) = y_{k-1}(n) , \quad \ldots , \quad y(n-k+1) = y_1(n) ,
$$
we may write the following linear recurrent system
$$
\begin{cases}
y_1(n) = y_2(n-1) \\
y_2(n) = y_3(n-1) \\
\;\;\vdots \\
y_{k-1}(n) = y_k(n-1) \\
y_k(n) = h_1 y_k(n-1) + h_2 y_{k-1}(n-1) + \ldots + h_k y_1(n-1)
\end{cases}
$$
where the last equation has been written making the proper substitutions in (5.28). Setting $y^T(n) = (y_1(n), y_2(n), \ldots, y_k(n))$, we may write the system of equations in matrix form
$$
y(n) = A y(n-1) , \qquad (5.29)
$$
where
$$
A = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
h_k & h_{k-1} & h_{k-2} & \cdots & h_1
\end{pmatrix}
$$
is the coefficient matrix. Clearly, the system (5.29) is equivalent to the equation (5.28), and vice versa. The $k \times k$ matrix $A$ is known as the companion matrix of the polynomial $ch(z)$, the characteristic polynomial of the linear recurrent equation (5.28). Moreover, $\det(zI + A) = ch(z)$, that is, $ch(z)$ is the characteristic polynomial of $A$.
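The identity $\det(zI + A) = ch(z)$ can be verified symbolically. The sketch below (illustrative; it assumes the SymPy library and uses the coefficients $h_1 = 0$, $h_2 = 1$, $h_3 = 1$, which correspond to the matrix of Example 3) builds the companion matrix of (5.29) and computes its characteristic polynomial modulo 2.

    # Sketch: companion matrix of ch(z) = z^3 + z + 1 and its characteristic
    # polynomial over F_2 (SymPy assumed available).
    from sympy import Matrix, Poly, symbols

    z = symbols('z')
    h = [0, 1, 1]                          # h_1, h_2, h_3
    k = len(h)
    A = Matrix(k, k, lambda i, j: 1 if j == i + 1 else 0)
    for j, hj in enumerate(reversed(h)):   # last row: (h_k, h_{k-1}, ..., h_1)
        A[k - 1, j] = hj
    print(Poly(A.charpoly(z).as_expr(), z, modulus=2))
    # prints Poly(z**3 + z + 1, z, modulus=2)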
Definition 5.10. The linear recurring homogeneous system defined as
$$
w(n) = A^T w(n-1) , \qquad (5.30)
$$
where $w(n)$ is a column vector of $k$ unknown functions, is called the conjugate system of the linear recurring system defined by equation (5.29).
Since $A$ and $A^T$ have the same characteristic polynomial, both conjugate systems are equivalent to the equation (5.28). This equivalence is evident if we write the solutions of the systems in matrix form, using the generating functions and considering the transpose of the equation (5.30):
$$
Y(z) = (I + zA)^{-1} A y(-1) \quad \text{and} \quad W^T(z) = w^T(-1) A (I + zA)^{-1} .
$$
It is useful, for later use, to write the conjugate system explicitly:
$$
w(n) = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & h_k \\
1 & 0 & 0 & \cdots & 0 & h_{k-1} \\
\vdots & & & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & h_2 \\
0 & 0 & \cdots & 0 & 1 & h_1
\end{pmatrix} w(n-1) ;
$$
this corresponds to $k$ linear equations of the form
$$
\begin{cases}
w_1(n) = h_k w_k(n-1) \\
w_2(n) = w_1(n-1) + h_{k-1} w_k(n-1) \\
\;\;\vdots \\
w_{k-1}(n) = w_{k-2}(n-1) + h_2 w_k(n-1) \\
w_k(n) = w_{k-1}(n-1) + h_1 w_k(n-1)
\end{cases} \qquad (5.31)
$$
For later use, let us also introduce the cyclic matrix
$$
C = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 1 \\
1 & 0 & \cdots & 0 & 0
\end{pmatrix} ,
$$
which satisfies the property $C^k = I$.
Let $U$ be a binary nonsingular matrix; then the matrix $B = U A U^{-1}$ has the same characteristic polynomial as $A$. Therefore the linear recurrent system
$$
w(n) = U A U^{-1} w(n-1) \qquad (5.32)
$$
is equivalent to the linear recurrent equation (5.28). The interesting point is that the matrix $B$ can have special structures which are useful in many applications. A particularly interesting structure is the tridiagonal form
$$
B = \begin{pmatrix}
d_0 & 1 & 0 & \cdots & 0 \\
1 & d_1 & 1 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & 1 & d_{k-2} & 1 \\
0 & 0 & \cdots & 1 & d_{k-1}
\end{pmatrix} .
$$
We will see that not every polynomial $ch(z)$ of degree $k$ can be the characteristic polynomial of a tridiagonal matrix (i.e. not every linear recurrence (5.28) is equivalent to a tridiagonal system), although an important class of $k$-degree polynomials have tridiagonal matrices. In this context, the following matrices will be useful:
$$
F = \begin{pmatrix}
d_0 & 1 & 0 & \cdots & 1 \\
1 & d_1 & 1 & \cdots & 0 \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & 1 & d_{k-2} & 1 \\
1 & 0 & \cdots & 1 & d_{k-1}
\end{pmatrix} , \qquad (5.33)
$$
and the diagonal matrix $D = \mathrm{diag}(d_0, d_1, \ldots, d_{k-1})$, which allows us to write
$$
F = D + C + C^T .
$$
Moreover, we have $C^h D C^{-h} = \mathrm{diag}(d_h, d_{1+h}, \ldots, d_{h+k-1})$, where the subscripts are evaluated $\bmod\ k$.
Many important properties of tridiagonal matrices are deduced from their strict relation with continued fractions. Consider the matrix $zI + B$, and define $D_{0,k-1}(z) = \det(zI + B)$ as its determinant; in other words, $D_{0,k-1}(z)$ is the characteristic polynomial of $B$. Furthermore, define $D_{h,k-1}(z)$ as the determinant of the matrix obtained by suppressing the first $h$ rows and columns from $zI + B$. Developing the determinant $D_{0,k-1}(z)$ along the last column, we obtain the second order linear recurrence
$$
D_{0,k-1}(z) = (z + d_{k-1}) D_{0,k-2}(z) + D_{0,k-3}(z) ; \qquad (5.34)
$$
conversely, developing the determinant $D_{0,k-1}(z)$ along the first column, we obtain the linear recurrence
$$
D_{0,k-1}(z) = (z + d_0) D_{1,k-1}(z) + D_{2,k-1}(z) . \qquad (5.35)
$$
This equation, upon division by $D_{1,k-1}(z)$, can be written in the form
$$
\frac{D_{0,k-1}(z)}{D_{1,k-1}(z)} = (z + d_0) + \frac{1}{\dfrac{D_{1,k-1}(z)}{D_{2,k-1}(z)}} ,
$$
which shows that $D_{0,k-1}(z)$ is the numerator of the $k$-th convergent of a regular finite continued fraction
$$
z + d_0 + \cfrac{1}{z + d_1 + \cfrac{1}{z + d_2 + \cfrac{1}{\ddots + \cfrac{1}{z + d_{k-1}}}}} \; .
$$
The denominator $D_{1,k-1}(z)$ of the $k$-th convergent satisfies a second order linear recurrence
$$
D_{1,k-1}(z) = (z + d_1) D_{2,k-1}(z) + D_{3,k-1}(z) . \qquad (5.36)
$$
Let $[z + d_0, z + d_1, \ldots, z + d_{k-1}]$ denote a continued fraction expansion, and let $[[z + d_0, z + d_1, \ldots, z + d_{k-1}]]$ denote the numerator of its $k$-th convergent. Clearly we have
$$
\frac{D_{0,k-1}(z)}{D_{1,k-1}(z)} = [z + d_0, z + d_1, \ldots, z + d_{k-1}] ,
$$
and
$$
D_{0,k-1}(z) = [[z + d_0, z + d_1, \ldots, z + d_{k-1}]] , \qquad D_{1,k-1}(z) = [[z + d_1, z + d_2, \ldots, z + d_{k-1}]] .
$$
Euler pointed out the symmetry
$$
[[z + d_0, z + d_1, \ldots, z + d_{k-1}]] = [[z + d_{k-1}, z + d_{k-2}, \ldots, z + d_0]] , \qquad (5.37)
$$
which directly follows from the identity
$$
\det(zI + B) = \det(K(zI + B)K) = \det(zI + KBK) ,
$$
where $K$ is the all-zero matrix having the second diagonal filled with 1s,
$$
K = \begin{pmatrix}
0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 1 & 0 \\
\vdots & & & & \vdots \\
0 & 1 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0
\end{pmatrix} ,
$$
and satisfying the equation $K^2 = I$, i.e. $K$ is involutory. The number of $k$-degree polynomials is $2^k$, but the symmetry shown by (5.37) implies that the number of tridiagonal matrices with distinct characteristic polynomials is less than $2^k$. It follows that not every $k$-degree polynomial can correspond to a tridiagonal matrix.
This fact poses the interesting problem of characterizing the $k$-degree polynomials that have tridiagonal matrices. A first answer is given by the following theorem, shown in [11], which uses the formal derivative $ch'(z)$ of the characteristic polynomial. The proof given here is a shortened version of that given in [11], and needs a lemma that will be proved as a preliminary.
Lemma 5.4 ([11]). Let $w_1 = \sum_{i=0}^{k-1} d_i \bmod 2$ be the trace modulo 2 of $F$, and $w_0 = k \bmod 2$ be the remainder of the dimension $k$ modulo 2. Let $\Delta_{0,k-1}(z) = \det(zI + F)$ be the characteristic polynomial of the matrix $F$; then
$$
\Delta_{0,k-1}(z) = \begin{cases}
q(z)^2 & \text{if } w_0 = 0 , \; w_1 = 0 \\
(z+1) q(z)^2 & \text{if } w_0 = 1 , \; w_1 = 1 \\
z \, q(z)^2 & \text{if } w_0 = 1 , \; w_1 = 0 \\
z(z+1) q(z)^2 & \text{if } w_0 = 0 , \; w_1 = 1
\end{cases} \qquad (5.38)
$$
Proof. The proof is by induction on $k$, and consists of 10 cases. The base cases can be checked by calculating the characteristic polynomials of all cyclic matrices for $k \leq 4$. These initial values are collected in the following table:

  k    Delta_{0,k-1}(z)                                   w_1
  2    (z+1)^2  or  z^2                                    0
  2    z^2 + z + 1                                         1
  3    z(z+1)^2  or  z z^2                                 0
  3    (z+1)z^2  or  (z+1)(z+1)^2                          1
  4    z^4 or (z^2+z+1)^2 or (z+1)^4 or z^2(z+1)^2         0
  4    z(z+1)(z+1)^2  or  z(z+1)z^2                        1
The exception of the $2 \times 2$ matrices with trace 1 is not used in the recursion for $k \geq 5$. The first eight cases apply when $D$ has at least two equal adjacent elements $d_j = d_{j+1}$ which, in view of the property
$$
C^h F C^{-h} = C^h D C^{-h} + C + C^T = \mathrm{diag}(d_h, d_{1+h}, \ldots, d_{h+k-1}) + C + C^T ,
$$
can be assumed to be $d_0 = d_{k-1}$ by choosing a convenient $h$. Developing the determinant $\det(zI + F)$ along the first row, we get the relation
$$
\Delta_{0,k-1}(z) = (z + d_0) D_{1,k-1} + D_{2,k-1} + D_{1,k-2} = D_{0,k-1} + D_{1,k-2} , \qquad (5.39)
$$
which implies $\Delta_{1,k-1}(z) = D_{1,k-1} + D_{2,k-2}$ and $\Delta_{1,k-2}(z) = D_{1,k-2} + D_{2,k-3}$; thus we have
$$
\Delta_{0,k-1}(z) = (z + d_0) \Delta_{1,k-1}(z) + \Delta_{1,k-2}(z) .
$$
Consider the case that $k$ is odd, $w_0 = 1$; supposing $d_0 = d_{k-1} = 1$, we have
$$
\Delta_{0,k-1}(z) = (z+1)\Delta_{1,k-1}(z) + \Delta_{1,k-2}(z) = (z+1) q_1(z)^2 + (z+1) q_2(z)^2 = (z+1)(q_1(z) + q_2(z))^2 .
$$
Assuming $d_0 = d_{k-1} = 0$, we have
$$
\Delta_{0,k-1}(z) = z\Delta_{1,k-1}(z) + \Delta_{1,k-2}(z) = z \cdot z(z+1) q_1(z)^2 + (z+1) q_2(z)^2 = (z+1)(z q_1(z) + q_2(z))^2 .
$$
With similar arguments, we may dispose of the eight cases. The last two cases appear when it is not possible to have $d_0 = d_{k-1}$, an event that may occur only if $k$ is even. In this case we may have $D = \mathrm{diag}(0, 1, 0, 1, \ldots, 0, 1)$. This special structure gives the equation
$$
\Delta_{0,k-1}(z) = z(z+1) \Delta_{2,k-1}(z) + \Delta_{2,k-1}(z) .
$$
It is straightforward to check that the inductive hypothesis holds. This completes the proof.
The identity (5.37) implies that two continued fractions of the same length $k$ exist that lead to the same numerator polynomial $D_{0,k-1}(z)$ of degree $k$, and to two denominator polynomials $D_{0,k-2}(z)$ and $D_{1,k-1}(z)$. We may therefore consider these latter two polynomials as roots of a second degree equation
$$
y^2 + \delta_1 y + \delta_2 \bmod ch(z) ,
$$
where $\delta_1 = D_{0,k-2}(z) + D_{1,k-1}(z)$ and $\delta_2 = D_{0,k-2}(z) D_{1,k-1}(z)$. The values of $\delta_1 \bmod ch(z)$ and $\delta_2 \bmod ch(z)$ will be specified in the next theorem.
Theorem 5.8 ([11]). A polynomial $ch(z)$ is a characteristic polynomial of a tridiagonal matrix only if the quadratic equation
$$
y^2 + z(z+1) ch'(z) \, y + 1 = 0 \bmod ch(z) \qquad (5.40)
$$
has at least two solutions over $Z_2[z]$.


Proof. Let $P_j$ and $Q_j$ be the numerator and denominator of the $j$-th convergent of the continued fraction
$$
[z + d_0, z + d_1, \ldots, z + d_{k-2}, z + d_{k-1}] ,
$$
and let $p_j$ and $q_j$ be the numerator and denominator of the $j$-th convergent of the continued fraction
$$
[z + d_{k-1}, z + d_{k-2}, \ldots, z + d_1, z + d_0] ;
$$
then we have
$$
\frac{P_{k-1}}{Q_{k-1}} = \frac{D_{0,k-1}(z)}{D_{1,k-1}(z)} , \qquad \frac{P_{k-2}}{Q_{k-2}} = \frac{D_{0,k-2}(z)}{D_{1,k-2}(z)} , \qquad \text{and} \qquad \frac{p_{k-1}}{q_{k-1}} = \frac{D_{k-1,0}(z)}{D_{k-2,0}(z)} = \frac{D_{0,k-1}(z)}{D_{0,k-2}(z)} .
$$
Notice that the two associated continued fractions we are interested in are $\frac{P_{k-1}}{Q_{k-1}}$ and $\frac{p_{k-1}}{q_{k-1}}$. In particular, their denominators are $D_{1,k-1}(z)$ and $D_{0,k-2}(z)$, thus
$$
\delta_1 = D_{1,k-1}(z) + D_{0,k-2}(z) \quad \text{and} \quad \delta_2 = D_{1,k-1}(z) D_{0,k-2}(z) .
$$
It is immediate to see that $\delta_2 = 1 \bmod ch(z)$, because these are contiguous convergents of a continued fraction and satisfy the relation $P_{k-1} Q_{k-2} + P_{k-2} Q_{k-1} = 1 \bmod 2$ [17]; we have
$$
D_{0,k-1}(z) D_{1,k-2}(z) + D_{1,k-1}(z) D_{0,k-2}(z) = 1 \bmod 2 ,
$$
and $ch(z) = D_{0,k-1}(z)$.
The computation of $\delta_1$ is based on Lemma 5.4. Defining $\mathcal{B} = \{1, z, z+1, z(z+1)\}$, Lemma 5.4 allows us to write $\Delta_{0,k-1}(z) = \omega(z) q(z)^2$, with $\omega \in \mathcal{B}$. Using equation (5.39) we have
$$
\Delta_{0,k-1}(z) = D_{0,k-1}(z) + D_{1,k-2}(z) ;
$$
performing successively the substitutions $d_0 \to d_0 + 1$, $d_{k-1} \to d_{k-1} + 1$, and $d_0 \to d_0 + 1$ together with $d_{k-1} \to d_{k-1} + 1$, we have
$$
\Delta_{0,k-1}(z) = [D_{0,k-1}(z) + D_{1,k-1}(z)] + D_{1,k-2}(z) = \omega(z) q_2(z)^2 \qquad (5.41)
$$
$$
\Delta_{0,k-1}(z) = [D_{0,k-1}(z) + D_{0,k-2}(z)] + D_{1,k-2}(z) = \omega(z) q_3(z)^2 \qquad (5.42)
$$
$$
\Delta_{0,k-1}(z) = [D_{0,k-1}(z) + D_{0,k-2}(z) + D_{1,k-1}(z) + D_{1,k-2}(z)] + D_{1,k-2}(z) = D_{0,k-1}(z) + \delta_1 \qquad (5.43)
$$
since $D_{1,k-2}(z)$ remains unchanged. Combining (5.41) and (5.42) we have
$$
\delta_1 = \omega(z) [q_2(z) + q_3(z)]^2 = \omega(z) q_1(z)^2 , \qquad (5.44)
$$
while from (5.43) we have
$$
D_{0,k-1}(z) + \delta_1 = \omega_1(z) q_4(z)^2 . \qquad (5.45)
$$
Since the polynomials are over $F_2$, the derivative $\frac{dD_{0,k-1}(z)}{dz}$ is a perfect square and coincides, when multiplied by $z$, with the odd-power part of $D_{0,k-1}(z)$. To complete the proof we need to consider four cases.
1) $\omega(z) = 1$, $\omega_1(z) = z(z+1)$: Using equations (5.44) and (5.45) we may write
$$
\delta_1 = q_1(z)^2 , \quad \text{and} \quad D_{0,k-1}(z) = q_1(z)^2 + z(z+1) q_4(z)^2 = [q_1(z)^2 + z^2 q_4(z)^2] + z q_4(z)^2 ,
$$
which implies that $\frac{dD_{0,k-1}(z)}{dz} = q_4(z)^2$; thus we have
$$
\delta_1 = q_1(z)^2 = z(z+1) \frac{dD_{0,k-1}(z)}{dz} \bmod D_{0,k-1}(z) .
$$
2) $\omega(z) = z(z+1)$, $\omega_1(z) = 1$: Using equations (5.44) and (5.45) we may write
$$
\delta_1 = z(z+1) q_1(z)^2 , \quad \text{and} \quad D_{0,k-1}(z) = z(z+1) q_1(z)^2 + q_4(z)^2 = [z^2 q_1(z)^2 + q_4(z)^2] + z q_1(z)^2 ,
$$
which implies that $\frac{dD_{0,k-1}(z)}{dz} = q_1(z)^2$; thus we have
$$
\delta_1 = z(z+1) q_1(z)^2 = z(z+1) \frac{dD_{0,k-1}(z)}{dz} .
$$
3) $\omega(z) = z$, $\omega_1(z) = z+1$: Using equations (5.44) and (5.45) we may write
$$
\delta_1 = z q_1(z)^2 , \quad \text{and} \quad D_{0,k-1}(z) = z q_1(z)^2 + (z+1) q_4(z)^2 = z [q_1(z)^2 + q_4(z)^2] + q_4(z)^2 ,
$$
which implies that $\frac{dD_{0,k-1}(z)}{dz} = q_1(z)^2 + q_4(z)^2$; thus we have
$$
q_4(z)^2 = z [q_1(z)^2 + q_4(z)^2] \bmod D_{0,k-1}(z) = z \frac{dD_{0,k-1}(z)}{dz} \bmod D_{0,k-1}(z) ,
$$
which implies $q_1(z)^2 = (z+1) \frac{dD_{0,k-1}(z)}{dz} \bmod D_{0,k-1}(z)$, and finally
$$
\delta_1 = z q_1(z)^2 = z(z+1) \frac{dD_{0,k-1}(z)}{dz} \bmod D_{0,k-1}(z) .
$$
4) $\omega(z) = z+1$, $\omega_1(z) = z$: Using equations (5.44) and (5.45) we may write
$$
\delta_1 = (z+1) q_1(z)^2 , \quad \text{and} \quad D_{0,k-1}(z) = (z+1) q_1(z)^2 + z q_4(z)^2 = z [q_1(z)^2 + q_4(z)^2] + q_1(z)^2 ,
$$
which implies that $\frac{dD_{0,k-1}(z)}{dz} = q_1(z)^2 + q_4(z)^2$; thus we have
$$
q_1(z)^2 = z [q_1(z)^2 + q_4(z)^2] \bmod D_{0,k-1}(z) = z \frac{dD_{0,k-1}(z)}{dz} \bmod D_{0,k-1}(z) ,
$$
which implies
$$
\delta_1 = (z+1) q_1(z)^2 = z(z+1) \frac{dD_{0,k-1}(z)}{dz} \bmod D_{0,k-1}(z) .
$$
Corollary 5.1 ([11]). For a polynomial $ch(z)$ of degree $n$ to be the characteristic polynomial of a tridiagonal matrix it is sufficient that (5.40) has at least two solutions $q_1(z)$ and $q_2(z)$ in $Z_2[z]$ which are polynomials of degree $n-1$, and that the continued fraction expansion of $\frac{ch(z)}{q_1(z)}$ has length $n$.
The following theorems yield necessary or sufficient conditions for a polynomial $p(z)$ to be the characteristic polynomial of some tridiagonal matrix. They also offer an algorithm for constructing the tridiagonal matrix, and show that any irreducible polynomial has a tridiagonal matrix. The main tool is the Lanczos tridiagonalization algorithm, which is now briefly recalled from [40], adapted to our case. Two square matrices $A$ and $B$ are similar if a nonsingular matrix $S$ exists such that $B = S A S^{-1}$. For two given column vectors $x$ and $y$, and a matrix $A$, define two Krylov matrices as
$$
K(A, x) = (x, Ax, A^2 x, \ldots, A^{n-1} x) , \qquad K(A^T, y) = (y, A^T y, \ldots, (A^T)^{n-1} y) .
$$
Assume that $K(A, x)$ is nonsingular; then
$$
C_A = K(A, x)^{-1} A K(A, x)
$$
is a companion matrix for the characteristic polynomial of $A$. Moreover, if $R$ is any nonsingular upper triangular matrix and $S = K(A, x) R$, then $S^{-1} A S$ is in upper Hessenberg form. Suppose also that $K(A^T, y)$ is nonsingular and that $K(A^T, y)^T K(A, x)$ can be written as $LDU$, in which $L$ is lower triangular, $U$ is upper triangular, and $D$ is diagonal. Then there exist nonsingular upper triangular matrices $R$ and $Q$ such that
$$
(K(A, x) R)^{-1} = Q^T K(A^T, y)^T , \quad \text{and the matrix} \quad T = Q^T K(A^T, y)^T A K(A, x) R
$$
is tridiagonal. The matrix $G$ of a companion matrix $C$ is a Hankel matrix, that is, a matrix of the form
$$
H = \begin{pmatrix}
h_1 & h_2 & \cdots & h_{n-1} & h_n \\
h_2 & h_3 & \cdots & h_n & h_{n+1} \\
h_3 & h_4 & \cdots & h_{n+1} & h_{n+2} \\
\vdots & \vdots & & \vdots & \vdots \\
h_n & h_{n+1} & \cdots & h_{2n-2} & h_{2n-1}
\end{pmatrix} .
$$
Conditions for the decomposition $G = LU$ are given by the following theorems.
Theorem 5.9 ([40]). A nonsingular $n \times n$ matrix $A = (a_{ij})$ has an LU decomposition if and only if all leading principal minors are nonzero.
Proof. The decomposition is possible if a lower triangular matrix $L_1$ exists such that $L_1 A = U$; in this event, $A = L_1^{-1} U$ with $L = L_1^{-1}$. The existence of a lower triangular $L_1$ is implied by the Gauss triangularization procedure, provided no row permutations are needed. Therefore, necessarily $a_{11} \neq 0$, and $a_{21}$ is made zero by multiplying $A$ by the matrix
$$
L_1' = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
-\frac{a_{21}}{a_{11}} & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix} ,
$$
which substitutes $a_{22}$ with $a_{22}' = a_{22} - \frac{a_{21}}{a_{11}} a_{12}$; thus $a_{22}'$ is not zero if and only if $a_{22} a_{11} - a_{21} a_{12} \neq 0$. In a similar way, all elements of the first column are made zero, after which the elements of the second column are made zero. Therefore, $a_{33}'$ is a multiple of the leading minor of order 3, and must not be zero in order to zero the elements below it in the third column. The recursion is evident and enables the proof to be completed.

Theorem 5.10 ([40]). A binary nonsingular Hankel matrix $H$ has an LU decomposition if and only if the following equations hold:
$$
h_1 = 1 , \qquad h_i + h_{2i} + h_{2i+1} = 0 \quad i = 1, 2, \ldots, n-1 . \qquad (5.46)
$$
Proof. Let $H$ be a Hankel matrix; then by Theorem 5.9 it has an LU decomposition if and only if all leading principal minors are nonzero, thus $h_1 = 1$. The second leading principal minor is $h_1 h_3 + h_2^2 = h_3 + h_2 = 1$, thus the matrix $H$ can be reduced to the form
$$
\begin{pmatrix}
1 & h_2 & h_3 & \cdots & h_n \\
0 & 1 & h_4 + h_3 h_2 & \cdots & h_{n+1} + h_2 h_n \\
h_3 & h_4 & h_5 & \cdots & h_{n+2} \\
\vdots & \vdots & \vdots & & \vdots
\end{pmatrix} .
$$
The third leading minor is $h_5 + h_3 + h_4 + h_4 h_3 h_2 = 1$, which implies $h_5 + h_4 + h_2 = 0$, because $h_2 h_3 = 0$ and $1 + h_3 = h_2$. The fourth leading minor can be reduced to the form
$$
\det \begin{pmatrix}
1 & h_2 & h_3 & h_4 \\
0 & 1 & h_4 & h_5 + h_4 h_2 \\
0 & 0 & 1 & h_6 + h_2 h_4 \\
h_4 & h_5 & h_6 & h_7
\end{pmatrix}
= \det \begin{pmatrix}
1 & h_2 & h_3 & h_4 \\
0 & 1 & h_4 & h_5 + h_4 h_2 \\
0 & 0 & 1 & h_6 + h_2 h_4 \\
0 & 0 & 0 & h_7 + h_6 + h_3
\end{pmatrix} ,
$$
which implies $h_7 + h_6 + h_3 = 0$. The argument can be iterated to show the claimed recursive conditions.
Let $p(z)$ be an $n$-degree irreducible polynomial with companion matrix $C$. Then $H_x = K(C, x)$ is nonsingular for any non-zero vector $x$. Define $H_y = K(C^T, e)$, with $e = (1, 0, \ldots, 0)^T$; then $G = H_y^T H_x$ is a nonsingular Hankel matrix. The problem is to show that $x$ can be chosen such that $G$ admits an LU decomposition.
Theorem 5.11 ([75]). Any irreducible polynomial $p(z)$ of degree $n$ over $F_2$ is the characteristic polynomial of a tridiagonal matrix.
Proof. The conclusion amounts to proving that a vector $x$ exists such that the nonsingular matrix $G = H_y^T H_x$ admits an LU decomposition. Since $G$ is a Hankel matrix, the equations (5.46) impose $n$ linear conditions on $n$ unknowns. Although these conditions are not independent, in our case the effective conditions are $n-1$, so that we have two solutions. Therefore, two tridiagonal matrices are similar to $C$, the companion matrix of $p(z)$.
The steps of the algorithm, given a feasible characteristic polynomial $ch(z)$, are:
1. Construct the companion matrix $C$ of $ch(z)$.
2. Construct the matrix $B$ from the $n$ polynomials
$$
z^i + z^{2i-1} + z^{2i} \bmod ch(z) \qquad i = 1, \ldots, n .
$$
3. Consider the vector $e_n = (0, 0, \ldots, 1)^T$, and solve the equation $B v = e_n$.
4. Construct the matrix
$$
H = [v, Cv, C^2 v, \ldots, C^{n-1} v] .
$$
Chapter 6
Public-key Cryptography
Nothing doth more hurt in a state than
that cunning men pass for wise.
Francis Bacon

6.1  Introduction
In 1976 two researchers at Stanford University, Whitfield Diffie and Martin Hellman, invented public-key cryptography, with the aim of providing a method for exchanging secret information (at that time, secret keys) on a public channel without any private agreement. At the origin of Diffie and Hellman's studies was the explosion, in the 1970s, of electronic exchanges of sensitive documents; complex banking systems were also calling for inexpensive systems of secret key distribution that would avoid couriers or other costly movement of people. But Diffie and Hellman did more: they introduced a new way of looking at cryptography. The consequences have been very far-reaching, and the basic concept (although not entirely understood) of one-way function led to many information protection formats that are indispensable in the modern globalised society, a society dominated by global communication systems, electronic signatures, message authentication, remote control of access, and mobile or cellular phones. All these remote and secure functions are indispensable in applications such as e-commerce, home-banking, issue of personal documents, e-medicine, and electronic voting.
Diffie and Hellman's discovery triggered very intense activity in the fields of proposing public key mechanisms and evaluating their security or strength. This activity favored a kind of renaissance in subjects such as number theory and discrete mathematics, and great progress was made in domains that had been almost entirely neglected (or at most the dominion of a selected few) for centuries.
It is claimed that the notion of the public key was already known to the British Intelligence Services at least 10 years earlier, but that it was kept secret. However, the merit of having created a totally new discipline and opened a new approach to cryptography undoubtedly belongs to Diffie and Hellman.
Diffie and Hellman's key exchange. In their seminal paper [20], Diffie and Hellman proposed, by way of example, a mechanism for public key exchange on a public channel based on the difficulty of computing the discrete logarithm in a cyclic group.
The scheme that they invented has become a standard model for describing any public key cryptographic scheme. The problem:
BOB and ALICE want to establish private communication on a public channel (i.e. home phone) by means of a stream cipher, and need to share a common secret key that must be communicated on the same unsafe channel without any previous agreement.
The solution consists of three steps:
Step 1: BOB and ALICE agree to operate with a cyclic group $C_N$ of order $N$, and to use a generator $\alpha \in C_N$ of the group.
Step 2: BOB and ALICE perform the following operations:
1: ALICE chooses a random number $X < N$, computes $X_p = \alpha^X$, and sends this number to BOB;
2: BOB chooses a random number $Y < N$, computes $Y_p = \alpha^Y$, and sends this number to ALICE;
3: ALICE, using $Y_p$, computes the common key as
$$
K = Y_p^X = (\alpha^Y)^X = \alpha^{XY} ;
$$
4: BOB, using $X_p$, computes the common key as
$$
K = X_p^Y = (\alpha^X)^Y = \alpha^{XY} .
$$
Step 3: BOB and ALICE start their private conversation encrypted with a stream cipher (a secret-key encryption system) using the common secret key $K$.
An adversary (typically the bad EVE) who wants to know $K$ observes the public conversation, and thus knows the algebraic domain where the operations are performed, and three elements of $C_N$:
$$
\alpha , \; X_p , \; Y_p ,
$$
from which she would like to recover the common key $K = \alpha^{XY}$.
If it is easy to compute the discrete logarithm $L_\alpha(.)$ in $C_N$, then from $X_p$ the adversary finds $X$, and then, knowing $Y_p$, she computes $K = \alpha^{XY}$.
It is believed that computation of the discrete logarithm is generally a difficult problem, that is, the exponentiation $y = \alpha^x$ and the inverse operation $x = L_\alpha(y)$ are easy and hard respectively: in other words, they define what is known as a one-way function. However, concerning Diffie and Hellman's scheme, the question of whether its strength is equivalent to that of the discrete logarithm has not yet been settled. It has been conjectured that to compute $\alpha^{XY}$ from $\alpha$, $\alpha^X$, and $\alpha^Y$ is equivalent to the discrete logarithm problem, but the equivalence question is still open.
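The three steps are easily turned into code. The following Python sketch (illustrative; the textbook-sized parameters $p = 23$ and $\alpha = 5$, a primitive root modulo 23, are assumptions of mine, and any real deployment needs a prime of hundreds of digits) runs the exchange once.

    # Toy Diffie-Hellman exchange over Z_23^*.
    import secrets

    p = 23                    # public prime; the group order is p - 1 = 22
    alpha = 5                 # public generator (5 is a primitive root mod 23)

    X = secrets.randbelow(p - 2) + 1     # ALICE's secret exponent
    Y = secrets.randbelow(p - 2) + 1     # BOB's secret exponent

    Xp = pow(alpha, X, p)                # sent by ALICE to BOB
    Yp = pow(alpha, Y, p)                # sent by BOB to ALICE

    K_alice = pow(Yp, X, p)              # ALICE computes alpha^(XY)
    K_bob = pow(Xp, Y, p)                # BOB computes alpha^(XY)
    assert K_alice == K_bob              # the common secret key K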

6.1.1  One-way functions
The public key concept, just described, is based on the introduction of a novel notion of function: the one-way function. Intuitively this concept may be accepted but, for a formal definition of one-way function, we must have at our disposal a measure of computational complexity that is axiomatically defined, and that can be used to evaluate the complexity of computing a given function at any point of its definition domain. A detailed discussion of complexity measures is given in Chapter 8; for the moment we will assume that such a measure relative to a function $f$ exists, and the complexity of $f$ is denoted $C(f)$.
To clarify what is easy and what is hard to compute, we informally introduce some notions that restrict the vagueness of the concept of one-way function. Assume that $f$ is defined over a domain $D$ of cardinality $N$:
- We say that $f$ can be easily computed if $C(f) = O((\log N)^s)$ for some finite (possibly small) rational $s$.
- We say that $f$ is hard to compute if $C(f) = O(\exp(r \log N))$ for some small rational number $0 < r < 1$.
Definition 6.1. Let $D$ and $F$ be, possibly different, finite domains. A one-way function $F(.)$ from $D$ into $F$ is an invertible function whose images can be easily computed, and whose counter-images are hard to compute. That is,
$$
y = F(x) \in F , \qquad \forall x \in D ,
$$
can be easily computed, while
$$
x = F^{-1}(y) \in D , \qquad \forall y \in F(D) \subseteq F ,
$$
must be hard to compute.


The intuitive definitions given of easy and hard computations suffice to give a not merely qualitative idea of what we intend by one-way functions. We will see some less abstract notions and some concrete examples of computational complexity in discussing the known realizations of functions that are used as though they were one-way. In Chapter 8 we will describe more accurately the status of the search for an axiomatic definition of computational complexity satisfying the axioms of mathematical measure. It is remarked that, rigorously speaking, such a measure is still missing.
Abstract paradigm. To formally state the mechanism of public key cryptosystems, a group $G$ and a ring $R$ are generally considered. The key idea was to use different keys on the encryption and on the decryption sides: the encryption key $E \in R$ is produced by the addressee and is made public in some way, possibly through a public directory, or is communicated to the sender over a public channel; the decryption key $D \in R$ is generated together with $E$ by the addressee and, unlike $E$, is kept secret.
Let $M \in G$ and $X \in G$ be the message to be encrypted and the encrypted message, respectively. The encryption scheme is characterized by two functions $f : G \times R \to G$ and $g : G \times R \to G$ such that
$$
X = f(M, E) , \qquad M = g(X, D) ,
$$
where the pair $[f, E]$ constitutes the public key used by the sender to encrypt, and the pair $[g, D]$ constitutes the private key used by the addressee to decrypt.
Both functions $f$ and $g$ should be chosen such that, from a knowledge of the public key, it is not possible to recover either the private key $[g, D]$ or, in particular, the message $M$.
Since the decryption key is different from the encryption key, these algorithms are said to be asymmetric. The example given by Diffie and Hellman was based on the function $f(x) = \alpha^x$, which is an instance of one-way function, because it is easy to compute $\alpha^x$, as we will see below, but it is difficult to recover $x$ from $f(x)$.
In many typical realizations, $G$ is the cyclic group of the multiplicative structure of a finite field, for example $Z_p^*$, with $Z_p$ the field of remainders modulo a prime $p$, and $R$ is the additive structure of the ring of remainders modulo $p-1$.
A numerical idea, that holds in general, of the contrast between easy and hard to compute is offered by the complexity of exponentiation and the complexity of discrete logarithm computation.
Evaluation of powers. Let $x$ be an element of a multiplicative cyclic group $C$ of order $N$, and $n$ be a positive integer less than $N$. The power $x^n$ (function exp) can be evaluated with no more than $2\log_2 N$ multiplications in $C$ [47]. The algorithm can be easily described.
Setting $s = \lfloor \log_2 N \rfloor$, and writing $n < N$ in base 2,
$$
n = b_0 + b_1 2 + b_2 2^2 + \cdots + b_s 2^s , \qquad b_i = 0, 1 ,
$$
we have
$$
x^n = x^{b_0 + b_1 2 + b_2 2^2 + \cdots + b_s 2^s} = x^{b_0} x^{b_1 2} x^{b_2 2^2} \cdots x^{b_s 2^s} .
$$
Since
$$
x^{b_i 2^i} = (x^{2^i})^{b_i} ,
$$
the power $x^n$ can be obtained by evaluating the sequence of $s$ powers
$$
x , \; x^2 , \; x^{2^2} , \; \ldots , \; x^{2^i} , \; \ldots , \; x^{2^s} ,
$$
which can be done recursively with $s$ squarings, since $x^{2^{i+1}} = (x^{2^i})^2$, and then performing at most $s$ products; note that the powers $y^{b_i}$ cost nothing, because $b_i = 0, 1$. Assuming that the complexity of squaring is equivalent to the complexity of multiplying, the number of products sufficient to compute $x^n$ is at most
$$
s + s = 2 \lfloor \log_2 N \rfloor = O(\log_2 N) ,
$$
and in conclusion $C(\exp) = O(\log_2 N)$.
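The algorithm translates directly into code. The sketch below (a minimal version of the square-and-multiply procedure just described; the function names are mine) works in any group, given its product operation.

    # Sketch: compute x^n with at most 2*floor(log2 n) multiplications.
    def power(x, n, mul, one):
        result, square = one, x
        while n > 0:
            if n & 1:                        # bit b_i = 1: multiply into the result
                result = mul(result, square)
            square = mul(square, square)     # x^(2^(i+1)) = (x^(2^i))^2
            n >>= 1
        return result

    # modular example: matches Python's built-in pow(7, 560, 561)
    print(power(7, 560, lambda a, b: (a * b) % 561, 1))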
Shanks' algorithm for computing the discrete logarithm. The idea for computing the discrete logarithm with a complexity smaller than that of an exhaustive search is fairly simple, and was proposed by Daniel Shanks in the 1970s. No better algorithm that works in general cases has yet been found.
Let $g$ be a generator of a cyclic group $C$ of order $N$. Given $x \in C$, the integer $n$ such that $x = g^n$ is called the discrete logarithm of $x$ to base $g$, and is written $n = L_g(x)$. The discrete logarithm satisfies all the properties of usual logarithms, although the most typical property, that is, the logarithm of the product,
$$
L_g(xy) = L_g(x) + L_g(y) \bmod N ,
$$
must be considered modulo $N$, since $n < N$ is actually an element of $Z_N$.
The basic idea for computing the discrete logarithm $n$ of a given $x$ is to write $n = n_0 + n_1 \lfloor \sqrt{N} \rfloor$, where $n_0 < \lfloor \sqrt{N} \rfloor$ and $n_1 < \lceil \sqrt{N} \rceil$, and the problem is to find both $n_0$ and $n_1$. Writing
$$
x = g^{n_0 + n_1 \lfloor \sqrt{N} \rfloor} = g^{n_0} g^{n_1 \lfloor \sqrt{N} \rfloor} = g^{n_0} \left( g^{\lfloor \sqrt{N} \rfloor} \right)^{n_1} ,
$$
we may derive the key equality
$$
x g^{-n_0} = \left( g^{\lfloor \sqrt{N} \rfloor} \right)^{n_1} ,
$$
which suggests the following procedure for finding $n_0$ and $n_1$:
1. Construct a table with two columns; in the first column write the variable $n_1$, $0 \leq n_1 < \lceil \sqrt{N} \rceil$, and in the second column the power $\left( g^{\lfloor \sqrt{N} \rfloor} \right)^{n_1}$.
2. Sort the table rows according to the second column. These operations can be done off line and require a memory of size $O(\sqrt{N})$. Note that search in sorted sequences of length $m$ has complexity $O(\log m)$.
3. Compute sequentially $z_{n_0} = x g^{-n_0}$ for $n_0$ varying from 0 to $\lfloor \sqrt{N} \rfloor$, then search for $z_{n_0}$ in the second column of the table. When it is found, we have $n_0$, and meanwhile $n_1$ from the first column. This step requires no more than $\sqrt{N}$ iterations.
In conclusion, the complexity of computing the discrete logarithm is at most of the order $C(DL) = O(\sqrt{N})$.
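A compact implementation of the procedure is sketched below (assumed toy parameters: the group is $Z_{101}^*$ with generator $g = 2$; a hash map plays the role of the sorted table, with the same effect on the search cost).

    # Sketch of Shanks' baby-step giant-step discrete logarithm.
    from math import isqrt

    def shanks_dlog(g, x, p, N):
        # find n with g^n = x (mod p); N = group order
        s = isqrt(N)                               # floor(sqrt(N))
        giant = pow(g, s, p)
        table = {pow(giant, n1, p): n1 for n1 in range(s + 2)}
        g_inv = pow(g, -1, p)                      # needs Python >= 3.8
        z = x
        for n0 in range(s + 1):                    # z = x * g^(-n0)
            if z in table:
                return table[z] * s + n0           # n = n0 + n1*floor(sqrt(N))
            z = (z * g_inv) % p
        return None

    p, g = 101, 2                                  # 2 is a primitive root mod 101
    print(shanks_dlog(g, pow(g, 77, p), p, 100))   # prints 77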


Remark: Complexity bounds. The pair $f$ and $g$ of direct and inverse functions introduced in the above example is an instance of one-way function whose complexities of direct and of inverse computation are of the order $O(\log_2 N)$ and $O(\sqrt{N})$, respectively. These rather naive complexity bounds have not been improved after forty years of research, and seem to be the contrasting bounds for defining one-way functions. Theoretically they cannot be improved, in the sense that the lower bound order cannot be reduced and the upper bound order cannot be increased.
Definition 6.2. The elements of a sub-class of one-way functions satisfying the further condition of being hard to invert only in the absence of suitable private information are called trapdoor one-way functions, or trapdoor functions for short.
Trapdoor functions may be weaker than one-way functions without trapdoor. The question is open, but what is remarkable is that many applications of the public key concept necessarily require trapdoor functions.
Since the concept of public key was introduced along with the notion of one-way function, myriads of algorithms, taken from many different areas of mathematics, have been proposed claiming that they comprise one-way functions. Most of these algorithms are based on elementary properties of the integer numbers. The hypothesis that one-way functions exist is based on the matter of fact that certain functions have never been easy to invert, or that an algorithm that efficiently computes their inverse has not yet been found. What emerged, and defines the present status, is that three kinds of problems (all taken from number theory) provide the following inverse functions, which are difficult to compute:
1. discrete logarithms in cyclic groups;
2. factoring of integers;
3. searching unsorted data.
At present, no better way to compute these inverse functions, defined on sets of size $N$, has been found than to perform skilled exhaustive searches of complexity $O(\sqrt{N})$. The corresponding direct functions have small complexity, that is, they may be computed with complexity $O(\log N)$. This state of affairs is better explained in Chapter 8, which is devoted to computational complexity.
Although the existence of one-way functions has never been proved, public key cryptography is pervading civil life, the economy, and banking systems. Ironically, the security, indeed the very existence, of these systems is based on unproved assumptions.
In the following sections, the most significant instances of one-way functions used today will be described.

6.2  The RSA Scheme
In 1977, Rivest, Shamir, and Adleman proposed a one-way function based on the difficulty of factoring a number product of two primes, which of course should be large, i.e. each factor should have at least 100 digits. The factorization problem has been considered difficult since Gauss's time, and much effort has unsuccessfully been devoted to finding an efficient factoring method. The ingenious idea of Rivest, Shamir, and Adleman was to turn this difficulty into an advantage for cryptography.
Given two distinct prime numbers $p$ and $q$, let $N = pq$ be their product, and $E$ be an integer less than, and relatively prime with, the value $\phi(N) = (p-1)(q-1)$ of the Euler totient function. A message $m$ to be encrypted is encoded as a number $M$ of the ring $Z_N$, then $M$ is encrypted as
$$
C = M^E \bmod N .
$$
The computation is easy because, as we have seen above, this exponentiation can be done in at most $2\log_2(N)$ products. The decryption operation is equally easy if we have a number $D$ such that $D \cdot E = 1 \bmod \phi(N)$: the computation of $D$ is easy, as we will show below, if we know $\phi(N)$; otherwise it is considered difficult.
Assuming that $M$ is relatively prime with $N$ (a non-restrictive condition, as we will see), the Fermat-Euler theorem says that $M^{\phi(N)} = 1 \bmod N$; therefore the condition $D \cdot E = 1 + \lambda\phi(N)$, for some integer $\lambda$, implies a chain of equalities
$$
C^D = M^{DE} = M^{1 + \lambda\phi(N)} = M \bmod N ,
$$
which shows that the message is easily recovered from $C$ and $D$. Now, the computation of $D$ from $E$ and $\phi(N)$ is easy using the generalized Euclid algorithm, which allows us to easily solve the Diophantine equation $xE + y\phi(N) = 1$ whenever $E$ and $\phi(N)$ are relatively prime, as by hypothesis.
Since the procedure for computing $D$ is based on the knowledge of $\phi(N) = (p-1)(q-1)$, the value of $\phi(N)$ is easily computable if the factorization of $N$ is known. Conversely, if $b = \phi(N)$ is known, then the factors of $N$ can be easily computed. We have
$$
b = \phi(N) = (p-1)(q-1) = pq - (p+q) + 1 = N + 1 - (p+q) ;
$$
then, given $b$, we may compute $p + q = N + 1 - b$, and the problem is to find two numbers ($p$ and $q$) knowing their sum and their product. This is a high-school problem: we consider the second degree equation
$$
Z^2 - (N + 1 - b) Z + N = 0 ,
$$
and compute its roots
$$
p, q = \frac{(N+1-b) \pm \sqrt{(N+1-b)^2 - 4N}}{2}
$$
using the well-known formula for its roots.
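As code, the recovery of the factors from $b = \phi(N)$ is a few lines (a sketch with assumed toy numbers):

    # Sketch: recover p and q from N = p*q and b = phi(N).
    from math import isqrt

    def factors_from_totient(N, b):
        s = N + 1 - b                    # p + q
        r = isqrt(s * s - 4 * N)         # sqrt of the discriminant, i.e. p - q
        assert r * r == s * s - 4 * N
        return (s - r) // 2, (s + r) // 2

    print(factors_from_totient(3233, 3120))   # (53, 61), since 3233 = 53*61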
The outlined public key mechanism is known as the RSA algorithm (from the initials of the inventors' names: Rivest, Shamir, and Adleman). It is based on the difficulty of factoring; however, it is not known whether it can be broken without factoring. In other words, given $E$ and $N$, it has not been proved that factoring $N$ and inverting the function $f(M) = M^E \bmod N$ are polynomially equivalent problems.
From the practical standpoint, the prime numbers used should have more than 150 digits.
Historical curiosity. In August 1977, in Martin Gardners Mathematical Games
column in Scientific American, a problem posed by Rivest, Shamir, and Adleman
appeared. It consisted of the following challenge:
Alice broadcasts her public exponent E and her modulus N , where E = 9007 and
N = 1143816257578888676692357799761466120102182967212423625625618429
35706935245733897830597123563958705058989075147599290026879543541
Eve has intercepted the cipher text
C = 9686961375462206147714092225435588290575999112457431987469512093
0816298225145708356931476622883989628013391990551829945157815154
What is the message exciting Eves curiosity?
This problem became known as the RSA-129 problem, because N has 129 digits. In order to decipher the message, the only known way to proceed is to factor the 129-digit N into the product of primes.
In April 1994, a team consisting of Derek Atkins, Michael Graff, Arjen Lenstra,
and Paul Leyland succeeded in factoring RSA-129. They used the double large
prime variation of the multiple polynomial quadratic sieve factoring method. The
sieving step was carried out in 8 months by about 600 volunteers from more than
20 countries. The end result was
N = 3490529510847650949147849619903898133417764638493387843990820577
32769132993266709549961988190834461413177642967992942539798288533
When decrypted with the secret exponent the message was
The magic words are squeamish ossifrage.
The use of the RSA needs a procedure for key generation, which requires the generation of random large prime numbers (over 100 digits) and of a random number $E$ relatively prime with the Euler totient function of the modulus. It is then necessary to specify the encryption procedure, which must include an encoding rule for the alphanumeric texts. Lastly, a decryption algorithm must be specified which utilizes the computational resources in some efficient (if possible optimal) way. The most expensive computations are the modular multiplications of numbers with more than 200 digits.
The procedure is described under the hypothesis that Alice creates the public key, and Bob wants to send a message encrypted using Alice's public key.
Alice's key generation operations.
1. Randomly choose two prime numbers $p$ and $q$, with more than 100 digits each.
2. Compute the modulus $N = pq$.
3. Compute the Euler totient function $\phi(N) = (p-1)(q-1)$.
4. Choose $E$ randomly (possibly in the interval $[\max\{p, q\}+1, N-2]$) such that the greatest common divisor $\gcd\{E, \phi(N)\}$ is 1.
5. Compute $D = E^{-1} \bmod \phi(N)$ using the generalized Euclid algorithm.
6. Alice's public key is the pair $\{N, E\}$, which can be sent over any public channel, or published in an open directory.
7. The private key consists of $p$, $q$, and $D$.
Bob's encryption operations.
1. Partition the message into blocks of letters that can be encoded into numbers not greater than $N$ (approximately of 250 digits). Let $M$ be an encoded block that must be encrypted.
2. Compute and send
$$
C = M^E \bmod N .
$$
Alice's decryption operations.
1. Receive $C$, and obtain $M$ by computing the power
$$
M = C^D \bmod N ,
$$
since she knows $D$.
However, Alice knows more; in particular she knows the factors $p$ and $q$. Taking advantage of these, the decryption operations may be accelerated using the Chinese Remainder Theorem (CRT). The procedure is the following:
From $p$ and $q$ determine $\epsilon_1$ and $\epsilon_2$ such that
$$
\begin{cases}
\epsilon_1 = 1 \bmod p \\
\epsilon_1 = 0 \bmod q \\
\epsilon_2 = 0 \bmod p \\
\epsilon_2 = 1 \bmod q
\end{cases} \qquad (6.1)
$$
Thus, the decryption operations may be done working with the moduli $p$ and $q$ separately:
$$
M_1 = C^D \bmod p , \qquad M_2 = C^D \bmod q .
$$
The message $M$ is obtained as
$$
M = M_1 \epsilon_1 + M_2 \epsilon_2 \bmod N
$$
by the CRT.
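An end-to-end toy run of these operations is sketched below (illustrative 3-digit primes of my choosing; real keys need primes of more than 100 digits). The exponents modulo $p$ and $q$ are additionally reduced via Fermat's theorem, a standard speedup that the CRT decryption allows.

    # Sketch: RSA key generation, encryption, and CRT decryption.
    p, q = 211, 251
    N = p * q                        # modulus, 52961
    phi = (p - 1) * (q - 1)          # 52500
    E = 13                           # gcd(E, phi) = 1
    D = pow(E, -1, phi)              # generalized Euclid algorithm (Python >= 3.8)

    M = 4242                         # encoded message block, M < N
    C = pow(M, E, N)                 # Bob's encryption

    # Alice's CRT decryption: interpolation numbers satisfying (6.1)
    eps1 = pow(q, -1, p) * q         # 1 mod p, 0 mod q
    eps2 = pow(p, -1, q) * p         # 0 mod p, 1 mod q
    M1 = pow(C, D % (p - 1), p)      # C^D mod p
    M2 = pow(C, D % (q - 1), q)      # C^D mod q
    assert (M1 * eps1 + M2 * eps2) % N == M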
Observation. In the proof that the RSA encryption and decryption algorithms work, it is assumed that $M$ is relatively prime with $N = pq$. Now we prove that this condition is not necessary, that is, $M$ may be either a multiple of $p$ or a multiple of $q$; it cannot be a multiple of both, because it is less than $N$ anyway. Nevertheless, it seems wise to check the relative primality between $M$ and $N$: if by chance they are not relatively prime, an adversary may factor $N$.
If $\gcd\{M, N\} = 1$, the above proof that RSA decryption works correctly was based on the Euler-Fermat theorem. The following proof, based on the CRT (Chinese Remainder Theorem), shows that the inversion mechanism of the RSA works equally well for every $M < N$, whether or not $M$ is relatively prime with $N$.
The encoded message $M$ is decomposed according to the CRT as
$$
M = M_1 \epsilon_1 + M_2 \epsilon_2 \bmod N ,
$$
where $M_1 = M \bmod p$ is an element of $Z_p$ and $M_2 = M \bmod q$ is an element of $Z_q$; the interpolation numbers $\epsilon_1$ and $\epsilon_2$ were defined in (6.1), and are computed using the generalized Euclid algorithm to find a solution of the equation $xp + yq = 1$. Given the choice of $E$ and $D$, we have
$$
M^{ED} = M_1^{ED} \epsilon_1 + M_2^{ED} \epsilon_2 = M_1 \epsilon_1 + M_2 \epsilon_2 = M \bmod N ,
$$
because $M_1^{ED} = M_1 \bmod p$ and $M_2^{ED} = M_2 \bmod q$, whatever $M_1$ and $M_2$ may be. Clearly, it is sufficient to check only one of these two identities. If $M$ is relatively prime with $p$, so is $M_1$, thus we have
$$
M_1^{ED} = M_1^{1 + \lambda(p-1)(q-1)} = M_1 \left( M_1^{(p-1)} \right)^{\lambda(q-1)} = M_1 \bmod p ,
$$
because $M_1^{(p-1)} = 1 \bmod p$. While, if $M_1$ is divisible by $p$, then $M_1 = 0 \bmod p$, and obviously we have
$$
M_1^{ED} = 0 = M_1 \bmod p .
$$

6.3  The Rabin Scheme

The Rabin scheme, like the RSA, consists of a one-way function based on the difficulty of factoring. Moreover, unlike the RSA, it will be seen that breaking the Rabin scheme is equivalent to factoring.
Let $N = pq$ be a product of two primes $p$ and $q$, which are taken congruent to 3 modulo 4 to avoid computational issues. The Rabin function is defined in the ring of remainders modulo $N$ as a simple square
$$
C = M^2 \bmod N . \qquad (6.2)
$$
In spite of its simplicity, the inverse function, formally $M = \pm\sqrt{C}$, is difficult to compute; thus we have a one-way function. Actually, we will prove that to obtain $M$ from $C$ is equivalent to factoring $N$.
Given $C$, if $p$ and $q$ are known, computing $M$ is easy using the CRT, because the two equations
$$
u^2 = C \bmod p , \qquad v^2 = C \bmod q
$$
can be solved easily by computing powers
$$
u = \pm C^{\frac{p+1}{4}} \bmod p , \qquad v = \pm C^{\frac{q+1}{4}} \bmod q .
$$
For example, since $C$ is a quadratic residue modulo $p$, we have $C^{\frac{p-1}{2}} = 1 \bmod p$, with the exponent $\frac{p-1}{2}$ odd; thus, writing
$$
u^2 = C^1 = C \cdot C^{\frac{p-1}{2}} = C^{\frac{p+1}{2}} ,
$$
we get $u = C^{\frac{p+1}{4}}$ because $p+1$ is divisible by 4, and the second root is obviously $-u$. Therefore, combining the four values $\pm u$ and $\pm v$, we obtain four roots for the equation $M^2 = C \bmod N$:
$$
\begin{cases}
z_1 = u\epsilon_1 + v\epsilon_2 \\
z_2 = u\epsilon_1 - v\epsilon_2 \\
z_3 = -u\epsilon_1 + v\epsilon_2 \\
z_4 = -u\epsilon_1 - v\epsilon_2
\end{cases} \qquad (6.3)
$$
where $\epsilon_1 = qa$ and $\epsilon_2 = pb$ have been deduced from the relation $qa + pb = 1$, which was produced by the generalized Euclid algorithm.
which was produced by the generalized Euclid algorithm.
For decryption, selecting the correct value for M requires the knowledge of two
bits (b0 , b1 ). These bits should be computed at the encryption side without knowing the factorization of N , and sent together with the encrypted message C. A
possible choice, first proposed by Williams [61] is the following
1. The first bit specifies the parity of M , that is
b0 = M mod 2
because if the four roots zi are considered modulo N in the interval from 1
to p 1, two roots are even and two roots are odd.
2. The second bit is obtained computing the Jacobi-Legendre symbol



1
M
1+
.
b1 =
N
2
Bob and Alice, who want to communicate privately with respect to Eve, have public keys $N_B$ and $N_A$, respectively. To generate her key, Alice chooses two large random primes $p$ and $q$ congruent to 3 modulo 4, and computes
$$
N_A = p \cdot q .
$$
A communication protocol is the following:
Bob's encryption operations.
1. Retrieve Alice's public key $N_A$ from the public directory.
2. Produce a message $m$ encoded as a number of $Z_{N_A}$.
3. Encrypt $m$, computing $C = m^2 \bmod N_A$.
4. Compute the parity bit $b_0 = m \bmod 2$.
5. Compute the symbol $\left( \frac{m}{N_A} \right)$ and obtain $b_1 = \frac{1}{2} \left[ 1 + \left( \frac{m}{N_A} \right) \right]$.
6. Send the triple $[C, b_0, b_1]$ to Alice.
Alice's decryption operations.
1. Compute the four roots of $X^2 = C$.
2. Take the two roots $z$ and $w$ with parity $b_0$.
3. Compute their respective Jacobi-Legendre symbols and the corresponding $b_1$'s; keep the root corresponding to the $b_1$ received.
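A toy run of the protocol (a sketch with assumed small primes $p = 7$, $q = 11$, both congruent to 3 modulo 4; the Jacobi symbol routine is the standard binary algorithm) shows how the two bits single out the correct root.

    # Sketch of Rabin encryption and decryption.
    def jacobi(a, n):
        # Jacobi symbol (a/n) for odd n > 0
        a %= n
        result = 1
        while a:
            while a % 2 == 0:
                a //= 2
                if n % 8 in (3, 5):
                    result = -result
            a, n = n, a
            if a % 4 == 3 and n % 4 == 3:
                result = -result
            a %= n
        return result if n == 1 else 0

    p, q = 7, 11                     # toy primes, both 3 mod 4
    N = p * q
    eps1 = pow(q, -1, p) * q         # 1 mod p, 0 mod q   (Python >= 3.8)
    eps2 = pow(p, -1, q) * p         # 0 mod p, 1 mod q

    m = 20                           # message, m < N
    C = m * m % N                    # encryption (6.2)
    b0, b1 = m % 2, (1 + jacobi(m, N)) // 2

    u = pow(C, (p + 1) // 4, p)      # square roots mod p and mod q
    v = pow(C, (q + 1) // 4, q)
    roots = [(s * u * eps1 + t * v * eps2) % N
             for s in (1, -1) for t in (1, -1)]      # the four roots (6.3)
    recovered = [z for z in roots
                 if z % 2 == b0 and (1 + jacobi(z, N)) // 2 == b1]
    assert recovered == [m]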
As previously recalled, and as Rabin showed, the difficulty of inverting the function (6.2) is equivalent to factoring the modulus N .
We have proved that knowing the factorization of N we may compute in polynomial-time complexity the four roots. Conversely, knowing the four roots zi i =
1, 2, 3, 4 of equation (6.2), then we have
zi2 zj2 = 0

mod N

i 6= j ,

hence the factorization


(zi + zj )(zi zj ) = 0

mod N .

Necessarily each of the factors (zi + zj ) and (zi zj ) must be divisible by one
and only one of the prime factors N , then the greatest common divisor gcd{(zi +
zj ), N } is one of the factors of N .
An even faster way of looking at this property is to consider equation (6.3) and
add z1 and z2 getting z1 + z2 = 2u1 = 2uaq from which the factor q of N can be
recovered simply by computing the greatest common divisor gcd{z1 + z2 , N }.
It is remarked that the difficulty of solving second degree equations in Z_N is strictly connected with the fact that the group Z_N^* of invertible elements of Z_N is not cyclic.

6.4  The El Gamal Scheme

The El Gamal public key encryption algorithm is based on the discrete-log problem, and is a revisitation of the Diffie-Hellman public-key exchange, designed to avoid a key exchange at every conversation. The public key may be stored in a public directory, from which it is retrieved at each conversation, and does not require the co-presence of both parties for its preparation.
The system is set up by some authority that manages a public directory, generates a large prime p, chooses a primitive element g of the multiplicative group Z_p^*, and makes public the pair [p, g].
Bob and Alice, who want to communicate privately with respect to Eve, know that they have public keys B and A, respectively. To generate her key, Alice chooses a random integer a relatively prime with p - 1 and computes

    A = g^a ,

working in the field F_p. A communication protocol is the following:
Bob's encryption operations.

1. Produce a message m encoded as a number of F_p.
2. Retrieve Alice's public key A from the public directory.
3. Generate a random number e relatively prime with p - 1.
4. Compute the ephemeral key c1 = g^e.
5. Compute the masking factor E = A^e.
6. Encrypt the message as c2 = m E.
7. Send the pair [c1, c2] to Alice.

Alice's decryption operations.

1. Compute z = c1^a.
2. Compute the message as m = z^(-1) c2.

The algorithm works because of the following chain of identities:

    z^(-1) c2 = c1^(-a) c2 = g^(-ea) m A^e = g^(-ea) m g^(ae) = m .
The encryption and decryption operations are performed by computing exponentiations and products in F_p, so their complexity is O(log p) products.
The adversary, Eve, who wants to know the message, retrieves A, p, and g, and captures c1 and c2. A possible attack is to find e from c1 by computing a discrete logarithm, which is an operation of complexity O(√p), then to obtain the message as c2 A^(-e).
It is an open question whether it is possible to retrieve m without computing a discrete logarithm.
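
A minimal Python sketch of the whole conversation, with toy parameters (p = 467, g = 2, and the key values are illustrative only; real primes have about a thousand bits), may help to fix the ideas:

    # Toy sketch of El Gamal encryption (illustrative parameters only).
    import random

    p, g = 467, 2                        # public prime and primitive element
    a = 153                              # Alice's secret key
    A = pow(g, a, p)                     # Alice's public key

    def encrypt(m):
        e = random.randrange(2, p - 1)   # ephemeral secret exponent
        c1 = pow(g, e, p)                # ephemeral key
        c2 = m * pow(A, e, p) % p        # message masked by E = A^e
        return c1, c2

    def decrypt(c1, c2):
        z = pow(c1, a, p)
        return pow(z, -1, p) * c2 % p    # m = z^(-1) c2

    c1, c2 = encrypt(123)
    assert decrypt(c1, c2) == 123
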

6.5  The McEliece Scheme

The McEliece scheme is abstractly based on the difficulty of searching unsorted data but, actually, it is based on the difficulty of decoding linear codes that do not have a special structure. This problem is known to be NP-hard when only generator and parity-check matrices of generic form, with no special structure, are known.
The principle is to use Goppa codes [54] with a trapdoor.
Let G be a k × n generator matrix of an (n, k, d) Goppa code over GF(q). The structure of G allows us to use efficient algebraic decoding algorithms for Goppa codes, of polynomial complexity.
Let A be a nonsingular random k × k matrix, and let P be an n × n permutation matrix. Consider the equivalent code with generator matrix G' = A G P, which has lost all symmetries and is considered difficult to decode, because the decoding algorithms of Goppa codes are not applicable to it.
The public key Kp = [G', t] consists of the disguised generator matrix A G P of the Goppa code, and of t = ⌊(d-1)/2⌋, the number of errors that the code may correct.
The procedure is described under the hypothesis that Alice creates the public key, and Bob wants to send a message encrypted using Alice's public key.

Alice's key generation operations.

1. Randomly choose a binary Goppa code generator matrix G of size k × n, with n close to or greater than 1000, and k such that t may be at least 100.
2. Generate a random nonsingular k × k matrix A.
3. Generate a random n × n permutation matrix P.
4. Compute the public generator matrix G' = A G P.
5. Compute t = ⌊(d-1)/2⌋.
6. The pair [G', t] is the public key, which can be sent over any public channel, or published in open directories.
7. The private key consists of G, A, and P.
Bob's encryption operations.

1. Partition the message into blocks x of k bits.
2. Encode every x into a vector c of length n as c = x G'.
3. Generate a random binary vector e of Hamming weight close to, but not greater than, t.
4. Compute and send the encrypted message

       y = c + e .
Alice's decryption operations.

i) Compute y P^(-1), given that P is known;
ii) Decode the Goppa code, correcting all errors, and find the concealed message X = x A (the error vector e may be computed if needed);
iii) Find x = X A^(-1).
The key point is step ii) of the decryption procedure: this is Goppa code decoding, which has an efficient algorithm correcting every pattern of errors up to the designed error-correcting capability, i.e. up to the designed minimum distance.
Therefore, deciphering is possible only for those who may exploit the efficient decoding algorithm, that is, those who know the permutation matrix P and also the random matrix A.
A limitation that has prevented diffusion of the McEliece scheme is the size of the key, which is of the order of 1 Mbit, especially if compared with the keys of a few kbits required by other schemes. An advantage is that the upper bound to the decryption (decoding) complexity may be of the order of the size of the key, and not of its logarithm.
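
The following Python sketch shows the mechanics of the scheme at toy scale, with the [7,4,3] Hamming code and its syndrome decoder standing in for a Goppa code and its algebraic decoder; the matrices A and P below are illustrative stand-ins, and a real instance needs n close to 1000 and t near 100:

    # Toy McEliece-style sketch over F_2 ([7,4,3] Hamming code, t = 1 error).
    import random

    G = [[1,0,0,0,0,1,1],                # generator matrix, systematic form [I | P]
         [0,1,0,0,1,0,1],
         [0,0,1,0,1,1,0],
         [0,0,0,1,1,1,1]]
    H = [[0,1,1,1,1,0,0],                # parity-check matrix [P^T | I]
         [1,0,1,1,0,1,0],
         [1,1,0,1,0,0,1]]
    Ht = [list(col) for col in zip(*H)]  # columns of H: syndromes of single errors

    def vec_mat(x, M):                   # row vector times matrix, over F_2
        return [sum(xi & row[j] for xi, row in zip(x, M)) % 2
                for j in range(len(M[0]))]

    A    = [[1,0,0,0],[1,1,0,0],[0,1,1,0],[1,0,1,1]]   # secret nonsingular matrix
    Ainv = [[1,0,0,0],[1,1,0,0],[1,1,1,0],[0,1,1,1]]   # its inverse over F_2
    perm = [3,0,6,2,5,1,4]                             # secret permutation P

    def encrypt(x):                      # public operation: y = x A G P + e
        c = vec_mat(vec_mat(x, A), G)
        y = [c[perm[i]] for i in range(7)]
        y[random.randrange(7)] ^= 1      # random error of weight t = 1
        return y

    def decrypt(y):
        c = [0] * 7
        for i in range(7):               # undo the permutation P
            c[perm[i]] = y[i]
        s = vec_mat(c, Ht)               # syndrome c H^T locates the error
        if any(s):
            c[Ht.index(s)] ^= 1
        return vec_mat(c[:4], Ainv)      # systematic code: c[:4] = x A

    x = [1, 0, 1, 1]
    assert decrypt(encrypt(x)) == x
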


Chapter 7
Electronic signatures
He that will not apply new remedies must expect
new evils; for time is the greatest innovator.
Francis Bacon

7.1  Introduction

The electronic signature is a dream. Nevertheless, electronic signatures of documents, purchase orders, and certificates are implemented and commonly used. It is a dream not because the proposed algorithms are unreliable, or are difficult to realize (theoretically, the question of their existence is still open), but because the implementation of all available signature schemes is rather cumbersome for the user, and appears as a mysterious object. If it is to become an everyday practice, substituting the usual hand signature, the electronic signature requires user-friendly schemes which, at the present state of the art, could be implemented only based on automatic biometric recognition of the signer. But the performance of biometric systems is still far from satisfactory.
In any case, all electronic signature schemes implemented so far, which replace hand-written signatures, have been made possible by public key cryptography, which provides the indispensable framework and methods.
An electronic signature should have the same features as a hand-written signature in ink on a paper document. As we will see, hand-written and electronic signatures are only almost equivalent, even if in most practical cases they work, in the abstract, in the same way, and do the same job.
Historically, signatures have had two, possibly co-existing, purposes, which actually are independent concepts:

- to guarantee the authenticity of a document, that is, to assure any third party that the document has been prepared by a clearly specified subject (the signer);

- to assure a second party that the document shall not be unrecognized or denied by a clearly defined subject (the signer).
In a certain sense these two objectives are dual to one another, even if in many circumstances they co-exist. Note that the double valency sometimes occurs in a natural way; in other cases, only one of the two possibilities happens. For example:

- In the case of the hand signature of a cheque, the signature guarantees the beneficiary against repudiation, and assures the bank that the cheque has been issued by the legitimate owner of the bank account.

- In the case of wills, the only aim of the signature is to guarantee the authenticity of authorship, possibly with an endorsement signature by a notary; however, the electronic signature further assures integrity, and could guarantee secrecy.

- In the case of purchase orders, the main aim of the signature is to avoid repudiation. However, in the case of Internet customers, it should also guarantee the identity (authenticity) of the customer.
Authentication is a fundamental component of the signature procedure. However, authentication has a broader meaning and use; we have:

- Authentication of individuals (access control, voting systems, document authorship).

- Authentication of objects (telepass, airplanes, missiles).

- Authentication of signals (remote control, tele-alarm, radar surveillance, landing systems).

Any classical method of authentication has commonly been based on two items: 1) a sign (typically a USERNAME) that uniquely identifies the subject, and 2) an authenticator of this sign (typically a PASSWORD, or a value computed from this sign and possibly some other parameter by means of a one-way function).
Biometrics. As regards the authentication of individuals, it is interesting to mention the research trends in methods for (unequivocally) identifying humans, with the further aim of avoiding impersonation. In all situations where personal identity is important, the authentication procedures tend to be based on biometrics.

Definition 7.1. Biometrics is defined as the characteristics that may be used to recognize humans based on intrinsic physical or behavioral traits.

Since human traits can only be defined approximately, and are variable over time, it is impossible to recognize a person without error through biometrics, because the process consists of comparing the actually measured parameters with a stored version of them, and acceptance is based on a metric (or distance) between measured and stored parameters, an operation that may lead to two basic kinds of error:

1. accepting the wrong individual (false recognition);
2. not recognizing the right individual (false alarm).
The consequences of error may be very dangerous in both cases. In order to improve the reliability of biometric authentication systems, and to make their use less tedious, the Defense Advanced Research Projects Agency (DARPA) launched a program, the Active Authentication program, to shift the focus during authentication from the password (including rough forms of biometrics) to the person using the information system.
The current standard method for validating a user's identity for authentication on information systems requires humans to do something inherently unnatural: create, remember, and manage long, complex passwords. Moreover, as long as sessions remain open, typical systems incorporate no mechanisms to verify that the person who originally authenticated the access is the same person still in control of the keyboard. Unauthorized individuals can thus improperly obtain access to information system resources if a password is compromised, or if a user does not exercise adequate vigilance after the initial authentication.
The new approach seeks to address this problem by changing the game, shifting the focus during authentication from the password to the person using the information system. To most people, this shift would mean authentication using biometric sensors. However, some issues affect the way biometrics is currently used, thus the challenge is to develop software-based biometrics focused more on authenticating people than on deploying new sensors.
The DARPA program, lasting from 2011 to 2015, is divided into three phases:

Cognitive Fingerprints. This phase aims to find biometric information that does not require the installation of additional hardware or special sensors, and to develop ways to capture aspects of cognitive fingerprints (behavioral traits, writing speed, posture or gestures) through use of computer mouse and keyboard. A further objective is the continuous authentication of users, or guests.

Going Mobile. The aim is to develop a prototype solution that integrates all available biometrics using a new authentication platform suitable for deployment on standard desktops and laptops.

Integration, Testing, and Deployment Plans. This phase considers the development of a framework to integrate multiple biometrics on mobile devices as well as desktops. Extensive Individual Validation and Verification (IV&V) and adversarial partner efforts would be employed to ensure that the authentication platform itself would not be susceptible to attack or misuse.

7.1.1  Electronic signature of an electronic document

The procedure that defines the electronic signature of an electronic document must associate to the document a unique number that cannot be altered and cannot be constructed by any person other than the legitimate signer. To this aim, an electronic signature of a document is composed of two parts:

1. A plain Signature, consisting of an identifier of the signer, an identifier of the document, and possibly the date and other properties that must be considered.

2. An Authentication, that is, an expression obtained as a transformation of the plain Signature: this transformation should not be performable by anyone other than the legitimate signer.
Electronic signature verification consists of a transformation that anybody can perform on the pair Signature-Authentication to check that the pair has been produced by the claimed signer.
In any authentication procedure, the above scheme must be interpreted with a certain flexibility. For example, consider the authentication process of user Alice to access her computer:

1. Alice chooses a pair of strings: a USERNAME (her personal public identifier), and a PASSWORD (her secret authenticator), to be stored in her computer's permanent memory.

2. The PASSWORD stored in the computer must be encrypted by a one-way transformation (encrypt), to prevent intruders from capturing the secret word.

3. When Alice accesses the computer using USERNAME/PASSWORD, the procedure is:

   (a) Alice inputs USERNAME, and the computer looks up the corresponding encrypted password encrypt(PASSWORD) stored in its database;
   (b) Alice inputs password, and the computer applies the transformation encrypt and produces encrypt(password);
   (c) the computer checks whether

           encrypt(PASSWORD) = encrypt(password) ;

       if the equality holds, access is allowed, otherwise it is denied.

Evidently, the Authentication token is not a transformation of USERNAME; rather, the public expression of this token, i.e. PASSWORD, is available only as a message encrypted by a one-way transformation, thus the scheme fits the general model described above.
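
A minimal sketch of this access-control procedure, assuming SHA-256 as the one-way transformation encrypt (a real system would also add a per-user salt and a deliberately slow password-hashing function), is the following:

    # Sketch of the USERNAME/PASSWORD check with a one-way transformation.
    import hashlib

    def encrypt(word):                   # the one-way transformation
        return hashlib.sha256(word.encode()).hexdigest()

    stored = {"alice": encrypt("correct horse battery staple")}  # permanent memory

    def login(username, password):      # the access check
        return stored.get(username) == encrypt(password)

    assert login("alice", "correct horse battery staple")
    assert not login("alice", "guess")
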

Remark. In regard to this example, it must be pointed out that the system controlling access to a PC gives very weak protection to the secrecy of the data, because the data on the mass storage (hard disk) of the computer are not encrypted by default. Therefore, the access authentication process only prevents a passive attack on the data, but it does not protect against an active attack, for instance: dismantle the hard disk from the PC, read the data (make a back-up of the disk) using another computer, and re-assemble the hard disk on the PC.

7.2  Components of Electronically Signed Documents

The electronic signature of a document, in its most traditional form, consists of three items:

- A document, which will be denominated DOC.

- A plain signature that includes signer identification data, data uniquely tied to DOC, and other useful information. The stream encoding this information will be denoted SIGN.

- An authentication of SIGN, which will be denoted AUT.

The main part of this chapter will be devoted to the algorithms for computing AUT. However, some comments on the construction of SIGN, and on how to connect DOC to SIGN, are given first, to put the subject of electronic signatures into context.

7.2.1  Document

An electronic document to be signed is usually stored encoded in ASCII characters, and for its signature it is necessary to extract a short block of symbols, called the digest and denoted DIG, satisfying the following requirements:

1. DIG must depend on every symbol of DOC.
2. DIG should be easily computed from DOC, e.g. with a function of low complexity.
3. DIG changes if even a single symbol of DOC is modified.
4. It should not be possible (or should be unfeasible) to modify DOC in such a way that DIG remains unchanged.

The functions used to compute the digest DIG from a document DOC belong to the class of hash functions. These functions, ubiquitous in today's cryptographic systems, are defined considering two domains of numbers D and G.

Definition 7.2. A function f from D into G is called a hash function if i) Y = f(X) is easily computed for every X ∈ D, and ii) it is hard to find the counter-image of any Y ∈ G, that is, the subset D(Y) of D, also denoted f^(-1)(Y).

In applications where the signature requires the digest of a long document, the computation of a small block of symbols directly from the entire document is practically unfeasible. Therefore, the hash function of a document is computed through an iterative procedure, as described in the next paragraph. The same scheme used to create a digest may also be used for creating marks to check the integrity of streams, files, documents, and messages, or simply as a randomization function.
In the most typical schemes, the image domain G of the hash function is a small subset of the domain D.
Hash functions for signature. In this section, we refer only to hash functions for computing the digest of a message. The digest d of a document DOC is written as H(DOC) = d, where H(.) is the hash function that we intend to define, considering documents and hash values written as streams of symbols from small alphabets, usually the binary alphabet {0, 1}.
The document DOC is first encoded as a sequence

    DOC : M1, M2, ..., ML

of elements from a domain (alphabet) R. Consider a one-way function f(X, Y) defined from R × R into R. Then, a digest d ∈ R is computed recursively, starting from an arbitrary initial value D0 ∈ R, and using the recurrent relation

    D_i = f(D_{i-1}, M_i)    i = 1, ..., L .

At the end of the iterations, the last value DL is taken as the digest d.
Note that, in this description, the hash function outputs a single element of the ring R, which is the digest; therefore, a large number of messages is mapped onto the same digest d.
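
A minimal sketch of this recursive procedure is shown below; the compression function f, squaring plus the message symbol modulo a Mersenne prime, is a hypothetical toy chosen only to make the recurrence concrete:

    # Sketch of the iterative digest D_i = f(D_{i-1}, M_i) over the ring R = Z_P.
    P = (1 << 61) - 1                    # a Mersenne prime

    def f(d, m):                         # toy compression function f: R x R -> R
        return (d * d + m) % P

    def digest(doc, d0=1):               # d0 is the arbitrary initial value D_0
        d = d0
        for m in doc:                    # doc encoded as a sequence over R
            d = f(d, m)
        return d                         # the last value D_L is the digest

    print(digest([104, 101, 108, 108, 111]))    # digest of 'hello' as byte values
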
Definition 7.3. A hash function H(.) is said to be collision free if it is computationally unfeasible to find two messages DOC1 and DOC2 such that H(DOC1) = H(DOC2).
A further condition, which must hold for practical applications, is that the hash functions should be easily computable, in order to avoid annoying delays or an expensive computational burden. An interesting example of a hash function for producing the digest H(M) of a message M is the following:
    A message M is encoded into a binary stream M = b1, ..., b_{sm} of sm bits;
    each block of m consecutive bits may be interpreted either as an element of
    the field F_{2^m} or as an element yi of the ring Z_{2^m - 1}, i.e.

        yi = Σ_{j=1}^{m} b_{m(i-1)+j} 2^{j-1}  mod 2^m - 1

    is an element of Z_{2^m - 1}. Let α be a primitive element of F_{2^m}.
    Then, with initial value x0 = 1, a sequence x0, x1, ..., xs of length s is
    generated recursively as

        x_i = x_{i-1} + α^{yi}    i = 1, ..., s .

    At the end of the recursion we set

        H(M) = xs .

    Note that a block of m consecutive 0s or of m consecutive 1s represents the
    same element of Z_{2^m - 1}; this is not considered a serious drawback, for
    two reasons: 1) when m is large (≈ 200 bits) the probability of the event
    is very small, and 2) it is not required that the hash function acting on a
    symbol be invertible.
Observe that any one-way function may be used as the hash-function component in the recursive procedure for computing the digest. In particular, standard hash functions, like MD4 or SHA-1, may be used. However, even assuming that a valid hash function is used, particular attention should be paid to defining the recurrence, to avoid weaknesses that escape a superficial or hasty analysis.
Plain signature. The plain signature is a short message, known to everybody. Typically, this message consists of a string of alphanumeric characters formed as follows:

    Place   Date         First-Surname   Digest   Doc. Num.   Random Num.
    Paris   gg/mm/aaaa   Tom Paine       101745   SP 800-67   987354

where Place, Date, and First and Family name are the usual attributes of a signature; the document number identifies the document publicly; the random number identifies the signature (several signatures of the same document on the same day, or several copies of the same document); and the Digest uniquely connects the signature to the document. Note that the digest guarantees both the integrity and the authenticity of the document. For example, the above plain signature may be encoded as

    Paris&25/10/1811&TomPaine&101745&SP800-67&987354
The Digest is computed from DOC using a hash function producing a single number d. The hash function must be publicly known, and satisfy two further conditions:

- Given a digest d, it should be difficult to create a document with this digest: the hash function should be extremely difficult to invert.

- Given a message DOC with digest d, it should be computationally unfeasible to find a second message DOC1 with the same digest d.

7.2.2  Standard hash function SHA-1

The hash functions most widely used today are called SHA (Secure Hash Algorithm), followed by a number. The original SHA-1 is a hash function whose output is 160 bits. Later versions have longer outputs of 224, 256, 384, or 512 bits; see [60] for the official USA definition of SHA. The original algorithm is briefly described in the following table, omitting the specifics of the mixing operations.

Table 7.1 - The SHA-1 Hash Algorithm

  Input document DOC as a binary stream
  Break DOC (possibly with additional bits appended) into blocks Bj of 512 bits
  Define five specific initial values h0, h1, h2, h3, h4, each of 32 bits (these hi are specified in the standard)
  Initialize the chaining variables Hi = hi
  LOOP over the blocks Bj
      Break Bj into sixteen subblocks (words) of 32 bits
      Create a total of eighty words w0, w1, ..., w79 from the sixteen words, by XORs and rotations
      Set a = H0, b = H1, c = H2, d = H3, e = H4
      LOOP i = 0, 1, ..., 79
          Compute f as a logical function of b, c, d (i.e. in F2)
          Mix a, b, c, d, e by rotating some of their bits, and permuting them
          Add f and wi to a
      END i LOOP
      Update the chaining variables:
          H0 ← H0 + a, H1 ← H1 + b, H2 ← H2 + c, H3 ← H3 + d, H4 ← H4 + e
  END j LOOP
  Output H0 | H1 | H2 | H3 | H4
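
SHA-1 is available in virtually every programming environment, so in practice the algorithm of Table 7.1 is never re-implemented by hand; for instance, in Python the 160-bit digest of a document is obtained as follows:

    import hashlib

    DOC = b"The quick brown fox jumps over the lazy dog"
    print(hashlib.sha1(DOC).hexdigest())
    # 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
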

7.3  Signature based on RSA

A signature based on RSA, for generating the authentication A of a plain signature F, exploits the symmetry of the encryption/decryption operations, and the asymmetry of the encryption/decryption keys of RSA.
Let N = pq be the product of two (large) primes p and q (randomly generated), and let E be a (large) random number relatively prime with φ(N) = (p - 1)(q - 1).
Assume that the signer has an RSA public key [N, E], and that [N, D] is his secret key, where D has been computed as a solution of the modular equation E D = 1 mod φ(N). A plain signature F and its authentication A are encoded as numbers of Z_N. The signer uses his secret key to encrypt the plain signature F, thus producing the authentication A as

    A = F^D  mod N .

In this case, the electronic signature is the pair of numbers [F, A].
Any individual who wants to verify the signature retrieves the public key [N, E] of the signer and computes

    F' = A^E  mod N ;

if F' = F, the signature is accepted as valid, otherwise it is rejected.
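
A toy Python sketch of the signing and verification operations (the parameters p = 61, q = 53, E = 17 are classic illustrative values; real keys use moduli of thousands of bits):

    p, q = 61, 53
    N = p * q                            # 3233
    phi = (p - 1) * (q - 1)              # 3120
    E = 17                               # public exponent, coprime with phi(N)
    D = pow(E, -1, phi)                  # secret exponent: E D = 1 mod phi(N)

    F = 1234                             # plain signature, a number of Z_N
    A = pow(F, D, N)                     # authentication, via the secret key

    assert pow(A, E, N) == F             # verification with the public key [N, E]
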

7.4  Signature based on the Rabin scheme

A signature based on the Rabin scheme, for generating the authentication of a plain signature, uses, as in the case of RSA, the decryption procedure for producing an authentication, and the encryption operations for verification. The abstract principle is simple; however, there are several critical points that require careful analysis, and a more elaborate description.

Abstract Rabin scheme authentication. Let N = pq be the public key of the signer, with p and q primes congruent to 3 modulo 4. The plain signature F and the authentication A are numbers of Z_N. The signer finds a root A of the equation X^2 = F mod N and forms the pair [F, A]. Verification is easy, because it is sufficient to compute a square A^2 mod N and check whether it is equal to F. However, not every F may be authenticated, because not every F is a quadratic residue modulo N. To overcome this, a quadratic equation of the form

    X^2 + h X - U F = 0  mod N

is considered, where h and U should be chosen in such a way that the equation has roots in Z_N. Rewriting the equation in the form

    (X + h/2)^2 = (h^2/4 + U F) ,

it is seen that it is not restrictive to set h = 0; thus, U should be chosen in such a way that U F is a quadratic residue modulo N, that is, U F must be a quadratic residue modulo p, and a quadratic residue modulo q.
The multiplier U is called the padding factor, and may be computed by trial and error, or deterministically. In the second instance, the procedure is the following:

Write F = F1 ε1 + F2 ε2 by the Chinese Remainder Theorem, where F1 = F mod p, F2 = F mod q, and ε1, ε2 satisfy the relations

    ε1 + ε2 = 1
    ε1 ε2 = 0  mod N
    ε1^2 = ε1  mod N
    ε2^2 = ε2  mod N .

F is a quadratic residue modulo N if and only if (F/p) = (F/q) = 1, where (x/p) denotes the Legendre symbol. We have

    (F/p) = ((F1 ε1 + F2 ε2)/p) = ((F1 ε1)/p) = (F1/p) ,

because ε2 is divisible by p and (ε1/p) = 1. Similarly, we get (F/q) = (F2/q).
Observing that -1 is a quadratic nonresidue modulo p and modulo q, we have

    ((F1 (F1/p))/p) = (F1/p)^2 = 1 ,

thus F1 (F1/p) is a quadratic residue modulo p. Since a similar result holds for q, due to the Chinese Remainder Theorem, a padding factor is

    u = (F1/p) ε1 + (F2/q) ε2 .
Nevertheless, this u cannot be used as padding factor as it is, because when (F1/p) and (F2/q) have different signs, the knowledge of u allows one to factor N. Therefore, a padding factor is defined as U = R^2 u mod N, with R being a large random number.
The classic Rabin signature of a message m is then a triple (m, U, S), where U is a padding factor, found either randomly [61] or deterministically, as described above [23], such that the equation x^2 = m U is solvable, and S denotes one of its roots. Verification is performed by comparing m U with S^2. Unfortunately, this efficient mechanism is exposed to easy forgery.
Given a valid signature (m, U, S), a forgery attack computes S^2 or m U, chooses any message m', and computes U' as U' = S^2 m'^(-1); thus a valid signature is obtained as (m', U', S) without knowing the factorization of N. In the original proposal [62], a hash function H(.) is used instead of m, and S is a solution of x^2 = H(m U), but this does not help against the above forgery attack. This weakness is absent in the Rabin-Williams signature (cf. [31, 82]).
The Rabin-Williams signature (cf. [31, 82]), which is limited to pairs of primes where one is congruent to 3 and the other to 7 modulo 8, avoids the vulnerability to forgery. The signature is a four-tuple [m, e, f, S], where e ∈ {1, -1} and f ∈ {1, 2} are chosen to make the quadratic equation e f S^2 = H(m) mod N solvable in S, where H(.) is a convenient hash function. The non-forgeability is based on the limited set of multipliers e and f. However, the Rabin-Williams scheme requires the use of two primes respectively congruent to 3 and to 7 modulo 8, while the classic Rabin signature works with every pair of primes. A possible Rabin signature that avoids forgery and works for every pair of primes was devised in [51].
Before introducing a similar scheme, we recall some notions and definitions useful to make the presentation more expedite and sounder.

Definition 7.4. A signature of a message m, of the form [m, f1, f2, ..., fℓ], is said to be weakly non-forgeable if it is not feasible for an outsider to derive from it a valid signature [m', f1', f2', ..., fℓ'] for a given message m'.

Definition 7.5. A signature of a message m, of the form [m, f1, f2, ..., fℓ], is said to be strongly non-forgeable if it is not feasible for an outsider to derive from it a valid signature [m', f1', f2', ..., fℓ'] for any message m'.

In other words, a Rabin signature [m, f1, f2, ..., fℓ] is strongly non-forgeable if we cannot derive from it, without knowing the factorization of N, any valid signature [m̃, f̃1, f̃2, ..., f̃ℓ]. Instead, a Rabin signature [m, f1, f2, ..., fℓ] is weakly non-forgeable if we cannot derive from it, without knowing the factorization of N, a valid signature for a specified message m'.
For example, the Rabin-Williams signature [m, e, f, S] is weakly forgeable if the hash function is the identity function, i.e. H(u) = u, because we can derive a valid signature as [r^2 m, e, f, r S] for every factor r. But, depending on the hash function, this signature may be strongly non-forgeable. In the same way, the RSA signature [m, m^D], where D is the secret counterpart of the public key E, is weakly forgeable, because we can obtain a valid signature as [r^E m, r m^D] for every factor r.
Rabin signature. [24] Given a hash function H(.), the signature process is the following:

Public key: [N, H(.)]

Signed message: [m, F, R^3], where R is a secret random number, S is a root of the quadratic equation x^2 = H(m) U, with the padding factor U chosen as

    U = (H(m)/p) ε1 + (H(m)/q) ε2 ,

and F = R S.

Verification: Compute H(m), and check whether (R^3)^4 H(m)^6 = F^12 mod N; the signature is valid if and only if the equality holds.

The algorithm works because F^4 = R^4 H(m)^2, given that U^2 = 1.
For this scheme, the verification cost is seven squares and three products, plus the evaluation of a hash function. It is possible to make different signatures of the same message by choosing different random numbers R.
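
The following Python sketch implements the triple [m, F, R^3] and its verification with toy Blum primes; the hash H below is a hypothetical toy stand-in, and all parameters are illustrative:

    p, q = 7, 11                         # toy primes, both congruent to 3 mod 4
    N = p * q
    a, b = pow(q, -1, p), pow(p, -1, q)
    e1, e2 = q * a % N, p * b % N        # CRT idempotents

    def H(m):                            # hypothetical toy hash (values coprime with N)
        return (m * m + m + 1) % N

    def legendre(x, r):                  # Legendre symbol (x/r), r an odd prime
        s = pow(x, (r - 1) // 2, r)
        return -1 if s == r - 1 else s

    def sign(m, R):                      # R is the signer's secret random number
        h = H(m)
        U = (legendre(h, p) * e1 + legendre(h, q) * e2) % N   # padding factor
        w = h * U % N                                         # a QR modulo N
        S = (pow(w, (p+1)//4, p) * e1 + pow(w, (q+1)//4, q) * e2) % N
        return m, R * S % N, pow(R, 3, N)                     # [m, F, R^3]

    def verify(m, F, R3):
        return pow(R3, 4, N) * pow(H(m), 6, N) % N == pow(F, 12, N)

    assert verify(*sign(5, R=23))
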

Theorem 7.1. The signature [m, F, R^3] is weakly non-forgeable. It is weakly forgeable if H(z) = z, and strongly non-forgeable if H(.) is a convenient hash function, in particular if H(z) = z(z + 1).

PROOF. Given m', forgery is not possible because, choosing w.l.o.g. F', only a number K such that K H(m')^6 = F'^12 can be found, but not a fourth root of it. As above, weak forgeability in the case of H(z) = z follows from pseudo-homogeneity, and strong non-forgeability from the difficulty of inverting the hash function.

Note that using R^2 in the signature instead of R^3 would expose S^2, and therefore U, which would unveil the factorization of N if U is not ±1, but one of the other two roots of unity. Further, solving x^3 = R^3 is equivalent to factoring N if 3 | φ(N); otherwise, it is necessary to know φ(N) for computing R as the power (R^3)^((1 - δφ(N))/3), where δ is obtained as a solution of the Diophantine equation 3y + δφ(N) = 1.
Rabin's digital signature of a message M. Let N be Bob's public key, and H(.) a publicly-known hash function. To sign a document M electronically, Bob, who knows p, q, ε1 and ε2, proceeds as follows:

1. Bob computes the hash H(M) and forms a plain signature F;

2. Bob computes an authenticator A of F, by

   (a) choosing a random number R that remains secret;
   (b) finding a root S of x^2 = (F + 1) F U, where

           U = ((F + 1)F/p) ε1 + ((F + 1)F/q) ε2

       is a padding factor such that U^2 = 1;
   (c) setting A = R S;

3. The signed document consists of the four-tuple [M, F, A, R^3].

Verification consists of the following steps:

1. Retrieve Bob's public key N, and the hash function H(.);
2. Compute the digest H(M) and form the plain signature F';
3. If F' = F, then check the equality (R^3)^4 (F(F + 1))^6 = A^12.

7.5  Signature based on El Gamal

The El Gamal signature exploits the one-way property of discrete logarithms. It has been adopted for defining the Digital Signature Standard (DSS), an algorithm proposed by the American security institutions (NIST, NSA, etc.) for electronic signatures of public national use.
An implementation of the El Gamal signature is based on the public key system, as devised by Diffie and Hellman (for key exchange over a public channel), working in the cyclic group F_p^*, with p an appropriate large prime.
The system is specified by:

p : a large prime (≈ 1024 bits), such that p - 1 has at least one large prime factor (≈ 600 bits). This parameter is publicly known, together with:

g : a generator of the multiplicative cyclic group F_p^*.

D : a public directory storing users' public keys, which contains the pairs [Username, A] generated as follows:

    a : a number less than, and relatively prime with, p - 1, which is the secret key of the user Username;

    A = g^a mod p : the public key of Username.

H(.) : a hash function, publicly known, for computing the digest of a given message.
Authentication of a plain signature F. Suppose that F is the plain signature of Bob, who has a public key B and a secret key b. The authentication of F is produced as follows:

1. Bob generates a random number k < (p - 1), relatively prime with p - 1, which represents the secret key of the signature F.

2. Bob computes

       r = g^k  mod p ,

   which represents the public key of F; it is a number of F_p which may also be interpreted as a number of Z_{p-1}.

3. Bob computes

       s = [k^(-1) (F - b r)]  mod p - 1 ,

   that is, Bob solves for s the linear equation

       s k + b r = F  mod p - 1 .

4. The authenticated signature is [F, r, s].

Alice's verification of the authenticity consists of the following steps:

1. Alice gets [F, r, s] and retrieves B from the public directory;

2. Alice computes h1 = g^F and h2 = r^s B^r;

3. Alice accepts the signature as authentic if and only if h1 = h2.

The equality h1 = h2 guarantees the authenticity, because we have

    g^F = g^(sk + br) = (g^k)^s (g^b)^r = r^s B^r .
Digital signature of a message M. Bob signs a message M by creating a plain signature F and then an authentication of F. The plain signature is formed as follows:

1. Bob computes the digest H(M) of M;

2. Bob generates a random number R, and forms the plain signature as

       Bob Dolittle & Date & Place & Digest & Random ;

   then he encodes this message as a number F of Z_{p-1}.

The signed message consists of the five-tuple [M, F, R, r, s], where r and s are computed using El Gamal.
Alice's verification runs as follows:

1. Alice retrieves B from the public directory;

2. Alice computes the digest H(M) using the public hash function, then composes and encodes the plain signature as F' (first check);

3. She verifies the authentication of F', computing h1 = g^F' and h2 = r^s B^r (second check);

4. Alice accepts the document M as authentic if and only if F' = F and h1 = h2.
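
A toy Python sketch of the authentication and verification steps (the parameters p = 467, g = 2, and the key values are illustrative):

    from math import gcd
    import random

    p, g = 467, 2                        # public prime and generator
    b = 127                              # Bob's secret key
    B = pow(g, b, p)                     # Bob's public key

    def sign(F):
        while True:
            k = random.randrange(2, p - 1)
            if gcd(k, p - 1) == 1:       # k relatively prime with p - 1
                break
        r = pow(g, k, p)                 # public key of the signature
        s = pow(k, -1, p - 1) * (F - b * r) % (p - 1)
        return F, r, s

    def verify(F, r, s):                 # check g^F = r^s B^r mod p
        return pow(g, F, p) == pow(r, s, p) * pow(B, r, p) % p

    assert verify(*sign(100))
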

7.6  Blind signature

Blind signature schemes are cryptographic primitives useful in protocols that intend to guarantee the anonymity of the parties [24]. Blind signatures play important roles in e-commerce, e-money, and e-voting procedures. In fact, they were introduced by Chaum [12] for privacy-related protocols, where signer and message author are different parties.
The blind signature is a form of digital signature in which a message is disguised before it is signed, while the resulting signature can be publicly verified against the original message in the manner of a regular digital signature. Formally, a message m is disguised by means of a function d, and then submitted to the signer. The signed message [d(m), f1, f2, ..., fℓ] is then made public by the message author, in the form of a valid signed message, as [m, f1', f2', ..., fℓ'].
In principle, a blind Rabin signature is obtained as follows. Let A be the message author and B be the signer with Rabin public key N, which is the product of two Blum primes:

1. A wants the message m to be signed by B, without disclosing the message itself (or part of the message); he thus chooses a random number r and submits the disguised message r^2 m to the signer.

2. The signer B produces the signed message [r^2 m, u, S], where S is a root of x^2 = u r^2 m, and u is a random padding factor, and sends the signed message to A.

3. A receives the blindly signed message [r^2 m, u, S] and produces [m, u, S/r], the signature for the original message.

This simple mechanism may be subject to forgery and other kinds of attacks, for example the RSA blinding attack, which aims at using the blind signature protocol to decrypt messages that were encrypted using the public key of the signer.
Further, [24, Proposition 2] shows that the blind signer cannot use a strongly non-forgeable signature scheme; nevertheless, the open signed message may be strongly non-forgeable.
Let H(.) be a hash function used by the message author. Consider the following process:

Public key: [N, H(.)]

Disguised message: r^2 H(m), where m is the original message to be signed, and r is a random factor chosen by the author. This message is submitted to the blind signer.

Blindly signed message: [r^2 H(m), F, R^3], where F = R S is the product of two factors, with R a random number chosen by the signer, and S a root of the quadratic equation x^2 = r^2 H(m) u, the padding factor u being defined as in the Rabin signature.

Signed message: [m, F/r^2, R^3/r^3];

Verification: Check whether H(m)^6 (R^3/r^3)^4 = (F/r^2)^12; the signature is valid if and only if the equality holds.

The verification cost is seven squares and three products, plus the evaluation of a hash function.
The signature of the original message is strongly non-forgeable, and the blind signature is not vulnerable to the RSA blinding attack, as proved in [24].

7.7  Secret Sharing - Shamir

The secret sharing concept offers, in principle, a way to divide the information required to reproduce a secret key among several people, so that the secret key may be reconstructed only if a subset of those people (possibly one, or all of them) share their information.
Note that secret sharing may be seen as a signature with endorsement: all signers should be present to produce a valid signature, i.e. to recover the secret. A method based on linear algebra was devised by Shamir.
Assume that n individuals want to share a secret that can be reconstructed only when k of them are present. Suppose that a trusted person, generically called Warden, is the system designer.
Let m ∈ F_q, the secret to be shared, be an element of a finite field. Warden chooses k - 1 random numbers a1, a2, ..., a_{k-1} in F_q and forms the polynomial p_k(x) of degree k - 1

    p_k(x) = m + a1 x + a2 x^2 + ... + a_{k-1} x^{k-1} .

Note that the secret is the value of p_k(x) computed for x = 0.
Warden then generates n random numbers Ui ∈ F_q, which have the meaning of public identifiers of the individuals, and computes n numbers Ki = p_k(Ui), which have the meaning of individual secret keys and are kept secret. These Ki's are the passwords, or authenticators. Warden then distributes, over private secure channels, the pairs [Ui, Ki] to the individuals.
When k individuals meet and want to share the secret, they must reconstruct the secret polynomial p_k(x), and an efficient method they may use is the Lagrange interpolation formula. They form the polynomial

    L_k(x) = Π_{i=1}^{k} (x - U_{ji}) ,

and compute its first derivative L'_k(x): the secret polynomial is reconstructed as

    p_k(x) = Σ_{i=1}^{k} ( K_{ji} / L'_k(U_{ji}) ) · ( L_k(x) / (x - U_{ji}) ) .

Finally, the common secret is obtained as m = p_k(0).
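
A minimal Python sketch of the (k, n) scheme over a prime field is the following; it evaluates the Lagrange interpolation directly at x = 0, which is algebraically equivalent to the derivative formulation above, and all parameters are illustrative:

    import random

    q = 2**31 - 1                        # a prime: the field F_q
    k, n = 3, 5                          # any 3 of the 5 shares recover the secret

    def make_shares(m):                  # Warden's operations
        coeffs = [m] + [random.randrange(q) for _ in range(k - 1)]  # p_k(0) = m
        pk = lambda x: sum(c * pow(x, j, q) for j, c in enumerate(coeffs)) % q
        users = random.sample(range(1, q), n)      # public identifiers U_i
        return [(U, pk(U)) for U in users]         # the pairs [U_i, K_i]

    def recover(shares):                 # Lagrange interpolation at x = 0
        m = 0
        for i, (Ui, Ki) in enumerate(shares):
            num = den = 1
            for j, (Uj, _) in enumerate(shares):
                if j != i:
                    num = num * (-Uj) % q
                    den = den * (Ui - Uj) % q
            m = (m + Ki * num * pow(den, -1, q)) % q
        return m

    shares = make_shares(123456789)
    assert recover(random.sample(shares, k)) == 123456789
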


Chapter 8
Complexity

In order for the light to shine so brightly,
the darkness must be present.
Francis Bacon

8.1  Introduction

The notion of complexity is at the root of many areas of engineering and mathematics: two sciences that have different, sometimes conflicting, perspectives from which to view the concept, and different ways of measuring the difficulty of problems.
Computational complexity is one of the main topics, if not the principal one, of computer science. And computational complexity is a fundamental notion, indispensable for characterizing any cryptographic transformation concerning public key crypto-systems.
However, the role of computational complexity is also important in the classical private key encryption schemes, in particular if we want to measure a system's strength against different types of attacks.
Unfortunately, a satisfactory measure of computational complexity, analogous to Shannon's information measure, is still missing. Therefore, some naive measures of computational complexity have been considered, which have no axiomatic basis, although they are heuristically sound. Further, these measures are useful in practical applications. The consequences drawn from these practical measures are acceptable and useful; in particular, cryptographic systems designed using such measures have shown themselves to be reasonably secure.
From a purely theoretical point of view, many attempts have been made to find an axiomatic theory of computational complexity measures, and probably the most successful is due to Andrey Nikolaevich Kolmogorov, a preeminent mathematician of the 20th century. Kolmogorov defined a measure of complexity of a sequence of numbers, tied to the minimal amount of information (in Shannon's sense) required for a description of the sequence itself. Although Kolmogorov's measure of complexity is theoretically well founded, it is not suitable for evaluating the complexity of numerical algorithms (dealing with any sort of numbers).
Most algorithms for which we want a measure of complexity concern the evaluation of mappings between number domains D and R of finite cardinality. A mapping f is a correspondence, not necessarily one-to-one, written as

    f : D → R ,

where D has cardinality d, and R has cardinality r, with d ≥ r. In this context, the complexity of f, written cmpx(f), should be a real function g(d, r) of d and r. When D and R have infinite cardinality, a measure is still desired; however, it requires different definitions and hypotheses, which cannot be extrapolated from the finite case. The infinite case will not be discussed further.
Upper bound to the complexity of f. The function f may be described using a d × r table; thus, the complexity function cmpx(f) is upper bounded by d r. However, with the given hypotheses, it is possible to introduce a correspondence between the elements of D and the integers from 1 to d. Then f may be evaluated by searching a table consisting of a single column of d rows, each row consisting of a single cell containing an element of R; specifically, f(α) is retrieved from the cell indexed by α ∈ D. Hence, the complexity cmpx(f), for the purpose of computing f, is a function of d only, and is upper bounded by d, i.e.

    cmpx(f) ≤ d .

Since, to represent d distinct integers, n = ⌈log2 d⌉ bits are necessary and sufficient, we may use n as the unique base parameter in the definition of a measure of complexity.
In general, we use notations like

    cmpx(f) = g(n) ,   cmpx(f) ≤ g(n) ,   cmpx(f) = O(g(n)) ,   cmpx(f) = o(g(n))

to indicate varyingly close estimations of the complexity of f.
It is interesting to represent the complexity of a function f as a point in a plane with coordinates x = log2 d and y = cmpx(f). In this plane, as a consequence of the previous considerations, every function f is associated to a point located between the positive x-axis and the straight line bisecting the first quadrant. In the most interesting cases concerning cryptography, two curves are relevant, which bound the region of complexity and represent the upper and lower bounds. These curves are indicatively shown in Figure 8.1; their shape gives an idea of how fast the complexity grows with the size d of the domain.
A tangible view of the growth of complexity may be acquired by looking at the table below, where the size of d covers a range of values significant in today's cryptography, with values up to 10^100.

Figure 8.1: Complexity typical trends: logarithmic growth vs. linear growth

8.1.1  A heuristic view of computational complexity

Kolmogorov's measure of complexity refers to streams of M bits, and defines the complexity of a stream (or sequence) as the minimum amount of information, likewise measured in bits, necessary for its description. According to this approach, it is very hard (and not very practical) to define the complexity of a function, because it is necessary to describe it by means of a binary stream in order to apply Kolmogorov's view.
Computational complexity is an unavoidable characteristic of arithmetic, which offers simple or intriguing examples of one semi-face of complexity. Fermat proved that any prime p of the form 4k + 1 can be written as the sum of two squares. That is, p is represented by the binary quadratic form x^2 + y^2, and a trivial way to compute a representation is to check whether the difference p - x^2 is a square, for every x between 1 and √p. We say that the complexity of this approach is exponential because, a priori, the number O(√p) of attempts that we must make is not polynomial in the number log2 p of bits used to represent p. We say that the complexity is polynomial if the number of attempts or bit operations is O(log2^r p), [67].
Starting with Euler and Lagrange (through Gauss, Mathews, and later mathematicians of the 20th century), a large number of methods have been developed for computing a representation p = x^2 + y^2, but every method ultimately has exponential complexity. In 1982, René Schoof proposed a method based on the properties of elliptic curves, which obtains the two-square-sum representation of a prime with polynomial complexity O(log2^9 p).

Figure 8.2: Typical complexity trends: exponential and sub-exponential growths

         d          √d      log2 d
         1         1.00        0
        10         3.16        3.32
       100        10.00        6.64
      1000        31.62        9.96
     10000       100.00       13.29
      10^6      1000.00       19.93
      10^9     31622.77       29.89
     10^12        10^6        39.86
     10^24       10^12        79.73
     10^50       10^25       166.09
    10^100       10^50       332.19

In cryptography, the problem of computing the square root of a quadratic residue modulo a prime p is ancillary to many computational questions. The purely mathematical problem is: given a quadratic residue a mod p, compute a root of the equation x^2 = a in the field Z_p. This problem is essentially equivalent to computing a quadratic non-residue in Z_p, an equivalence proved in a later section.
The computational complexity in these examples may be evaluated or estimated without serious trouble. Nevertheless, computational complexity in arithmetic has more hidden traits, which were disclosed by Turing's approach to computation, namely by the introduction of Turing machines.
In this scenario, a big challenge is to harmonize these very different ways of thinking, namely the fascination of the arithmetical complexity of problems with the cold, abstract view of Turing computability.

In the following sections, a brief description of this state of affairs will be attempted, with no greater scope than providing an introductory view of problems that are at the very core of present and future mathematics research.

8.2  Complexity: the Heart of Cryptography

As Francis Bacon said with fine irony, "This art of ciphering, hath for relative an art of deciphering, by supposition unprofitable, but as things are, of great use." The aphorism is not an understatement, because the art of ciphering actually consists in applying a transformation to a text, which must be made incomprehensible to the adversary, and then applying the inverse transformation in order to read the message. These transformations must meet some requisites, which were listed by Bacon himself. These principles are the essence of the age-long progress in the art of cryptography, and Bacon's great merit is that of having collected them together, ordered them, and expounded them without any obscuring addition. Bacon's achievement appears even more remarkable, since his conscious goal was certainly not to create a scientific discipline.
Let M be the message to be encoded, T the cryptographic transformation, and E the ciphered text:

1. T must be easy to perform, that is, it must be easy to obtain E by applying the transformation T to M, i.e. E = T(M);

2. T must be invertible, that is, given E, it must always be possible to re-obtain M; this possibility is indicated with the notation M = T^(-1)(E), even if T is not, properly speaking, invertible in the algebraic sense;

3. T^(-1) must be difficult to compute without additional information (information that is of a confidential nature in practice).

Without the confidential information, the function T must be one-way, that is, easy to compute but difficult to invert. The classic solution, dating back to Caesar's code, consists in making T depend on a parameter, K, called the enciphering key, which must be known only to legitimate system users. Should an adversary manage to learn K, then he is said to have broken the code. Nevertheless, an adversary might, given E, succeed in determining M without knowing or managing to discover K. In this scenario, the above requisites are completed by a further restriction:

4. Knowing M, E, and even T, it must be hard (it must be computationally difficult) to determine K.

A system that meets these conditions is known as a cryptographically robust system. The chief reason for introducing K was to make the coding system depend on a limited quantity of information, compared with the quantity of information to be protected. Obviously, this economy introduced weaknesses, which have constituted the perennial conflict between coding scheme designers and attackers, who for various reasons seek to break the security protection.

Figure 8.3: Cryptographic Communication Channel
The Shannon approach, based on the measurement of information, definitively clarified the position. For a system to be secure, the key K and the message M must represent the same quantity of information.
Assuming that we know the coding algorithm, which is otherwise valid, and that we know a message M and the corresponding coded version E, we suppose that to compute K is as difficult as to guess K at random. Under these hypotheses, a secret key K consisting of some hundreds of bits allows us to consider the crypto-system secure from a practical standpoint. Therefore, the issue is moved from designing the encryption scheme to managing the system of keys.
In this circumstance, a new paradigm was introduced into the game:

    To transfer information over a public channel in a confidential fashion, without prior agreement between the parties, that is, without prior exchange of a secret key K, or of an enciphering function T.

The same one-way functions of traditional cryptography can be used to perform the required task, but a new way of looking at the game was required in practice, exchanging the roles of sender and recipient of the secret message:

    In order to receive a secret message from a Sender, the Recipient communicates the function T to the Sender over a public channel. The function T must depend on a secret key in possession of the Recipient, and it must be difficult to invert it without knowing that key.

This new secret communication model was introduced by Diffie and Hellman in 1976, together with an example of a mathematical solution.
The mathematical solution proposed by Diffie and Hellman is still valid, and originated public key cryptography, which now plays a major role in the worldwide communications system of the global village. Besides the target of concealing information, many new objectives can be attained independently of secrecy, in particular: integrity and authentication.

Information integrity aims at ensuring that the original message is received at the right time, in the right place, by the intended recipient, without distortion or alteration. This feature is sometimes implicit in traditional secret-key communication systems, although it is a concept totally independent of secrecy.
Information authentication was implicit in the hand-written signature, guaranteeing both the origin and the non-repudiation of a message. This property of public key cryptography allows what, in jargon, is called the electronic signature to exist.
While the classic cryptographic functions have also been described in the form of tricks, all valid solutions concerning public key cryptography are derived from arithmetic or algebraic properties of numbers.

8.2.1  One-way functions

Two types of one-way function can be considered, which characterize, respectively, classic (i.e. private-key) and modern (i.e. public-key) cryptography.
Let F be a function defined on a domain D = D1 × D2 and with values in a domain R. Let G be its inverse, defined on the domain A = R × D2 and with values in the domain D1. We define:

partial one-way-F: F(., .) is an invertible function of two arguments such that its image z = F(x, y) is easy to calculate, and the inverse image x = G(z, y) is easy to calculate knowing y, whereas it is difficult to calculate without knowing y.

total one-way-F: F(x, y) is an invertible function of two arguments such that the image z = F(x, y) is easy to calculate, and the inverse image x = G(z, y) is difficult to calculate even knowing y.

Clearly, these definitions contain the expressions "easy" and "difficult" to calculate, which have not been given explicit values of complexity, and which might appear ambiguous. Nevertheless, the definition intends to convey the intuitive concepts of easy and difficult. In the examples of one-way functions that we will give, these notions will be clarified to some extent, using the given definition of arithmetical computational complexity. Anticipating the aspects of easy and difficult computation, as they will emerge from the examples, a definition of one-way-F may be:

Definition 8.1. An invertible function f from a set D of cardinality N into a set R is said to be one-way if Cm(f) = O(ln^r N) and Cm(f^(-1)) = O(N^s).

Classical systems, that is to say stream ciphers, DES, AES, etc., and block encryption systems in general, may be described as partial one-way-F(x, y) expressions, where, substantially, y is the key.
Modern systems, in particular the example of Diffie and Hellman, are models of total one-way-F(x, y) expressions. All the examples of total one-way functions are derived from arithmetic or from algebra, and are linked to famous problems

to which a solution is known to exist, but can be computed only with great difficulty. Only three classes of arithmetical and algebraic problems have been, and still are, used as generators of functions that are candidates for total one-way functions: 1) the discrete logarithm (DL); 2) factorization (FP); and 3) searching unsorted data (SUD).

8.3  Arithmetic complexity

Arithmetical operations in a ring R occur in many areas of modern engineering. In particular, cryptography is based on very elementary operations with integers, or with numbers that can be treated as integers, which however are identified by a very large number of digits; the complexity of products or exponentiations ultimately lies in this huge number of digits.

8.3.1  Complexity of product and exponentiation

In any group or ring considered multiplicatively, the complexities of two elemental operations are of great importance, namely product and exponentiation.

Complexity of product in Z

Let π(a, b) denote the product function, namely π(a, b) = ab is the product of two integers. Fast multiplication techniques for integers received much attention after the appearance of computers. A complete account is found in [47], which, although dated, can still be considered the most extensive and reliable reference. The main conclusion is that, if n is the number of bits representing each factor, the complexity of the direct method is of the order of n^2, that is, C(π) = O(n^2). However, it is possible to do better: the Karatsuba algorithm performs the multiplication with complexity C(π) = O(n^λ), with λ = ln 3 / ln 2 ≈ 1.585. Today's best asymptotic result, of very limited practical application, is Schönhage and Strassen's algorithm, which performs the product with complexity C(π) = O(n log n ln ln n), [58].
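
A compact sketch of the Karatsuba recursion, which trades one n-bit product for three products of about n/2 bits, is the following:

    # Karatsuba product: complexity O(n^1.585) instead of O(n^2).
    def karatsuba(a, b):
        if a < 16 or b < 16:                     # small operands: direct product
            return a * b
        n = max(a.bit_length(), b.bit_length()) // 2
        a1, a0 = a >> n, a & ((1 << n) - 1)      # a = a1 2^n + a0
        b1, b0 = b >> n, b & ((1 << n) - 1)      # b = b1 2^n + b0
        z2 = karatsuba(a1, b1)
        z0 = karatsuba(a0, b0)
        z1 = karatsuba(a1 + a0, b1 + b0) - z2 - z0   # only three half-size products
        return (z2 << (2 * n)) + (z1 << n) + z0

    assert karatsuba(123456789, 987654321) == 123456789 * 987654321
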
Complexity of exponentiation

The complexity C(exp) of the power computation g^N has been extensively analyzed in the scientific literature, and a fairly complete account is given by Knuth [47]. The computation of g^N is known to require a number of products lying between log2(N) and 2 log2(N). This well-known fact is demonstrated by considering the exponent N represented in base 2,

    N = N0 + N1 2 + N2 2^2 + ... + Ni 2^i + ... + N_{n-1} 2^{n-1} ,

where n = ⌊log2(N)⌋ + 1; thus

- the lower bound is attained, for example, for N = 2^{n-2} + 2^{n-1}, for which it is clear that n - 1 squares are needed to calculate the power g^(2^{n-1}), and a further product to obtain g^N;

- the upper bound is obtained by noting that n - 1 squares are needed to obtain the powers g^2, g^4, ..., g^(2^{n-1}), and at most n - 1 further products are needed to calculate g^N from these powers.
This algorithm for computing powers, also known as the LR (left-to-right) binary method for exponentiation, already appeared in Legendre's book Théorie des Nombres; however, a very similar algorithm was used by the Egyptians to multiply integers.
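
A sketch of the LR binary method, which squares at every bit of the exponent and multiplies by g when the bit is 1, thus using at most 2 log2(N) modular products:

    def lr_power(g, N, modulus):
        result = 1
        for bit in bin(N)[2:]:           # exponent bits, left to right
            result = result * result % modulus       # always square
            if bit == '1':
                result = result * g % modulus        # multiply when the bit is 1
        return result

    assert lr_power(3, 1000003, 2**61 - 1) == pow(3, 1000003, 2**61 - 1)
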
Remark 1. The minimum number of squares and products in the domain containing x that suffices to compute x^n is related to the addition chain for n [47, p.402], namely a sequence of positive integers aj, beginning with a0 = 1, such that each aj is the sum of two previous elements of the chain, either distinct or not (in which case the operation is said to be a doubling),

    aj = a_{j1} + a_{j2} ,    j1, j2 < j ,

and ending with ar = n. The number r is said to be the length of the addition chain, and the length of the shortest addition chain for n is denoted ℓ(n). This minimum ℓ(n) yields the minimum number of products needed for computing x^n. Given n, apparently no general efficient algorithm is known for finding a shortest chain ending with ar = n. Most of the results concern special forms of n; for example, if n = 2^A + 2^B + 2^C, with A > B > C, then ℓ(n) = A + 2. Obviously, the shortest chains for n = 2^m and n = 2^m + 1 are well known, and have lengths m and m + 1, respectively. Many open conjectures related to n = 2^m - 1 exist, the most notorious being the Scholz-Brauer conjecture ℓ(2^n - 1) ≤ n - 1 + ℓ(n), [47, p.459].
Remark 2. The LR binary method for exponentiation can also be conveniently applied to evaluate the power $X^q$ modulo a polynomial of low degree $g(X) = X^n + a_1 X^{n-1} + \cdots + a_0$ over any field, preferably a finite field.

8.3.2 Finite Field Arithmetics

Let $\mathbb{F}_{2^m}$ be a finite field generated by a root of a primitive polynomial of degree $m$
$$g(x) = x^m + g_1 x^{m-1} + \cdots + g_{m-1} x + g_m .$$
The elements of $\mathbb{F}_{2^m}$ are represented as binary vectors of dimension $m$, whose entries are defined with respect to a basis $\{\alpha_1, \ldots, \alpha_m\}$ consisting of $m$ elements of $\mathbb{F}_{2^m}$ that are linearly independent over $\mathbb{F}_2$.
The additions in $\mathbb{F}_{2^m}$ are performed by executing $m$ additions in $\mathbb{F}_2$, so no further comment is needed.
The multiplications can be performed with a complexity that is upper bounded by $m^2$; however, they may be very fast if a circuit performing $m^2$ binary products (logical AND operations) is available. In this case, the execution time is of the order of $\log_2 m$ clock cycles, [25, 26].
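In software, a minimal Python sketch of the product in $\mathbb{F}_{2^m}$ (elements packed as integer bit vectors with respect to a polynomial basis; the modular polynomial in the example is an illustrative choice):

    def gf2m_mul(a, b, g, m):
        # Product in F_{2^m}: carry-less (XOR) schoolbook multiplication,
        # whose m^2 AND/XOR structure mirrors the O(m^2) bound above,
        # followed by reduction modulo the defining polynomial g (packed
        # as an integer with the x^m bit set).
        r = 0
        while b:
            if b & 1:
                r ^= a                      # add one partial product
            a <<= 1
            b >>= 1
        for d in range(2 * m - 2, m - 1, -1):  # reduce from the top degree down
            if r >> d & 1:
                r ^= g << (d - m)
        return r

    # Example in F_(2^4) with g(x) = x^4 + x + 1:
    # (x^3 + 1) * x = x^4 + x = (x + 1) + x = 1
    print(gf2m_mul(0b1001, 0b0010, 0b10011, 4))   # prints 1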
Complexity bounds. One of the most intriguing challenges is the search for provably hard problems for cryptographic applications. The discovery of such a hard problem would have both practical and theoretical importance, since on one hand it would guarantee achievable security, while on the other hand it would give a sound basis for many purely hypothetical theories.
The factorization of large integers, the discrete logarithm problem, and the search through unsorted lists have been briefly described in previous chapters; they are credited as being the only known problems which are potential sources of one-way functions. In the following, the computational complexity involved will be described in more detail.

8.4 Factorization complexity

The first type of arithmetic problem that may be used to create a one-way function is factoring.
In his Disquisitiones Arithmeticae, at item 329 [33, p.396], Gauss recognized the importance of factoring:
    The problem of distinguishing prime numbers from composite numbers and of resolving the latter into their prime factors is known to be one of the most important and useful in arithmetic. It has engaged the industry and the wisdom of ancient and modern geometers to such an extent that it would be superfluous to discuss the problem at length. Nevertheless we must confess that all methods that have been proposed thus far are either restricted to very special cases or are so laborious and prolix that even for numbers that do not exceed the limits of tables constructed by estimable men, i.e. for numbers that do not require ingenious methods, they try the patience of even the practiced calculators.
It is well known that computing the product of two integers is a simple operation, whereas, given the product, it is still a difficult operation to determine the factors. Some one-way functions for cryptography, such as RSA and Rabin's algorithm, have been constructed on this asymmetry. Both schemes are based on the fact that, given $N = pq$, the product of two prime numbers, it is hard to find $p$ and $q$ when these numbers are large (bigger than $10^{100}$). More precisely, these schemes are well-grounded on related problems which are difficult to solve: 1) RSA exploits the property that computing the order $\phi(N) = (p-1)(q-1)$ of the multiplicative group of the set of remainders modulo $N$ is equivalent to factorizing $N$; 2) Rabin's scheme exploits the property that solving a quadratic equation $x^2 = a \bmod N$ is equivalent to factorizing $N$.
The asymptotic law $\pi(N) \sim N/\ln(N)$ for the distribution of prime numbers ($\pi(N)$ being the number of primes less than $N$), usually called the prime number theorem, guarantees that the number of primes around $N$ is sufficiently large to prevent any exhaustive search. Actually, the number of primes in the interval $[\frac{1}{2}\sqrt{N}, \frac{3}{2}\sqrt{N}]$ is about
$$\frac{3\sqrt{N}}{2\ln(3\sqrt{N}/2)} - \frac{\sqrt{N}}{2\ln(\sqrt{N}/2)} \approx \frac{2\sqrt{N}}{\ln N} ,$$
thus, in a symmetric interval around $\sqrt{N}$, we find about as many primes as there are between 1 and $\sqrt{N}$.

8.4.1 Factorization in Z

Let $\mathrm{fact}(m)$ denote the factorization function when $m = pq$ is the product of two primes; if $p$ and $q$ are of the same order of magnitude, then both are of the order of $\sqrt{m}$. The only known deterministic algorithm for computing $\mathrm{fact}(m)$ is exhaustive search; it follows that the complexity is of the order of $\sqrt{m}$, that is $C(\mathrm{fact}) = O(\sqrt{m}) = O(2^{n/2})$, with $n = \log_2 m$.
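A minimal Python sketch of this exhaustive search (hopeless for the 100-digit moduli used in practice, which is precisely the point):

    from math import isqrt

    def factor_trial(m):
        # Deterministic factorization of m = p*q by exhaustive search:
        # O(sqrt(m)) = O(2^(n/2)) trial divisions, with n = log2(m).
        for d in range(2, isqrt(m) + 1):
            if m % d == 0:
                return d, m // d
        return m, 1                          # m is prime

    print(factor_trial(8051))                # (83, 97)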
The following table summarizes the status of deterministic computational complexity for DL (discrete logarithm) and FZ (factoring):

                Direct                      Inverse
  DL   $C_M$    $C(C_M)\,O(\log_2 M)$       $O(\sqrt{M})$
  FZ   $Z_N$    $C(Z_N)\,O(\log_2 N)$       $O(\sqrt{N})$

$C(C_M)$ and $C(Z_N)$ denote the complexity of the product in $C_M$ and $Z_N$, respectively. The conclusion is that the orders of magnitude of the complexities of the DL and FZ problems are comparable. It follows that the choice between the two kinds of one-way function is based on the criteria of cost, efficiency of the algorithms for the underlying arithmetics, and reliability of the implementation.
The most efficient factorization algorithms are not deterministic. They are derived from an idea of Lucas, and are based on properties of elliptic curves. The complexity of factoring a number $N = pq$, the product of two primes, by non-deterministic algorithms is sub-exponential, that is, of the order
$$O(\exp[k_o (\ln N)^{1/3} (\ln\ln N)^{2/3}]) ,$$
where $k_o \approx \frac{3}{2}$.

8.5 Discrete logarithm

The discrete logarithm is a mapping from a cyclic group of order $M$ into the additive group of residues modulo $M$. The discrete logarithm is the second instance of a potential one-way function, since the computation of a discrete logarithm is considered a difficult operation, although the difficulty relies on the actual representation of the cyclic group elements.

[Figure 8.4: Tree representation of groups of order $2^{k_2}$ in $Z_p$.]
As an example of discrete logarithm computation, consider the multiplicative cyclic group of residues modulo $p$.
Let $2^{k_2}$ be the maximum power of 2 dividing $p-1$, so that $p-1 = 2^{k_2}(2h+1)$. The cyclic group $Z_p^*$ can be seen as a direct product of two cyclic groups $C_{2^{k_2}}$ and $C_{2h+1}$ of relatively prime orders. Thus, any element $m \in Z_p^*$ can be written as $m = A \cdot B$ with $A \in C_{2^{k_2}}$ and $B \in C_{2h+1}$, where $A$ and $B$ are obtained from $m$ as follows:
- Compute $A_1 = m^{\frac{p-1}{2^{k_2}}} = A^{\frac{p-1}{2^{k_2}}}$, since $B^{\frac{p-1}{2^{k_2}}} = 1$.
- $\frac{p-1}{2^{k_2}}$ and $2^{k_2}$ are relatively prime, thus a generalized Euclidean algorithm gives $\lambda$ and $\mu$ such that $\lambda \frac{p-1}{2^{k_2}} + \mu\, 2^{k_2} = 1$.
- In conclusion
$$A = A_1^{\lambda} = A^{\lambda \frac{p-1}{2^{k_2}}} = A^{1 - \mu 2^{k_2}} = A , \qquad B = m A^{-1} .$$

The square root of any quadratic residue (QR) $m$ can be computed as a product of square roots, that is $\sqrt{m} = \sqrt{A}\,\sqrt{B}$. The square root of $B$ is easily obtained as
$$\sqrt{B} = B^{\left(1 + \frac{p-1}{2^{k_2}}\right)/2}$$
since $B^{\frac{p-1}{2^{k_2}}} = 1$, and $\frac{p-1}{2^{k_2}}$ is odd. Therefore the computation of $\sqrt{m}$ is reduced to the computation of a square root $\sqrt{A}$ of $A$ in a cyclic group $C_{2^{k_2}}$ of order a power of 2, with $A$ belonging to the subgroup of order $2^{k_2-1}$. The elements of $C_{2^{k_2}}$ are the roots of the polynomial
$$x^{2^{k_2}} - 1 = (x^{2^{k_2-1}} - 1)(x^{2^{k_2-1}} + 1) = \psi_{k_2}^-(x)\,\psi_{k_2}^+(x) ,$$
where the roots of $\psi_{k_2}^+(x) = x^{2^{k_2-1}} + 1$ are quadratic nonresidues, and the roots of $\psi_{k_2}^-(x) = x^{2^{k_2-1}} - 1$ are quadratic residues. Since the square root of a Quadratic NonResidue (QNR) does not exist, $\psi_{k_2}^+(x)$ splits into the product of $2^{k_2-2}$ irreducible quadratic polynomials $x^2 - b$, while $\psi_{k_2}^-(x)$ splits into the product of $2^{k_2-2}$ reducible quadratic polynomials $x^2 - b$, that is, $\psi_{k_2}^-(x)$ ultimately splits into linear factors.
In conclusion, the problem of computing the square root of any QR is reduced to that of computing a square root of a QR in $C_{2^{k_2}}$. Analogously, a QNR is obtained as a $2^{k_2}$-th root of unity by constructing a sequence of primitive roots for every exponent:
$$I_r = \sqrt{I_{r-1}} = (-1)^{1/2^{r-1}} \qquad 1 \le r \le k_2 ,$$
with $I_0 = 1$ as initial value. The last term $I_{k_2}$ is a QNR:
$$\left(\frac{I_{k_2}}{p}\right) = I_{k_2}^{\frac{p-1}{2}} = \left((-1)^{1/2^{k_2-1}}\right)^{\frac{p-1}{2}} = (-1)^{\frac{p-1}{2^{k_2}}} = -1 .$$
An interesting description of $C_{2^{k_2}}$ is obtained using a tree graph. Group elements are used to label the nodes of a binary tree, with 1 as label of the root, which is connected by a single branch to the node labeled with $-1$; every other node is binary, with one input and two output branches, except the terminal nodes, which have no output branches. The tree is full, and its height is $k_2$. If $\ell_1$ is the label of a node, then the ends of the outgoing branches are labeled with $\pm\sqrt{\ell_1}$, see Figure 8.4. Any QNR is the label of a terminal node, while the label of every non-terminal node is a QR.
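A minimal Python sketch of the square-root computation just described; in this streamlined form, walking down the tree of $C_{2^{k_2}}$ with a known QNR, it is essentially the Tonelli-Shanks algorithm (function names are illustrative):

    def sqrt_mod_p(m, p):
        # Square root of a quadratic residue m modulo an odd prime p,
        # with p - 1 = 2^k2 * q, q odd.
        assert pow(m, (p - 1) // 2, p) == 1, "m must be a QR"
        q, k2 = p - 1, 0
        while q % 2 == 0:
            q //= 2
            k2 += 1
        if k2 == 1:                          # p = 3 (mod 4): one exponentiation
            return pow(m, (p + 1) // 4, p)
        z = 2                                # find a QNR by trial
        while pow(z, (p - 1) // 2, p) != p - 1:
            z += 1
        c = pow(z, q, p)                     # generator of C_{2^k2}
        x = pow(m, (q + 1) // 2, p)          # first approximation of sqrt(m)
        t = pow(m, q, p)                     # error term, lies in C_{2^k2}
        r = k2
        while t != 1:
            i, t2 = 0, t                     # least i with t^(2^i) = 1
            while t2 != 1:
                t2 = t2 * t2 % p
                i += 1
            b = pow(c, 1 << (r - i - 1), p)
            x = x * b % p                    # correct the approximation
            t = t * b * b % p
            c = b * b % p
            r = i
        return x

    print(sqrt_mod_p(2, 41))                 # 17, and indeed 17^2 = 2 (mod 41)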

8.5.1 Discrete logarithm as one-way function

Given a cyclic group $C$ of order $M$ and a generator $g \in C$, a one-to-one correspondence exists between the elements of $C$ and the elements of $Z_M$, the additive group of the ring of residues modulo $M$:
$$c = g^{\ell} \quad \Longleftrightarrow \quad \ell = L_g(c) , \qquad \ell \in Z_M .$$
The exponentiation is a mapping from $Z_M$ into $C$, while the inverse mapping from $C$ into $Z_M$, denoted $L_g(c)$, is said to be the discrete logarithm of $c$ to base $g$.
The function $L_g$ shares all the typical properties of the real logarithm; in particular, the logarithm of a product is equal to the sum of the logarithms, the logarithm of a power is equal to the product of the exponent and the logarithm of the base, and the logarithm of the cyclic group identity is $0 \in Z_M$.
Whereas the exponential $g^{\ell}$ can be easily calculated, in general the discrete logarithm is difficult to calculate (or at least it is not known how to calculate it easily). It is remarked that the difficulty depends on the representation of the elements of $C$. An example illustrating this point is obtained by considering $C = Z_M$, the additive group of residues modulo $M$. Let $g \neq 1$ be a generator of $C$; $g$ is an integer relatively prime with $M$, and in this case power computation is simply multiplication modulo $M$. We trivially have
$$c = g\,\ell \bmod M \quad \Rightarrow \quad \ell = L_g(c) = g^{-1} c \bmod M ,$$
thus the discrete logarithm is computed by performing a division modulo $M$. In conclusion, in this group power computation and discrete logarithm computation have the same complexity.

8.5.2 Discrete Logarithm Complexity

Consider a cyclic group $C_M$ of order $M$ in which the discrete logarithm problem is not trivial. Good instances are the multiplicative groups of finite fields, and the cyclic point groups of elliptic curves over finite fields. The elements of $C_M$ can be represented by $n = \lceil \log_2 M \rceil$ bits, and the exponents belonging to $Z_M$ can likewise be represented by $n$ bits. The DL problem is characterized by the different complexities of the two isomorphisms $Z_M \to C_M$ and $C_M \to Z_M$. The computational complexities of the two isomorphisms are, in fact, very different at the current state of knowledge.
Computational complexity of the mapping $Z_M \to C_M$
Let $g$ be a generator of $C_M$. Then, given $\ell \in Z_M$, the complexity of the computation of $g^{\ell}$ depends on $\ell$, the exponent of the power, and on the complexity of the operations of product and squaring in $C_M$. It is evident that if $\ell = 0$ or $1$ the complexities are trivial. Of less practical interest is the minimum complexity obtained by computing the powers with the minimum number of products and squarings, because the computation of such minima, related to addition chains of minimum length, is too complex when the powers must be computed routinely.
The really interesting complexity concerns the worst-case evaluation of a power computation with an unspecified exponent not greater than $M$. This complexity can be expressed as a function $C(\exp) = O(\ln M)$ of $\log_2 M$. Furthermore, let $C(\times|C_M)$ denote the complexity of the product of two elements in $C_M$ (it might be, in some cases, useful to distinguish whether the factors are equal or not). Thus, the computational complexity $C(\mathcal{E}_M)$ of the mapping $\mathcal{E}_M : Z_M \to C_M$ can be defined as the product of two contributions
$$C(\mathcal{E}_M) = C(\exp)\, C(\times|C_M) .$$
Computational complexity of the mapping $C_M \to Z_M$
In the converse case, given $c$ and a primitive element $g$ of $C_M$, the calculation of the exponent $x \in Z_M$ such that $g^x = c$, namely the discrete logarithm $L_g(c)$, can be performed through $2^{n/2}$ elementary operations, consisting of products and comparisons. This bound is substantially due to Shanks, and is the product of a procedure known as baby-step-giant-step. However, the search for groups in which it is difficult to calculate the discrete logarithm has been made more necessary by the method, due to Pohlig and Hellman, for calculating logarithms in groups whose order $M$ factors into powers of small prime factors.
Pohlig and Hellman's computation of logarithms. Pohlig and Hellman's method shows that the computational complexity of the discrete logarithm in a cyclic group $C_M$ of order $M = \prod_{i=1}^r p_i^{\alpha_i}$ depends linearly on the computational complexities in the subgroups of prime order. Let $C(DL|M)$ denote the computational complexity of the discrete logarithm function in a cyclic group of order $M$; then we have
$$C(DL|M) = \sum_{i=1}^{r} \alpha_i\, C(DL|p_i) . \qquad (8.1)$$

This equation proves that the computational complexity of DL is large in $C_M$ only if $M$ has a large prime factor. The validity of equation (8.1) follows from the iterated application of two properties, namely
$$C(DL|m_1 m_2) = C(DL|m_1) + C(DL|m_2) \quad \text{with } (m_1, m_2) = 1 ,$$
$$C(DL|p^d) = d\, C(DL|p) ,$$
from which it follows that the calculation of the discrete logarithm in $C_M$ can be reduced to calculating discrete logarithms in two cyclic groups of orders $m_1$ and $m_2$, or in a cyclic group of order $p$. Proofs of the two properties are given in the same order.
Let $M = m_1 m_2$, the product of two relatively prime numbers, be the order of a cyclic group $C_M$ with generator $g$. The discrete logarithm $x = L_g(a) \in Z_M$ can be written as $x = x_1 \epsilon_1 + x_2 \epsilon_2$ by the Chinese remainder theorem, with $x_1 \in Z_{m_1}$ and $x_2 \in Z_{m_2}$. The two multipliers
$$\epsilon_1 = \mu_1 m_2 \equiv 1 \bmod m_1 \qquad \epsilon_2 = \mu_2 m_1 \equiv 1 \bmod m_2$$
satisfy the following conditions modulo $M$
$$\epsilon_1^2 = \epsilon_1 \qquad \epsilon_2^2 = \epsilon_2 \qquad \epsilon_1 \epsilon_2 = 0 .$$
Then $x$ may be calculated, starting from the relation
$$a = g^{x_1 \epsilon_1 + x_2 \epsilon_2} ,$$
by calculating the two powers
$$a_1 = a^{\epsilon_1} = g^{(x_1 \epsilon_1 + x_2 \epsilon_2)\epsilon_1} = g^{x_1 \epsilon_1} = (g^{\epsilon_1})^{x_1} ,$$
$$a_2 = a^{\epsilon_2} = g^{(x_1 \epsilon_1 + x_2 \epsilon_2)\epsilon_2} = g^{x_2 \epsilon_2} = (g^{\epsilon_2})^{x_2} .$$
These equations show that the two coefficients $x_1 = L_{g_1}(a_1)$ and $x_2 = L_{g_2}(a_2)$, where $g_1 = g^{\epsilon_1}$ and $g_2 = g^{\epsilon_2}$, can be obtained as logarithms in two cyclic groups of orders $m_1$ and $m_2$, respectively. In terms of complexity we may write
$$C(DL|m_1 m_2) = C(DL|m_1) + C(DL|m_2) ,$$
plus an overhead due to the calculation of $a_1$ and $a_2$, which is polynomial in $\ln M$.
Let $M = p^d$, a power of a prime number $p$, be the order of a cyclic group $C_M$ with generator $g$. Let $x = L_g(a) \in Z_M$ be written as an integer in base $p$
$$x = \sum_{i=0}^{d-1} x_i p^i .$$
Raising $a$ to the power $p^{d-1}$ we obtain $x_0$ as a logarithm in a cyclic group of order $p$
$$a^{p^{d-1}} = g^{x_0 p^{d-1} + X_0 p^d} = \left(g^{p^{d-1}}\right)^{x_0} ,$$
where $g^{p^{d-1}}$ is the generator of a cyclic group of order $p$.
Once $x_0$ is computed as $L_{g^{p^{d-1}}}(a^{p^{d-1}})$, it is eliminated from the exponent by considering $a_1 = a g^{-x_0} = g^{x_1 p + O(p^2)}$. Therefore $x_1$ is obtained as
$$x_1 = L_{g^{p^{d-1}}}\left(a_1^{p^{d-2}}\right) .$$
The process is iterated $d$ times to produce $x_0, x_1, \ldots, x_{d-1}$, by computing in each case a logarithm in a cyclic group of order $p$. In conclusion
$$C(DL|p^d) = d\, C(DL|p) ,$$
plus the computational complexity for computing $d-1$ powers, with exponents of decreasing magnitude $p^{d-1}, p^{d-2}, \ldots, p$, which are obtained with polynomial complexity in $\ln p$, that is, at most $d(d-1)\log_2 p$ products.
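By way of illustration, a minimal Python sketch of the whole reduction (the subgroup logarithms are here found by brute force; in practice Shanks' method of the next subsection would be used; the example parameters are a classical textbook instance):

    def dlog_bruteforce(g, a, p, order):
        # Discrete log in a small cyclic subgroup of Z_p* generated by g.
        t = 1
        for x in range(order):
            if t == a:
                return x
            t = t * g % p
        raise ValueError("a is not in the subgroup generated by g")

    def dlog_prime_power(g, a, p, q, d):
        # DL in a subgroup of order q^d, one base-q digit at a time:
        # the iteration behind C(DL|p^d) = d C(DL|p).
        x = 0
        gamma = pow(g, q ** (d - 1), p)      # generator of the order-q subgroup
        for i in range(d):
            h = pow(pow(g, -x, p) * a % p, q ** (d - 1 - i), p)
            x += dlog_bruteforce(gamma, h, p, q) * q ** i
        return x

    def pohlig_hellman(g, a, p, factorization):
        # factorization: list of (q, d) with prod q^d = order of g in Z_p*.
        n = 1
        for q, d in factorization:
            n *= q ** d
        x, m = 0, 1
        for q, d in factorization:           # one easy DL per prime power
            qd = q ** d
            xi = dlog_prime_power(pow(g, n // qd, p), pow(a, n // qd, p),
                                  p, q, d)
            x += m * ((xi - x) * pow(m, -1, qd) % qd)   # CRT recombination
            m *= qd
        return x

    # p = 8101, p - 1 = 2^2 * 3^4 * 5^2, g = 6 is a generator of Z_8101*
    p, g = 8101, 6
    print(pohlig_hellman(g, pow(g, 7531, p), p, [(2, 2), (3, 4), (5, 2)]))  # 7531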

8.5.3 Shanks' Bound

The computational complexity of the discrete logarithm is upper bounded by a result of Shanks. The idea is rudimentary and very simple; nevertheless, no better general upper bound has yet been found.
Consider a cyclic group $C$ of order $M$ and define $m = \lceil \sqrt{M} \rceil$. Let $g$ be a generator of $C$. Given an element $c = g^{\ell} \in C$, write the unknown exponent in the form $\ell = \ell_0 + \ell_1 m$, where $\ell_0, \ell_1 \le m$ are the unknowns to be found. The $m+1$ numbers $g^j$, with $0 \le j \le m$, are pre-computed and written in an orderly manner in a two-column table (hereafter referred to as Tab), with $j$ written in the first column, and $g^j$ in the second. Then, Tab is sorted according to the elements $g^j$ of the second column.


Given $c \in C$, the elements $w = c\,g^{-im}$ are sequentially computed for $0 \le i < m$, starting with $i = 0$; each $w$ is searched for in the ordered second column of Tab (searching an ordered set of size $m$ is an operation of complexity $O(\ln m)$). When a coincidence is found at a position that has $\ell_0$ in the first column, the process stops, and both $\ell_0$ and $\ell_1 = i$ are thereby found. The complexity of Shanks' algorithm is estimated as follows:
- Pre-computations are not counted because they are performed once, that is, the preparation of Tab is not considered;
- The amount of stored data in Tab is $\sqrt{M}\log_2 M$ bits;
- The complexity of sorting Tab is not counted because it is performed once;
- Each search in sorted Tab requires a maximum of $\log_2 M$ comparisons;
- The evaluation of $\sqrt{M}$ products and powers, of maximum exponent $\lfloor \sqrt{M} \rfloor$, establishes the computational complexity of every DL computation.
In summary, Shanks' method requires:
1) a storage capacity of $\sqrt{M}\log_2 M$ bits;
2) a preliminary sorting of a table of size $\sqrt{M}\log_2 M$ bits;
3) $\sqrt{M}(1 + \log_2 M)$ products in $C_M$ at every DL computation;
4) a maximum of $\sqrt{M}\log_2 M$ comparisons for the searches in sorted Tab.
In conclusion, the computational complexity of Shanks' algorithm, for computing the discrete logarithm in a cyclic group of order $M$, is $O(\sqrt{M}\ln M)$.
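A minimal Python sketch of the procedure for the multiplicative group $Z_p^*$; a hash table plays the role of the sorted table Tab, so each search costs $O(1)$ instead of $\log_2 M$ comparisons:

    from math import isqrt

    def shanks_dlog(g, c, p, M):
        # Baby-step giant-step: find l with g^l = c (mod p), g of order M.
        m = isqrt(M) + 1                           # m = ceil(sqrt(M))
        tab = {pow(g, j, p): j for j in range(m)}  # baby steps: g^j -> j
        giant = pow(g, -m, p)                      # g^(-m) mod p
        w = c
        for i in range(m):                         # giant steps: w = c g^(-im)
            if w in tab:
                return i * m + tab[w]              # l = l0 + l1*m, with l1 = i
            w = w * giant % p
        raise ValueError("logarithm not found")

    # 2 generates Z_101* (order 100); L_2(30) = 94, since 2^94 = 30 (mod 101)
    print(shanks_dlog(2, 30, 101, 100))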

8.6 Searching Unsorted Data (SUD)

The third mathematical topic that may be used to define one-way functions is the searching of data. The technical formulation of the problem requires a preliminary definition.
Let $f : A \to \mathbb{N}$ be a mapping from a finite set into the set of positive integers. The image $f(a) \in \mathbb{N}$ of $a \in A$ is called the label of $a$. Two important searching problems are
1. Given an object $b$ with label $\ell = f(b)$, decide whether $b$ is in $A$.
2. Given a label $\ell$, find $a$ such that $f(a) = \ell$.

If the objects or data are sorted by labels, the search is trivial: the complexity of problem 1 is $O(\ln |A|)$, while the complexity of problem 2 is $O(1)$. Otherwise, the complexity of both problems is of the order of $|A|$. A specific instance of SUD problems useful in cryptography utilizes vector spaces, and is known as the McEliece scheme, whose description is repeated here.
Let $V_n$ be a vector space of dimension $n$ over a finite field $F_q$. Let $V_{n,k}$ be a subspace of $V_n$ of dimension $k$. Let $w_H(v)$ denote the number of non-null components of a vector $v$, i.e. $w_H(v)$ is the Hamming weight of $v$, which defines a metric in $V_n$.
Let $\ell$ be a coset leader in the partition of the group $V_n$ into cosets of the subgroup $V_{n,k}$; the corresponding coset is written as $\ell + V_{n,k}$. If words of minimum weight in the cosets are chosen as coset leaders, the partition of $V_n$ into cosets specifies a minimum distance decoding.
In general, given an element $v \in V_n$, it is an NP problem to determine the coset leader $\ell(v)$ of the coset to which $v$ belongs [79]. Conversely, let $d$ be the minimum distance of $V_{n,k}$ (viewed as a code), and set $t = \lfloor \frac{d-1}{2} \rfloor$; it is easy to generate an element $v = c + \ell$ in some coset, where $c$ is an element of $V_{n,k}$, and $\ell$ is any vector of weight less than or equal to $t$. These two contrasting operations characterize a one-way function based on SUD.
McEliece's idea has been to use an easily decodable code $C$ with length $n$, dimension $k$, large minimum distance $d$, and generator matrix $G$. Then, two secret matrices are selected, namely an $n \times n$ random permutation matrix $P$, and a nonsingular $k \times k$ random matrix $B$. A matrix $E$ is computed as
$$E = P G B ,$$
and represents the public key together with $t = \lfloor \frac{d-1}{2} \rfloor$, the code error-correcting capability.
For encryption, a message $m$ is encoded in a vector $x$ of dimension $k$, which is then transformed as
$$y = E x + e ,$$
where $e$ is a random vector of large weight, not greater than $t$, kept secret.
Decryption consists of the following steps:
- Compute the vector $P^T y = G B x + P^T e$. This transformation does not alter the number of errors, i.e. the weight of $e$.
- Compute the syndrome
$$s = H P^T y = H G B x + H P^T e = H P^T e ,$$
since the parity-check matrix $H$ of the easy-to-decode code $C$ is known.
- Recover the error vector $\tilde{e} = P^T e$ from $s$: the number of errors falls within the code error-correcting capability, and the code generated by $G$ has an efficient decoding algorithm.
- Compute $\tilde{x} = B x$ by decoding the corrected word $P^T y - \tilde{e} = G B x$.
- Obtain $x = B^{-1} \tilde{x}$.
This standard decoding procedure can only be performed knowing $P$ and $B$, which constitute the secret key. It is an open problem to prove or disprove whether an efficient decoding is also possible knowing only $P$, or only $B$, or neither.
McEliece's scheme is one of the few attempts exploiting SUD that is still valid.
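A toy run of the whole flow, using the binary Hamming (7,4) code ($t = 1$) and numpy; all matrices are illustrative choices, and the text's column-vector convention $y = Ex + e$ is transposed into the equivalent row-vector form $y = xBGP + e$:

    import numpy as np

    G = np.array([[1,0,0,0, 0,1,1],
                  [0,1,0,0, 1,0,1],
                  [0,0,1,0, 1,1,0],
                  [0,0,0,1, 1,1,1]])       # systematic generator of the code
    H = np.array([[0,1,1,1, 1,0,0],
                  [1,0,1,1, 0,1,0],
                  [1,1,0,1, 0,0,1]])       # parity-check matrix, G H^T = 0

    B = np.array([[1,1,0,0],
                  [0,1,1,0],
                  [0,0,1,1],
                  [0,0,0,1]])              # secret nonsingular scrambler
    B_inv = np.array([[1,1,1,1],
                      [0,1,1,1],
                      [0,0,1,1],
                      [0,0,0,1]])          # its inverse modulo 2
    P = np.eye(7, dtype=int)[[2,0,5,1,6,3,4]]   # secret permutation
    E_pub = B @ G @ P % 2                  # public key

    x = np.array([1,0,1,1])                # message block
    e = np.zeros(7, dtype=int); e[4] = 1   # secret error of weight <= t
    y = (x @ E_pub + e) % 2                # ciphertext

    y1 = y @ P.T % 2                       # undo P: x B G + permuted error
    s = y1 @ H.T % 2                       # syndrome of the permuted error
    if s.any():                            # locate and correct the error
        pos = next(j for j in range(7) if (H[:, j] == s).all())
        y1[pos] ^= 1
    x_rec = y1[:4] @ B_inv % 2             # systematic code: first 4 bits = x B
    print(np.array_equal(x_rec, x))        # True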


Chapter 9
ECC
Arithmetic is one of the oldest branches, perhaps the very oldest
branch, of human knowledge, and yet some of its most abstruse
secrets lie close to its tritest truths.
H.J.S. Smith

9.1 Introduction

The acronym ECC stands for Elliptic Curve Cryptosystem, and denotes the class of
cryptographic systems whose key transformations are defined using the abelian
additive point groups of elliptic curves over finite fields. Elliptic curves were
originally defined over the complex field; they are algebraic cubic curves such
that in every point the tangent line is uniquely defined. In more technical terms,
an elliptic curve has no singular point, i.e. there are no points in which the tangent
is not unique, or is not defined at all. A few words to fix the terminology. An
algebraic curve $C_f$ of degree $n$ is identified by a polynomial $f(x, y)$ of degree $n$ in two variables, with coefficients in a field $K$. Any straight line intersects $C_f$ in at most $n$ points (exactly $n$ points in a suitable extension of $K$). The coordinates of any singular points $Q_i = (x_i, y_i)$, $i = 1, \ldots, s$, of $C_f$ are obtained by solving the system
$$f(x, y) = 0 , \qquad \frac{\partial f(x, y)}{\partial x} = 0 , \qquad \frac{\partial f(x, y)}{\partial y} = 0 .$$
Straight lines through a singular point have multiple intersections with the curve in that point. By a non-singular curve we mean a curve with no singular points.
An elliptic curve over a field $F$ is a non-singular curve described by a cubic polynomial in two variables with coefficients in $F$
$$f(x, y) = x^3 - a_0 y^2 - a_1 xy - a_3 y + a_2 x^2 + a_4 x + a_6 + x^2 y\, a_7 + x y^2 a_8 + y^3 a_9 .$$

However, most of the classical developments concerning elliptic curves refer to the equation
$$y^2 = 4x^3 - g_2 x - g_3 . \qquad (9.1)$$
This is the equation that Weierstrass used to define the $\wp$-function, now bearing his name, and to develop the whole theory of doubly periodic functions over the complex field [78], i.e. the theory of elliptic functions. These functions are defined by means of elliptic integrals, that is, integrals whose argument involves algebraic functions and polynomials. The theory of elliptic functions emerged in the second half of the eighteenth century, as mathematicians searched for elementary methods for computing the integrals giving the lengths of arcs of ellipses and lemniscates. It is now proved (Vessiot-Picard, early 20th century) that these integrals cannot be evaluated in closed form using a finite combination of elementary functions, namely polynomial, rational, exponential and trigonometric functions. Something more will be said about the origins of this elegant theory in Section 9.6 of this chapter.
The following figure shows a typical shape of an elliptic curve with equation (9.1) over the real numbers. Further, the addition of points $A$ and $B$ is shown: the notation $A * B$ indicates the third point of intersection of the line through $A$ and $B$ with the elliptic curve.

[Figure 9.1: Elliptic curve $y^2 = x^3 - x$ over $\mathbb{R}$: point addition, showing $A * B$ and $A + B$.]


The curve with the same equation, considered over the finite field of residues modulo 5, $F_5 = \{-2, -1, 0, 1, 2\}$, contains 8 points and may be represented as a set of scattered points, see Figure 9.2.

[Figure 9.2: Elliptic curve $y^2 = x^3 - x$ over $F_5$: the affine points are $(0,0)$, $(\pm 1, 0)$, $(2, \pm 1)$, $(-2, \pm 2)$.]

Before presenting the cryptographic applications of elliptic curves, their general properties are briefly reviewed. The following short account is mainly devoted to describing and justifying the point group structure related to elliptic curves over any field. Complete technical proofs of the properties, which may involve the theory of elliptic functions, algebraic geometry, and plane algebraic curve theory, are not given, and can be found in the specialized literature, e.g. [80, 71, 78, 73].

9.2 Elliptic Curves and Group Law

An algebraic plane curve is the set of points $P(x, y)$ whose coordinates $x$ and $y$ (i.e. coordinates in affine planes) satisfy an algebraic equation $f(x, y) = 0$, where $f(x, y)$ is a polynomial in $x$ and $y$. Plane algebraic curves are also defined referring to homogeneous coordinates $X$, $Y$, and $Z$ (i.e. coordinates in projective planes). The two sets of coordinates, projective and affine, are related as
$$x = \frac{X}{Z} \qquad y = \frac{Y}{Z} .$$
Therefore, a point $P(x, y)$ can be singled out by any triple $(X, Y, Z)$ with $X = xZ$ and $Y = yZ$ when $Z \neq 0$. The triples $(X, Y, 0)$ represent points at infinity. The triple $(0, 0, 0)$ is excluded.
The most visible difference between the two representations is that homogeneous coordinates represent points at infinity plainly, while in affine coordinates points at infinity need a special symbol, like $\infty$ for $x$ and/or $y$, and must be treated as exceptional points. Furthermore, homogeneous coordinates offer some computational advantages in ECC.
Definition 9.1. Let $F$ be either an infinite or a finite field. An elliptic curve $E$ over $F$ consists of the set of points $P = (x, y)$ whose coordinates $x$ and $y$ satisfy a cubic equation
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6 \qquad (9.2)$$
where $a_1$, $a_2$, $a_3$, $a_4$, and $a_6$ belong to $F$.
A point of $E$ with coordinates $x$ and $y$ in $F$ is said to be an $F$-rational point of $E$. The set of $F$-rational points, with the addition of a point $O$ at infinity, is denoted $E[F]$. (The computations with $O$ are plainly performed using homogeneous coordinates.)
Over any field $F$, a linear (affine) substitution
$$\begin{cases} x = t_{11} x' + t_{12} y' + a \\ y = t_{21} x' + t_{22} y' + b \end{cases} \qquad (9.3)$$
transforms equation (9.2) into one of the two forms
$$\text{a)} \quad y^2 + a_1 xy = x^3 + a_4 x + a_6 \qquad\qquad \text{b)} \quad y^2 + a_3 y = x^3 + a_4 x + a_6 .$$
Discriminant and j-invariant. Associated to equation (9.2) are two important parameters: the discriminant $\Delta$, and a quantity called the $j$-invariant. These two quantities are invariant under linear (affine) changes of variables with determinant 1; that is, curves obtained one from another by a substitution of the form (9.3) have the same $\Delta$ and $j$.
The discriminant is defined to be the resultant with respect to $x$ and $y$ of the three polynomials $f(x, y)$, $\frac{\partial f(x,y)}{\partial x}$, and $\frac{\partial f(x,y)}{\partial y}$. Thus, a cubic curve is singular (strictly speaking, it is not an elliptic curve) if and only if $\Delta = 0$. Explicit expressions for $\Delta$ and $j$ are conveniently written through the standard quantities
$$b_2 = a_1^2 + 4a_2 , \quad b_4 = 2a_4 + a_1 a_3 , \quad b_6 = a_3^2 + 4a_6 , \quad b_8 = a_1^2 a_6 + 4a_2 a_6 - a_1 a_3 a_4 + a_2 a_3^2 - a_4^2 ,$$
namely
$$\Delta = -b_2^2 b_8 - 8 b_4^3 - 27 b_6^2 + 9 b_2 b_4 b_6 , \qquad j = \frac{(b_2^2 - 24 b_4)^3}{\Delta} = \frac{(a_1^4 + 8 a_1^2 a_2 - 24 a_1 a_3 + 16 a_2^2 - 48 a_4)^3}{\Delta} .$$

9.2.1 Group Law

The points of $E[F]$ form an abelian group (the point at infinity $O$ is the group identity) for a point sum defined by exploiting the property that a straight line has exactly three points in common with the curve. Despite its purely algebraic nature, the group law was discovered by Giulio Fagnano during his search for a duplication law for elliptic integrals [28]. Fagnano's achievement was generalized by Euler, resulting in the group law for this sort of integral.
The composition law may be described in algebraic terms referring to a canonical form of the elliptic curve equation
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6 . \qquad (9.4)$$
Addition of two different points $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ on the curve, written as $P_1(x_1, y_1) + P_2(x_2, y_2)$, and duplication of a single point, written as $2P_1(x_1, y_1)$, will be considered separately.

Addition. Given $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ on the curve (9.4), let $P_3(x_3, y_3)$ denote their sum. To express the coordinates $x_3$ and $y_3$ in terms of $x_1$, $x_2$, $y_1$ and $y_2$, consider the line $\ell$ through $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$, which has equation
$$y = m(x - x_1) + y_1 \qquad \text{with } m = \frac{y_2 - y_1}{x_2 - x_1} .$$
The line $\ell$ meets the curve (9.4) in a unique third point $P_s(x_s, y_s)$ whose coordinate $x_s$ is found by solving a third-degree equation
$$x^3 - (m^2 + a_1 m - a_2)x^2 + b_4 x + b_6 = 0 ,$$
obtained by substituting the expression for $y$ given by the equation of the line into (9.4) and rearranging the terms; the coefficients $b_4$ and $b_6$ are deliberately not reported because they are not needed in the computation. Since two roots of the cubic polynomial, namely $x_1$ and $x_2$, are known, the third root $x_s$ may be simply obtained from the sum of the roots, that is, from the coefficient of $x^2$
$$x_1 + x_2 + x_s = m^2 + a_1 m - a_2 .$$
We have
$$x_s = m^2 + a_1 m - a_2 - (x_1 + x_2) \quad \text{and} \quad y_s = y_1 + m(x_s - x_1) ,$$
and the addition point is given by the intersection of the vertical line through $P_s(x_s, y_s)$ with the elliptic curve (the third point being the point at infinity). Thus, the coordinate $x_3$ is simply equal to $x_s$, whereas the coordinate $y_3$ is the second root of the equation
$$y^2 + (a_1 x_3 + a_3)y = x_3^3 + a_2 x_3^2 + a_4 x_3 + a_6 .$$
We do not need to solve this equation because we already know the root $y_s$; thus, from the sum $-(a_1 x_3 + a_3)$ of the roots, in conclusion we obtain
$$\begin{cases} x_3 = m^2 + a_1 m - a_2 - (x_1 + x_2) \\ y_3 = -(a_1 x_3 + a_3) - y_1 - m(x_3 - x_1) \end{cases} \qquad (9.5)$$
Duplication. The computation of $2P_1(x_1, y_1)$ uses equations very similar to those used for addition, the line through two points being replaced by the tangent to the elliptic curve at $P_1(x_1, y_1)$. This tangent has equation
$$y = m(x - x_1) + y_1 \qquad \text{with } m = \frac{3x_1^2 + 2a_2 x_1 + a_4 - a_1 y_1}{2y_1 + a_1 x_1 + a_3} .$$
The tangent meets the curve (9.4) in a unique third point $P_s(x_s, y_s)$ whose coordinate $x_s$ is found by solving a third-degree equation, and finally we have
$$x_s = m^2 + a_1 m - a_2 - 2x_1 \quad \text{and} \quad y_s = y_1 + m(x_s - x_1) .$$
As in the case of addition, we find the coordinates of $P_d(x_3, y_3) = 2P_1(x_1, y_1)$ as
$$\begin{cases} x_3 = m^2 + a_1 m - a_2 - 2x_1 \\ y_3 = -(a_1 x_3 + a_3) - y_1 - m(x_3 - x_1) \end{cases} \qquad (9.6)$$
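As an illustration, a minimal Python sketch of formulas (9.5)-(9.6), specialized for brevity to the short Weierstrass form $y^2 = x^3 + a_4 x + a_6$ (i.e. $a_1 = a_2 = a_3 = 0$) over the residues modulo an odd prime $p$; the point at infinity $O$ is represented by None:

    def ec_add(P, Q, a4, p):
        # Point addition on y^2 = x^3 + a4*x + a6 over Z_p (odd prime p).
        if P is None: return Q              # O is the identity
        if Q is None: return P
        (x1, y1), (x2, y2) = P, Q
        if x1 == x2 and (y1 + y2) % p == 0:
            return None                     # P + (-P) = O
        if P == Q:                          # duplication: tangent slope, as in (9.6)
            m = (3 * x1 * x1 + a4) * pow(2 * y1, -1, p) % p
        else:                               # addition: chord slope, as in (9.5)
            m = (y2 - y1) * pow(x2 - x1, -1, p) % p
        x3 = (m * m - x1 - x2) % p
        y3 = (-y1 - m * (x3 - x1)) % p
        return (x3, y3)

    # On y^2 = x^3 - x over F_5 (Figure 9.2): (0,0) + (2,1) = (2,4) = (2,-1)
    print(ec_add((0, 0), (2, 1), -1, 5))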
Commutativity of this point-composition law is obvious, since a line through two points does not depend on the order of the points. Associativity can be proved algebraically, directly from the above derivation, but the proof is very cumbersome. A geometric view, indirectly based on Bezout's theorem on the number of points in common between two algebraic curves, is more synthetic. It should be noted that these properties were implicitly justified by the elliptic integral sum (9.9).

[Figure 9.3: Point addition law on elliptic curves: associative property.]


The three blue lines determine a degenerate cubic, denoted $b_1 b_2 b_3$, and the three red lines determine a degenerate cubic, denoted $r_1 r_2 r_3$. These two cubic curves have 8 common points, namely
$$A, \; B, \; C, \; O, \; A * B, \; A + B, \; C * B, \; C + B ,$$
which belong, three by three, to the six lines
$$\{A, B, A * B\} \in b_1 , \quad \{O, C * B, C + B\} \in b_2 , \quad \{A + B, C, C * (A + B)\} \in b_3 ,$$
$$\{C, B, C * B\} \in r_1 , \quad \{O, A * B, A + B\} \in r_2 , \quad \{C + B, A, A * (C + B)\} \in r_3 .$$
Thus, both cubics must also have the ninth point in common, by Bezout's theorem (i.e. two cubics have exactly 9 common points); it follows that the orange and the green points in the figure coincide, that is $C * (A + B) = A * (C + B)$. In conclusion their symmetric points also coincide:
$$C + (A + B) = A + (C + B) \quad \Longleftrightarrow \quad (A + B) + C = A + (B + C) .$$
The structure of this commutative group depends on the field. In general it is difficult to obtain simply by looking at the set of points; however, some efficient algorithms determining this group structure have recently been developed for computer algebra symbolic programming packages. For example, elliptic curves over the field $\mathbb{Q}$ of rational numbers have very intriguing $\mathbb{Q}$-rational point group structures. Theoretically, a theorem of Mordell's states that the group of $\mathbb{Q}$-rational points of an elliptic curve is isomorphic to the direct sum of a finite group (torsion part) and $r$ copies of the infinite cyclic group $\mathbb{Z}$ (torsion-free part), with $r$ finite. The number $r$ is called the rank of the group; it may be any non-negative integer, zero included. The torsion part has been completely characterized by an in-depth theorem obtained by Barry Mazur [80]. As a final remark, it is recalled that the set of $F$-rational points is a subgroup of the group of points $E[\bar{F}]$, where $\bar{F}$ denotes the algebraic closure of $F$, that is, the field that includes every root of any polynomial with coefficients in $F$.

9.3 EC over Finite Fields

In a finite field $F_q$ of $q$ elements, where $q = p^m$ is a prime power, elliptic curves maintain the point-additive group law. Since the group structures are finite, they have been completely characterized; however, in general these structures are still difficult to determine.
Definition 9.2. An elliptic curve $E$ over $F_q$ consists of the set of points $P = (x, y)$ whose coordinates $x$ and $y$ satisfy a cubic equation
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6 \qquad (9.7)$$
where $a_1$, $a_2$, $a_3$, $a_4$, and $a_6$ belong to $F_{p^m}$.
A point of $E$ with coordinates $x$ and $y$ in $F_q$ is said to be an $F_q$-rational point of $E$. The set of $F_q$-rational points, with the addition of a point $O$ at infinity, is denoted $E[F_q]$. (The computations with $O$ are more easily performed using homogeneous coordinates.)

The invariants $\Delta$ and $j$ again have the structure given above; in particular, in fields of characteristic 2 their expressions assume the simpler forms
$$\Delta = a_1^5 a_3 a_4 + a_1^4 a_2 a_3^2 + a_4^2 a_1^4 + a_3^4 + a_1^6 a_6 + a_1^3 a_3^3 , \qquad j = \frac{a_1^{12}}{\Delta} .$$
Explicitly, over fields of odd characteristic the coordinate composition rules for the curve $y^2 = x^3 + a_4 x + a_6$ are
$$x_3 = \lambda^2 - (x_1 + x_2) \qquad y_3 = \lambda(x_1 - x_3) - y_1$$
where
$$\lambda = \begin{cases} \dfrac{y_1 - y_2}{x_1 - x_2} & \text{if } P_1 \neq P_2 \\[2mm] \dfrac{3x_1^2 + a_4}{2y_1} & \text{if } P_1 = P_2 . \end{cases}$$
In $F_{2^m}$ it is necessary to distinguish between supersingular ($j = 0$) and non-supersingular ($j \neq 0$) curves. Furthermore, the equations are written in a form that makes evident the minimum number of field arithmetical operations.
Addition formulas when $j \neq 0$. The elliptic curve equation is
$$y^2 + xy = x^3 + a_2 x^2 + a_6$$
and the affine coordinates of $P_3 = P_1 + P_2$ are
$$\begin{cases} x_3 = \left(\dfrac{y_1 + y_2}{x_1 + x_2}\right)^2 + \dfrac{y_1 + y_2}{x_1 + x_2} + (x_1 + x_2) + a_2 \\[2mm] y_3 = \dfrac{y_1 + y_2}{x_1 + x_2}(x_1 + x_3) + x_3 + y_1 \end{cases} \qquad \text{if } P_1 \neq P_2$$
and
$$\begin{cases} x_3 = x_1^2 + \dfrac{a_6}{x_1^2} \\[2mm] y_3 = x_1^2 + \left(x_1 + \dfrac{y_1}{x_1}\right)x_3 + x_3 \end{cases} \qquad \text{if } P_1 = P_2 .$$
In homogeneous coordinates, the formulas assume the apparently more complex form
$$\begin{cases} X_3 = [X_1 Z_2 + X_2 Z_1]\{[Z_1 Z_2][Z_2 Y_1 + Z_1 Y_2]([Z_2 Y_1 + Z_1 Y_2] + [X_1 Z_2 + X_2 Z_1]) + [X_1 Z_2 + X_2 Z_1]^3 + a_2 [Z_1 Z_2][Z_2 X_1 + Z_1 X_2]^2\} \\ Y_3 = X_3 + [Y_1 Z_2][X_1 Z_2 + X_2 Z_1]^3 + [Z_2 Y_1 + Z_1 Y_2]([X_1 Z_2][X_1 Z_2 + X_2 Z_1]^2 + \{[Z_1 Z_2][Z_2 Y_1 + Z_1 Y_2]([Z_2 Y_1 + Z_1 Y_2] + [X_1 Z_2 + X_2 Z_1]) + [X_1 Z_2 + X_2 Z_1]^3 + a_2 [Z_1 Z_2][Z_2 X_1 + Z_1 X_2]^2\}) \\ Z_3 = [Z_1 Z_2][X_1 Z_2 + X_2 Z_1]^3 \end{cases} \quad \text{if } P_1 \neq P_2 ,$$
and
$$\begin{cases} X_3 = (X_1^4 + a_6 Z_1^4)\, Z_1 X_1 \\ Y_3 = X_1^4 (X_1^2 + Z_1 Y_1) + a_6 Z_1^4 (X_1^2 + Z_1 Y_1 + Z_1 X_1) \\ Z_3 = (Z_1 X_1)^3 \end{cases} \qquad \text{if } P_1 = P_2 .$$

Addition formulas when $j = 0$. The elliptic curve equation is
$$y^2 + a_3 y = x^3 + a_4 x + a_6$$
and the affine coordinates of $P_3 = P_1 + P_2$ are
$$\begin{cases} x_3 = \left(\dfrac{y_1 + y_2}{x_1 + x_2}\right)^2 + (x_1 + x_2) \\[2mm] y_3 = \dfrac{y_1 + y_2}{x_1 + x_2}(x_1 + x_3) + y_1 + a_3 \end{cases} \qquad \text{if } P_1 \neq P_2$$
and
$$\begin{cases} x_3 = \dfrac{x_1^4 + a_4^2}{a_3^2} \\[2mm] y_3 = \dfrac{x_1^2 + a_4}{a_3}(x_1 + x_3) + y_1 + a_3 \end{cases} \qquad \text{if } P_1 = P_2 .$$
In homogeneous coordinates the formulas assume the form
$$\begin{cases} X_3 = ([Y_2 Z_1 + Y_1 Z_2]^2 Z_1 Z_2 + [X_1 Z_2 + X_2 Z_1]^3)(X_1 Z_2 + X_2 Z_1) \\ Y_3 = [Y_2 Z_1 + Y_1 Z_2]^3 Z_1 Z_2 + [X_1 Z_2][X_1 Z_2 + X_2 Z_1]^2 [Y_2 Z_1 + Y_1 Z_2] + (X_1 Z_2 + X_2 Z_1)^3 (Y_2 Z_1 + a_3 Z_2 Z_1) \\ Z_3 = Z_1 Z_2 (X_1 Z_2 + X_2 Z_1)^3 \end{cases} \quad \text{if } P_1 \neq P_2 ,$$
and
$$\begin{cases} X_3 = \left[\dfrac{1}{a_3} X_1^2 + \dfrac{a_4}{a_3} Z_1^2\right]^2 Z_1^2 \\[2mm] Y_3 = \left[\dfrac{1}{a_3} X_1^2 + \dfrac{a_4}{a_3} Z_1^2\right]^3 + \left[\dfrac{1}{a_3} X_1^2 + \dfrac{a_4}{a_3} Z_1^2\right] X_1 Z_1^3 + Y_1 Z_1^5 + a_3 Z_1^6 \\[2mm] Z_3 = Z_1^6 \end{cases} \qquad \text{if } P_1 = P_2 .$$

Given $P \in E(F_q)$ and an integer $x$, the point $xP$ is defined to be the sum of $P$ with itself iterated $x$ times
$$xP = \underbrace{P + P + P + \cdots + P}_{x} .$$
The subgroup of $F_q$-rational points is of interest in several applications. In particular, its order and its structure are important in cryptography.
Theorem 9.1. Let $E$ be an elliptic curve defined over $F_{p^m}$, where $p$ is a prime. Then two integers $n$ and $k$ exist such that $E(F_q)$ is isomorphic to $Z_n \times Z_k$. Furthermore $k \mid n$ and $k \mid (p^m - 1)$.
$Z_n$ denotes the additive group of remainders modulo $n$, which is the prototype of any cyclic group of order $n$.
Hasse's theorem gives a good estimate of the order of $E[F_q]$.
Theorem 9.2 (Hasse).
$$\#E[F_q] = p^m + 1 - t , \qquad \text{with} \quad |t| \le 2\sqrt{p^m} .$$
The following lemma may be useful in the explicit computation of group order and structure of elliptic curves.
Lemma 9.1. Let $m$ be even, and suppose $\#E[F_q] = 2^m + 1 - t$, with $|t| \le 2\sqrt{2^m}$: if $t^2 = 0$, $2^m$, $2^{m+1}$, or $3 \cdot 2^m$, then $E[F_q]$ is cyclic.

9.4 EC Public-key Schemes

A cyclic subgroup of $E[F_q]$, the group of $F_q$-rational points of an elliptic curve $E$, can be used to define a one-way function whose strength is based on the discrete logarithm.
Let $P \in E[F_q]$ be a point of order $N$, i.e. $NP = O$. The set of points $xP$, $x = 0, 1, 2, \ldots, N-1$, forms a cyclic group where the discrete logarithm problem is hard, if the elliptic curve is wisely chosen. Actually, we have that:
- It is easy to compute $Q = xP$ given $x$ and $P$;
- It is hard to recover $x$ from $Q$ given $P$, in the sense that no easy method is known for recovering $x$.
Therefore, elliptic curves are a rich source of cyclic groups in which the discrete logarithm problem is generally hard. In particular, the Diffie-Hellman and El Gamal public key schemes may be implemented using elliptic curves, and assume the following forms.
Let $P$ be a publicly known point of a publicly known elliptic curve $E[F_{2^m}]$, which is a generator of the largest cyclic group.
Diffie-Hellman scheme. The procedure, in principle, is not different from the original scheme described for the multiplicative group of a prime field.
- Let $A = aP$ and $a$ be Alice's public and secret keys, respectively.
- Let $B = bP$ and $b$ be Bob's public and secret keys, respectively.
- The common secret key of Alice and Bob is $K = abP$.
This key can be used to exchange short messages, or to set up the initial state of a stream generator in a symmetric enciphering scheme. However, this rough form of public key exchange is not recommended. In practice, the El Gamal modification should be (wisely) used, to avoid, or to make more difficult, malicious attacks.
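Reusing the ec_add sketch given earlier, scalar multiplication by the binary method and the resulting key exchange look as follows (the curve parameters are a deliberately tiny illustration, far too small for real security):

    def ec_mul(x, P, a4, p):
        # Double-and-add: the binary method applied to the point group.
        Q = None                            # accumulator, starts at O
        while x > 0:
            if x & 1:
                Q = ec_add(Q, P, a4, p)
            P = ec_add(P, P, a4, p)         # doubling
            x >>= 1
        return Q

    # Toy curve y^2 = x^3 + 2x + 2 over F_17; P = (5, 1) has order 19
    p, a4, P = 17, 2, (5, 1)
    a, b = 3, 7                             # Alice's and Bob's secret keys
    A, B = ec_mul(a, P, a4, p), ec_mul(b, P, a4, p)     # public keys
    assert ec_mul(a, B, a4, p) == ec_mul(b, A, a4, p)   # shared K = abP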
El Gamal signature scheme. The elements of $F_{2^m}$ are assumed to be represented with respect to a normal basis. User A randomly chooses an integer $a$ and makes public the point $A = aP$, while keeping $a$ secret.
User A employs the following procedure to sign a message $M$, whether public or private. (When private, an encryption will follow the signature procedure, and a prior decryption is then needed before verification.)
1. Encode the message $M$, and apply a (to be specified) hash function to produce a digest $D \in Z_N$.
2. Select a random integer $k$, and compute the point $kP = (\tilde{x}, \tilde{y})$.
3. Compute $r = D + \phi(\tilde{x})$ in $Z_N$, where $\phi$ is a map from $F_{2^m}$ into $Z_N$.
4. Compute $s = k - ar \bmod N$.
5. Form the triple $[M, r, s]$ as the signed message.
The signature verification process is:
1. Compute the digest $D \in Z_N$ from $M$.
2. Compute $(x_1, y_1) = rA + sP$ on $E(F_{2^m})$.
3. Compute $r' = D + \phi(x_1) \bmod N$.
4. The signature is valid if $r' = r$.
The validity of the scheme is based on the identity
$$sP + rA = (k - ar)P + r\,aP = kP .$$
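A toy sketch of the scheme, transposed for simplicity to the same curve over $Z_p$ used above, with $\phi$ taken as reduction of the $x$-coordinate modulo $N$, and reusing ec_add and ec_mul:

    import random

    def ec_elgamal_sign(D, a, P, N, a4, p):
        # D: digest in Z_N; a: secret key; P: base point of order N.
        while True:
            k = random.randrange(1, N)
            R = ec_mul(k, P, a4, p)          # R = kP
            if R is not None:
                break
        r = (D + R[0]) % N                   # phi(x) = x mod N
        s = (k - a * r) % N
        return r, s

    def ec_elgamal_verify(D, r, s, A, P, N, a4, p):
        R = ec_add(ec_mul(r, A, a4, p), ec_mul(s, P, a4, p), a4, p)
        return R is not None and (D + R[0]) % N == r    # rA + sP = kP

    p, a4, P, N = 17, 2, (5, 1), 19          # toy curve, P of order 19
    a = 5                                    # signer's secret key
    A = ec_mul(a, P, a4, p)                  # signer's public key
    r, s = ec_elgamal_sign(11, a, P, N, a4, p)
    print(ec_elgamal_verify(11, r, s, A, P, N, a4, p))  # True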

9.5 Arithmetics and complexity in ECC

The adoption of EC schemes, based on the discrete logarithm, instead of public-key systems based on factoring, closely depends on finite field arithmetics and on the computational complexity of point sums over elliptic curves. The computation refers to the canonical form of the equation in $F_{2^m}$.
Actually, $xP$ with $x$ large can be computed using the binary method: write $x = b_s 2^s + b_{s-1} 2^{s-1} + \cdots + b_1 2 + b_0$ as a base-2 integer, where $s = \lfloor \log_2 x \rfloor$ and $b_i \in \{0, 1\}$; then $xP$ can be viewed as a sum of $s+1$ points
$$xP = (b_s 2^s + b_{s-1} 2^{s-1} + \cdots + b_1 2 + b_0)P = b_s 2^s P + b_{s-1} 2^{s-1} P + \cdots + b_1 2P + b_0 P ,$$
where each addend
$$2^i P = \underbrace{2(2(2(\cdots 2}_{i}\,P)\cdots))$$
is computed via $i$ doublings. Therefore, $2s$ additions/doublings are sufficient to compute $Q = xP$. Referring to the addition formulas, the complexity in terms of arithmetical operations in $F_q$ is found as follows.
Non-supersingular curves: $\Delta = a_6 \neq 0$ and $j = 1/a_6$
$$y^2 + xy = x^3 + a_2 x^2 + a_6 .$$
Supersingular curves: $\Delta = a_3^4 \neq 0$ and $j = 0$
$$y^2 + a_3 y = x^3 + a_4 x + a_6 .$$
Notice that singular curves have $\Delta = 0$, while supersingular curves have $\Delta \neq 0$; thus singular and supersingular curves are totally different concepts.
Number of operations in $F_{2^m}$ for doubling (i.e. computing $2P$) in affine coordinates; the last three columns count the operations when each inversion is computed as a power (see below), i.e. expanded into $m-1$ squarings and $\lfloor\log_2 m\rfloor$ products:

  j≠0      q     s    p                i    |    q      s    p
  x3       2     1    1                1    |    m+1    1    1+⌊log2 m⌋
  y3       0     3    2                0    |    0      3    2
  Tot.     2     4    3                1    |    m+1    4    3+⌊log2 m⌋

  j=0      q     s    p                i    |    q      s    p
  x3       2     1    1                0    |    2      1    1
  y3       1     4    2                0    |    1      4    2
  Tot.     3     5    3                0    |    3      5    3

Number of operations in $F_{2^m}$ for summing two distinct points in affine coordinates:

  j≠0      q     s    p                i    |    q      s    p
  x3       1     5    1                1    |    m      5    1+⌊log2 m⌋
  y3       0     2    2                0    |    0      2    2
  Tot.     1     7    3                1    |    m      7    3+⌊log2 m⌋

  j=0      q     s    p                i    |    q      s    p
  x3       1     3    1                1    |    m      3    1+⌊log2 m⌋
  y3       0     3    1                0    |    0      3    1
  Tot.     1     6    2                1    |    m      6    2+⌊log2 m⌋

where q, s, p, and i denote the number of squarings, sums, products and inversions, respectively.
The costs or complexities of squaring, summing, inverting or multiplying in F2m
are different. These costs depend also on the field basis.
Number of operations in $F_{2^m}$ for doubling (i.e. computing $2P$) and for summing two distinct points in homogeneous coordinates, respectively:

  Doubling      q    s    p        Summing       q    s    p
  j≠0   X3      4    1    2        j≠0   X3      0    3    5
        Y3      0    4    3              Y3      0    3    3
        Z3      1    0    2              Z3      1    2    7
        Tot.    5    5    7              Tot.    1    8    15
  j=0   X3      3    1    3        j=0   X3      1    2    3
        Y3      0    3    6              Y3      0    2    3
        Z3      1    0    1              Z3      1    1    5
        Tot.    4    4    10             Tot.    2    5    11

With homogeneous coordinates, the number of divisions, which are the most expensive basic operations, is drastically reduced to 2 (the final conversions $x = X/Z$ and $y = Y/Z$).
Inversion in $F_{2^m}$. The inverse $\beta^{-1} \in F_{2^m}$ of an element $\beta$ is computed as a power with positive exponent
$$\beta^{-1} = \beta^{2^m - 2} = \beta^{2 + 2^2 + 2^3 + \cdots + 2^{m-1}} ;$$
this operation requires $m-1$ squarings and $m-2$ products in $F_{2^m}$. Although the number of squarings cannot be reduced, the number of products can be considerably reduced, and an old conjecture of Scholz-Brauer implies that this number is not greater than $\ell(m)$, the length of the shortest addition chain leading up to $m$. It is well known that $\ell(m)$ is upper bounded by $2\lfloor\log_2 m\rfloor$ [47, p.416-418].
For the purpose of bounding the complexity, it is not too conservative to assume that the complexity of the inverse is given by $m-1$ squarings and $2\lfloor\log_2 m\rfloor$ multiplications in $F_{2^m}$. Furthermore, the evaluation of $xP$, where $x$ is of the order of $2^m$, requires at most $m$ doublings and $m$ additions of points.
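A sketch of this inversion, reusing the gf2m_mul sketch of Section 8.3.2 (squarings are performed here with the generic product, so the counts are the plain $m-1$ and $m-2$, without addition-chain shortcuts):

    def gf2m_inv(beta, g, m):
        # beta^(-1) = beta^(2^m - 2) = beta^(2 + 4 + ... + 2^(m-1)),
        # computed with m-1 squarings and m-2 products.
        t = gf2m_mul(beta, beta, g, m)       # t = beta^2
        r = t                                # running product
        for _ in range(m - 2):
            t = gf2m_mul(t, t, g, m)         # one squaring: beta^(2^i)
            r = gf2m_mul(r, t, g, m)         # one product
        return r

    # In F_(2^4) with g(x) = x^4 + x + 1: beta * beta^(-1) = 1
    beta = 0b0110
    print(gf2m_mul(beta, gf2m_inv(beta, 0b10011, 4), 0b10011, 4))   # prints 1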
For realistically large $m$ ($> 100$), the total complexities of evaluating $xP$ using affine and homogeneous coordinates are given in the following table; the use of homogeneous coordinates is certainly preferable (with homogeneous coordinates, the number of operations is about $\frac{21}{m}$ of the operations required with affine coordinates).

  Affine           q           s      p
  j≠0              2m²+m       11m    2m(1+⌊log2 m⌋)
  j=0              m²+3m       11m    m(5+⌊log2 m⌋)

  Homogeneous      q           s      p
  j≠0              7m−1        13m    22m+⌊log2 m⌋
  j=0              6m−1        9m     21m+⌊log2 m⌋

9.6 Historical Notes

The early developments of the theory of elliptic functions introduced the notion of functions $\phi(x)$ admitting an addition theorem, that is, functions satisfying a functional equation of the form
$$\phi(u + v) = F\left(\phi(u), \phi(v), \frac{d\phi(u)}{du}, \frac{d\phi(v)}{dv}\right)$$
where $F(x_1, x_2, x_3, x_4)$ is a rational function of four variables. Elementary examples of functions with an algebraic addition theorem are the exponential and trigonometric functions, to which the elliptic functions are added.
1. The exponential function $\phi(u) = e^u$ satisfies the addition rule
$$\phi(u + v) = \phi(u)\phi(v) ,$$
thus $F(x_1, x_2, x_3, x_4) = x_1 x_2$.
2. The trigonometric function $\phi(u) = \sin(u)$ satisfies the addition rule
$$\phi(u + v) = \phi(u)\frac{d\phi(v)}{dv} + \phi(v)\frac{d\phi(u)}{du} ,$$
thus $F(x_1, x_2, x_3, x_4) = x_1 x_4 + x_2 x_3$.
3. The elliptic integral
$$u = \wp^{-1}(z) = \int_z^{\infty} \frac{dx}{\sqrt{4x^3 - g_2 x - g_3}} \qquad (9.8)$$
satisfies the following addition property
$$\int_{z_1}^{\infty} \frac{dx}{\sqrt{4x^3 - g_2 x - g_3}} + \int_{z_2}^{\infty} \frac{dx}{\sqrt{4x^3 - g_2 x - g_3}} = \int_{z_3}^{\infty} \frac{dx}{\sqrt{4x^3 - g_2 x - g_3}} \qquad (9.9)$$
where
$$z_3 = -z_1 - z_2 + \frac{1}{4}\left(\frac{\sqrt{4z_2^3 - g_2 z_2 - g_3} - \sqrt{4z_1^3 - g_2 z_1 - g_3}}{z_2 - z_1}\right)^2 .$$
Using the Weierstrass function $\wp$, this equation is written as
$$\wp(u + v) = -\wp(u) - \wp(v) + \frac{1}{4}\left(\frac{\dfrac{d\wp(u)}{du} - \dfrac{d\wp(v)}{dv}}{\wp(v) - \wp(u)}\right)^2 ,$$
which shows that
$$F(x_1, x_2, x_3, x_4) = -x_1 - x_2 + \frac{1}{4}\left(\frac{x_4 - x_3}{x_2 - x_1}\right)^2 .$$
The algebraic relation between $z_1$, $z_2$ and $z_3$ is a direct consequence of the theory of elliptic integrals as developed by Euler, Legendre, Jacobi, and Weierstrass. Since
$$\frac{du}{dz} = -\frac{1}{\sqrt{4z^3 - g_2 z - g_3}} ,$$
the previous relations imply that the elliptic curve $y^2 = 4z^3 - g_2 z - g_3$ can be parameterized as
$$z = \wp(u) \qquad y = \frac{d\wp(u)}{du} .$$
The point $P_3 = (x_3, y_3)$ defined by the coordinates
$$x_3 = \wp(u_1 + u_2) \qquad y_3 = \frac{d\wp(u_1 + u_2)}{du}$$
can be viewed as the sum of $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$. It is straightforward to verify that $P_1$, $P_2$, and $-P_3 = (x_3, -y_3)$ are collinear.
The addition property of elliptic functions sets up an abelian group structure with a particular associated rational function $F(x_1, x_2, x_3, x_4)$. The group properties are a direct consequence of the correspondence
$$P(x_3, y_3) = P(x_1, y_1) + P(x_2, y_2) \quad \Longleftrightarrow \quad u_3 = u_1 + u_2 .$$
This group structure is carried over to the point set of an elliptic curve. Due to the algebraic interpretation of the addition law, the additive structure in the point set of an elliptic curve is maintained also by curves over finite fields. In this case the additive groups are finite groups, which provide a large set of abelian groups with properties that are suitable for defining one-way functions.

9.6.1 The origins

The discovery of the additive law may be attributed to the Italian nobleman Marchese Giulio Carlo Fagnano dei Toschi (1682-1766), and its first academic organization is due to Leonhard Euler, as is clear from J. Dieudonné's review (MR2000) of R. Ayoub's paper "The lemniscate and Fagnano's contributions to elliptic integrals", Arch. Hist. Exact Sci. 29 (1984), no. 2, 131-149.
    It is well known that in 1751 Euler was asked to examine the collected papers of the Italian nobleman G. Fagnano, and discovered in them unsuspected relations between special types of elliptic integrals, which led him to create the general theory of these integrals, culminating in the addition formula
$$\int_0^u \frac{dt}{\sqrt{P(t)}} + \int_0^v \frac{dt}{\sqrt{P(t)}} = \int_0^r \frac{dt}{\sqrt{P(t)}}$$
    where $P(u) = 1 + au^2 - u^4$ and $r = (u\sqrt{P(v)} + v\sqrt{P(u)})/(1 + u^2 v^2)$. The particular case considered by Fagnano was $a = 0$; the integral then expresses the length of an arc of a lemniscate given by the parametric representation $x = \sqrt{(t^2 + t^4)/2}$, $y = \sqrt{(t^2 - t^4)/2}$. This had been known since James Bernoulli, and Euler had met the same integral in his work on elastic curves. The author explains that what Fagnano did was to prove for that integral the duplication formula, the particular case of the addition formula for $v = u$. From this, and ingenious changes of variables and geometric considerations, he showed that if $\overset{\frown}{OA}$ is the arc of a half loop of the lemniscate, one can construct by straightedge and compass the values of the parameter $t$ giving the length of the arc obtained by division of $\overset{\frown}{OA}$ by 2, 3 or 5.


Many mathematicians, after Fagnano and Euler, have contributed to the development of elliptic integral theory, particularly Legendre, Lagrange, Jacobi, Galois,
Abel, and Weierstrass. These efforts culminated in the proof that there do not exist one-valued functions which have more than two periods. Hence every function for which there exists an algebraic addition-theorem is an elliptic function or
a limiting case of one.
The conclusion is that there is no hope of finding addition formulas different from
those summarized above.


Chapter 10
Cryptanalysis
The job of the artist is always to deepen the
mystery.
Francis Bacon

10.1 Introduction

Cryptanalysis is the most important, difficult, intriguing, and challenging aspect


of cryptology. It concerns methods of attacking cryptographic systems, put in
place by enemies or partners, with two main objectives:
1. To verify the validity of cryptographic schemes and protocols, acting with
all publicly-available information, that does not theoretically affect the security of the system.
2. To break the cryptographic schemes of adversaries (military, political, industrial, commercial, or social), acting with limited information about systems and keys. This endeavor may have many goals:
(a) To find the cryptographic algorithm employed in a protected connection.
(b) To impersonate the authentic author of a message.
(c) To falsify a digital signature.
(d) To disclose the text of an encrypted message.
(e) To capture the secret key of a cryptographic system.
As Rejewski's cryptanalysis of Enigma showed [13], the security of a cryptosystem is vitally based on the mathematical properties of the enciphering algorithms, and on the handling protocol. The history of Enigma has taught us much about the security of cryptosystems, which at that time had not been formally proved to be mathematically secure, both in the algorithmic aspects and in the managing protocol. The Polish breaking of the Enigma system, and the decryption of messages encrypted using Enigma machines at Bletchley Park (UK) during World War II, were a mix of cryptanalysis of the mathematical algorithm and penetration of protocol features based on intelligence.
Confining our cryptanalysis considerations to enciphering schemes for information concealment, the goals are those of capturing the secret key and/or the message content. Cryptanalytic processes are guided by two principles: one is essential to the formulation of the problem itself, and is aimed at disclosing protected information; the second is essential for recognizing that attacked enciphering schemes have been effectively broken. These two principles are formalized in the set of basic axioms of cryptanalysis.

10.2 Axioms

Three axioms are indispensable for characterizing cryptanalysis. Two axioms are
apparently obvious, but are necessary for correctly defining the operative framework of cryptanalytic actions, which should not be confused with intelligence
activities.
The formulation of the first axiom may seem trivial, but is indispensable for specifying any cryptanalytic attack: it is necessary to know that the message is a meaningful encrypted text, written using some encryption mechanisms in recognizable
symbols.
Axiom 10.1. Given a text T to be cryptanalyzed, then
1. T is written in symbols of a known alphabet.
2. T is an encrypted message.
3. T has been produced from a plain text by encryption mechanisms whose formal and
mathematical descriptions are known.
The second axiom is indispensable for deciding the success of any cryptanalytic
attack. Specifically, in order for cryptanalysis to make sense, it is necessary that,
once a text U is obtained from an encrypted message, the cost for verifying that U
is the true original message is close to zero.
Axiom 10.2. When the cryptanalysis of an encrypted message T produces a message U,
the cost for recognizing whether U is the right message has O(1) complexity.
Remark. It is worth observing, despite its obviousness, that any message may be encrypted into any other message whatsoever. Thus, if the set of encryption rules (algorithms) and the set of plain messages are not properly restricted, any operation of cryptanalysis is meaningless.
In spite of their apparent simplicity, the conditions imposed by these axioms are of fundamental importance in evaluating the security of any encryption scheme, and also in designing such systems.
A third axiom is especially useful in designing encryption schemes, when it is essential to have some reference in order to evaluate the strength of the encryption algorithms. Recall that for encrypting a message $M \in \mathcal{M}$ into a message $E \in \mathcal{E}$, a function $f$ and a key $K \in \mathcal{K}$ are needed to compute
$$E = f(M, K) ,$$
with the condition that a function $g$ exists such that $M$ is recovered as
$$M = g(E, K) .$$
This third axiom imposes a well-defined condition for establishing the security of
any encryption scheme.
Axiom 10.3 (Kerckhoffs' Principle). The strength of an enciphering scheme must rely totally on the key K and its secrecy. The transformation f is, or should be considered to be, publicly known.
The condition that f be considered publicly known is a hypothetical condition
because, obviously, it is safer not to publish this function, while acting as though
it were known to all.
The way the secret key K is conserved, distributed, or shared affects the security of the system, but it must not be considered in defining the strength of the
encryption algorithm.
Clearly, this axiom applies to symmetric schemes directly, while its application
to asymmetric (or public-key) schemes is more questionable, since their security
lies principally in the strength of the one-way function. However, public-key
schemes may also be encompassed by Axiom 10.3, because some secret information is involved in any case. Unfortunately, the existence of one-way functions is
still unsettled.
The conditions imposed by these three axioms are of great importance in designing any cryptographic system, and in evaluating the strength of any enciphering algorithm. They also specify the framework of any feasible (not necessarily
successful) cryptanalytic attack, by enemies, opponents, or other unauthorized
persons.

10.3 Cryptanalysis of secret-key systems

Cryptanalysis of symmetric cryptographic algorithms is described referring to the following data:
- Plain text M = m1, m2, ..., which consists of a sequence of symbols from an alphabet B;
- Enciphering algorithm A;
- Secret key K;
- Encrypted text E = e1, e2, ..., which consists of a sequence of symbols from an alphabet C.
Attacks may start from different levels of knowledge, ranging from the most favorable conditions for the defender, to the most favorable conditions for the attacker. Further, the objectives of the attack may be different. In particular, we
consider three cases that are of major interest, together with the relative targets:
CTANI. Cipher Text Attack with No Information: in this attack, only the encrypted text E is known; the goal is to find the plain message M, without any information about the algorithm A or the secret key K.
CTA. Cipher Text Attack with side information: in this attack, the encrypted text E and the algorithm A are known, and the goal is to find the plain message M, and possibly the secret key K.
PTA. Plain Text Attack: in this attack, a pair comprising encrypted text E and plain message M is known, along with the encryption algorithm A; the goal is to find the secret key K (to be used in future decryptions).
When PTA succeeds, that is, when the secret key has been found, we say that the system has been broken. Since this kind of attack is the most favorable to the opponents, the designers of symmetric-key cryptosystems should design the encryption schemes to resist PTA, for better security. To this end, cryptanalysis is useful to discover hidden bugs, and trivial or less trivial weaknesses, that may have escaped the designers. In this sense, DES cryptanalysis is paradigmatic, and shows that DES is secure apart from the limits imposed by the size of its parameters (key and text lengths). For this reason it will be described in detail, after describing some cryptanalysis of historically important enciphering algorithms.

10.3.1 Cryptanalysis of classic schemes

Many famous enciphering schemes, namely Caesar's cipher, Alberti's disk cipher, Vigenère's cipher, the Hill cipher, and Bacon's cipher, are easily broken by a plain text attack. Therefore their cryptanalysis will only consider the cipher text attack, but with some side information. That is, the algorithm to be attacked is assumed to be known, along with the language of the message, while the plain text is assumed to be unknown. Given an encrypted text and no further information, the problem of finding which cipher is used among the various enciphering schemes will not be addressed here. This problem is certainly of the greatest importance for starting the cryptanalysis of the message; however, besides trying cryptanalysis for every possible cipher scheme, there are a few methods, based on frequency analysis, for recognizing the encryption algorithm from the encrypted text.
In the following, we present cryptanalysis examples of cipher-text only attacks
on the following classic encryption algorithms:
A - Caesar cipher with the alphabet not in standard order.
B - Vigenère cipher with the alphabet in standard order.
C - Alberti disk cipher, with reference to an Alberti disk adapted to the English alphabet, and used in the simplest mode; attacks on the more sophisticated modes also proposed by Alberti would require large encrypted texts, and are not suitable for exemplification.
D - Hill cipher with Hill matrix of dimension 2 or greater than 2.
E - Bacon cipher with either known or unknown partitions of the alphabet, but with known binary encoding of the alphabet.
A - Caesar cipher. Cryptanalysis of a message encrypted with the Caesar scheme is trivial when the alphabet is taken in its natural order, since an exhaustive search over 26 possibilities breaks the system.
The cryptanalysis of Caesar systems that use the alphabet letters in a permuted order is less trivial. Attacks on these schemes, already in the Renaissance, made use of statistical methods and probabilities (frequencies), and these probabilistic methods are still today the most powerful tools for attacking cryptographic systems.
We will use a numerical example to examine how to proceed. Assume an alphabet consisting of 31 symbols, that is, the 26 letters of the alphabet plus 5 punctuation marks including blank (space). The letters are encoded with ordered numbers from 0 to 25, blank is encoded with 26, and the remaining symbols are encoded with 27, 28, 29, and 30, as shown in Table 10.1 together with the relative frequencies in the English language. A permutation $\sigma$ is written as $\binom{i}{\sigma(i)}$.
A Caesar permutation of key $k$ is denoted as $\binom{i}{i+k}$, where the number below, obtained by addition modulo 31, is substituted for the number above. Therefore, a Caesar permutation performed after the permutation $\sigma$ is denoted as $\binom{i}{\sigma(i)+k}$, which is equivalent to a permutation of the original ordered set, that is $\binom{i}{\tau(i)}$ with $\tau(i) = \sigma(i) + k \bmod 31$. In this operation mode, $k$ may play the role of a key that changes with the message to be encrypted (that is, $k$ has the meaning of a communication or message key), and is possibly known (or public, that is, communicated together with the encrypted message), while the role of common secret key is played by the permutation $\sigma$, which should be known only to the sender and the legitimate recipient of the message.


There are 31! keys. Cryptanalysis without any inside information entails recovering the plain message from the encrypted message; however, it also yields the combination of the secret permutation with the Caesar communication key.
Consider the encrypted text (already encoded):
Consider the encrypted text (already encoded):
4, 24, 13, 4, 15, 13, 10, 20, 24, 13, 30, 10, 27, 10, 20, 7, 10, 13, 24, 20,
13, 21, 2, 13, 3, 20, 7, 13, 21, 18, 10, 8, 13, 21, 2, 10, 13, 3, 18, 22, 2, 13, 3,
18, 17, 0, 13, 18, 10, 17, 13, 15, 24, 4, 9, 9, 13, 3, 18, 22, 2, 0, 13, 24, 3, 2,
13, 20, 1, 4, 10, 4, 20, 10, 13, 24, 3, 18, 24, 13, 24, 3, 2, 13, 18, 28, 28, 18, 4,
11, 15, 13, 20, 28, 13, 24, 3, 2, 13, 7, 20, 11, 9, 17, 13, 18, 11, 2, 13, 4, 10,
13, 15, 30, 16, 3, 13, 7, 4, 15, 2, 13, 23, 20, 22, 2, 11, 10, 2, 17, 13, 19, 8, 13,
28, 20, 11, 24, 30, 10, 2, 13, 18, 10, 17, 13, 19, 8, 13, 23, 20, 17, 13, 24, 3,
18, 24, 13, 21, 2, 10, 13, 7, 4, 24, 3, 13, 24, 3, 2, 4, 11, 13, 7, 4, 15, 17, 20,
21, 13, 16, 18, 10, 10, 20, 24, 13, 17, 4, 11, 2, 16, 24, 13, 24, 3, 2, 21, 13, 18,
10, 17, 13, 24, 3, 18, 24, 13, 10, 20, 13, 20, 10, 2, 13, 16, 18, 10, 13, 2, 22, 2,
10, 13, 3, 2, 9, 1, 13, 24, 3, 2, 21, 5, 13, 18, 10, 17, 13, 19, 2, 16, 18, 30, 15,
2, 13, 20, 28, 13, 24, 3, 4, 15, 13, 24, 3, 2, 8, 13, 7, 20, 30, 9, 17, 13, 3, 18,
22, 2, 13, 30, 15, 13, 19, 2, 9, 4, 2, 22, 2, 13, 24, 3, 18, 24, 13, 4, 24, 13, 4,
15, 13, 10, 20, 24, 13, 10, 2, 16, 2, 15, 15, 18, 11, 8, 13, 24, 20, 13, 9, 18, 19,
20, 30, 11, 13, 21, 30, 16, 3, 13, 4, 10, 13, 18, 28, 28, 18, 4, 11, 15, 0, 13, 19,
30, 24, 13, 24, 20, 13, 9, 2, 24, 13, 16, 3, 18, 10, 16, 2, 13, 23, 20, 22, 2, 11,
10, 13, 24, 3, 2, 21, 25, 13, 24, 3, 4, 15, 13, 20, 1, 4, 10, 4, 20, 10, 13, 3, 18,
15, 13, 19, 2, 2, 10, 13, 21, 20, 11, 2, 13, 16, 11, 2, 17, 4, 24, 2, 17, 13, 4, 10,
13, 20, 30, 11, 13, 24, 4, 21, 2, 15, 13, 19, 2, 16, 18, 30, 15, 2, 13, 20, 28, 13,
24, 3, 2, 13, 23, 11, 2, 18, 24, 13, 16, 3, 18, 10, 23, 2, 15, 13, 4, 10, 13, 18,
28, 28, 18, 4, 11, 15, 13, 7, 3, 4, 16, 3, 13, 3, 18, 22, 2, 13, 19, 2, 2, 10, 13,
15, 2, 2, 10, 0, 13, 18, 10, 17, 13, 21, 18, 8, 13, 15, 24, 4, 9, 9, 13, 19, 2, 13,
15, 2, 2, 10, 0, 13, 2, 22, 2, 11, 8, 13, 17, 18, 8, 25, 13, 19, 2, 8, 20, 10, 17,
13, 18, 9, 9, 13, 3, 30, 21, 18, 10, 13, 16, 20, 10, 26, 2, 16, 24, 30, 11, 2, 25,
13, 15, 20, 21, 2, 24, 4, 21, 2, 15, 13, 1, 20, 10, 17, 2, 11, 4, 10, 23, 13, 20,
22, 2, 11, 13, 24, 3, 4, 15, 0, 13, 4, 13, 18, 21, 13, 4, 10, 13, 15, 20, 21, 2, 13,
17, 2, 23, 11, 2, 2, 13, 4, 10, 16, 9, 4, 10, 2, 17, 13, 24, 20, 13, 24, 3, 2, 4, 11,
13, 20, 1, 4, 10, 4, 20, 10, 25, 13, 10, 2, 22, 2, 11, 24, 3, 2, 9, 2, 15, 15, 0, 13,
10, 20, 24, 13, 24, 20, 13, 2, 6, 24, 4, 10, 23, 30, 4, 15, 3, 13, 20, 30, 11, 13,
28, 11, 2, 2, 13, 7, 4, 9, 9, 0, 13, 4, 13, 3, 20, 9, 17, 13, 4, 24, 13, 24, 20, 13,
19, 2, 13, 24, 11, 30, 2, 13, 24, 3, 18, 24, 13, 28, 20, 11, 24, 30, 10, 2, 13, 4,
15, 13, 24, 3, 2, 13, 18, 11, 19, 4, 24, 2, 11, 13, 20, 28, 13, 20, 10, 2, 29, 3,
18, 9, 28, 13, 20, 28, 13, 20, 30, 11, 13, 18, 16, 24, 4, 20, 10, 15, 0, 13, 19,
30, 24, 13, 24, 3, 18, 24, 13, 15, 3, 2, 13, 15, 24, 4, 9, 9, 13, 9, 2, 18, 22, 2,
15, 13, 30, 15, 13, 24, 20, 13, 17, 4, 11, 2, 16, 24, 13, 24, 3, 2, 13, 20, 24, 3,
2, 11, 13, 3, 18, 9, 28, 0, 13, 20, 11, 13, 1, 2, 11, 3, 18, 1, 15, 13, 18, 13, 9, 4,
24, 24, 9, 2, 13, 9, 2, 15, 15, 25
A count of the relative frequencies of the symbols (numbers), normalized to 100, is performed and reported in Table 10.2, columns 2 and 3. Column 1 is obtained by comparing the entries of column 3 of this Table with the entries of the third column of Table 10.1: when the frequencies are close (possibly the same), the letter of column 1 in Table 10.1 is written in the first column of Table 10.2. Using Table 10.2, it is easy to obtain, after some adjustments, a passage from Il principe by Niccolò Machiavelli.
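The frequency comparison just described is easy to mechanize. The following Python sketch is our own illustration (the stub values for cipher and table_10_1 must be completed with the full encrypted sequence and the full Table 10.1): it pairs the i-th most frequent ciphertext code with the i-th most frequent plain code, leaving the final adjustments to be made by hand.

    from collections import Counter

    # Stubs: complete with the full encrypted sequence and the full Table 10.1.
    cipher = [4, 24, 13, 4, 15, 13, 10, 20, 24, 13]            # ...
    table_10_1 = {26: 15.977, 4: 8.735, 19: 7.528, 14: 6.239}  # plain code -> freq (%)

    counts = Counter(cipher)
    # Rank matching: i-th most frequent cipher code <-> i-th most frequent
    # plain code; the residual mismatches are the "adjustments" made by hand.
    cipher_rank = [c for c, _ in counts.most_common()]
    plain_rank = sorted(table_10_1, key=table_10_1.get, reverse=True)
    guess = dict(zip(cipher_rank, plain_rank))                 # cipher -> plain
    print([guess.get(c, c) for c in cipher])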
B - Vigenère cipher. Cryptanalysis of the Vigenère encryption scheme, when only the cipher text is known, can be performed in two steps:
1. Compute the length $\ell_V$ of the secret key.
2. Obtain the secret key by performing $\ell_V$ frequency cryptanalyses on a breakdown of the text into $\ell_V$ classes, each class containing the symbols whose positions differ by a multiple of $\ell_V$.
Vigenère's encrypted messages were considered practically unbreakable until the middle of the nineteenth century, when Friedrich W. Kasiski found a method (not an exhaustive analysis) for obtaining the likely length of the secret key: depending on the length of the message, the inferred key length is either correct with probability close to 1 or deterministically exact. Kasiski's idea is based on the observation of repeated patterns in the encrypted message. If, in the encrypted text, equal patterns (usually taken of length 2 or 3) come from the same text pattern, then they are encrypted with the same key characters: it follows that the distance between the patterns is a multiple of $\ell_V$. Clearly, in this analysis coincidental occurrences should be disregarded, and Kasiski also proposed a method for dealing with spurious (false) coincidences. The method is described using a simple example.
Consider the encrypted message produced with a standard Vigenère table $31 \times 31$, where the capital letters have the meaning specified by Tables 10.1 and 10.2:
xrvjy yMxmx bhpim BqrgQ QcaQk Cmcti rvmCM mwmaM Qbefg
chfcQ ivmxe xysuv esbwz mxrej teQim BmhnQ athxq iPmha epzqt
xnrss Pk
Observe that the pattern [xr] repeats at distance 66, the pattern [im] at distance 60, and the pattern [Qi] at distance 23; we have the greatest common divisors gcd{66, 60} = 6 and gcd{66, 60, 23} = 1, thus, very probably, [Qi] is a false coincidence and $\ell_V = 6$. The message is short; however, the frequency attack works: after some attempts, we find the secret key jerome, and the message, an aphorism of Lichtenberg's:
one must judge men not by their opinions but by what their opinions have
made of them.
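The Kasiski computation above is easily reproduced by a short script, such as the following Python sketch (ours; the names are illustrative): it lists the repeated digrams of the encrypted message with the distances between consecutive occurrences, after which spurious coincidences such as [Qi] are discarded by hand before taking the gcd.

    from math import gcd
    from collections import defaultdict

    c = ("xrvjyyMxmxbhpimBqrgQQcaQkCmctirvmCMmwmaMQbefgchfcQivmxexysuv"
         "esbwzmxrejteQimBmhnQathxqiPmhaepzqtxnrssPk")

    positions = defaultdict(list)
    for i in range(len(c) - 1):
        positions[c[i:i + 2]].append(i)          # occurrences of each digram

    for pattern, pos in positions.items():
        if len(pos) > 1:
            print(pattern, [b - a for a, b in zip(pos, pos[1:])])

    # Among the printed distances: xr at 66, im at 60, Qi at 23 (spurious).
    print(gcd(66, 60))                           # 6 = the key length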
C - Alberti's disk. The cipher-text-only cryptanalysis of messages encrypted with Alberti's disk requires approaches that are completely different from standard frequency methods when the modes of use proposed by Alberti are implemented. Referring to the Alberti disk structure of Fig. 4.3, the stationary and movable

letter   code   freq.
a        00     5.856417935
b        01     0.915065302
c        02     1.746942850
d        03     3.161134681
e        04     8.734714250
f        05     1.622161218
g        06     1.688711422
h        07     4.999584061
i        08     5.814824058
j        09     0.116462856
k        10     0.457532651
l        11     2.753514683
m        12     1.256135097
n        13     5.606854671
o        14     6.239081607
p        15     0.873471425
q        16     0.06655020
r        17     3.660261209
s        18     5.257466101
t        19     7.528491806
u        20     2.104650195
v        21     0.856833874
w        22     1.514017137
x        23     0.124781632
y        24     3.105981199
z        25     0.058231428
blank    26     15.97728974
-        27     0.06655020
,        28     1.081440812
.        29     6.738208136
;        30     0.016637550

Table 10.1: Frequency distribution of the English language alphabet


E        code   freq.
a        18     5.602923264
b        19     1.705237515
c        16     2.192448234
d        17     2.557856273
e        02     10.96224117
f        28     1.948842875
g        23     0.9744214373
h        03     5.481120585
i        04     5.968331303
j        26     0.1218026797
k        27     0.1218026797
l        09     3.045066991
m        21     2.070645554
n        10     6.455542022
o        20     6.090133983
p        01     0.8526187576
q        14     0.
r        11     4.141291108
s        15     4.628501827
t        24     7.917174178
u        30     2.436053593
v        22     1.461632156
w        07     1.096224117
x        06     0.1218026797
y        08     1.096224117
z        12     0.
B        13     18.87941535
M        29     0.1218026797
C        00     1.218026797
F        25     0.6090133983
Q        05     0.1218026797

Table 10.2: Frequency of letters in the encrypted text


disks are partitioned into 24 sectors: the capital letters on the external fixed (or stationary) disk are in lexicographic order, thus we may consider A as the initial reference letter of the list, which ends with four digits; the small letters on the movable disk are arranged in random order (the letter v is excluded), and constitute the secret key
A B C D E F G I L M N O P Q R S T V X Z 1 2 3 4
d e z l u s j m r k b t f g i c h n q x a p y o

There are many variants or modes of using the disk; we will consider three
modes which present different levels of cryptanalysis difficulties.
Mode 1. Proposed in Alberti's book:
1. the small letters in the internal movable disk are in a random secret order,
and a letter, say d, is considered as reference letter and kept secret
2. the numbers 1, 2, 3, 4 are null, and thus may be randomly inserted in the
plain text before encryption, and must be discarded after decryption
3. Capital letters inserted in the encrypted text indicate the alignment of the
letter d of the movable disk with the given Capital letter of the stationary
disk for decrypting the next group of small letters.
Cryptanalysis is made harder by the presence of genuine random symbols in the encrypted text, which alter the statistics; further, the 24 relative positions of the two disks require (if all positions are used) a very long encrypted text in order to obtain significant estimates of the letter frequencies. For illustrative purposes, we assume that only three different positions of the movable disk are used; further assuming that the three capital letters in the encrypted text are A, Q, Z, an encrypted message looks like
$$A\, y_1 y_2 \cdots y_{b_1}\; Z\, z_1 z_2 \cdots z_{b_2}\; A\, y_{b_1+1} \cdots y_{b_1+b_3}\; Q\, x_1 x_2 \cdots x_{b_4}$$
From the structure of the encrypted message, it is evident that frequency cryptanalysis is possible, although the presence of random symbols (whose frequencies are unknown) makes identification of the characters more difficult. The decryption of short texts cannot be performed in any systematic way; it requires
patient trials and a good deal of luck.
Mode 2. Proposed in Alberti's book, and considered by Alberti himself the hardest to break:
1. the small letters in the internal movable disk are in a random secret order,
and a letter, say d, is considered as reference letter; it is kept secret, and
initially is juxtaposed to the index letter A.
2. the numbers 1, 2, 3, 4 are encrypted in small letters; when decrypted they
will indicate a change of alphabet (the change of alphabet is not known
to the attacker, and further could be truly random); their encrypted value
indicates the capital letter to be aligned with d.

Initial encryption position of the disk:

A B C D E F G I L M N O P Q R S T V X Z 1 2 3 4   (stationary disk)
d b a c e g k l n p r t u z & x y s o m q i h f   (movable disk)        (10.1)

Cryptanalysis of Mode 2 is very difficult and requires many guesses that are not
driven by general rules. The attack methods must be ad-hoc and depend closely
on the message.
Mode 3. Alberti's disk is used in a way similar to Vigenère enciphering:
1. the small letters on the internal movable disk are in a random secret order,
and a letter, say d, is considered as reference letter and kept secret
2. the numbers 1, 2, 3, 4 are null, thus may be randomly inserted in the plain
text before encryption, and must be discarded after decryption
3. After h letters are encrypted the movable disk is rotated by k positions, then
again h letters are encrypted. Both h and k are kept secret.
Due to the crudeness of the scheme, once h is known, frequency cryptanalysis is possible by considering blocks of h symbols at a distance of 24 blocks (i.e. blocks that are encrypted with the same Vigenère key) as having been encrypted with the same substitution rule. The unknown value of h can be found either by exhaustive search, since reasonable values of h are restricted to be less than 15, utilizing frequency analysis, or by some variant of Kasiski's technique.
Once h is known, the shift value k is obtained by comparing the substitution rules
of consecutive blocks of h symbols.
Also in this case, the amount of encrypted text necessary (and frequently sufficient) for a systematic frequency attack is around 1000 characters.
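The block splitting needed by this attack is straightforward to code; the following Python sketch (ours, with hypothetical names) groups the ciphertext symbols by the disk state that encrypted them, so that one frequency analysis per class can then be run, as for the Vigenère cipher.

    def split_by_state(cipher, h, n_states=24):
        """Group the symbols of a ciphertext, in blocks of length h,
        by the disk state that encrypted them."""
        classes = [[] for _ in range(n_states)]
        for i, symbol in enumerate(cipher):
            classes[(i // h) % n_states].append(symbol)
        return classes       # run one frequency analysis per class

    # For the simplified two-state example below: split_by_state(text, 19, 2).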
In the following we give an example of cryptanalysis of a text encrypted with
a further (simplified) variant of Mode 3. The attack technique is derived from methods for breaking Vigenère's scheme. The cryptanalysis exploits some weaknesses in the text, and proceeds more or less fortuitously.
Consider the encrypted message reported in Table 10.3, which has been produced using an Alberti disk with 26 letters, circularly ordered as follows:

A B C D E F G H I 1 K L M N O P Q R S T U 2 W X Y Z   (stationary disk)
r k l n p t u z w x y s o m q i h f j v d b a c e g   (movable disk)        (10.2)
The alphabet has been adapted to the English language, thus it is slightly different from that reported in equation (10.1). In particular, the plain text has been
prepared for encryption, making the following modifications:
1) i and j have been converted to a single letter I,
2) u, v have been converted to a single letter V .

cwawsstwfjvnpjlfwkp
bkiodalwxvohbsotrlc
ciqjpvzrvaprufppnvq
blrokiovokkogjwrwbh
cmnpcspvvpfwmvzpoqd
bwxvoshrjwkkiodadob
cvqtafwvwmuwawssiqj
bhkhabkiokpashrjram
cvzptqfodsrrjwawjzt
bagotwdcvohltkwcarh
cmuvzpwmnpcspvvpfvq
bywchkwvxphkiwvvaki
cpfjorssspvvpfjlqff
borcabshbqkakioywch
cvrsspvvpfjrkqdpvzpo
Table 10.3: Message encrypted using Alberti's disk: Mode 3
3) The numbers 1, 2 have been randomly inserted in the text (in the plain
text alphabet they occupy the positions of the suppressed letters).
4) All blanks have been suppressed
5) All punctuation marks have been deleted.
6) In the initial position of the movable disk the letter r is aligned with A of
the stationary disk.
7) After performing the first 19 letter encryptions, the movable disk is rotated counterclockwise by 8 positions.
8) After every 19 letter encryptions, the disk is rotated back or forth by 8 positions, therefore encryption of 19 consecutive letters is performed referring
to two states of the disk. These states are identified as State 1 and State 2.
In spite of the oversimplification compared to Alberti's original indications, the text is too short to permit an immediate frequency attack. Therefore we have looked for patterns of three letters that are likely to be present in the original plain text. The article THE is certainly a highly probable pattern in the set of $26^3 = 17,576$ possible patterns of length three.
Scanning the text reported in Table 10.3, we find that the pattern kio occurs 5 times, and the pattern vzp occurs 4 times. Since these occurrence counts are much higher than 1, a reasonable guess is that both patterns are encryptions of THE, referred to State 2 and State 1 respectively. In this case, the block length (a state duration) is a number between 16 and 20. It follows that 5 cases should be considered; however, we will only consider the correct one, namely the duration 19.

State 1
letter   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
cipher   r  k  l  n  p  t  u  z  w  x  y  s  o  m  q  i  h  f  j  v  d  b  a  c  e  g
count    5  0 10  3  0 12  0  0  2  9  2  2  5  4  3 22  9  7 12  4  3 20 12  0  0  6

State 2
letter   a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
cipher   w  x  y  s  o  m  q  i  h  f  j  v  d  b  a  c  e  g  r  k  l  n  p  t  u  z
count    7  6  4  6  0  5  2  6  6  5 10  2  5  1 12 10  7  8  8  3  1 14 11  1  1  1

Table 10.4: Alberti's disk encryption: Frequency analysis (cipher = image of the letter under the substitution of the given state; count = occurrences of the letter in the ciphertext positions of that state)


cwawsstwfjTnEjlfwkE
bTHEdalwxvEhbsEtrlc
ciqjpTHrvaprufppnTq
blrETHEvETTEgjwrwbh
cmnpcsETTEfwmTHEoqd
bwxvEshrjwTTHEdadEb
cTqtafwTwmuwawssiqj
bhThabTHETpashrjram
cTHEtqfodsrrjwawjHt
bagEtwdcvEhltTwcarh
cmuTHEwmnEcsETTEfTq
bywchTwvxphTHwvvaTH
cpfjorsssETTEfjlqff
bErcabshbqTaTHEywch
cTrssETTEfjrkqdpTHEo
Table 10.5: Alberti's disk encryption: Semi-decrypted text
At this stage, frequency analyses are performed independently on the sequences of States 1 and 2, and the results are reported in Table 10.4.
This Table shows only partial agreement of the previous guess with the frequency analysis; in particular, the most frequent plain letter pertaining to State 2 is L, which is clearly an anomaly with respect to the standard distribution in English texts.
Performing the substitutions

v → T,  z → H,  p → E    in State 1
k → T,  i → H,  o → E    in State 2

we get the semi-decrypted text shown in Table 10.5.


The occurrence of several pattern ET T E suggests that the possible word is LETTER, thus fixing two further letters L and R. Completing the analysis with
the help of the frequency table to allocate the remaining letters, we get the plain
text shown in Table 10.6.

1,I, W,I,L,L, F,I,R,S,T, D,E,S,C,R,I,B,E,


2,T,H,E, M,O,V,A,B,L,E, I,N,D,E,X, S,U,P,
1,P,O,S,E, T,H,A,T, W,E, A,G,R,E,E,D, T,O,
2, U,S,E, T,H,E, L,E,T,T,E,R, K, A,S, A,N, I,
1,N,D,E,X, L,E,T,T,E,R, I,N, T,H,E, M,O,V,
2,A,B,L,E, D,I,S,K, A,T, T,H,E, M,O,M,E,N,
1,T, O,F, W,R,I,T,I,N,G, I, W,I,L,L, P,O,S,
2,I,T,I,O,N, T,H,E, T,W,O, D,I,S,K,S, O,F,
1, T,H,E, F,O,R,M,U,L,A, A,S, I, W,I,S,H, F,
2,O,R, E,X,A,M,P,L,E, J,U,X,T,A,P,O,S,I,
1,N,G, T,H,E, I,N,D,E,X, L,E,T,T,E,R, T,O,
2, C,A,P,I,T,A,L, B, W,I,T,H, A,L,L, O,T,H,
1,E,R, S,M,A,L,L, L,E,T,T,E,R,S, C,O,R,R,
2,E,S,P,O,N,D,I,N,G, T,O, T,H,E, C,A,P,I,
1,T,A,L, L,E,T,T,E,R,S, A,B,O,V,E, T,H,E,M
Table 10.6: Alberti's disk encryption: Text decrypted with cryptanalysis
Inserting blanks, punctuation marks, and the right values of the letters that were merged at the encryption stage, we get the following text, which is the beginning of Alberti's description of Mode 1:
I will first describe the movable index. Suppose that we agreed to use the
letter k as an index letter in the movable disk. At the moment of writing I
will position the two disks of the formula as I wish, for example juxtaposing
the index letter to capital B, with all other small letters corresponding to the
capital letters above them.
D - Hill cipher. Cryptanalysis of Hill encryption schemes, when only the cipher text is known, can be performed similarly to Caesar cipher cryptanalysis. Since a clever cryptanalytic attack is described in Sinkov's book [74], it will be quoted more or less literally.
Consider the enciphering scheme with $2 \times 2$ matrices over a field $F_p$ (in the numerical examples $p = 31$); in this case encryption is described by the equation
$$\begin{pmatrix} e_n \\ e_{n+1} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x_n \\ x_{n+1} \end{pmatrix}$$
where every pair $(e_n, e_{n+1})$ of encrypted symbols depends on a single pair $(x_n, x_{n+1})$ of plain text symbols; therefore frequency attacks are possible on pairs of symbols, even if the Hill transformation produces a kind of equalization of probabilities. In terms of probabilities we have
$$p\{e_n = \alpha, e_{n+1} = \beta\} = p\{a x_n + b x_{n+1} = \alpha, \; c x_n + d x_{n+1} = \beta\}$$
which, rewritten in the form
$$p\{e_n = \alpha, e_{n+1} = \beta\} = p\left\{ x_n = \frac{d\alpha - b\beta}{ad - bc}, \; x_{n+1} = \frac{-c\alpha + a\beta}{ad - bc} \right\}$$

shows that the probabilities (hence the frequencies) of pairs are maintained after encryption. A frequency table is constructed, containing the number of occurrences of each pair; then the two most frequent pairs, say $(y_1^m, y_2^m)$ and $(y_3^m, y_4^m)$, are assumed to come from the two most frequent pairs of letters, say $(x_1, x_2)$ and $(x_3, x_4)$, in the language (supposing that the language of the plain text is known). A system of four linear equations in $F_p$ is then written:

$$\begin{cases} a x_1 + b x_2 = y_1^m \\ c x_1 + d x_2 = y_2^m \\ a x_3 + b x_4 = y_3^m \\ c x_3 + d x_4 = y_4^m \end{cases} \qquad (10.3)$$


and solved for $a, b, c, d$. The choice of $x_3$ and $x_4$ may also be based on guesswork concerning the pair contiguous to $(y_1^m, y_2^m)$: for example, if the most frequent pair corresponds to TH, it is likely that the following adjacent pair would be E'blank'; the system (10.3) is then modified accordingly. Once the Hill matrix is known, decryption is straightforward.
It is observed that system (10.3) admits a special invariant
$$(x_1 x_4 - x_2 x_3)(ad - bc) = y_1^m y_4^m - y_2^m y_3^m$$
which indicates that unimodular matrices, i.e. matrices with $ad - bc = 1$, should not be used as Hill matrices, because exhaustive attacks would be greatly facilitated, for two reasons:
1. The number of modular nonsingular matrices is $(p^2 - 1)(p^2 - p)$, while the number of unimodular matrices is $(p^2 - 1)p$.
2. The above invariant, given $x_1, x_2$, excludes a large number of pairs $x_3, x_4$.
These remarks indicate that one way to strengthen Hill encryption is to choose matrices of dimension greater than 2, which makes frequency attacks very hard, because the probabilities of patterns formed by three or more letters are very small and practically equal. Second, the large number of modular matrices prevents exhaustive attacks. For example, if the dimension is 5 and $p$ is 31, the number of Hill matrices is
$$N_H = (p^5 - 1)(p^5 - p)(p^5 - p^2)(p^5 - p^3)(p^5 - p^4) = 18592367211458725020090947951616000000 \approx 1.86 \cdot 10^{37} ,$$
a number that makes exhaustive search unfeasible.
Consider the following text, encrypted using a Hill cipher with a $2 \times 2$ unimodular matrix:

17, 8, 5, 19, 29, 23, 24, 16, 3, 28, 19, 2, 12, 13, 0, 17, 14, 16, 20,
21, 26, 17, 1, 19, 15, 3, 16, 27, 25, 22, 6, 2, 16, 30, 8, 3, 0, 17,
2, 20, 14, 20, 20, 19, 4, 15, 16, 13, 24, 16, 16, 18, 8, 25, 29, 23, 18,
14, 25, 22, 6, 2, 16, 30, 8, 3, 0, 17, 24, 28, 0, 15, 14, 9, 17, 5, 8, 27
The message is too short to effectively exploit the law of large numbers. However, there is a pattern that occurs three times, namely (0, 17), six patterns that occur twice, and the pattern (8, 3, 0, 17) also occurs twice. The most probable pattern (0, 17) may correspond to (t 'blank') or (e 'blank'); thus (8, 3) may be (e n) or (n e) in the first instance, and (n t) in the second case.
We have the systems
$$\begin{cases} 19\, a + 26\, b = 0 \\ 4\, a + 13\, b = 8 \end{cases} \qquad \begin{cases} 19\, c + 26\, d = 17 \\ 4\, c + 13\, d = 3 \end{cases}$$
Let us assume $ad - bc = 1$. From the first system we find $a = 7$ and $b = 8$, and from the second system we find $c = 1$ and $d = 19$; we check that these values indeed yield a unimodular Hill matrix by verifying that $ad - bc = 1$: actually, we have $7 \cdot 19 - 1 \cdot 8 = 1 \bmod 31$. With the corresponding Hill matrix


$$\begin{pmatrix} 7 & 8 \\ 1 & 19 \end{pmatrix}$$
the following message is easily obtained
life is the art of drawing sufficient conclusions from insufficient premises.
The method works plainly, requiring few or no guesses, if the encrypted text is sufficiently long: in our numerical setting, this means some 1000 characters.
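The two linear systems of the example can be solved mechanically by inverting the coefficient matrix modulo 31; the following Python sketch is our own check of the numbers above.

    p = 31
    M = [[19, 26], [4, 13]]          # coefficients: (t, blank) and (e, n)
    det = (M[0][0]*M[1][1] - M[0][1]*M[1][0]) % p
    inv = pow(det, -1, p)            # modular inverse of the determinant
    Minv = [[ M[1][1]*inv % p, -M[0][1]*inv % p],
            [-M[1][0]*inv % p,  M[0][0]*inv % p]]

    def solve(r):                    # solve M (x, y)^T = r over F_p
        return ((Minv[0][0]*r[0] + Minv[0][1]*r[1]) % p,
                (Minv[1][0]*r[0] + Minv[1][1]*r[1]) % p)

    a, b = solve((0, 8))             # -> (7, 8)
    c, d = solve((17, 3))            # -> (1, 19)
    print(a, b, c, d, (a*d - b*c) % p)   # 7 8 1 19 1: unimodular, as assumed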

E - Bacon cipher. Cryptanalysis of messages encrypted with the Bacon cipher, when only the cipher text is known, has very different degrees of difficulty depending on whether the alphabet partition is known or not (since Bacon encryption is a kind of steganography, it must be known, or suspected, that the text in hand actually is an encrypted text).
If the alphabet partition into two sets B0 and B1 (of 13 letters each) is known, cryptanalysis is trivial, since each letter is substituted with 0 or 1; blocks of five bits are then interpreted as letters, yielding the plain text.
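This trivial case can be sketched in a few lines of Python (ours; the set B1 and the stub dictionary code stand for the actual secret partition and for the 5-bit encoding given further below):

    # Stub: complete `code` with the full 5-bit encoding table below.
    code = {(0,0,0,0,0): 'a', (0,0,0,0,1): 'b', (0,0,0,1,0): 'c'}   # ...
    B1 = set("auvklprxcgj")          # letters carrying bit 1 (example partition)

    def bacon_decrypt(text):
        bits = [1 if ch in B1 else 0 for ch in text]
        return "".join(code.get(tuple(bits[i:i + 5]), '?')
                       for i in range(0, len(bits) - 4, 5))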
If the alphabet partition is not known, cryptanalysis is very hard and may require a large amount of text, especially when the binary encoding of the plain text letters is not known. In our example, we make the simplifying assumption that we know the binary encoding reported in the following table:

a=(0,0,0,0,0), b=(0,0,0,0,1), c=(0,0,0,1,0), d=(0,0,0,1,1), e=(0,0,1,0,0),
f=(0,0,1,0,1), g=(0,0,1,1,0), h=(0,0,1,1,1), i=(0,1,0,0,0), j=(0,1,0,0,1),
k=(0,1,0,1,0), l=(0,1,0,1,1), m=(0,1,1,0,0), n=(0,1,1,0,1), o=(0,1,1,1,0),
p=(0,1,1,1,1), q=(1,0,0,0,0), r=(1,0,0,0,1), s=(1,0,0,1,0), t=(1,0,0,1,1),
u=(1,0,1,0,0), v=(1,0,1,0,1), w=(1,0,1,1,0), x=(1,0,1,1,1), y=(1,1,0,0,0),
z=(1,1,0,0,1), blank=(1,1,0,1,0), ,=(1,1,0,1,1), .=(1,1,1,0,0), :=(1,1,1,0,1),
?=(1,1,1,1,0)
Consider the encrypted message reported in Table 10.7. A cryptanalytic attack may be based on guessing highly probable patterns: for instance, suppose that the article the occurs in the plain text; then the cipher text is sequentially analyzed with a sliding window of 15 characters, checking for compatibility with the pattern
10011, 00111, 00100
that is, supposing that 1 is associated to letters in B1 and 0 to letters in B0, the compatibility of the induced partition is checked. If there is a compatible solution, then some letters are definitively allocated to B1 and B0.
The same procedure is repeated for other highly probable patterns, until the partition is practically found.
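The compatibility test is mechanical, and may be sketched as follows (our Python illustration): a letter must not be required to carry bit 0 in one position of the window and bit 1 in another.

    PATTERN = [1,0,0,1,1, 0,0,1,1,1, 0,0,1,0,0]   # "the" in the 5-bit encoding

    def compatible(window):
        required = {}
        for letter, bit in zip(window, PATTERN):
            if required.setdefault(letter, bit) != bit:
                return None          # contradiction: this window is not "the"
        return required              # letter -> 0/1 partial partition

    def scan(cipher):
        for i in range(len(cipher) - 14):
            r = compatible(cipher[i:i + 15])
            if r is not None:
                print(i, r)          # candidate position and induced assignments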
For example, consider the first 15 letters
qvszq amifg xasgd
10011 00111 00100
It is evident that this cannot be the pattern the, because the letter s in the first block of five letters should be 0, while in the third block it should be 1. Then we slide the window by five characters
amifg xasgd qklab
10011 00111 00100
and again the letter a should be 1 in the first group of five letters, and 0 in the second group. Continuing this way, we arrive at a pattern starting with the letter in position 156
aqhuv mmklp iovdi
10011 00111 00100
in which no incompatibility occurs; then we may reasonably assume that this pattern corresponds to the article the, and identify some characters of B0 and B1, that is
q, h, m, i, o, d ∈ B0,    a, u, v, k, l, p ∈ B1
Sliding the window up to position 766, another compatible pattern is found
reqxc zmlgj hichd
10011 00111 00100

qvszqamifgxasgdqklabpmpdhmbvcytqvvl
nwwfcjxonwjsbjloaunmvrsumbiimceijqb
cndxqkwbyftepzbhcrmtqqrzztvloomzhdp
hbcztjebsjitgboeqdlkachaslyeuuhwcjx
thmqenbenvnuwntaqhuvmmklpiovdirqyou
dhnohcnejmslzidfdqaivkhphoxgocicuay
cztlpiipalskqbebpazammfrhvgyrtdllwq
dgnfzkzeelssjiqvxogbdwmpfwpdmihikqj
ysvtxeryqbtboabgsvsehpianlzhpupjyct
fyecxylxrbfvdneubbxfzztzbhleaeqtfso
gnbpsmxohmzurtkxnwuhbhaakdhzttinnwn
qzexxfnyxkrasnstnuweikucslwzwkzwrzd
agmxqocrrnhspeikimermvtsbofynpbavxz
rqamhktbaypamlyuioakwjjjdnuwuzssykb
wkxpwhacoksohjautvitmosadreexlkahcx
llmudsakciasdwjrpscdbfphwmrvadvbeix
mdvzealhlortktmoaltkityvozorzynwzwa
ntzccmeooqhfmqybnkyvfjzvdynzomsuutl
almgtmchosuimjvuimpsavtkdpiyktlolhi
zohcwqowaweylstrzwrouzovigjmkcnvhuy
ldwfldscnghdwbwojxoxanoxeahokkyxrvs
xcmpyrhmggsihhsyjyximqrohcnhubreqxc
zmlgjhichdurqgzmnblkmmpqzdqzhowmixx
xnzneopqmhoulhkakbgsktinueykaasbgiz
jntfdtahshqxlwvjmzrfchowlzlfvttqbpf
pmazomtbkeummxcmxhwqmcagozxnzgnvtrb
zpgcymtfhlacigemmdyekrmkzoxcsfzmnbz
rsulhvfyxequxrbnotbcimbpkhtjoecombx
nxzgbhkcvqehcipavdndciwgnyiacrqlsdo
olxyazykfdahwlyffpmm
Table 10.7: Encrypted message using Bacon's cipher


therefore, the numbers of entries in B0 and B1 may be increased to
q, h, m, i, o, d, e, z ∈ B0,    a, u, v, k, l, p, r, x, c, g, j ∈ B1

With this information, the encrypted text may be rewritten, substituting their binary values for the identified letters, obtaining the following encoded/encrypted message
01s00100f111s100111b101000b11yt0111
nwwf1110nw1sb11011n011s10b00010010b
1n0101wbyft010b0110t00100t110000001
0b10t10bs10t1b0000111101s1y0110w111
t0000nb0n1n1wnt10011001110010010y01
00n001n010s1000f001011010011010111y
10t1100111s10b0b110100f1011y1t011w0
01nf010001ss1001101b0w01fw100000101
ys1t101y0btb01b1s1s00101n1001111y1t
fy011y111bf10n01bb1f00t0b010100tfs0
1nb1s01000011t11nw10b0111000tt0nnwn
00011fny1111snstn1w00111s1w0w10w100
110100111n0s1001000101tsb0fyn1b1110
101001tb1y1101y10011w1110n1w10ssy1b
w111w01101s00111t10t00s101001111011
11010s11101s0w111s10bf10w011101b001
00100110101t1t0011t10ty100010ynw0w1
nt011000000f00ybn1y1f1010yn000s11t1
1101t0100s100111001s11t1010y1t10100
0001w00w1w0y1st10w101001011011n101y
10wf10s1n100wbw011011n01010011y111s
1101y10011s000sy1y10001001n01b10011
0011100100110100nb110010000000w0011
1n0n0010000110111b1s1t0n10y111sb100
1ntf0t10s0011w11001f100w101f1tt0b1f
101000tb1010011010w00111001n01n1t1b
0111y0tf0111010000y011010011sf00nb0
1s1101fy100111bn0tb100b110t100100b1
n101b0111000101110n010w1ny011101s00
011y10y1f010w1yff100
Many patterns of 5 bits may already be converted into letters; the remaining ones can be converted by guesswork and compatibility checks, since the message has a semantic meaning. With very few attempts, it is possible to arrive at the following sentence of Machiavelli's:
it ought to be remembered that there is nothing more difficult to take in hand,
more perilous to conduct, or more uncertain in its success, than to take the
lead in the introduction of a new order of things.

10.4  DES Cryptanalysis

DES cryptanalysis began as soon as the NIST (formerly the NBS) published the algorithm, which had been designed by a team at IBM, who also used their secret (private) knowledge of cryptanalysis; in part, this knowledge was re-discovered and published many years later. Differential cryptanalysis and linear cryptanalysis, when the two methods became publicly known, showed that DES could not be practically broken by these techniques. The explanation concerning differential cryptanalysis was given by Don Coppersmith, who stated that such cryptanalysis methods were known at IBM and used in the DES design. Coppersmith also indicated the way to prove that DES is not, in a technical sense explained below, a group. A proof that DES cannot be broken by means of attacks based on group properties was published by Campbell and Wiener in 1992 [9].
The set of messages is the vector space $F_2^{64}$, and the set of keys is $F_2^{56}$; therefore, there are $2^{56}$ DES transformations, each identified (parametrized) by a key k; we denote this dependency as DES[k]. A sequence of transformations of the same plain message X is written as
$$DES[k_n](DES[k_{n-1}](\cdots (DES[k_1](X)) \cdots)) .$$
If DES is a group for this composition law, then a sequence of n transformations is equivalent to a single transformation, say $DES[k_{a(n)}]$. In this case, an attack à la Shanks can be mounted, with complexity $2^{28}$ (Kaliski, Rivest, and Sherman). Assuming a plain text attack, a pair of plain text X and encrypted text Y is known, and the problem is to find the secret key k such that Y = DES[k](X).
Attack à la Shanks
If DES is a group G, its order is $2^{56}$, hence it is a 2-group; it follows that G has a subgroup H of order $2^{28}$. The set T of right coset leaders of H in G contains $2^{28}$ elements.
The unknown DES[k] may be written as a composition of two elements
$$DES[k] = DES[h]\, DES[t] , \qquad DES[h] \in H ,\; DES[t] \in T ,$$
then, applying $DES[h]^{-1}$ to both sides of the equation Y = DES[k](X), we obtain the identity
$$DES[t](Y) = DES[h]^{-1}(X) ,$$
which is checked as follows:
1) A table with two columns is pre-computed, with DES[h] running through H in the first column and $DES[h]^{-1}(X)$ in the second column; the table is then sorted according to the second column.
2) A loop is started, computing rr = DES[t](Y) for DES[t] running sequentially through T; then rr is searched for in the second column of the table; the complexity of this search is $O(\ln(2^{28}))$. When rr is found, the loop stops, and the entry in the corresponding position of the first column is DES[h]; the DES secret key is then obtained from DES[h](DES[t]).
Fortunately for DES, this attack cannot be mounted, because the set of DES transformations is not a group when used to encrypt a fixed message with keys ranging through the whole set of $2^{56}$ keys, as we will see as a consequence of a result of Coppersmith's.
Since DES transformations are invertible, they induce permutations of the $2^{64}$ messages. Moreover, DES transformations are faithful, that is, if $k_1 \neq k_2$, then
$$DES[k_1](X) \neq DES[k_2](X) \qquad \forall X .$$

Further, DES transformations generate a subgroup of the symmetric group $S_{2^{64}}$, which is actually a subgroup of the alternating group $A_{2^{64}}$, since the permutations (induced by DES on the set of $2^{64}$ binary vectors) are even.
In fact, DES is not a group, because Coppersmith found that the permutations induced over the set of $2^{64}$ messages contain short cycles (of size about $2^{32}$) [16]. The proof is by computer. In particular, he considered the composition of the two transformations $DES[e_1](DES[e_0])$, where $e_1$ and $e_0$ are the all-1s and the all-0s sequences of 56 bits, respectively. Coppersmith found the cycle lengths for 33 messages, which proved that the size of the permutation group generated by the DES transformations is at least $2^{277}$. In 1992, Campbell and Wiener, following Coppersmith, proved that DES does not generate permutation groups of sizes that may be attacked [9]. They found the cycle lengths for 295 additional messages that, combined with Coppersmith's, yield a lower bound of $1.94 \cdot 10^{2499}$ on the size of the subgroup generated by the DES permutations. This is too large for potential attacks on DES that would exploit a small subgroup.
In summary, Coppersmith's results show that DES is not a group, for otherwise it would have order $2^{56}$; more precisely, the set {DES[k]} (over all possible keys k) under functional composition is not a group. A further consequence is that multiple encryption modes, such as Triple DES, increase the security. Further, Campbell and Wiener proved that {DES[k]} does not generate a group of attackable order. These results definitively established that the complexity of breaking DES, exploiting every possible symmetry, is $O(2^{50})$, as detailed in the following.
Fast Attacks. Three attacks, of a theoretical nature and unfeasible to mount in practice, are known that can break DES with less complexity than a brute-force search, namely differential cryptanalysis, linear cryptanalysis, and the Davies attack.
Differential cryptanalysis, rediscovered in the late 1980s by Eli Biham and Adi Shamir, was known earlier to both IBM and the NSA, and kept secret. To break DES, differential cryptanalysis requires $2^{49}$ chosen plain texts. As revealed by Coppersmith, one of the team that participated in the DES project, DES was designed to be resistant to differential cryptanalysis.
Linear cryptanalysis was discovered by Mitsuru Matsui, and needs $2^{43}$ known plain texts (Matsui, 1993); the method was implemented (Matsui, 1994), and was the first experimental cryptanalysis of DES to be reported. There is no evidence that DES was tailored to be resistant to this type of attack.
Davies attack: while linear and differential cryptanalysis are general techniques that can be applied to a number of schemes, the Davies attack is a specialized technique for DES, first suggested by Donald Davies in the 1980s, and improved by Biham and Biryukov (1997). The most powerful form of the attack requires $2^{50}$ known plain texts, has a computational complexity of $2^{50}$, and has a 51% success rate.
Cryptanalytic weaknesses. Let $e_n$ denote the all-1s vector of length n. DES exhibits the complementation property, that is,
$$DES[e_{56} + k](e_{64} + X) = e_{64} + DES[k](X) ,$$
where the additions are performed modulo 2.
The complementation property means that the work of a brute-force attack can be reduced by a factor of 2 (i.e. by a single key bit) under a chosen-plain-text assumption. By definition, this property also applies to the Triple DES cipher.
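The complementation property is easy to verify numerically; the following sketch uses the DES implementation of the pycryptodome package (an assumption of ours: any DES implementation would do).

    from Crypto.Cipher import DES            # pycryptodome, assumed installed

    def complement(b: bytes) -> bytes:
        return bytes(x ^ 0xFF for x in b)

    key = bytes.fromhex("0123456789abcdef")  # any 64-bit key
    msg = bytes.fromhex("4e6f772069732074")  # any 64-bit message block

    c1 = DES.new(key, DES.MODE_ECB).encrypt(msg)
    c2 = DES.new(complement(key), DES.MODE_ECB).encrypt(complement(msg))
    assert c2 == complement(c1)              # DES[e+k](e+X) = e + DES[k](X)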
DES has four so-called weak keys. Encryption and decryption of any message under a weak key k have the same effect (i.e. the DES transformation is an involution):
$$DES[k](DES[k](X)) = X .$$
DES also has six pairs of semi-weak keys. Encryption with one key of a semi-weak pair operates identically to decryption with the other (this property would hold for every key if DES were a group):
$$DES[k_1](DES[k_2](X)) = X .$$
It is easy enough to avoid the weak and semi-weak keys in an implementation,
by testing for them explicitly. In any case, these keys are not really any weaker than any other keys, as they do not give an attacker any advantage.
It is known that the maximum cryptographic security of DES is limited to about 64 bits, even when all round subkeys are chosen independently instead of being derived from a single key, which would otherwise permit a security of $768 = 2^8 \cdot 3$ bits.

10.5  Cryptanalysis of Public Key Systems

The cryptanalysis of public key cryptosystems has a strictly algebraic basis, and
uses methods proper to number theory, thus the approach is more formal. Nevertheless, the lack of an axiomatic measure of complexity means that definitive
conclusions cannot be drawn about the strength of the algorithms. The following
considerations concern only algorithms based on factoring difficulties, and the
discrete logarithm in finite groups. The problem of searching unsorted sets will not be considered, mainly because the algorithms based on this problem are not sufficiently mature, and do not have a large number of applications.
The strength of the RSA and Rabin public key algorithms depends on the difficulty of factoring the modulus N , N being the product of two primes. In particular, it is known that:
- To break the Rabin scheme is equivalent to factoring N = pq.
- It is an open question whether breaking the RSA scheme is equivalent to factoring N. Certainly, if we can factor N, we break RSA; the converse has not yet been proved.
Setting up an RSA or a Rabin system, i.e. generating the modulus N, naturally involves some probabilistic operations; therefore the complexity, strictly speaking, is not deterministic polynomial. The choice of the modulus N = pq must be random, therefore we must generate two random prime numbers p and q. The probability that a random number with n digits is prime is $\frac{1}{n \ln 10} \approx \frac{1}{2.3\,n}$ by the prime number distribution [36]. This means that, with a reasonable number of attempts, a prime number can be found; obviously, every generated random number must be tested. The probabilistic primality tests are very efficient and practical. Nevertheless, from a theoretical point of view, deterministically testing whether a number is prime is a problem of polynomial complexity with respect to the number of its digits, a result proved by three Indian researchers, Manindra Agrawal, Neeraj Kayal, and Nitin Saxena [1].
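As an illustration, the following Python sketch (our own implementation of the Miller-Rabin probabilistic test) generates an n-digit random prime; by the estimate above, about 2.3 n candidates are expected to be tried.

    import random

    def is_probable_prime(n, rounds=40):
        """Miller-Rabin probabilistic primality test."""
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(rounds):
            x = pow(random.randrange(2, n - 1), d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = x * x % n
                if x == n - 1:
                    break
            else:
                return False          # certainly composite
        return True                   # prime with overwhelming probability

    def random_prime(digits):
        while True:
            cand = random.randrange(10**(digits - 1), 10**digits) | 1
            if is_probable_prime(cand):
                return cand

    p, q = random_prime(80), random_prime(80)
    N = p * q                         # a 160-digit modulus, as discussed above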
The strength of the ElGamal and Diffie-Hellman public-key algorithms depends
on the difficulty of computing the discrete logarithm in a finite group, see Chapter
8. The final issue is the computation of the discrete logarithm in a cyclic group of
prime order, and little progress has been made in this direction since the problem
was first formulated.

10.5.1  Factorization

The status of cryptanalysis of the Rabin and RSA algorithms is the same as that
of integer factorization. Based on what is publicly known today, it is possible to
attack numbers of 160 digits, which are the product of two primes of about 80
digits each. Obviously, it is easier to factor numbers of the same size that have
more than two prime factors.

The status of the factorization of N = pq, the product of two prime numbers, may be summarized as follows:
- It is known that factoring N is equivalent to computing the Euler totient function $\varphi(N)$, that is, computing $\varphi(N) = (p-1)(q-1)$;
- It is known that factoring N is equivalent to solving the modular equation $x^2 = b \bmod N$, i.e. finding the four roots, provided they exist.
- The only known deterministic factorization algorithm for N is the exhaustive search, which has complexity $O(\sqrt{N})$; therefore $\sqrt{N}$ is an upper bound to the factorization complexity of N.
- Lenstra's factorization of N = pq is a probabilistic algorithm, based on elliptic curves, with complexity $O(\exp(\sqrt{\ln N \ln(\ln N)}))$, i.e. of sub-exponential complexity.
This factorization method of Lenstra's considers the point groups of elliptic curves $E[Z_N]$ over the ring $Z_N$, which contains zero-divisors. Since N is the product of two primes, an elliptic curve $E[Z_N]$ is a direct sum (by the Chinese remainder theorem) of elliptic curves over fields
$$E[Z_N] = E[Z_p] \oplus E[Z_q] ,$$
that is, every point $P_N(x, y) \in E[Z_N]$ can be written as
$$P_N(x, y) = P(x_1, y_1) \oplus P(x_2, y_2) , \qquad P(x_1, y_1) \in E[Z_p] ,\; P(x_2, y_2) \in E[Z_q] ,$$
where $x = \beta_1 x_1 + \beta_2 x_2$ and $y = \beta_1 y_1 + \beta_2 y_2$, with $\beta_1$ and $\beta_2$ the interpolation coefficients of the Chinese remainder theorem. The order $|E[Z_N]|$ is the product of the orders $|E[Z_p]|$ and $|E[Z_q]|$. Therefore, we have
$$A_N = |E[Z_N]| = (p + 1 - a_p)(q + 1 - a_q) = N + 1 + p + q - a_p(q + 1) - a_q(p + 1) + a_p a_q ,$$
where the defect numbers $a_p$ and $a_q$ satisfy Hasse's bounds $|a_p| \le 2\sqrt{p}$ and $|a_q| \le 2\sqrt{q}$. Further, assuming that p and q are of the same order $O(\sqrt{N})$, $A_N$ can be bounded as
$$N - (4N^{3/4} + 6N^{1/2} + 4N^{1/4}) < A_N < N + (4N^{3/4} + 6N^{1/2} + 4N^{1/4}) . \qquad (10.4)$$

A probabilistic factorization algorithm for N makes use of randomly selected elliptic curves (EC).

Factorization procedure with EC
EC selection: choose a point G = (u, v) and a random polynomial $x^3 + Ax + B$ defining the elliptic curve, where A is randomly selected in $Z_N$ and B is computed by imposing that the point G lie on the curve, i.e. $B = v^2 - (u^3 + Au)$.
Initial test: evaluate the discriminant $\Delta = 4A^3 + 27B^2$ of the EC, and check whether $g = \gcd\{N, \Delta\} = 1$. If not, and if $\Delta \bmod N \neq 0$, then N is factored. This lucky event is rare.
If g = 1, choose n such that $n! < 4N^{3/4} + 6N^{1/2} + 4N^{1/4}$, that is, take $n \approx \frac{\ln N}{\ln(\ln N)}$.
Iterations: for k from 2 to n, compute iteratively $P_k = k P_{k-1}$, starting with $P_1 = G$: if during the iterative process a division by 0 occurs, then N is factored; otherwise choose another random elliptic curve and repeat the steps.

Making the heuristic assumption that randomly chosen elliptic curves have order uniformly distributed in the interval (10.4), of width about $8N^{3/4}$, it is possible to estimate the probability that a curve with points of small order exists, and the probability of meeting such a point.

Example. Consider N = 77. The factorization is obviously trivial, but the example serves to explain how the method works, and the issues encountered.
Take G = (1, 1) and the elliptic curve of equation $y^2 = x^3 - 3x + 3$, which contains G and has discriminant $-135 \equiv 19 \bmod 77$, relatively prime with 77.
Since the prime factors of N are small, the points on the two curves $E[F_7]$ and $E[F_{11}]$ may be easily listed, see the table.

$y^2 = x^3 - 3x + 3$

E[F7]          E[F11]
x   y          x    y
1   1, 6       0    5, 6
3   0          1    1, 10
5   1, 6       2    4, 7
               4    0
               5    5, 6
               6    5, 6
               9    1, 10
               10   4, 7

A direct count shows that $|E[F_7]| = 6$ and $|E[F_{11}]| = 16$, thus $|E[F_{77}]| = 96 = 2^5 \cdot 3$, which is a product of powers of small primes.
The sequence of points nG is reported in the following table.

E[F77]

y² = x³ − 3x + 3        y² = x³ + 21x + 3
n    n[1,1]             n    n[1,5]
2    [75,76]            2    [13,74]
3    [10,70]            3    [72,68]
4    [5,71]             4    [18,58]
5    [50,27]            5    [52,67]
6    obst.              6    [47,4]
                        7    [54,38]
                        8    [66,17]
                        9    [11,5]
                        10   [65,72]
                        11   [69,73]
                        12   [8,65]
                        13   obst.

If we try to compute 6(1, 1) as (1, 1) + 5(1, 1), we get an obstruction, and thus we find a factor of 77 as $\gcd\{77, 50 - 1\} = 7$. A favorable obstruction occurs in the computation of nG when the point G has order a divisor of n modulo p but not modulo q. When n is also a divisor of the period of G modulo q, the situation is unproductive and N is not split: it is this asymmetrical behavior that allows factoring.
The curve $y^2 = x^3 - 3x + 3$ is lucky because its order contains only powers of small prime factors. The curve $y^2 = x^3 + 21x + 3$ has order $221 = 13 \cdot 17$, thus it splits 77 when we attempt to compute 13[1, 5] = [1, 5] + 12[1, 5] = [1, 5] + [8, 65], since 13 is the minimum order of any cyclic subgroup; actually, we have $\gcd\{77, 8 - 1\} = 7$. When this situation occurs in practice, the curve does not reduce the factorization complexity with respect to Shanks' method.

10.5.2  Discrete logarithms

The complexity of the discrete logarithm problem depends on the group element
representation and on the prime factor decomposition of the group order.
Let CN be a cyclic group of order N and generator g. The discrete logarithm is an
isomorphism from CN into the additive group ZN .
Representation of group elements. Since all cyclic groups of the same order are isomorphic, the choice of the representation is crucial to obtain sets where the discrete logarithm computation has great complexity (i.e. exponential complexity). The following properties should be considered in this endeavor:
- If $C_N$ can be identified with $Z_N$, the discrete logarithm computation is trivial.
- The complexity of discrete logarithm computation in the multiplicative groups $C_{2^m-1}$ of finite fields $F_{2^m}$ is certainly less than the complexity of discrete logarithm computation in multiplicative groups of finite fields $F_p$ of comparable order (Coppersmith's result).
- The complexity of discrete logarithm computation (deterministic or probabilistic) in the point group of an elliptic curve $E[F_p]$, if the curve is properly chosen, is greater than the complexity of the computation in a finite field $F_q$ of comparable order.
Order of the group. Several conditions have been found on the decomposition into prime factors of the group order $M = \prod_{i=1}^{s} p_i^{\alpha_i}$. Recalling that a cyclic group of order M can be decomposed into a direct product of s cyclic subgroups $C_{p_i^{\alpha_i}}$, we summarize the following facts:
- The complexity of the discrete logarithm in $C_M$ is upper bounded by $\sqrt{M}$ (see the sketch below).
- The complexity of the discrete logarithm in $C_M$ is not greater than the sum of the complexities of the discrete logarithms in each subgroup $C_{p_i^{\alpha_i}}$ (Pohlig-Hellman's theorem).
- The complexity of the discrete logarithm is maximal when the group order M is prime.
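The $\sqrt{M}$ bound of the first item is attained by Shanks' baby-step giant-step method, sketched below for the multiplicative group of $Z_p$ (our own Python illustration; the small example is hypothetical).

    from math import isqrt

    def bsgs(g, h, M, p):
        """Find x with g^x = h (mod p), where g has order M, in O(sqrt(M)) steps."""
        m = isqrt(M) + 1
        baby = {pow(g, j, p): j for j in range(m)}   # baby steps g^j
        giant = pow(g, -m, p)                        # g^(-m) mod p
        y = h
        for i in range(m):
            if y in baby:
                return i * m + baby[y]               # x = i*m + j
            y = y * giant % p
        return None

    # Example: 2 generates the cyclic group Z_101^* of order 100.
    x = bsgs(2, 37, 100, 101)
    assert pow(2, x, 101) == 37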


Chapter 11
Cryptography in GSM
Guglielmo Marconi. 1874 - 1937, Italian
physicist, who developed radiotelegraphy and
succeeded in transmitting signals across the
Atlantic (1901): Nobel prize for physics 1909.

Collins English Dictionary

Radio-mobile systems, or modern wireless systems, include any sort of cellular phone system, any radio-communication system closely connected with mobile objects (that is, aeroplanes, cars, and ships), and all kinds of sensor networks, even if, strictly speaking, the sensors are not always movable. It is a matter of history that wireless mobile communication was already very useful in Marconi's times, especially for maritime navigation. Later, it became indispensable for air navigation. A first extensive application of wireless communications was in the modern army: introduced starting from World War I, wireless communications became decisive in World War II, with mechanized or air-transported troops in the new tactical moving battles.
However, widespread use in civil life had to wait for technological progress in electronics, which reduced costs. Today, large-scale production of sophisticated communications equipment makes for very low costs, and consequently the enormous distribution of iPhones, smart-phones, and a myriad of other gadgets for the home.
When the principal problem was to establish radio connections, security issues and cryptographic problems were not the main concern. However, cryptography
and coordinated coding, requiring international standard institutions to be set up,
emerged almost immediately because of the need to distinguish different users,
avoid collisions or misunderstandings, and to protect sensitive or vital information.
Today, the success of the clever use of cryptography is at the base of the global
deployment of cellular telephony.

11.1  Evolution of cellular systems

The first radio-mobile systems appeared soon after Guglielmo Marconi invented the radio and demonstrated the possibility of sending electrical signals worldwide. A successful application of wireless telegraphy was in ship-to-shore communications, for positioning, for guidance, and also for distress calls. A famous case of the use of telegraphy was on the RMS Titanic, when it sank during its maiden voyage from Southampton (UK) to New York City, on April 14th, 1912. Famous was the distress call
CQD (6 times) DE (this is) MGY (6 times) position 41.44N. 50.24W
where:
- the code CQD means a general call to all vessels, indicating that the sending vessel is in distress and requires immediate assistance. At the time of the sinking, the Marconi company's CQD was still in common use, although it had been officially replaced by the well-known SOS;
- MGY was RMS Titanic's call sign, where M identifies British vessels (although it came to denote Marconi's ground stations), and the remaining two letters identify the vessel.
However, the first radio-mobile systems, as they are understood today, appeared in military contexts, and their effectiveness was confirmed by their deployment during World War II. Certainly, in those troubled times, massive use was made of them for air traffic control, and in conducting squadrons in air battles, where interconnections between planes of the same squadron were possible only via wireless phones. The war experience was then exploited in the enormous development of air transportation over the last 70 years or so. However, these systems were far from permitting the personal distribution that later occurred for cellular phones: great progress in electronic miniaturization was necessary, a process that started at the end of World War II with the invention of the transistor.
As soon as the technology was sufficiently developed, that is, starting from the beginning of the 1970s, cellular phone systems were introduced in the USA, the UK, and Northern European countries. The number of consumers at the end of the 1970s was estimated at a few thousand in European countries, while the USA was ready to take off with 40,000 users and at least 20,000 people on the waiting list.
The available systems were denominated
AMPS (Advanced Mobile Phone Service) in the US
NMT (Nordic Mobile Telephone) in North European Countries
TACS (Total Access Communication System) in the UK.
The first Italian public radio-mobile system was introduced, on an experimental
basis, in 1973, by Telecom (formerly SIP). The system, denominated RTM1, was

able to supply voice conversation and radio news services. A major limitation was that calls could not be received at the mobile terminal, which got only call announcements. Further, the system was designed for a small number of users in
each area served. The VHF working band was located around 160 MHz. This
system had a maximum of 5000 subscribers.
In 1983, RTM1 was replaced by a system denominated RTMS, operating at 450 MHz. In RTMS, it was possible to receive calls, since the mobile terminal was identified by a number with prefix 0330. At the beginning of the 1990s, this system had reached 100,000 users, when it was replaced by the ETACS system (Extended TACS), which, as the name shows, was derived from the British system. ETACS worked at 900 MHz, with cellular coverage and frequency re-use.
ETACS was based on analog FM modulation for sending the voice signal, and
digital FSK modulation for control signaling. Communication was between the
mobile equipment and a base station that served a limited geographical area
called a cell, whose shape is approximately a disc with the base station in the
center.
In ETACS, two fundamental notions, indispensable for full mobility, were definitively introduced:
handover: the automatic management of the call changeover that occurs when the user crosses the border between two cells, without loss of the conversation;
localization: the continuous updating of the position of the mobile equipment (cell phone) in the territory partitioned into cells.

11.2  GSM

GSM, acronym for Global System for Mobile communications, was developed with the aim of defining a common European standard for mobile communications. The chosen technology was TDMA/FDMA/FDD, with a radio link at a communication rate of 270 kbits/sec in a frequency band allocated around 900 MHz. The modulation used was GMSK (Gaussian Minimum Shift Keying), with a separation of 200 kHz between adjacent channels. The voice is digitally encoded by a linear predictive method (LPC). The source rate is 13 kbits/sec, increased to a transmission rate of 22.8 kbits/sec to take care of error-correcting and error-detecting codes.

11.2.1  Origins

The origins of GSM are more chronicle than history, mainly because of the disputes, certainly not of a technical nature, pertaining to the definition of the standard itself. The adoption of one technology rather than another meant making the fortune of a firm, or precipitating its decline. In June 1982, at a meeting of CEPT in Vienna, it was decided to set up a working group with the mission of defining the technical characteristics of a pan-European system of cellular communications, in a band around 900 MHz. Actually, such a band had already been
specified in 1978, when it was decided to reserve two half-bandwidths of 25 MHz,
around the central frequency of 900 MHz, for mobile communications in Europe.
Year   Event
1982   Group Special Mobile (GSM) is instituted through CEPT
1986   A permanent work group for GSM is set up
1987   Choice of the main transmission techniques, after tests and evaluations on prototypes
1989   GSM becomes a technical committee of ETSI
1990   GSM900 specifications are definitively adopted; work begins on defining the specifications of DCS1800
1991   Work begins on the first GSM900 systems; DCS1800 specifications are adopted
1992   Commercial operations of all major European providers of GSM900 begin

CEPT - Conférence Européenne des Postes et Télécommunications
ETSI - European Telecommunications Standards Institute
CCITT - Comité Consultatif International Télégraphique et Téléphonique

The work of the GSM group lasted for about ten years and followed the model of
CCITT recommendations. The work plan was split into four work packages:
WP1 devoted to defining services;
WP2 devoted to specification of radio transmissions;
WP3 devoted to specification of network architecture, signaling protocols and
open interfaces between different networks;
WP4 devoted to defining the services of data transmission.
The conclusions were contained in a report of more than 5000 pages, and the effect was the extraordinary boom in European mobile telephony.
The system offers basic services of telephony and digital data transmission, besides a multitude of possibilities that are of use to users, but that are mainly of a promotional, commercial character:
telephoning with the possibility of making and receiving calls
roaming throughout Europe, i.e. the possibility of calling and receiving
when traveling in different countries
identification of the caller

data transmission (up to 9600 bits/sec)


facsimile group 3
electronic mail
teletext
video text
messaging
GSM has its limits; for example, the data transmission rate is too slow to permit Internet surfing. Some of these limits cannot be overcome, due to the characteristics of the system, which was designed to better exploit the bandwidths in relation to local traffic constraints (delay constrained). Nevertheless, the number of possible services is practically unlimited, ranging from emergency calls to phone conferences (i.e. a number of users may be connected together simultaneously, and all may hear and speak together), call deviations, and all the information services typical of the fixed network.

11.2.2  Communication aspects

GSM is a fully digital communication system. The advantages are many; in particular, the security aspects are more easily managed, as are the quality aspects and the number of services offered. In particular, the digital character allows error control codes to be applied that permit quality data transmission.
GSM transmission is by packets (called BURSTs) of data; to each packet is reserved an interval or SLOT of time; groups of packets are hierarchically organized in FRAMEs. The basic SLOT consists of 156.25 bits, transmitted at 270 kbits/sec. The structure of a SLOT depends on the function of the group of bits that compose the BURST. The normal BURST structure is
that compose the BURST. The normal BURST structure is

    ST | Data | H | T | H | Data | SP | G

where
ST: a group of 3 Start bits
Data: a group of 57 bits of useful data (present twice)
H: a 1-bit flag used for distinguishing voice from data
T: a group of 26 bits for training
SP: a group of 3 Stop bits
G: a group of 8.25 bits defining a temporal guard interval between BURSTs.
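A quick consistency check of these field lengths (a minimal sketch; the symmetric duplication of the Data and H fields is part of the layout shown above):

```python
# Field lengths (in bits) of a GSM normal BURST, as listed above.
burst_fields = {
    "ST": 3,        # start bits
    "Data1": 57,    # first half of the useful data
    "H1": 1,        # voice/data flag
    "T": 26,        # training sequence
    "H2": 1,        # voice/data flag
    "Data2": 57,    # second half of the useful data
    "SP": 3,        # stop bits
    "G": 8.25,      # guard interval between BURSTs
}
total = sum(burst_fields.values())
assert total == 156.25          # the SLOT basis stated above
print(f"normal BURST length: {total} bits")
```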
Voice encoding. The voice is converted to digital form by the LPC (Linear Prediction
Coding) technique, which considers segments of 20 ms of speech. It is known that
the alteration (due to disturbance or noise during transmission) of bits in this
kind of encoding has different effects on the final reproduction of the voice, depending on the significance of the corrupted bit. Therefore, different strengths of
protection are necessary to combine economy of transmission and final voice quality. The strategy is to partition into classes the bits corresponding to a frame of
20 ms. Thus, the 50 bits of the most sensitive class are protected with a CRC code
(which adds 3 parity-check bits to the 50 bits); a convolutional code with
rate 1/2 encodes the previous 53 bits together with the remaining 132 protected bits.
The sampling process produces 160 samples, for a total of 2,080 bits every 20 ms.
LPC compression reduces the total number to 260 bits, of which 182 are encoded
and the remaining 78 are transmitted interleaved with these, at a rate of 22.8 kbit/s
in the Data field of the Normal BURST.
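The frame bit budget can be verified as follows (a sketch based on the numbers above; the 4 tail bits flushing the convolutional encoder are an assumption, not stated in the text, needed to reach exactly 22.8 kbit/s):

```python
# Bit budget of one 20 ms full-rate voice frame.
samples = 160                 # 20 ms of speech sampled at 8 kHz
raw_bits = samples * 13       # 13-bit samples: 2,080 bits per frame
lpc_bits = 260                # after LPC compression

class1 = 182                  # protected bits (50 most sensitive + 132)
class2 = 78                   # unprotected bits, transmitted interleaved
crc = 3                       # parity-check bits covering the 50 sensitive bits
tail = 4                      # assumed encoder tail bits

encoded = 2 * (class1 + crc + tail)   # rate-1/2 convolutional encoding
total = encoded + class2              # bits per frame
print(raw_bits, lpc_bits, total)      # 2080 260 456
print(total / 0.020 / 1000, "kbit/s") # 22.8
```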

11.2.3 Security and Protections

The security mechanism of GSM may supply three services:
1. Authentication of the user identity (SIA)
2. Confidentiality of user data and signaling. This service includes three functions:
   a) Confidentiality of the user data over the physical connection (in the air)
   b) Confidentiality of the user data outside the connection
   c) Confidentiality of signaling information
3. Confidentiality of user identity.
Protection is achieved by means of essentially three algorithms, namely A3, A5,
and A8, whose aims and operations are now briefly described.
Authentication. The user-identity authentication procedure is the kernel of
the security and functionality of GSM. This procedure is used by the network to
uniquely, unequivocally, and reliably authenticate the subscriber (thus charging
the cost to the legitimate subscriber), and to define and manage the cryptographic keys required by the confidentiality services.
The authentication service must be supported by all networks, independently of
the country, provider, or roaming issues, and obviously by all cell phones. After
enrollment, authenticity is verified according to a plan that depends on providers
or agreements.
When a user (MS - Mobile Subscriber) wants to be enrolled in the network, his cell
phone sends a request to the network at the nearest entry point (BS - Base Station);
then MS receives from the provider, via the entry BS, a random number RAND.
MS, utilizing the algorithm A3, which realizes a one-way function, computes

    RES = A3(RAND, K_I) ,

where K_I is a private user-authentication key, known only to the user and
the provider. This secret key is stored in the SIM card in a protected way, and is also
stored in the service center of the network provider.
The result RES is sent back to the provider's service center for a challenge, that is,
the center performs the same computation and obtains res (this is possible because the
service center knows all the parameters). If res = RES, then the service center sends
back an ack, and the user is enrolled in the network. Of course, if RES ≠ res,
access is denied. The authentication process, if successful, also provides BS with the
(secret) synchronization key K_c (called the communication key) for the stream cipher
that encrypts messages in the air. The BS must decrypt the encrypted
messages received from MS, because the message (voice or data) must be in the clear when it
enters the ground wired network.
This communication key K_c is generated by the algorithm A8, which implements
a one-way function combining RAND (known to all) and K_I (the secret user identifier):

    K_c = A8(RAND, K_I) .

The key K_c need not be communicated to MS, because MS has all the information
required to compute it, namely RAND, the secret identifier K_I, and the algorithm A8 implemented on the cell phone.
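The whole exchange can be sketched as follows. Since A3 and A8 are operator-specific and unpublished, HMAC-SHA256 is used here as a stand-in one-way function, and the key and field sizes are merely illustrative:

```python
import hashlib
import hmac
import os

def A3(rand: bytes, ki: bytes) -> bytes:
    """Stand-in authentication function (the real A3 is provider-defined)."""
    return hmac.new(ki, b"A3" + rand, hashlib.sha256).digest()[:4]

def A8(rand: bytes, ki: bytes) -> bytes:
    """Stand-in key-derivation function producing a 64-bit K_c."""
    return hmac.new(ki, b"A8" + rand, hashlib.sha256).digest()[:8]

KI = os.urandom(16)        # secret K_I, shared only by SIM and provider
RAND = os.urandom(16)      # challenge sent by the network to MS

RES = A3(RAND, KI)         # computed by the mobile station
res = A3(RAND, KI)         # recomputed at the provider's service center

if hmac.compare_digest(RES, res):
    Kc = A8(RAND, KI)      # both sides derive the communication key
    print("enrolled, Kc =", Kc.hex())
else:
    print("access denied")
```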
Confidentiality of user information. In the air, the confidentiality of user information is achieved by encrypting the bit stream by means of a stream cipher,
which is driven by the secret communication key K_c. The stream generator produces a binary stream (the encryption key) which is summed bit by bit with the
bits of the digitized message, before the modulator. The key stream is produced by an algorithm denoted A5. The structure of the generator implementing
A5, shown in Figure 11.1, consists of three LFSRs with a clock control. The generator polynomials and the positions of the clock control bit for each LFSR are shown in the
following table:
LFSR length   Generator polynomial        Clock control bit
19            x^19 + x^5 + x^2 + x + 1    8
22            x^22 + x + 1                10
23            x^23 + x^7 + x^2 + x + 1    10
The stream generator is initialized at each BURST, and every LFSR is fed according to a defined strategy. At each step, the clock-controlled LFSRs advance
according to the following majority rule:
[Diagram: three clock-controlled LFSRs of lengths 19, 22, and 23; their output bits are XORed together and added modulo 2 to the LPC voice stream to produce the encrypted voice.]

Figure 11.1: Stream generator A5


A control circuit counts the number of 1s and 0s in the control bit positions
of the LFSRs; then only the LFSRs whose control bit belongs to the
majority are updated. Therefore, every LFSR advances with probability 3/4.
This clocking criterion is also applied during the initialization stage, which is performed at every FRAME (BURST).
The number of steps is 114, and two blocks of key stream are produced, one for
incoming bits and one for outgoing bits.
The initialization is strangely complex, and combines the 64-bit communication key K_c
with the 22-bit FRAME number; thus the initialization procedure is
repeated at every FRAME:
- All LFSRs are loaded with zeros.
- For 64 steps, the corresponding bit of K_c is added (modulo 2) to the bit of
  the least significant cell of every LFSR, and the registers are clocked.
- For 22 steps, the corresponding bits of the FRAME number are introduced in the
  same way.
- All LFSRs are clocked for 100 steps.
- The generator is then ready, and produces two blocks of 114 bits each, one for
  encrypting outgoing bits and one for decrypting incoming bits.
A code sketch of the generator is given below.
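This is an illustrative sketch built from the table and the rules above, not a verified A5/1 implementation: the tap positions are read directly off the listed polynomials, and the bit-ordering and loading conventions of the deployed cipher may differ.

```python
# Majority-clocked three-LFSR key stream generator, after the table above.
TAPS = [(19, [5, 2, 1, 0]),        # x^19 + x^5 + x^2 + x + 1
        (22, [1, 0]),              # x^22 + x + 1
        (23, [7, 2, 1, 0])]        # x^23 + x^7 + x^2 + x + 1
CLOCK_BIT = [8, 10, 10]            # positions of the clock control bits

def step(reg, length, taps):
    """Advance one LFSR: the feedback bit is the XOR of the tap cells."""
    fb = 0
    for t in taps:
        fb ^= (reg >> t) & 1
    return ((reg << 1) | fb) & ((1 << length) - 1)

def keystream(regs, nbits):
    """Produce nbits of key stream from an already-initialized state."""
    out = []
    for _ in range(nbits):
        ctrl = [(regs[i] >> CLOCK_BIT[i]) & 1 for i in range(3)]
        maj = 1 if sum(ctrl) >= 2 else 0
        for i, (length, taps) in enumerate(TAPS):
            if ctrl[i] == maj:     # only the majority registers advance
                regs[i] = step(regs[i], length, taps)
        bit = 0                    # output: XOR of the high cells
        for i, (length, _) in enumerate(TAPS):
            bit ^= (regs[i] >> (length - 1)) & 1
        out.append(bit)
    return out

state = [0b1011001110001011011,          # 19-bit register
         0b1100110011001100110011,       # 22-bit register
         0b10101010101010101010101]      # 23-bit register
print(keystream(state, 16))
```

Each 114-bit block produced this way would be added modulo 2 to the Data fields of a BURST.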

Confidentiality of user identity. The user identity is masked by using an alias,
called TMSI (Temporary Mobile Subscriber Identity), which is a temporarily valid number between
1 and 999. This alias is unknown to unauthorized observers.
The TMSI is sent encrypted with A5 to the mobile phone MS by the base station
BS, after enrollment has been successfully completed. MS acknowledges receipt of the TMSI,
and from that point on uses the alias as its identifier.
The TMSI is valid only in the cell where the procedure has been completed. For
the hand-over, a LAI (Location Area Identification) is necessary in addition to the
TMSI. A new TMSI is then assigned, encrypted with A5 as for the first enrollment.

11.3 Conclusions

The choice of the algorithms A3, A8, and A5 has been at the center of numerous
disagreements over security. It is frequently claimed that A5 may be easily broken (actually, there are at least two similar algorithms, denoted A5/1 and A5/2).
However, assuming that such protection is active to conceal all conversations
flying through the air, there is no serious motivation for using stronger protection mechanisms, given that at every base station the voice is decrypted and travels in the clear
on the wires. The fortune of GSM, with respect to the previous generations TACS,
ETACS, etc., has been due to the possibility of transmitting data, to the digitization of the voice, and most importantly to the sufficient strength of the algorithm
A3 and of the challenge protocol for accessing (or being enrolled in) the network
when the cell phone is switched on. Even if the system is breakable by a brute-force
attack, without intelligence or side information, does the price pay off? In
the most fortunate cases, with today's technology, it takes days to break the system, a lot of resources, and very high costs for specialized personnel.
The Committee decided not to publish the algorithm A3, motivated by obscure
reasons of security. In the book The GSM System for Mobile Communications by
M. Mouly and M.B. Pautet, on page 478, we read:
... The algorithm describing the computation is referred to as algorithm A3 in the Specifications but its specification cannot be found there.
... Such algorithms are usually kept secret.
However, it is clear that the reasons were more political than technical. The validity of an algorithm actually lies in its public availability rather than in its secrecy.
The adage "Secret of three, secret of everybody" is absolutely true, but the choices
of the big organizations cannot be discussed, and must only be accepted. About
the strength of these algorithms, there have been many disputes and claims that
the algorithms could be easily broken. Nevertheless, the real issue concerning
the weakness of these algorithms is not their intrinsic quality, as history teaches.
The issue lies in the protocols used in their deployment; these protocols generally
have some very, very weak points and may contain hidden bugs. For massive applications at a very low level, the mathematical robustness of the algorithms is the
last target.
The adoption of relatively weak algorithms in GSM is supported by several reasons:
- The algorithms should mainly play a deterrent role.
- The algorithms should be easy to implement and should not be energy-consuming.
- The algorithms should be breakable by national security agencies, in view
  of the principle that the security of the individual citizen should not affect
  the security of the nation.
- The algorithm A5 must be universal, independent of the provider or nation,
  because it must run on all cell phones of all manufacturers, and on all base
  stations in the world.
Meanwhile, the general philosophy is wise enough to permit an independent
choice of the algorithm A3 (the authentication algorithm) by the provider, who may
adopt a proprietary algorithm as strong as he likes, in order to protect his economic interests. However, experience has shown that the A3 algorithm, together
with the authentication protocol, is sufficiently strong to guarantee the success of
GSM: it has led to its world dominance, a domination that is indisputable and
above any controversy.
GSM is a dominant phone system, but mobile phones have undergone enormous evolution, up to smart-phones, which are practically computers oriented to
exploiting the potentialities of the Internet; their security is essentially based on the
security of the Internet and its providers.

Chapter 12
Steganography
Imagination was given to man to compensate
him for what he is not; a sense of humor to console him for what he is.
Francis Bacon

12.1 Introduction

The objective of steganography is to conceal the very existence of the message. It
is evident that hiding the existence of information gives the best protection, because it is impossible to search for what we do not even know exists. The history of
steganography is a strange story, but today's applications in civil life have almost
nothing to do with the extraordinary fantasy and inventions that were pursued
for centuries, or even millennia.
In the next section we give a brief account of the history of steganography, referring readers to the books [43, 44] for more precise and detailed facts, events, data,
and news. The following section presents an example of modern steganographic
tricks. A typical steganographic artifice is to hide information in innocent cover
messages by exploiting their high semantic redundancy, as occurs for example in
voice messages or pictures. However, information-hiding techniques operating
at lower layers of a transmission chain have also been proposed. Following
this second approach, we describe a steganographic scheme that exploits the redundancy of the error-correcting codes necessarily used over noisy channels, for
example in the ubiquitous cell phone or wireless data access communications.
The stratagem is to insert the steganographic bits as artificial channel errors on
an honest communication channel, and to use some method to discriminate between genuine and artificial errors, thus recovering the hidden information. The
resulting steganographic, or secret, channel is a Binary Symmetric Channel (BSC)
with a bit error probability p_g, which can be used to send the covert information
using standard techniques and protocols. This subject is discussed in the following sections. Section 12.3 introduces a specific steganographic channel, called the
stega-channel, and obtains its description as a BSC whose bit error probability
p_g turns out to be a function of the primary code, the bit error probability of the
primary channel, and the principle adopted for identifying the stega-bits. Section
12.4 analyses a simple yet concrete implementation of this scheme, evaluates its performance, and discusses some critical cryptographic issues. Lastly, Section 12.5
draws some conclusions.

12.2 Some historical notes

Herodotus (c. 486-425 BC) in his Historiae [37] gives several examples of message encryption, and two particular examples of steganographic methods.
Histiaeus, tyrant of Miletus, wanted to send a message from Susa (Persia) to instigate a revolt in Ionia against the Persians, in order to be sent to stop the rebellion
and thus acquire merit with the king (a typical political game in any age). Of
course, the message had to be kept secret from the Persian king. Histiaeus shaved
the head of his most trusted slave and tattooed the message on it; the message disappeared after the hair had regrown. Apparently [43, 44], the method was also used
by some Germans at the beginning of the 20th century.
The second example of steganography given by Herodotus concerns Demaratus,
a Greek king of Sparta who was deposed and exiled, and went to the court of the
Persian king Darius. However, Herodotus credits Demaratus with having warned
the Spartans of an imminent invasion by Xerxes, Darius' successor. To send his
secret message, he removed the wax from a writing tablet, wrote his message on
the wood, and restored the wax: the tablet looked like a blank one. The legend
continues, telling that it was king Leonidas' wife who discovered the trick and
decrypted Demaratus' message, allowing the Spartans to stop the great Persian
army at Thermopylae, for long enough for the remaining Greeks to organize
an army that defeated Xerxes.
The general idea is to embed the secret information (the stega-message) in other information that is innocent (dummy) with respect to the private stega-message. An
old technique was to mark, with almost invisible signs, those letters of a text which
composed the stega-message. This trick may be easily discovered, especially if
the stega-text is not properly encoded; however, it was in use for centuries.
A further step forward was to use invisible ink to mark the letters, or to write
invisible messages in the blank spaces of innocent or faked messages. There are
very many types of ink:
1. Inks that can be developed by chemical reactions.
2. Inks visible under ultraviolet light.
3. Inks developed by heat.
Today's uses of this sort of expedient include anti-counterfeiting, property marking, children's games, and marking for the purpose of identification in manufacturing. UV markers, with fluorescent ink that glows when illuminated with
UV light, may be used to invisibly mark valuable items as a safeguard against
burglary. Security marker pens of this sort can be obtained commercially and are
widely used as a crime countermeasure.
A method for hiding information in texts, without any physical alteration of the
paper, is the acrostic. There are many acrostic poems. Probably the first was
due to Boccaccio (1300), and probably L'amorosa Visione is still the most gigantic
acrostic: written in terzine, the initial letters of all the triplets throughout the work
compose three poems of considerable length. An example is the short poem "An
Acrostic" by Edgar Allan Poe (1829):
Elizabeth it is in vain you say
"Love not" thou sayest it in so sweet a way:
In vain those words from thee or L. E. L.
Zantippe's talents had enforced so well:
Ah! if that language from thy heart arise,
Breathe it less gently forth and veil thine eyes.
Endymion, recollect, when Luna tried
To cure his love was cured of all beside
His folly pride and passion for he died.
The initials of the verses form the name ELIZABETH. Obviously, the position
of the acrostic, when used to hide information, must not be so easily identified,
a point already realized and exploited in ancient China. A more faithful example
from the literature comes from the Hypnerotomachia Poliphili (1499) [44], written by
an anonymous author. An acrostic formed by the first letters of its 38 chapters
reveals (in Latin) an affair between a monk and a woman:
Poliam frater Franciscus Columna peramavit.
In the 16th and 17th centuries, a large amount of literature on steganography appeared. Trithemius occupies a prominent position: he wrote an extensive treatise, the Steganographia, printed in 1606, dealing with occultism, cryptography, and
steganography. In the book he proposes an extremely complex steganographic
code, which was expanded by Gaspar Schottus, a Jesuit, in his Schola Steganographica (1665).
Human fantasy knows no limits when it comes to devising ways to insert hidden
information, in pursuit of the most diverse goals: from vendors who want to control
resale, to copyright protection, from lovers' messages, to spying on sensitive information. However, in modern cryptography, mathematical models, and what
can and cannot be achieved via information manipulation alone, tend to dominate the scene. In this mathematical world, we have private-key and public-key
steganography. An example of private-key steganography that can be used in
telecommunication systems will be the object of the remaining part of this chapter. We close this section by saying that, informally, a public-key steganography
scheme is intended to allow two parties (Alice and Bob), who have not exchanged
a secret key, to send hidden messages over a public channel, so that an adversary
(Eve) cannot even detect that these hidden messages are being sent.
Unlike the classical settings, in which provable security may be applicable to
steganography, public-key steganography is information-theoretically impossible; nevertheless, it is sometimes used. This is rather extravagant, but the human mind
tends to pursue the crazy conviction that stega-messages will never be detected.

12.3 Steganographic channel models

A digital communication chain connecting a source S with a user U comprises a
binary encoder E using an (n, k, d) linear code C with d = 2t + 1, a binary symmetric
channel (BSC) with bit error probability p, and a decoder D. The chain E-BSC-D
is referred to as the primary channel. The decoding rule D is described by a full set
T of coset leaders ℓ_i, 1 ≤ i ≤ 2^{n-k}, which identify the cosets ℓ_i + C.
A steganographic channel is created by inserting artificial errors on the primary
channel. We will consider and compare two basic modes of artificial bit error
insertion. Both modes introduce a single artificial error in a code word:

Mode 1:
- A 0 stega-bit is inserted as an artificial no-error affecting a bit in a fixed position within a primary code word.
- A 1 stega-bit is inserted as an artificial error affecting a bit in a fixed position within a primary code word.

Mode 2:
- A 0 stega-bit is inserted as an artificial error affecting a 0-bit in a randomly-chosen position within a primary code word.
- A 1 stega-bit is inserted as an artificial error affecting a 1-bit in a randomly-chosen position within a primary code word.
In both Modes, the primary code is used to recognize both the error status and the stega-bits; however, the separation of artificial and genuine errors is very different.
Specifically:
In Mode 1, the stega-information is encoded in a known position within a primary
code word; therefore it is easily recognized using the decoding rule D.
In Mode 2, the stega-information is carried by a primary symbol artificially corrupted in a random position unknown to the stega-user; therefore the decoding
rule D is not sufficient for identifying the artificial error, and a detection criterion
is required.
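A minimal sketch of the two insertion modes, with code words represented as bit lists and position 0 taken as the agreed fixed position of Mode 1 (function names are illustrative):

```python
import random

def insert_mode1(codeword, stega_bit, pos=0):
    """Mode 1: a 1 stega-bit flips the bit at the agreed fixed position;
    a 0 stega-bit leaves the word untouched (an artificial no-error)."""
    word = list(codeword)
    if stega_bit == 1:
        word[pos] ^= 1
    return word

def insert_mode2(codeword, stega_bit):
    """Mode 2: flip one randomly chosen bit whose value equals the
    stega-bit (assumes the code word contains such a bit)."""
    word = list(codeword)
    positions = [i for i, b in enumerate(word) if b == stega_bit]
    word[random.choice(positions)] ^= 1
    return word

c = [1, 0, 0]                      # a word of the coset code of Example 1 below
print(insert_mode1(c, 1))          # [0, 0, 0]
print(insert_mode2(c, 0))          # flips one of the two 0-bits
```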
The stega-channel between stega-sender and stega-receiver is a binary communication channel characterized by a bit error probability p_g, which depends, for each
mode, on:
1. the primary code error-correcting capabilities;
2. the cover channel bit error probability p;
3. the decoding rule D;
4. the detection criterion in Mode 2.

[Diagram: the chain S - E - public BSC - decoder, with the keyed (K) stega-channel embedded in the public BSC.]
Figure 12.1: Communication Channel with Error-Correcting Codes and the Stega-channel
The bit error probability p_g is thus defined as follows:

    p_g = \sum_{x=0,1} \sum_{c \in C} \sum_{e \in F_2^n} p\{ \hat{x} \ne x \mid c, e, x \} \, p_t(x) \, p(c) \, p(e)

where
- p_t(x) is the probability of sending a stega-bit x;
- p(c) is the probability of sending a code word c;
- p(e) = p^{w_H(e)} (1-p)^{n - w_H(e)} is the probability that an error pattern e of Hamming weight w_H(e) occurs;
- \hat{x} is the detected stega-bit.

Let L_i denote the detection rule of Mode i that extracts the stega-bit \hat{x} from a detected error pattern e; thus we may rewrite the expression for p_g as

    p_g = \sum_{x=0,1} \sum_{c \in C} \sum_{e \in F_2^n} p\{ L_i(c + e + D(c + e)) \ne x \mid c, e, x \} \, p_t(x) \, p(c) \, p(e) .

Letting ℓ(e) denote the coset leader of the coset containing the error pattern e, we have

    p_g = \sum_{x=0,1} \sum_{c \in C} \sum_{e \in F_2^n} p\{ L_i(e + ℓ(e)) \ne x \mid c, e, x \} \, p_t(x) \, p(c) \, p(e) .
In particular, referring to Mode 1, the average over the transmitted code words can
be computed, and the equation simplifies to

    p_g = \sum_{x=0,1} \sum_{e \in F_2^n} p\{ L_1(e + ℓ(e)) \ne x \mid e, x \} \, p_t(x) \, p(e) .

However, the exact computation of p_g is usually very difficult, as it requires the
enumeration of a large number of error configurations. Thus estimations are
practically unavoidable; in particular, a good estimation is obtained as follows.
Assuming that a code word c is sent, two probabilities may be computed exactly:
1. p_c(c), the probability that the stega-bit is received correctly for sure, given
that c was sent;
2. p_e(c), the probability that the stega-bit is received incorrectly for sure, given
that c was sent.
The stega-bit error probability p_g(c), given that c was sent, is estimated as

    p_g(c) = p_e(c) + \frac{1}{2} [1 - p_c(c) - p_e(c)] = \frac{1}{2} - \frac{1}{2} p_c(c) + \frac{1}{2} p_e(c) ,

since it is very likely (though not necessarily true) that, in all the remaining cases, the
stega-bit is detected incorrectly one time out of two. Finally, the stega-bit error
probability p_g is the average of p_g(c) computed over all code words:

    p_g = \frac{1}{2^k} \sum_{c \in C} p_g(c) .        (12.1)

It is worthwhile to observe that in Mode 1, p_g(c) = p_g; thus no average is necessary,
and the computation of p_g is independent of the transmitted code word. In Mode 2,
instead, the computation of the averages in equation (12.1) can be obtained
from the weight enumerator polynomial W(x, y) of the primary code.
Remark 1. In Mode 2, a detection strategy may be devised such that the stega-channel is modeled as a binary erasure channel; in this case, the performance
may be greatly improved by the use of a stega-code, which is allowed to correct
both erasures and errors.
Remark 2. As a consequence of the definition of the stega-channel as a binary
symmetric channel (BSC), the steganographic information may be pre-processed,
namely compressed, encrypted, and encoded using an error-correcting code; the
resulting stream is the sequence of bits to be sent. The concealing capability of
the scheme mainly depends on the transmission protocol, which is decisive in
defining the stega-channel transmission rate.

12.4 Concealment issues

If the channel noise is negligible, no genuine error occurs and the stega-bits are
easily recovered; but, conversely, the stega-channel is easily detected. Therefore,
the secrecy of the stega-channel depends on the primary channel noise. In particular,
the difficulty of detecting the stega-channel depends on the ratio γ = p/a between
the rate p of genuine errors and the rate a of artificial errors. If γ is large, the
existence of stega-bits is unlikely to be recognized; conversely, the transmission
rate of the stega-channel is small, given p. Thus, the choice of γ is a compromise
between the achievable rate of the stega-channel and its undetectability.
The average number of genuine errors per code word is np, and its variance is np(1-p);
the average number of total errors per code word is np + a, and the stega-bits are
undetectable at level λ > 0 if

    np + a = np + λ \sqrt{np(1-p)} ,

thus a = λ \sqrt{np(1-p)}. For example, if p = 10^{-2}, λ = 1/10, and n = 31, then a ≈
0.055, that is, one stega-bit may be inserted about every 20 code words. However,
if the level λ is increased to 1, the rate a increases to about 0.55, which means that a
stega-bit may be inserted every two code words.
Let R be the transmission rate of the cover channel, in bits/sec; then the rate of the
stega-channel is R a / n bits/sec. Continuing the previous example, if the transmission
rate is 10 Mbit/s, the net rate of the stega-channel is 10000/620 ≈ 16 kbit/s.
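These figures can be reproduced with a few lines (a sketch; lam stands for the level λ above):

```python
from math import sqrt

p, n, R = 1e-2, 31, 10e6      # genuine error rate, code length, cover rate (bit/s)

for lam in (0.1, 1.0):        # undetectability levels
    a = lam * sqrt(n * p * (1 - p))   # artificial errors per code word
    rate = R * a / n                  # stega-channel rate in bit/s
    print(f"lambda={lam}: a={a:.3f} stega-bits/codeword, "
          f"rate={rate/1e3:.1f} kbit/s")
# lambda=0.1 gives a ~ 0.055, i.e. one stega-bit about every 20 code words,
# and a net rate of roughly 16-18 kbit/s on a 10 Mbit/s cover channel.
```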

12.4.1 Examples, Simulation, and Results

In this section we present some examples of stega-channels, obtained by considering both Modes 1 and 2 over noisy channels, and using primary codes
with different error-correcting capabilities. The main scope is to illustrate the
computation of the bit error rate p_g of the equivalent BSC stega-channel, and to
assess the validity of the approximated expressions.
Example 1. Consider a cover channel using the repetition code (3, 1, 3), which is
a perfect single-error-correcting code. Since the dimension of the code is small, the
computations may be exact. It is assumed that the stega-bits 0 and 1 are equally
probable, i.e. p_t(0) = p_t(1) = 1/2.
Mode 1: For computational purposes it is not restrictive to assume that the
steganographic bit is allocated at the first position in every code word, and that
the word (0, 0, 0) is sent. We have:
1. If 0 is sent, then it is incorrectly received only if an error is detected that
affects the first entry of (0, 0, 0), and this event occurs only for the error
patterns
    (1, 0, 0) , (0, 1, 1) ;
then p_e(0) = p(1-p)^2 + p^2(1-p) = p - p^2; consequently, the probability
of correct detection is p_c(0) = 1 - p_e(0) = 1 - p + p^2, since no ambiguous
configuration occurs.
2. If 1 is sent, then it is incorrectly received when no error is detected in the
first code word position, and this event occurs only for the error patterns
    (1, 0, 0) , (0, 1, 1) , (0, 0, 1) , (0, 1, 0) , (1, 1, 0) , (1, 0, 1) ;
then p_e(1) = 3p(1-p)^2 + 3p^2(1-p) = 3p - 3p^2; consequently, the probability
of correct detection is p_c(1) = 1 - p_e(1) = 1 - 3p + 3p^2, since no ambiguous
configuration occurs.
Summarizing, we have
p_e = (p_e(0) + p_e(1))/2 = 2p - 2p^2 and p_c = (p_c(0) + p_c(1))/2 = 1 - 2p + 2p^2, and the
bit error probability of the BSC model of the stega-channel is p_g = 2p - 2p^2.
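The closed form p_g = 2p - 2p^2 can be confirmed by enumerating all eight genuine error patterns (a self-contained sketch of the Mode 1 computation above):

```python
# Exhaustive check of p_g for Mode 1 over the (3,1,3) repetition code.
from itertools import product

p = 0.01
# Coset leaders of the repetition code, indexed by syndrome (v1+v2, v1+v3).
leader = {(0, 0): (0, 0, 0), (1, 1): (1, 0, 0),
          (1, 0): (0, 1, 0), (0, 1): (0, 0, 1)}

def syndrome(v):
    return (v[0] ^ v[1], v[0] ^ v[2])

pg = 0.0
for x in (0, 1):                            # stega-bit, equiprobable
    for e in product((0, 1), repeat=3):     # genuine error pattern
        w = sum(e)
        prob = p**w * (1 - p)**(3 - w)
        total = (e[0] ^ x, e[1], e[2])      # artificial error at position 1
        detected = leader[syndrome(total)]  # error pattern seen by the decoder
        x_hat = detected[0]                 # stega-bit read off position 1
        if x_hat != x:
            pg += 0.5 * prob
print(pg, 2*p - 2*p**2)                     # both ~ 0.0198
```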
Mode 2: For computational purposes, it is convenient to use the coset code

    {(1, 0, 0), (0, 1, 1)}

(that is, a translation of the repetition code), so that the all-zeros and the all-ones
code words are excluded, and the transmission of a stega-bit 0 or 1 is possible without
exception. Moreover, we know that p_e and p_c do not depend on which stega-bit is sent, but both probabilities depend on the code word used; thus they are
computed referring to a 0 stega-bit sent. Moreover, in this case some undecidable
situations occur, which may be seen as erasures; thus an erasure probability
p_u is also computed. The standard array is
    leaders
    000:  100  011
    010:  110  001
    001:  101  010
    100:  000  111

each received word is decoded by adding to it the leader of its row, that is, into
the code word at the head of its column.
1. If (1, 0, 0) is the code word used, and the 0 stega-bit is sent using the second
entry, the eight possible situations corresponding to genuine error patterns
are reported in the following table, where column I contains the genuine
error patterns, column II the received words, column III the decoded code
words, column IV the detected error position and the type of detection (E meaning
erroneous stega-bit, U erased stega-bit, and C correct stega-bit), and column
V the probability of the event:
    I     II    III   IV    V
    000   110   100   2 C   (1-p)^3
    111   001   011   2 C   p^3
    100   010   011   3 C   p(1-p)^2
    011   101   100   3 C   p^2(1-p)
    110   000   100   1 E   p^2(1-p)
    001   111   011   1 E   p(1-p)^2
    101   011   011   0 U   p^2(1-p)
    010   100   100   0 U   p(1-p)^2

2. If (0, 1, 1) is the code word used, the configurations are summarized in the
following table, as in the previous case:

    I     II    III   IV    V
    000   111   011   1 C   (1-p)^3
    111   000   100   1 C   p^3
    100   011   011   0 U   p(1-p)^2
    011   100   100   0 U   p^2(1-p)
    110   001   011   2 E   p^2(1-p)
    001   110   100   2 E   p(1-p)^2
    101   010   011   3 E   p^2(1-p)
    010   101   100   3 E   p(1-p)^2

Summarizing, we have

    p_e = \frac{1}{2} ( 3p(1-p)^2 + 3p^2(1-p) )
    p_c = \frac{1}{2} ( 2p^3 + 2(1-p)^3 + p(1-p)^2 + p^2(1-p) )
    p_u = \frac{1}{2} ( 2p(1-p)^2 + 2p^2(1-p) ) .

The probability p_g is obtained by splitting p_u evenly between p_e and p_c, and is

    p_g = 2p(1-p) .
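These closed forms can be checked numerically (a small sketch):

```python
p = 0.01
pe = 0.5 * (3*p*(1-p)**2 + 3*p**2*(1-p))
pc = 0.5 * (2*p**3 + 2*(1-p)**3 + p*(1-p)**2 + p**2*(1-p))
pu = 0.5 * (2*p*(1-p)**2 + 2*p**2*(1-p))
assert abs(pe + pc + pu - 1) < 1e-12    # the three cases are exhaustive
pg = pe + pu / 2                        # erasures split evenly
print(pg, 2*p*(1-p))                    # both ~ 0.0198
```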
Example 2. Consider a cover channel employing a BCH (31, 16, 7) code, correcting three errors. In this case, the exact computation is unfeasible; therefore
we consider the approximate computation and compare the results with numerical simulations.
Mode 1: For computational purposes, it is not restrictive to assume that the
steganographic bit is sent using the first position in a code word. In this mode,
the computation of p_e and p_c depends on whether the stega-bit is 0 or 1, which
are assumed to be sent with the same probability 1/2. We have:
1. If 0 is sent, then p_e(0) is the probability that an error affects the position of
the stega-bit and the total number of errors is at most three:

    p_e(0) = p(1-p)^{30} + 30 p^2 (1-p)^{29} + \frac{30 \cdot 29}{2} p^3 (1-p)^{28} .

If 1 is sent, p_e(1) is equal to the probability that a genuine error affects the position
of the stega-bit (canceling the artificial error) and at most three errors affect the other positions within the
code word:

    p_e(1) = p(1-p)^{30} + 30 p^2 (1-p)^{29} + \frac{30 \cdot 29}{2} p^3 (1-p)^{28} + \frac{30 \cdot 29 \cdot 28}{6} p^4 (1-p)^{27} .

2. If 0 is sent, p_c(0) is equal to the probability that at most three errors affect
positions within the code word, excluding the stega-bit position:

    p_c(0) = (1-p)^{31} + 30 p (1-p)^{30} + \frac{30 \cdot 29}{2} p^2 (1-p)^{29} + \frac{30 \cdot 29 \cdot 28}{6} p^3 (1-p)^{28} .

If 1 is sent, p_c(1) is the probability that at most two errors hit positions within
the code word, excluding the stega-bit position:

    p_c(1) = (1-p)^{31} + 30 p (1-p)^{30} + \frac{30 \cdot 29}{2} p^2 (1-p)^{29} .

Summarizing, we have

    p_e = p(1-p)^{30} + 30 p^2 (1-p)^{29} + \frac{30 \cdot 29}{2} p^3 (1-p)^{28} + \frac{30 \cdot 29 \cdot 28}{12} p^4 (1-p)^{27}

    p_c = (1-p)^{31} + 30 p (1-p)^{30} + \frac{30 \cdot 29}{2} p^2 (1-p)^{29} + \frac{30 \cdot 29 \cdot 28}{12} p^3 (1-p)^{28} ;

thus, the bit error probability p_g of the BSC model for Mode 1 is

    p_g = \frac{1}{2} - \frac{1}{2}(1-p)^{31} - \frac{29}{2} p(1-p)^{30} - \frac{405}{2} p^2(1-p)^{29} - \frac{1595}{2} p^3(1-p)^{28} + 1015 \, p^4(1-p)^{27} .

Mode 2. In this mode, we define a detection rule for the stega-bit as follows:
- Use the primary code to recognize the positions of every error.
- Compute the number N_0 of errors affecting the zero positions and the number N_1 of errors affecting the one positions in the primary code word.
- The decision rule establishes that either 0 or 1 is the output, corresponding to the subscript of the larger of N_0 and N_1.
- Ties may be resolved in two ways: i) by a random choice, or ii) by inserting
  an erasure (for later use in the decision process); in this case p_u indicates the
  erasure probability.
A sketch of this rule is given below.
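The sketch assumes the primary decoder has already produced the decoded code word and the detected error pattern (both as bit lists):

```python
def detect_stega_bit(decoded, detected_error):
    """Mode 2 majority rule: count detected errors on the 0-positions and
    on the 1-positions of the decoded word; return the stega-bit guess,
    or None for a tie (an erasure)."""
    n0 = sum(1 for d, e in zip(decoded, detected_error) if e == 1 and d == 0)
    n1 = sum(1 for d, e in zip(decoded, detected_error) if e == 1 and d == 1)
    if n0 > n1:
        return 0
    if n1 > n0:
        return 1
    return None   # tie: erase (option ii) or flip a coin (option i)

print(detect_stega_bit([1, 0, 0], [0, 1, 0]))   # error on a 0-position -> 0
```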
The probabilities p_c, p_e, and p_u do not depend on the transmitted stega-bit, and
are obtained by averaging over the code word set, since they are conditioned on
the transmitted code word c. The computations of p_c(c), p_e(c), and p_u(c) may
be done by referring to a finite (relatively small) number of genuine error configurations, which are summarized in Table 1 below (ar indicates the artificial
error position), under the assumption that a 0 stega-bit is transmitted.
It is direct to obtain p_c(c), p_e(c), and p_u(c), with n - k_0 = w_H(c), from Table 1.
However, with the aim of computing p_g, it is necessary to average these probabilities over the whole code. These averages E_C[p_j(c)] may be obtained from
the weight enumerator polynomial W(x, y) = \sum_{j=0}^{n} A_{n-j} x^{n-j} y^j of the code, since
they are linear combinations of averages of the form

    E_C[ k_0(k_0-1) \cdots (k_0-v) (n-k_0)(n-k_0-1) \cdots (n-k_0-u) ] = W_{y \cdots y, x \cdots x}(1, 1) ,

where W_{y \cdots y, x \cdots x}(1, 1) is the multiple partial derivative of W(x, y) with respect to
the indicated subscript variables, evaluated at x = y = 1.
The approximated error probability of the BSC modeling the stega-channel with
Mode 2 is

    p_g = \frac{1}{2} - \frac{1}{2}(1-p)^{31} - \frac{29}{4} p(1-p)^{30} - \frac{403}{4} p^2(1-p)^{29} + \frac{29}{4} p^3(1-p)^{28} + \frac{405}{4} p^4(1-p)^{27} .
Table 1: genuine error configurations for Mode 2 (a 0 stega-bit transmitted; ar is the
artificial error position, k_0 the number of 0-positions of c, n - k_0 = w_H(c), n = 31;
e marks a genuine error).

    x_ar | (k_0-1) 0-posis. | (n-k_0) 1-posis. | event prob.

    correct detection error configurations
      -  |  -    |  -    | (1-p)^{31}
      -  |  e    |  -    | (k_0-1) p (1-p)^{30}
      e  |  e    |  -    | (k_0-1) p^2 (1-p)^{29}
      -  |  ee   |  -    | (k_0-1)(k_0-2)/2 p^2 (1-p)^{29}
      -  |  e    |  e    | (k_0-1)(31-k_0) p^2 (1-p)^{29}
      e  |  ee   |  -    | (k_0-1)(k_0-2)/2 p^3 (1-p)^{28}
      e  |  eee  |  -    | (k_0-1)(k_0-2)(k_0-3)/6 p^4 (1-p)^{27}
      e  |  ee   |  e    | (k_0-1)(k_0-2)(n-k_0)/2 p^4 (1-p)^{27}

    incorrect detection error configurations
      e  |  -    |  e    | (31-k_0) p^2 (1-p)^{29}
      -  |  -    |  ee   | (31-k_0)(31-k_0-1)/2 p^2 (1-p)^{29}
      e  |  -    |  ee   | (31-k_0)(31-k_0-1)/2 p^3 (1-p)^{28}
      e  |  -    |  eee  | (31-k_0)(31-k_0-1)(31-k_0-2)/6 p^4 (1-p)^{27}
      e  |  e    |  ee   | (k_0-1)(31-k_0)(31-k_0-1)/2 p^4 (1-p)^{27}

    erasure detection error configurations
      -  |  -    |  e    | (31-k_0) p (1-p)^{30}
      e  |  e    |  e    | (k_0-1)(31-k_0) p^3 (1-p)^{28}

12.5 Conclusions

Error-correcting codes over communication channels may be used as cover objects for secret conversations. The steganographic information is sent as artificial
errors on the channel, and it is undetectable provided that the artificial errors do
not affect the rate of the genuine errors, in the sense that the artificial error rate is
indistinguishable from random variations of the genuine error rate.
An interesting observation is that the performance of the stega-system principally
depends on the primary code error-correcting capabilities, rather than on the code
itself, and it is also to some extent independent of the code rate.
Two modes of inserting artificial errors, denoted Mode 1 and Mode 2, have been
considered and compared. Although Mode 1 is incomparably better than Mode
2 in terms of the achievable p_g, Mode 2 does not need synchronization, and it
seems to be less easily detectable.


Bibliography
[1] M. Agrawal, N. Kayal, N. Saxena, PRIMES is in P, Ann. Math. 160, 2004, 781-793.
[2] E. Bach, J. Shallit, Algorithmic Number Theory, vol.1, MIT Press, Cambridge
(MS), 1996.
[3] G.V. Bard, Algebraic Cryptanalysis, Springer, New York, 2009.
[4] E. R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, New York, 1968.
[5] I. Blake, G. Seroussi, N. Smart, Elliptic Curves in Cryptography, Cambridge
Univ. Press, Cambridge, 1999.
[6] I. Blake, G. Seroussi, N. Smart, Advances in Elliptic Curves Cryptography, Cambridge University Press, Cambridge, 2005.
[7] A. Borodin, I. Munro, The Computational Complexity of Algebraic and Numeric
Problems, Elsevier, New York, 1975.
[8] J.A. Buchmann, Introduction to Cryptography, Springer, New York, 2000.
[9] K. W. Campbell, M. J. Wiener, DES is not a Group, Proceeding CRYPTO 92,
Proceedings of the 12th Annual International Cryptology Conference on Advances
in Cryptology, Springer, London, UK, 1993, 512-520.
[10] J.W.S. Cassels, Lectures on Elliptic Curves, Cambridge Univ. Press, Cambridge,
1991.
[11] K. Cattel, J.C. Muzio, Analysis of One-Dimensional Linear Hybrid Cellular
Automata over Fq , IEEE Trans. on Computer, Vol. 45, N.7, July 1996, 782-792.
[12] D. Chaum, Blind signatures for untraceable payments, Advances in Cryptology: Proceedings of Crypto 82, Plenum, 3, 1983, 199-203.
[13] C. Christensen, Polish Mathematicians Finding Patterns in Enigma Messages, Mathematics Magazine, Vol 80, N.4, October 2007, 247-273.
[14] F. Chun-I, W. Lin-Chuan, W. V. Shi-Ming, Cryptanalysis on Chen-Qui-Zheng
Blind Signature Scheme, Applied Math. Sciences, Vol. 8(16), 2008, 787-791.
[15] H. Cohen, A Course in Computational Algebraic Number Theory, Springer, New York, 1995.
[16] D. Coppersmith, The Real Reason for Rivest's Phenomenon, Advances in
Cryptology - Crypto 85 Proceedings, Springer, New York, 1985, 535-536.
[17] H. Davenport, The Higher Arithmetic, Dover, New York, 2008.
[18] D. Denning, Cryptography and Data Security, Addison-Wesley, Reading (MS),
1983.
[19] L.E. Dickson, Introduction to the Theory of Numbers, Dover, New York, 1957.
[20] W. Diffie, M.E. Hellman, New Directions in Cryptography, IEEE Transactions
on Information Theory, vol.IT-22, n.6, November 1976, 644-654.
[21] M. Elia, G. Morgari, M. Spicciola, On Binary Sequences Generated by Self-Clock Controlled LFSR, MTNS 2010, Budapest, Hungary.
[22] M. Elia, On Tridiagonal Binary Matrices and LFSRs, Contemporary Eng. Sciences, Vol. 3, no. 4, 167-182.
[23] M. Elia, M. Piva, D. Schipani, The Rabin cryptosystem revisited,
arXiv:math.NT/1108.5935, 2011.
[24] M. Elia, D. Schipani, On the Rabin signature, to appear in JDMSC.
[25] M. Elia, M. Leone, C. Visentin, Low Complexity Bit-Parallel Multipliers for
GF (2m ) with Generator Polynomial xm + xk + 1,
Electronics Letters, vol.35, No.7, April 1999, 551-552.
[26] M. Elia, M. Leone, On the Inherent Space Complexity of Fast Parallel Multipliers for GF (2m ), IEEE Trans. on Computer, 51, N.3, March 2002, 346-351.
[27] F. Fabris, Teoria dell'Informazione, Codici, Cifrari, Bollati Boringhieri, Torino,
2001.
[28] Giulio Fagnano de Toschi, Produzioni Matematiche del Conte Giulio Carlo di Fagnano, Marchese de Toschi, e di Sant'Onorio, Stamperia Gavelliana, Pesaro, 1750.
[29] C.C. Foster, Cryptanalysis for Microcomputers, Hayden, Rochelle Park (NJ),
1982.
[30] H. Fouche Gaines, Cryptanalysis, Dover, New York, 1956.
[31] S. Galbraith, Mathematics of Public Key Cryptography, Cambridge Univ. Press,
Cambridge, 2012.
[32] V.K. Garg and J.E. Wilkes, Wireless and Personal Communications Systems, Prentice Hall, Englewood Cliffs, 1996.
[33] C.F. Gauss, Disquisitiones Arithmeticae, Göttingen (1801); English translation
by A. Clarke, revised by W. Waterhouse, Springer, New York, 1986.
[34] S. W. Golomb, Shift Register Sequences, Aegean Park Press, Laguna Hills,
1982.
[35] U. Hansmann, M.S. Nicklaus, T. Schack, F. Seliger, Smart Card Application
Development Using Java, Springer, New York, 1999.
[36] G.H. Hardy, E.M. Wright, An Introduction to the Theory of Numbers, Oxford
Univ. Press, Oxford, 1971.
[37] Herodotus, The Histories (Historiae), Penguin Books, London, 2003.
[38] L. S. Hill, Cryptography in an Algebraic Alphabet, The American Mathematical Monthly, vol. 36, June-July 1929, 306-312.
[39] J. Hoffstein, J. Pipher, J.H. Silverman, An introduction to mathematical cryptography, Springer, New York, 2008.
[40] R.A. Horn, C.R. Johnson, Matrix analysis, Cambridge Univ. Press, Cambridge, 1999.
[41] Hua, L. K., Introduction to Number Theory, Springer, New York, 1982.
[42] D. Jungnickel, Finite Fields - Structure and Arithmetics, Wissenschaftsverlag,
Mannheim, 1993.
[43] D. Kahn, The Codebreakers: The Story of Secret Writing, Scribner, New York,
1996.
[44] S. Katzenbeisser, F. A. P. Petitcolas, Information Hiding: techniques for steganography and digital watermarking, Artech House, Boston, 2000.
[45] B. Kedem, Binary Time Series, Marcel Dekker, New York, 1980.
[46] D.E. Knuth, The Art of Computer Programming, Fundamental algorithms, vol.
I, Addison-Wesley, Reading (MS), 1967.
[47] D.E. Knuth, The Art of Computer Programming, Seminumerical algorithms,
vol. II, Addison-Wesley, Reading (MS), 1981.
[48] N. Koblitz, A Course in Number Theory and Cryptography, Springer-Verlag,
New York, 1987.
[49] N. Koblitz, Algebraic Aspects of Cryptography, Springer, New York, 1999.
[50] N. Koblitz, Introduction to Elliptic Curves and Modular Forms, Springer, New
York, 1984.
[51] K. Kurosawa, W. Ogata, Efficient Rabin-type Digital Signature Scheme, Design, Codes and Cryptography, 16, 1999, 53-64.
[52] R. Lidl, H. Niederreiter, Finite Fields, Addison-Wesley, Reading (MS), 1983.
[53] H. Lysing, Secret Writing - An Introduction to Cryptograms, Ciphers and Codes,
Dover, New York, 1974.
[54] F.J. MacWilliams, N.J.A. Sloane, The Theory of Error-Correcting Codes, North-Holland, Amsterdam, 1977.
[55] J.L. Massey, Shift-Register Synthesis and BCH decoding, IEEE Trans. on Inform. Th., IT-15, 1969, 122-127.
[56] A.J. Menezes, Elliptic Curve Public Key Cryptosystems, Kluwer Academic Publishers, Boston, 1993.
[57] A.J. Menezes, P.C. van Oorschoot, S.C. Vanstone, Handbook of Applied Cryptography, CRC Press, New York, 1997.
[58] M. Mignotte, Mathematics for Computer Algebra, Springer, New York, 1992.
[59] R. Mollin, An Introduction to Cryptography, CRC, NY, 2007.
[60] NBS-SHA, Secure Hash Standard (SHS), FIPS publication 180-2, National Bureau of Standards, 2003.
http://carc.nist.gov/publications/fips/fips180-2/fips180-2.pdf
[61] J. Pieprzyk, T. Hardjono, J. Seberry, Fundamentals of Computer Security,
Springer, New York, 2003.
[62] M. Rabin, Digitalized signature as intractable as factorization, Technical Report MIT/LCS/TR-212 , MIT Laboratory for Computer Science, 1978.
[63] R. Rivest, A. Shamir, L. Adleman, A Method for Obtaining Digital Signatures
and Public-Key Cryptosystems, Comm. of ACM, vol. 21, Feb. 1978, 120-126.
[64] M. Rosing, Implementing Elliptic Curve Cryptography, Manning Publication
Co., Greenwich (CT), 1999.
[65] R.A. Rueppel, Analysis and Design of Stream Ciphers, Springer, New York,
1986.
[66] B. Schneier, Applied Cryptography, Wiley, New York, 1995.
[67] R. Schoof, Elliptic Curves Over Finite Fields and the Computation of Square
Roots mod p, Mathematics of Computation, vol. 44, number 170, April 1985, 483-494.
[68] C.E. Shannon, A Mathematical Theory of Communication, BSTJ, vol. 27,
1948, 379-423 and 623-656.
[69] C.E. Shannon, Communication Theory and Secrecy Systems, BSTJ, vol. 28,
1949, 656-715.
[70] J.H. Silverman, A friendly Introduction to Number Theory, Prentice-Hall, Upper
Saddle River, 2001.
[71] J.H. Silverman, The Arithmetic of Elliptic Curves, Springer, New York, 1986.
[72] A. Sinkov, Elementary Cryptanalysis, MAA, Washington, 1966.
[73] J.H. Silverman, J. Tate, Rational Points on Elliptic Curves, Springer, New York,
1992.
[74] A. Sinkov, Elementary Cryptanalysis, MAA, Washington, 1966.
[75] Sung-Jin Cho, Un-Sook Choi, Ham-Doo Kim, Yoon-Hee Hwang,Jin-Gyoong
Kim, Seong-Hun Heo, New Synthesis of One-Dimensional 90/150 Linear
Hybrid Group Cellular Automata, IEEE Trans. on Computer-Aided Design of
Integrated Circuits and Systems, Vol. 26, N.9, Sept. 2007, 1720-1724.
[76] C. Swenson, Modern Cryptanalysis, Wiley, Indianapolis (IN), 2008.
[77] H. van Tilborg, An Introduction to Cryptology, Kluwer Academic Publ.,
Boston, 1988.
[78] F. Tricomi, Funzioni Ellittiche, Zanichelli, Bologna, 1951.
[79] A. Vardy, The intractability of computing the minimum distance of a code,
IEEE Transactions on Information Theory, Vol. 43, n. 6, 1997, 1757-1766.
DOI: 10.1109/18.641542
[80] L.C. Washington, Elliptic Curves, Number Theory and Cryptography, Chapman
& Hall, Boca Raton, 2003.
[81] S. Wicker, V.K. Bhargava, Reed-Solomon Codes and their Applications, IEEE Press,
New York, 1994.
[82] H. C. Williams, A modification of the RSA public key encryption procedure,
IEEE Trans. Inf. Theory, Vol. 26(6), 1980, 726-729.
[83] D. Zheng, K. Chen, W. Qiu, New Rabin-like Signature Scheme, Workshop
Proceedings of the Seventh International Conference on Distributed Multimedia Systems, Knowledge System Institute, 2001, 185-188.
[84] M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions, Dover, New York, 1968.
