expanded by Key Expansion into 11 round keys $K^{(r)} = (K^{(r)}_0, \ldots, K^{(r)}_{15})$ for $r = 0, \ldots, 10$, with $k = K^{(0)}$. After an initial AddRoundKey operation, AES performs 10 successive rounds in which SubBytes, ShiftRows, MixColumns and AddRoundKey are applied to a state (the final round omits MixColumns). A state is defined as $x^{(r)} = (x^{(r)}_0, \ldots, x^{(r)}_{15})$; it is the result of the $r$-th AddRoundKey. The initial state is obtained by the first AddRoundKey, i.e. $x^{(0)}_{j,i} = p_{j,i} \oplus k_j$. We then introduce the $r$-th round of a plaintext, $p^{(r)}_i = (p^{(r)}_{0,i}, \ldots, p^{(r)}_{15,i})$, as the input of the $r$-th AddRoundKey, i.e. $x^{(r)}_{j,i} = p^{(r)}_{j,i} \oplus K^{(r)}_j$. An encryption of plaintext $p$ by AES with key $k$ produces a ciphertext $c$, denoted $c = E_{AES}(p, k)$ [2]. Each round-$r$ state word is generated as:

memory that is installed directly onto the CPU, thereby facilitating very fast access to frequently used data. Level 2 cache is cache that is external to the microprocessor. A diagram of a cache is given in Figure 1.

C. Bernstein's Attack

Bernstein's attack is based on the fact that software implementations of AES leak timing information through cache hits and misses. In his attack, a client computer remotely connected to a server sends random plaintexts to the server. The server encrypts the data and, instead of replying with the ciphertext, replies with the time needed for the encryption. The attack consists of four stages, usually referred to as Profiling, Attacking, Correlation and Brute force key search.
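To see why encryption time can depend on $p_j \oplus k_j$, the following minimal sketch computes the first-round state from the AddRoundKey definition above and maps each byte to the cache line of the lookup table it indexes. It assumes, hypothetically, OpenSSL-style lookup tables with 4-byte entries and 64-byte cache lines. All function names are illustrative.

```python
# Sketch: which lookup-table cache line does the first AES round touch?
# Hypothetical parameters (not from this report): 4-byte entries, 64-byte lines.
from typing import List

ENTRY_BYTES = 4
LINE_BYTES = 64

def first_round_state(p: bytes, k: bytes) -> List[int]:
    """x^(0)_j = p_j XOR k_j for j = 0..15 (the initial AddRoundKey)."""
    if len(p) != 16 or len(k) != 16:
        raise ValueError("AES-128 blocks and keys are 16 bytes")
    return [pj ^ kj for pj, kj in zip(p, k)]

def cache_line(index: int) -> int:
    """Line (within one table) that holds the table entry at `index`."""
    return (index * ENTRY_BYTES) // LINE_BYTES

def lines_touched(p: bytes, k: bytes) -> List[int]:
    """Cache line indexed by each of the 16 first-round state bytes."""
    return [cache_line(x) for x in first_round_state(p, k)]

if __name__ == "__main__":
    known_key = bytes(16)          # e.g. an all-zero profiling key
    secret_key = bytes(range(16))  # stand-in for the server's unknown key
    p = bytes([0x3A] * 16)
    # Choose p' so that p'_j XOR k'_j == p_j XOR k_j for every byte j.
    p_prime = bytes(a ^ b ^ c for a, b, c in zip(p, known_key, secret_key))
    print(lines_touched(p, known_key) == lines_touched(p_prime, secret_key))  # True
```

Under these assumptions, only the high four bits of each $x^{(0)}_j$ select the cache line. This may be one reason a first-round cache-timing attack can narrow a key byte down to a group of 16 candidates but not to a single value.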
Profiling phase:
Figure 1: Diagram of a cache (panels: Cache, Memory).
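The hit/miss behaviour that Figure 1 depicts can be illustrated with a toy direct-mapped cache. This is a minimal sketch: the number of lines, the line size and the cycle costs are hypothetical values chosen for illustration, not measurements from this report. It shows how two memory blocks that map to the same line keep evicting each other, and why those misses cost far more time than hits.

```python
# Toy direct-mapped cache. All numbers are hypothetical: 2 lines, 64-byte lines,
# 3 cycles per hit and 100 cycles per miss.
from typing import Iterable, List, Optional, Tuple

HIT_CYCLES = 3
MISS_CYCLES = 100

def simulate(addresses: Iterable[int], num_lines: int = 2,
             line_bytes: int = 64) -> Tuple[int, int, int]:
    """Return (hits, misses, total_cycles) for a sequence of byte addresses."""
    tags: List[Optional[int]] = [None] * num_lines  # tag held by each cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // line_bytes   # memory block containing the address
        index = block % num_lines    # the only line this block may occupy
        tag = block // num_lines     # identifies which block sits in that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag        # fetch from memory, evicting the old block
    return hits, misses, hits * HIT_CYCLES + misses * MISS_CYCLES

if __name__ == "__main__":
    # Blocks 0 and 2 both map to line 0, so they keep evicting each other.
    print(simulate([0, 0, 128, 0]))  # (1, 3, 303)
    # Blocks 0 and 1 map to different lines, so repeated accesses hit.
    print(simulate([0, 64, 0, 64]))  # (2, 2, 206)
```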
Brute force key search phase:

Finally, a brute-force key search is applied. Every possible key combination is used to encrypt a packet containing all zeros, and each result is compared with the ciphertext that was saved into the attack file during the attacking phase. As the number of correlations increases, the number of potential keys decreases, which results in a quicker search that recovers the AES key.

The math behind the attack can be stated simply as follows. The input to the encryption is either $p_{j,i} \oplus k_j$ (for the known key) or $p'_{j,f} \oplus k'_j$ (for the secret key), where $p$ denotes the plaintext and $k$ the key. Bernstein's method computes the matrices mentioned before, which relate encryption times to byte data. It averages the individual times over each possible value a single byte can take, independently of the other 15 input bytes. Individual time profiles therefore arise from random plaintext encryptions for every byte separately, depending on the key. The simple heuristic that pairs satisfying the equality

$p_{j,i} \oplus k_j = p'_{j,f} \oplus k'_j$

will also have a matching time profile [9] naturally leads to a correlation between the two calculated matrices. The secret key can therefore be derived as:

$k'_j = p_{j,i} \oplus k_j \oplus p'_{j,f}$.

III. INVESTIGATION OF THE ATTACK

The purpose of this investigation was to extend the work done by Robert Salembier [1] on verifying the attack proposed by Bernstein. In [1], Salembier verified the attack using an AMD Athlon XP processor and OpenSSL version 0.9.7e. He speculated that the attack would take less time if done using three computers in parallel. He also proposed verifying the attack on other processors and performing the profiling phase with a non-zero key. We carried out a total of 4 tests: 3 for the complete attack and 1 for the profiling phase with a non-zero key.

A. Testing Environment:

A total of 7 computers were used. Their specifications and the environment under which they were set up are shown in Table 1.
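The profiling and correlation idea can be illustrated with a simplified Python sketch. It is not Bernstein's correlate program. It assumes a made-up timing model in which encryption is slower whenever some $p_j \oplus k_j$ equals one fixed value. It then ranks each candidate $k'_j$ by the dot product of the two byte profiles, aligned according to $k'_j = p_{j,i} \oplus k_j \oplus p'_{j,f}$. The helper `toy_timing_samples` is a hypothetical stand-in for the server's timing replies.

```python
# Simplified sketch of profiling + correlation (not Bernstein's actual code).
import random
from typing import Iterable, List, Tuple

Sample = Tuple[bytes, float]  # (16-byte plaintext, measured encryption time)

def byte_profiles(samples: Iterable[Sample]) -> List[List[float]]:
    """profile[j][b]: mean time of samples whose byte j equals b, minus the overall mean."""
    sums = [[0.0] * 256 for _ in range(16)]
    counts = [[0] * 256 for _ in range(16)]
    total, n = 0.0, 0
    for p, t in samples:
        total += t
        n += 1
        for j, b in enumerate(p):
            sums[j][b] += t
            counts[j][b] += 1
    mean = total / n
    return [[sums[j][b] / counts[j][b] - mean if counts[j][b] else 0.0
             for b in range(256)] for j in range(16)]

def rank_candidates(study: List[List[float]], attack: List[List[float]],
                    known_key: bytes, j: int) -> List[int]:
    """Rank guesses c for secret byte k'_j, best first. Attack value b is aligned
    with study value b ^ c ^ k_j, i.e. k'_j = p_{j,i} ^ k_j ^ p'_{j,f}."""
    def score(c: int) -> float:
        return sum(attack[j][b] * study[j][b ^ c ^ known_key[j]] for b in range(256))
    return sorted(range(256), key=score, reverse=True)

def toy_timing_samples(key: bytes, n: int, rng: random.Random) -> List[Sample]:
    """Hypothetical server: +5 time units for every byte j with p_j ^ k_j == 0x42,
    plus Gaussian noise. Real network timings are far noisier."""
    out = []
    for _ in range(n):
        p = bytes(rng.randrange(256) for _ in range(16))
        t = 100.0 + rng.gauss(0.0, 1.0)
        t += sum(5.0 for pj, kj in zip(p, key) if pj ^ kj == 0x42)
        out.append((p, t))
    return out

if __name__ == "__main__":
    rng = random.Random(1)
    known = bytes(16)                 # profiling ("study") key
    secret = bytes(range(100, 116))   # key the attacker wants to recover
    study = byte_profiles(toy_timing_samples(known, 20000, rng))
    attack = byte_profiles(toy_timing_samples(secret, 20000, rng))
    guess = bytes(rank_candidates(study, attack, known, j)[0] for j in range(16))
    print(guess == secret)
```

With real timings the signal is much weaker, so several candidates per byte usually survive. That is why the brute-force phase is still needed.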
ECE 746 Project Report 4
Table 1

Test 1 configuration:
Machine | Operating system | Processor and RAM | L1 cache | L2 cache | GCC | OpenSSL
Server | CentOS 4.4, x86_64 edition | AMD Athlon 3200+ (Venice core), 2.0 GHz, 2 GB RAM | 128 KB | 512 KB | not given | not given
Attacker 1 | Fedora Core 5, 32-bit | Pentium 4 mobile, 3.06 GHz, 512 MB RAM | 8 KB data cache | 512 KB | 4.1 | 0.9.8b
Attacker 2 | Fedora Core 5, 32-bit | Pentium M mobile, 1.8 GHz, 512 MB RAM | 64 KB | 2 MB | 4.1.1 | 0.9.8b
Attacker 3 | Fedora Core 5, 32-bit | Pentium M mobile, 1.7 GHz, 512 MB RAM | 64 KB | 2 MB | 4.1.1 | 0.9.8b
Network connection: all computers were connected through a D-Link DI-624 router on a 100 Mbps LAN.

Second configuration:
Machine | Operating system | Processor and RAM | L1 cache | L2 cache | GCC | OpenSSL
Server | Fedora Core 6, 32-bit | Pentium M mobile, 1.8 GHz, 512 MB RAM | 64 KB | 2 MB | 4.1 | 0.9.7a
Attacker 1 | Fedora Core 6, 32-bit | Intel Xeon, 512 MB RAM | 64 KB | 512 KB | 4.1 | 0.9.8b
Attacker 2 and Attacker 3 have the same configuration as Attacker 1.
Network connection: all computers were connected through a Linksys switch on a 100 Mbps LAN.

B. Overview of the tests

Test 1

The first test was simply to familiarize ourselves with the various parts of the source code and to set up all the computers. No information was documented. The profiling and attacking phases with packet sizes of 400, 600 and 800 bytes went smoothly, and information was collected for less time than specified in Bernstein's paper. The correlate program was run and, as expected, found very few correlations, so a brute-force search was pointless because it would never finish. The attack was then carried out for the same amount of time as specified by Bernstein in [1], and the number of correlations was again found to be very small. The number of packets sent can be determined by checking the file sizes, as explained in [1].
Test 2
suggested in [1], it was known when to end the profiling and attacking phases. For the study.400, study.600 and study.800 files, about 2^22 packets were sent for each packet size; for the attacking phase, about 2^23 packets were sent. All the information was saved into the attack.xxx and study.xxx files. The profiling phase took about 6 days and the attacking phase about 10 days. This is quite different from [1], where individual profiling took 11 days. We expected the profiling phase to take about 4 days, because in [1] the longest profiling run, for 800-byte packets, took 4 days, and in our case the three packet sizes were profiled in parallel. However, profiling took at least 2 days longer than expected. Likewise, the attacking phase took about 10 days, 3 days more than predicted in [1]. The result contained many correlations, but they were so broad that each key byte still had about 256 candidate values. A brute-force key search over them would therefore never finish. While investigating possible reasons, we found that OpenSSL had recently adopted a mitigation technique against this cache-timing hazard; the technique is discussed in more detail in the results section. The correlations for this test are given in Figure 2.

Test 3

given by column 2. Column 3 gives all the possible values for that particular key byte.

Test 4:

This test was done to check whether profiling with a non-zero key still yields correlations. For this purpose, we had to understand how the code written by Dr. Bernstein actually finds the secret key using the math explained in Section 2 of this paper. The analysis given in [2], explained in the background about the attack, made this possible. The key at the server was then set to a known value built from bytes obtained from the Linux random number generator. The study program was used to collect the timing information, as was done for the zero key.

It printed out information as shown in Figure 4. The meaning of each column is explained in Bernstein's paper.
16 0 d9 db d8 d0 d4 d1 df d3 de d5 d2 da d7 dc d6 dd
70 1 86 8d 85 82 81 8b 8e 88 89 8f 8a 87 83 8c 84 80 44 40 ....
32 2 5f 5b 55 50 51 54 5e 57 5a 59 53 5d 5c 58 56 52 63 66 ....
240 3 87 86 8b 89 84 85 81 8a 80 83 8f 82 8e 8d 88 8c fc fd f6.............
134 4 86 81 8b 8d 87 82 89 8c 83 85 8a 8f 88 80 8e 84 1a ........
32 5 88 8b 86 82 8c 81 8e 80 83 8a 8f 85 8d 87 89 84 f1 f2 fb fd f4 f8 f9 ff f7 fa f0 f3 fe f5 f6 fc
16 6 37 3b 33 32 31 34 3e 38 30 36 3c 3f 3d 3a 39 35
16 7 b1 bd b2 b4 b3 b5 bc bf b7 b8 be ba b9 bb b0 b6
16 8 23 2d 2b 28 25 27 24 2c 20 26 2e 2f 22 2a 29 21
48 9 bd bf b5 bc b6 b0 b8 b1 ba be bb b7 b4 b2 b3 b9 4a 49 4b 40 42 48 47 4c 41 46 4d 43 45 4e ….
16 10 96 91 9f 90 92 93 97 9d 9b 98 9e 9a 9c 94 99 95
16 11 f1 f0 f3 fd fe f8 f2 fa f7 f4 ff fc f9 fb f6 f5
16 12 72 79 70 7a 7f 75 7d 77 73 7c 78 7b 7e 76 71 74
16 13 fc f0 ff f7 fe f9 f4 f2 fa f8 fd f3 f1 fb f6 f5
16 14 0a 0f 05 04 09 01 02 07 06 03 0b 0d 00 0c 0e 08
16 15 82 85 89 8a 87 8e 88 8b 83 84 80 86 8d 8c 81 8f
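The cost of the brute-force phase follows from the first column of the correlation output above: the number of keys left is the product of the per-byte candidate counts. The sketch below computes that product in bits and shows a naive enumeration over candidate lists. It is illustrative only: `toy_encrypt_zero_block` is a hypothetical hash-based stand-in, not AES, used because Python's standard library has no AES implementation.

```python
# Sketch: size of the brute-force phase and a naive enumeration over candidates.
import hashlib
import itertools
import math
from typing import Callable, Optional, Sequence

def search_space_bits(counts: Sequence[int]) -> float:
    """log2 of the number of keys left when byte j has counts[j] candidates."""
    return sum(math.log2(c) for c in counts)

def brute_force(candidates: Sequence[Sequence[int]],
                encrypt_zero_block: Callable[[bytes], bytes],
                target: bytes) -> Optional[bytes]:
    """Try every key built from the per-byte candidate lists; return the one whose
    encryption of an all-zero block equals `target`, or None."""
    for combo in itertools.product(*candidates):
        key = bytes(combo)
        if encrypt_zero_block(key) == target:
            return key
    return None

def toy_encrypt_zero_block(key: bytes) -> bytes:
    """NOT AES: a hash-based stand-in so the sketch needs only the standard library.
    A real search would call an AES implementation here."""
    return hashlib.sha256(key + bytes(16)).digest()[:16]

if __name__ == "__main__":
    # First column of the correlation output above (candidates per key byte).
    counts = [16, 70, 32, 240, 134, 32, 16, 16, 16, 48, 16, 16, 16, 16, 16, 16]
    print(round(search_space_bits(counts), 1))  # 76.7

    secret = bytes(range(16))
    # Toy lists: 3 candidates for bytes 0-3 (true value last), 1 elsewhere.
    cands = [[b ^ 0x01, b ^ 0x80, b] if j < 4 else [b] for j, b in enumerate(secret)]
    print(brute_force(cands, toy_encrypt_zero_block,
                      toy_encrypt_zero_block(secret)) == secret)  # True
```

About 2^76.7, or roughly 1.2 × 10^23 keys, remain. At a hypothetical rate of 10^9 keys per second this search would take several million years, which is consistent with the observation that such a search would never finish.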
 | study.400 (3.8 days) | study.600 (4.4 days) | study.800 (4.8 days)
2^16 | 70, 70 | 90, 90 | 120, 120
2^19 | 80, 150 | 100, 190 | 140, 260
2^22 | 4050, 4200 | 6146, 6336 | 6652, 6912
increases, the difficulty of obtaining results with Bernstein's simple scheme increases manifold, so his method has to be improved before it yields results. This was done by the authors of [3].

With three computers in parallel, the attack took a total of 15 days, including the profiling and attacking phases. However, the secret key that the attacker intends to recover may not stay the same for such a long time: SSH and other protocols using AES usually change the key at least every few hours. Since the attack takes days to complete, it is very difficult for it to succeed in practice.

Intrusion detection systems have become very sophisticated over the years. Tools such as Snort can alert the administrator to suspicious traffic flowing into the network, which can result in the traffic from the attacking system being blocked. The attacker would therefore have to modify this simple attack so that its traffic moves stealthily on a non-suspicious port.

A total of 2^27.5 packets were needed for Bernstein's attack to recover the key successfully. A new class of attacks, called cache-collision attacks, has been proposed that can recover the key with far fewer packets [2]: an expanded final-round attack needs only 2^13 packets, compared with the huge number needed by Bernstein's attack.

V. MITIGATION METHODOLOGIES

Various methods have been proposed since the original attack was published. They improve upon the original paper and provide innovative ways to find the AES key using the same basic principle as Bernstein. Fortunately, these authors have also provided ways to mitigate the attacks they propose, so implementers do not have to search for countermeasures themselves. The mitigation methodologies can be divided into two broad categories: hardware-based mitigations and software-based mitigations.

Brickell, Graunke, Neve, and Seifert (BGNS) combined several of these mitigation methods into one process [1]. They proposed using smaller tables, randomizing them frequently and preloading them into the relevant cache lines. BGNS claimed that this was verified experimentally, and our results tend to agree: Test 2 resulted in very small correlations because a newer version of OpenSSL had some of these mitigation techniques implemented in it.

computation itself. All in all, it depends on the architecture of the CPU. In [7], Osvik, Shamir and Tromer discuss various schemes to prevent these attacks in detail. Some of these schemes are:

Avoiding Memory Accesses:

In this scheme, the authors suggest replacing the table lookups done by AES with an alternative description of the cipher that uses logical operations. Another approach is to place the tables in registers instead of the cache; some architectures, such as 64-bit processors and PowerPC, have enough register space to accomplish this.

Alternative Look Up tables:

OpenSSL's implementation of AES uses lookup tables of 1024 bytes each. Several variants of these tables occupy much less space in the cache, including 256-byte tables and loading only one table and obtaining the others by rotation.

Data Oblivious Memory Access pattern:

This scheme does not avoid lookup tables; instead, it ensures that the pattern of memory accesses is completely oblivious to the data passing through the algorithm. More details can be found in [7].

Cache State Normalization and Process Blocking:

Normalizing the cache state can prevent synchronous attacks, and this can be achieved in several ways. One way is to load all the lookup tables into the cache. It must then be ensured that the table elements are not evicted by the encryption itself, for example by accesses to the stack, inputs or outputs. Ensuring this is a delicate, architecture-dependent affair. Moreover, this method fails to protect against asynchronous attacks.

Dynamic Table Storage:

Hardware Mitigations:

Obviously, the best hardware mitigation would be to stop using the cache altogether. However, this would severely degrade performance for all applications and is therefore not a viable option. This area is very new, as no one has verified this type of attack in hardware. However, countermeasures for conventional side-channel attacks are a good starting point when implementing ciphers in hardware. In a recent paper, Page [12] proposes a new cache architecture that partitions the cache. This removes the cache as a shared resource and prevents data from being forcibly flushed from it.

Cryptographic co-processors are another interesting idea, explained in [1]. However, as mentioned there, little information is available on this aspect.

The cache timing attack described by Bernstein was attempted without success using three computers in parallel on the Pentium M architecture. The methodology adopted in [1] was reused to determine the number of packets required to extract the key successfully. However, the attack did not work for several reasons. Newer versions of OpenSSL mitigate the attack, and the larger caches of newer processors require many more packets to be sent to average out the noise. The math behind profiling with a non-zero key was discussed, and such profiling was carried out for one packet size.

The real-world feasibility of this attack was discussed, and we concluded that Bernstein's attack in its original form is not feasible in current real-world conditions. Newer and improved versions of this attack from recent papers were mentioned; they will be very useful for further study in this field. Several other items of interest to researchers seeking advances in this field have also been identified.

This research has brought to light several advances in the field of side-channel cryptanalysis, which will serve as a guide to future work.
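Two of the software mitigations listed in Section V, alternative look-up tables and a data-oblivious memory access pattern, can be sketched in a few lines. The minimal sketch below builds an OpenSSL-style 1024-byte table T0 from the AES S-box, derives T1 to T3 by byte rotation instead of storing them, and performs a lookup that reads all 256 entries whatever the index. The function names are hypothetical, and the code is not taken from OpenSSL or [7].

```python
# Sketch of two mitigation ideas (illustrative, not OpenSSL code):
# (1) store one 1 KB table and derive the other three by byte rotation;
# (2) a data-oblivious lookup that reads every entry regardless of the index.
from typing import List, Tuple

def xtime(a: int) -> int:
    """Multiply by 2 in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gmul(a: int, b: int) -> int:
    """General multiplication in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def aes_sbox() -> List[int]:
    """AES S-box: multiplicative inverse in GF(2^8) followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3)
                    ^ rotl8(inv, 4) ^ 0x63)
    return sbox

def rotr32(w: int, n: int) -> int:
    return ((w >> n) | (w << (32 - n))) & 0xFFFFFFFF

def te_tables() -> Tuple[List[int], ...]:
    """T0[x] packs (2*S[x], S[x], S[x], 3*S[x]); T1..T3 are byte rotations of T0,
    so only T0 (256 entries x 4 bytes = 1024 bytes) needs to be stored."""
    s = aes_sbox()
    t0 = [(xtime(v) << 24) | (v << 16) | (v << 8) | (xtime(v) ^ v) for v in s]
    return tuple([rotr32(w, 8 * r) for w in t0] for r in range(4))

def oblivious_lookup(table: List[int], index: int) -> int:
    """Read every entry in the same order for any index and keep the matching one
    via masking. Python gives no constant-time guarantees; real implementations
    would do this in C or assembly."""
    result = 0
    for i, entry in enumerate(table):
        mask = -(i == index) & 0xFFFFFFFF  # all ones if i == index, else zero
        result |= entry & mask
    return result

if __name__ == "__main__":
    t0, t1, t2, t3 = te_tables()
    print(hex(t0[0]), hex(t1[0]), hex(t2[0]), hex(t3[0]))
    # 0xc66363a5 0xa5c66363 0x63a5c663 0x6363a5c6
    print(oblivious_lookup(t0, 0x53) == t0[0x53])  # True
```

Deriving the rotated tables saves 3 KB of cache footprint at the cost of a rotation per lookup. The oblivious lookup trades 256 memory reads for an access pattern that does not depend on the secret index.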
VI. FUTURE WORK

The original attack proposed by Bernstein is not a good proposition on present architectures or for network-based attacks. It would therefore be a good idea to extend this attack and experimentally verify such extended attacks. The following items may be of interest to researchers planning future work on these attacks in this field:

REFERENCES

[1] Robert G. Salembier, "Analysis of Cache Timing Attacks against AES", manuscript received May 12, 2006. http://ece.gmu.edu/courses/ECE746/project/F06_Project_resources/Salembier_Cache_ming_Attack.pdf
[2] Joseph Bonneau and Ilya Mironov, "Cache-Collision Timing Attacks Against AES". http://www.stanford.edu/~jbonneau/AES_timing.pdf
[3] Michael Neve, Jean-Pierre Seifert and Zhenghong Wang, "Cache time-behavior analysis on AES". http://www.cryptologie.be/document/Publications/AsiaCCS_full_06.pdf
[9] E. English and S. Hamilton, "Network security under siege: the timing attack", IEEE Computer, vol. 29, pp. 95–97, March 1996.
[10] Michael Neve, Jean-Pierre Seifert and Zhenghong Wang, "A refined look at Bernstein's AES side-channel analysis", Fast Abstract in Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security (ASIACCS).
[11] OpenSSL toolkit. http://www.openssl.org/
[12] D. Page, "Partitioned Cache Architecture as a Side-Channel Defence Mechanism", 2005. http://eprint.iacr.org/2005/280.pdf