Introduction To Cryptography

CHAPTER 1
INTRODUCTION TO CRYPTOGRAPHY
From the dawn of civilization to the highly networked societies that we live in today, communication has always been an integral part of our existence. What started as simple sign communication centuries ago has evolved into many forms of communication, the Internet being just one example. Methods of communication today include radio communication, telephonic communication, network communication, and mobile communication.
All these methods and means of communication have played an important role in our lives, but in the past few years network communication, especially over the Internet, has emerged as one of the most powerful methods of communication, with an overwhelming impact on our lives. Such rapid advances in communications technology have also given rise to security threats to individuals and organizations. In the last few years, various measures and services have been developed to counter these threats. All categories of such measures and services, however, share certain fundamental requirements:
Confidentiality: This is the process of keeping information private and secret so that only the intended recipient is able to understand it. For example, if Alice has to send a message to Bob, then only Bob (and no other person) should be able to read or understand the message.
Authentication: This is the process of providing proof of the sender's identity to the recipient, so that the recipient can be assured that the person sending the information is who he or she claims to be. For example, when Bob receives a message from Alice, he should be able to establish the identity of Alice and know that the message was indeed sent by Alice.
Nishitha College of Engineering and Technology
Integrity: This is the method of ensuring that information is not tampered with during its transit or its storage on the network. No unauthorized person should be able to tamper with or change the information during transit. For example, when Alice sends a message to Bob, the contents of the message should not be altered and should remain the same as what Alice sent.
Non-repudiation: This is the method of ensuring that information cannot be disowned. Once the non-repudiation process is in place, the sender cannot deny being the originator of the data. For example, when Alice sends a message to Bob, she should not be able to deny later that she sent the message.
1.1 Basics of Cryptography

Cryptography is the science of protecting data. It provides means and methods of converting data into an unreadable form, so that:
The data cannot be accessed for unauthorized use.
The content of the data frames is hidden.
The authenticity of the data can be established.
The undetected modification of the data is avoided.
The data cannot be disowned by the originator of the message.
Cryptography is one of the technological means of securing data transmitted on information and communications systems. It is especially useful for financial and personal data, whether the data is being transmitted over a medium or is stored on a storage device. It provides a powerful means of verifying the authenticity of data and of identifying the culprit if the confidentiality or integrity of the data is violated. Cryptographic techniques are critical to the development and use of electronic commerce as well as defense information systems and communications networks. Messages were first encrypted in ancient Egypt, using hieroglyphics. The Egyptians encrypted messages by simply replacing the original picture with another picture. This method of encryption is known as a substitution cipher. In this method, each letter of the clear text message is replaced by some other letter, which results in an encrypted message or cipher text.
For example, the message WELCOME TO THE WORLD OF CRYPTOGRAPHY can be encrypted by using substitution cipher as XFMDPNF UP UIF XPSME PG DSZQUPHSBQIZ.
In the preceding example, each letter of the plaintext message has been replaced with the next letter in the alphabet. This type of substitution is also known as a Caesar cipher. The Caesar cipher is an example of a shift cipher, because it shifts each letter of the plaintext message by a fixed number of positions in the alphabet to obtain the cipher text. For example, if you shift the letters by 5, you get the following combination of plaintext and cipher text letters:
Plaintext:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher text: F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
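The shift cipher described above can be sketched in a few lines of Python. This is only an illustration: the function names shift_encrypt and shift_decrypt are hypothetical, and the sketch assumes that only the letters A to Z are shifted while spaces and punctuation pass through unchanged.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def shift_encrypt(plaintext: str, shift: int) -> str:
    """Shift each letter A-Z forward by `shift` positions, wrapping around at Z."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # assumption: non-letters are left unchanged
    return "".join(out)


def shift_decrypt(ciphertext: str, shift: int) -> str:
    """Undo shift_encrypt by shifting in the opposite direction."""
    return shift_encrypt(ciphertext, -shift)


message = "WELCOME TO THE WORLD OF CRYPTOGRAPHY"
print(shift_encrypt(message, 1))   # XFMDPNF UP UIF XPSME PG DSZQUPHSBQIZ
print(shift_encrypt(ALPHABET, 5))  # FGHIJKLMNOPQRSTUVWXYZABCDE
```

Because there are only 25 useful shifts, an attacker can simply try them all, which is one reason such ciphers are easy to break.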
However, simple substitution ciphers are not very reliable and can easily be broken. An alternative is to use multiple cipher alphabets instead of one. A cipher that involves multiple cipher alphabets is known as a polyalphabetic substitution cipher. An example of a polyalphabetic substitution cipher is the Vigenere cipher. With recent advances in mathematical techniques, there has been an acceleration in the development of newer methods of encryption. Today, cryptography has become so powerful that some ciphers are considered practically impossible to break. Cryptography has now become an industry standard for providing information security, trust, control of access to resources, and secure electronic transactions. Its use is no longer limited to securing sensitive military information. In fact, cryptography is now recognized as one of the major components of the security policy of an organization. Before moving further, let us first look at a few terms that are commonly associated with cryptography:
Plaintext: The message that has to be transmitted to the recipient. It is also commonly referred to as clear text.
Encryption: The process of changing the content of a message in such a manner that it hides the actual message.
Cipher text: The output that is generated after encrypting the plaintext.
Decryption: The reverse of encryption; the process of retrieving the original message from its encrypted form. This process converts cipher text to plaintext.
Hash algorithm: An algorithm that converts a text string into a string of fixed length.
Key: A word, number, or phrase that is used to encrypt the clear text. In computer-based cryptography, any text, key word, or phrase is converted to a very large number by applying a hash algorithm to it. The large number, referred to as a key, is then used for encryption and decryption.
Cipher: An algorithm that translates plaintext into an intermediate form called cipher text, in which the original message is unreadable.
Cryptanalysis: The science of breaking codes and ciphers.
Before looking at the details of various cryptographic techniques, let us look at the steps involved in the conventional encryption model. A sender wants to send a HELLO message to a recipient. The original message, also called plaintext, is converted to seemingly random bits, known as cipher text, by using a key and an algorithm. The algorithm produces a different output for each value of the key. The cipher text is transmitted over the transmission medium. At the recipient end, the cipher text is converted back to the original text using the same algorithm and key that were used to encrypt the message.
Figure 1.1 Conventional Encryption Model

1.2 Overview of Cryptography

Let us now look at the various cryptography techniques available. For the purpose of classification, the techniques are categorized on the basis of the number of keys that are used. The two main cryptography techniques are:
Single key cryptography: This technique is based on a single key. It is also known as symmetric key, private key, or secret key encryption.
Public key cryptography: This technique is based on a combination of two keys, a private key and a public key. It is also known as asymmetric encryption.
Let us look at each of these methods in detail.
1.2.1 Single Key Cryptography

The process of encryption and decryption of information by using a single key is known as secret key cryptography or symmetric key cryptography. In symmetric key cryptography, the same key is used to encrypt as well as decrypt the data. The main problem with symmetric key algorithms is that the sender and the receiver have to agree on a common key. A secure channel is also required between the sender and the receiver to exchange the secret key. Here is an example that illustrates the process of single key cryptography. Alice wants to send a "For Your Eyes" message to Bob and wants to ensure that only Bob is able to read it. Alice therefore generates a secret key and encrypts the message with it. Now, to read the encrypted message, Bob needs the secret key that has been generated by Alice. Alice can give the secret key to Bob in person or send it by any other means available. If Alice gives the key to Bob in person, it could be time-consuming, depending on the physical distance between the two of them or other circumstances such as Bob's availability. After Bob receives the secret key, he can decrypt the message to retrieve the original message. Figure 1.2 illustrates single (secret) key cryptography.
Figure 1.2: Secret key cryptography
Many algorithms have been developed on the basis of the concept of secret key cryptography. The most widely used secret key algorithms include Data Encryption Standard (DES), Triple-DES (3DES), International Data Encryption Algorithm (IDEA), RC4, RC5, CAST-128, RC6, and Advanced Encryption Standard (AES).
1.2.2 Public Key Cryptography

The approach called asymmetric cryptography evolved to address the security issues posed by symmetric cryptography. This method solves the key exchange problem of secret key cryptography by using a pair of keys instead of a single key. One key is used for encryption, and the other key is used for decryption. The process is known as asymmetric cryptography because the two keys are different and both are required to complete the process. The two keys are collectively known as the key pair. One of the keys is freely distributable. This key is called the public key and is used for encryption; hence, this method of encryption is also called public key encryption. The second key is the secret or private key and is used for decryption. The private key is not distributable. This key, as its name suggests, is private to each communicating entity. In public key cryptography, data that is encrypted with the public key can only be decrypted with the corresponding private key. Conversely, data encrypted with the private key can only be decrypted with the corresponding public key. Due to this asymmetry, public key cryptography is known as asymmetric cryptography.
How public key encryption works: Let us see how this works out in practice. Consider an example where Alice wishes to send an encrypted file to Bob. In this situation, Bob obtains a key pair, retains the private key, and distributes the public key. Alice, therefore, has a copy of Bob's public key. Alice then encrypts the file using Bob's public key and sends the encrypted file to Bob. Since the keys of a pair are complementary, only Bob's private key can decrypt this file. If someone else intercepts the file, they will be unable to decrypt it, because only Bob's private key can be used for the decryption. Figure 1.2.1 illustrates the process of public key cryptography.
Figure 1.2.1: Public key encryption
In this method, data sent to a user is encrypted only with that user's public key, and decryption can be done only with the private key, which is held by the recipient of the data. So there is very little possibility of the data in transit being read or tampered with by any other person. There is also no need to share a secret key, as is required for symmetric encryption. All communications involve only public keys, and no private key is ever transmitted or shared. The above mechanism also brings out the point that every recipient has a unique private key, which is used to decrypt the data that has been encrypted with its counterpart public key. Diffie and Hellman first discussed the process of asymmetric cryptography.
Many algorithms have been developed on the basis of the concept of public key cryptography. The most widely used public key algorithms include RSA and Elliptic Curve Cryptography (ECC).
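The exchange between Alice and Bob can be illustrated with textbook RSA on deliberately tiny numbers. This is a minimal sketch, not a usable implementation: the primes 61 and 53, the exponent 17, and the variable names are assumptions chosen for illustration, and real RSA uses keys of thousands of bits together with a padding scheme. The three-argument pow with a negative exponent requires Python 3.8 or later.

```python
# Bob's toy key pair (hypothetical values, far too small for real use).
p, q = 61, 53
n = p * q                 # modulus, part of both keys: 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent: Bob publishes (n, e)
d = pow(e, -1, phi)       # private exponent: Bob keeps d secret


def encrypt(message: int, public_exponent: int, modulus: int) -> int:
    """Anyone holding the public key can encrypt."""
    return pow(message, public_exponent, modulus)


def decrypt(ciphertext: int, private_exponent: int, modulus: int) -> int:
    """Only the holder of the private key can decrypt."""
    return pow(ciphertext, private_exponent, modulus)


plaintext = 65                         # Alice's message, encoded as a number < n
ciphertext = encrypt(plaintext, e, n)  # Alice uses Bob's public key
recovered = decrypt(ciphertext, d, n)  # Bob uses his private key

print(d, ciphertext, recovered)        # 2753 2790 65
```

An eavesdropper who sees only n, e, and the cipher text would have to factor n to recover d, which is what makes large moduli necessary in practice.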
1.2.3 Combining Techniques: Symmetric and Asymmetric Encryption

The disadvantage of public key encryption is that it is a slow process, because the keys are long (1024 bits to 4096 bits). By comparison, secret key encryption is significantly faster, as the keys are shorter (40 bits to 256 bits). On the other hand, secret key encryption has the problem of transferring the key. The two techniques can be used together to provide a better method of encryption that combines their advantages and overcomes their disadvantages. The steps of a data transaction in the combined technique are:
1. Encrypt your file by using symmetric encryption with a secret key.
2. Use asymmetric encryption to encrypt only this secret key, using the recipient's public key. Send the encrypted key to the recipient, who can decrypt it using his or her private key.
3. Next, send the actual encrypted data. The recipient decrypts the data using the secret key recovered in step 2.
The combined technique of encryption is widely used. It is used in Secure Shell (SSH), which secures communications between a client and a server, and in PGP (Pretty Good Privacy) for sending messages. Above all, it is at the heart of Secure Sockets Layer (SSL), which is widely used by Web browsers and Web servers to maintain a secure communication channel with each other. Figure 1.2.2 illustrates the combined technique of encryption.
Figure 1.2.2: Combined technique of encryption
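The three steps of the combined technique can be sketched as follows. Everything here is a stand-in chosen so that the example runs with the Python standard library alone: the symmetric cipher is a simple XOR with a SHA-256 based keystream, the asymmetric cipher is textbook RSA with a toy modulus applied byte by byte, and all names (keystream_xor, wrap_key, unwrap_key) are hypothetical. None of this is secure; it only shows the order of operations.

```python
import hashlib
import secrets

# Recipient's toy RSA key pair (assumed values, insecure sizes).
N, E, D = 3233, 17, 2753


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in symmetric cipher: XOR the data with a keystream derived from the key.
    Applying it twice with the same key returns the original data."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


def wrap_key(key: bytes) -> list:
    """Step 2 (sender): encrypt the secret key with the recipient's public key."""
    return [pow(b, E, N) for b in key]


def unwrap_key(wrapped: list) -> bytes:
    """Step 2 (recipient): recover the secret key with the private key."""
    return bytes(pow(c, D, N) for c in wrapped)


# Sender side
data = b"contents of the file to protect"
session_key = secrets.token_bytes(16)                # fresh secret key
encrypted_data = keystream_xor(session_key, data)    # step 1
wrapped_key = wrap_key(session_key)                  # step 2

# Recipient side
recovered_key = unwrap_key(wrapped_key)              # step 2
recovered_data = keystream_xor(recovered_key, encrypted_data)  # step 3
print(recovered_data == data)                        # True
```

Only the short secret key goes through the slow asymmetric operation; the bulk data is handled by the fast symmetric cipher.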
1.3 Applications of Cryptography

Let us now look at how cryptography is used to provide the basic security features: confidentiality, integrity, authentication, and non-repudiation. These security features can be provided by using the following methods: message encryption, Message Authentication Code (MAC), and hash functions.
CHAPTER 2
SYMMETRIC CRYPTOGRAPHY ALGORITHMS
2.1 Data Encryption Standard (DES)
DES is a block cipher: it encrypts and decrypts data in 64-bit blocks using a 64-bit key (although the effective key length is 56 bits). DES is a symmetric algorithm: the same algorithm and key are used for both encryption and decryption. DES is an iterative cipher: the basic building block (a substitution followed by a permutation), called a round, is repeated 16 times. For each DES round, a sub-key is derived from the original key by a procedure called the key schedule. The key schedule for encryption and decryption is the same, except that the sub-keys are used in reverse order for decryption.
The basic algorithm for encrypting or decrypting one block of data is as follows. Encryption begins with an initial permutation (IP), which scrambles the 64-bit plaintext in a fixed pattern. The result of the initial permutation is sent to two 32-bit registers, called the right half register and the left half register. These registers hold the two halves of the intermediate results through the succeeding 16 iterations. The contents of the right half register are permuted (expansion permutation E) and sent to an exclusive-OR unit along with the sub-key for each iteration. Note that some bits are selected twice, which expands the 32-bit register contents to 48 bits. The 48-bit output of the exclusive-OR block is divided into eight groups (6 bits each) to address eight substitution memories (S-boxes). A permutation P is applied to the 32-bit output of the S-boxes, and the result is fed into an exclusive-OR block along with the contents of the left half register. The output of this block is written into a temporary register, concluding the first iteration.
At the next clock cycle, the contents of the temporary register are written into the right half register, and the previous contents of the right half register are written into the left half register. This process repeats through 16 iterations. After the 16 iterations, the right half and left half register contents are subjected to a final permutation $IP^{-1}$, which is the inverse of the initial permutation. The output of $IP^{-1}$ is the 64-bit cipher text.
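The register behaviour described above is that of a Feistel network. The sketch below shows only this structure: the round function is a hypothetical placeholder (real DES uses expansion E, the S-boxes, and permutation P), the sub-keys are arbitrary constants rather than the output of the DES key schedule, and the permutations IP and $IP^{-1}$ are omitted. It does show why the same routine decrypts when the sub-keys are applied in reverse order.

```python
MASK32 = 0xFFFFFFFF


def round_function(half: int, subkey: int) -> int:
    """Placeholder for the DES f-function; any 32-bit function works in a Feistel network."""
    return (((half ^ subkey) * 2654435761) + subkey) & MASK32


def feistel(block: int, subkeys) -> int:
    """Run a 64-bit block through one Feistel round per sub-key."""
    left, right = (block >> 32) & MASK32, block & MASK32
    for k in subkeys:
        # New right half = old left XOR f(old right); new left half = old right.
        left, right = right, left ^ round_function(right, k)
    # Swap the halves at the end so that the same routine also decrypts.
    return (right << 32) | left


# Sixteen toy sub-keys (assumed values, not derived by the DES key schedule).
SUBKEYS = [(0x0F1E2D3C + 0x01010101 * i) & MASK32 for i in range(16)]

plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, SUBKEYS)
recovered = feistel(ciphertext, SUBKEYS[::-1])  # same algorithm, reversed sub-keys
print(hex(recovered))                           # 0x123456789abcdef
```

The round function never has to be inverted, which is why DES can use non-invertible S-boxes.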
2.2 International Data Encryption Algorithm

International Data Encryption Algorithm (IDEA) is a block cipher; it operates on 64-bit plaintext blocks and uses a 128-bit input key K. The design philosophy behind this algorithm is the mixing of operations from different algebraic groups. The 64-bit input block is divided into four 16-bit sub-blocks, X1, X2, X3, and X4, which become the input blocks of the first round of the algorithm (figure 1). There are a total of 8 rounds. In each round, the four sub-blocks are XORed, added, and multiplied with one another and with six 16-bit sub-blocks of key material. Between rounds, the second and the third sub-blocks are swapped. Finally, the four sub-blocks are combined with four sub-keys in an output transformation. The following basic operations are used:
Bit-by-bit exclusive-OR of two 16-bit sub-blocks.
Addition of integers modulo $2^{16}$, where the 16-bit sub-block is treated as an unsigned integer.
Multiplication of integers modulo $2^{16} + 1$, where the 16-bit sub-block is treated as an unsigned integer, except that the all-zero sub-block is treated as representing $2^{16}$.
Key scheduling: The algorithm uses 52 sub-keys (six for each of the eight rounds and four more for the output transformation). First, the 128-bit key is divided into eight 16-bit sub-keys. These are the first eight sub-keys for the algorithm (the six for the first round, and the first two for the second round). Then the key is rotated 25 bits to the left and again divided into eight sub-keys. The first four are used in round 2; the last four are used in round 3. The key is rotated another 25 bits to the left for the next eight sub-keys, and so on until the end of the algorithm. The decryption scheme is the same as the encryption scheme, except that it uses a different set of sub-keys generated from the IDEA key, where $K_i$ denotes the encryption sub-keys and $U_i$ denotes the decryption sub-keys, with $1 \le i \le 52$.
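The three group operations and the encryption key schedule can be sketched as below. This is an illustration under stated assumptions rather than a full IDEA implementation: the function names are hypothetical, the 128-bit key is held as a Python integer with the first sub-key in the most significant 16 bits, and the round structure and the decryption sub-keys are not shown.

```python
MOD_ADD = 1 << 16        # 2^16
MOD_MUL = (1 << 16) + 1  # 2^16 + 1, a prime number


def xor16(a: int, b: int) -> int:
    """Bit-by-bit exclusive-OR of two 16-bit sub-blocks."""
    return a ^ b


def add16(a: int, b: int) -> int:
    """Addition modulo 2^16."""
    return (a + b) % MOD_ADD


def mul16(a: int, b: int) -> int:
    """Multiplication modulo 2^16 + 1, where the all-zero sub-block stands for 2^16."""
    a = a or MOD_ADD
    b = b or MOD_ADD
    # A result of 2^16 is stored as the all-zero sub-block again.
    return (a * b) % MOD_MUL % MOD_ADD


def idea_subkeys(key: int) -> list:
    """Expand a 128-bit key into the 52 16-bit encryption sub-keys."""
    subkeys = []
    while len(subkeys) < 52:
        for i in range(8):  # split the current key into eight 16-bit words
            subkeys.append((key >> (112 - 16 * i)) & 0xFFFF)
        key = ((key << 25) | (key >> 103)) & ((1 << 128) - 1)  # rotate left by 25 bits
    return subkeys[:52]


key = 0x00010002000300040005000600070008
subkeys = idea_subkeys(key)
print([hex(k) for k in subkeys[:9]])
print(mul16(0, 0))  # 2^16 * 2^16 mod (2^16 + 1) = 1
```

Treating zero as $2^{16}$ makes the multiplication invertible for every 16-bit value, which decryption relies on.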
2.3 Advanced Encryption Standard (AES)

AES is a block cipher developed to address the inadequate key size of the Data Encryption Standard (DES). The underlying Rijndael algorithm allows data block lengths of 128, 192, and 256 bits (the AES standard itself fixes the block length at 128 bits) and supports three different key lengths: 128, 192, and 256 bits. AES can be divided into four basic operation blocks in which data are treated at either byte or bit level. The byte structure is a natural fit for low-end microprocessors (such as 8-bit CPUs and microcontrollers). The array of bytes, organized as a 4x4 matrix, is called the "state", and the four basic steps, SubBytes, ShiftRows, MixColumns, and AddRoundKey, are also known as layers. These four layers describe one round of the AES. The number of rounds depends on the key length: 10, 12, and 14 rounds for key lengths of 128, 192, and 256 bits respectively. The block diagram of the AES with 128-bit data is shown below:
Figure 2.3.1: AES Block Diagram
Substitute Bytes Transformation: This operation is a non-linear byte substitution. It is composed of two sub-transformations: a multiplicative inverse and an affine transformation. In most implementations, these two sub-steps are combined into a single table lookup called the S-box.
Shift Rows Transformation: This step is a simple permutation that operates on individual rows; each row of the array is rotated by a certain number of byte positions.
Mix Columns Transformation: This is a substitution step that makes use of arithmetic over $GF(2^8)$. Each column vector is multiplied (in $GF(2^8)$) by a fixed matrix, where the column is treated as a polynomial of degree less than 4 with coefficients in $GF(2^8)$.
AddRoundKey: Each byte of the array is added (with respect to GF(2), that is, XORed) to a byte of the corresponding array of round sub-keys.
Excluding the initial key addition and the last round, the AES with a 128-bit key proceeds for nine iterations. Round keys are generated by a procedure called round key expansion or key scheduling. The sub-keys are derived from the original key by XORing two previous columns. For columns whose index is a multiple of four, the process additionally involves round constant addition, S-box, and shift operations.
2.4 RC6
RC6, like RC5, consists of three components: a key expansion algorithm, an encryption algorithm, and a decryption algorithm. The parameterization is written RC6-w/r/b, where w is the word size in bits, r is the non-negative number of rounds, and b is the length of the encryption key in bytes. RC6 makes use of data-dependent rotations, similar to DES rounds (Rivest et al., 1998a). RC6 is based on seven primitive operations, as shown in Table 1. Normally, only six primitive operations are listed (Rivest et al., 1998a); however, the parallel assignment is also a primitive and essential operation of RC6. The addition, subtraction, and multiplication operations use two's complement representations (Rivest, 1997). Integer multiplication is used to increase the diffusion achieved per round and to increase the speed of the cipher (Rivest et al., 1998a).
Operation    Description
a + b        Integer addition modulo 2^w
a - b        Integer subtraction modulo 2^w
a ⊕ b        Bitwise exclusive-or (XOR) of w-bit words
a × b        Integer multiplication modulo 2^w
a <<< b      Rotate the w-bit word a to the left by the amount given by the least significant (log2 w) bits of b
a >>> b      Rotate the w-bit word a to the right by the amount given by the least significant (log2 w) bits of b
Table 1: RC6 operations
Diffusion involves propagating bit changes from one block to other blocks. An avalanche effect is one in which a small change in the plaintext triggers major changes in the cipher text. To speed up the avalanche of change between rounds, a quadratic function is introduced (Rivest et al., 1998a). With a higher rate of diffusion, the changes caused by simple differentials are more likely to spoil the rotation amounts sooner (Rivest et al., 1998a). To achieve the security goals for the transformation, the following quadratic function is used twice within each round: $f(x) = x(2x + 1) \bmod 2^w$. The high-order bits of this function, which depend on all of the bits of x, are used to determine the rotation amount (Rivest et al., 1998a). In conjunction with the quadratic function, the fixed rotation by $\log_2 w$ bits complicates advanced cryptanalytic attacks (Rivest et al., 1998a). Integer multiplication also contributes by making sure that all of the bits of the rotation amounts depend on the bits of another register (Rivest et al., 1998a).
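The quadratic function and the fixed rotation can be sketched as follows for w = 32. The helper names f, rotl, and rotation_amount_source are hypothetical, and the sketch covers only how one register is turned into a value that later supplies a rotation amount; it is not the full RC6 round or key schedule.

```python
W = 32               # word size in bits (assumed; RC6 is parameterized by w)
LG_W = 5             # log2(32)
MASK = (1 << W) - 1


def rotl(x: int, r: int) -> int:
    """Rotate the w-bit word x left by the amount in the least significant lg(w) bits of r."""
    r &= W - 1
    return ((x << r) | (x >> (W - r))) & MASK


def f(x: int) -> int:
    """Quadratic function f(x) = x(2x + 1) mod 2^w."""
    return (x * (2 * x + 1)) & MASK


def rotation_amount_source(x: int) -> int:
    """Rotate f(x) left by lg(w) bits so that its high-order bits land in the
    low-order positions, where they serve as a data-dependent rotation amount."""
    return rotl(f(x), LG_W)


print(f(1), rotl(1, 5))                     # 3 32
print(hex(rotation_amount_source(0x12345678)))

# For a small word size it is easy to check that f is one-to-one modulo 2^w.
print(len({(x * (2 * x + 1)) % 256 for x in range(256)}))  # 256
```

Because f is one-to-one, no two register values collapse to the same output, so the function mixes bits without losing information.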
2.5 MARS
MARS takes as input four 32-bit plaintext data words A, B, C, D and produces four 32-bit cipher text data words A', B', C', D'. The cipher is word-oriented, in that all the internal operations are performed on 32-bit words. MARS is a type-3 Feistel network, divided into three phases: a 16-round cryptographic core phase wrapped with two layers of 8-round forward and backward mixing. The cryptographic core rounds provide strong resistance to all known cryptanalytic attacks, while the mixing rounds provide good avalanche and offer very wide security margins to thwart new (yet unknown) attacks. MARS accepts a variable-size user-supplied key ranging from 4 to 14 words (i.e., 128 to 448 bits). MARS uses a key expansion procedure to expand the user-supplied key (consisting of n 32-bit words, where n is any number between 4 and 14) into a key array K[ ] of 40 words for the encryption/decryption operation. The MARS cipher uses a variety of operations to provide a combination of high security, high speed, and implementation flexibility. Specifically, it combines exclusive-or (xor), additions, subtractions, multiplications, and both fixed and data-dependent rotations. MARS also uses a single table (S-box) of 512 32-bit words to provide good resistance against linear and differential attacks, as well as good avalanche of data and key bits. This S-box is also used by the key expansion procedure. Sometimes the S-box is viewed as two tables, each of 256 entries, denoted by S0 and S1. In the design of the S-box, the entries were generated in a pseudo-random fashion, and the resulting S-box was tested to have good differential and linear properties.
The operations used in the cipher are applied to 32-bit words, which are viewed as unsigned integers. The following notation is used in the pseudo-code. The bits in each word are numbered from 0 to 31, where bit 0 is the least significant (or lowest) bit and bit 31 is the most significant (or highest) bit. We denote by $c \oplus d$ the bitwise exclusive-or of the two words c and d. We denote by $c + d$ addition modulo $2^{32}$, by $c - d$ subtraction modulo $2^{32}$, and by $c \times d$ multiplication modulo $2^{32}$. Also, c<<<d and c>>>d denote cyclic rotations of the 32-bit word c by d positions to the left and right, respectively. The decryption operation of MARS is the inverse of the encryption operation, and the code for decryption is similar to the code for encryption.
The MARS key expansion procedure expands the user-supplied key of 4 to 14 words into a 40-word key for use in the encryption/decryption operation. The key expansion procedure consists of three steps (Figure 3). The first step is linear expansion, which expands the original user-supplied key to forty 32-bit words using a simple linear transformation. The second step is S-box based key stirring, which stirs the expanded key using seven rounds of a type-1 Feistel network to destroy linear relations in the key. The third step is multiplication key-word modification, which examines the key words that are used for multiplication in the MARS encryption/decryption operation and modifies them if needed. In the pseudo-code, $c \wedge d$ denotes the bitwise-and of the two words c and d.
CHAPTER 3
OPERATIONS OF SYMMETRIC KEY ALGORITHM
3.1 Modular Addition Two
The addition of two elements in a finite field is achieved by adding the coefficients for the corresponding powers in the polynomials for the two elements. The addition is performed with the XOR operation (denoted by $\oplus$), i.e., modulo 2, so that $1 \oplus 1 = 0$, $1 \oplus 0 = 1$, and $0 \oplus 0 = 0$.
Alternatively, addition of finite field elements can be described as the modulo 2 addition of corresponding bits in the byte. For two bytes $\{a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0\}$ and $\{b_7 b_6 b_5 b_4 b_3 b_2 b_1 b_0\}$, the sum is $\{c_7 c_6 c_5 c_4 c_3 c_2 c_1 c_0\}$, where each $c_i = a_i \oplus b_i$ (i.e., $c_7 = a_7 \oplus b_7$, $c_6 = a_6 \oplus b_6$, ...).
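For example, the following expressions are equivalent to one another:
$(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) = x^7 + x^6 + x^4 + x^2$ (polynomial notation)
$\{01010111\} \oplus \{10000011\} = \{11010100\}$ (binary notation)
$\{57\} \oplus \{83\} = \{d4\}$ (hexadecimal notation)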
Algorithm: Modular addition two
Require: Binary polynomials a(z), b(z) with maximum degree m-1.
Ensure: c(z) = a(z) + b(z).
1: for i from 0 to m-1 do
2:   C[i] ← A[i] ⊕ B[i]
3: end for
4: Return (c)
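The algorithm above translates directly into code. In this sketch the byte-level version uses a single XOR, and the coefficient-list version mirrors the loop; the helper names (gf_add, gf_add_bitwise, to_bits, from_bits) are hypothetical, and coefficient lists are assumed to be stored with the lowest power first.

```python
def gf_add(a: int, b: int) -> int:
    """Add two field elements held as bytes: coefficient-wise addition modulo 2."""
    return a ^ b


def to_bits(value: int, width: int = 8) -> list:
    """Coefficient list of a binary polynomial, lowest power first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def gf_add_bitwise(a_bits: list, b_bits: list) -> list:
    """Loop form of the algorithm: C[i] = A[i] XOR B[i] for every coefficient."""
    return [(x + y) % 2 for x, y in zip(a_bits, b_bits)]


print(hex(gf_add(0x57, 0x83)))                                        # 0xd4
print(hex(from_bits(gf_add_bitwise(to_bits(0x57), to_bits(0x83)))))   # 0xd4
```

Since every element is its own additive inverse, subtraction in the field is the same operation as addition.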
3.2 Modular Multiplication 2^8

In the polynomial representation, multiplication in $GF(2^8)$ (denoted by $\bullet$) corresponds to the multiplication of polynomials modulo an irreducible polynomial of degree 8. A polynomial is irreducible if its only divisors are one and itself. For the AES algorithm, this irreducible polynomial is
$$m(x) = x^8 + x^4 + x^3 + x + 1.$$
In prime field operations, modular reduction requires a division, which takes more time. In binary field operations, the reduction needs only simple additions (XORs) and therefore takes less time.
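As an illustration, the multiplication can be carried out in the two stages the text describes: a polynomial (carry-less) product followed by reduction modulo the AES polynomial $x^8 + x^4 + x^3 + x + 1$, written 0x11B as a bit pattern. The function names are hypothetical, and the sketch favours clarity over speed.

```python
AES_MODULUS = 0x11B  # bit pattern of x^8 + x^4 + x^3 + x + 1


def poly_mul(a: int, b: int) -> int:
    """Multiply two binary polynomials; partial products are combined with XOR."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return product


def poly_mod(value: int, modulus: int = AES_MODULUS) -> int:
    """Reduce a binary polynomial modulo `modulus` using shifts and XORs only."""
    while value.bit_length() >= modulus.bit_length():
        shift = value.bit_length() - modulus.bit_length()
        value ^= modulus << shift  # cancel the current leading term
    return value


def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(2^8) with the AES reduction polynomial."""
    return poly_mod(poly_mul(a, b))


print(hex(gf_mul(0x57, 0x83)))  # 0xc1
```

The reduction loop never divides; it only aligns the modulus with the leading term and XORs it away, which is the "simple addition" advantage of binary fields.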
3.3 Matrix Multiplication

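Four-term polynomials can be defined, with coefficients that are finite field elements, as:
$$a(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0 \qquad (1)$$
which will be denoted as a word in the form $[a_0, a_1, a_2, a_3]$. Note that the polynomials in this section behave somewhat differently from the polynomials used in the definition of finite field elements, even though both types of polynomials use the same indeterminate, x. The coefficients in this section are themselves finite field elements, i.e., bytes instead of bits; also, the multiplication of four-term polynomials uses a different reduction polynomial, defined below. The distinction should always be clear from the context. To illustrate the addition and multiplication operations, let
$$b(x) = b_3 x^3 + b_2 x^2 + b_1 x + b_0 \qquad (2)$$
define a second four-term polynomial. Addition is performed by adding the finite field coefficients of like powers of x. This addition corresponds to an XOR operation between the corresponding bytes in each of the words; in other words, the XOR of the complete word values. Thus, using the equations of (1) and (2),
$$a(x) + b(x) = (a_3 \oplus b_3) x^3 + (a_2 \oplus b_2) x^2 + (a_1 \oplus b_1) x + (a_0 \oplus b_0) \qquad (3)$$
Multiplication is achieved in two steps. In the first step, the polynomial product $c(x) = a(x) \bullet b(x)$ is algebraically expanded, and like powers are collected to give
$$c(x) = c_6 x^6 + c_5 x^5 + c_4 x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0 \qquad (4)$$
where
$$\begin{aligned}
c_0 &= a_0 \bullet b_0 \\
c_1 &= a_1 \bullet b_0 \oplus a_0 \bullet b_1 \\
c_2 &= a_2 \bullet b_0 \oplus a_1 \bullet b_1 \oplus a_0 \bullet b_2 \\
c_3 &= a_3 \bullet b_0 \oplus a_2 \bullet b_1 \oplus a_1 \bullet b_2 \oplus a_0 \bullet b_3 \\
c_4 &= a_3 \bullet b_1 \oplus a_2 \bullet b_2 \oplus a_1 \bullet b_3 \\
c_5 &= a_3 \bullet b_2 \oplus a_2 \bullet b_3 \\
c_6 &= a_3 \bullet b_3
\end{aligned} \qquad (5)$$
The result, c(x), does not represent a four-byte word. Therefore, the second step of the multiplication is to reduce c(x) modulo a polynomial of degree 4, so that the result is a polynomial of degree less than 4. For the AES algorithm, this is accomplished with the polynomial $x^4 + 1$, so that
$$x^i \bmod (x^4 + 1) = x^{i \bmod 4}.$$
The modular product of a(x) and b(x), denoted by $a(x) \otimes b(x)$, is given by the four-term polynomial $d(x) = d_3 x^3 + d_2 x^2 + d_1 x + d_0$, defined as follows:
$$\begin{aligned}
d_0 &= (a_0 \bullet b_0) \oplus (a_3 \bullet b_1) \oplus (a_2 \bullet b_2) \oplus (a_1 \bullet b_3) \\
d_1 &= (a_1 \bullet b_0) \oplus (a_0 \bullet b_1) \oplus (a_3 \bullet b_2) \oplus (a_2 \bullet b_3) \\
d_2 &= (a_2 \bullet b_0) \oplus (a_1 \bullet b_1) \oplus (a_0 \bullet b_2) \oplus (a_3 \bullet b_3) \\
d_3 &= (a_3 \bullet b_0) \oplus (a_2 \bullet b_1) \oplus (a_1 \bullet b_2) \oplus (a_0 \bullet b_3)
\end{aligned} \qquad (6)$$
When a(x) is a fixed polynomial, the operation defined in equation (6) can be written in matrix form as:
$$\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} =
\begin{bmatrix}
a_0 & a_3 & a_2 & a_1 \\
a_1 & a_0 & a_3 & a_2 \\
a_2 & a_1 & a_0 & a_3 \\
a_3 & a_2 & a_1 & a_0
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} \qquad (7)$$
Because $x^4 + 1$ is not an irreducible polynomial over $GF(2^8)$, multiplication by a fixed four-term polynomial is not necessarily invertible.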
3.4 Fixed Coefficient Multiplier

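Multiplying a binary polynomial $b(x) = b_7 x^7 + b_6 x^6 + \dots + b_1 x + b_0$ by the polynomial x results in
$$b_7 x^8 + b_6 x^7 + b_5 x^6 + b_4 x^5 + b_3 x^4 + b_2 x^3 + b_1 x^2 + b_0 x.$$
The result $x \bullet b(x)$ is obtained by reducing the above result modulo the irreducible polynomial $m(x) = x^8 + x^4 + x^3 + x + 1$ of Sec. 3.2. If $b_7 = 0$, the result is already in reduced form. If $b_7 = 1$, the reduction is accomplished by subtracting (i.e., XORing) the polynomial m(x). It follows that multiplication by x (i.e., {00000010} or {02}) can be implemented at the byte level as a left shift and a subsequent conditional bitwise XOR with {1b}. This operation on bytes is denoted by xtime(). Multiplication by higher powers of x can be implemented by repeated application of xtime(). By adding intermediate results, multiplication by any constant can be implemented. For example, $\{57\} \bullet \{13\} = \{fe\}$ because
$\{57\} \bullet \{02\} = xtime(\{57\}) = \{ae\}$
$\{57\} \bullet \{04\} = xtime(\{ae\}) = \{47\}$
$\{57\} \bullet \{08\} = xtime(\{47\}) = \{8e\}$
$\{57\} \bullet \{10\} = xtime(\{8e\}) = \{07\}$
and therefore
$\{57\} \bullet \{13\} = \{57\} \bullet (\{01\} \oplus \{02\} \oplus \{10\}) = \{57\} \oplus \{ae\} \oplus \{07\} = \{fe\}$.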
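A minimal sketch of xtime() and of constant multiplication built from it is given below. The function name mul_by_constant is hypothetical; the sketch assumes all values are bytes in the range 0 to 255.

```python
def xtime(b: int) -> int:
    """Multiply a GF(2^8) element by x ({02}): left shift, then conditional XOR with {1b}."""
    b <<= 1
    if b & 0x100:        # the bit shifted out of position 7 was set
        b ^= 0x11B       # clear bit 8 and XOR the low byte with {1b}
    return b


def mul_by_constant(value: int, constant: int) -> int:
    """Multiply `value` by `constant` by adding (XORing) the xtime() multiples
    selected by the set bits of the constant."""
    result = 0
    multiple = value             # value * x^0
    while constant:
        if constant & 1:
            result ^= multiple
        multiple = xtime(multiple)   # next power of x
        constant >>= 1
    return result


print([hex(v) for v in (xtime(0x57), xtime(0xAE), xtime(0x47), xtime(0x8E))])
# ['0xae', '0x47', '0x8e', '0x7']
print(hex(mul_by_constant(0x57, 0x13)))  # 0xfe
```

Each step needs only a shift and at most one XOR, which is why this form maps well onto small hardware multipliers.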
There are many ways to implement a finite field multiplier. The one originally proposed for the AES takes the form of xtime(), which is essentially a multiplication by x, that is, a left shift with {1b} feedback. It can be realized in either a bit-serial or a bit-parallel architecture. Rudra proposed an implementation of the Rijndael system with composite field arithmetic. We are considering a multiplier that is fast, simple, small in area, and able to support a pipelined architecture (if needed). The fixed-value multiplications (by {02} or by {03}) lead us to a fixed-coefficient multiplication in $GF(2^8)$ that fulfils these requirements. Let $S_{i,c} = B(x)$ be an element to be multiplied. B(x) can also be written in polynomial form as
$$B(x) = b_7 x^7 + b_6 x^6 + b_5 x^5 + b_4 x^4 + b_3 x^3 + b_2 x^2 + b_1 x + b_0 \qquad (1)$$
where $b_i \in \{0, 1\}$.
The multiplications used in the Mix Column transformation are $\{02\} \cdot B(x) = x \cdot B(x)$ and $\{03\} \cdot B(x) = (x + 1) \cdot B(x)$. The resulting products are:
$$\{02\} \cdot B(x) = b_6 x^7 + b_5 x^6 + b_4 x^5 + (b_3 \oplus b_7) x^4 + (b_2 \oplus b_7) x^3 + b_1 x^2 + (b_0 \oplus b_7) x + b_7 \qquad (2)$$
$$\{03\} \cdot B(x) = (b_7 \oplus b_6) x^7 + (b_6 \oplus b_5) x^6 + (b_5 \oplus b_4) x^5 + (b_4 \oplus b_3 \oplus b_7) x^4 + (b_3 \oplus b_2 \oplus b_7) x^3 + (b_2 \oplus b_1) x^2 + (b_1 \oplus b_0 \oplus b_7) x + (b_0 \oplus b_7) \qquad (3)$$
Implementing the above equations is simple, since additions are simply XORs. As an example, the circuit that computes $x \cdot B(x)$ is shown in Figure 3.4.1. The implementation of $(x + 1) \cdot B(x)$ is shown in Figure 3.4.2. According to the terms given in (2) and the architecture shown in Figure 3.4.2, the maximum delay is expected to be that of one 2-input XOR gate.
Figure 3.4.1: A2 Fixed Coefficient Multiplier
Figure 3.4.2: A3 Fixed Coefficient Multiplier
3.5 Mix Columns () Transformation

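The MixColumns() transformation operates on the State column by column, treating each column as a four-term polynomial. The columns are considered as polynomials over $GF(2^8)$ and multiplied modulo $x^4 + 1$ with a fixed polynomial a(x), given by
$$a(x) = \{03\} x^3 + \{01\} x^2 + \{01\} x + \{02\}. \qquad (3.5.1)$$
As described in Sec. 3.3, this can be written as a matrix multiplication. Let $s'(x) = a(x) \otimes s(x)$:
$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} =
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix} \qquad (1)$$
As a result of this multiplication, the four bytes in a column are replaced by the following:
$$\begin{aligned}
s'_{0,c} &= (\{02\} \bullet s_{0,c}) \oplus (\{03\} \bullet s_{1,c}) \oplus s_{2,c} \oplus s_{3,c} \\
s'_{1,c} &= s_{0,c} \oplus (\{02\} \bullet s_{1,c}) \oplus (\{03\} \bullet s_{2,c}) \oplus s_{3,c} \\
s'_{2,c} &= s_{0,c} \oplus s_{1,c} \oplus (\{02\} \bullet s_{2,c}) \oplus (\{03\} \bullet s_{3,c}) \\
s'_{3,c} &= (\{03\} \bullet s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\} \bullet s_{3,c})
\end{aligned} \qquad (2)$$
Using the fixed coefficient multiplier, the Mix Columns transformation can be implemented from equations (1) and (2) with reduced multiplication effort, since only multiplications by {02} and {03} are needed. The column-by-column processing of the State is shown in Figure 3.5.1, and the architecture is shown in Figure 3.5.2.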
Figure 3.5.1: Mix Column Transform
Figure 3.5.2: MixColumn Transform Architecture
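As a rough software counterpart to the MixColumns architecture, the sketch below applies the column equations to one 4-byte column using only the {02} and {03} fixed-coefficient multiplications. The function name `mix_column` and the list-of-bytes representation are assumptions made for this example.

```python
def xtime(b: int) -> int:
    """Multiply a byte by {02} in GF(2^8) (shift left, conditional XOR with {1b})."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mix_column(col):
    """Apply MixColumns to one column [s0, s1, s2, s3] and return the new column."""
    s0, s1, s2, s3 = col
    m2 = xtime                          # {02} multiplier
    m3 = lambda b: xtime(b) ^ b         # {03} multiplier = {02}*b XOR b
    return [
        m2(s0) ^ m3(s1) ^ s2 ^ s3,
        s0 ^ m2(s1) ^ m3(s2) ^ s3,
        s0 ^ s1 ^ m2(s2) ^ m3(s3),
        m3(s0) ^ s1 ^ s2 ^ m2(s3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```

A column of four equal bytes is left unchanged by this transformation, which is a convenient sanity check for a hardware test bench.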
3.6 Multiplier X (2X+1) Modulo 2^8

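The RC6 algorithm requires this operation when converting plaintext to ciphertext. The multiplication by 2 (i.e., {02}) reuses the fixed-coefficient multiplier, which reduces the cost of the multiplication.
Let X = B(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + b_5 x^5 + b_6 x^6 + b_7 x^7. Now,
\[ \{02\} \cdot B(x) = b_7 + b_0 x + b_1 x^2 + b_2 x^3 + b_3 x^4 + b_4 x^5 + b_5 x^6 + b_6 x^7 \tag{1} \]
\[ \{02\} \cdot B(x) + 1 = (b_7 + 1) + b_0 x + b_1 x^2 + b_2 x^3 + b_3 x^4 + b_4 x^5 + b_5 x^6 + b_6 x^7 \tag{2} \]
\[ X \cdot (\{02\} \cdot X + 1) \bmod 2^8 = X \cdot (\{02\} \cdot X + 1) \bmod x^8 \tag{3} \]
Computing the operation as in Eq. (3) requires less time when implementing the RC6 algorithm.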
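For comparison, the integer form of this operation can be sketched in a few lines. This is an illustration under stated assumptions: the function name `quad` and the parameter `w` are introduced here, the default width of 8 bits follows this section's modulus 2^8, and the RC6 specification itself defines the same function with the cipher's word size (32 bits in the standard parameter set).

```python
def quad(x: int, w: int = 8) -> int:
    """Return x * (2x + 1) mod 2^w using a shift, an OR, a multiply and a mask."""
    mask = (1 << w) - 1
    two_x_plus_1 = ((x << 1) | 1) & mask   # 2x is even, so OR 1 equals adding 1
    return (x * two_x_plus_1) & mask       # keep only the low w bits


print(quad(3))                   # 21
print(quad(200))                 # 72
print(quad(0xFFFFFFFF, w=32))    # 1
```

Reducing modulo 2^w is just a mask of the low w bits, so no division is needed.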
3.7 Shift Row Transform

In the ShiftRows() transformation, the bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets), as shown in Figure 3.7.1.
Figure 3.7.1 Shift Rows Architecture
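A minimal sketch of ShiftRows is given below, assuming the State is held as four rows of four bytes and that row r is rotated left by r positions, as in the AES specification. The function name `shift_rows` is chosen for this example.

```python
def shift_rows(state):
    """Cyclically shift row r of a 4x4 State left by r bytes (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
]
for row in shift_rows(state):
    print([hex(b) for b in row])
```

In hardware this step is pure wiring: no logic gates are needed, only a fixed permutation of byte positions.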
3.8 Left Logical Shift and Right Logical Shift

In a logical shift, zeros are shifted in to replace the discarded bits. The logical and arithmetic left shifts are therefore exactly the same: both insert 0 bits at the least significant end. Figure 3.8.1 illustrates the left logical shift.
Figure 3.8.1: Left Logical Shift
A logical right shift inserts 0 bits at the most significant end instead of copying the sign bit. It is ideal for unsigned binary numbers, while the arithmetic right shift is ideal for signed two's-complement binary numbers. Figure 3.8.2 illustrates the right logical shift.
Figure 3.8.2: Right Logical Shift
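The difference between the shifts can be seen in a short sketch. The 8-bit width and the function names `lsl`, `lsr`, and `asr` are assumptions made for this example; the arithmetic right shift is included only for contrast.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def lsl(x: int, n: int) -> int:
    """Logical left shift: zeros enter at the least significant end."""
    return (x << n) & MASK


def lsr(x: int, n: int) -> int:
    """Logical right shift: zeros enter at the most significant end."""
    return (x & MASK) >> n


def asr(x: int, n: int) -> int:
    """Arithmetic right shift: the sign bit is copied into the vacated positions."""
    x &= MASK
    if x & (1 << (WIDTH - 1)):   # negative in two's complement
        x -= 1 << WIDTH
    return (x >> n) & MASK


x = 0b10010110
print(f"{lsl(x, 1):08b}")  # 00101100
print(f"{lsr(x, 1):08b}")  # 01001011
print(f"{asr(x, 1):08b}")  # 11001011
```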
3.9 Left Circular Shift and Right Circular Shift

In this operation, the bits are "rotated" as if the left and right ends of the register were joined. The value that is shifted in on the right during a left shift is whatever value was shifted out on the left, as shown in Figure 3.9.1.
Figure 3.9.1: Left Circular Shift
The value that is shifted in on the left during a right shift is whatever value was shifted out on the right, as shown in Figure 3.9.2.
Figure 3.9.2: Right Circular Shift
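The two circular shifts can be sketched as follows, assuming a 32-bit register by default. The names `rol` and `ror` and the `width` parameter are introduced for this example.

```python
def rol(x: int, n: int, width: int = 32) -> int:
    """Rotate left: bits leaving on the left re-enter on the right."""
    mask = (1 << width) - 1
    n %= width
    x &= mask
    return ((x << n) | (x >> (width - n))) & mask


def ror(x: int, n: int, width: int = 32) -> int:
    """Rotate right: bits leaving on the right re-enter on the left."""
    mask = (1 << width) - 1
    n %= width
    x &= mask
    return ((x >> n) | (x << (width - n))) & mask


print(hex(rol(0x80000001, 1)))  # 0x3
print(hex(ror(0x80000001, 1)))  # 0xc0000000
```

Unlike a logical shift, a rotation loses no information, so rotating left by n and then right by n returns the original value.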
CHAPTER 4 DESIGN OF CRYPTOGRAPHY PROCESSOR ARCHITECTURE

4.1 Basic Processor
The microprocessor (MP) is a small device that works by fetching and processing data, and successive generations of microprocessors have followed this fetch-and-process philosophy. The 8085 is one of the earliest and most important processors in this development. It concentrates on fetching and decoding instructions and has a 16-bit address bus that can address 64 KB of memory. Instruction execution proceeds as follows: when the MP receives a request, it decodes the instruction, which activates the RD control signal so that the MP can read from memory. The MP then accesses the specific memory location, and the required data is placed on the data bus and transferred to the internal data bus. The internal data bus carries the data to the arithmetic logic unit (ALU), which processes it and puts the result back on the internal data bus to be sent to the output port.
The 8086 has the same number of pins as the 8085 but offers many functions that the 8085 lacks. It has a 16-bit multiplexed address and data bus, which works in the same way as in any other microprocessor. To write data to memory, the MP outputs the address on the address bus, issues a WR signal to the memory, and sets the control signal M/IO = 1. To read data from memory, the MP outputs the address, issues an RD signal on the control bus, and accepts the data via the data bus. The microprocessor may be fast, but the peripherals connected to it are slow. To compensate, the MP provides the READY input, which lets a slow device make the processor wait.
The 80286 is an advanced version of the 8086 designed for multiuser and multitasking environments. It addresses 16 MB of physical memory and 1 GB of virtual memory through its memory management system. The 80286 does not incorporate internal peripherals; instead it contains a memory management unit, also called the address unit. The address bus is 24 bits wide to accommodate 16 MB of physical memory. In real mode the 80286 functions like the 8086, but in protected mode it addresses the full 16 MB of memory space. The 82284 clock generator provides the clock for the 80286, and the system bus controller provides the system signals.
The 80386 is a full 32-bit version of the earlier 16-bit 8086 and 80286 and represents a major advance in architecture. Along with the larger word size come many improvements and additional features. The 80386 features multitasking, memory management, virtual memory with or without paging, software protection, and a large memory system. All software written for the 8086 is compatible with the 80386. The physical memory is increased from 1 MB in the 8086 and 16 MB in the 80286 to 4 GB in the 80386. The 80386 can switch from protected mode to real mode without resetting the MP. The 80486 is a highly integrated device with a powerful memory management unit, a complete numeric coprocessor that is compatible with the 80387, and a high-speed 8 KB level 1 cache. Its architecture is similar to that of the 80386 except for a new concept called burst cycle, or burst mode, for retrieving data.
4.2 CISC Processor

CISC is an acronym for Complex Instruction Set Computer. The 8085, 8086, and 80x86 families of processors discussed in the previous section are CISC processors. They perform a wide range of operations, are complicated, and are used in applications where a variety of operations are carried out and time is not a critical factor. They have a large instruction set: the 8085 alone has 76 instructions, and the 8086 and the 80x86 family have more than 200. These processors have an instruction for each specific task, so the total number of instructions is very large. The main function of any microprocessor is to fetch data and process it. The chip itself has no provision for memory, so it has to depend on peripherals such as external memory. To facilitate this, the MP is given a large number of instructions that specialize in fetching data or instructions, which is one of the main reasons for the large instruction set. CISC processors have complicated memory operations to speed up memory access, using concepts such as cache memory, interleaving, and burst-mode operation to optimize memory interactions. Since they are expected to perform a variety of operations, they have a variety of instruction-length formats. They also have a large number of addressing modes, often ranging from 5 to 20 and beyond for higher-end processors.
4.3 RISC Processor

A RISC (Reduced Instruction Set Computer) processor uses a minimal instruction set, emphasizing the instructions used most often and optimizing them for the fastest possible execution. Software for RISC processors must handle more operations than software for traditional CISC (Complex Instruction Set Computer) processors, but RISC processors have advantages in applications that benefit from faster instruction execution, such as engineering and graphics workstations and parallel-processing systems. They are also less costly to design, test, and manufacture. In the mid-1990s RISC processors began to be used in personal computers in place of the CISC processors that had been used since the introduction of the microprocessor.
A RISC uses simple constructs and has a small instruction set compared with CISC processors. Faster execution is achieved by carrying out most operations within the processor and minimizing operations that require slower peripherals. The major characteristics of a RISC are:
Relatively few instructions.
Relatively few addressing modes.
Memory access is limited to Load and Store instructions.
All operations are done within the registers of the CPU.
Fixed-length, easily decoded instruction format.
Single-cycle instruction execution.
Hardwired rather than micro-programmed control.
A relatively large number of registers in the processor unit.
Use of overlapped register windows to speed up procedure call and return.
Efficient instruction pipeline.
Compiler support for efficient translation of high-level language programs into machine language programs.
The RISC architecture simplifies the instruction set and encourages the optimization of register manipulation. Almost all instructions use simple register addressing. An important aspect of the instruction set is that it is easy to decode, so the opcode and register fields of the instruction can be accessed simultaneously. Because the instructions and their format are simple, the control logic design is greatly simplified.
When implementing a digital logic design, it is convenient to break the complete architecture into individual modules and test each one for functionality. After all the modules are found to work correctly, they are put together and checked as a whole. We follow this approach in the implementation of the 32-bit RISC processor.
4.4 Need of Cryptography Processor

Implementing public-key cryptosystems on a general-purpose processor (GPP) is flexible because a variety of cryptosystems can be selected at runtime. A drawback of a GPP realization is that it generally results in lower throughput and higher power consumption. Considerable effort has been directed towards fast realizations of cryptographic algorithms that use very large integer operands (up to 4096 bits). For real-time applications, a dedicated hardware implementation is required to speed up the computation. The traditional approach to developing a digital system was to use a set of interconnected digital integrated circuits such as counters, buffers, logic gates, and memory. That task required a great deal of analysis and testing, and the design had to be adapted to the hardware's inherent limitations (speed, response time, power consumption, etc.), which limited the headroom for development. Because this processor performs various cryptographic operations, we call it a cryptography processor.
4.4.1 Instruction Set

For a complete design, it was necessary to create a specific instruction set and its own assembly code with a proper instruction format. The instructions are classified into two groups: data manipulation (load and store) and operations (arithmetic and logical). They use three instruction formats. Load and store instructions take an address from different data sources and use the Type 1 format. Arithmetic operations such as addition and the modular functions require two source registers and store the result in a destination register; they use the Type 2 format. Logical operations such as Shift Left, Shift Right, and Rotate Word require only one source register and use the Type 3 format. Table 1 describes the complete instruction set. Each instruction has its own opcode. Since the complete set contains 13 instructions, 4 bits are enough to represent them.
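Table 1: Instruction Set of the Developed Processor
Type 1 field boundaries (bits): 31-29 | 28-25 | 24-20 | 19-16 | 15-0
Type 2 field boundaries (bits): 31-29 | 28-25 | 24-20 | 19-16 | 15-0
Type 3 field boundaries (bits): 31-29 | 28-25 | 24-20 | 19-16 | 15-0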
4.5 Architectural Design of 32 bit Cryptography Processor

The architecture of the 32-bit processor is shown in Figure 4.5. The processor is designed with a load/store architecture. Separate memories for instructions (program) and data allow different stages of the pipeline to access memory simultaneously. This Harvard style of architecture can be used either with two completely different memory spaces or with a single dual-port memory space in which data and instructions are kept separate. Three stages of pipelining have been incorporated in the design, which increases the speed of operation. The processor implements the instruction set presented above and uses a Single Instruction Single Data (SISD) execution order. Its main characteristics are:
Sixteen 32-bit general purpose registers.
ALU with basic arithmetic and logical operations.
Figure 4.5: Cryptographic Processor
4.6 Designing All Modules of Cryptography Architecture

All modules of the cryptography architecture are described in Verilog HDL using behavioral modeling and are connected structurally to capture the overall architecture. The cryptography processor architecture consists of the following modules:
Control and Decoder
General Purpose Register
Instruction Register
Program Counter
Multiplexer (2:1)
Multiplexer A (16:1)
Memory
Multiplexer D (16:1)
Arithmetic logical unit (ALU)

The functionality of every module, including the top module of the cryptography processor, is verified using the Active HDL simulator, and synthesis is done using the Xilinx ISE 10.1 tool. This section describes the design of all modules of the cryptography processor; their results are captured in snapshots.
4.6.1 Control and Decoder

The control unit is designed as a finite state machine (FSM) in which each state takes one clock cycle. The first state is the reset state, which initializes the CPU's internal registers and variables. The machine enters the reset state when the reset signal is asserted for a certain number of clock cycles. The reset state is followed by the instruction fetch and decode states, which enable the appropriate signals for reading the instruction from the ROM and then decode the parts of the instruction. The decode state also selects the next state depending on the instruction: since every instruction has its own set of states, the control unit jumps to the correct state for the given instruction. After all states of the running instruction have finished, the last one returns to the fetch state so that the next instruction in the program can be processed. Figure 4.6.1 shows the block diagram of the control and decoder unit, Figure 4.6.2 shows the state diagram of the control unit, and the simulation results are shown in Figure 4.6.3.
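The state sequence described above can be sketched as a small software model. This is only an illustration: the state names, the two execute states, and the instruction kinds "alu" and "mem" are hypothetical, since the actual states are defined by the state diagram.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    FETCH = auto()
    DECODE = auto()
    EXEC_ALU = auto()   # hypothetical execute state for arithmetic/logical instructions
    EXEC_MEM = auto()   # hypothetical execute state for load/store instructions


# Hypothetical mapping from decoded instruction kind to its execute state.
EXEC_STATE = {"alu": State.EXEC_ALU, "mem": State.EXEC_MEM}


def next_state(state: State, reset: bool, kind: str = "alu") -> State:
    """Return the state for the next clock cycle."""
    if reset:
        return State.RESET
    if state is State.RESET:
        return State.FETCH
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return EXEC_STATE[kind]      # decode selects the instruction-specific state
    return State.FETCH               # last execute state returns to fetch


def trace(kinds):
    """Run one fetch/decode/execute round per instruction kind, starting from reset."""
    state = State.RESET
    seen = [state]
    for kind in kinds:
        for _ in range(3):
            state = next_state(state, reset=False, kind=kind)
            seen.append(state)
    return seen


print([s.name for s in trace(["alu", "mem"])])
```

Each call to `next_state` corresponds to one clock edge, matching the one-state-per-cycle design.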
Figure 4.6.1 Block diagram of control and decoder
Figure 4.6.2 state diagram for the control unit
Fig 4.6.3: Simulation Results Of Control and Decode
Figure 4.6.4: Top Block Of Control and Decoder

Synthesis Report of Control and Decoder

/******************************* Final Report ******************************/
RTL Top Level Output File Name : ctrlanddecode.ngr
Top Level Output File Name : ctrlanddecode
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 41
Cell Usage:
# BELS : 20
# GND : 1
# LUT2 : 4
# LUT3 : 9
# LUT4 : 6
# IO Buffers : 41
# IBUF : 20
# OBUF : 21
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 11 out of 6144 (0%)
Number of 4 input LUTs : 19 out of 12288 (0%)
Number of IOs : 41
Number of bonded IOBs : 41 out of 240 (17%)
Timing Summary
Speed Grade : -12
Minimum period : No path found
Minimum input arrival time before clock : No path found
Maximum output required time after clock : No path found
Maximum combinational path delay : 5.694ns
4.6.2 General Purpose Register

General purpose registers (GPRs) store operands and results during program execution. The ALU and the memories must be able to read and write these registers, so a set of sixteen 32-bit registers is used, together with multiplexers and the control and decoder unit, which determine which register is read or written. The two registers selected for reading supply the operands to the ALU, which performs the operation. Figure 4.6.5 shows the block diagram of the general purpose registers, and the simulation results are shown in Figure 4.6.6.
Figure 4.6.5 Block Diagram of General Purpose Register
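A behavioral sketch of such a register block is shown below. The class name, the method names, and the write-enable argument are assumptions made for this example; only the sixteen 32-bit registers and the two operand outputs come from the description above.

```python
class RegisterFile:
    """Sixteen 32-bit general purpose registers with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 16

    def read(self, sel_a: int, sel_b: int):
        """Return the two operands chosen by the 4-bit select values."""
        return self.regs[sel_a & 0xF], self.regs[sel_b & 0xF]

    def write(self, sel_d: int, value: int, write_enable: bool = True) -> None:
        """Store a value, truncated to 32 bits, when write_enable is set."""
        if write_enable:
            self.regs[sel_d & 0xF] = value & 0xFFFFFFFF


rf = RegisterFile()
rf.write(3, 0x1234)
rf.write(4, 0x1_0000_0001)     # truncated to 32 bits
print(rf.read(3, 4))           # (4660, 1)
```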
Figure 4.6.6: Simulated Timing diagram of General Purpose Registers
Figure 4.6.7: Top Block of General Purpose Registers

Synthesis Report of General Purpose Registers

/************************* Final Report of Register Block *******************/
RTL Top Level Output File Name : RegBlock.ngr
Top Level Output File Name : RegBlock
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 550
Cell Usage:
# BELS : 18
# LUT4 : 16
# FlipFlops/Latches : 512
# Clock Buffers : 1
# IO Buffers : 549
# IBUF : 37
# OBUF : 512
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 9 out of 6144 (0%)
Number of 4 input LUTs : 18 out of 12288 (0%)
Number of IOs : 550
Number of bonded IOBs : 550 out of 240 (229%) (*)
IOB Flip Flops : 512
Number of GCLKs : 1 out of 32 (3%)
Timing Summary
Speed Grade : -12
Minimum period : No path found
Minimum input arrival time before clock : 2.722ns
Maximum output required time after clock : 3.753ns
Maximum combinational path delay : No path found
4.6.3 Instruction Register

The instruction register stores the instruction that is read from the program memory and holds it as input for the decoder, which separates the operation code, source registers, operand address, and operands. These values are passed to the general purpose registers, the multiplexers, and the ALU to execute the instruction. This is achieved simply by using buffers to transfer data to and from the processor. Figure 4.6.8 shows the block diagram of the instruction register, and the simulation results are shown in Figure 4.6.9.
Fig 4.6.8 Block Diagram of Instruction Register
Figure 4.6.9: Simulated Timing diagram of Instruction Register

Figure 4.6.10: Top Block of Instruction Register
Synthesis Report of Instruction Register

/********************* Final Report of Instruction Register ***************/
RTL Top Level Output File Name : InstructionRegister.ngr
Top Level Output File Name : InstructionRegister
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 67
Cell Usage:
# BELS : 1
# INV : 1
# FlipFlops/Latches : 32
# FDCE : 32
# Clock Buffers : 1
# BUFGP : 1
# IO Buffers : 66
# IBUF : 34
# OBUF : 32
# FDCE : 32
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 1 out of 6144 (0%)
Number of Slice Flip-flops : 32 out of 12288 (0%)
Number of 4 input LUTs : 1 out of 12288 (0%)
Number of IOs : 67
Number of bonded IOBs : 67 out of 240 (27%)
IOB Flip Flops : 32
Number of GCLKs : 1 out of 32 (3%)
Timing Summary
Speed Grade : -12
Minimum period : No path found
Minimum input arrival time before clock : 1.849ns
Maximum output required time after clock : 3.793ns
Maximum combinational path delay : No path found
4.6.4 Program Counter

The program counter produces the address used to fetch instructions from the program memory. It has to be capable of loading an arbitrary address if the program requires it (e.g., for loops or branches), and it should be able to wait while the other functional parts complete their tasks (e.g., while the ALU computes the sum of two registers). Figure 4.6.11 shows the block diagram of the program counter, and the simulation result is shown in Figure 4.6.12.
Figure 4.6.11 Block Diagram of Program Counter
Figure 4.6.12: Simulated Timing diagram of Program counter
Figure 4.6.13 Top Block of Program Counter
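The three behaviors of the program counter (increment, load, and wait) can be modeled roughly as follows. The signal names `load`, `hold`, and `reset`, their priority, and the default 16-bit width are assumptions made for this sketch.

```python
class ProgramCounter:
    """Counter that increments each cycle unless it is reset, held, or loaded."""

    def __init__(self, width: int = 16):
        self.mask = (1 << width) - 1
        self.value = 0

    def tick(self, load: bool = False, target: int = 0,
             hold: bool = False, reset: bool = False) -> int:
        """Advance one clock cycle and return the new address."""
        if reset:
            self.value = 0
        elif hold:
            pass                                    # wait for other units to finish
        elif load:
            self.value = target & self.mask         # branch or loop target
        else:
            self.value = (self.value + 1) & self.mask
        return self.value


pc = ProgramCounter()
print(pc.tick())                          # 1
print(pc.tick(hold=True))                 # 1
print(pc.tick(load=True, target=0x40))    # 64
print(pc.tick())                          # 65
```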
Synthesis Report of Program Counter

/******************** Final Report of Program Counter ********************/
RTL Top Level Output File Name : ProgramCounter.ngr
Top Level Output File Name : ProgramCounter
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 35
Cell Usage:
# BELS : 50
# GND : 1
# LUT3 : 16
# FlipFlops/Latches : 16
# Clock Buffers : 1
# IO Buffers : 34
# IBUF : 18
# OBUF : 16
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 8 out of 6144 (0%)
Number of Slice Flip Flops : 16 out of 12288 (0%)
Number of 4 input LUTs : 18 out of 12288 (0%)
Number of IOs : 35
Number of bonded IOBs : 35 out of 240 (14%)
Number of GCLKs : 1 out of 32 (3%)
Timing Summary
Speed Grade : -12
Minimum period : 1.912ns (Maximum Frequency: 523.067MHz)
Minimum input arrival time before clock : 2.621ns
Maximum output required time after clock : 3.806ns
Maximum combinational path delay : No path found
4.6.5 Multiplexer (2:1)

The multiplexer receives two inputs and produces a single output, either the program counter address or the opcode address, depending on the selection line. Figure 4.6.14 shows the block diagram of the multiplexer (2:1), and the simulation results are shown in Figure 4.6.15.
Figure 4.6.14 Block Diagram of Multiplexer (2:1)
Figure 4.6.15: Simulated Timing diagram of Multiplexer (2:1)
Fig 4.6.16: Top Block of Multiplexer (2:1)
Synthesis Report of Multiplexer (Mux)

/******************** Final Report of Mux ********************/
RTL Top Level Output File Name : Mux.ngr
Top Level Output File Name : Mux
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 49
Cell Usage:
# BELS : 16
# LUT3 : 16
# IO Buffers : 49
# IBUF : 33
# OBUF : 16
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 9 out of 6144 (0%)
Number of 4 input LUTs : 16 out of 12288 (0%)
Number of IOs : 49
Number of bonded IOBs : 49 out of 240 (20%)
4.6.6 Multiplexer A (16:1) (Mux A)
Mux A receives multiple inputs and produces a single output, which serves as an ALU operand, based on the selection lines. It has sixteen 32-bit inputs. Figure 4.6.17 shows the block diagram of Mux A, and the simulation results are shown in Figure 4.6.18.
Figure 4.6.17: Block Diagram of Mux A (16:1)
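Behaviorally, a 16:1 multiplexer of 32-bit words is just an indexed selection, as the following sketch shows. The function name `mux16` and the list representation of the inputs are assumptions made for this example.

```python
def mux16(inputs, sel: int) -> int:
    """Select one of sixteen 32-bit inputs using a 4-bit select value."""
    if len(inputs) != 16:
        raise ValueError("a 16:1 multiplexer needs exactly 16 inputs")
    return inputs[sel & 0xF] & 0xFFFFFFFF


registers = [i * 0x11111111 for i in range(16)]   # example register contents
print(hex(mux16(registers, 0x3)))   # 0x33333333
print(hex(mux16(registers, 0xF)))   # 0xffffffff
```

In the processor, the sixteen inputs would be the general purpose register outputs and the select value would come from a register field of the instruction.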
Figure 4.6.18 : Simulated Timing diagram of MuxA(16:1)
Figure 4.6.19: Top Block of MuxA (16:1)
Synthesis Report of Mux A

/********************* Final Report of Module MuxA (16:1) ****************/
RTL Top Level Output File Name : MuxA1.ngr
Top Level Output File Name : MuxA1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 548
Cell Usage:
# BELS : 480
# LUT3 : 256
# IO Buffers : 548
# IBUF : 516
# OBUF : 32
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 128 out of 6144 (2%)
Number of 4 input LUTs : 256 out of 12288 (2%)
Number of IOs : 548
Number of bonded IOBs : 548 out of 240 (228%) (*)
Timing Summary
Speed Grade : -12
Minimum period : No path found
Minimum input arrival time before clock : No path found
Maximum output required time after clock : No path found
Maximum combinational path delay : 6.894ns
4.6.7 Memory
The processor is designed with a load/store architecture. Separate memories for instructions (program) and data allow different stages of the pipeline to access memory simultaneously. This Harvard style of architecture can be used either with two completely different memory spaces or with a single dual-port memory space in which data and instructions are kept separate.
ROM program memory: The program memory, as its name suggests, stores the instructions to be executed. It has to be non-volatile and fast. Internal ROM was chosen as program memory because it was the fastest option and eliminated the need for external storage.
RAM data memory: The RAM is a data storage block where the stack is handled and other data are kept as variables.
Figure 4.6.20 shows the block diagram of the memory, and the simulation results are shown in Figure 4.6.21.
Figure 4.6.20 Block Diagram of Memory
Fig 4.6.21: Simulated Timing diagram of Memory
Figure 4.6.22 Top Block of Memory
Synthesis Report of Memory

/************************** Final Report of Module Memory *************/
RTL Top Level Output File Name : Memory.ngr
Top Level Output File Name : Memory
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 50
Cell Usage:
# BELS : 499
# LUT2 : 3
# LUT3 : 256
# LUT4 : 16
# FlipFlops/Latches : 512
# Clock Buffers : 16
# BUFG : 16
# IO Buffers : 38
# IBUF : 6
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 422 out of 6144 (6%)
Number of Slice Flip Flops : 512 out of 12288 (4%)
Number of 4 input LUTs : 275 out of 12288 (2%)
Number of IOs : 50
Number of bonded IOBs : 38 out of 240 (15%)
Number of GCLKs : 16 out of 32 (50%)
Timing Summary
Speed Grade : -12
Minimum period : No path found
Minimum input arrival time before clock : 3.090ns
Maximum output required time after clock : 5.353ns
Maximum combinational path delay : 6.940ns
4.6.8 Multiplexer (16:1) D (Mux D)

Mux D receives multiple inputs and produces a single output based on the selection lines. The output is the second ALU operand. It has 16 inputs. Figure 4.6.23 shows the block diagram of Mux D, and the simulation results are shown in Figure 4.6.25.
Figure 4.6.23: Block Diagram of MuxD
Fig 4.6.24: Top Block of MuxD
Figure 4.6.25: Simulated Timing diagram of MuxD
Synthesis Report of Mux D

/****************************** Final Report of Module MuxD ******************/
RTL Top Level Output File Name : MuxD1.ngr
Top Level Output File Name : MuxD1
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 548
Cell Usage:
# BELS : 480
# LUT3 : 256
# MUXF5 : 128
# MUXF6 : 64
# MUXF7 : 32
# IO Buffers : 548
# IBUF : 516
# OBUF : 32
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 128 out of 6144 (2%)
Number of 4 input LUTs : 256 out of 12288 (2%)
Number of IOs : 548
Number of bonded IOBs : 548 out of 240 (228%) (*)
Timing Summary
Speed Grade : -12
Minimum period : No path found
Minimum input arrival time before clock : No path found
Maximum output required time after clock : No path found
Maximum combinational path delay : 6.894ns
4.6.9 Arithmetic logical unit (ALU)

The arithmetic logic unit has 12 operations. Each operation was created and converted into a symbol, and a multiplexer with a 4-bit selector then chooses among them. The ALU design comprises two units: one for logic operations and one for arithmetic operations, as shown in Table 2.
Figure 4.6.26: Block Diagram of ALU
Figure 4.6.27: Simulated Timing diagram of ALU
Figure 4.6.26 shows the block diagram of the ALU, and the simulation result is shown in Figure 4.6.27.
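The selector-driven structure of the ALU can be illustrated with a small dispatch table. The sketch is deliberately incomplete and hypothetical: the operation encodings below are invented for illustration and cover only eight operations, since the actual twelve operations and their 4-bit codes are those listed in Table 2.

```python
MASK32 = 0xFFFFFFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n positions."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


# Hypothetical 4-bit selector encoding; not the encoding used by the processor.
ALU_OPS = {
    0x0: lambda a, b: (a + b) & MASK32,          # arithmetic unit: add
    0x1: lambda a, b: (a - b) & MASK32,          # arithmetic unit: subtract
    0x2: lambda a, b: a & b,                     # logic unit: AND
    0x3: lambda a, b: a | b,                     # logic unit: OR
    0x4: lambda a, b: a ^ b,                     # logic unit: XOR
    0x5: lambda a, b: (a << (b & 31)) & MASK32,  # logical shift left
    0x6: lambda a, b: a >> (b & 31),             # logical shift right
    0x7: lambda a, b: rol32(a, b),               # rotate left
}


def alu(sel: int, a: int, b: int) -> int:
    """Apply the operation chosen by the 4-bit selector to two 32-bit operands."""
    return ALU_OPS[sel & 0xF](a & MASK32, b & MASK32)


print(hex(alu(0x0, 0xFFFFFFFF, 1)))   # 0x0
print(hex(alu(0x7, 0x80000000, 1)))   # 0x1
```

The dictionary plays the role of the output multiplexer: every operation is available, and the selector picks which result is forwarded. Selector values without an entry raise a KeyError in this sketch.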
Figure 4.6.28: Top Module for ALU
Synthesis Report of ALU

/********************** Final Report of Module ALU ***************************/
RTL Top Level Output File Name : ALU.ngr
Top Level Output File Name : ALU
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 102
Cell Usage:
# BELS : 722
# GND : 1
# INV : 1
# LUT2 : 50
# LUT3 : 76
# LUT4 : 12
# MUXCY : 38
# MUXF5 : 112
# VCC : 1
# XORCY : 31
# Flip-flops/Latches : 64
# FDC : 64
# Clock Buffers : 1
# BUFGP : 1
# IO Buffers : 101
# IBUF : 69
# OBUFT : 32
# DSPs : 22
# DSP48 : 22
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 296 out of 6144 (4%)
Number of 4 input LUTs : 539 out of 12288 (4%)
Number of IOs : 102
Number of bonded IOBs : 102 out of 240 (42%)
IOB Flip Flops : 64
Number of GCLKs : 1 out of 32 (3%)
Number of DSP48s : 22 out of 32 (68%)
Timing Summary
Speed Grade : -12
Minimum period : No path found
Minimum input arrival time before clock : 10.531ns
Maximum output required time after clock : 3.793ns
Maximum combinational path delay : No path found
4.7 Design Summary of various Modules
Area is reported as the number of LUTs, since the processor is designed on a Programmable SoC Spartan 3 board, and the operating frequency is reported in MHz.
Item                        Area (No. of 4-I/P LUTs)    Operating Frequency (MHz)
Control & Decoder           539 out of 12288            95
General Purpose Register    18 out of 12288             367
Instruction Register        1 out of 12288              540
Program Counter             16 out of 12288             381
Memory                      275 out of 12288            323
ALU                         539 out of 12288            94
Table 4.1: Summary of various modules
CHAPTER 5
The cryptographic processor performs instruction fetch, instruction decode, and execute all in one clock cycle. First, the PC value is used as an address to index the instruction memory, which supplies the 32-bit value of the next instruction to be executed. This instruction is then divided into its fields. The instruction's opcode field, bits [31-26], is sent to the control unit to determine the type of instruction to execute. The type of instruction then determines which control signals are asserted and what function the ALU performs, thus decoding the instruction. The register address fields rs, bits [25-21], rt, bits [20-16], and rd, bits [15-11], are used to address the register file. The register file supports two independent register reads and one register write in one clock cycle. It reads the requested addresses and outputs the data values contained in those registers. The ALU, whose operation is determined by the control unit, then operates on these values to compute a memory address (e.g., load or store), compute an arithmetic result (e.g., add, sub), or perform a comparison (e.g., branch). If the decoded instruction is arithmetic, the ALU result must be written to a register. If the decoded instruction is a load or a store, the ALU result is used to address the data memory. The final step writes the ALU result or the memory value back to the register file.
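The field extraction described above amounts to shifting and masking the 32-bit instruction word. The sketch below uses the bit ranges given in the text; the function name `decode_fields` and the example instruction word are assumptions made for illustration.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit instruction word into opcode, rs, rt and rd fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits [31-26]
        "rs": (instr >> 21) & 0x1F,       # bits [25-21]
        "rt": (instr >> 16) & 0x1F,       # bits [20-16]
        "rd": (instr >> 11) & 0x1F,       # bits [15-11]
    }


# Hypothetical instruction word built from example field values.
word = (0x2A << 26) | (3 << 21) | (4 << 16) | (5 << 11)
print(decode_fields(word))   # {'opcode': 42, 'rs': 3, 'rt': 4, 'rd': 5}
```

In hardware the same split costs nothing: each field is simply a group of wires taken from the instruction register.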
Once the Verilog implementation of the cryptographic processor is complete, the next task is to pipeline it. Pipelining, a standard feature in RISC processors, is a technique used to improve both clock speed and overall performance. It allows a processor to work on different steps of several instructions at the same time, so more instructions can be executed in a shorter period of time. For example, in the single-cycle Verilog implementation, the data path is divided into modules, and each module must wait for the previous one to finish before it can execute, so one instruction completes in one long clock cycle. When the processor is pipelined, each of those modules, or stages, is in use during every clock cycle, executing different instructions in parallel. The block diagram is shown in Figure 5.1, and the simulation results are shown in Figure 5.3.
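The benefit of pipelining can be estimated with a simple timing model. This is an idealized sketch that ignores hazards and stalls; the function names and the stage delays used in the example are hypothetical and are not measurements of this design.

```python
def single_cycle_time_ns(n_instr: int, stage_delays_ns) -> float:
    """Each instruction takes one long cycle covering all stages in sequence."""
    return n_instr * sum(stage_delays_ns)


def pipelined_time_ns(n_instr: int, stage_delays_ns) -> float:
    """Ideal pipeline: the clock is set by the slowest stage, and
    k stages need k + n - 1 cycles to finish n instructions."""
    if n_instr == 0:
        return 0.0
    k = len(stage_delays_ns)
    return (k + n_instr - 1) * max(stage_delays_ns)


delays = [4.0, 3.0, 5.0]   # hypothetical delays of three stages, in ns
print(single_cycle_time_ns(100, delays))   # 1200.0
print(pipelined_time_ns(100, delays))      # 510.0
```

For long instruction sequences the speed-up approaches the ratio of the summed stage delays to the slowest stage delay, which is why balanced stages matter.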
Figure 5.1: Block Diagram of Top Module
Figure 5.2 Top Block Of Top Module
Figure 5.3: Simulated Timing diagram of Top Module
Synthesis Report of Cryptographic processor

/**************** Final Report Of Cryptographic Processor ***************/
Final Results
RTL Top Level Output File Name : TopModule.ngr
Top Level Output File Name : TopModule
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 54
Cell Usage:
# BELS : 1953
# FlipFlops/Latches : 593
# FDC : 49
# Clock Buffers : 2
# BUFG : 2
# IO Buffers : 54
# IBUF : 4
# OBUF : 18
# OBUFT : 32
# DSPs : 22
Device utilization summary
Selected Device : 4vlx15sf363-12
Number of Slices : 843 out of 6144 (13%)
Number of Slice Flip Flops : 593 out of 12288 (4%)
Number of 4 input LUTs : 1315 out of 12288 (10%)
Number of IOs : 54
Number of bonded IOBs : 54 out of 240 (22%)
Number of GCLKs : 2 out of 32 (6%)
Number of DSP48s : 22 out of 32 (68%)
Timing Summary
Speed Grade : -12
Minimum period : 10.792ns (Maximum Frequency: 92.659MHz)
Minimum input arrival time before clock : 15.683ns
Maximum output required time after clock : 5.068ns
Maximum combinational path delay : 6.509ns
Total memory usage : 279800 kilobytes
Total equivalent gate count for design : 14,518 gates
Design Summary of TOP Module Cryptography Processor

Table 5.1: Summary of top module

Item                                   Area (No. of 4-input LUTs)   Operating Frequency (MHz)
Cryptography processor (Top Module)    1315 out of 12288            197
CHAPTER 6
CONCLUSION AND FUTURE WORK
A 32-bit cryptographic processor that performs the mathematical computations used in symmetric-key algorithms has been designed in Verilog and simulated with the Active-HDL simulator. The design was verified through exhaustive simulations. The processor architecture executes one instruction per clock cycle. The cryptographic processor bears out the principle that 20% of the instructions do 80% of the work. Concentrating on those instructions improves overall speed while keeping area and propagation delay low.
Future Work
A more sophisticated architecture would require some advanced pipelining techniques. The processor could also be extended to perform floating-point operations and solve differential equations. Apart from this, it can be used in portable gaming kits, smart cards, and ATMs.
