
AES encryption and decryption on GPU

Presented by: S ROY

In this case study we take up integer stream processing on the GPU. With the new GeForce 8 Series GPUs, several new extensions and functions have been introduced to GPU programming. The new integer processing features include not only arithmetic operations but also bitwise logical operations (such as AND and OR) and right/left shift operations. Array parameters and the new texture-buffer object provide a flexible way of referring to integer-indexed tables.

With the new "transform feedback mode," it is now possible to store our results without the need to render to textures or pixel buffers. Several block-cipher modes of operation are also considered here.

New Functions for Integer Stream Processing

Transform Feedback Mode

Features of transform feedback mode

The GL's target parameter is changed to GL_TRANSFORM_FEEDBACK_BUFFER_NV.
We need to specify the output attributes and whether each of them is output into a separate buffer object or they are all output interleaved into a single buffer object.
The output buffer must be bound through special new API calls.
Rasterization can also be optionally disabled.

GPU Program Extensions


Two features are used:
When declaring a register, we can either specify its type, such as FLOAT or INT, or just leave it typeless.
We can refer to tables using an integer index, array parameters, or one of the newly introduced texture-buffer objects.

An Overview of the AES Algorithm

The AES algorithm is currently the standard block-cipher algorithm; it has replaced the Data Encryption Standard (DES). A rough summary of the requirements made by NIST for the new AES is the following:

Symmetric-key cipher
Block cipher
Support for 128-bit block sizes
Support for 128-, 192-, and 256-bit key lengths

The AES cipher operates as follows:

The encryption step uses a key to convert the data into an unreadable ciphertext, and the decryption step then uses the same key to convert the ciphertext back into the original data. This type of key is a symmetric key; other algorithms require a different key for encryption and decryption.

The precise steps involved in the algorithm

In cryptography, algorithms such as AES are called product ciphers. For this class of ciphers, encryption is done in rounds, where each round's processing is accomplished using the same logic. Many of these product ciphers, including AES, change the cipher key at each round. Each of these round keys is determined by a key schedule, which is generated from the cipher key given by the user.
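
The way a key schedule yields one round key per round can be sketched in Python for the 128-bit case. This is a minimal illustration that follows the key expansion in the AES standard (FIPS-197); it is not the code used in this case study. The helper names (gf_mul, build_sbox, expand_key_128) are made up for the example, and the S-box is computed by brute force instead of being stored as a constant table.

```python
# Sketch of AES-128 key expansion as specified in FIPS-197.
# All function names here are illustrative, not taken from the case study.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high_bit = a & 0x80
        a = (a << 1) & 0xFF
        if high_bit:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    """Build the AES S-box: field inverse followed by the affine transform."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0  # by convention, 0 maps to 0
        if x:
            inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):  # XOR in the four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox


# Round constants for the ten AES-128 rounds.
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key_128(key):
    """Return the 11 round keys (16 bytes each) for a 16-byte cipher key."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    sbox = build_sbox()
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]       # RotWord: rotate left by one byte
            temp = [sbox[b] for b in temp]   # SubWord: S-box on each byte
            temp[0] ^= RCON[i // 4 - 1]      # add the round constant
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]


if __name__ == "__main__":
    cipher_key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    for number, round_key in enumerate(expand_key_128(cipher_key)):
        print(number, round_key.hex())
```

The first round key is the cipher key itself, which is what the initial AddRoundKey step consumes; the remaining ten are used by the ten rounds.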

The AES Implementation on the GPU


The code given throughout this chapter uses C-style macros and comments to improve readability.

Head of the AES Cipher Vertex Program

Program Parameters for Arguments and Constant Tables

In this application we expand the cipher key using the CPU and store the key schedule in the GPU program-local parameters.

Input/Output and the State

AES encryption operates over a two-dimensional array of bytes, called the state.

During the input step, we slice our data into sequential blocks of 16 bytes and unpack them into 4x4 arrays that we push onto the GPU's registers.
Finally, during the output step, we pack these 4x4 arrays back into sequential blocks of 16 bytes and stream the results back to the transform feedback buffer.
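
The slicing, unpacking, and packing steps can be sketched on the CPU in Python. This is only an illustration: it assumes the column-major state layout of the AES standard (FIPS-197), where byte r + 4c of a block lands in row r, column c. How the GPU program actually lays the bytes out in its registers is not shown here, and the function names are hypothetical.

```python
# Sketch: slice a byte stream into 16-byte blocks and convert each block
# to and from a 4x4 state. Layout assumption: FIPS-197 column-major order.

def blocks_of(data, size=16):
    """Split data into consecutive blocks; the length must be a multiple of size."""
    if len(data) % size:
        raise ValueError("data length must be a multiple of the block size")
    return [data[i:i + size] for i in range(0, len(data), size)]


def unpack_state(block):
    """Turn a 16-byte block into a 4x4 list of rows (state[row][col])."""
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def pack_state(state):
    """Turn a 4x4 state back into a 16-byte block (inverse of unpack_state)."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


if __name__ == "__main__":
    plaintext = bytes(range(32))  # two blocks of sample data
    for block in blocks_of(plaintext):
        state = unpack_state(block)
        assert pack_state(state) == block  # the round trip is lossless
        print(state)
```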

Initialization
During the initialization stage, we do an AddRoundKey operation, which XORs the state with the round key, as determined by the key schedule.
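
As a minimal sketch, the XOR performed by AddRoundKey looks like this in Python. The 4x4 list-of-rows representation and the function name add_round_key are assumptions made for the example.

```python
# Sketch: AddRoundKey as a byte-wise XOR of two 4x4 arrays.

def add_round_key(state, round_key):
    """XOR every state byte with the round-key byte at the same position."""
    return [[s ^ k for s, k in zip(state_row, key_row)]
            for state_row, key_row in zip(state, round_key)]


if __name__ == "__main__":
    state = [[0xFF] * 4 for _ in range(4)]
    round_key = [[0x0F] * 4 for _ in range(4)]
    mixed = add_round_key(state, round_key)
    print(mixed)
    # XOR is its own inverse, so applying the same key again restores the state.
    assert add_round_key(mixed, round_key) == state
```

Because XOR is its own inverse, decryption can undo this step with the same round key.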

Rounds
A round for the AES algorithm consists of four operations: the SubBytes operation, the ShiftRows operation, the MixColumns operation, and the previously mentioned AddRoundKey operation.

The SubBytes Operation


The SubBytes operation substitutes bytes independently, in a black-box fashion, using a nonlinear substitution table called the S-box.
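
The table lookup can be sketched in Python as follows. The S-box is derived here from its definition in the AES standard (multiplicative inverse in GF(2^8) followed by an affine transform) to keep the example self-contained; an implementation would normally store the 256 entries as a constant table. The function names are illustrative.

```python
# Sketch: build the AES S-box from its FIPS-197 definition and apply SubBytes.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high_bit = a & 0x80
        a = (a << 1) & 0xFF
        if high_bit:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    """Return the 256-entry S-box as a list indexed by the input byte."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0  # by convention, 0 maps to 0
        if x:
            inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):  # XOR in the four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox


def sub_bytes(state, sbox):
    """Replace every byte of the 4x4 state by its S-box entry."""
    return [[sbox[b] for b in row] for row in state]


if __name__ == "__main__":
    table = build_sbox()
    print(hex(table[0x53]))  # FIPS-197 uses {53} -> {ed} as its example
    print(sub_bytes([[0x00, 0x01, 0x53, 0xFF]] * 4, table))
```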

The ShiftRows Operation


The ShiftRows operation shifts the last three rows of the state cyclically, effectively scrambling row data.
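
A minimal Python sketch of the cyclic shift, assuming the state is stored as a list of four rows: row r is rotated left by r positions, so the first row stays in place. The function name is illustrative.

```python
# Sketch: ShiftRows on a 4x4 state stored as a list of rows.

def shift_rows(state):
    """Rotate row r of the state left by r positions (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


if __name__ == "__main__":
    state = [[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]
    for row in shift_rows(state):
        print(row)
```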

The MixColumns Operation
The next step is the MixColumns operation, which has the purpose of scrambling the data of each column.
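
To make the column scrambling concrete, here is a Python sketch based on the definition in the AES standard: each column is multiplied by a fixed matrix with entries 2, 3, 1, 1 over GF(2^8). It uses only shifts and XORs, the kind of integer operations this case study relies on, but it is not the GPU code itself, and the function names are assumptions.

```python
# Sketch: MixColumns per FIPS-197, using shifts and XORs only.

def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def mix_column(col):
    """Mix one 4-byte column; 3*a is computed as xtime(a) ^ a."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_columns(state):
    """Apply mix_column to every column of a 4x4 state (state[row][col])."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```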

The AddRoundKey Operation
This operation determines the current round key from the key schedule and XORs it with the state. As an optimization we can also combine the MixColumns and AddRoundKey operations into a single subroutine.
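
One possible shape for such a combined subroutine is sketched below in Python. It is an assumption about how the two steps could be fused, not the subroutine from the original GPU program. It relies on a common rewriting of MixColumns: with t = a0 ^ a1 ^ a2 ^ a3, output byte i equals a[i] ^ t ^ xtime(a[i] ^ a[i+1]), so the round-key byte can be XORed in within the same expression.

```python
# Sketch: MixColumns and AddRoundKey fused for one column.
# The fusion shown here is illustrative, not the original GPU subroutine.

def xtime(b):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def mix_column_add_key(col, key_col):
    """Mix one 4-byte column and XOR in the matching round-key column."""
    t = col[0] ^ col[1] ^ col[2] ^ col[3]
    return [
        col[i] ^ t ^ xtime(col[i] ^ col[(i + 1) % 4]) ^ key_col[i]
        for i in range(4)
    ]


if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]
    # With an all-zero key column, the result is plain MixColumns.
    print([hex(b) for b in mix_column_add_key(column, [0, 0, 0, 0])])
```

Fusing the steps saves one pass over the state per round, since the key bytes are applied while the mixed column is still in hand.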

Performance
Tests were performed on a test machine with the following specifications:
CPU: Pentium 4, 3 GHz, 2 MB Level 2 cache
Memory: 1 GB
Video: GeForce 8800 GTS 640 MB
System: Linux 2.6, Driver 97.46

Vertex Program vs. Fragment Program

Results were obtained by processing a plaintext of 128 MB filled with random numbers and averaging measurements from ten runs. The throughput for the vertex program is 53 MB/sec, whereas for the fragment program, the throughput is 95 MB/sec with a batch size of 1 MB.
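
For a sense of scale, the reported throughputs can be turned into time per run with a few lines of Python. The figures are the ones quoted above; the per-run times and the ratio are derived from them here, not measured.

```python
# Sketch: derive time per 128 MB run from the reported throughputs.

PLAINTEXT_MB = 128
THROUGHPUT_MB_PER_SEC = {"vertex": 53, "fragment": 95}


def seconds_per_run(size_mb, mb_per_sec):
    """Time needed to process size_mb at a steady rate of mb_per_sec."""
    return size_mb / mb_per_sec


if __name__ == "__main__":
    for name, rate in THROUGHPUT_MB_PER_SEC.items():
        elapsed = seconds_per_run(PLAINTEXT_MB, rate)
        print(f"{name} program: {elapsed:.2f} s per run")
    ratio = THROUGHPUT_MB_PER_SEC["fragment"] / THROUGHPUT_MB_PER_SEC["vertex"]
    print(f"fragment/vertex throughput ratio: {ratio:.2f}")
```

This works out to roughly 2.4 seconds per run for the vertex program and 1.3 seconds for the fragment program, a ratio of about 1.8.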
