Cipher EX V1.2 - CodeProject

Cipher EX V1.2
John Underhill, 25 Dec 2014 CPOL
   4.94 (28 votes)

Twofish 512, Serpent 512, Rijndael 512, the HX series, and
Super-Ciphers

Download CEX_Free.zip - 1.7 MB

Introduction
What follows is the product of my study of several encryption
algorithms. I decided to write this library out of a desire to
learn more about them, and about encryption in general. I have
in the past adapted classes from popular libraries like Mono
and Bouncy Castle, but this time I wanted to write my own
implementations, ones that were optimized for the C#
language, and possibly faster and more flexible than these
popular C# versions. As I was writing the base classes, I also
began thinking about various attack vectors and how they
might be mitigated, and also how the existing primitives
might be improved upon from a security perspective.

It is important to note that, when using the base ciphers with
their original key sizes, output from those classes will be
exactly the same as any other valid implementation of that
cipher; RDX (Rijndael) with a 256 bit key is Rijndael, just as
TFX (Twofish) with a standard key size is Twofish, and SPX
(Serpent) is a valid Serpent implementation. This is verified
by testing: the Tests section contains the most complete and
authoritative test suites available for each of these ciphers. So
if you choose to stick with standard key lengths, you can use
configurations that have been thoroughly cryptanalyzed.

One has to consider that these ciphers were designed more
than 17 years ago; at the time, Windows 95 was the
predominant operating system, and computer hardware was

http://www.codeproject.com/Articles/828477/Cipher-EX-V?display=Print 26.12.2014

quite primitive by today's standards. So, concessions had to
be made in cipher design with regard to speed and memory
footprint. We are not so constrained by the hardware of
today, so adding rounds to a cipher, or using a larger key size,
is less of a consideration now, and will have even less impact
in the future.

Speed remains an important design criterion with this project.
The CTR mode and the decryption function of the CBC mode
have been parallelized. If a block size of 1024 bytes is passed
to the mode, and the hardware utilizes multiple processor
cores, the processing is automatically parallelized. On my
middle tier quad-core Acer, I have reached speeds of over 6
gigabytes per minute with this library using Rijndael, making
this by far the fastest implementation of these ciphers in the
C# language that I have found.

I definitely have some strong reservations about publishing
this code, not the least of which is that it is likely to spawn a
number of so-called 'AES 512' copies by people who may
not understand enough about the algorithms to
evaluate, produce, or maintain secure encryption software.
It's kind of a quandary though: if I leave it on GitHub, no one
will ever see it; if I publish it, chances are it could be used
irresponsibly. I would urge anyone considering using one of
the extended algorithms to study the work, thoroughly
evaluate the implementations, and make an informed choice.

As for my part, I wrote these implementations based on
well-known versions, and made as few changes to the ciphers as
possible to extend the key size. I have confidence in the
library itself, because I took care to test it every step of the
way, and feel I am developing a good understanding of the
cryptographic primitives used in its construction. This should,
however, be considered as what it was intended to be: an
experiment.

I hope to expand this library in the future, as I continue my
exploration of encryption technologies, and I welcome input
from cryptographers and programmers. If you have a
comment or concern, I'd be glad to hear from you. My goals
include moving what I feel are the best and strongest
implementations to a Java library.

Before downloading the source files, it is your responsibility to
check if these extended key lengths (512 bit and higher) are
legal in your country. If you use this code, please do so
responsibly and in accordance with the law in your region.

For a full-featured implementation of these algorithms,
including key management, authentication controls,
anti-tampering measures, encrypted assets, a very cool interface,
and many more features, check out the version on my
website: CEX on vtdev.com.

Library Components
The library contains the following components. As it evolves,
some will be added, some removed, and when possible,
changes will be made to improve performance, security,
and documentation.

Encryption Engines

Base Algorithms

The three base ciphers, Rijndael, Serpent, and Twofish, have
all undergone thorough testing to ensure that they align with
valid implementations that use a smaller maximum key size.
The same algorithms are used to transform data at any key
size; only the key schedule itself has been extended. (A key
schedule takes a small user key and expands it into a larger
working array, used in the rounds function to create a unique
output.) These changes to the key schedule, and a flexible
rounds assignment, increase the potential security of the
cipher, make it more difficult to cryptanalyze, and more
resistant to brute force attacks.

• RDX: (Rijndael) This is an implementation of the
Rijndael algorithm used in AES, extended to a 512 bit
key.
• SPX: (Serpent) An implementation of the Serpent
encryption algorithm, extended to accept a 512 bit key.
• TFX: (Twofish) An implementation of the Twofish
encryption algorithm, extended to accept a 512 bit key.

HX Ciphers: Hash based Key Schedules

The HX Series Ciphers use the identical encryption and
decryption algorithms (transforms) of the standard ciphers;
the difference is that the key schedule has been replaced by a
Hash based Key Derivation Function (HKDF). The HKDF is
powered by a SHA512 HMAC, and is one of the most
cryptographically strong methods available to generate
pseudo-random output. The HKDF based key schedule takes a
minimum of 192 bytes (1536 bits) of input as a user key, and
uses that keying material to generate a cryptographically
strong working key array.
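To make the idea of a hash based key schedule concrete, here is a
minimal sketch of the expand step of HKDF as described in RFC 5869,
driven by the .NET HMACSHA512 class. It is an illustration only and
not the CEX implementation: the class name HkdfSketch, the method
names, and the way the output is packed into 32 bit words are all
assumptions, and how the library divides the 192 byte user key
between HMAC key and input material is not shown here.

```csharp
using System;
using System.Security.Cryptography;

public static class HkdfSketch
{
    // HKDF-Expand (RFC 5869) over HMAC-SHA512: stretches 'prk' into 'length' bytes.
    public static byte[] Expand(byte[] prk, byte[] info, int length)
    {
        const int HashLen = 64; // SHA512 output size in bytes
        if (length > 255 * HashLen)
            throw new ArgumentOutOfRangeException("length");

        byte[] output = new byte[length];
        byte[] previous = new byte[0];
        using (HMACSHA512 hmac = new HMACSHA512(prk))
        {
            int written = 0;
            for (byte counter = 1; written < length; counter++)
            {
                // T(n) = HMAC(prk, T(n-1) || info || n)
                byte[] input = new byte[previous.Length + info.Length + 1];
                Buffer.BlockCopy(previous, 0, input, 0, previous.Length);
                Buffer.BlockCopy(info, 0, input, previous.Length, info.Length);
                input[input.Length - 1] = counter;
                previous = hmac.ComputeHash(input);

                int take = Math.Min(HashLen, length - written);
                Buffer.BlockCopy(previous, 0, output, written, take);
                written += take;
            }
        }
        return output;
    }

    // Packs derived bytes into 32 bit words, the shape of a working key array.
    // (Byte order follows the platform; a real cipher would fix the order.)
    public static uint[] ToWorkingKey(byte[] derived)
    {
        uint[] words = new uint[derived.Length / 4];
        Buffer.BlockCopy(derived, 0, words, 0, words.Length * 4);
        return words;
    }
}
```

For example, a 14 round Rijndael transform with a 16 byte block needs
60 working key words, so one would call Expand with a length of 240
and pass the result to ToWorkingKey.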

There are several advantages to using a hash based KDF; the
stronger working keys are less susceptible to attack vectors
that leverage weak or related keys. The larger user key size
also makes brute force attacks practically impossible;
2^(256-1) compared to a minimum of 2^(1536-1) iterations.
Another advantage of the HX ciphers is that the number of
diffusion rounds (transformation cycles within the rounds
function) is configurable independent of the initial key size.

• RHX: Minimum 1536 bit key, and up to 38 rounds of
diffusion.
• SHX: Minimum 1536 bit key, and up to 128 rounds of
diffusion.
• THX: Minimum 1536 bit key, and up to 32 rounds of
diffusion.

Super Ciphers

These are merged ciphers where two ciphers are combined
during the rounds processing stage. There are a number of
existing ciphers that use this combined cipher technique; they
either encrypt a file twice, each time using a different key and
cipher, or use the combined output from counter
encryptions to create a pseudo random key stream which is
combined with the input to create the cipher-text. Both of
these methods share the same weakness, called a ‘meet in the
middle’ attack. This is where it may be theoretically possible to
unwind both ciphers with little more than the computational
effort required to break one alone. I got around this by
‘merging’ the ciphers within the transform function. For
example, RSM combines the Rijndael and Serpent ciphers. In
the rounds function, the input data undergoes a round of
Rijndael, the output of that round is processed as a round of
Serpent, then another round of Rijndael, and so on. So if set
to 18 rounds, it processes the input with 18 rounds of Rijndael
and 16 rounds of Serpent. This should effectively mitigate the
meet in the middle attack, and make most forms of differential
or linear analysis far more difficult.

• RSM: Rijndael and Serpent merged. HKDF key
scheduler and up to 42 rounds of diffusion.
• TSM: Twofish and Serpent merged. HKDF key scheduler
and up to 32 rounds of diffusion.
• Fusion: Rijndael and Twofish merged. HKDF key
scheduler and up to 32 rounds of diffusion.
• DCS: Two separate Rijndael cipher instances combined
and used as a random key stream.

Cipher Modes

The project focuses on two modes: CTR, a segmented
integer counter mode, and CBC, cipher block chaining. These
are considered two of the most secure cipher modes. The CTR
mode is automatically parallelized, as is the decryption
function of the CBC mode. The project also includes ECB and
CFB modes that are not currently implemented.
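The reason CTR lends itself to parallel processing is that each
block of key stream depends only on the counter value for that
block, never on a neighbouring block. The sketch below illustrates
this under stated assumptions: it uses the .NET Aes class as a
stand-in for the library's own engines, the names ParallelCtrSketch,
Add and Transform are hypothetical, and the 1024 byte chunk size is
simply borrowed from the figure mentioned in the introduction.

```csharp
using System;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class ParallelCtrSketch
{
    const int BlockSize = 16;   // AES block size in bytes
    const int ChunkBlocks = 64; // 1024 bytes of work per parallel task

    // Returns counter + offset, treating the 16 bytes as a big-endian integer.
    public static byte[] Add(byte[] counter, long offset)
    {
        byte[] result = (byte[])counter.Clone();
        ulong carry = (ulong)offset;
        for (int i = BlockSize - 1; i >= 0 && carry != 0; i--)
        {
            ulong sum = result[i] + (carry & 0xff);
            result[i] = (byte)sum;
            carry = (carry >> 8) + (sum >> 8);
        }
        return result;
    }

    // Encrypts or decrypts; in CTR mode both directions are the same operation.
    public static byte[] Transform(byte[] key, byte[] counter, byte[] input)
    {
        byte[] output = new byte[input.Length];
        int blocks = (input.Length + BlockSize - 1) / BlockSize;
        int chunks = (blocks + ChunkBlocks - 1) / ChunkBlocks;

        Parallel.For(0, chunks, chunk =>
        {
            // each task owns its cipher instance, so no state is shared
            using (Aes aes = Aes.Create())
            {
                aes.Mode = CipherMode.ECB;
                aes.Padding = PaddingMode.None;
                aes.Key = key;
                using (ICryptoTransform enc = aes.CreateEncryptor())
                {
                    byte[] keyStream = new byte[BlockSize];
                    int last = Math.Min(blocks, (chunk + 1) * ChunkBlocks);
                    for (int b = chunk * ChunkBlocks; b < last; b++)
                    {
                        // key stream block = E(counter + block index)
                        enc.TransformBlock(Add(counter, b), 0, BlockSize, keyStream, 0);
                        int start = b * BlockSize;
                        int count = Math.Min(BlockSize, input.Length - start);
                        for (int i = 0; i < count; i++)
                            output[start + i] = (byte)(input[start + i] ^ keyStream[i]);
                    }
                }
            }
        });
        return output;
    }
}
```

Calling Transform a second time with the same key and counter
restores the original input. CBC encryption cannot be split up this
way, because each block is chained to the cipher text of the block
before it; CBC decryption can, since all the cipher text is already
known.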

Padding

Some modes, like CBC, require block aligned input lengths. If,
at the end of an array of input data, the last block is less than
the cipher block size, padding is added to complete the block.
The project currently implements X9.23 and PKCS7 padding
modes.
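As a rough illustration of the two padding schemes, the sketch below
pads a byte array up to a block boundary. The class and method names
are hypothetical. It follows the usual convention of always adding at
least one byte of padding (a full block when the input is already
aligned), which may differ from how the library handles aligned
input.

```csharp
using System;

public static class PaddingSketch
{
    // PKCS7: every padding byte holds the number of padding bytes added.
    public static byte[] PadPkcs7(byte[] data, int blockSize)
    {
        int pad = blockSize - (data.Length % blockSize); // 1..blockSize
        byte[] padded = new byte[data.Length + pad];
        Buffer.BlockCopy(data, 0, padded, 0, data.Length);
        for (int i = data.Length; i < padded.Length; i++)
            padded[i] = (byte)pad;
        return padded;
    }

    // X9.23: zero bytes, with the padding length stored in the last byte.
    public static byte[] PadX923(byte[] data, int blockSize)
    {
        int pad = blockSize - (data.Length % blockSize);
        byte[] padded = new byte[data.Length + pad]; // new arrays are zero filled
        Buffer.BlockCopy(data, 0, padded, 0, data.Length);
        padded[padded.Length - 1] = (byte)pad;
        return padded;
    }

    // Both schemes store the padding length in the final byte.
    public static byte[] Unpad(byte[] padded)
    {
        int pad = padded[padded.Length - 1];
        if (pad < 1 || pad > padded.Length)
            throw new ArgumentException("invalid padding");
        byte[] data = new byte[padded.Length - pad];
        Buffer.BlockCopy(padded, 0, data, 0, data.Length);
        return data;
    }
}
```

With a 16 byte block, a 13 byte input gains three bytes: 03 03 03
under PKCS7, or 00 00 03 under X9.23.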

Hash Algorithms

• SHA256: An implementation of SHA-2 with a 256 bit
hash output.
• SHA512: An implementation of SHA-2 with a 512 bit
hash output.
• SHA-3: An implementation of the Keccak based SHA-3,
with variable output sizes.

MAC

• HMAC: Wrapper for Hash based Message
Authentication Code, works with all 3 hash algorithms.
• SHA256HMAC: HMAC and SHA256 combined in a
class.
• SHA512HMAC: HMAC and SHA512 combined in a
class.

Queuing

• WaitQueue: a class that demonstrates using a queue to
create a constant time implementation.

Security

• SecureDelete: a 6 stage secure file deletion class.

Utilities

• Compression: a fully implemented compression and
folder archiving class.
• FileUtilities: a variety of file, folder, and drive functions.

Overview

Before we start looking at some of the ciphers and getting
into implementation details, I think it helps to 'break it down'
a bit, give you a general idea of what has been done, and
clarify some of the concepts and terminology used in the
article.

First off, the key schedule: A key schedule is a function that
takes a small amount of user supplied data (the cipher key),
and expands it, usually into a larger integer array. For
example, Rijndael takes a 32 byte key (256 bit), and expands
that into 60 integers, or 240 bytes worth of keying material.
That array of integers is sometimes called an array of
'round' keys, 'subkeys' or 'working' keys. I'll use the term
working key, because it makes it clear that it is a derived key.
Some key schedules have a simple algebraic expression;
Rijndael, for example, derives most of the working keys with a
simple exclusive OR of two previous keys. Serpent uses a
much more elaborate key schedule, one that was designed to
resist some forms of cryptanalysis. These working keys,
created by the key schedule, are used to create a unique
cipher text. A good cipher design is one in which a
change of just a single bit in the cipher key results in a
completely different output; this is known as the 'avalanche'
property. The working keys are usually added to the state
(input data at some stage of transformation) with a simple
addition or XOR.
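The avalanche property is easy to observe experimentally. The sketch
below flips one bit of a key, encrypts the same block under both
keys, and counts how many output bits differ. It uses the .NET Aes
class rather than the library's RDX class, and the class and method
names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

public static class AvalancheSketch
{
    // Encrypts a single 16 byte block with AES in ECB mode, no padding.
    static byte[] EncryptBlock(byte[] key, byte[] block)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            using (ICryptoTransform enc = aes.CreateEncryptor())
                return enc.TransformFinalBlock(block, 0, block.Length);
        }
    }

    // Hamming distance between two equal length byte arrays.
    public static int CountDifferingBits(byte[] a, byte[] b)
    {
        int count = 0;
        for (int i = 0; i < a.Length; i++)
        {
            int diff = a[i] ^ b[i];
            while (diff != 0)
            {
                count += diff & 1;
                diff >>= 1;
            }
        }
        return count;
    }

    // Flips one key bit and reports how many cipher text bits change.
    public static int KeyBitFlipDistance(byte[] key, byte[] block, int bitIndex)
    {
        byte[] flipped = (byte[])key.Clone();
        flipped[bitIndex / 8] ^= (byte)(1 << (bitIndex % 8));
        return CountDifferingBits(EncryptBlock(key, block), EncryptBlock(flipped, block));
    }
}
```

For a cipher with good avalanche behaviour one would expect the
distance to hover around half of the 128 output bits, whichever key
bit is flipped.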

Larger keys play an important role in a cipher's security. Many
of the techniques used to 'break' a cipher involve a
reduction in the number of times a unique key is tested
against the cipher's decryption output; in other words, they
reduce the number of brute force attempts required to
decrypt the output. When thinking of key sizes in this context,
it helps to understand some binary math.
Keys are measured in bits for a reason: the number of
possible keys is 2 raised to the power of the number of bits.
Think of it like the penny a day riddle; I loan you a dollar, you
agree to pay me back a penny on the first day, and then
double that each day for the rest of the month. By the last day
of the month, you'd owe more than 10 million dollars! That's
binary math; each time you add a bit, you double the size of
the previous sum. So, a 256 bit key represents an integer with
a maximum size of
1.1579208923731619542357098500869e+77, or roughly
1.16 times 10 to the power of 77. A simply monstrous
number, and one might think that computers will never be
fast enough to run a decryption cycle that many times. At
this time that is almost certainly true, but some
cryptanalytic attacks aim to reduce that number, sometimes
quite substantially. A larger key puts this further out of reach;
provided that the cipher key and the working key are produced
in a cryptographically strong way, much larger keys are
feasible, and by using those longer keys, data can be kept
beyond the capabilities of technology for a longer period of
time.
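The figures above can be checked with a few lines of arbitrary
precision arithmetic. This is only a worked example; the class and
method names are made up for the illustration.

```csharp
using System;
using System.Numerics;

public static class KeySpaceSketch
{
    // Number of distinct keys of the given length: 2^bits.
    public static BigInteger KeyCount(int bits)
    {
        return BigInteger.Pow(2, bits);
    }

    // Total repaid in the penny riddle after 'days' days of doubling:
    // 1 + 2 + 4 + ... + 2^(days-1) = 2^days - 1 cents.
    public static BigInteger PennyTotalCents(int days)
    {
        return BigInteger.Pow(2, days) - 1;
    }
}
```

KeyCount(256) is a 78 digit number beginning 11579208923731619...,
which matches the value quoted above, and PennyTotalCents(30) is
1073741823 cents, or a little over 10.7 million dollars. Each extra
bit doubles the count, so a 1536 bit key space is 2^1280 times larger
than a 256 bit one.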

There is evidence that the key schedule plays a part in
providing strength against linear and differential cryptanalysis,
and there have been some serious attacks on Rijndael (the
related-key attacks on the 192 and 256 bit versions) that
leverage the weak key schedule. So it follows that a
cryptographically strong key schedule can help create a
stronger cipher.

In this article, a transform is a function that performs the
actual encryption of data, just as the inverse transform
performs the decryption in a reversible iterative block cipher.
In a rounds based cipher (sometimes called a product cipher),
a round can be thought of as one complete sequence of
transformation, whereas the transform function (or rounds
function) may loop through a number of rounds and
use whitening stages.

In the three block ciphers presented here (Rijndael, Serpent,
and Twofish), the input bytes are first copied into
four integers. These integers are (in the case of Rijndael and
Twofish) XORed with members of the working key. These state
integers are then processed in a series of rounds, which
change the state via a series of substitutions,
permutations, and modular arithmetic (with the key added in
stages). Finally, the processed state is whitened with the
remaining key members and copied into the output byte
array.

Let's look at a round of Serpent:

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb0(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 through R3 are state integers. Before each round the state
is XORed with members of the working key. The state is then
processed through one of eight bit-sliced S-Boxes before
undergoing a linear transform. This clearly illustrates the role
of the working key during a round cycle; the working keys are
used to mix with the state in a way that will produce an
output that is unique to that key. This is their purpose, and in
these ciphers, they do not interact with the algebraic
transformation functions in any other way.
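For readers who want to run a round like this in isolation, here is
a self-contained sketch. It is not the library's code: the S-box is
applied through a slow table lookup instead of the optimized boolean
form, the S0 values and the rotation constants are quoted from my
recollection of the Serpent specification and should be checked
against it, and the class name and the InverseLinearTransform helper
are additions for the illustration.

```csharp
using System;

public static class SerpentRoundSketch
{
    // Serpent S-box 0 as a 4 bit lookup table (assumed values, verify against the spec).
    static readonly byte[] S0 = { 3, 8, 15, 1, 10, 6, 5, 11, 14, 13, 4, 2, 7, 0, 9, 12 };

    static uint Rotl(uint x, int n) { return (x << n) | (x >> (32 - n)); }
    static uint Rotr(uint x, int n) { return (x >> n) | (x << (32 - n)); }

    // Bit sliced S-box: bit i of r0..r3 forms one 4 bit input (r0 = low bit).
    public static void Sb0(ref uint r0, ref uint r1, ref uint r2, ref uint r3)
    {
        uint o0 = 0, o1 = 0, o2 = 0, o3 = 0;
        for (int i = 0; i < 32; i++)
        {
            uint nibble = ((r0 >> i) & 1) | (((r1 >> i) & 1) << 1)
                        | (((r2 >> i) & 1) << 2) | (((r3 >> i) & 1) << 3);
            uint s = S0[nibble];
            o0 |= (s & 1) << i;
            o1 |= ((s >> 1) & 1) << i;
            o2 |= ((s >> 2) & 1) << i;
            o3 |= ((s >> 3) & 1) << i;
        }
        r0 = o0; r1 = o1; r2 = o2; r3 = o3;
    }

    // Rotations, shifts and XORs that spread each bit across the state.
    public static void LinearTransform(ref uint x0, ref uint x1, ref uint x2, ref uint x3)
    {
        x0 = Rotl(x0, 13);
        x2 = Rotl(x2, 3);
        x1 ^= x0 ^ x2;
        x3 ^= x2 ^ (x0 << 3);
        x1 = Rotl(x1, 1);
        x3 = Rotl(x3, 7);
        x0 ^= x1 ^ x3;
        x2 ^= x3 ^ (x1 << 7);
        x0 = Rotl(x0, 5);
        x2 = Rotl(x2, 22);
    }

    // Undoes LinearTransform by reversing each step in the opposite order.
    public static void InverseLinearTransform(ref uint x0, ref uint x1, ref uint x2, ref uint x3)
    {
        x2 = Rotr(x2, 22);
        x0 = Rotr(x0, 5);
        x2 ^= x3 ^ (x1 << 7);
        x0 ^= x1 ^ x3;
        x3 = Rotr(x3, 7);
        x1 = Rotr(x1, 1);
        x3 ^= x2 ^ (x0 << 3);
        x1 ^= x0 ^ x2;
        x2 = Rotr(x2, 3);
        x0 = Rotr(x0, 13);
    }

    // One round: key mixing, S-box, linear transform (same order as the listing above).
    public static void Round(uint[] r, uint[] exKey, ref int keyCtr)
    {
        r[0] ^= exKey[keyCtr++];
        r[1] ^= exKey[keyCtr++];
        r[2] ^= exKey[keyCtr++];
        r[3] ^= exKey[keyCtr++];
        Sb0(ref r[0], ref r[1], ref r[2], ref r[3]);
        LinearTransform(ref r[0], ref r[1], ref r[2], ref r[3]);
    }
}
```

Note that the working key only enters through the four XORs at the
top of Round; the S-box and linear transform are fixed functions of
the state.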

One often hears the term rounds in the context of the
number of rounds that can be broken using an attack on the
cipher; Rijndael has been shown to be vulnerable to a
known-key distinguishing attack against a reduced 8-round
version of AES-128. These attacks are often aimed at reduced
versions, where a smaller number of rounds can be broken, as
both a means of providing proof with limited computing power,
and positing the method by which a full transformation might be
reversed. This is because with most ciphers, adding rounds
increases the security of the cipher by making differential or
linear cryptanalysis more difficult. A number
of noted cryptographers have stated that the number of
rounds used in Rijndael should be increased; that its simple
algebraic description makes it vulnerable with the current
round counts (10, 12, and 14), and that it should be increased
to 18 or more rounds to ensure its continued integrity.

The Base Ciphers

RDX (RijnDael eXtended) is a Rijndael implementation that
can process up to a 512 bit key. Up to a 256 bit key, it will
produce the exact same output as any other valid
implementation of Rijndael. This is verified by testing: in the
tests section of the project, the AesAvsVector class runs the
complete set of AESAVS (Advanced Encryption Standard Algorithm
Validation Suite) known answer vector and Monte Carlo tests.
These are the same tests used to get an AES implementation
certified by NIST. Further tests from the AES submission
package and KATs (Known Answer Tests) testing a 32
byte block size are also included.

SPX (SerPent eXtended) is a Serpent implementation. It can
also use up to a 512 bit key. The number of rounds in SPX is
also configurable, from the default 32 rounds to a full 64
transformation rounds. Just like with Rijndael, I used the most
complete and authoritative test suite I could find: the
complete Nessie Serpent test suite. The tests include 100
thousand rounds of Monte Carlo tests. This means that up to
a 256 bit key, the output from SPX is identical to any other
valid version of Serpent.

TFX (TwoFish eXtended) is a Twofish implementation. Just as
with RDX and SPX it can process the larger 512 bit key size.
The number of rounds is also configurable; from the default
16 rounds to a maximum of 32 rounds of transformation. Just
as with the other two, the most complete tests available are
run on the standard key lengths, in this case the official
Twofish KATs.

With all three of these ciphers, I first analyzed the existing
patterns within the key schedules, then sought to extend
them using as few changes as possible to the original
algorithms. In the case of Rijndael and Serpent, the changes
were almost trivial; Twofish, because of the keyed S-Box,
required a more thorough examination. In all cases, I did my
best to understand the nature of the function, both
programmatically and as a mathematical expression,
implementing the extensions in the way I thought was closest
to the original, but also best leveraged the additional
cipher key entropy provided by a larger initial key size.

The HX Ciphers

One of the central goals of this project has been to create the
strongest ciphers possible, using existing and proven
cryptographic primitives. Another important goal was to try to
better understand various attack vectors, and create
something that was more resistant to these attacks.

The three base ciphers all have something in common: they all
use the working key in a similar way, to change or 'whiten' the
state values to create a unique output. Other than that, they
do not interact with the actual computational processes used
to transform the state. What that means is that how that
cipher key is expanded (so long as it is done in a secure way)
does not directly impact the data transformation. Creating
that expanded key using a more secure means, like a hash
function, can increase the overall security of the cipher itself.

The HX ciphers, RHX, SHX, and THX, all use HKDF, a
Hash based Key Derivation Function, which is a kind of
pseudo-random generator. HKDF is powered by a SHA-2 512
HMAC, a keyed hash function. This is one of the most
cryptographically strong methods of creating a
pseudo-random output; even a strong key schedule like the one
used in Serpent is not as secure as using this method to generate
the working keys. Aside from the increased security, there are
two additional advantages to replacing the key schedule with
a hash based KDF: it is more resistant to weak key and slide
attacks, and the longer cipher key size makes brute force
attacks computationally infeasible.

Timing attacks use discrete differences in the length of time
it takes to perform a task with a given set of parameters. In
the case of an attack on a key schedule, the attacker measures
differences in the timing of things like branching and table
lookups to make predictions about the key, like the slight
difference between looking up the first or last value in a table of
integers, or the average time taken to compute an
output given the value of a specific table
member. SHA-2 is less vulnerable to timing attacks, because
the amount of time required to run is typically more constant
than, say, the Rijndael key schedule.

The other advantage is the key size; the minimum key size for
an HX cipher is the block size of the hash function (SHA512 =
128 bytes) + the IKM, or the HMAC key material (64 bytes). So
the key for these ciphers is a minimum of 192 bytes (1536
bits), but expandable up to any size in multiples of the hash
function's block size. This might seem like a very large key, but
consider: my 256GB thumb drive could easily store over a
billion keys. The benefit is obvious; even when quantum
computers are made that can break a 256 bit key, it will still
be many years (decades) from that point before they could
brute force a 1536 bit key.
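The key size arithmetic in this paragraph can be written out as a
small worked example. The constant and method names are hypothetical,
and 256GB is taken as 256 billion bytes.

```csharp
using System;

public static class HxKeySizeSketch
{
    public const int Sha512BlockBytes = 128; // hash function block size
    public const int IkmBytes = 64;          // HMAC key material

    // Minimum HX user key: one hash block plus the key material.
    public static int MinimumKeyBytes()
    {
        return Sha512BlockBytes + IkmBytes;
    }

    // How many keys of a given size fit in a given amount of storage.
    public static long KeysThatFit(long storageBytes, int keyBytes)
    {
        return storageBytes / keyBytes;
    }
}
```

MinimumKeyBytes() gives 192 bytes, or 1536 bits, and
KeysThatFit(256000000000, 192) comes to roughly 1.33 billion keys.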

Super Ciphers

Super Ciphers have been around for a while, and there are a
number of different implementations of software that double,
or even triple encrypt an input using different ciphers and
keys. There are also implementations that use multiple
instances of the same cipher (think Triple-DES). This is done
to extend key size and make the output more resistant to
some forms of linear and differential cryptanalysis. The
problem with this approach is that it is subject to a 'meet in
the middle' attack. Some theoretical models project that
decryption could be performed with little more computational
energy than brute forcing only one of the cipher instances.
One way to mitigate this attack is, instead of encrypting
successively with different cryptographic instances, to
combine the primitives at the rounds processing level.

Let's look at two rounds of RSM:

// serpent sbox and transform
Sb0(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

// rijndael round
R0 = T0[C0 >> 24] ^ T1[(byte)(C1 >> 16)] ^ T2[(byte)(C2 >> 8)] ^ T3[(byte)C3] ^ _exKey[keyCtr++];
R1 = T0[C1 >> 24] ^ T1[(byte)(C2 >> 16)] ^ T2[(byte)(C3 >> 8)] ^ T3[(byte)C0] ^ _exKey[keyCtr++];
R2 = T0[C2 >> 24] ^ T1[(byte)(C3 >> 16)] ^ T2[(byte)(C0 >> 8)] ^ T3[(byte)C1] ^ _exKey[keyCtr++];
R3 = T0[C3 >> 24] ^ T1[(byte)(C0 >> 16)] ^ T2[(byte)(C1 >> 8)] ^ T3[(byte)C2] ^ _exKey[keyCtr++];

// serpent sbox and transform
Sb1(ref C0, ref C1, ref C2, ref C3);
LinearTransform(ref C0, ref C1, ref C2, ref C3);

// rijndael round
C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2[(byte)(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2[(byte)(R3 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2[(byte)(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2[(byte)(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];

The rounds in RSM run in a loop; all eight of the Serpent
S-Boxes and linear transforms are processed in one complete
loop cycle, along with eight rounds of Rijndael. Looking at the
code above, you can see how the state (Rn and Cn) first passes
through a full round of Serpent, then the product of that
transformation undergoes a round of Rijndael. Manipulating
the state directly, and combining ciphers with very
different algebraic compositions, should make the output
more resistant to meet in the middle attacks, as well as
making other forms of cryptanalysis far more difficult.

It was even possible to create an invertible cipher in the case
of RSM (Rijndael/Serpent Merge), and TSM (Twofish/Serpent
Merge). The third 'super cipher' is a stream cipher named
Fusion. Fusion combines full rounds of Twofish and Rijndael,
including the working key processing for each. It uses a
random 128 bit integer counter to create a pseudo-random
key stream, generated in parallel, which is XORed with the
input to create cipher text.

Benchmarks
Speed tests were performed on an AMD ASD-3600 Quad-
Core, 4GB RAM (3.47 GB usable), compiled Release/Any CPU
on Windows 7.
Test is a transform of a byte array in a Monte Carlo method.
Sizes are in MBs (1000000 bytes). Time format sec.ms, key
sizes in bits. Rate is MB per minute.
HX series will have similar times, as they use the
same transformation engines.
CTR mode and CBC decrypt are run in parallel mode. CBC
encrypt is in linear (single processor) mode.
Highest rate was RDX with a 128 bit key: 6.052 GB per minute!
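The Rate column in the tables that follow is derived from the Size
and Time columns. A minimal sketch of that calculation, with a
Stopwatch based helper for timing a transform, might look like this;
the class and method names are hypothetical and this is not the
benchmark code used to produce the figures.

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkRateSketch
{
    // Rate in MB per minute from a size in MB and a time in seconds.
    public static double MegabytesPerMinute(double sizeMb, double seconds)
    {
        return sizeMb / seconds * 60.0;
    }

    // Times a transform over sizeMb megabytes of data and returns the rate.
    public static double Measure(Action transform, double sizeMb)
    {
        Stopwatch timer = Stopwatch.StartNew();
        transform();
        timer.Stop();
        return MegabytesPerMinute(sizeMb, timer.Elapsed.TotalSeconds);
    }
}
```

For example, 100 MB in 1.18 seconds works out to about 5084 MB per
minute, and 100 MB in 3.03 seconds to about 1980 MB per minute.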

Block Ciphers

RDX (Rijndael): 256 Key, 14 Rounds

Mode    State   Size (MB)   Time (s)   Rate (MB/min)
----    -----   ---------   --------   -------------
CTR     ENC     100         1.18       5084
CTR     DEC     100         1.18       5043
CBC     ENC     100         3.03       1980
CBC     DEC     100         1.59       3773

RSM (Rijndael/Serpent Merged): 1536 Key, 18 Rounds

Mode    State   Size (MB)   Time (s)   Rate (MB/min)
----    -----   ---------   --------   -------------
CTR     ENC     100         2.69       2230
CTR     DEC     100         2.68       2238
CBC     ENC     100         8.15       736
CBC     DEC     100         3.17       1892

SPX (Serpent): 256 Key, 32 Rounds

Mode    State   Size (MB)   Time (s)   Rate (MB/min)
----    -----   ---------   --------   -------------
CTR     ENC     100         2.22       2702
CTR     DEC     100         2.21       2714
CBC     ENC     100         6.45       930
CBC     DEC     100         2.39       2510

TFX (Twofish): 256 Key, 16 Rounds

Mode    State   Size (MB)   Time (s)   Rate (MB/min)
----    -----   ---------   --------   -------------
CTR     ENC     100         1.33       4511
CTR     DEC     100         1.35       4444
CBC     ENC     100         3.55       1690
CBC     DEC     100         1.74       3448

Stream Ciphers

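ChaCha 256 Key, 20 Rounds

Size (MB)   Time (s)   Rate (MB/min)
---------   --------   -------------
100         2.26       2027

DCS 768 Key

Size (MB)   Time (s)   Rate (MB/min)
---------   --------   -------------
100         2.12       2830

Fusion 2560 Key

Size (MB)   Time (s)   Rate (MB/min)
---------   --------   -------------
100         2.16       2777

Salsa20 256 Key, 20 Rounds

Size (MB)   Time (s)   Rate (MB/min)
---------   --------   -------------
100         2.68       2238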

RDX (Rijndael)
RDX is an implementation of the Rijndael encryption
algorithm, the same one used in the AES standard. What I
have done is to extend Rijndael so that it now accepts the
longer key length (512 bits). The extended key length
provides more security against attacks that attempt to brute
force the key, and also adds eight more rounds of diffusion.
The increased number of rounds brings the total from 14
rounds with a 256 bit key, to 22 rounds with the 512 bit key
size. These added passes through the transform further
disperse the input through row and column transpositions,
and XORs with a longer expanded key array.

The Key Schedule:

The key schedule is where this version departs from the
standard. The maximum key length has been extended to
allow for a 512 bit (64 byte) key size. Creating this extension
requires two conditions: the diffusion rounds should be
written as a loop (to accommodate the additional rounds of
mixing), and the key scheduler code must be written in a way
that it can process the user key into a larger working key (the
integer array that the diffusion algorithm uses as part of the
input transformation process). The key schedule I have used
here is based on the Mono library implementation used in the
class RijndaelManagedTransform.cs.

private void ExpandKey(byte[] Key, bool Encryption)
{
    int pos = 0;
    // block and key in 32 bit words
    Nb = this.BlockSize / 4;
    Nk = Key.Length / 4;

    // rounds calculation
    if (Nk == 16)
        Nr = 22;
    else if (Nb == 8 || Nk == 8)
        Nr = 14;
    else if (Nk == 6)
        Nr = 12;
    else
        Nr = 10;

    // setup expanded key
    int keySize = Nb * (Nr + 1);
    _exKey = new UInt32[keySize];

    // add bytes to beginning of working key array
    for (int i = 0; i < Nk; i++)
    {
        UInt32 value = ((UInt32)Key[pos++] << 24);
        value |= ((UInt32)Key[pos++] << 16);
        value |= ((UInt32)Key[pos++] << 8);
        value |= ((UInt32)Key[pos++]);
        _exKey[i] = value;
    }

    // build the remaining round keys
    for (int i = Nk; i < keySize; i++)
    {
        UInt32 temp = _exKey[i - 1];

        // if it is a 512 bit key, maintain step 8 interval for
        // additional processing steps, equal to a 256 key distribution
        if (Nk > 8)
        {
            if (i % Nk == 0 || i % Nk == 8)
            {
                // round the key
                UInt32 rot = (UInt32)((temp << 8) | ((temp >> 24) & 0xff));
                // subbyte step
                temp = SubByte(rot) ^ Rcon[i / Nk];
            }
            // step ik + 4
            else if ((i % Nk) == 4 || (i % Nk) == 12)
            {

                temp = SubByte(temp);
            }
        }
        else
        {
            if (i % Nk == 0)
            {
                // round the key
                UInt32 rot = (UInt32)((temp << 8) | ((temp >> 24) & 0xff));
                // subbyte step
                temp = SubByte(rot) ^ Rcon[i / Nk];
            }
            // step ik + 4
            else if (Nk > 6 && (i % Nk) == 4)
            {
                temp = SubByte(temp);
            }
        }
        // w[i-Nk] ^ w[i]
        _exKey[i] = (UInt32)_exKey[i - Nk] ^ temp;
    }

    // inverse cipher
    if (!Encryption)
    {
        // reverse key
        for (int i = 0, k = keySize - Nb; i < k; i += Nb, k -= Nb)
        {
            for (int j = 0; j < Nb; j++)
            {
                UInt32 temp = _exKey[i + j];
                _exKey[i + j] = _exKey[k + j];
                _exKey[k + j] = temp;
            }
        }
        // sbox inversion
        for (int i = Nb; i < keySize - Nb; i++)
        {
            _exKey[i] = IT0[SBox[(_exKey[i] >> 24)]] ^
                IT1[SBox[(byte)(_exKey[i] >> 16)]] ^
                IT2[SBox[(byte)(_exKey[i] >> 8)]] ^
                IT3[SBox[(byte)_exKey[i]]];
        }
    }

    this.IsInitialized = true;
}

The first thing to note is the rounds calculation; I have added
an additional clause of

if (Nk == 16) Nr = 22

Nk is equal to the number of 32 bit words in the user supplied
key.
Nr is the number of rounds as a function of Nk and Nb.
Nb is the size of the state in 32 bit words; in this
implementation Nb is either 4 (16 byte block), or 8 (32 byte
block).

The number of round keys created is Nb * (Nr + 1), which is
the number of rounds plus one, multiplied by the
number of 32 bit words in the block size. A 512 bit user key
with a 16 byte block size will generate 92 working keys, or 184
working keys with a 32 byte block. A 256 bit key generates
60 or 120 working keys. A better dispersion ratio of user key
to expanded key size is achieved with the larger 512 bit key;
in bytes (32 byte key: 32/240 and 32/480, 64 byte key: 64/368
and 64/736).
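These counts follow directly from the rounds calculation in
ExpandKey. The short sketch below reproduces the arithmetic so the
figures can be checked; the class name RdxKeyMath and its methods
are hypothetical helpers, not part of the library.

```csharp
using System;

public static class RdxKeyMath
{
    // Mirrors the rounds calculation in ExpandKey; sizes are given in bytes.
    public static int Rounds(int keyBytes, int blockBytes)
    {
        int nk = keyBytes / 4;   // key length in 32 bit words
        int nb = blockBytes / 4; // block length in 32 bit words
        if (nk == 16) return 22;
        if (nb == 8 || nk == 8) return 14;
        if (nk == 6) return 12;
        return 10;
    }

    // Number of 32 bit working keys: Nb * (Nr + 1).
    public static int WorkingKeyWords(int keyBytes, int blockBytes)
    {
        return (blockBytes / 4) * (Rounds(keyBytes, blockBytes) + 1);
    }

    // User key bytes divided by expanded key bytes.
    public static double DispersionRatio(int keyBytes, int blockBytes)
    {
        return (double)keyBytes / (WorkingKeyWords(keyBytes, blockBytes) * 4);
    }
}
```

WorkingKeyWords(64, 16) gives 92 and WorkingKeyWords(32, 16) gives
60; the corresponding ratios are 64/368 (about 0.17) and 32/240
(about 0.13).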

The next step is adding the user key to the beginning of
the working key, shifting the user key bytes into the working
key, which adds the first eight integers with a 256 bit key, or
16 integers with a 512 bit key.
We then create the additional working keys. The working key,
exKey[i], is equal to the XOR of the previous word, exKey[i-1],
and the word Nk positions earlier, exKey[i-Nk].

With a 256 bit key, at every Nk interval an additional step is
added that first rotates the key word, then processes it with
SubByte() (an SBox lookup applied to each byte), then XORs this
with a round constant from the Rcon table. This extra
step happens when i % Nk is zero; with a 256 bit key that
is step 8, or every 8 passes through the loop.
The expansion routine for a 256 bit key uses some additional
processing; if i % Nk yields a remainder of 4, then SubByte()
is applied to exKey[i-1] prior to the XOR with exKey[i-Nk]. The
designers implemented this to further disperse the larger 256
bit key.

With a 512 bit key; Nk = 16, which would double the interval
between these additional dispersal steps, and create a weaker
expanded key. I have compensated for this by maintaining the
same intervals in the dispersion pattern; just as with a 256 bit
key, the Rcon ^ SubByte step executes every 8 keys, with the
SubByte step at the same alternating offset of 4.
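
To make the interval pattern concrete, the following sketch only classifies which extra step applies at a given word index; it does not perform the expansion. The names are hypothetical, and the 512 bit behaviour (reusing the interval of 8) is my reading of the description above, not the RDX source.

```csharp
static class KeyStepSketch
{
    public enum Step { XorOnly, SubWord, RotSubRcon }

    // i is the index of the working key word being produced (i >= Nk).
    public static Step Classify(int i, int nk)
    {
        // Assumption from the text: a 512 bit key (Nk = 16) keeps the
        // 256 bit interval of 8 instead of stretching it to 16.
        int interval = nk > 8 ? 8 : nk;
        int r = i % interval;

        if (r == 0) return Step.RotSubRcon;        // rotate, SubByte, Xor Rcon
        if (nk > 6 && r == 4) return Step.SubWord; // SubByte only
        return Step.XorOnly;                       // exKey[i-1] ^ exKey[i-Nk]
    }
}
```

With Nk = 8 this reproduces the standard pattern (index 8 is the Rcon step, index 12 the SubByte step); with Nk = 16 the same two steps recur at indices 16, 20, 24, 28 and so on.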

If the transform is for decryption then a routine performs the
additional key reversal and SBox inversion, required by the
inverse cipher.

As I mentioned, this implementation of the key expansion
routine is based on the C# Mono version. The only real change
I have made was to the rounds calculation, adding the
additional 8 rounds of diffusion when a 512 bit key is used;
other than that the key expansion routine is exactly the same.

Rijndael Transform:

When writing this method I first took a look at a number of
implementations in various languages, to try and get a better
idea of the different ways in which the transform could be
expressed programmatically, and how that related to the
strengths and weaknesses of the C# language (though these
classes could very easily be ported to Java or C).

The Bouncy Castle version processes a multi-dimensional
array of unsigned integers rather than bytes
(which is more in keeping with the specification outline). It
converts the bytes to integers and back again after the
transformation.

The Mono version processes bytes, using method level
integers to store the transposition sums. My first thought was
to avoid a multi-dimensional array, as these are slow in almost
every language. I did some speed comparisons, and the two
versions were close to equal, but there were some things that
could be done to speed up processing in the Mono version,
and because we are using a variable key size, we need the
diffusion rounds to run in a loop.

To get some idea of how different the methods are, look at
the Tests\CompareEngines.cs speed test; the diffusion
algorithms from Bouncy Castle, Mono, and RDX are all there,
and they are all quite different (and RDX is fastest). This is
what the 16 byte block version of the encryption algorithm
looks like in RDX:

private void Encrypt16(byte[] Input, int InOffset,
byte[] Output, int OutOffset)
{
int keyCtr = 0;
UInt32 R0, R1, R2, R3, C0, C1, C2, C3;

// Round 0
R0 = (UInt32)((Input[InOffset] << 24) | (Input
[InOffset + 1] << 16) | (Input[InOffset + 2] << 8) |
Input[InOffset + 3]) ^ _exKey[keyCtr++];
R1 = (UInt32)((Input[InOffset + 4] << 24) | (Input
[InOffset + 5] << 16) | (Input[InOffset + 6] << 8) |
Input[InOffset + 7]) ^ _exKey[keyCtr++];
R2 = (UInt32)((Input[InOffset + 8] << 24) | (Input
[InOffset + 9] << 16) | (Input[InOffset + 10] << 8) |
Input[InOffset + 11]) ^ _exKey[keyCtr++];
R3 = (UInt32)((Input[InOffset + 12] << 24) | (Input
[InOffset + 13] << 16) | (Input[InOffset + 14] << 8) |
Input[InOffset + 15]) ^ _exKey[keyCtr++];

// Round 1
C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2[(byte)
(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2[(byte)
(R3 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2[(byte)
(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2[(byte)
(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];

while (keyCtr < _exKey.Length - 4)
{
R0 = T0[C0 >> 24] ^ T1[(byte)(C1 >> 16)] ^ T2
[(byte)(C2 >> 8)] ^ T3[(byte)C3] ^ _exKey[keyCtr++];
R1 = T0[C1 >> 24] ^ T1[(byte)(C2 >> 16)] ^ T2
[(byte)(C3 >> 8)] ^ T3[(byte)C0] ^ _exKey[keyCtr++];
R2 = T0[C2 >> 24] ^ T1[(byte)(C3 >> 16)] ^ T2
[(byte)(C0 >> 8)] ^ T3[(byte)C1] ^ _exKey[keyCtr++];
R3 = T0[C3 >> 24] ^ T1[(byte)(C0 >> 16)] ^ T2
[(byte)(C1 >> 8)] ^ T3[(byte)C2] ^ _exKey[keyCtr++];
C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2
[(byte)(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2
[(byte)(R3 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2
[(byte)(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2
[(byte)(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];
}

// Final Round
Output[OutOffset] = (byte)(SBox[C0 >> 24] ^ (byte)
(_exKey[keyCtr] >> 24));
Output[OutOffset + 1] = (byte)(SBox[(byte)(C1 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 2] = (byte)(SBox[(byte)(C2 >> 8)]
^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 3] = (byte)(SBox[(byte)C3] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 4] = (byte)(SBox[C1 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 5] = (byte)(SBox[(byte)(C2 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 6] = (byte)(SBox[(byte)(C3 >> 8)]
^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 7] = (byte)(SBox[(byte)C0] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 8] = (byte)(SBox[C2 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 9] = (byte)(SBox[(byte)(C3 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 10] = (byte)(SBox[(byte)(C0 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 11] = (byte)(SBox[(byte)C1] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 12] = (byte)(SBox[C3 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 13] = (byte)(SBox[(byte)(C0 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 14] = (byte)(SBox[(byte)(C1 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 15] = (byte)(SBox[(byte)C2] ^
(byte)_exKey[keyCtr]);
}

This is based on the Mono version with some important
differences; in the Mono version, the rounds are laid out
sequentially with the key index as a series of fixed integers,
and it uses if keyCtr < _exKey.Length clauses to control the
number of rounds processed based on the extended key size.
In this version, a single incrementing integer keyCtr is used as
the key index, and the rounds are processed in a while
loop based on the formula counter < Nr * Nb. Using a single
incrementing counter, and eliminating the rounds clauses
increases the speed significantly.

As you can see this version uses the byte oriented approach
and is optimized by combining the SubBytes and
ShiftRows steps with the MixColumns step by
transforming them into a sequence of table lookups using the
byte values as table indices. This requires four 256 member
pre-calculated lookup tables to perform the byte
multiplication. The AddRoundKey step is then performed
with an additional Xor of the expanded key, and a round
with a 16 byte block is completed with just 16 table lookups
and 16 Xor operations.
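
To show where such lookup tables come from, here is a minimal sketch that derives the S-box and the first table from the field arithmetic. The names are hypothetical, and the byte order (2s, s, s, 3s from the high byte down) is the one commonly used by big-endian table implementations; I am assuming, not asserting, that the T0 to T3 tables in RDX follow the same layout.

```csharp
static class TTableSketch
{
    // Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
    static byte Mul(byte a, byte b)
    {
        int x = a, y = b, p = 0;
        for (int i = 0; i < 8; i++)
        {
            if ((y & 1) != 0) p ^= x;
            bool carry = (x & 0x80) != 0;
            x = (x << 1) & 0xFF;
            if (carry) x ^= 0x1B;
            y >>= 1;
        }
        return (byte)p;
    }

    // S-box entry: multiplicative inverse, then the affine transform.
    static byte Sub(byte v)
    {
        byte inv = 0;
        for (int c = 1; c < 256 && v != 0; c++)
        {
            if (Mul(v, (byte)c) == 1) { inv = (byte)c; break; }
        }
        int s = inv;
        for (int r = 1; r <= 4; r++)
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF; // 8 bit left rotation
        return (byte)(s ^ 0x63);
    }

    // T0[v] packs SubBytes and one MixColumns column: (2s, s, s, 3s).
    public static uint[] BuildT0()
    {
        uint[] t0 = new uint[256];
        for (int v = 0; v < 256; v++)
        {
            byte s = Sub((byte)v);
            t0[v] = ((uint)Mul(s, 2) << 24) | ((uint)s << 16) |
                    ((uint)s << 8) | Mul(s, 3);
        }
        return t0;
    }

    // The other tables are byte rotations of T0 (bytes = 1, 2 or 3).
    public static uint[] RotateTable(uint[] table, int bytes)
    {
        int bits = 8 * bytes;
        uint[] rotated = new uint[table.Length];
        for (int v = 0; v < table.Length; v++)
            rotated[v] = (table[v] >> bits) | (table[v] << (32 - bits));
        return rotated;
    }
}
```

ShiftRows does not appear in the tables at all; it is expressed by which state word feeds which table, as in the R0, R1, R2, R3 pattern of the listing above.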

The 32 byte block size also conforms to a standard Rijndael
configuration, by shifting the lookup value on the second,
third and fourth row by 1 byte, 3 bytes and 4 bytes
respectively. This is done to keep columns linearly
independent.

private void Encrypt32(byte[] Input, int InOffset,
byte[] Output, int OutOffset)
{
int keyCtr = 0;
UInt32 R0, R1, R2, R3, R4, R5, R6, R7, C0, C1, C2,
C3, C4, C5, C6, C7;

// Round 0
R0 = (UInt32)((Input[InOffset] << 24) | (Input
[InOffset + 1] << 16) | (Input[InOffset + 2] << 8) |
Input[InOffset + 3]) ^ _exKey[keyCtr++];
R1 = (UInt32)((Input[InOffset + 4] << 24) | (Input
[InOffset + 5] << 16) | (Input[InOffset + 6] << 8) |
Input[InOffset + 7]) ^ _exKey[keyCtr++];
R2 = (UInt32)((Input[InOffset + 8] << 24) | (Input
[InOffset + 9] << 16) | (Input[InOffset + 10] << 8) |
Input[InOffset + 11]) ^ _exKey[keyCtr++];
R3 = (UInt32)((Input[InOffset + 12] << 24) | (Input
[InOffset + 13] << 16) | (Input[InOffset + 14] << 8) |
Input[InOffset + 15]) ^ _exKey[keyCtr++];
R4 = (UInt32)((Input[InOffset + 16] << 24) | (Input
[InOffset + 17] << 16) | (Input[InOffset + 18] << 8) |
Input[InOffset + 19]) ^ _exKey[keyCtr++];
R5 = (UInt32)((Input[InOffset + 20] << 24) | (Input
[InOffset + 21] << 16) | (Input[InOffset + 22] << 8) |
Input[InOffset + 23]) ^ _exKey[keyCtr++];
R6 = (UInt32)((Input[InOffset + 24] << 24) | (Input
[InOffset + 25] << 16) | (Input[InOffset + 26] << 8) |
Input[InOffset + 27]) ^ _exKey[keyCtr++];
R7 = (UInt32)((Input[InOffset + 28] << 24) | (Input
[InOffset + 29] << 16) | (Input[InOffset + 30] << 8) |
Input[InOffset + 31]) ^ _exKey[keyCtr++];

// Round 1
C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2[(byte)
(R3 >> 8)] ^ T3[(byte)R4] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2[(byte)
(R4 >> 8)] ^ T3[(byte)R5] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2[(byte)
(R5 >> 8)] ^ T3[(byte)R6] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R4 >> 16)] ^ T2[(byte)
(R6 >> 8)] ^ T3[(byte)R7] ^ _exKey[keyCtr++];
C4 = T0[R4 >> 24] ^ T1[(byte)(R5 >> 16)] ^ T2[(byte)
(R7 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C5 = T0[R5 >> 24] ^ T1[(byte)(R6 >> 16)] ^ T2[(byte)
(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C6 = T0[R6 >> 24] ^ T1[(byte)(R7 >> 16)] ^ T2[(byte)
(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];
C7 = T0[R7 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2[(byte)
(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];

// rounds loop
while (keyCtr < _exKey.Length - 8)
{
R0 = T0[C0 >> 24] ^ T1[(byte)(C1 >> 16)] ^ T2
[(byte)(C3 >> 8)] ^ T3[(byte)C4] ^ _exKey[keyCtr++];
R1 = T0[C1 >> 24] ^ T1[(byte)(C2 >> 16)] ^ T2
[(byte)(C4 >> 8)] ^ T3[(byte)C5] ^ _exKey[keyCtr++];
R2 = T0[C2 >> 24] ^ T1[(byte)(C3 >> 16)] ^ T2
[(byte)(C5 >> 8)] ^ T3[(byte)C6] ^ _exKey[keyCtr++];
R3 = T0[C3 >> 24] ^ T1[(byte)(C4 >> 16)] ^ T2
[(byte)(C6 >> 8)] ^ T3[(byte)C7] ^ _exKey[keyCtr++];
R4 = T0[C4 >> 24] ^ T1[(byte)(C5 >> 16)] ^ T2
[(byte)(C7 >> 8)] ^ T3[(byte)C0] ^ _exKey[keyCtr++];
R5 = T0[C5 >> 24] ^ T1[(byte)(C6 >> 16)] ^ T2
[(byte)(C0 >> 8)] ^ T3[(byte)C1] ^ _exKey[keyCtr++];
R6 = T0[C6 >> 24] ^ T1[(byte)(C7 >> 16)] ^ T2
[(byte)(C1 >> 8)] ^ T3[(byte)C2] ^ _exKey[keyCtr++];
R7 = T0[C7 >> 24] ^ T1[(byte)(C0 >> 16)] ^ T2
[(byte)(C2 >> 8)] ^ T3[(byte)C3] ^ _exKey[keyCtr++];

C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2
[(byte)(R3 >> 8)] ^ T3[(byte)R4] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2
[(byte)(R4 >> 8)] ^ T3[(byte)R5] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2
[(byte)(R5 >> 8)] ^ T3[(byte)R6] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R4 >> 16)] ^ T2
[(byte)(R6 >> 8)] ^ T3[(byte)R7] ^ _exKey[keyCtr++];
C4 = T0[R4 >> 24] ^ T1[(byte)(R5 >> 16)] ^ T2
[(byte)(R7 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C5 = T0[R5 >> 24] ^ T1[(byte)(R6 >> 16)] ^ T2
[(byte)(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C6 = T0[R6 >> 24] ^ T1[(byte)(R7 >> 16)] ^ T2
[(byte)(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];
C7 = T0[R7 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2
[(byte)(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
}

// Final Round
Output[OutOffset] = (byte)(SBox[C0 >> 24] ^ (byte)
(_exKey[keyCtr] >> 24));
Output[OutOffset + 1] = (byte)(SBox[(byte)(C1 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 2] = (byte)(SBox[(byte)(C3 >> 8)]
^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 3] = (byte)(SBox[(byte)C4] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 4] = (byte)(SBox[C1 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 5] = (byte)(SBox[(byte)(C2 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 6] = (byte)(SBox[(byte)(C4 >> 8)]
^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 7] = (byte)(SBox[(byte)C5] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 8] = (byte)(SBox[C2 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 9] = (byte)(SBox[(byte)(C3 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 10] = (byte)(SBox[(byte)(C5 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 11] = (byte)(SBox[(byte)C6] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 12] = (byte)(SBox[C3 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 13] = (byte)(SBox[(byte)(C4 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 14] = (byte)(SBox[(byte)(C6 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 15] = (byte)(SBox[(byte)C7] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 16] = (byte)(SBox[C4 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 17] = (byte)(SBox[(byte)(C5 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 18] = (byte)(SBox[(byte)(C7 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 19] = (byte)(SBox[(byte)C0] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 20] = (byte)(SBox[C5 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 21] = (byte)(SBox[(byte)(C6 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 22] = (byte)(SBox[(byte)(C0 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 23] = (byte)(SBox[(byte)C1] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 24] = (byte)(SBox[C6 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 25] = (byte)(SBox[(byte)(C7 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 26] = (byte)(SBox[(byte)(C1 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 27] = (byte)(SBox[(byte)C2] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 28] = (byte)(SBox[C7 >> 24] ^
(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 29] = (byte)(SBox[(byte)(C0 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 30] = (byte)(SBox[(byte)(C2 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 31] = (byte)(SBox[(byte)C3] ^
(byte)_exKey[keyCtr]);
}
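
The word pattern in the two listings (R0, R1, R2, R3 for the 16 byte block and R0, R1, R3, R4 for the 32 byte block) follows from the row offsets alone. The sketch below, with hypothetical names, computes which state words feed T0 to T3 for a given output word; it covers only Nb = 4 and Nb = 8, using the offsets from the Rijndael specification.

```csharp
using System;

static class ShiftRowsSketch
{
    // Row offsets for rows 0..3; row 0 is never shifted.
    static int[] Offsets(int nb)
    {
        if (nb == 4) return new[] { 0, 1, 2, 3 };
        if (nb == 8) return new[] { 0, 1, 3, 4 };
        throw new ArgumentException("only Nb = 4 or Nb = 8 is sketched here");
    }

    // Indices of the state words read through T0, T1, T2 and T3
    // when producing output word `column`.
    public static int[] SourceColumns(int nb, int column)
    {
        int[] offsets = Offsets(nb);
        int[] source = new int[4];
        for (int row = 0; row < 4; row++)
            source[row] = (column + offsets[row]) % nb;
        return source;
    }
}
```

For example, SourceColumns(8, 7) yields 7, 0, 2, 3, which matches the C7 line in Encrypt32 (R7, R0, R2, R3).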

To summarize; this version (RDX) of Rijndael is based on a
well-known and accepted implementation model. All I have
done is to write it in a way that it accepts a longer key length
and that the number of rounds processed is determined by
the length of the working key. Both the Rijndael and AES
specification documents make it fairly clear that the authors
of Rijndael designed the algorithm with extensibility in mind;
to quote Section 6.3 of the AES specification document FIPS
197:

"This standard explicitly defines the allowed values for the key
length (Nk), block size (Nb), and number of rounds (Nr).
However, future reaffirmations of this standard could include
changes or additions to the allowed values for those
parameters. Therefore, implementers may choose to design
their AES implementations with future flexibility in mind."

So this is what I have written, a more flexible implementation,
one that can accommodate the larger 512 bit key size. Will
this larger key size make it vulnerable to certain attack
vectors? Yes, just as the 256 bit key is vulnerable. The real
question however, is will a practical attack negate the full 256
bits of security added with the larger key? That is unlikely, and
if such an attack were devised, it would almost certainly have
a devastating effect on the 256 bit key as well. Rijndael has
been around for some time now, and has been thoroughly
scrutinized; the addition of 8 rounds of diffusion only makes
the implementation stronger, and the longer key length makes it
more resistant to raw brute force attacks. But the weak key
scheduler did give me some pause here, which is why I wrote
RSX.

API

The RDX/RSX classes can be accessed either through
the IBlockCipher interface or directly through these public
properties and methods:

Properties:
Get/Set Unit block size of internal cipher
int BlockSize { get; set; }

Get Used as encryptor, false for decryption. Value set in the
Init() call
bool IsEncryption { get; }

Get Key has been expanded
bool IsInitialized { get; }

Get Available Encryption Key Sizes in bits
int[] KeySizes { get; }

Get Cipher name
string Name { get; }

Public Methods:
Constructor: Initialize the class
BlockSize: Algorithm input block size
public RDX(int BlockSize)

Init: Initialize the Cipher. Must be called before cipher is used
Encryption: Using Encryption or Decryption mode
KeyParam: Contains cipher key, valid sizes are: 128, 192, 256
and 512 bits
void Init(bool Encryption, KeyParams KeyParam);

DecryptBlock: Decrypt a single block of bytes
Input and Output must be at least BlockSize in length
Input: Encrypted bytes
Output: Decrypted bytes
void DecryptBlock(byte[] Input, byte[] Output);

DecryptBlock: Decrypt a block of bytes within an array
Input and Output + Offsets must be at least BlockSize in
length
Input: Encrypted bytes
InOffset: Offset within the Input array
Output: Decrypted bytes
OutOffset: Offset within the Output array
void DecryptBlock(byte[] Input, int InOffset, byte[]
Output, int OutOffset);

EncryptBlock: Encrypt a single block of bytes
Input: Bytes to Encrypt
Output: Encrypted bytes
void EncryptBlock(byte[] Input, byte[] Output);

EncryptBlock: Encrypt a block of bytes within an array
Input: Bytes to Encrypt
InOffset: Offset within the Input array
Output: Encrypted bytes
OutOffset: Offset within the Output array
void EncryptBlock(byte[] Input, int InOffset, byte[]
Output, int OutOffset);

Transform: Process a block of bytes
Input: Bytes to Encrypt/Decrypt
Output: Encrypted or Decrypted bytes
void Transform(byte[] Input, byte[] Output);

Transform: Process a block of bytes within an array
Input: Bytes to encrypt or decrypt
InOffset: Offset within the Input array
Output: Output bytes
OutOffset: Offset within the Output array
void Transform(byte[] Input, int InOffset, byte[]
Output, int OutOffset);

Dispose: Release resources used by this class
void Dispose();

RSX
RSX is a hybrid of the Rijndael and Serpent encryption
algorithms. Most encryption algorithms can be thought of as
having two main parts; the key schedule, and
the transformation algorithm. The key schedule takes a small
amount of initial entropy, (the user key), and expands it into a
larger working array that is used in the rounds function.

Rijndael has what is considered a weak key schedule; it relies
on a strong diffusion algorithm to thoroughly whiten the
input data. One of the strongest key schedules is a part of the
Serpent algorithm, (which was the 2nd place AES finalist). This
joining of two algorithms has been done before; Sosemanuk,
an eSTREAM cipher finalist, uses a combination of Serpent and
the stream cipher Snow. The key schedule in Serpent is much
more sophisticated than Rijndael's, and does a better job at
dispersing the initial entropy and eliminating weak and
related keys. Quote from the authors of Serpent:

“Serpent has none of the simpler vulnerabilities that can result
from exploitable symmetries in the key schedule: there are no
weak keys, semi-weak keys, equivalent keys, or
complementation properties.”

The result is that both ciphers have been combined into a
hybrid that can encrypt using up to a 512 bit key length.

The Key Schedule

This implementation of the key schedule is based on the
version in the Bouncy Castle SerpentEngine.cs class; an
explanation of the algorithm can be found in the Serpent
documentation. My implementation is considerably different
from Bouncy Castle's version, (but on a 256 bit key the output
is tested equivalent). I made several changes to the method to
increase performance, to process the larger key size of 512
bits, and create the correct number of round keys. I have
also made a change to the algorithm itself to take advantage
of the larger 512 bit key to produce better overall dispersion
by extending the polynomial primitive used in the key
rotation; for a 256 bit key it is:

w[i] := (w[i-8] ^ w[i-5] ^ w[i-3] ^ w[i-1] ^ PHI ^ i) <<< 11

For a 512 bit key this becomes:

w[i] := (w[i-16] ^ w[i-13] ^ w[i-11] ^ w[i-10] ^ w[i-8] ^ w[i-5] ^ w[i-3] ^ w[i-1] ^ PHI ^ i) <<< 11

This extension of the polynomial creates a more even
distribution of the key bits across the longer initial key.
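
The two recurrences differ only in their tap positions, which a small stand-alone sketch can make explicit. The class, method, and array names below are hypothetical; PHI is the Serpent constant 0x9E3779B9, and the seed is taken as an already loaded array of 8 or 16 key words (the reverse byte copy performed by the real routine is left out).

```csharp
using System;

static class PrekeySketch
{
    const uint PHI = 0x9E3779B9;
    static readonly int[] Taps256 = { 8, 5, 3, 1 };
    static readonly int[] Taps512 = { 16, 13, 11, 10, 8, 5, 3, 1 };

    static uint RotateLeft(uint x, int n)
    {
        return (x << n) | (x >> (32 - n));
    }

    // seed: 8 words (256 bit key) or 16 words (512 bit key).
    // Returns `count` pre-keys w[0..count-1], with the seed acting
    // as the words at negative indices.
    public static uint[] Expand(uint[] seed, int count)
    {
        int[] taps = seed.Length == 16 ? Taps512 : Taps256;
        int n = seed.Length;
        uint[] w = new uint[n + count];
        Array.Copy(seed, w, n);

        for (int i = 0; i < count; i++)
        {
            uint acc = PHI ^ (uint)i;
            foreach (int t in taps)
                acc ^= w[n + i - t];
            w[n + i] = RotateLeft(acc, 11);
        }

        uint[] result = new uint[count];
        Array.Copy(w, n, result, 0, count);
        return result;
    }
}
```

With an all-zero seed the first pre-key is simply PHI rotated left by 11 bits, 0xBBCDCCF1, for either tap set; the outputs diverge once non-zero words enter the extra taps.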

private UInt32[] SerpentKey(byte[] Key)
{
    int ct = 0;
    int index = 0;
    int padSize = Key.Length / 2;
    UInt32[] Wp = new UInt32[padSize];
    int keySize = Key.Length == 64 ? 92 : 60;

    // rijndael uses 2x keys on 32 block
    if (this.BlockSize == 32)
        keySize *= 2;

    // step 1: reverse copy key to temp array
    for (int offset = Key.Length; offset > 0; offset -= 4)
        Wp[index++] = BytesToWord(Key, offset - 4);

    // initialize the key
    UInt32[] Wk = new UInt32[keySize];

    if (padSize == 16)
    {
        // create 32 byte key pre-key
        // step 2: rotate k into w(k) ints
        for (int i = 8; i < 16; i++)
            Wp[i] = RotateLeft((uint)(Wp[i - 8] ^ Wp[i - 5] ^
                Wp[i - 3] ^ Wp[i - 1] ^ PHI ^ (i - 8)), 11);

        // copy to expanded key
        Array.Copy(Wp, 8, Wk, 0, 8);

        // step 3: calculate remainder of rounds with rotating primitive
        for (int i = 8; i < keySize; i++)
            Wk[i] = RotateLeft((uint)(Wk[i - 8] ^ Wk[i - 5] ^
                Wk[i - 3] ^ Wk[i - 1] ^ PHI ^ i), 11);
    }
    else
    {
        // *extended*: create (64 byte/16 word) pre-keys
        // step 2: rotate k into w(k) ints, with extended polynomial primitive
        // Wp := (Wp-16 ^ Wp-13 ^ Wp-11 ^ Wp-10 ^ Wp-8 ^ Wp-5 ^ Wp-3 ^ Wp-1 ^ PHI ^ i) <<< 11
        for (int i = 16; i < 32; i++)
            Wp[i] = RotateLeft((uint)(Wp[i - 16] ^ Wp[i - 13] ^
                Wp[i - 11] ^ Wp[i - 10] ^ Wp[i - 8] ^ Wp[i - 5] ^
                Wp[i - 3] ^ Wp[i - 1] ^ PHI ^ (i - 16)), 11);

        // copy to expanded key
        Array.Copy(Wp, 16, Wk, 0, 16);

        // step 3: calculate remainder of rounds
        for (int i = 16; i < keySize; i++)
            Wk[i] = RotateLeft((uint)(Wk[i - 16] ^ Wk[i - 13] ^
                Wk[i - 11] ^ Wk[i - 10] ^ Wk[i - 8] ^ Wk[i - 5] ^
                Wk[i - 3] ^ Wk[i - 1] ^ PHI ^ i), 11);
    }

    // step 4: create the working keys by processing with the Sbox and IP
    while (ct < keySize - 32)
    {
        Sb3(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
        Sb2(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
        Sb1(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
        Sb0(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
        Sb7(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
        Sb6(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
        Sb5(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
        Sb4(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
    }

    // last rounds
    Sb3(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
    Sb2(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
    Sb1(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
    Sb0(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
    Sb7(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);
    Sb6(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct++]);

    // different offset on 16 block
    if (this.BlockSize != 32)
        Sb5(ref Wk[ct++], ref Wk[ct++], ref Wk[ct++], ref Wk[ct]);

    return Wk;
}

Just as with the Rijndael key schedule, the user supplied key
bytes are first shifted into a temporary integer array, Wp. The
bytes in that array then undergo a transformation using a
variation of the key rotation polynomial before being added
to the beginning of the working key array. The remainder of
the pre-keys are then calculated and added to the working
key array Wk. These pre-keys are then processed through a
series of SBox calculations, with the resulting registers being
copied into the corresponding key positions.

If you compare this to Rijndael's key scheduler, it is clear that
there is a great deal more processing in Serpent's key
scheduler; in Rijndael, the user key is first copied straight into
the working key, whereas in this scheduler, it undergoes
pre-processing first. Most of Rijndael's round keys are generated
with the simple formula k[i-1] ^ k[i-Nk], whereas Serpent uses
the key rotation and then an S-Box step on each key. The result
is a much stronger expanded key, and one that is more resistant
to various weak and related key attacks.

RHX
The Key Schedule

The key schedule in RHX is the defining difference between
this and the other versions; instead of using a simple
algorithm to expand the user supplied key into a larger
working array, it uses a hash based pseudo-random generator
to create the working key. HKDF is a key derivation function
that uses an SHA-2 512 HMAC (Hash based Message
Authentication Code) as its diffusion engine. This is one of
the strongest methods available for generating pseudo-random
keying material, and far superior in entropy
dispersion to Rijndael's, or even Serpent's key schedule. HKDF
uses up to three inputs; a nonce value called an information
string, an Ikm (Input keying material), and a Salt value. RFC
2104, the HMAC specification, recommends a key size of at
least the digest output, in this case 64 bytes with SHA512; a
key longer than the hash block size is first passed through
the hash function, which reduces it to 512 bits. The Salt size
is a minimum of the hash function's block size; with SHA-2 512
that is 128 bytes. So the minimum key size for RHX is 192 bytes.
Further blocks of salt can be added to the key so long as they
align: ikm + (n * blocksize), ex. 192, 320, 448 bytes. There is
no upper maximum. This means that you can create keys as large
as you like so long as they fall on these boundaries; this
effectively eliminates brute force as a means of attack on the
cipher, even in quantum terms.
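
The alignment rule is simple enough to state as a predicate. This is a minimal sketch with hypothetical names; the constants are the SHA-512 digest and block sizes quoted above.

```csharp
static class RhxKeySizeSketch
{
    const int IkmSize = 64;    // SHA-512 digest size in bytes
    const int SaltBlock = 128; // SHA-512 block size in bytes

    // True for 192, 320, 448, ... bytes: Ikm plus n whole salt blocks.
    public static bool IsAligned(int keyBytes)
    {
        int salt = keyBytes - IkmSize;
        return salt >= SaltBlock && salt % SaltBlock == 0;
    }
}
```

Note that the ExpandKey listing below does not reject an unaligned key; it trims the salt down to the nearest whole block instead, so a check like this would only be a convenience for callers.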

private UInt32[] ExpandKey(byte[] Key, bool Encryption)
{
    // block and key in 32 bit words
    Nb = this.BlockSize / 4;

    // expanded key size
    int keySize = Nb * (Nr + 1);

    // hkdf return array
    int keyBytes = keySize * 4;
    byte[] rawKey = new byte[keyBytes];
    int saltSize = Key.Length - IKM_SIZE;

    // salt must be a multiple of the hash blocksize
    if (saltSize % SALT_SIZE != 0)
        saltSize = saltSize - saltSize % SALT_SIZE;

    // hkdf input
    byte[] hkdfKey = new byte[IKM_SIZE];
    byte[] hkdfSalt = new byte[saltSize];

    // copy hkdf key and salt from user key
    Buffer.BlockCopy(Key, 0, hkdfKey, 0, IKM_SIZE);
    Buffer.BlockCopy(Key, IKM_SIZE, hkdfSalt, 0, saltSize);

    // HKDF generator expands array using an SHA512 HMAC
    using (HKDF gen = new HKDF(new SHA512HMAC()))
    {
        gen.Init(hkdfSalt, hkdfKey, _hkdfInfo);
        gen.Generate(keyBytes, rawKey, 0);
    }

    // initialize working key
    UInt32[] exKey = new UInt32[keySize];
    // copy bytes to working key
    Buffer.BlockCopy(rawKey, 0, exKey, 0, keyBytes);

    // inverse cipher
    if (!Encryption)
    {
        // reverse key
        for (int i = 0, k = keySize - Nb; i < k; i += Nb, k -= Nb)
        {
            for (int j = 0; j < Nb; j++)
            {
                UInt32 temp = exKey[i + j];
                exKey[i + j] = exKey[k + j];
                exKey[k + j] = temp;
            }
        }
        // sbox inversion
        for (int i = Nb; i < keySize - Nb; i++)
        {
            exKey[i] = IT0[SBox[(exKey[i] >> 24)]] ^
                IT1[SBox[(byte)(exKey[i] >> 16)]] ^
                IT2[SBox[(byte)(exKey[i] >> 8)]] ^
                IT3[SBox[(byte)exKey[i]]];
        }
    }

    this.IsInitialized = true;

    return exKey;
}

The working key size is derived by multiplying the block size in
words by the number of rounds + 1, just as in a standard
Rijndael implementation. The key material is then copied into
Ikm and Salt byte arrays. The HKDF generator is initialized, the
Ikm, Salt, and an optional information string are passed to
HKDF, and the raw bytes are generated; the byte count is the
key size in words * 4. These bytes are then copied to the
working key integer array.
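
For readers without the library at hand, the same flow can be sketched with the framework's own HMACSHA512 class. This is not the HKDF class used in the listing above: it is a plain extract-then-expand construction following RFC 5869, written under the assumption that the library's generator behaves similarly, and all names are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

static class HkdfSketch
{
    // Extract a pseudo-random key from salt and Ikm, then expand it.
    // length must not exceed 255 * 64 bytes (the RFC 5869 limit).
    public static byte[] Derive(byte[] ikm, byte[] salt, byte[] info, int length)
    {
        byte[] prk;
        using (var extract = new HMACSHA512(salt))
            prk = extract.ComputeHash(ikm);

        byte[] okm = new byte[length];
        byte[] block = new byte[0];
        int written = 0;
        byte counter = 1;
        using (var expand = new HMACSHA512(prk))
        {
            while (written < length)
            {
                // T(n) = HMAC(PRK, T(n-1) | info | n)
                byte[] input = new byte[block.Length + info.Length + 1];
                Buffer.BlockCopy(block, 0, input, 0, block.Length);
                Buffer.BlockCopy(info, 0, input, block.Length, info.Length);
                input[input.Length - 1] = counter++;
                block = expand.ComputeHash(input);

                int take = Math.Min(block.Length, length - written);
                Buffer.BlockCopy(block, 0, okm, written, take);
                written += take;
            }
        }
        return okm;
    }

    // Split the user key into a 64 byte Ikm and the remaining salt,
    // then derive Nb * (Nr + 1) working key words.
    public static uint[] WorkingKey(byte[] userKey, int nb, int nr)
    {
        const int ikmSize = 64;
        byte[] ikm = new byte[ikmSize];
        byte[] salt = new byte[userKey.Length - ikmSize];
        Buffer.BlockCopy(userKey, 0, ikm, 0, ikmSize);
        Buffer.BlockCopy(userKey, ikmSize, salt, 0, salt.Length);

        int words = nb * (nr + 1);
        byte[] raw = Derive(ikm, salt, new byte[0], words * 4);
        uint[] exKey = new uint[words];
        Buffer.BlockCopy(raw, 0, exKey, 0, raw.Length);
        return exKey;
    }
}
```

A call such as HkdfSketch.WorkingKey(key192, 4, 22) returns 92 words for a 16 byte block with 22 rounds. The output will not match RHX unless the information string and byte ordering happen to agree, so treat it as an illustration of the data flow only.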

Now, some people might balk at the size of the key, but what
is 192 bytes by today's standards? My 256 GB memory stick
could hold 1,333,333,333 keys, a single 5 terabyte drive could
hold multiple keys for every person on earth. With wire
speeds and storage capabilities growing constantly, what is
the point in keeping keys so small, particularly when a larger
key, generated with a cryptographically strong method, can
dramatically increase the security of the cipher?

RHX has another strong advantage; the number of diffusion
rounds is configurable, between 10 and 38 rounds. That is the
range of rounds generated using the rounds
calculation formula for keys between 128 and 1024 bits. Using
22 rounds (equal to a 512 bit key), with the much larger key
space, creates what I believe to be one of the strongest
ciphers in the public domain, and one that will likely not be
breakable in my lifetime, regardless of advances in quantum
computing.

SPX (Serpent)
This is an implementation of the Serpent block cipher. Just as
with Rijndael, I strove to create a more flexible diffusion
engine and key schedule. I modified the key schedule so that
it can produce the required number of working keys when
doubling the diffusion round count, (64 rounds with a 512 bit
key, 32 rounds with a 256 bit key or less). The diffusion algorithm,
(the portion of the cipher that does the actual mixing of
plaintext into ciphertext), is exactly the same with every key
length, only it can now process a variable number of rounds.
The key scheduler extension involved creating a larger
working array, and a change to the rotation polynomial was
also added, to leverage the larger initial key array (16 words),
in a way that better concentrates and disperses the pre-keys
rotation cycle:

private Int32[] ExpandKey(byte[] Key)


{
int cnt = 0;
int index = 0;
int padSize = Key.Length < 32 ? 16 : Key.Length / 2;
Int32[] Wp = new Int32[padSize];
int offset = 0;

// less than 512 is default rounds


if (Key.Length < 64)
this.Rounds = DEFAULT_ROUNDS;

int keySize = 4 * (this.Rounds + 1);

// step 1: reverse copy key to temp array


for (offset = Key.Length; offset > 0; offset -= 4)
Wp[index++] = BytesToWord(Key, offset - 4);

// pad small key


if (index < 8)
Wp[index] = 1;

// initialize the key


Int32[] Wk = new Int32[keySize];

if (padSize == 16)
{
// 32 byte key
// step 2: rotate k into w(k) ints
for (int i = 8; i < 16; i++)
Wp[i] = RotateLeft((Wp[i - 8] ^ Wp[i - 5] ^
Wp[i - 3] ^ Wp[i - 1] ^ PHI ^ (i - 8)), 11);

// copy to expanded key


Array.Copy(Wp, 8, Wk, 0, 8);

// step 3: calculate remainder of rounds with


rotating primitive
for (int i = 8; i < keySize; i++)
Wk[i] = RotateLeft((Wk[i - 8] ^ Wk[i - 5] ^
Wk[i - 3] ^ Wk[i - 1] ^ PHI ^ i), 11);
}
else
{
// *extended*: 64 byte key
// step 3: rotate k into w(k) ints, with
extended polynominal primitive
// Wp := (Wp-16 ^ Wp-13 ^ Wp-11 ^ Wp-10 ^ Wp-8
^ Wp-5 ^ Wp-3 ^ Wp-1 ^ PHI ^ i) <<< 11
for (int i = 16; i < 32; i++)
Wp[i] = RotateLeft((Wp[i - 16] ^ Wp[i - 13]
^ Wp[i - 11] ^ Wp[i - 10] ^ Wp[i - 8] ^ Wp[i - 5] ^ Wp
[i - 3] ^ Wp[i - 1] ^ PHI ^ (i - 16)), 11);

http://www.codeproject.com/Articles/828477/Cipher-EX-V?display=Print 26.12.2014
Cipher EX V1.2 - CodeProject Page 29 of 57

// copy to expanded key


Array.Copy(Wp, 16, Wk, 0, 16);

// step 3: calculate remainder of rounds with


rotating primitive
for (int i = 16; i < keySize; i++)
Wk[i] = RotateLeft((Wk[i - 16] ^ Wk[i - 13]
^ Wk[i - 11] ^ Wk[i - 10] ^ Wk[i - 8] ^ Wk[i - 5] ^ Wk
[i - 3] ^ Wk[i - 1] ^ PHI ^ i), 11);
}

// step 4: create the working keys by processing with the Sbox and IP
while (cnt < keySize - 4)
{
Sb3(ref Wk[cnt++], ref Wk[cnt++], ref Wk
[cnt++], ref Wk[cnt++]);
Sb2(ref Wk[cnt++], ref Wk[cnt++], ref Wk
[cnt++], ref Wk[cnt++]);
Sb1(ref Wk[cnt++], ref Wk[cnt++], ref Wk
[cnt++], ref Wk[cnt++]);
Sb0(ref Wk[cnt++], ref Wk[cnt++], ref Wk
[cnt++], ref Wk[cnt++]);
Sb7(ref Wk[cnt++], ref Wk[cnt++], ref Wk
[cnt++], ref Wk[cnt++]);
Sb6(ref Wk[cnt++], ref Wk[cnt++], ref Wk
[cnt++], ref Wk[cnt++]);
Sb5(ref Wk[cnt++], ref Wk[cnt++], ref Wk
[cnt++], ref Wk[cnt++]);
Sb4(ref Wk[cnt++], ref Wk[cnt++], ref Wk
[cnt++], ref Wk[cnt++]);
}

// last round
Sb3(ref Wk[cnt++], ref Wk[cnt++], ref Wk[cnt++],
ref Wk[cnt]);

return Wk;
}

As you can see, this is similar to the key schedule in RSX, and
in fact their output is equivalent at the byte level. This
schedule, however, calculates the working key to the size
required by Serpent's rounds processing.
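
The standard (256 bit) branch of the schedule is the familiar
Serpent pre-key recurrence. The following is a minimal,
standalone sketch of that single step, not the library's
ExpandKey. The class name PreKeySketch is hypothetical, and the
PHI value (the golden-ratio constant from the Serpent
specification) is an assumption, since the article's constant
definitions are not shown. The reverse key copy and padding
performed by ExpandKey are deliberately left out.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class PreKeySketch
{
    // Assumed value: golden-ratio constant from the Serpent specification.
    private const uint PHI = 0x9E3779B9;

    public static uint RotateLeft(uint x, int bits)
    {
        return (x << bits) | (x >> (32 - bits));
    }

    // Derives 'count' pre-key words from 8 key words using
    // w[i] = (w[i-8] ^ w[i-5] ^ w[i-3] ^ w[i-1] ^ PHI ^ i) <<< 11
    public static uint[] Expand(uint[] key8, int count)
    {
        if (key8 == null || key8.Length != 8)
            throw new ArgumentException("Expected exactly 8 key words.", "key8");

        // the first 8 slots hold the key words, the pre-keys follow
        uint[] w = new uint[count + 8];
        Array.Copy(key8, w, 8);

        for (int i = 8; i < w.Length; i++)
            w[i] = RotateLeft(w[i - 8] ^ w[i - 5] ^ w[i - 3] ^ w[i - 1] ^ PHI ^ (uint)(i - 8), 11);

        uint[] preKeys = new uint[count];
        Array.Copy(w, 8, preKeys, 0, count);
        return preKeys;
    }
}
```

Calling PreKeySketch.Expand(keyWords, 132) would yield the 132
pre-key words that a 32 round schedule passes on to the S-Box
stage.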

SHX
SHX, just like RHX, uses an HKDF generator to expand the user
supplied key into a working key integer array. It also takes a
user defined number of rounds, from 32 (the normal number of
rounds) all the way up to 128 rounds, in 8 round sets. A round
count of 40 or 48 is more than sufficient, as theoretical
attacks to date are only able to break up to 12 rounds, and
would require an enormous amount of memory and processing
power.
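
As a rough sketch of the constraint described above, the round
count can be validated and turned into a working key length as
follows. The class and method names are hypothetical; the
4 * (rounds + 1) word count is taken from the SPX ExpandKey
listing, on the assumption that SHX sizes its working key the
same way.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class ShxRoundsSketch
{
    public const int MinRounds = 32;
    public const int MaxRounds = 128;

    // True when rounds lies in [32, 128] and is a multiple of 8.
    public static bool IsValidRounds(int rounds)
    {
        return rounds >= MinRounds && rounds <= MaxRounds && rounds % 8 == 0;
    }

    // Working key length in 32 bit words: four words per round,
    // plus four for the final key mix.
    public static int WorkingKeyWords(int rounds)
    {
        if (!IsValidRounds(rounds))
            throw new ArgumentOutOfRangeException("rounds", "Rounds must be 32 to 128, in steps of 8.");

        return 4 * (rounds + 1);
    }
}
```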

The transform in SHX is identical to the Serpent
implementation SPX; it processes rounds by first moving the
byte input array into 4 integers, then processing the rounds in
a while loop. Each round consists of an Xor of each state word
(Rn) with a key, an S-Box transformation of those words, and
then a linear transformation. Each of the 8 S-Boxes is used in
succession within a loop cycle. The final round Xors the last 4
keys with the state and shifts them back into the output byte
array.
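
The body of LinearTransform is not shown in the article. As a
hedged illustration, the sketch below implements the linear
transformation as published in the Serpent specification,
together with its inverse, on the assumption that the
library's version matches it. The class name is hypothetical,
and unsigned words are used here for simplicity where the
article uses Int32.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class SerpentLinearSketch
{
    private static uint Rotl(uint x, int n) { return (x << n) | (x >> (32 - n)); }
    private static uint Rotr(uint x, int n) { return (x >> n) | (x << (32 - n)); }

    // Forward linear transformation over the four state words.
    public static void Forward(ref uint x0, ref uint x1, ref uint x2, ref uint x3)
    {
        x0 = Rotl(x0, 13);
        x2 = Rotl(x2, 3);
        x1 ^= x0 ^ x2;
        x3 ^= x2 ^ (x0 << 3);
        x1 = Rotl(x1, 1);
        x3 = Rotl(x3, 7);
        x0 ^= x1 ^ x3;
        x2 ^= x3 ^ (x1 << 7);
        x0 = Rotl(x0, 5);
        x2 = Rotl(x2, 22);
    }

    // Inverse: the same steps undone in reverse order.
    public static void Inverse(ref uint x0, ref uint x1, ref uint x2, ref uint x3)
    {
        x2 = Rotr(x2, 22);
        x0 = Rotr(x0, 5);
        x2 ^= x3 ^ (x1 << 7);
        x0 ^= x1 ^ x3;
        x3 = Rotr(x3, 7);
        x1 = Rotr(x1, 1);
        x3 ^= x2 ^ (x0 << 3);
        x1 ^= x0 ^ x2;
        x2 = Rotr(x2, 3);
        x0 = Rotr(x0, 13);
    }
}
```

Applying Forward and then Inverse to the same four words
returns the original values, which is the property the
decryption path relies on.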

private void Encrypt16(byte[] Input, Int32 InOffset,


byte[] Output, Int32 OutOffset)
{
int keyCtr = 0;
int crnLen = _exKeyLength - 4;

// input round
Int32 R0 = BytesToWord(Input, InOffset + 12);
Int32 R1 = BytesToWord(Input, InOffset + 8);
Int32 R2 = BytesToWord(Input, InOffset + 4);
Int32 R3 = BytesToWord(Input, InOffset);

// process 8 round blocks


while (keyCtr < crnLen)
{
R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb0(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb1(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb2(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb3(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb4(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb5(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];

R3 ^= _exKey[keyCtr++];
Sb6(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 ^= _exKey[keyCtr++];
R1 ^= _exKey[keyCtr++];
R2 ^= _exKey[keyCtr++];
R3 ^= _exKey[keyCtr++];
Sb7(ref R0, ref R1, ref R2, ref R3);

// skip on last block


if (keyCtr < crnLen)
LinearTransform(ref R0, ref R1, ref R2, ref
R3);
}

// last round
WordToBytes(_exKey[keyCtr++] ^ R0, Output,
OutOffset + 12);
WordToBytes(_exKey[keyCtr++] ^ R1, Output, OutOffset
+ 8);
WordToBytes(_exKey[keyCtr++] ^ R2, Output, OutOffset
+ 4);
WordToBytes(_exKey[keyCtr] ^ R3, Output, OutOffset);
}

TFX (Twofish)
This was an interesting cipher, with methods like a keyed
S-Box and a complex algebraic description, quite a bit
different from Serpent or Rijndael. As such, it required more
consideration in how an extended key size could be
implemented. What I did was similar to the other ciphers:
use patterns from the existing function, and extend those
patterns in a way that best leverages the larger cipher key
while maintaining consistency with the original design.

private Int32[] ExpandKey(byte[] Key)


{
int k64Cnt = Key.Length / 8;
int kmLen = k64Cnt > 4 ? 8 : 4;
int keyCtr = 0;
Int32 A, B, Q;
Int32 Y0, Y1, Y2, Y3;
Int32[] eKm = new Int32[kmLen];
Int32[] oKm = new Int32[kmLen];
byte[] sbKey = new byte[Key.Length == 64 ? 32 : 16];
Int32[] wK = new Int32[this.Rounds * 2 + 8];

for (int i = 0; i < k64Cnt; i++)


{
// round key material
eKm[i] = BytesToWord(Key, keyCtr);
keyCtr += 4;
oKm[i] = BytesToWord(Key, keyCtr);
keyCtr += 4;
// sbox key material
WordToBytes(MDSEncode(eKm[i], oKm[i]), sbKey,
((k64Cnt * 4) - 4) - (i * 4));
}

keyCtr = 0;

while (keyCtr < KEY_BITS)


{
// create the expanded key
if (keyCtr < (wK.Length / 2))
{
Q = keyCtr * SK_STEP;
A = F32(Q, eKm, k64Cnt);
B = F32(Q + SK_BUMP, oKm, k64Cnt);
B = B << 8 | (Int32)((UInt32)B >> 24);
A += B;
wK[keyCtr * 2] = A;
A += B;
wK[keyCtr * 2 + 1] = A << SK_ROTL | (int)
((UInt32)A >> (32 - SK_ROTL));
}

Y0 = Y1 = Y2 = Y3 = keyCtr;

// 512 key
if (Key.Length == 64)
{
Y0 = (byte)Q1[Y0] ^ sbKey[28];
Y1 = (byte)Q0[Y1] ^ sbKey[29];
Y2 = (byte)Q0[Y2] ^ sbKey[30];
Y3 = (byte)Q1[Y3] ^ sbKey[31];

Y0 = (byte)Q1[Y0] ^ sbKey[24];
Y1 = (byte)Q1[Y1] ^ sbKey[25];
Y2 = (byte)Q0[Y2] ^ sbKey[26];
Y3 = (byte)Q0[Y3] ^ sbKey[27];

Y0 = (byte)Q0[Y0] ^ sbKey[20];
Y1 = (byte)Q1[Y1] ^ sbKey[21];
Y2 = (byte)Q1[Y2] ^ sbKey[22];
Y3 = (byte)Q0[Y3] ^ sbKey[23];

Y0 = (byte)Q0[Y0] ^ sbKey[16];
Y1 = (byte)Q0[Y1] ^ sbKey[17];
Y2 = (byte)Q1[Y2] ^ sbKey[18];
Y3 = (byte)Q1[Y3] ^ sbKey[19];
}
// 256 key
if (Key.Length > 24)
{
Y0 = (byte)Q1[Y0] ^ sbKey[12];
Y1 = (byte)Q0[Y1] ^ sbKey[13];
Y2 = (byte)Q0[Y2] ^ sbKey[14];
Y3 = (byte)Q1[Y3] ^ sbKey[15];
}
// 192 key
if (Key.Length > 16)
{
Y0 = (byte)Q1[Y0] ^ sbKey[8];
Y1 = (byte)Q1[Y1] ^ sbKey[9];
Y2 = (byte)Q0[Y2] ^ sbKey[10];
Y3 = (byte)Q0[Y3] ^ sbKey[11];
}

// sbox members as MDS matrix multiplies


_sBox[keyCtr * 2] = MDS0[(byte)Q0[(byte)Q0[Y0]
^ sbKey[4]] ^ sbKey[0]];
_sBox[keyCtr * 2 + 1] = MDS1[(byte)Q0[Q1[Y1] ^
sbKey[5]] ^ sbKey[1]];
_sBox[(keyCtr * 2) + 0x200] = MDS2[(byte)Q1
[(byte)Q0[Y2] ^ sbKey[6]] ^ sbKey[2]];
_sBox[keyCtr++ * 2 + 0x201] = MDS3[(byte)Q1
[(byte)Q1[Y3] ^ sbKey[7]] ^ sbKey[3]];
}

// key processed
this.IsInitialized = true;
return wK;
}

The first loop in the method copies the key material into two
arrays, oKm and eKm, used in the main while loop as working
key material. This is the first extension of the function. With a
256 bit key, the 32 bytes are copied into two 4 member
integer arrays. With a 512 bit key, these two integer arrays are
extended to 8 member arrays, using the full width of the user
supplied key material. The S-Box key, sbKey, is extended as
well, from 16 to 32 bytes, after undergoing an MDS (Maximum
Distance Separable) like transformation through MDSEncode().

At the top of the main loop, the working key is created using
the F32 function. The keyed S-Box member is then calculated
using a lookup into one of the two fixed 8 bit permutation
tables (Q0 and Q1), XORed with a member of sbKey, the S-Box
keying material. This is the second extension of the function.
The key length clause determines how many times this shifting
permutation of Q table products and S-Box keys occurs. With
a 256 bit key it happens three times, with the last stage
creating four S-Box keys by passing the XORed product of the
Q table lookups through an MDS matrix. With a 512 bit key,
you can see that the permutation has been extended by 16
bytes, adding the additional entropy of the longer cipher key,
and using the same pattern of alternating keyed S-Box
lookups.
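
The first extension, splitting the key into even and odd
words, can be sketched in isolation as below. The class name is
hypothetical and the little-endian byte order in BytesToWord is
an assumption (it is what the Twofish specification uses; the
library's helper is not shown). Unlike the listing, the arrays
here are sized to the key rather than fixed at 4 or 8 members.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class TwofishSplitSketch
{
    // Assumed little-endian conversion of 4 bytes to a word.
    private static uint BytesToWord(byte[] data, int offset)
    {
        return (uint)data[offset]
            | ((uint)data[offset + 1] << 8)
            | ((uint)data[offset + 2] << 16)
            | ((uint)data[offset + 3] << 24);
    }

    // Splits the key into alternating even and odd 32 bit words,
    // one pair per 64 bits of key material.
    public static void Split(byte[] key, out uint[] even, out uint[] odd)
    {
        if (key == null)
            throw new ArgumentNullException("key");
        if (key.Length == 0 || key.Length % 8 != 0)
            throw new ArgumentException("Key length must be a multiple of 8 bytes.", "key");

        int k64Cnt = key.Length / 8;
        even = new uint[k64Cnt];
        odd = new uint[k64Cnt];

        for (int i = 0; i < k64Cnt; i++)
        {
            even[i] = BytesToWord(key, 8 * i);
            odd[i] = BytesToWord(key, 8 * i + 4);
        }
    }
}
```

With a 64 byte key this yields two 8 member arrays, which is
the widening that the 512 bit extension depends on.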

THX
The transforms for THX and TFX are identical. So no
matter how the working keys are generated, or what size they
are, the algebraic formula used by a round is the same.

private void Encrypt16(byte[] Input, Int32 InOffset,


byte[] Output, Int32 OutOffset)
{
Int32 keyCtr = 0;
Int32 X0 = BytesToWord(Input, InOffset) ^ _exKey
[keyCtr++];
Int32 X1 = BytesToWord(Input, InOffset + 4) ^ _exKey
[keyCtr++];
Int32 X2 = BytesToWord(Input, InOffset + 8) ^ _exKey
[keyCtr++];
Int32 X3 = BytesToWord(Input, InOffset + 12) ^
_exKey[keyCtr];
Int32 T0, T1;
keyCtr = 8;

while (keyCtr < _exKey.Length)


{
T0 = Fe0(X0);
T1 = Fe3(X1);

X2 ^= T0 + T1 + _exKey[keyCtr++];
X2 = (Int32)((UInt32)X2 >> 1) | X2 << 31;
X3 = (X3 << 1 | (Int32)((UInt32)X3 >> 31)) ^ (T0
+ 2 * T1 + _exKey[keyCtr++]);

T0 = Fe0(X2);
T1 = Fe3(X3);
X0 ^= T0 + T1 + _exKey[keyCtr++];
X0 = (Int32)((UInt32)X0 >> 1) | X0 << 31;
X1 = (X1 << 1 | (Int32)((UInt32)X1 >> 31)) ^ (T0
+ 2 * T1 + _exKey[keyCtr++]);
}

keyCtr = 4;
WordToBytes(X2 ^ _exKey[keyCtr++], Output,
OutOffset);
WordToBytes(X3 ^ _exKey[keyCtr++], Output, OutOffset
+ 4);
WordToBytes(X0 ^ _exKey[keyCtr++], Output, OutOffset
+ 8);
WordToBytes(X1 ^ _exKey[keyCtr], Output, OutOffset
+ 12);
}

It is interesting to note that Twofish has the unusual feature of
not combining the working key with the state in a strictly
linear fashion, but rather uses the first eight integers created
by the key schedule in the key whitening stages: K0 through
K3 in the input whitening stage, and K4 through K7 in the
output whitening stage.
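
The working key layout implied by the two listings can be
summarized in a few lines. This is only a sketch read off the
code above (wK is allocated as Rounds * 2 + 8 words, and
Encrypt16 reads words 0 to 3, then 8 onward, then 4 to 7); the
class and member names are hypothetical.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class TwofishKeyLayoutSketch
{
    public const int InputWhitenStart = 0;   // words 0..3
    public const int OutputWhitenStart = 4;  // words 4..7
    public const int RoundKeyStart = 8;      // two words per round

    // Total working key length in 32 bit words.
    public static int WorkingKeyWords(int rounds)
    {
        if (rounds < 1)
            throw new ArgumentOutOfRangeException("rounds", "Rounds must be positive.");

        return rounds * 2 + 8;
    }

    // Index of the first of the two subkey words used by a
    // given round (zero based).
    public static int RoundKeyIndex(int round)
    {
        if (round < 0)
            throw new ArgumentOutOfRangeException("round", "Round can not be negative.");

        return RoundKeyStart + 2 * round;
    }
}
```

For the standard 16 rounds this gives a 40 word working key,
with the last round reading words 38 and 39.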

RSM
This is Rijndael and Serpent merged within the rounds
function. The key scheduler uses HKDF, and is similar to an HX
series implementation. The transform combines the two
ciphers in the rounds processing loop: first a round of
Serpent, which is a pass through one of eight bit slicing
S-Boxes and a linear transform, then a full round of Rijndael,
where the working key is added to the state.

private void Encrypt16(byte[] Input, int InOffset, byte


[] Output, int OutOffset)
{
int keyCtr = 0;
UInt32 R0, R1, R2, R3, C0, C1, C2, C3;

// Round 0
R0 = (UInt32)((Input[InOffset] << 24) | (Input
[InOffset + 1] << 16) | (Input[InOffset + 2] << 8) |
Input[InOffset + 3]) ^ _exKey[keyCtr++];
R1 = (UInt32)((Input[InOffset + 4] << 24) | (Input
[InOffset + 5] << 16) | (Input[InOffset + 6] << 8) |
Input[InOffset + 7]) ^ _exKey[keyCtr++];
R2 = (UInt32)((Input[InOffset + 8] << 24) | (Input
[InOffset + 9] << 16) | (Input[InOffset + 10] << 8) |
Input[InOffset + 11]) ^ _exKey[keyCtr++];
R3 = (UInt32)((Input[InOffset + 12] << 24) | (Input
[InOffset + 13] << 16) | (Input[InOffset + 14] << 8) |
Input[InOffset + 15]) ^ _exKey[keyCtr++];

// Round 1

C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2[(byte)


(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2[(byte)
(R3 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2[(byte)
(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2[(byte)
(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];

while (keyCtr < _exKey.Length - 4)


{
// serpent sbox and transform
Sb0(ref R0, ref R1, ref R2, ref R3);
LinearTransform(ref R0, ref R1, ref R2, ref R3);

// rijndael round
R0 = T0[C0 >> 24] ^ T1[(byte)(C1 >> 16)] ^ T2
[(byte)(C2 >> 8)] ^ T3[(byte)C3] ^ _exKey[keyCtr++];
R1 = T0[C1 >> 24] ^ T1[(byte)(C2 >> 16)] ^ T2
[(byte)(C3 >> 8)] ^ T3[(byte)C0] ^ _exKey[keyCtr++];
R2 = T0[C2 >> 24] ^ T1[(byte)(C3 >> 16)] ^ T2
[(byte)(C0 >> 8)] ^ T3[(byte)C1] ^ _exKey[keyCtr++];
R3 = T0[C3 >> 24] ^ T1[(byte)(C0 >> 16)] ^ T2
[(byte)(C1 >> 8)] ^ T3[(byte)C2] ^ _exKey[keyCtr++];

Sb1(ref C0, ref C1, ref C2, ref C3);


LinearTransform(ref C0, ref C1, ref C2, ref C3);

C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2


[(byte)(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2
[(byte)(R3 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2
[(byte)(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2
[(byte)(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];

Sb2(ref R0, ref R1, ref R2, ref R3);


LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 = T0[C0 >> 24] ^ T1[(byte)(C1 >> 16)] ^ T2


[(byte)(C2 >> 8)] ^ T3[(byte)C3] ^ _exKey[keyCtr++];
R1 = T0[C1 >> 24] ^ T1[(byte)(C2 >> 16)] ^ T2
[(byte)(C3 >> 8)] ^ T3[(byte)C0] ^ _exKey[keyCtr++];
R2 = T0[C2 >> 24] ^ T1[(byte)(C3 >> 16)] ^ T2
[(byte)(C0 >> 8)] ^ T3[(byte)C1] ^ _exKey[keyCtr++];
R3 = T0[C3 >> 24] ^ T1[(byte)(C0 >> 16)] ^ T2
[(byte)(C1 >> 8)] ^ T3[(byte)C2] ^ _exKey[keyCtr++];

Sb3(ref C0, ref C1, ref C2, ref C3);


LinearTransform(ref C0, ref C1, ref C2, ref C3);

C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2


[(byte)(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2
[(byte)(R3 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2
[(byte)(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2
[(byte)(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];

Sb4(ref R0, ref R1, ref R2, ref R3);


LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 = T0[C0 >> 24] ^ T1[(byte)(C1 >> 16)] ^ T2


[(byte)(C2 >> 8)] ^ T3[(byte)C3] ^ _exKey[keyCtr++];
R1 = T0[C1 >> 24] ^ T1[(byte)(C2 >> 16)] ^ T2
[(byte)(C3 >> 8)] ^ T3[(byte)C0] ^ _exKey[keyCtr++];
R2 = T0[C2 >> 24] ^ T1[(byte)(C3 >> 16)] ^ T2

[(byte)(C0 >> 8)] ^ T3[(byte)C1] ^ _exKey[keyCtr++];


R3 = T0[C3 >> 24] ^ T1[(byte)(C0 >> 16)] ^ T2
[(byte)(C1 >> 8)] ^ T3[(byte)C2] ^ _exKey[keyCtr++];

Sb5(ref C0, ref C1, ref C2, ref C3);


LinearTransform(ref C0, ref C1, ref C2, ref C3);

C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2


[(byte)(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2
[(byte)(R3 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2
[(byte)(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2
[(byte)(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];

Sb6(ref R0, ref R1, ref R2, ref R3);


LinearTransform(ref R0, ref R1, ref R2, ref R3);

R0 = T0[C0 >> 24] ^ T1[(byte)(C1 >> 16)] ^ T2


[(byte)(C2 >> 8)] ^ T3[(byte)C3] ^ _exKey[keyCtr++];
R1 = T0[C1 >> 24] ^ T1[(byte)(C2 >> 16)] ^ T2
[(byte)(C3 >> 8)] ^ T3[(byte)C0] ^ _exKey[keyCtr++];
R2 = T0[C2 >> 24] ^ T1[(byte)(C3 >> 16)] ^ T2
[(byte)(C0 >> 8)] ^ T3[(byte)C1] ^ _exKey[keyCtr++];
R3 = T0[C3 >> 24] ^ T1[(byte)(C0 >> 16)] ^ T2
[(byte)(C1 >> 8)] ^ T3[(byte)C2] ^ _exKey[keyCtr++];

Sb7(ref C0, ref C1, ref C2, ref C3);


LinearTransform(ref C0, ref C1, ref C2, ref C3);

C0 = T0[R0 >> 24] ^ T1[(byte)(R1 >> 16)] ^ T2


[(byte)(R2 >> 8)] ^ T3[(byte)R3] ^ _exKey[keyCtr++];
C1 = T0[R1 >> 24] ^ T1[(byte)(R2 >> 16)] ^ T2
[(byte)(R3 >> 8)] ^ T3[(byte)R0] ^ _exKey[keyCtr++];
C2 = T0[R2 >> 24] ^ T1[(byte)(R3 >> 16)] ^ T2
[(byte)(R0 >> 8)] ^ T3[(byte)R1] ^ _exKey[keyCtr++];
C3 = T0[R3 >> 24] ^ T1[(byte)(R0 >> 16)] ^ T2
[(byte)(R1 >> 8)] ^ T3[(byte)R2] ^ _exKey[keyCtr++];
}

// Final Round
Output[OutOffset] = (byte)(SBox[C0 >> 24] ^ (byte)
(_exKey[keyCtr] >> 24));
Output[OutOffset + 1] = (byte)(SBox[(byte)(C1 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 2] = (byte)(SBox[(byte)(C2 >> 8)]
^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 3] = (byte)(SBox[(byte)C3] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 4] = (byte)(SBox[C1 >> 24] ^


(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 5] = (byte)(SBox[(byte)(C2 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 6] = (byte)(SBox[(byte)(C3 >> 8)]
^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 7] = (byte)(SBox[(byte)C0] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 8] = (byte)(SBox[C2 >> 24] ^


(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 9] = (byte)(SBox[(byte)(C3 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 10] = (byte)(SBox[(byte)(C0 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 11] = (byte)(SBox[(byte)C1] ^
(byte)_exKey[keyCtr++]);

Output[OutOffset + 12] = (byte)(SBox[C3 >> 24] ^


(byte)(_exKey[keyCtr] >> 24));
Output[OutOffset + 13] = (byte)(SBox[(byte)(C0 >>
16)] ^ (byte)(_exKey[keyCtr] >> 16));
Output[OutOffset + 14] = (byte)(SBox[(byte)(C1 >>
8)] ^ (byte)(_exKey[keyCtr] >> 8));
Output[OutOffset + 15] = (byte)(SBox[(byte)C2] ^
(byte)_exKey[keyCtr]);
}

The cipher is invertible and has separate functions for
encryption and decryption, and can process both 16 byte and
32 byte block sizes.

TSM
This is Twofish and Serpent merged during rounds processing.
It is an invertible cipher that combines the ciphers in 4 round
loop cycles; two rounds of each:

private void Encrypt16(byte[] Input, Int32 InOffset,


byte[] Output, Int32 OutOffset)
{
Int32 keyCtr = 0;
Int32 T0, T1;
Int32 X0 = BytesToWord(Input, InOffset) ^ _exKey
[keyCtr++];
Int32 X1 = BytesToWord(Input, InOffset + 4) ^ _exKey
[keyCtr++];
Int32 X2 = BytesToWord(Input, InOffset + 8) ^ _exKey
[keyCtr++];
Int32 X3 = BytesToWord(Input, InOffset + 12) ^
_exKey[keyCtr];

keyCtr = 8;
int index = 0;

while (keyCtr < _exKey.Length)


{
// serpent sbox and transform
SuperBox(index++, ref X0, ref X1, ref X2, ref
X3);
LinearTransform(ref X0, ref X1, ref X2, ref X3);

// twofish round
T0 = Fe0(X0);
T1 = Fe3(X1);
X2 ^= T0 + T1 + _exKey[keyCtr++];
X2 = (Int32)((UInt32)X2 >> 1) | X2 << 31;
X3 = (X3 << 1 | (Int32)((UInt32)X3 >> 31)) ^ (T0
+ 2 * T1 + _exKey[keyCtr++]);

// serpent round
SuperBox(index++, ref X0, ref X1, ref X2, ref
X3);
LinearTransform(ref X0, ref X1, ref X2, ref X3);

// twofish round
T0 = Fe0(X2);
T1 = Fe3(X3);
X0 ^= T0 + T1 + _exKey[keyCtr++];
X0 = (Int32)((UInt32)X0 >> 1) | X0 << 31;
X1 = (X1 << 1 | (Int32)((UInt32)X1 >> 31)) ^ (T0

+ 2 * T1 + _exKey[keyCtr++]);

if (index > 7)
index = 0;
}

keyCtr = 4;
WordToBytes(X2 ^ _exKey[keyCtr++], Output,
OutOffset);
WordToBytes(X3 ^ _exKey[keyCtr++], Output, OutOffset
+ 4);
WordToBytes(X0 ^ _exKey[keyCtr++], Output, OutOffset
+ 8);
WordToBytes(X1 ^ _exKey[keyCtr], Output, OutOffset
+ 12);
}

Fusion
I think this is my favorite cipher in the library. If I had the
cure for baldness on my computer, this is what I would use to
encrypt it. Fusion is a parallelized stream cipher; it encrypts
a random 128 bit counter to create a key stream, used to
transform input data. The pseudo random generator used to
create the key stream is a combination of the Rijndael and
Twofish ciphers:

private void CTransform(byte[] Input, Int32 InOffset,


byte[] Output, Int32 OutOffset)
{
Int32 keyCtr = 0;
Int32 M0, M1;

Int32 X0 = BytesToWord(Input, InOffset) ^ _exKey


[keyCtr++];
Int32 X1 = BytesToWord(Input, InOffset + 4) ^ _exKey
[keyCtr++];
Int32 X2 = BytesToWord(Input, InOffset + 8) ^ _exKey
[keyCtr++];
Int32 X3 = BytesToWord(Input, InOffset + 12) ^
_exKey[keyCtr];

keyCtr = 8;

Int32 X4 = (Int32)(T0[(byte)(X0 >> 24)] ^ T1[(byte)


(X1 >> 16)] ^ T2[(byte)(X2 >> 8)] ^ T3[(byte)X3]) ^
_exKey[keyCtr++];
Int32 X5 = (Int32)(T0[(byte)(X1 >> 24)] ^ T1[(byte)
(X2 >> 16)] ^ T2[(byte)(X3 >> 8)] ^ T3[(byte)X0]) ^
_exKey[keyCtr++];
Int32 X6 = (Int32)(T0[(byte)(X2 >> 24)] ^ T1[(byte)
(X3 >> 16)] ^ T2[(byte)(X0 >> 8)] ^ T3[(byte)X1]) ^
_exKey[keyCtr++];
Int32 X7 = (Int32)(T0[(byte)(X3 >> 24)] ^ T1[(byte)
(X0 >> 16)] ^ T2[(byte)(X1 >> 8)] ^ T3[(byte)X2]) ^
_exKey[keyCtr++];

while (keyCtr < _exKey.Length)


{
// rijndael round
X0 = (Int32)(T0[(byte)(X4 >> 24)] ^ T1[(byte)
(X5 >> 16)] ^ T2[(byte)(X6 >> 8)] ^ T3[(byte)X7]) ^
_exKey[keyCtr++];

X1 = (Int32)(T0[(byte)(X5 >> 24)] ^ T1[(byte)(X6


>> 16)] ^ T2[(byte)(X7 >> 8)] ^ T3[(byte)X4]) ^ _exKey
[keyCtr++];
X2 = (Int32)(T0[(byte)(X6 >> 24)] ^ T1[(byte)(X7
>> 16)] ^ T2[(byte)(X4 >> 8)] ^ T3[(byte)X5]) ^ _exKey
[keyCtr++];
X3 = (Int32)(T0[(byte)(X7 >> 24)] ^ T1[(byte)(X4
>> 16)] ^ T2[(byte)(X5 >> 8)] ^ T3[(byte)X6]) ^ _exKey
[keyCtr++];

// twofish round
M0 = Fe0(X0);
M1 = Fe3(X1);
X2 ^= M0 + M1 + _exKey[keyCtr++];
X2 = (Int32)((UInt32)X2 >> 1) | X2 << 31;
X3 = (X3 << 1 | (Int32)((UInt32)X3 >> 31)) ^ (M0
+ 2 * M1 + _exKey[keyCtr++]);

X4 = (Int32)(T0[(byte)(X0 >> 24)] ^ T1[(byte)(X1


>> 16)] ^ T2[(byte)(X2 >> 8)] ^ T3[(byte)X3]) ^ _exKey
[keyCtr++];
X5 = (Int32)(T0[(byte)(X1 >> 24)] ^ T1[(byte)(X2
>> 16)] ^ T2[(byte)(X3 >> 8)] ^ T3[(byte)X0]) ^ _exKey
[keyCtr++];
X6 = (Int32)(T0[(byte)(X2 >> 24)] ^ T1[(byte)(X3
>> 16)] ^ T2[(byte)(X0 >> 8)] ^ T3[(byte)X1]) ^ _exKey
[keyCtr++];
X7 = (Int32)(T0[(byte)(X3 >> 24)] ^ T1[(byte)(X0
>> 16)] ^ T2[(byte)(X1 >> 8)] ^ T3[(byte)X2]) ^ _exKey
[keyCtr++];

M0 = Fe0(X6);
M1 = Fe3(X7);
X4 ^= M0 + M1 + _exKey[keyCtr++];
X4 = (Int32)((UInt32)X4 >> 1) | X4 << 31;
X5 = (X5 << 1 | (Int32)((UInt32)X5 >> 31)) ^ (M0
+ 2 * M1 + _exKey[keyCtr++]);
}

keyCtr = 4;
WordToBytes(X2 ^ _exKey[keyCtr++], Output,
OutOffset);
WordToBytes(X3 ^ _exKey[keyCtr++], Output, OutOffset
+ 4);
WordToBytes(X0 ^ _exKey[keyCtr++], Output, OutOffset
+ 8);
WordToBytes(X1 ^ _exKey[keyCtr], Output, OutOffset
+ 12);
}

Within the main loop a round of Rijndael is processed and the
product of that round undergoes a full round of Twofish,
including the working key addition.
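
The counter mode wrapper around such a transform can be
sketched generically. In the example below the block function
is passed in as a delegate, standing in for CTransform; the
class name, the delegate shape, and the big-endian counter
increment are all assumptions made for illustration, not the
library's API.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class CounterModeSketch
{
    // Increments a counter held in a byte array (assumed big-endian).
    public static void Increment(byte[] counter)
    {
        for (int i = counter.Length - 1; i >= 0; i--)
        {
            if (++counter[i] != 0)
                break;
        }
    }

    // Encrypts successive counter values with encryptBlock and Xors
    // the resulting key stream with the input. The same call both
    // encrypts and decrypts.
    public static byte[] Transform(Func<byte[], byte[]> encryptBlock, byte[] counter, byte[] input)
    {
        int blockSize = counter.Length;
        byte[] ctr = (byte[])counter.Clone();   // leave the caller's counter untouched
        byte[] output = new byte[input.Length];

        for (int offset = 0; offset < input.Length; offset += blockSize)
        {
            byte[] keyStream = encryptBlock(ctr);
            if (keyStream.Length < blockSize)
                throw new InvalidOperationException("Block function returned too few bytes.");

            int count = Math.Min(blockSize, input.Length - offset);
            for (int j = 0; j < count; j++)
                output[offset + j] = (byte)(input[offset + j] ^ keyStream[j]);

            Increment(ctr);
        }

        return output;
    }
}
```

Because the key stream depends only on the key and the counter,
running Transform a second time with the same starting counter
recovers the original input.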

ChaCha+ and Salsa+


Both ChaCha and Salsa use a 512 bit state engine, similar to a
hash function, to create a key stream. This pseudo random
stream is Xored with the plaintext input to create the
ciphertext. As stream ciphers, they have no inverse function;
in decryption, the ciphertext is Xored with the same keystream
to produce the plaintext. The 64 bytes of state used by the
cipher are held in a 16 member integer array, the state. These
integers are Xored, left rotated by varying degrees, and added
together to form a keystream, which in turn is Xored with the
input.

In the standard ciphers, the state is created from up to 32
bytes of the user key, 8 bytes from an initialization vector,
an 8 byte counter, and a 16 byte ASCII string called Sigma or
Tau (256 or 128 bit key versions). In this implementation, the
maximum key length is 448 bits, with a 64 bit IV, so the
entire state can now be initialized via a user generated key.
To do this (replacing the sigma constant and the zero
initialized incrementing state counter), one has to be able to
guarantee a minimum asymmetry between state variables; I do
this by testing the bytes used for the nonce portion of the
input key for a symmetry no greater than what is contained in
the sigma constant: "expand 32-byte k". So no more than 2
repeating bytes, two repetitions, and at a distance of no less
than 5 array intervals. If the keying material is
insufficiently asymmetric, the state is copied into a temp
variable and passed through the core function, and those bytes
that go to create the nonce portion of the key are copied from
the output into the state array. The IV and counter are also
compared for equality, and replaced with the hashed output from
the core function if necessary. This is all done in the
CreateNonce() function, called at the bottom of the key
scheduler if the key bytes do not meet the minimum asymmetry
requirement.
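
To make the "Xored, left rotated, and added" description
concrete, here is a sketch of the ChaCha quarter round as
published in the ChaCha20 specification (RFC 7539). It is
offered as an illustration of the mixing step, not as the
article's SalsaCore, whose body is not shown; Salsa arranges
the same three operations in a different order. The class name
is hypothetical.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class ChaChaQuarterRoundSketch
{
    private static uint Rotl(uint x, int n)
    {
        return (x << n) | (x >> (32 - n));
    }

    // One ChaCha quarter round over four state words: add, Xor,
    // then rotate left by 16, 12, 8 and 7 bits in turn.
    public static void QuarterRound(ref uint a, ref uint b, ref uint c, ref uint d)
    {
        unchecked
        {
            a += b; d ^= a; d = Rotl(d, 16);
            c += d; b ^= c; b = Rotl(b, 12);
            a += b; d ^= a; d = Rotl(d, 8);
            c += d; b ^= c; b = Rotl(b, 7);
        }
    }
}
```

With the test vector from RFC 7539 section 2.1.1, the inputs
a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567
produce a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e,
d = 0x5881c4bb.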

private void CreateNonce()


{
// Process engine state to generate key
int stateLen = _State.Length;
Int32[] chachaOut = new Int32[stateLen];
Int32[] stateTemp = new Int32[stateLen];

// copy state
Buffer.BlockCopy(_State, 0, stateTemp, 0, stateLen
* 4);

// create a new nonce with core


SalsaCore(20, stateTemp, chachaOut);

// copy new nonce to state


_State[0] = chachaOut[0];
_State[5] = chachaOut[5];
_State[10] = chachaOut[10];
_State[15] = chachaOut[15];

// check for unique counter


if (_State[8] == _State[6])
_State[8] = chachaOut[8];
if (_State[9] == _State[7])
_State[9] = chachaOut[9];
}
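
The indices written by CreateNonce (0, 5, 10 and 15) are the
slots that hold the sigma constant in the standard Salsa20
state, and 6 to 9 hold the IV and counter. The sketch below
builds that standard layout for a 32 byte key so the positions
can be seen side by side. It follows the published Salsa20
layout rather than the library's key scheduler, which is not
shown, and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Illustrative sketch only; names here are not from the library.
public static class SalsaStateSketch
{
    // Little-endian conversion of 4 bytes to a word.
    private static uint ToWordLE(byte[] data, int offset)
    {
        return (uint)data[offset]
            | ((uint)data[offset + 1] << 8)
            | ((uint)data[offset + 2] << 16)
            | ((uint)data[offset + 3] << 24);
    }

    // Standard Salsa20 state: constants on the diagonal (0, 5, 10, 15),
    // key in 1..4 and 11..14, IV in 6..7, counter in 8..9.
    public static uint[] BuildState(byte[] key, byte[] iv, ulong counter)
    {
        if (key == null || key.Length != 32)
            throw new ArgumentException("Key must be 32 bytes.", "key");
        if (iv == null || iv.Length != 8)
            throw new ArgumentException("IV must be 8 bytes.", "iv");

        byte[] sigma = Encoding.ASCII.GetBytes("expand 32-byte k");
        uint[] state = new uint[16];

        state[0] = ToWordLE(sigma, 0);
        for (int i = 0; i < 4; i++)
            state[1 + i] = ToWordLE(key, 4 * i);
        state[5] = ToWordLE(sigma, 4);
        state[6] = ToWordLE(iv, 0);
        state[7] = ToWordLE(iv, 4);
        state[8] = (uint)counter;
        state[9] = (uint)(counter >> 32);
        state[10] = ToWordLE(sigma, 8);
        for (int i = 0; i < 4; i++)
            state[11 + i] = ToWordLE(key, 16 + 4 * i);
        state[15] = ToWordLE(sigma, 12);

        return state;
    }
}
```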

DCS
DCS is a stream cipher that uses two Rijndael streams in an
AES configuration; that is, a 256 bit key and 128 bit block
size. It creates two AES SIC (Segmented Integer Counter)
streams using unique keys and 128 bit counters. These two
independent streams are combined using a logical exclusive
OR operation (XOR) to produce a pseudo random output
stream. That stream is then XOR'd again with the input data to
produce the cipher text. DCS uses a single 768 bit key to
generate the random stream, which puts an exhaustive key
search far out of practical reach. It is also automatically
parallelized, intended to run at high speed on multi
processor systems.
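
The construction can be sketched with the framework's own AES
class standing in for the library's AesTransform. This is a
minimal, single threaded illustration under stated
assumptions: the class name is hypothetical, the counters are
incremented big-endian, and System.Security.Cryptography.Aes
in ECB mode without padding is used purely as the block
encryptor.

```csharp
using System;
using System.Security.Cryptography;

// Illustrative sketch only; names here are not from the library.
public static class DualCtrSketch
{
    private const int BlockSize = 16;

    // Increments a counter held in a byte array (assumed big-endian).
    private static void Increment(byte[] counter)
    {
        for (int i = counter.Length - 1; i >= 0; i--)
        {
            if (++counter[i] != 0)
                break;
        }
    }

    // Xors the input with the Xor of two independent AES counter streams.
    public static byte[] Transform(byte[] key1, byte[] key2, byte[] ctr1, byte[] ctr2, byte[] input)
    {
        byte[] c1 = (byte[])ctr1.Clone();
        byte[] c2 = (byte[])ctr2.Clone();
        byte[] block1 = new byte[BlockSize];
        byte[] block2 = new byte[BlockSize];
        byte[] output = new byte[input.Length];

        using (Aes aes1 = Aes.Create())
        using (Aes aes2 = Aes.Create())
        {
            aes1.Mode = CipherMode.ECB; aes1.Padding = PaddingMode.None; aes1.Key = key1;
            aes2.Mode = CipherMode.ECB; aes2.Padding = PaddingMode.None; aes2.Key = key2;

            using (ICryptoTransform enc1 = aes1.CreateEncryptor())
            using (ICryptoTransform enc2 = aes2.CreateEncryptor())
            {
                for (int offset = 0; offset < input.Length; offset += BlockSize)
                {
                    // encrypt each counter under its own key
                    enc1.TransformBlock(c1, 0, BlockSize, block1, 0);
                    enc2.TransformBlock(c2, 0, BlockSize, block2, 0);

                    int count = Math.Min(BlockSize, input.Length - offset);
                    for (int j = 0; j < count; j++)
                        output[offset + j] = (byte)(input[offset + j] ^ block1[j] ^ block2[j]);

                    Increment(c1);
                    Increment(c2);
                }
            }
        }

        return output;
    }
}
```

Calling Transform twice with the same keys and starting
counters returns the original data, since the operation is a
plain Xor with the combined stream.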

Initialization

The Init() method tests for weak and invalid keys; this is
because DCS requires a strong key, one that does not contain
repeating sequences or high numbers of repeating bytes. Keys
should be created with a hash function or PRNG as
demonstrated in the class: Crypto\Helpers\KeyGenerator.cs.

The ZerosFrequency() test counts the frequency of zero bytes
in the 96 byte key; the EvaluateSeed() method tests for
recurring byte frequency as well as the frequency
of ascending 4 byte pattern runs in the seed.
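
Neither test is listed in the article. As a rough guess at
what the simpler of the two might look like, a zero byte count
can be sketched as below; the class name and the body are
assumptions, and EvaluateSeed is not attempted because its
exact criteria are not given.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class SeedChecksSketch
{
    // Counts the zero bytes in the seed.
    public static int ZerosFrequency(byte[] seed)
    {
        if (seed == null)
            throw new ArgumentNullException("seed");

        int count = 0;
        foreach (byte b in seed)
        {
            if (b == 0)
                count++;
        }

        return count;
    }
}
```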

public void Init(byte[] Seed)
{
    // validate the seed before any key material is derived
    if (Seed == null)
        throw new ArgumentNullException("Seed", "Invalid seed! Seed can not be null.");
    if (Seed.Length != 96)
        throw new ArgumentOutOfRangeException("Seed", "Invalid seed size! Seed must be 96 bytes.");
    if (ZerosFrequency(Seed) > 32)
        throw new ArgumentException("Bad seed! Seed material contains too many zeroes.");
    if (!EvaluateSeed(Seed))
        throw new ArgumentException("Bad seed! Seed material contains repeating sequence.");

    // copy seed
    Buffer.BlockCopy(Seed, 0, _seedBuffer, 0, _seedBuffer.Length);

    byte[] keyBuffer1 = new byte[KEY_BYTES];
    byte[] keyBuffer2 = new byte[KEY_BYTES];

    // copy seed to keys
    Buffer.BlockCopy(Seed, 0, keyBuffer1, 0, KEY_BYTES);
    Buffer.BlockCopy(Seed, KEY_BYTES, keyBuffer2, 0, KEY_BYTES);

    // the two AES keys must differ (SequenceEqual requires System.Linq)
    if (keyBuffer1.SequenceEqual(keyBuffer2))
        throw new ArgumentException("Bad seed! Seed material is a repeating sequence.");

    // copy seed to counters
    Buffer.BlockCopy(Seed, KEY_BYTES * 2, _ctrBuffer1, 0, BLOCK_SIZE);
    Buffer.BlockCopy(Seed, (KEY_BYTES * 2) + BLOCK_SIZE, _ctrBuffer2, 0, BLOCK_SIZE);

    // the two counters must differ
    if (_ctrBuffer1.SequenceEqual(_ctrBuffer2))
        throw new ArgumentException("Bad seed! Seed material is a repeating sequence.");

    // expand AES keys
    _exKey1 = ExpandKey(keyBuffer1);
    _exKey2 = ExpandKey(keyBuffer2);
}

The first 64 bytes of the seed or 'key' are copied into two
32 byte key buffers. These buffers are first checked for
equality before being used to create the two unique AES
working keys (_exKey1 and _exKey2). The remaining 32 bytes are
copied into two 128 bit segmented counters and checked for
equality.
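
The seed layout can be written down as a small standalone
helper. The offsets follow the Init() listing, on the
assumption that KEY_BYTES is 32 and BLOCK_SIZE is 16 (which
the 96 byte seed length implies); the class and method names
are hypothetical.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class SeedLayoutSketch
{
    public const int KeyBytes = 32;
    public const int BlockSize = 16;
    public const int SeedBytes = (KeyBytes * 2) + (BlockSize * 2); // 96

    // Splits a 96 byte seed into two AES keys and two counters:
    // [0..31] key1, [32..63] key2, [64..79] ctr1, [80..95] ctr2.
    public static void Split(byte[] seed, out byte[] key1, out byte[] key2,
        out byte[] ctr1, out byte[] ctr2)
    {
        if (seed == null)
            throw new ArgumentNullException("seed");
        if (seed.Length != SeedBytes)
            throw new ArgumentOutOfRangeException("seed", "Seed must be 96 bytes.");

        key1 = new byte[KeyBytes];
        key2 = new byte[KeyBytes];
        ctr1 = new byte[BlockSize];
        ctr2 = new byte[BlockSize];

        Buffer.BlockCopy(seed, 0, key1, 0, KeyBytes);
        Buffer.BlockCopy(seed, KeyBytes, key2, 0, KeyBytes);
        Buffer.BlockCopy(seed, KeyBytes * 2, ctr1, 0, BlockSize);
        Buffer.BlockCopy(seed, (KeyBytes * 2) + BlockSize, ctr2, 0, BlockSize);
    }
}
```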

Random Generation

The method Generate() takes the expected return size in
bytes and two 128 bit counters as parameters, and returns that
number of pseudo-random bytes. This p-rand is generated by
calling the diffusion algorithm AesTransform(Ci, T, Ki) twice,
with the unique keys and counters derived in the Init() method.
The output from these calls (the encrypted counter block
arrays) is then Xor'd and added to the output array. Both
segmented counters are incremented on each iteration of the
processing loop (every 16 bytes).

private byte[] Generate(Int32 Size, byte[] Ctr1, byte[]


Ctr2)
{
// align to upper divisible of block size
Int32 alignedSize = (Size % BLOCK_SIZE == 0 ?
Size : Size + BLOCK_SIZE - (Size % BLOCK_SIZE));
Int32 lastBlock = alignedSize - BLOCK_SIZE;
byte[] randBlock1 = new byte[BLOCK_SIZE];
byte[] randBlock2 = new byte[BLOCK_SIZE];
byte[] outputData = new byte[Size];

for (int i = 0; i < alignedSize; i += BLOCK_SIZE)


{
// encrypt counter1 (aes: ctr1, out1, key1)
AesTransform(Ctr1, randBlock1, _exKey1);
// encrypt counter2 (aes: ctr2, out2, key2)
AesTransform(Ctr2, randBlock2, _exKey2);

// xor the two transforms


for (int j = 0; j < BLOCK_SIZE; j++)
randBlock1[j] ^= randBlock2[j];

// copy to output
if (i != lastBlock)
{
// copy transform to output
Buffer.BlockCopy(randBlock1, 0, outputData,
i, BLOCK_SIZE);
}
else
{
// copy last block
int finalSize = (Size % BLOCK_SIZE) == 0 ?
BLOCK_SIZE : (Size % BLOCK_SIZE);
Buffer.BlockCopy(randBlock1, 0, outputData,

i, finalSize);
}

// increment counters
Increment(Ctr1);
Increment(Ctr2);
}

return outputData;
}

This combination of independent pseudo-random
permutations has been explored for some years now,
including in a paper by Stefan Lucks, The Sum of PRPs is a
Secure PRF. That paper gives as a security bound for a sum of
two independent PRPs $q^3 / 2^{2n-1}$, where $q$ is the number
of queries and $n$ the block size (i.e. 128 for AES).

This means that this combining of independent pseudo
random streams is more secure than using a single PRP, for
which the bound is $q^2 / 2^n$. If you wanted to give an
adversary an advantage of at most $2^{-k}$ then you could use
the sum $2^{(2n-k)/3}$ times. For e.g. $k = 64$ that's
$2^{64}$, which should be enough (256 Exbibytes). In
comparison, with a single PRP you could only use it $2^{32}$
times (64 Gibibytes).
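
The arithmetic behind those figures is easy to check with
integer exponents. The helper below is a hypothetical sketch
that works entirely in powers of two: it solves each bound for
the number of queries at a given advantage, and converts
queries to bytes on the assumption of one block of output per
query.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class PrpSumBoundSketch
{
    // log2 of the usable query count for a sum of two PRPs:
    // q^3 / 2^(2n) <= 2^-k  gives  q <= 2^((2n - k) / 3).
    public static int Log2QueriesSum(int n, int k)
    {
        return (2 * n - k) / 3;
    }

    // log2 of the usable query count for a single PRP:
    // q^2 / 2^n <= 2^-k  gives  q <= 2^((n - k) / 2).
    public static int Log2QueriesSingle(int n, int k)
    {
        return (n - k) / 2;
    }

    // log2 of the byte volume, given log2 of the query count and
    // a block size of n bits (n / 8 bytes, assumed a power of two).
    public static int Log2Bytes(int log2Queries, int n)
    {
        int blockBytes = n / 8;
        int log2Block = 0;
        while (blockBytes > 1)
        {
            blockBytes >>= 1;
            log2Block++;
        }

        return log2Queries + log2Block;
    }
}
```

For n = 128 and k = 64 this gives 2^64 queries (2^68 bytes) for
the sum, against 2^32 queries (2^36 bytes) for a single PRP.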

So aside from the longer key length providing more resistance
against brute force attacks, this algorithm also has the
advantage of providing a much longer period between
necessary rekeying of the stream, which means larger data
sets or streams can be safely encrypted with the same key.
Another advantage is that because DCS is using an AES
configuration, it could be ported to C/C++ and made to
leverage the AES instruction set, AES-NI.

Parallel Processing

The minimum input size that triggers parallel processing is
defined as the MinParallelSize property, which is 1024
bytes. Data blocks of this size or greater will be processed in
parallel using a Parallel.For loop. The input data is subdivided
into one chunk per system processor, each chunk a multiple of
the block size and processed on its own thread inside the loop.
The segmented counters are created in a jagged array inside the
loop, with each counter member incremented to an offset equal
to the chunk size (in blocks) multiplied by the value of the
loop iterator. Two distinct counters are offset and passed to
the Generate() method, which returns the pseudo random output.
That random array is then Xor'd with the input bytes at the
corresponding offset to create the output.
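
The chunk arithmetic and the counter offset can be tried out on
their own. The sketch below mirrors the size calculations in
the Transform() listing and adds a possible Increase() helper;
the library's Increase() is not shown, so the big-endian
addition used here, along with the class and method names, is
an assumption.

```csharp
using System;

// Illustrative sketch only; names here are not from the library.
public static class ParallelChunkSketch
{
    public const int BlockSize = 16;

    // Splits outputSize bytes into processorCount equal chunks, each a
    // multiple of the block size, plus a remainder for the final pass.
    public static void Partition(int outputSize, int processorCount,
        out int chunkSize, out int roundSize, out int finalSize)
    {
        if (processorCount < 1)
            throw new ArgumentOutOfRangeException("processorCount", "At least one processor is required.");

        int blocks = outputSize / BlockSize;
        chunkSize = (blocks / processorCount) * BlockSize;
        roundSize = chunkSize * processorCount;
        finalSize = outputSize - roundSize;
    }

    // Returns a copy of the counter advanced by the given number of
    // blocks (assumed big-endian byte order).
    public static byte[] Increase(byte[] counter, int blocks)
    {
        if (blocks < 0)
            throw new ArgumentOutOfRangeException("blocks", "Block count can not be negative.");

        byte[] result = (byte[])counter.Clone();
        ulong carry = (ulong)blocks;

        for (int i = result.Length - 1; i >= 0 && carry != 0; i--)
        {
            ulong sum = result[i] + (carry & 0xFF);
            result[i] = (byte)sum;
            carry = (carry >> 8) + (sum >> 8);
        }

        return result;
    }
}
```

With a 1000 byte input on 4 processors, each thread handles a
240 byte chunk (15 blocks), thread i starts from a counter
advanced by 15 * i, and the last 40 bytes are handled after
the loop.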

public void Transform(byte[] Input, byte[] Output)


{
if (Output.Length < 1)
throw new ArgumentOutOfRangeException("Output", "Invalid output array! Size can not be less than 1 byte.");
if (Output.Length > Input.Length)
throw new ArgumentOutOfRangeException("Input", "Invalid input array! Input array size can not be smaller than output array size.");

int outputSize = Output.Length;

if (!this.IsParallel || outputSize <


this.MinParallelSize)
{
// generate random
byte[] random = Generate(outputSize,
_ctrBuffer1, _ctrBuffer2);

// output is input xor with random


for (int i = 0; i < outputSize; i++)
Output[i] = (byte)(Input[i] ^ random[i]);
}
else
{
// parallel ctr processing //
int count = this.ProcessorCount;
int dimensions = count * 2;
int alignedSize = outputSize / BLOCK_SIZE;
int chunkSize = (alignedSize / count) *
BLOCK_SIZE;
int roundSize = chunkSize * count;
int subSize = (chunkSize / 16);

// create jagged array of 'sub counters'


byte[][] counters = new byte[dimensions][];

// create random and xor to output in parallel


System.Threading.Tasks.Parallel.For(0, count, i
=>
{
// offset first counter by i * (chunk size /
block size)
counters[i] = Increase(_ctrBuffer1, subSize
* i);
// offset the second counter
counters[count + i] = Increase(_ctrBuffer2,
subSize * i);

// create random with counter offsets


byte[] random = Generate(chunkSize, counters
[i], counters[i + count]);
int offset = i * chunkSize;

// xor with input at index offset


for (int j = 0; j < chunkSize; j++)
Output[j + offset] = (byte)(Input[j +
offset] ^ random[j]);
});

// last block processing


if (roundSize < outputSize)
{
int finalSize = outputSize % roundSize;
byte[] random = Generate(finalSize, counters
[count - 1], counters[dimensions - 1]);

for (int i = 0; i < finalSize; i++)


Output[i + roundSize] = (byte)(Input[i +
roundSize] ^ random[i]);
}

// copy the last counter positions to class

http://www.codeproject.com/Articles/828477/Cipher-EX-V?display=Print 26.12.2014
Cipher EX V1.2 - CodeProject Page 45 of 57

variables
Buffer.BlockCopy(counters[count - 1], 0,
_ctrBuffer1, 0, _ctrBuffer1.Length);
Buffer.BlockCopy(counters[dimensions - 1], 0,
_ctrBuffer2, 0, _ctrBuffer2.Length);
}
}
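
To make the chunk arithmetic in the parallel branch easier to
follow, here is a minimal sketch that repeats only the size
calculations. The class and method names (ChunkLayoutSketch,
Compute) are hypothetical and not part of the library; the sketch
uses a subtraction where the listing uses a modulus, and the two
agree whenever the remainder is smaller than roundSize.

```csharp
using System;

public static class ChunkLayoutSketch
{
    // Hypothetical helper: returns { chunkSize, roundSize, finalSize }
    // using the same integer arithmetic as the parallel branch of Transform.
    public static int[] Compute(int outputSize, int processorCount, int blockSize)
    {
        int alignedSize = outputSize / blockSize;                    // whole blocks in the input
        int chunkSize = (alignedSize / processorCount) * blockSize;  // bytes per parallel chunk
        int roundSize = chunkSize * processorCount;                  // bytes handled inside the loop
        int finalSize = outputSize - roundSize;                      // bytes left for the last block step
        return new[] { chunkSize, roundSize, finalSize };
    }
}
```

For a 4100 byte input on four processors with a 16 byte block,
Compute(4100, 4, 16) gives a chunk size of 1024, a round size of
4096, and 4 bytes left over for the last block step.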

API

Properties:

Get/Set: Automatic processor parallelization
public bool IsParallel { get; set; }

Get: Minimum input size to trigger parallel processing
public int MinParallelSize { get; }

Get: Cipher name
public string Name { get; }

Get: Key size in bits; 768 bits (96 bytes)
public Int32 KeySize { get; }

Public Methods:

Constructor: Initialize the class
public DCS()

Init: Initialize the algorithm, must be called before processing
Key: 96 byte (768 bit) random seed value
public void Init(byte[] Key)

Transform: Encrypt/Decrypt an array of bytes
Input: Input bytes, plain text for encryption, cipher text for
decryption
Output: Output bytes, array of at least equal size of input that
receives processed bytes
public void Transform(byte[] Input, byte[] Output)

Transform: Encrypt/Decrypt an array of bytes
Input: Input bytes, plain text for encryption, cipher text for
decryption
InOffset: Offset within the Input array
Output: Output bytes, array of at least equal size of input that
receives processed bytes
OutOffset: Offset within the Output array
public void Transform(byte[] Input, int InOffset, byte[] Output, int OutOffset)

Dispose: Release resources used by this class
public void Dispose()

Example Implementation


The implementation is a basic example of a standalone
encryptor. It is just for demonstration purposes and as such
lacks many of the components required to make a secure
implementation: error handling, validation checks,
authentication controls, key management, logging,
notifications, etc.

Key and Vector Creation

Keys and vectors are created by the KeyGenerator class using
a combination of an SHA HMAC and the .NET
RNGCryptoServiceProvider. Using RNGCryptoServiceProvider alone
should be more than sufficient for creating keying material, but
in the context of a document encryptor, the additional time
expended creating the key is negligible, and the extra layer of
security provided is a reasonable expense. There are two
choices of key creation engines: one uses bytes derived by an
SHA-2 HMAC, the other ('Ng') uses an SHA-3 HMAC. I did this
because I felt that it might be more secure generating keys for
the HX ciphers using a different hashing algorithm (SHA-3).

/// <summary>
/// Get a random seed value
/// </summary>
/// <returns>64 bytes of p-rand</returns>
internal static byte[] GetSeed64()
{
    byte[] data = GetRngBytes(256);
    byte[] key = GetRngBytes(64);

    using (SHA512HMAC hmac = new SHA512HMAC(key))
        return hmac.ComputeMac(data);
}

/// <summary>
/// Get a random seed value using an SHA3-512 HMAC
/// </summary>
/// <returns>64 bytes of p-rand</returns>
internal static byte[] GetSeed64Ng()
{
    byte[] data = GetRngBytes(144); // 2x block per Nist sp800-90b
    byte[] key = GetRngBytes(64);   // key size per rfc 2104

    using (HMAC hmac = new HMAC(new Digests.SHA3Digest(512), key))
        return hmac.ComputeMac(data);
}
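
For readers who want to try the same idea without the library's
own HMAC classes, the sketch below does roughly what GetSeed64
does using only framework types. It is an illustration under
assumptions, not the library's code: SeedSketch is a hypothetical
name, and HMACSHA512 and RandomNumberGenerator from
System.Security.Cryptography stand in for the article's
SHA512HMAC and GetRngBytes.

```csharp
using System;
using System.Security.Cryptography;

public static class SeedSketch
{
    // Fill a new array with bytes from the system cryptographic RNG.
    public static byte[] GetRngBytes(int size)
    {
        byte[] data = new byte[size];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
            rng.GetBytes(data);
        return data;
    }

    // Assumption: mirrors the shape of GetSeed64 (256 bytes of data,
    // 64 byte key), with the framework HMACSHA512 as the MAC.
    public static byte[] GetSeed64()
    {
        byte[] data = GetRngBytes(256);
        byte[] key = GetRngBytes(64);

        using (HMACSHA512 hmac = new HMACSHA512(key))
            return hmac.ComputeHash(data); // 64 byte output
    }
}
```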

Headers

Key Header
Both the key and message contain headers that provide some
information to the encryptor. The key header contains fields
that are used by the encryptor to determine settings:
algorithm, cipher mode, padding scheme and block size. It
also contains a unique 16 byte key identity field, 16 bytes of
p-rand used to encrypt the file extension, and a 64 byte field
used to store a secret key used in the HMAC message
authentication.

[Serializable]
[StructLayout(LayoutKind.Sequential)]
internal struct KeyHeaderStruct
{
    internal Int32 Engine;
    internal Int32 KeySize;
    internal Int32 IvSize;
    internal Int32 CipherMode;
    internal Int32 PaddingMode;
    internal Int32 BlockSize;
    internal Int32 RoundCount;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    internal byte[] KeyID;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    internal byte[] ExtRandom;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 64)]
    internal byte[] MessageKey;

    internal KeyHeaderStruct(Engines engine, KeySizes keySize, IVSizes ivSize, CipherModes cipher,
        PaddingModes padding, BlockSizes block, RoundCounts round)
    {
        this.Engine = (Int32)engine;
        this.KeySize = (Int32)keySize;
        this.IvSize = (Int32)ivSize;
        this.CipherMode = (Int32)cipher;
        this.PaddingMode = (Int32)padding;
        this.BlockSize = (Int32)block;
        this.RoundCount = (Int32)round;
        this.KeyID = Guid.NewGuid().ToByteArray();
        this.ExtRandom = KeyGenerator.GetSeed16();
        this.MessageKey = KeyGenerator.GetSeed64();
    }
}

Message Header
The message header contains the identity field of the key
used to encrypt the message, a 16 byte field that contains the
encrypted file extension, and a 64 byte value that stores the
HMAC hash of the cipher-text.

[Serializable]
[StructLayout(LayoutKind.Sequential)]
internal struct MessageHeaderStruct
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    internal byte[] MessageID;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    public byte[] Extension;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 64)]
    internal byte[] MessageHash;

    internal MessageHeaderStruct(byte[] messageID, byte[] messageHash, byte[] extension)
    {
        this.MessageID = messageID;
        this.Extension = new byte[16];
        extension.CopyTo(Extension, 0);
        this.MessageHash = messageHash;
    }
}

Message Authentication

When a message is encrypted, a hash value is calculated from
the ciphertext using an SHA512 HMAC. An HMAC (Hash-based
Message Authentication Code) creates a unique code
for the output data using a combination of a hash function
mixed with a secret key. The secret key is a p-rand 64 byte
array stored in the key header. Before a message is decrypted,
the secret key is extracted from the key header, and the
HMAC is used to test the integrity of the encrypted message.
The HMAC and SHA classes in the example were adapted
from the Bouncy Castle library, with just a few changes for
format, and ComputeHash and Dispose methods added.

Creating a Checksum

internal byte[] GetChecksum(string FilePath, byte[] HashKey)
{
    using (SHA512HMAC hmac = new SHA512HMAC(HashKey))
    {
        int blockSize = hmac.BlockSize;
        byte[] buffer = new byte[blockSize];
        byte[] chkSum = new byte[hmac.DigestSize];

        using (BinaryReader inputReader = new BinaryReader(new FileStream(FilePath, FileMode.Open,
            FileAccess.Read, FileShare.None)))
        {
            // skip the message header; only the cipher-text is hashed
            inputReader.BaseStream.Position = MessageHeader.GetHeaderSize;
            int bytesRead = 0;

            while ((bytesRead = inputReader.Read(buffer, 0, blockSize)) == blockSize)
                hmac.BlockUpdate(buffer, 0, bytesRead);

            if (bytesRead > 0)
                hmac.BlockUpdate(buffer, 0, bytesRead);

            hmac.DoFinal(chkSum, 0);
        }

        return chkSum;
    }
}

In the example, a checksum is created by moving through the
file using the BlockUpdate method. After the blocks have
been processed, a call to DoFinal returns the hash value. For
large files a progress indicator should be added to the
example.


Verifying a File

internal bool Verify(string InputPath, string KeyPath)
{
    byte[] hashKey = KeyHeader.GetMessageKey(KeyPath);
    byte[] msgHash = MessageHeader.GetMessageHash(InputPath);
    byte[] hash = GetChecksum(InputPath, hashKey);

    return IsEqual(msgHash, hash);
}

To test the file, the hash key is extracted from the key file, a
new HMAC hash code is calculated, and the two are
compared for equality.
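
The IsEqual helper is not shown in the article. When comparing
MAC values it is common to use a comparison whose running time
does not depend on where the first mismatch occurs; the sketch
below is one hypothetical way to write such a helper
(CompareSketch is an assumed name), not the library's actual
implementation.

```csharp
using System;

public static class CompareSketch
{
    // Hypothetical stand-in for IsEqual: visits every byte and
    // accumulates the differences instead of returning at the first mismatch.
    public static bool IsEqual(byte[] a, byte[] b)
    {
        if (a == null || b == null || a.Length != b.Length)
            return false;

        int diff = 0;
        for (int i = 0; i < a.Length; i++)
            diff |= a[i] ^ b[i];

        return diff == 0;
    }
}
```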

The Transform Class

The Transform class is a wrapper for the encryption API. The
class constructor takes the key file path as an argument, and
uses the key header information to initialize the correct
algorithm and settings. The class contains two public methods,
Encrypt() and Decrypt(); both methods take the input and
output file paths as arguments.

RDX and RSX are both block ciphers, whereas DCS is a stream
cipher. Because they require different input sizes to operate
(DCS requires a minimum of 1024 bytes to trigger parallel
processing), they need to be implemented a bit differently.
The same is true for the parallel counter mode PSC, which also
requires a minimum 1024 byte block for parallelization.

Processing RDX/RSX

private void EncryptRX(string InputPath, string OutputPath, MemoryStream Header)
{
    using (BinaryReader inputReader = new BinaryReader(new FileStream(InputPath, FileMode.Open,
        FileAccess.Read, FileShare.None)))
    {
        byte[] inputBuffer = new byte[this.BlockSize];
        byte[] outputBuffer = new byte[this.BlockSize];
        long bytesRead = 0;
        long bytesTotal = 0;

        using (BinaryWriter outputWriter = new BinaryWriter(new FileStream(OutputPath, FileMode.Create,
            FileAccess.Write, FileShare.None)))
        {
            // write message header
            outputWriter.Write(Header.ToArray());

            while ((bytesRead = inputReader.Read(inputBuffer, 0, this.BlockSize)) == this.BlockSize)
            {
                this.Cipher.Transform(inputBuffer, outputBuffer);
                outputWriter.Write(outputBuffer);
                bytesTotal += bytesRead;

                if (bytesTotal % this.ProgressInterval == 0)
                    CalculateProgress(bytesTotal);
            }

            if (bytesRead > 0)
            {
                if (bytesRead < this.BlockSize)
                    Padding.AddPadding(inputBuffer, (int)bytesRead);

                this.Cipher.Transform(inputBuffer, outputBuffer);
                outputWriter.Write(outputBuffer);
                CalculateProgress(bytesTotal + bytesRead);
            }
        }
    }
}

A binary writer first writes the serialized message header to
the beginning of the encrypted output file. A while loop is then
used to transform the input in block-sized byte arrays and write
them to the output file. When the loop reaches the end of the
input file, if the bytes read are less than the block size but
more than zero, padding is added to the last block to align it to
the block size. The decryption function is similar, but tests for
padding and removes it from the output.
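
The padding scheme is selected through the key header, and the
Padding class itself is not listed here. As an illustration only,
the sketch below shows how a PKCS7-style scheme could fill and
later detect the padding in a final block; PaddingSketch and
GetPaddingLength are hypothetical names, and only the
AddPadding(buffer, offset) call shape is taken from the listing.

```csharp
using System;

public static class PaddingSketch
{
    // Fill block[offset..] with the pad length (PKCS7 style).
    // Returns the number of padding bytes written.
    public static int AddPadding(byte[] block, int offset)
    {
        byte pad = (byte)(block.Length - offset);
        for (int i = offset; i < block.Length; i++)
            block[i] = pad;
        return pad;
    }

    // Returns the number of trailing padding bytes, or 0 if the
    // block does not end in a well-formed PKCS7 pad.
    public static int GetPaddingLength(byte[] block)
    {
        int pad = block[block.Length - 1];
        if (pad < 1 || pad > block.Length)
            return 0;

        for (int i = block.Length - pad; i < block.Length; i++)
        {
            if (block[i] != pad)
                return 0;
        }
        return pad;
    }
}
```

Note that standard PKCS7 always appends padding, even to a full
final block, so that removal is unambiguous. A scheme that pads
only partial blocks, as the encryptor loop does, cannot tell a
padded block from an unpadded one whose data happens to end in a
valid pad pattern.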

Processing DCS

private void ProcessDCS(string InputPath, string OutputPath, MemoryStream Header = null)
{
    using (BinaryReader inputReader = new BinaryReader(new FileStream(InputPath, FileMode.Open,
        FileAccess.Read, FileShare.None)))
    {
        int blockSize = (DCS_BLOCK * 4);
        long bytesRead = 0;
        long bytesTotal = 0;

        if (inputReader.BaseStream.Length < blockSize)
            blockSize = (int)inputReader.BaseStream.Length;

        using (BinaryWriter outputWriter = new BinaryWriter(new FileStream(OutputPath, FileMode.Create,
            FileAccess.Write, FileShare.None)))
        {
            byte[] inputBuffer = new byte[blockSize];
            byte[] outputBuffer = new byte[blockSize];

            // encrypting: write the header; decrypting: skip past it
            if (Header != null)
                outputWriter.Write(Header.ToArray());
            else
                inputReader.BaseStream.Position = MessageHeader.GetHeaderSize;

            using (DCS dcs = new DCS())
            {
                dcs.Init(this.Key);

                while ((bytesRead = inputReader.Read(inputBuffer, 0, blockSize)) == blockSize)
                {
                    dcs.Transform(inputBuffer, outputBuffer);
                    outputWriter.Write(outputBuffer);
                    bytesTotal += bytesRead;

                    if (bytesTotal % this.ProgressInterval == 0)
                        CalculateProgress(bytesTotal);
                }

                if (bytesRead > 0)
                {
                    outputBuffer = new byte[bytesRead];
                    dcs.Transform(inputBuffer, outputBuffer);
                    outputWriter.Write(outputBuffer);
                    CalculateProgress(bytesTotal + bytesRead);
                }
            }
        }
    }
}

With this example of DCS the input size is set to 4096 bytes.
Depending on whether encryption or decryption is being used, the
message header is either written to the output file or the
input file pointer is moved to the end of the header. The input
is then transformed through the while loop. Because this is a
stream cipher, no padding is required, and the last data
segment is simply transformed and written to the output file.

Processing in PSC mode

private void ProcessPSC(string InputPath, string OutputPath, MemoryStream Header = null)
{
    using (BinaryReader inputReader = new BinaryReader(new FileStream(InputPath, FileMode.Open,
        FileAccess.Read, FileShare.None)))
    {
        // PSC requires min. 1024 byte block to parallelize,
        // and block must be divisible by 1024
        int blockSize = PSC.MinParallelSize;
        long bytesRead = 0;
        long bytesTotal = 0;

        if (inputReader.BaseStream.Length < blockSize)
            blockSize = (int)inputReader.BaseStream.Length;

        using (BinaryWriter outputWriter = new BinaryWriter(new FileStream(OutputPath, FileMode.Create,
            FileAccess.Write, FileShare.None)))
        {
            byte[] inputBuffer = new byte[blockSize];
            byte[] outputBuffer = new byte[blockSize];

            // encrypting: write the header; decrypting: skip past it
            if (Header != null)
                outputWriter.Write(Header.ToArray());
            else
                inputReader.BaseStream.Position = MessageHeader.GetHeaderSize;

            while ((bytesRead = inputReader.Read(inputBuffer, 0, blockSize)) == blockSize)
            {
                this.Cipher.Transform(inputBuffer, outputBuffer);
                outputWriter.Write(outputBuffer);
                bytesTotal += bytesRead;

                if (bytesTotal % this.ProgressInterval == 0)
                    CalculateProgress(bytesTotal);
            }

            if (bytesRead > 0)
            {
                outputBuffer = new byte[blockSize];
                this.Cipher.Transform(inputBuffer, outputBuffer);
                outputWriter.Write(outputBuffer, 0, (int)bytesRead);
                CalculateProgress(bytesTotal + bytesRead);
            }
        }
    }
}

PSC requires a minimum 1024 bytes of input to trigger
parallel processing. It is similar to processing DCS, in that
because it is running like a stream cipher, no padding of the
output is required.

PSC (Parallel Segmented Counter) Mode

PSC is a parallel CTR mode that works in a way similar to DCS.
It takes a segmented integer counter, and creates sub-counters
offset at intervals equal to the number of blocks in a chunk
multiplied by the Parallel loop iterator. These chunks, each of
which is the input size divided by the processor count, are then
processed in parallel. If the input size is less than the
parallel threshold, the input is processed using a standard CTR
configuration.

public void Transform(byte[] Input, byte[] Output)
{
    if (Output.Length < 1)
        throw new ArgumentOutOfRangeException("Invalid output array! Size can not be less than 1 byte.");
    if (Output.Length > Input.Length)
        throw new ArgumentOutOfRangeException("Invalid input array! Input array size can not be smaller than output array size.");

    int outputSize = Output.Length;

    if (!this.IsParallel || outputSize < MIN_PARALLEL)
    {
        // generate random
        byte[] random = Generate(outputSize, _pscVector);

        // output is input xor with random
        for (int i = 0; i < outputSize; i++)
            Output[i] = (byte)(Input[i] ^ random[i]);
    }
    else
    {
        // parallel ctr processing //
        int count = this.ProcessorCount;
        int alignedSize = outputSize / _blockSize;
        int chunkSize = (alignedSize / count) * _blockSize;
        int roundSize = chunkSize * count;
        int subSize = (chunkSize / _blockSize);

        // create jagged array of 'sub counters'
        byte[][] counters = new byte[count][];

        // create random, and xor to output in parallel
        System.Threading.Tasks.Parallel.For(0, count, i =>
        {
            // offset counter by chunk size / block size
            counters[i] = Increase(_pscVector, subSize * i);
            // create random with offset counter
            byte[] random = Generate(chunkSize, counters[i]);
            int offset = i * chunkSize;

            // xor with input at offset
            for (int j = 0; j < chunkSize; j++)
                Output[j + offset] = (byte)(Input[j + offset] ^ random[j]);
        });

        // last block processing
        if (roundSize < outputSize)
        {
            int finalSize = outputSize % roundSize;
            byte[] random = Generate(finalSize, counters[count - 1]);

            for (int i = 0; i < finalSize; i++)
                Output[i + roundSize] = (byte)(Input[i + roundSize] ^ random[i]);
        }

        // copy the last counter position to class variable
        Buffer.BlockCopy(counters[count - 1], 0, _pscVector, 0, _pscVector.Length);
    }
}

The parallel 'chunks' of p-rand are created with the Generate()
method, using counters offset by the loop iterator multiplied by
the number of blocks in a chunk. This p-rand is then XORed with
the corresponding block of input, and written to the Output
array. The last block is processed if the input size does not
divide evenly into the parallel chunks, then the counter in its
new position is copied to the class variable, _pscVector.
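
The reason this segmented approach gives the same result as a
sequential counter mode can be shown with a small self-contained
sketch. It is not the library's code: PscSketch and its methods
are hypothetical, the counter is assumed to be a 16 byte
big-endian integer, the framework Aes class in ECB mode stands in
for the block cipher, and sizes are assumed to be multiples of
16 bytes.

```csharp
using System;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class PscSketch
{
    const int BlockSize = 16;

    // Return a copy of a big-endian counter advanced by 'amount' blocks.
    public static byte[] Increase(byte[] counter, int amount)
    {
        byte[] result = (byte[])counter.Clone();
        int carry = amount;
        for (int i = result.Length - 1; i >= 0 && carry > 0; i--)
        {
            int sum = result[i] + (carry & 0xFF);
            result[i] = (byte)sum;
            carry = (carry >> 8) + (sum >> 8);
        }
        return result;
    }

    // Sequential keystream: encrypt successive counter values.
    // 'size' is assumed to be a multiple of the block size.
    public static byte[] Generate(byte[] key, byte[] counter, int size)
    {
        byte[] output = new byte[size];
        byte[] ctr = (byte[])counter.Clone();
        using (Aes aes = Aes.Create())
        {
            aes.Mode = CipherMode.ECB;
            aes.Padding = PaddingMode.None;
            aes.Key = key;
            using (ICryptoTransform enc = aes.CreateEncryptor())
            {
                for (int offset = 0; offset < size; offset += BlockSize)
                {
                    enc.TransformBlock(ctr, 0, BlockSize, output, offset);
                    ctr = Increase(ctr, 1);
                }
            }
        }
        return output;
    }

    // Segmented keystream: each chunk starts from its own offset counter.
    public static byte[] GenerateParallel(byte[] key, byte[] counter, int size, int chunks)
    {
        int blocksPerChunk = size / BlockSize / chunks;
        int chunkSize = blocksPerChunk * BlockSize;
        byte[] output = new byte[size];

        Parallel.For(0, chunks, i =>
        {
            byte[] sub = Increase(counter, blocksPerChunk * i);
            byte[] part = Generate(key, sub, chunkSize);
            Buffer.BlockCopy(part, 0, output, i * chunkSize, chunkSize);
        });

        // remaining blocks continue from where the last chunk ended
        int done = chunkSize * chunks;
        if (done < size)
        {
            byte[] sub = Increase(counter, blocksPerChunk * chunks);
            byte[] tail = Generate(key, sub, size - done);
            Buffer.BlockCopy(tail, 0, output, done, size - done);
        }
        return output;
    }
}
```

Because each sub-counter starts exactly where the previous chunk's
counter would have ended, Generate and GenerateParallel produce
identical keystreams for the same key and starting counter, which
is the property the PSC equality test checks.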

Tests


There are a number of different tests included in the project,
used to verify the validity of the RDX and SPX implementations in
standard configurations (128, 192, and 256 bit keys). There are
also I/O, mode equality, padding, HMAC, HKDF, SHA
KAT tests, and performance comparisons:

• AESAVS known answer tests; the output from a
transformation is known, given set parameters of key,
IV, or plaintext. The full plaintext and key vectors from
the AESAVS set, used for certifying an AES
implementation, are used; 960 vector tests in total.
• AES Monte Carlo tests; defined in the AES specification
FIPS 197, and the vectors created by Brian Gladman.
• ChaCha and Salsa20; KAT tests from the Bouncy Castle
jdk 1.51 implementation for both ChaCha and Salsa20.
• HKDF; the set of known answer tests from the
HKDF RFC 5869 are used to test the HKDF
implementation.
• HMAC; known answer tests from RFC 4231 are used to
test the HMAC implementation.
• Cipher Modes; the full set of vectors for ECB, CBC and
CTR modes from NIST SP800-38A are tested.
• PSC Equality; the parallel counter mode is compared
with output from a standard CTR.
• I/O; tests output through accessor methods within the
standard block cipher implementation using RDX with
a series of KAT tests.
• Rijndael Vector; known answer testing of the 256 bit block
size using test vectors derived from Bouncy Castle
RijndaelTest.cs and the Nessie unverified vectors.
• Serpent Key; compares the RSX key scheduler output
to a standard Serpent scheduler for a byte level
equivalency.
• Serpent Vector; the complete set of Nessie verified
vectors including 100 and 1000 round Monte Carlo
tests, 2865 vectors.
• SHA-2; KAT tests used in the NIST SHA test
vectors document supplement.
• SHA-3; vectors from the Bouncy Castle jdk 1.51 SHA-3
tests including NIST vectors.

Updates
• November 21, 2014: Added SHX, RHX, SPX, ChaCha and
Salsa implementations.
• November 23: Various performance optimizations
added.


Conclusion
In the spring of 1946 work on the ENIAC computer was
completed. Newspapers around the globe heralded it as the
“Super Brain” and the “Einstein Machine”. It was the most
powerful computer ever built; and had more than 17,000
vacuum tubes, 7,200 crystal diodes, 1,500 relays, 70,000
resistors, 10,000 capacitors and around 5 million hand-
soldered joints. It weighed 30 tons, took up 1800 square feet,
and consumed 150 kW of power. It was capable of calculating
an astounding 385 multiplication operations per second.

Imagine that you were one of the designers, out for a few
pints with fellow engineers and scientists, and you proposed
that in just 25 years, anyone could walk into a Sears store and
buy a 10 dollar portable calculator with thousands of times
the computational power and speed. I think you would have
been greeted with much skepticism; ‘impossible’, ‘infeasible’,
‘transistors and circuit pathways cannot be made that small’..
and you would have been subjected to a barrage of scientific
theories positing that such a thing could never happen.. at
least, not for a hundred years or so..

In a recent article on Wired, John Martinis, one of the foremost
experts on quantum computers, states that one of
the objectives of the new Google Quantum AI lab is to
double the number of qubits each year. Recently another
breakthrough in how quantum states are measured could
prove to be 350 times faster than current methods.
Breakthroughs of this kind are happening ever more
frequently as our understanding of quantum processes
continues to grow. So, at this ever accelerating rate, how long
will it be before computers exist that are capable of the
enormous processing power required to break current
encryption technology? It is impossible to say with any
certainty, but a major breakthrough could put this in reach,
possibly much sooner than expected. So when you hear of the
improbability of brute forcing a 256 bit key, remember the
ENIAC, and consider how far we have advanced technology in
the last 100 years.

There is the argument against stronger encryption, often
linked to the idea that state agencies are developing a mass
surveillance apparatus for our own protection, that facilities
like this one in Utah will be used only to target criminals and
terrorists, and that strong encryption hampers their efforts. I
think most people understand that this is not strictly the
case, that the technology could be forged into some system
of people control, and that these agencies' intentions are at
best unclear. As for the people they propose to target, like
criminals and terrorists, the worst of them either don't use
electronic communications at all, or have developed effective
evasion strategies and have access to unbreakable and
undetectable methods like One Time Pads and
steganography.

I think that we have advanced so quickly over the past 100
years because we are living in an age of unparalleled personal
freedoms, freedom to express our ideas and communicate
them without fear of interference or reprisal. This has been a
chief driver in the forward progression of our society, and
these freedoms need to be preserved if we are to maintain
that forward momentum, and hopefully, create a better
society for future generations. Encryption technologies play a
pivotal role in that future, and I believe we should be striving
towards technologies that protect information for the full
measure of a human lifetime, that all forms of
electronic communication should incorporate strong
encryption technology as a matter of standard, and that these
technologies should constantly be compared to, and evolved
against, the projected rate of technological change.

So.. hope you enjoyed the article, leave a comment, or if it's
technical, you can email me through this, or my site.

Cheers,
John

License
This article, along with any associated source code and files, is
licensed under The Code Project Open License (CPOL)


About the Author


John Underhill
Network Administrator vtdev.com
Canada

Network and programming specialist. Started in C, and have
learned about 14 languages since then. Cisco programmer,
and lately writing a lot of C# and WPF code (learning Java
too). If I can dream it up, I can probably put it to code. My
software company (VTDev) is on the verge of releasing a
couple of very cool things.. keep you posted.


Article Copyright 2014 by John Underhill. Last Updated 25 Dec 2014.
Everything else Copyright © CodeProject, 1999-2014.
