
Accelerating in Cryptosystem with Simultaneously Encryptions/Decryptions HW-Threads and Self-Dynamic Reconfiguration

Trong-Tuan NGUYEN, Acronics Systems, Inc., Tuan.T.Nguyen@acronics.com
Van-Cuong NGUYEN, Faculty of Electronics & Telecommunications, DANANG University of Technology, Vietnam, ngvancuong2000@gmail.com
Mai-Duyen Le NGUYEN, nguyenlmaiduyen@dtu.edu.vn
Hung-Manh PHAM, Student Member, IEEE, DUY TAN University, Vietnam, manh.et2@gmail.com

Abstract - Information security on Ethernet networks is always a critical problem. As the processing power of new microprocessors grows, attacks on and theft of information become an ever closer threat. In this paper, we propose a novel Cryptosystem with simultaneous cipher hardware threads and a Partial Dynamic Reconfiguration (PDR) engine. With these new features, the Cryptosystem raises the security level, speeds up information processing, adapts to applications with real-time requirements, and reduces power consumption and FPGA area.

Keywords - Reconfigurable SoC, Multiple Cryptographic, Simultaneously Multiple Hardware Threads, FPGA-PDR

I. INTRODUCTION
With the development of information technology, protecting sensitive information via encryption is becoming more and more important to daily life. In 2001, the National Institute of Standards and Technology (NIST) selected the Rijndael algorithm as the Advanced Encryption Standard (AES) [1], which replaced the Data Encryption Standard (DES) [2]. Since then, AES has been widely used in various applications, such as secured communication systems, high-performance database servers, and digital video/audio recorders. However, with advances in semiconductor fabrication, the logic density and processing speed of ICs are increasing rapidly, so attacks on secured information may become practical. AES is currently used in almost all secured applications, but some research shows that it is vulnerable to related-key attacks [3], [4]. Information in military and intelligence fields requires a high level of security and authentication. Encrypted images and encrypted video streams can be recovered from a few pixels by methods such as the edge-directed bicubic interpolation algorithm and the bilateral filter mentioned in [9], [10].

While waiting for a new encryption algorithm that is more trustworthy than AES, a mechanism of multiple or cascaded ciphers is needed to raise the security level of a system, and several research works have proposed one [10], [11], [12]. In addition, many current commercial ASIC-level products, such as the TI AM387x Cortex-A8 or the Maxim Crypto MAXQ1850, integrate AES, 3DES, RSA and SHA in one system. The works mentioned above point to an essential demand: finding methods to raise the security level of information. This paper proposes a novel Cryptosystem architecture with the following targets: raising the security level, accelerating processing to suit applications with real-time requirements, and reducing power consumption and FPGA area.

This paper is organized in five sections, of which this is the Introduction. Section II reviews related research on enhancing security based on current algorithms such as AES and 3DES; it analyzes the advantages and disadvantages of that research and motivates the new Cryptosystem architecture that we propose in Section III. Section IV presents the performance evaluation, covering throughput, FPGA resources, power consumption, and the efficiency of the proposal compared with existing research. The final section gives the conclusion and future work.

II. RELATED WORKS
This section outlines current research on multiple encryption, the disadvantages of these solutions, and the new proposal that addresses them. It also introduces two building blocks that support the new system: ReconOS and the PDR engine.

A. Current research
To raise the security level of information, many research works have been proposed and commercial products released. There are several ways to reach this target: embedding several encryption algorithms, as in the TI AM387x Cortex-A8 or the Maxim Crypto MAXQ1850; employing encryption and authentication in the same system [7]; or dynamically reconfiguring the Cipher-Key module when an incoming threat is detected [13]. In the last approach, the system reconfigures itself with the Cipher-Key module that matches the security requirement and removes the current module from the FPGA platform. The listed advantages are a much higher security level and reduced FPGA area, gate latency, and power consumption. However, following the IP core architecture, all input/output data, control signals, and status signals are mapped into registers, as in Fig. 1, and all operations of the IP core are under CPU administration. By the nature of the CPU architecture, the instruction pointer executes the program as a sequence of steps, which prevents the system from speeding up its processing.

Fig. 1. Hardware Accelerator

Besides that, some applications, such as secure electronic transactions [5], watermarking and identification of IP cores [6], [7], or remote bit-stream configuration [8], implement several algorithms for encryption and authentication, and several proposals concentrate on multiple encryption to enhance security. The work in [5] applies multiple encryption to secure electronic transactions using the SHA and MD algorithms. The work in [11] uses four well-known cryptographic algorithms to implement multilevel security by creating a multi-level ciphertext: AES, DES, Rivest-Shamir-Adleman (RSA) and Caesar. The work in [12] proposes double encryption to protect sensitive images. Almost all of the research works mentioned above are presented at the algorithm level and programmed on a CPU platform. The critical problem is that deploying multiple encryption, or combined encryption and authentication, on a CPU platform cannot satisfy real-time applications, for example live video conferencing or on-scene video transmitted from UAVs to a base station. These applications have very strict requirements that cannot be met on a CPU platform or with the single hardware thread presented in [7]. This paper addresses the acceleration of encryption processing to suit real-time requirements, with reduced FPGA resources and low power consumption, through simultaneous hardware threads and a Self-Dynamic Partial Reconfiguration engine.

B. ReconOS Architecture
The ReconOS project, developed by the Computer Engineering Group of the University of Paderborn, supports both software and hardware threads with a single unified programming model. ReconOS is based on eCos and Linux, as presented in Fig. 2.

Fig. 2. The Linux operation [14]

The ReconOS system architecture [14] is presented in Fig. 3. All threads share the same physical memory space; therefore, hardware threads have direct access to any location in the system's memory, or to memory-mapped peripherals, if desired.

Fig. 3. ReconOS system architecture

Three parts build up the system:
. Delegate thread: A module that makes thread-to-thread communication and synchronization transparent, regardless of the execution context (hardware or software) of the respective communication partners. This enables the designer to easily replace, for example, a software thread with a functionally equivalent hardware thread, allowing for rapid design space exploration with respect to the hardware/software partitioning.
. Hardware thread: Consists of at least two VHDL processes: the synchronization state machine and the actual user logic. The state transitions in the synchronization state machine always depend on control signals from the OSIF; the next state can be reached only after the previous operating system call returns. Thus, the communication with the operating system is purely sequential, while the processing of the hardware thread itself can be highly parallel. It is up to the programmer to decompose a hardware thread into a collection of user logic modules and one synchronization state machine.
. Hardware/Software interfacing: This module provides a mechanism for low-level synchronization and communication between the hardware circuitry and the operating system, called the OSIF (Operating System Interface). An overview of the OSIF's structure and its interfaces to the hardware thread, the system buses and the FIFO cores is given in Fig. 4 [14].

Fig. 4. OSIF overview and interfaces

C. Self-Dynamic Partial Reconfiguration
A dynamically reconfigurable system allows parts of its logic resources to be changed without disturbing the functioning of the remaining circuit. This property permits the system to change its behavior according to external events. The dynamic reconfiguration takes place in a Partially Reconfigurable Region (PRR), which can be partially reconfigured independently [15]. Designing a dynamically reconfigurable system always requires the declaration of PRRs. The partial bit-streams of these zones are stored in an external memory and contain all the information about the positions and functionalities of the considered PRRs. A dynamically reconfigurable system usually has a central processor that is connected to the internal reconfiguration port (ICAP) and controls the partial reconfiguration process by downloading bit-streams onto this port. The ICAP and this controller are implemented in a static zone (i.e. not reconfigured) of the FPGA. Except for the dynamic zone being reconfigured, the whole FPGA remains in operation during the entire reconfiguration process. In one PRR, several Partially Reconfigurable Modules (PRMs) can be loaded (one at a time). Each PRM is individually designed and implemented using partial reconfiguration design tools [15]. All PRMs for a given PRR must be pin compatible with each other, i.e., have the same port definitions and entity names.

III. PROPOSAL FOR NOVEL ARCHITECTURE OF CRYPTOSYSTEM WITH SIMULTANEOUSLY MULTIPLE HARDWARE THREADS ARCHITECTURE (SMHT) AND DPR ENGINE
In this section, we propose the Cryptosystem architecture with SMHT and a PDR engine. The Cryptosystem provides a mechanism in which cipher threads, each with an AES_Core, can operate simultaneously. Based on the security level of the application, the system creates multiple hw_tasks, one hw_task per AES_Thread. Three issues need consideration in the novel Cryptosystem:

A. Hardware design
. OS Synchronization communication module
This RTL module synchronizes the threads with operating system calls. The state transitions in the synchronization state machine always depend on control signals from the OSIF; the next state can be reached only after the previous operating system call returns. To initiate a new encryption/decryption transaction, this module issues the API query reconos_mbox_get() to the OS to ask whether the semaphore is ready for a new thread. If the system is available, an indication signal starts the transaction. The state machine transfers the data from main memory into the Local_ram of the hardware thread through the API reconos_read_burst(). After the data transfer completes, the hardware thread enters the AES_Initial state by asserting the Start signal to the AES_Core (user_logic). Start is port-mapped to the AES_Core, which then starts the encryption/decryption operation. From this point, the OS Synchronization module keeps polling the Done signal from the AES_Core; once Done is asserted high, the OS Synchronization module releases the semaphore flag to terminate the current transaction. Fig. 6 shows the Done and Start control signals.
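The handshake of the OS Synchronization module in Section III.A (query the OS, burst-read into Local_ram, assert Start, poll Done, release the semaphore) can be sketched as a small software model. This is only an illustration in Python, not the authors' RTL: the state names, the `next_state` function and the `aes_latency` parameter are hypothetical, and the real ReconOS calls `reconos_mbox_get()` and `reconos_read_burst()` are represented by boolean inputs rather than invoked.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_REQUEST = auto()  # models waiting on the reconos_mbox_get() query
    READ_BURST = auto()    # models reconos_read_burst() filling Local_ram
    AES_START = auto()     # Start is asserted towards the AES_Core
    WAIT_DONE = auto()     # Done from the AES_Core is polled
    RELEASE = auto()       # semaphore released, transaction finished


def next_state(state, request_ready=False, burst_done=False, done=False):
    """Return the next state; every transition waits for its control input."""
    if state is State.WAIT_REQUEST:
        return State.READ_BURST if request_ready else state
    if state is State.READ_BURST:
        return State.AES_START if burst_done else state
    if state is State.AES_START:
        return State.WAIT_DONE
    if state is State.WAIT_DONE:
        return State.RELEASE if done else state
    return State.WAIT_REQUEST  # RELEASE goes back to idle


def run_transaction(aes_latency=11):
    """Drive one transaction and return the sequence of states visited."""
    state = State.WAIT_REQUEST
    trace = [state]
    state = next_state(state, request_ready=True)
    trace.append(state)
    state = next_state(state, burst_done=True)
    trace.append(state)
    state = next_state(state)
    trace.append(state)
    for cycle in range(aes_latency):
        # Done is only seen on the last cycle of the assumed AES latency
        state = next_state(state, done=(cycle == aes_latency - 1))
    trace.append(state)
    state = next_state(state)
    trace.append(state)
    return trace


if __name__ == "__main__":
    print(" -> ".join(s.name for s in run_transaction()))
```

The model keeps the property stressed in the text: the communication with the operating system is strictly sequential, because each state can only be left once the corresponding call or signal has returned.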

Fig. 5. FSM of OS Synchronization Communication

Fig. 6. Control signals for user_logic core

. FSM of AES_Core (user_logic)
The data copied into Local_Ram is divided into two areas. The first area contains the cipher key, the key length (128, 192 or 256), the command for the encryption or decryption action, and the length of the frame of plaintext/cipher-text. The remaining area holds the first frame of plaintext/cipher-text that needs the cipher operation. After it receives the Start signal from the OS Synchronization module, the AES_Core enters the encrypt/decrypt operations. For all remaining encryption/decryption operations, the RTL decodes and executes the commands stored in Local_Ram. The cipher operation runs entirely inside the AES hardware thread. When the last data to be processed is complete, the AES_Core asserts the Done signal to the OS Synchronization module to finish the operation cycle. Fig. 7 presents the FSM of the AES_Core.

Fig. 7. FSM of AES_Core (user_logic)

In comparison with common IP cores as in Fig. 1, this new architecture processes everything inside the Hardware_thread. Besides the acceleration from parallel processing in RTL, two further advantages are worth mentioning:
- Once the CPU has transferred the data into Local_Ram, the AES_Core fetches data from this memory for encryption/decryption and works independently of the rest of the system. So if Local_Ram has a large capacity, this reduces the handshaking with the CPU for fetching new data and increases the amount of data processed in Local_Ram.
- While the AES_Thread is processing the data in Local_Ram, the CPU has free time slots for running other tasks, so the performance of the system increases considerably.
These advantages are discussed in the analysis and performance evaluation of Section IV.

B. Software design
The operations of each hw_task are Burst_Ram_Read, Burst_Ram_Write and cryptographic processing, denoted Tx.r, Tx.w and Tx.p respectively, as shown in Fig. 8.

Fig. 8. The operation of two HW_threads with shared memory

In double encryption/decryption tasks, we use a shared memory to store the results coming out of the AES_0 thread, which are the input data for the AES_1 thread. To guarantee the integrity of the data in shared memory and to avoid conflicting read/write operations at the same time, communication and synchronization mechanisms are provided for the shared memory: create the shared memory space, attach it to the address space of a process, lock the thread before writing/reading to protect the data, write/read the data in shared memory, and detach and destroy the shared memory from the current process. In both flow charts, before entering the read/write routines, a command locks the memory against access from other threads, as detailed in Fig. 9.

Fig. 9. Flow charts for writing/reading data
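The lock-before-access discipline described for the shared memory between the AES_0 and AES_1 threads can be illustrated with a minimal Python sketch. It rests on several assumptions: the two hardware threads are modelled as software threads, the shared memory is a plain `bytearray` (so the create/attach/detach/destroy steps collapse into ordinary allocation), and `placeholder_cipher` is a hypothetical XOR stand-in, not AES. Only the ordering (lock, write or read, unlock) mirrors the text.

```python
import threading

BLOCK_BYTES = 16  # one AES block


def placeholder_cipher(data, key_byte):
    """Stand-in for one cipher pass (NOT real AES): XOR each byte with key_byte."""
    return bytes(b ^ key_byte for b in data)


def double_pass(plaintext):
    """Run two cipher stages that hand data over through a locked shared buffer."""
    shared = bytearray(BLOCK_BYTES)  # models the shared memory region
    lock = threading.Lock()          # guards every access to `shared`
    written = threading.Event()      # tells AES_1 that AES_0 has produced data
    result = {}

    def aes_0():
        with lock:  # lock before writing, unlock on exit
            shared[:] = placeholder_cipher(plaintext, 0x5A)
        written.set()

    def aes_1():
        written.wait()  # do not read before the first stage has written
        with lock:  # lock before reading, unlock on exit
            intermediate = bytes(shared)
        result["out"] = placeholder_cipher(intermediate, 0xA5)

    threads = [threading.Thread(target=aes_1), threading.Thread(target=aes_0)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["out"]


if __name__ == "__main__":
    print(double_pass(bytes(range(BLOCK_BYTES))).hex())
```

The event is an addition of this sketch: a lock alone prevents simultaneous access but does not order the writer before the reader, so some form of "data ready" signalling is needed in any concrete realization.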

After the transaction completes, the unlock command is released.

C. Integrated system
The Cryptosystem is developed on an FPGA platform. To accelerate the system's processing, threads that consume a lot of CPU resources are mapped onto IP cores. These IP cores are stored in ACE Flash in partial bit-stream format. The main program, running on the PPC/MicroBlaze, tracks the threads and watches for attacks from the outside environment that attempt to steal data, as signalled by events such as a wrong IP/MAC address, power consumption, or electromagnetic radiation [16]. When it determines that a threat is coming, the system upgrades the security level to double encryption: it creates a new hw_thread, as indicated in Fig. 11, and dynamically reconfigures the AES_Thread partial bit-stream into that hw_thread separately. Fig. 12 shows two AES_Threads built for two hw_tasks by the DPR engine.

Fig. 10. Flow chart of Software/Hardware Co-operation

Fig. 11. Slots for hw_tasks reconfiguration with OSIF bus

Fig. 12. Creating the slots for each partial bit stream AES_Thread

IV. ANALYSIS AND PERFORMANCE EVALUATIONS
A. RTL simulation and implementation results
To verify the correctness of both the RTL code and the cycle counts for encryption/decryption, the test material for the AES IP core is based on the FIPS-197 specification [1]:

Plaintext = 32 43 f6 a8 88 5a 30 8d 31 31 98 a2 e0 37 07 34
Cipher Key = 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c
Ciphertext = 39 25 84 1d 02 dc 09 fb dc 11 85 97 19 6a 0b 32

We test the AES IP core with a 128-bit Key_in; the other Key_in lengths give the same results. Following the proposal in this paper, the Key_expander_128 procedure takes 11 clock cycles, comprising the first clock in which key_in_128 is read in and 10 clocks for the expansion procedure, and the AES transformation procedure for each block also takes 11 cycles, as indicated in Fig. 13.

Fig. 13. The AES processes within 11 Clock cycles

B. Performance analysis
With the traditional IP core structure, all Data_in, Data_out, control signals and status signals are mapped into registers as in Fig. 1; following the mechanism of embedded systems, these registers are controlled by the microprocessor. So when the system needs to upgrade the security level to double encryption, all data must be processed completely in the first AES thread before moving to the next thread. This method causes large latency and low system throughput, as Fig. 14 illustrates.

Fig. 14. AES thread is processed in sequence

Fig. 15. Simultaneously AES- threads (Proposed model)

In the novel structure, both AES cores are active at the same time: one core fetches the data to begin a new encryption/decryption while the other is processing. Thus, no time slot is wasted during processing. Each 16-byte block of data that needs AES processing passes through three stages, together called the AES cycles:
AES cycles/1 block = 1 read cycle + 11 processing cycles + 1 write cycle
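As a quick numeric check of the cycle accounting in this section, the following Python sketch evaluates the per-block AES cycles (1 read + 11 processing + 1 write) and compares the sequence model with the proposed simultaneous model, using the expressions this section gives for m blocks: 2 x m x AES cycles versus (m + 2) x AES cycles. The function names are hypothetical, and the value m = 253 is simply the number of blocks in one Local_Ram frame quoted in the paper.

```python
READ_CYCLES = 1
PROCESS_CYCLES = 11
WRITE_CYCLES = 1

# AES cycles per 16-byte block, as defined in the text
AES_CYCLES = READ_CYCLES + PROCESS_CYCLES + WRITE_CYCLES


def sequence_cycles(m):
    """Double encryption done one thread after the other: 2 x m x AES cycles."""
    return 2 * m * AES_CYCLES


def proposed_cycles(m):
    """Two overlapped threads: first pass + m overlapped passes + final pass."""
    return (m + 2) * AES_CYCLES


def speedup(m):
    """Ratio of sequence cycles to proposed cycles; tends to 2 for large m."""
    return sequence_cycles(m) / proposed_cycles(m)


if __name__ == "__main__":
    for m in (1, 16, 253):
        print(m, sequence_cycles(m), proposed_cycles(m), round(speedup(m), 3))
```

Under these expressions the overlap only pays off for more than two blocks (for m = 1 the proposed model is slower because of the fill and drain passes), and the gain approaches a factor of two as m grows.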

It is evident that our proposed approach gives the Cryptosystem flexibility. When a low security level is needed, the Cryptosystem operates in the stand-alone model with one AES_Thread. When a high security level is required, the system dynamically reconfigures the bit-stream of a second AES_Thread and starts the simultaneous model. According to TABLE I, the new proposal offers much higher throughput than the sequence double threads model [13]; in addition, its Throughput per Slice (TpS) is roughly 1.6 times that of the sequence model (58.1 versus 35.7 Kbps/slice).

TABLE I. THE DETAILED COMPARISON BETWEEN THREE MODELS: STAND-ALONE, SEQUENCED AND PROPOSED SYSTEM

Parameters                 AES_Thread         Sequence double       Simultaneously
                           Stand-alone (*)    threads model (**)    threads model (***)
Maximum Frequency (MHz)    81                 81                    81
Number of Clock            13                 26                    15
Throughput (Mbps)          ≈798               ≈389                  ≈1296
Slice used                 11,167             11,167                22,304
TpS (Kbps/slice)           71.4               35.7                  58.1
Power consumption (W)      4.472              8.944                 8.944

Platform: Virtex6 XC6VLX240-1
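The stand-alone figures in TABLE I can be cross-checked with a short calculation. The sketch below assumes that throughput is one 128-bit block per "Number of Clock" cycles at the maximum frequency, and that TpS is throughput divided by the slices used; both formulas are assumptions of this example rather than definitions stated in the paper, and only the stand-alone throughput is checked against the first formula.

```python
BLOCK_BITS = 128  # AES block size in bits


def throughput_mbps(freq_mhz, clocks_per_block):
    """Assumed model: one 128-bit block completes every `clocks_per_block` cycles."""
    return BLOCK_BITS * freq_mhz / clocks_per_block


def tps_kbps_per_slice(throughput, slices):
    """Throughput per slice, converted from Mbps to Kbps."""
    return throughput * 1000 / slices


if __name__ == "__main__":
    # Stand-alone AES_Thread: 81 MHz, 13 clocks per block, 11,167 slices
    standalone = throughput_mbps(81, 13)
    print(round(standalone, 1))                          # throughput in Mbps
    print(round(tps_kbps_per_slice(798, 11167), 2))      # stand-alone TpS
    print(round(tps_kbps_per_slice(1296, 22304), 2))     # simultaneous TpS
```

With these assumptions the stand-alone row is reproduced to within rounding, and the TpS values confirm that the table's TpS unit is Kbps per slice.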

In the sequence model of Fig. 14: for streams that need double encryption of each block, the time spent is:
Cycles per block = 2 x AES cycles
In the proposed model of Fig. 15: following the organization of the Hardware_Thread, the data is transferred into Local_Ram and the AES core takes this data to begin a new encryption/decryption. The Local_Ram is set up with a 4 Kbyte capacity; with 16 bytes per Read_Burst API call, a full Local_Ram holds 253 blocks, called a Data Frame:
Data Frame = 253 blocks x 16 bytes = 4048 bytes (about 4 Kbytes)
In the initial stage and the final stage of cryptography:
1st Frame (or final Frame) = 253 read cycles + (253 x 13) AES cycles + 253 write cycles = 3795 cycles
So each block = 2 cycles + AES cycles = 15 cycles
In the remaining stages, two AES cores take part in the cryptographic processing. If m blocks need to be processed, the cycles needed to complete each method are calculated as follows:
In the sequence model:
Cycles for m blocks = 2 x m x AES Processing (*)
In the proposed model:
Cycles for m blocks = 1st AES Processing + m x AES Processing + Final AES Processing = (m + 2) x AES Processing (**)
Fig. 15 details the above analysis. The synthesis results and the throughput and performance parameters of this design are given in TABLE I.

V. CONCLUSION AND FUTURE WORKS
In this paper, we have proposed a Cryptosystem combining a simultaneous engine and a partial reconfiguration scheme to reduce the required hardware resources and, furthermore, to greatly improve the bandwidth as well as the security of the implemented encryption algorithm. We plan to implement a scrambler system to protect the content of BRAM against attack. The scrambler module, which will be based on the unique device identifier and a pseudo-random number generator (PRNG) to securely encrypt the key stored in the BRAM, could further enhance the robustness of the whole system. A complete investigation of this complex system will also be carried out.

REFERENCES
[1] NIST, Advanced encryption standard (AES), Nov. 20, http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[2] NIST, Data encryption standard (DES), Oct. 1999, http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf
[3] Alan Kaminsky, Michael Kurdziel, Stanisław Radziszowski, An Overview of Cryptanalysis Research for the Advanced Encryption Standard, http://www.cs.rit.edu/~spr/PUBL/aes.pdf
[4] Alex Biryukov, Dmitry Khovratovich, Related-key Cryptanalysis of the Full AES-192 and AES-256, http://impic.org/papers/Aes-192256.pdf
[5] Himanshu Gupta, Role of multiple encryption in secure electronic transaction, International Journal of Network Security & Its Applications (IJNSA), Vol. 3, No. 6, November 2011
[6] Daniel Ziener, Jürgen Teich, Power Signature Watermarking of IP Cores for FPGAs, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.1509
[7] Thanh Tran, Pham Ngoc Nam, Tran Hoang Vu, Nguyen Van Cuong, A framework for secure remote updating of bitstream on runtime reconfigurable embedded platforms, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6315952
[8] K.-W. Hung, W.-C. Siu, Fast image interpolation using the bilateral filter, IET Image Processing
[9] Zhou Dengwen, An Edge-Directed Bicubic Interpolation Algorithm, 2010 3rd International Congress on Image and Signal Processing (CISP2010)
[10] Melek Önen, Refik Molva, Secure Data Aggregation with Multiple Encryption, link.springer.com/chapter/10.1007%2F978-3540-69830-2_8
[11] Sairam Natarajan et al., A Novel Approach for Data Security Enhancement Using Multi Level Encryption Scheme, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (1), 2011, 469-473
[12] Jayant Kushwaha, Bhola Nath Roy, Secure Image Data by Double encryption, International Journal of Computer Applications (0975 8887), Volume 5, No. 10, August 2010
[13] Trong-Tuan NGUYEN, Van-Cuong NGUYEN, Hung-Manh PHAM, Enhance the performance and security of SoC using pipeline and dynamic partial reconfiguration, The 2012 International Conference on Integrated Circuits and Devices in Vietnam (ICDV 2012), Section #6
[14] Enno Lübbers, Marco Platzner, Communication and Synchronization in Multithreaded Reconfigurable Computing Systems, in Proceedings of the 8th International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), Las Vegas, July 2008
[15] Xilinx Inc., Partial Reconfiguration User Guide UG702 (v14.1), April 2012
[16] Daniel Ziener, Jürgen Teich, Power Signature Watermarking of IP Cores for FPGAs, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.1509
