
Term Paper on Microprocessor Core Implementation for Microcontrollers

Lovely Professional University, Phagwara (Punjab)
Term Paper, 2013, ECE 312, Computer Science and Engineering
Submitted by Khalid Bashir (11106697), Section K1109, Roll No. A05
Under the guidance of Mr. Abhijit Bhattacharyya (Asst. Professor)

Abstract

In recent years, microcontrollers have been widely used in a variety of systems, especially small embedded systems, small device controllers, and e-devices. Unlike general-purpose systems, the processors inside these systems may not need very powerful computing capability; instead, reliability, robustness, low EMI, and low power are the most important criteria. It is well known that asynchronous circuits can achieve these goals by removing the global clock used in synchronous circuits. However, it is very difficult to implement systems with asynchronous circuits. In 2009, an asynchronous microcontroller called NCTUAC18, compatible with Microchip's PIC18 ISA, was reported. It is a quasi-delay-insensitive (QDI) implementation. However, its DI/QDI nature makes it inflexible and makes the circuit design even more difficult, and meeting the DI/QDI constraints imposes some limitations. To overcome these limitations, a new pipeline model was proposed. Instead of the original FPGA implementation, the design was synthesized with TSMC 0.13 µm technology. The area and worst-case delay are also shown in this paper. This paper demonstrates that it is possible to implement a processor core with a dual-rail Muller pipeline.

Keywords: Dual-rail, pipeline, quasi-delay-insensitive, asynchronous, synchronous

I. INTRODUCTION

This paper sheds some light on the implementation of microprocessor cores for microcontrollers using a quasi-delay-insensitive implementation of asynchronous circuits. This approach improves robustness and reliability and lowers power consumption, with almost no reduction in the conditions under which the circuit works. Synchronous circuits also have many problems, such as clock skew, worst-case performance, and non-modularity. With the growing number of mobile applications and the increasing size of VLSI-based systems, these problems are becoming more serious. Yet for complex practical reasons, almost all systems use a fixed-clock-period design. The best solution to these problems is to use asynchronous design and avoid clock signals altogether. Without clock signals, an Object-Oriented Programming (OOP) style of hardware design becomes possible: the most important thing a designer or programmer needs to know is how to use the handshaking protocol interface. This makes all designed components reusable, which is the main characteristic of OOP. The growth of the mobile device industry has caused these issues to be re-evaluated.

Microcontrollers are widely used in a variety of simple systems. Embedded systems do not need complex processors; instead, their cores must have other special characteristics, such as low power consumption and low EMI. These characteristics are inherent in asynchronous circuits, which is why asynchronous circuits are used in designing processor cores for microcontrollers. The 8051, AVR, and Microchip PIC families are all popular 8-bit microcontrollers for embedded systems. Asynchronous 8051-compatible microprocessors have been implemented for such systems. However, because of the Complex Instruction Set Computing (CISC) nature of the 8051, direct pipelining is not easy. Therefore, this paper shows the implementation of a microprocessor core with the PIC18 instruction set using a quasi-delay-insensitive model.

II. RELATED WORKS

Asynchronous circuits have been studied and used since the early 1950s; however, synchronous circuits still dominate mainstream digital circuit design. Recently, some academic and commercial research has shown that it is worthwhile to implement real-life systems with asynchronous circuits. However, because of the lack of proper tools and of standardized design models, research on asynchronous circuits has been scarce and their commercial applications remain limited. Because asynchronous circuits have no clock, they rely on handshaking protocols to ensure the correctness of the operations performed by the circuit. Such a protocol is divided into control signaling and data encoding. Fig. 1 shows the 4-phase handshaking protocol. In this protocol, only the rising edge is a valid active transition; thus it is a level-signaling, or return-to-zero, protocol.
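The 4-phase (return-to-zero) sequence described above can be traced with a minimal Python sketch. The function name `four_phase_transfer` and the event labels are illustrative assumptions, not taken from the original design; the sketch only models the order of the four signal transitions, not real circuit timing.

```python
def four_phase_transfer(value):
    """Trace one 4-phase (return-to-zero) handshake.

    Hypothetical helper for illustration: returns the value the receiver
    latched and the ordered list of (event, req, ack) states.
    """
    events = []
    req = ack = 0
    received = None

    req = 1                      # 1. sender raises request (data is valid)
    events.append(("req+", req, ack))

    received = value             # 2. receiver latches data on the rising request
    ack = 1                      #    and raises acknowledge
    events.append(("ack+", req, ack))

    req = 0                      # 3. sender returns request to zero
    events.append(("req-", req, ack))

    ack = 0                      # 4. receiver returns acknowledge to zero
    events.append(("ack-", req, ack))

    return received, events


if __name__ == "__main__":
    got, trace = four_phase_transfer(0x2A)
    for name, r, a in trace:
        print(f"{name}: req={r} ack={a}")
```

Only the rising request carries information; the two falling transitions merely return both wires to their idle level, so each transfer costs four signal transitions.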

In contrast, in the 2-phase handshaking protocol both the falling and rising edges of request and acknowledge are active signals; thus it is a transition-signaling, or non-return-to-zero, protocol. However, the 2-phase protocol is very complex and hard to implement. Fig. 2 shows the 2-phase handshaking protocol.

Besides control signaling, there are many choices for how to encode data, i.e. the data signaling protocol. The bundled-data model, also called single-rail, uses separate request and acknowledge wires that are bundled with the data signals. Thus a total of n + 2 wires is required to send n-bit data: n wires for the data, one wire to send the request, and one to receive the acknowledgement. Fig. 3 shows the bundled-data model.
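A 2-phase handshake over a bundled-data channel of the kind described above can be sketched in Python as follows. The class `BundledChannel` and its methods are hypothetical names introduced for illustration; the sketch assumes the data wires are stable before the request toggles, which is exactly the timing assumption a bundled-data design must guarantee.

```python
class BundledChannel:
    """n data wires plus one request and one acknowledge wire (n + 2 total).

    Illustrative model only: every toggle of req or ack is an event
    (transition signaling), so there is no return-to-zero phase.
    """

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.data = [0] * n_bits
        self.req = 0
        self.ack = 0

    def wire_count(self):
        return self.n_bits + 2

    def pending(self):
        # A transfer is outstanding whenever req and ack differ.
        return self.req != self.ack

    def send(self, value):
        if self.pending():
            raise RuntimeError("previous transfer not yet acknowledged")
        # Drive the data wires first, then toggle the request wire.
        self.data = [(value >> i) & 1 for i in range(self.n_bits)]
        self.req ^= 1

    def receive(self):
        if not self.pending():
            raise RuntimeError("no pending request")
        value = sum(bit << i for i, bit in enumerate(self.data))
        self.ack ^= 1            # toggling ack completes the transfer
        return value


if __name__ == "__main__":
    ch = BundledChannel(8)
    ch.send(0xA5)
    print(hex(ch.receive()), ch.req, ch.ack)
    ch.send(0x3C)                # second transfer uses the falling edges
    print(hex(ch.receive()), ch.req, ch.ack)
```

Each transfer here costs two control transitions instead of the four needed by the 4-phase protocol, which is the appeal of transition signaling despite its harder implementation.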

Besides the bundled-data model, there are also many data encoding methods for DI circuits. However, because of implementation issues, dual-rail encoding is the most popular DI data encoding scheme. Dual-rail encoding uses two physical wires to represent one bit of data. For example, a valid data bit D is represented by two physical data wires, d.0 and d.1, encoded as follows:

(1) D = 0: (d.0, d.1) = (0, 1)
(2) D = 1: (d.0, d.1) = (1, 0)

In particular, (0, 0) represents a spacer, which makes it possible to distinguish consecutive 0s or 1s. The (1, 1) state is not used. Data transfer starts from the (0, 0) state (called null or empty data). A change of state from (d.0, d.1) = (0, 0) to (0, 1) or (1, 0) signals the arrival of valid data 0 or 1, respectively. Thus a total of 2n wires is needed to transfer n-bit data. Fig. 4 shows the dual-rail model.
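The dual-rail code above can be sketched in Python as a pair of encode and decode helpers. The function names are illustrative assumptions; the wire assignment follows equations (1) and (2) exactly as given in this paper, and the check that a word is valid only when every bit has left the (0, 0) state is a simplified stand-in for hardware completion detection.

```python
NULL = (0, 0)  # spacer / empty data, as defined in the text


def encode_bit(bit):
    """Map one data bit to (d.0, d.1) per equations (1) and (2)."""
    if bit == 0:
        return (0, 1)
    if bit == 1:
        return (1, 0)
    raise ValueError("bit must be 0 or 1")


def decode_bit(pair):
    """Return 0 or 1 for valid data, None for the (0, 0) spacer."""
    if pair == (0, 1):
        return 0
    if pair == (1, 0):
        return 1
    if pair == NULL:
        return None
    raise ValueError("(1, 1) is not a legal dual-rail state")


def encode_word(value, n_bits):
    """Encode an n-bit value, least significant bit first, on 2n wires."""
    return [encode_bit((value >> i) & 1) for i in range(n_bits)]


def decode_word(pairs):
    """Return the value once every bit is valid, else None (still incomplete)."""
    bits = [decode_bit(p) for p in pairs]
    if any(b is None for b in bits):
        return None
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    word = encode_word(5, 4)
    print(word, "wires:", 2 * len(word))
    print(decode_word(word))
    print(decode_word([NULL] * 4))
```

Because validity is carried by the data wires themselves, no separate request wire is needed, at the price of 2n wires for n bits.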

III. BANKED TLB WITH PREFETCHING MECHANISM

David Muller proposed his famous Muller C-element and Muller pipeline (also known as the Muller distributor) in 1959. A Muller pipeline is a naturally simple and elegant handshaking control model. The simplest form of the Muller pipeline consists mainly of C-elements and inverters. Fig. 5 shows the schematic symbol and truth table of a two-input C-element. If both inputs are high or both are low, the output is high or low, respectively; otherwise, the previous value is kept. Fig. 6 shows the original Muller pipeline model.

To understand its behavior, consider the i-th C-element C(i). In the initial state, all C-elements are initialized to 0. The handshaking may then be initialized. The i-th C-element C(i) can propagate a 1 from its previous stage, the (i-1)-th C-element C(i-1), only if the next-stage C-element C(i+1) is 0. Thus, the signal propagates one stage at a time. Note that the original single-rail model is based on the bundled-data model, so the request signal must be propagated through a matching delay, as shown in Fig. 6. In fact, delay matching must be handled carefully in every bundled-data model. The pipeline model can also be constructed as a 4-phase dual-rail model, as shown in Fig. 7. This model can be considered as two Muller pipelines connected in parallel with a common acknowledge signal in every stage.

Besides the Muller pipeline, several other models have been proposed. The most important is the micropipeline, described by Ivan E. Sutherland in his famous Turing Award lecture "Micropipelines" in 1989. This approach is based on a two-phase bundled-data model with the micropipeline as the backbone control circuit. Because of its popularity, many asynchronous circuit design models are based on its implementation of pipelining. It can be used to implement many different kinds of pipelined systems, even processors. For example, the NSR processor is a very simple 16-bit micropipeline-based microprocessor with a very simple Reduced Instruction Set Computer (RISC) instruction set (fewer than 20 instructions). In addition, the most famous of all is the Amulet series of processors. These are ARM-compatible processors implemented with the micropipeline architecture.

Several different models have also been proposed for asynchronous processor design. Some try to modify the original micropipeline architecture. For example, a new control circuit for the micropipeline was proposed by Choy et al., and the Micronets architecture tried to decentralize the control to the functional units. Furthermore, several other well-known asynchronous processor implementation models have been proposed.
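The C-element rule and the stage-by-stage propagation of the Muller pipeline described in this section can be sketched in Python. The names `c_element` and `step`, and the idea of updating all stages simultaneously from the previous state, are simplifying assumptions for illustration; a real asynchronous circuit has no such global update step.

```python
def c_element(a, b, prev):
    """Two-input Muller C-element: follow the inputs when they agree, else hold."""
    return a if a == b else prev


def step(stages, req_in, ack_in):
    """Advance a Muller pipeline by one illustrative evaluation step.

    Stage i sees the previous stage (or req_in for stage 0) on one input and
    the inverted next stage (or inverted ack_in for the last stage) on the other.
    """
    n = len(stages)
    new = []
    for i in range(n):
        left = req_in if i == 0 else stages[i - 1]
        right = ack_in if i == n - 1 else stages[i + 1]
        new.append(c_element(left, 1 - right, stages[i]))
    return new


if __name__ == "__main__":
    state = [0, 0, 0]            # all C-elements initialized to 0
    for _ in range(3):
        state = step(state, req_in=1, ack_in=0)
        print(state)
```

With the request held high and the output not yet acknowledged, the 1 advances exactly one stage per step, and a stage whose successor already holds a 1 keeps its old value.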

Takashi Nanya et al. presented their QDI 8-bit microprocessor model, TITAC, which uses Martin's Q-element as control circuitry. TITAC-2 was proposed to demonstrate a new delay model called scalable-delay-insensitive (SDI). This delay model replaces the unbounded gate and wire delays of the original DI or QDI models with a bounded relative delay ratio between any two components. There are also other works that model processors with asynchronous circuits. Martin et al. at Caltech have shown three generations of asynchronous processor models. Chen et al. presented an asynchronous RISC processor model in 2002. In addition, several asynchronous superscalar processor models have been proposed, for example the Kin architecture, the Hades project, and, most famous of all, the counterflow pipeline processor (CFPP). However, these superscalar models are either not easy to implement or are just proposed ideas rather than practical implementations, and they are certainly not well suited to cores for simple microcontrollers.

IV. CONCLUSIONS AND DISCUSSIONS

Though several asynchronous pipeline models have been proposed, most asynchronous processors are still implemented with micropipeline or modified micropipeline models. That is not only because of its popularity and other important characteristics but also because of implementation cost. In addition, because of the dual-rail nature and the stricter timing constraints of DI and QDI circuits, it is very hard to implement a microprocessor core with DI or QDI circuits. Thus, DI or QDI pipeline models are seldom used to implement microprocessors. However, it is widely known that DI circuits have the highest reliability and are suitable for implementing microcontrollers that may operate in variable environments. In addition, they do not need to consider the matching-delay issue that may be encountered in implementations with bundled-data circuits such as the micropipeline model. In this paper, a methodology was provided to model a QDI PIC18-compatible microprocessor core with a dual-rail 4-phase pipeline at a reasonable cost. Though only a PIC18-compatible core was modeled, the model can also be used for other simple microprocessor cores. In fact, the flow needed to design a QDI microprocessor core for a simple microcontroller with Verilog HDL and an easy implementation model is shown clearly.

Besides the pipeline model itself, conditional branch handling is a very important design issue for pipelined processors. Furthermore, it is much harder to use, operate, and program an asynchronous processor without centralized control. In this paper, an easy way to deal with conditional branches for the dual-rail 4-phase pipeline microcontroller core was shown. Though it is very simple, it is sufficient for a simple microcontroller core with such a short pipeline. Furthermore, the maximum delay time of each stage and the extra costs that may be introduced were also evaluated. Though QDI circuits, and indeed most asynchronous circuits, may incur considerable extra cost, it must be pointed out that this is a design trade-off that depends on the requirements. Asynchronous circuits may be a good solution for designing mobile devices or systems that need high reliability, low power, and low EMI.

ACKNOWLEDGMENT

I wish to acknowledge Hung-Yue Tsai, Jen-Chieh Wu, Wei-Min Cheng, and Chang-Jiu Chen for their work on the quasi-delay-insensitive implementation of microprocessors. I also want to thank my teacher, Mr. Abhijit Bhattacharyya, for his help in teaching me about microprocessors.

REFERENCES

[1] A. Davis and S. M. Nowick, "An introduction to asynchronous circuit design," Technical Report No. UUCS-97-013, Computer Science Department, University of Utah, 1997.
[2] J. H. Lee, W. C. Lee, and K. R. Cho, "A novel asynchronous pipeline architecture for CISC type embedded controller A8051," in Proceedings of the 45th Midwest Symposium on Circuits and Systems, Vol. 2, 2002, pp. 675-678.
[3] A. J. Martin, M. Nyström, K. Papadantonakis, P. I. Pénzes, P. Prakash, C. G. Wong, J. Chang, K. S. Ko, B. Lee, E. Ou, J. Pugh, E. V. Talvala, J. T. Tong, and A. Tura, "The Lutonium: A sub-nanojoule asynchronous 8051 microcontroller," in Proceedings of the 9th International Symposium on Asynchronous Circuits and Systems, 2003, pp. 14-23.
[4] D. Muller and W. Bartky, "A theory of asynchronous circuits," in Proceedings of International Symposium on the Theory of Switching, 1959, pp. 204-243.
[5] J. Gunawardena, "A generalized event structure for the Muller unfolding of a safe net," in Proceedings of the 4th International Conference on Concurrency Theory, 1993, pp. 278-292.
[6] J. Sparsø and S. Furber, Principles of Asynchronous Circuit Design: A Systems Perspective, Kluwer Academic Publishers, London, 2001, pp. 11-25.
[7] I. E. Sutherland, "Micropipelines," Turing Award Lecture, Communications of the ACM, Vol. 32, 1989, pp. 720-738.
[8] E. Brunvand, "The NSR processor," in Proceedings of the 26th Hawaii International Conference on System Sciences, 1993, pp. 428-435.
[9] J. V. Woods, P. Day, S. B. Furber, J. D. Garside, N. C. Paver, and S. Temple, "AMULET1: An asynchronous ARM microprocessor," IEEE Transactions on Computers, Vol. 46, 1997, pp. 385-398.
[10] S. B. Furber, J. D. Garside, S. Temple, J. Liu, P. Day, and N. C. Paver, "AMULET2e: An asynchronous embedded controller," in Proceedings of the 3rd International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1997, pp. 290-299.
