
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 3, May June 2013 ISSN 2278-6856

A Survey on Content Addressable Memory


Narendra Babu T1, Dr. Fazal Noorbasha2 and Bhavani Shankar P3

1 Research Scholar, VLSI Systems Research Group, Electronics & Communication Engineering, KL University, Guntur, A.P, India
2 VLSI Systems Research Group Head, Electronics & Communication Engineering, KL University, Guntur, A.P, India
3 M.Tech Student, Electronics & Computer Engineering, KL University, Guntur, A.P, India

Abstract: A content addressable memory (CAM) compares input search data against a table of stored data and returns the address of the matching data. CAMs have single-clock-cycle throughput, making them faster than other hardware- and software-based search systems. CAMs are widely used for packet forwarding in network routers. Because a CAM activates its comparison circuitry in parallel, it consumes considerable power, and the main design challenge is to reduce power consumption without reducing speed or memory density. In this paper, circuit-level CAM techniques are reviewed, with a focus on low-power match line sensing techniques and search line driving approaches.

1.1 Basics of CAM
We now take a more detailed look at CAM architecture. A small model is shown in figure 1. Figure 1 shows a CAM consisting of 4 words, with each word containing 3 bits arranged horizontally (corresponding to 3 CAM cells). There is a match line corresponding to each word (ML0, ML1, etc.) feeding into match line sense amplifiers (MLSAs), and there is a differential search line pair corresponding to each bit of the search word (SL0 and its complement, SL1 and its complement, etc.). A CAM search operation begins with loading the search-data word into the search-data registers, followed by precharging all match lines high, putting them all temporarily in the match state. Next, the search line drivers broadcast the search word onto the differential search lines, and each CAM core cell compares its stored bit against the bit on its corresponding search lines. Match lines on which all bits match remain in the precharged-high state. Match lines that have at least one mismatching bit discharge to ground. The MLSA then detects whether its match line has a match condition or a miss condition. Finally, the encoder maps the match line of the matching location to its encoded address [1].
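The search sequence described above (precharge, compare, encode) can be illustrated with a small behavioural sketch. It models logic only, not circuit behaviour. The function name `cam_search`, the table contents and the choice of returning the lowest matching address are assumptions made for illustration.

```python
from typing import List, Optional


def cam_search(stored_words: List[str], search_word: str) -> Optional[int]:
    """Return the lowest address whose stored word equals search_word, else None."""
    # Precharge phase: every match line starts high (tentative match).
    match_lines = [True] * len(stored_words)

    # Evaluation phase: one mismatching bit is enough to discharge a match line.
    for address, word in enumerate(stored_words):
        for stored_bit, search_bit in zip(word, search_word):
            if stored_bit != search_bit:
                match_lines[address] = False
                break

    # Encoder: map the match line that stayed high to its address.
    for address, is_match in enumerate(match_lines):
        if is_match:
            return address
    return None


# Hypothetical contents: 4 words of 3 bits, the same size as the model in figure 1.
table = ["101", "010", "110", "011"]
hit = cam_search(table, "110")
miss = cam_search(table, "000")
print(hit, miss)
```

A hardware CAM evaluates all words in the same cycle; the loop here only serialises that parallel comparison.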

Keywords: Content-addressable memory (CAM), match line sensing, review, search line power.

1. INTRODUCTION
Most memory devices store and retrieve data by addressing specific memory locations. This path becomes the limiting factor for systems that depend on fast memory access. The time required to find data stored in memory can be reduced if the data can be identified by its content rather than by its address. A memory used for this purpose is a Content Addressable Memory (CAM). CAM is used in applications where search time is critical and must be very short. It is well suited to functions such as Ethernet address lookup, data compression, and security or encryption information on a packet-by-packet basis for high-performance data switches. It can also be operated as a data-parallel or Single Instruction/Multiple Data (SIMD) processor. Since CAM is an extension of RAM, the features of RAM must be understood first. In general, RAM has two operations, read and write, whereas CAM has three: read, write and compare [1]. The compare operation makes CAM useful in a variety of applications such as network routers. A network router forwards incoming packets from the sender port to the proper destination port by looking up its routing table. CAMs are used in network routers for fast forwarding of packets.

Figure 1 Simple schematic of a CAM

CAM core cells and match line structures are discussed in sections 2 and 3. Match line sensing schemes and search line driving approaches are reviewed in sections 4 and 5. The conclusion is given at the end.

2. CORE CELLS
A CAM can be implemented using two types of core cell: the NOR cell and the NAND cell.

MLn+1 nodes are joined to form a word. A serial nMOS chain of all the Mi transistors resembles the pull down path of a CMOS NAND logic gate. A match condition for the entire word occurs only if every cell in a word is in the match condition. An important property of the NOR cell is that it provides a full rail voltage at the gates of all comparison transistors. On the other hand, a deficiency of the NAND cell is that it provides only a reduced logic 1 voltage at node B, which can reach only VDD-Vtn when the search lines are driven to VDD (where VDD is the supply voltage and Vtn is the nMOS threshold voltage) [1].
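The NAND behaviour can be sketched at the logic level as follows. This is a simplified model that ignores the reduced VDD-Vtn level at node B. The function names are hypothetical and bits are assumed to be the integers 0 and 1.

```python
def nand_cell_node_b(sl: int, d: int) -> int:
    """Bit-match node B: logic 1 when the search bit and stored bit agree (XNOR)."""
    return 1 if sl == d else 0


def nand_word_matches(search_bits, stored_bits) -> bool:
    """The series nMOS chain conducts only if node B is 1 in every cell."""
    return all(nand_cell_node_b(s, d) == 1 for s, d in zip(search_bits, stored_bits))


word_match = nand_word_matches([1, 0, 1], [1, 0, 1])
word_miss = nand_word_matches([1, 0, 1], [1, 1, 1])
print(word_match, word_miss)
```

A single cell with node B at 0 breaks the series path, which is why one mismatching bit is enough to produce a miss for the whole word.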

2.1 NOR Cell
Figure 2 shows a NOR type CAM cell. The NOR cell implements the comparison between the stored bit, D (and its complement), and the search data on the differential search line, SL (and its complement), using four comparison transistors, M1 through M4, which are all typically minimum-size to maintain high cell density. These transistors implement the pull down path of a dynamic XNOR logic gate with inputs SL and D. Each pair of transistors, M1/M3 and M2/M4, forms a pull down path from the match line, ML, such that a mismatch of SL and D activates at least one of the pull down paths, connecting ML to ground. A match of SL and D disables both pull down paths, disconnecting ML from ground. The NOR nature of this cell becomes clear when multiple cells are connected in parallel to form a CAM word by shorting the ML of each cell to the ML of adjacent cells. The pull down paths connect in parallel, resembling the pull down path of a CMOS NOR logic gate. There is a match condition on a given ML only if every individual cell in the word has a match [1].
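The NOR comparison can be expressed as a short logic-level sketch. The function names are hypothetical, bits are assumed to be the integers 0 and 1, and the sketch does not say which transistor pair serves which mismatch case.

```python
def nor_cell_pulls_down(sl: int, d: int) -> bool:
    """True when one of the two pull-down paths conducts, i.e. SL and D differ."""
    sl_bar, d_bar = 1 - sl, 1 - d
    # One series pair conducts for (SL=1, D=0), the other for (SL=0, D=1).
    return bool((sl and d_bar) or (sl_bar and d))


def nor_word_matches(search_bits, stored_bits) -> bool:
    """The shared ML stays high only if no cell in the word pulls it down."""
    return not any(nor_cell_pulls_down(s, d) for s, d in zip(search_bits, stored_bits))


ml_high = nor_word_matches([1, 0, 1], [1, 0, 1])
ml_low = nor_word_matches([1, 0, 1], [0, 0, 1])
print(ml_high, ml_low)
```

Because the pull-down paths are in parallel, any one conducting cell discharges the match line, which matches the NOR behaviour described above.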

Figure 3 9-T NAND Type CAM

3. MATCH LINE STRUCTURES


The match line is one of the key structures in a CAM. NOR cells and NAND cells are used to construct the two corresponding types of match line.

Figure 2 10-T NOR type CAM

3.1 NOR Match Line
The schematic form of the NOR match line is shown in figure 4. The NOR cells are connected in parallel to form a NOR match line. A typical NOR search cycle operates in three phases: search line precharge, match line precharge, and match line evaluation. First, the search lines are precharged low to disconnect the match lines from ground by disabling the pull down paths in each CAM cell. Second, with the pull down paths disconnected, the Mpre transistor precharges the match lines high. Finally, the search lines are driven to the search word values, triggering the match line evaluation phase. In the case of a match, the ML voltage, VML, stays high as there is no discharge path to ground. In the case of a miss, there is at least one path to ground that discharges the match line. The match line sense amplifier (MLSA) senses the voltage on ML and generates a corresponding full-rail output match result. The main feature of the NOR match line is its high speed of operation. In the slowest case of a one-bit miss in a word, the critical evaluation path is through the two series transistors in the cell that form the pull down path. Even in this worst case, NOR-cell evaluation is faster than the NAND case, where between 8 and 16 transistors form the evaluation path [1].

2.2 NAND Cell
Figure 3 shows a NAND type CAM cell. The NAND cell implements the comparison between the stored bit, D, and the corresponding search data on the differential search lines (SL and its complement), using three comparison transistors: M1, MD and the complementary pass transistor of MD, which are all typically minimum-size to maintain high cell density. We illustrate the bit-comparison operation of a NAND cell through an example. Consider the case of a match when SL=1 and D=1. Pass transistor MD is ON and passes the logic 1 on the SL to node B. Node B is the bit-match node, which is logic 1 if there is a match in the cell. The logic 1 on node B turns ON transistor M1. Note that M1 is also turned ON in the other match case, when SL = 0 and D = 0. In this case, the complementary pass transistor passes the logic high on the complementary search line to raise node B. The remaining cases, where SL ≠ D, result in a miss condition, and accordingly node B is logic 0 and the transistor M1 is OFF. Node B is a pass-transistor implementation of the XNOR function. The NAND nature of this cell becomes clear when multiple NAND cells are serially connected. In this case, the MLn and


Figure 4 Structure of a NOR match line with n cells.

3.2 NAND Match Line
Figure 5 shows the NAND match line. A number of cells, n, are cascaded to form the NAND match line. The precharge pMOS transistor, Mpre, sets the initial voltage of the match line, ML, to the supply voltage, VDD. Next, the evaluation nMOS transistor, Meval, turns ON. In the case of a match, all nMOS transistors, M1 through Mn, are ON, effectively creating a path to ground from the ML node, hence discharging ML to ground. In the case of a miss, at least one of the series nMOS transistors, M1 through Mn, is OFF, leaving the ML voltage high. A sense amplifier, MLSA, detects the difference between the match (low) voltage and the miss (high) voltage. The NAND match line has an explicit evaluation transistor, Meval, unlike the NOR match line, where the CAM cells themselves perform the evaluation. There is a potential charge-sharing problem in the NAND match line. Charge sharing can occur between the ML node and the intermediate MLi nodes. This charge sharing may cause the ML node voltage to drop sufficiently low that the MLSA detects a false match. A technique that eliminates charge sharing is to precharge high, in addition to ML, the intermediate match nodes ML1 through MLn-1. This procedure eliminates charge sharing, since the intermediate match nodes and the ML node are initially shorted. However, there is an increase in the power consumption due to the search line precharge. A feature of the NAND match line is that a miss stops signal propagation, so that no power is consumed past the final matching transistor in the serial nMOS chain. Typically, only one match line is in the match state; consequently, most match lines have only a small number of transistors in the chain that are ON, and thus only a small amount of power is consumed. Two drawbacks of the NAND match line are a quadratic delay dependence on the number of cells and a low noise margin.
The quadratic delay-dependence comes from the fact that adding a NAND cell to a NAND matchline adds both a series resistance due to the series nMOS transistor and a capacitance to ground due to the nMOS diffusion capacitance. These elements form an RC ladder structure whose overall time constant has a quadratic dependence on the number of NAND cells. The low noise margin is caused by the use of nMOS pass transistors for the comparison circuitry. NOR cells avoid this problem by applying maximum gate voltage to all CAM cell transistors when conducting [1].
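The quadratic dependence can be illustrated with the Elmore delay, a standard first-order estimate for RC ladders. The sketch below assumes identical sections (one series resistance and one capacitance to ground per cell). The function name and the unit values are hypothetical, and the Elmore delay is an approximation rather than an exact circuit delay.

```python
def elmore_delay(n_cells: int, r_ohm: float, c_farad: float) -> float:
    """Elmore delay to the far end of a uniform RC ladder with n_cells sections."""
    delay = 0.0
    for k in range(1, n_cells + 1):
        # The capacitor at node k sees the k series resistors between it and the source.
        delay += (k * r_ohm) * c_farad
    return delay


# With unit R and C the sum is n(n+1)/2, so doubling n roughly quadruples the delay.
d8 = elmore_delay(8, 1.0, 1.0)
d16 = elmore_delay(16, 1.0, 1.0)
print(d8, d16, d16 / d8)
```

The closed form n(n+1)/2 times RC grows as n squared for large n, which is consistent with the quadratic delay dependence described above.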

Figure 5 Structure of a NAND match line.

4. MATCH LINE SENSING SCHEMES


There are different match line sensing schemes for generating the match result. In this section we review sensing schemes that reduce the power consumption of CAMs.

4.1 Segmented Match Line Scheme
Figure 6 shows the segmented match line architecture (SMA). In SMA, the match lines in ternary CAM (TCAM) words are partitioned into four segments. For convenience, the segmented match lines are numbered sequentially from the left of the figure. Segments 1 and 2 are referred to as left segments and the other two segments are called right segments. The terms segment and partition are used interchangeably. Each segment implements the traditional NOR match line circuit with NI(k) inputs, where k is the segment number. The NOR inputs in each segment are represented as D0, D1, ..., DNI(k)-1. Any NOR input can drain the charge stored in a segmented match line when it turns on its associated nMOS transistor connected to ground. The numbers of NOR inputs in the four segments are not necessarily the same and can be selected to meet power reduction requirements. The four segments are classified from a pre-charging perspective: one type is the pre-charged segment and the other is the charge-shared segment. The match lines in pre-charged segments are charged before match-line evaluation. Match lines of charge-shared segments are never pre-charged but share the charge of the pre-charged segments for match-line evaluation. In SMA, segments 1 and 4 are designated as pre-charged segments and the others are charge-shared segments. A global signal, segment precharge (SP), is used to source current only to the pre-charged segments for a match-line evaluation. However, as in previous research on conditional precharging [2], [3], the SP signal in the pre-charged segments can be generated from various sources, such as partial comparison results [3]. The circuit associated with the SP signals is referred to as the segment pre-charging circuit (SPC).
The pre-charged and charge-shared segments on each side (left or right) are electrically separated by a pass gate, which is referred to as the charge spread circuit (CSC). The CSC is enabled using a global signal, charge spread (CS). When the CSC is enabled (CS=1), the charges in the left or right segments are shared as part of match-line evaluation. The match sensor block is located between the left and right segments. It

processes the partial (half) comparison results to produce the final comparison result for a full TCAM word. The final results from all words are fed into the priority encoder block (not shown), which produces a hit or miss result [6]. to meet the sensing voltage boundary. Please note that Cn is proportional to NI(k). The charge sharing method was previously applied to CAM to reduce the voltage swing on match lines without an externally created pre-charge voltage [4], [5]. A low-swing match line scheme from [5] is shown in figure 7. A tank capacitor is associated with each match line to produce a lower voltage. The charge on the tank capacitor is used to charge the match line to a low voltage by driving the signal eval to logic high. A similar approach was used in [5].

Figure 6 Segmented Match Line Architecture.

4.1.1 Pre-Charging
In the traditional NOR match-line structure, the entire match line is pre-charged during every search operation whenever a word under comparison does not match the comparison signal. SMA, however, segments the total match line capacitance and pre-charges only a subset of the match line. As a result, SMA reduces the power consumption in match lines by reducing the total capacitance seen by the power source. SMA is similar to the conditional pre-charging methods in that only a subset of the match line is pre-charged in the first phase. The conditional pre-charging scheme, however, needs to pre-charge the remaining match lines depending on the first-stage results. SMA instead performs charge sharing in place of the second pre-charging, and thus does not draw further current from the power source. The pre-charging time is also reduced because of the smaller RC time constant. The total charging time is further reduced because the two pre-charged segments, 1 and 4, are charged at the same time [6].

4.1.2 Charge Sharing
The charge stored in the nth segment originates from either pre-charging or charge sharing. The charge is referred to as Qn, where n is the segment number. Q1 and Q4 reach their maximum values when the respective match-line segments, 1 and 4, are pre-charged. The charge in the pre-charged segments is then shared with the other match line partitions when the CSC is enabled. The static match voltage at the left segments, Vlf, is established after charge sharing. The voltage is determined by the charge conservation rule as shown in (1). The capacitance of the nth segment is represented as Cn.

(1)

The voltage at the right segments, Vrf, can be calculated similarly. The two voltages (Vlf, Vrf) do not have to be the same. The static match voltages are typically not the rail voltage. It is, therefore, important for the static voltages to meet the minimum sensing voltage of the match sensor block.
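As a hedged illustration of the charge conservation rule, the sketch below computes the voltage after a pre-charged segment is connected to a charge-shared segment, assuming ideal capacitors and no charge lost to mismatching cells. The function name and the capacitance values (arbitrary units) are assumptions, and the expression is the textbook charge-sharing relation, not necessarily the exact form of (1).

```python
def shared_voltage(q_precharged: float, c_precharged: float,
                   q_shared: float, c_shared: float) -> float:
    """Voltage after two capacitors are connected: total charge over total capacitance."""
    return (q_precharged + q_shared) / (c_precharged + c_shared)


vdd = 1.0
c1, c2 = 3.0, 1.0      # hypothetical capacitances of segments 1 and 2
q1 = c1 * vdd          # segment 1 is fully pre-charged
q2 = 0.0               # segment 2 is assumed to start with no charge
v_lf = shared_voltage(q1, c1, q2, c2)
print(v_lf)
```

With these assumed values the left segments settle at three quarters of VDD, which shows why the static match voltage is typically below the rail and why the relative segment sizes matter.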
NI(1) and NI(4) determine the minimum static voltage and should be carefully selected

4.1.3 Evaluation
After the pre-charging operation, the partial evaluation results for the four segments are merged to determine the final match result. The process is called merging segmented match lines (MSM). The MSM phase can be broken down into three sub-operations. In the first operation, the charge is shared through the CSC. The first

Figure 7 Low-Swing Scheme in [5].

The tank capacitor is pre-charged to VDD and shared with a match line to create a low-voltage swing at each match line by choosing the size of the tank capacitor. The technique requires an additional tank capacitor. SMA achieves the same effect by pre-charging the pre-charged segments, without additional capacitors. SMA is not limited to one voltage swing and has the flexibility to create an arbitrary voltage at each match line by choosing the number of NOR circuits in the pre-charged segments, without externally generated reference voltages. The case in which Ctank is selected to give a match line voltage swing of VDD/2 will be referred to as low-swing VDD/2 (LS-VDD/2). Once the static voltages at each match line are sensed, the charge stored in all segments is recycled for subsequent search operations if the word comparison result is a match. The recycled charge is then accumulated with the shared charge from pre-charging. The charge-shared voltages approach the rail voltage, VDD, when a word continuously produces the match result. The charge-shared voltage has minimum and maximum values, which are referred to as Vm-min and Vm-max, respectively. The voltage boundary is formulated as

(2)

operation is performed simultaneously in the left and right segments. In the second operation, the NOR operation is performed at each segment by enabling the NOR inputs, which are generated from the TCAM cell. The first two operations do not have to be performed sequentially. The charge-sharing operation is similar to the NAND operation in the CAM NAND match line structure. Since each segment is a NOR-type match line structure, NAND and NOR operations are performed simultaneously when the first two operations are enabled at the same time. Since MSM is performed after pre-charging, there is no dc path from the power source to ground. Once the partial comparison results from the left and right segments are generated in the first two operations, the results are combined by the match sensor block in the third operation. The third operation cannot be performed at the same time as the first two operations. The match sensor circuit is physically located between the left and right segments. The main reason for placing it in the center is to speed up the sensing operation by evaluating the left and right segments in parallel. The match sensor block produces a logical value of 1 only when the voltages, Vlf and Vrf, are above the minimum final voltages [6]. tell if the word is a match or a mismatch, and then automatically disables the charge path to save power. Notably, a reset signal SEARCH_EN sets the DMLSA into an initial state, where ML(i) = SML(i) = 0 and SP = 0, before the searching process. The detailed operation of the DMLSA in the searching process is described below.

Figure 9 Schematic of DMLSA

4.2.1.1 Mismatch
Before the searching process, SP = 0. SEARCH = SEARCH_EN is pulled high at the beginning of the searching process. Then, MN1 is turned on to charge ML(i) such that KP is discharged but not pulled all the way down to 0. If any CAM cell mismatches, MSi is turned on to make a current path between ML(i) and SML(i), such that SML(i) is charged by ML(i). When the voltage of SML(i) is high enough to turn off MP3, the voltage of KP is pulled down such that MATCHB is equal to logic 1, indicating that the word is a mismatch. Through two feedback paths, MATCHB turns MN3 on and MP1 off, respectively, such that the current path of MP1 is shut off to choke the charge current of ML(i), and SP is discharged via MN3 to turn off MN1. The former constitutes a positive loop from MATCHB to KP through MN3 and MN2, which pulls down KP more quickly. Therefore, the power consumption is reduced after the searching process [7].

4.2.1.2 Match
If all of the CAM cells match, ML(i) and SML(i) are isolated without any current path. The voltage difference between ML(i) and SML(i) creates an output current in the differential pair (MP2 and MP3) that charges KP and SP. As soon as KP is charged high, MATCHB becomes logic 0, indicating that the comparison is a match. After SP is raised high, SEARCH becomes logic 0 and turns off MN1 to choke the charge current to ML(i) [7].

4.2 Self Disabled Sensing Technique
The self-disabled sensing technique chokes the charge current fed into the ML right after the match comparison result is generated. Figure 8 shows the CAM architecture, where block C and block DMLSA denote the CAM cell and the differential MLSA (DMLSA), respectively. The prototype CAM is 128 words × 32 bits. The Search Word Register loads the search key and feeds it into all the CAM cells. Each DMLSA charges the ML and senses the voltage variation to generate the match signal, which is sent to the Address Encoder. In general, either exactly one word or no word matches the search key, so the Address Encoder generates the corresponding address code or a no-match signal after the searching process [7].

Figure 8 Architecture of the Self disabled sensing CAM.

4.2.1 DMLSA
Figure 9 shows the DMLSA schematic diagram. The DMLSA senses the voltage on the ML(i) and SML(i) to

4.3 Parity Bit and Power-Gated Match Line Sensing

4.3.1 Parity Bit Based CAM
The parity bit based CAM design, shown in figure 10, consists of the original data segment and an extra one-bit segment derived from the actual data bits. Only the parity bit is obtained, i.e., whether the number of 1s is odd or even. The

obtained parity bit is placed directly into the corresponding word and ML. Thus the new architecture has the same interface as the conventional CAM, with one extra bit. During the search operation, there is only a single stage, as in the conventional CAM. Hence, the use of these parity bits does not improve the power performance. However, this additional parity bit, in theory, reduces the sensing delay and boosts the driving strength of the 1-mismatch case (which is the worst case) by half. In the case of a match in the data segment (e.g., ML3), the parity bits of the search word and the stored word are the same, so the overall word returns a match. When 1 mismatch occurs in the data segment (e.g., ML2), the numbers of 1s in the stored word and the search word must differ by 1. As a result, the corresponding parity bits are different. Therefore we now have two mismatches (one from the parity bit and one from the data bits). If there are two mismatches in the data segment (e.g., ML0, ML1 or ML4), the parity bits are the same and overall we have two mismatches. Cases with more mismatches can be ignored, as they are not critical. The sense amplifiers now only have to distinguish between the 2-mismatch cases and the matched cases. Since the driving capability of the 2-mismatch word is twice as strong as that of the 1-mismatch word, the design greatly improves the search speed and the Ion/Ioff ratio [8]. EN. At this time, signal EN is set low and the power transistor Px is turned OFF. This initializes the signals ML and C1 to ground and VDD, respectively. After that, signal EN goes HIGH and initiates the COMPARE phase. If one or more mismatches occur in the CAM cells, the ML is charged up. Interestingly, all the cells of a row share the limited current offered by the transistor Px, regardless of the number of mismatches.
When the voltage of the ML reaches the threshold voltage of transistor M8 (i.e., Vth8), voltage at node C1 will be pulled down. After a certain but very minor delay, the NAND2 gate will be toggled and thus the power transistor Px is turned off again. As a result, the ML is not fully charged to VDD, but limited to some voltage slightly above the threshold voltage of M8, Vth8 [8].
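Returning to the parity bit scheme of section 4.3.1, the mismatch counting argument can be checked with a short sketch. The function names are hypothetical, even parity is assumed, and words are modelled as lists of 0/1 integers.

```python
def parity(bits) -> int:
    """Even-parity bit: 1 when the word holds an odd number of 1s."""
    return sum(bits) % 2


def mismatches_with_parity(stored, search) -> int:
    """Count mismatching positions over the data bits plus the extra parity bit."""
    data_mismatches = sum(s != q for s, q in zip(stored, search))
    parity_mismatch = int(parity(stored) != parity(search))
    return data_mismatches + parity_mismatch


stored = [1, 0, 1, 1]
same = mismatches_with_parity(stored, [1, 0, 1, 1])      # full match
one_off = mismatches_with_parity(stored, [1, 0, 1, 0])   # one data bit differs
two_off = mismatches_with_parity(stored, [1, 0, 0, 0])   # two data bits differ
print(same, one_off, two_off)
```

A single data mismatch always flips the parity, so the weakest pull-down case seen by the sense amplifier becomes a 2-mismatch case rather than a 1-mismatch case.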

Figure 10 Parity bit based CAM.

4.3.2 Gated Power Match Line Sense Amplifier
The CAM architecture is shown in figure 11. The CAM cells are organized into rows (words) and columns (bits). Each cell has the same number of transistors as the conventional P-type NOR CAM and uses a similar ML structure. However, the COMPARISON unit, i.e., transistors M1-M4, and the SRAM unit, i.e., the cross-coupled inverters, are powered by two separate metal rails, namely VDDML and VDD, respectively. VDDML is independently controlled by a power transistor (Px) and a feedback loop that can automatically turn off the ML current to save power. The purpose of having two separate power rails (VDD and VDDML) is to completely isolate the SRAM cell from any possibility of power disturbances during the COMPARE cycle. The gated-power transistor Px is controlled by a feedback loop, denoted Power Control, which automatically turns off Px once the voltage on the ML reaches a certain threshold. At the beginning of each cycle, the ML is first initialized by a global control signal

Figure 11 (a) CAM Architecture. (b) Each cell powered by two different rails.

5. SEARCH LINE DRIVING APPROACHES


The power consumption of the search lines depends mainly on the match line structure. Some search line driving approaches are reviewed in this section.

5.1 Pipelined Search Line Driving
To distribute the incoming search word to all the CAM cells at the same time, the SLs span the entire memory array. Inside the core cell, the SLs drive the bit comparison network to compare the incoming and the stored bits. Hence, the SL capacitance consists of the metal-line parasitic capacitance and the parasitic capacitance due to the cells, amounting to high values in

large CAM arrays. The SLs broadcast a new search word every cycle. Assuming random search data, 50% of the SLs switch state from cycle to cycle. Therefore, the high switching activity of the highly capacitive SLs causes high energy consumption. One way to reduce the SL power is to break the SLs into smaller segments and deactivate as many segments as possible each clock cycle [6], [9]. The pipelined SL driving technique achieves this effect by dividing the SL into pipeline stages. Figure 12 presents the concept of the pipelined SL architecture. The SL registers divide the memory array and the SLs into pipeline stages in the direction of the search word broadcast. For simplicity, the figure illustrates only two pipeline stages, while in general the number of stages can be larger. The SL register holds the search word being broadcast into the pipeline stage in the current clock cycle. A dynamic NOR gate monitors the outcome of the word matching in the entire stage and generates the enable signal for the following stage. If none of the stored words matches the search word in stage i, then enable i+1 is high, and the SL register of stage i+1 latches this search word in the next clock cycle. If there is a match in stage i, then enable i+1 is low, and hence stage i+1 remains idle in the next clock cycle. For simplicity, we assume here that a search results in a single match in the entire CAM. We now describe the operation of the pipelined SL structure through an example. Figure 13 illustrates the timing diagram of a CAM with four pipeline stages. In four consecutive cycles, cycle 1 through cycle 4, the search initiates for four different words: A, B, C, and D. The search starts from the first pipeline stage, i.e., stage 0, and thus this stage is active every clock cycle. The search for word A in cycle 1 results in a miss in stage 0, and hence the search for the same word continues in the next clock cycle in stage 1.
A match in this stage stops further search for word A, and thus idles stage 2 in the subsequent cycle, saving power. The idle phase (shaded in the figure) then propagates down the pipeline. For higher power savings, it is desirable for the match to take place as close as possible to the start of the pipeline, as is the case with word B. In the worst case (word D), it takes four clock cycles to complete a search. As a result, pipelining introduces latency into the system, which is equal to the number of pipeline stages. The throughput of the system remains unchanged, and the search results are available every clock cycle: cycles 4 through 7 in this example. While the first pipeline stage is active every clock cycle, the activity of the remaining stages is data dependent. With a uniform distribution of the matches across the pipeline stages, half of the last N-1 stages are inactive on average, where N is the total number of stages. Therefore, on average, a fraction $f_{IDLE}$ of the pipeline stages remains idle:

$f_{IDLE} = \frac{N-1}{2N}$ (3)

If N>>1, then $f_{IDLE}$ approaches 50%. However, any power saving comes at the price of search latency, which is equal to N clock cycles. In addition to the latency, the pipelined architecture has some area penalty associated with it. The SL registers between the pipeline stages require extra chip area compared to the non-pipelined CAM. However, if the size of the pipeline stage is sufficiently large with respect to the area occupied by the pipeline registers, then the area overhead due to the SL registers becomes insignificant. Reducing the number of active stages in the pipelined SL architecture effectively decreases the average SL capacitance that is switched every clock cycle. This, in turn, reduces the SL power dissipation by the fraction of the idle stages. Moreover, the pipelined CAM structure also reduces the ML energy, since the match sensing is disabled in the idle pipeline stages.

Figure 13 Timing Diagram of Pipelined CAM.
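The average idle fraction can be checked by enumeration. The sketch below assumes, as the text does, exactly one match per search, equally likely to fall in any stage; a match in stage k leaves the N-1-k later stages idle for that search. The function name is hypothetical.

```python
from fractions import Fraction


def idle_fraction(n_stages: int) -> Fraction:
    """Average fraction of idle stages when the match stage is uniformly distributed."""
    total_idle = 0
    for match_stage in range(n_stages):
        # Stages after the matching one are never enabled for this search word.
        total_idle += n_stages - 1 - match_stage
    # Average over the n equally likely match positions, then divide by n stages.
    return Fraction(total_idle, n_stages * n_stages)


for n in (2, 4, 16, 64):
    print(n, idle_fraction(n), float(idle_fraction(n)))
```

The enumeration agrees with the closed form (N-1)/(2N) and its complement (N+1)/(2N) for the active stages, and the printed values move toward 50% as N grows.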
The actual savings depend on the distribution of the matches across the pipeline stages. Partitioning the memory array into N equally sized pipeline stages results in an average activity of only (N+1)/2N stages. Hence, the total energy consumption of a CAM with W words and M cells per word can be expressed as (4) where EREG is the energy dissipation due to the pipeline register (one flip-flop), and ECELL is the energy consumption per CAM cell. To find the optimal number of stages that minimizes the total energy, we differentiate this equation with respect to the number of stages, N, and equate the derivative to zero. (5)

(6)
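The optimization step can be illustrated numerically. The sketch below uses a stand-in energy model that is purely an assumption for illustration and is not necessarily the form of equation (4): the cell energy scales with the average active fraction (N+1)/(2N), and the register energy assumes one flip-flop per cell column at each of the N-1 stage boundaries. All function names and parameter values are hypothetical.

```python
import math


def total_energy(n, words, cells_per_word, e_cell, e_reg):
    """Assumed model: active-cell energy plus pipeline-register energy."""
    active_fraction = (n + 1) / (2 * n)  # average fraction of active stages
    cell_energy = active_fraction * words * cells_per_word * e_cell
    # Assumption: one flip-flop per cell column at each of the N-1 boundaries.
    reg_energy = (n - 1) * cells_per_word * e_reg
    return cell_energy + reg_energy


def optimal_stages_closed_form(words, e_cell, e_reg):
    """dE/dN = 0 for the assumed model gives N = sqrt(W*E_CELL / (2*E_REG))."""
    return math.sqrt(words * e_cell / (2 * e_reg))


def optimal_stages_brute_force(words, cells_per_word, e_cell, e_reg,
                               max_stages=64):
    """Integer N in 1..max_stages with the lowest energy under the model."""
    return min(
        range(1, max_stages + 1),
        key=lambda n: total_energy(n, words, cells_per_word, e_cell, e_reg),
    )


# Hypothetical numbers: 1024 words of 144 cells, E_REG = 8 * E_CELL.
print(optimal_stages_closed_form(1024, 1.0, 8.0))
print(optimal_stages_brute_force(1024, 144, 1.0, 8.0))
```

Under this assumed model the optimum does not depend on M, because both energy terms scale linearly with the number of cells per word.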

Figure 12 Pipelined CAM Architecture.

5.2 Hierarchical Search Lines
Another method of saving search line power is to shut off some search lines by using the hierarchical search line scheme [9], [10], [11], [12]. The basic idea of hierarchical search lines is to exploit the fact that few match lines survive the first segment of the pipelined match lines. With the conventional search line approach, all search lines are still driven even though only a small number of match lines survive the first segment. The hierarchical search line scheme instead divides the search lines into a two-level hierarchy of global search lines (GSLs) and local search lines (LSLs). Figure 14 shows a simplified hierarchical search line scheme, where the match lines are pipelined into two segments and the search lines are divided into four LSLs per GSL. In Figure 14, each LSL feeds only a single match line (for simplicity), but the number of match lines per LSL can be 64 to 256. The GSLs are active every cycle, but the LSLs are active only when necessary. Activating an LSL is necessary when at least one of the match lines fed by that LSL is active. In many cases, an LSL will have no active match lines in a given cycle; hence there is no need to activate the LSL, which saves power. Thus, the overall power consumption on the search lines is:

$$P_{SL} = \left(C_{GSL} + \alpha C_{LSL}\right) V_{DD}^{2} f \qquad (7)$$

where C_GSL is the GSL capacitance, C_LSL is the LSL capacitance (of all LSLs connected to a GSL), α is the activity rate of the LSLs, V_DD is the supply voltage, and f is the operating frequency. C_GSL primarily consists of wiring capacitance, whereas C_LSL consists of wiring capacitance and the gate capacitance of the SL inputs of the CAM cells. The factor α, which can be as low as 25% in some cases, is determined by the search data and the data stored in the CAM. Equation (7) determines how much power is saved on the LSLs, but the cost of this saving is the power dissipated by the GSLs. Thus, the power dissipated by the GSLs must be sufficiently small for the overall search line power to be lower than that of the conventional approach.
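The LSL activation rule described above can be sketched behaviorally. This is a minimal illustration, assuming the match lines that survive the first segment are given as a list of booleans and that consecutive match lines share an LSL; the function names and the grouping are hypothetical, not taken from the cited designs.

```python
def lsl_enables(active_match_lines, lines_per_lsl):
    """Return one flag per LSL: True if any match line it feeds is active."""
    enables = []
    for start in range(0, len(active_match_lines), lines_per_lsl):
        group = active_match_lines[start:start + lines_per_lsl]
        enables.append(any(group))
    return enables


def lsl_activity_rate(enables):
    """Fraction of LSLs driven in this cycle (the activity rate of the LSLs)."""
    return sum(enables) / len(enables)


# Eight match lines, two per LSL; only match line 2 survived the first segment.
survivors = [False, False, True, False, False, False, False, False]
enables = lsl_enables(survivors, lines_per_lsl=2)
print(enables, lsl_activity_rate(enables))
```

In this example only one of the four LSLs is driven, so the activity rate for the cycle is 0.25, while the GSL is driven regardless.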
If wiring capacitance is small compared to the parasitic transistor capacitance [12], then the scheme saves power. However, as transistor dimensions scale down, wiring capacitance is expected to increase relative to transistor parasitic capacitance. When wiring capacitance is comparable to or larger than the parasitic transistor capacitance, C_GSL and C_LSL will be similar in size, resulting in no power savings. In this case, small-swing signaling on the GSLs can reduce the power of the GSLs compared to that of the full-swing LSLs [10], [11]. This results in the modified search line power

$$P_{SL} = \left(C_{GSL} V_{LOW}^{2} + \alpha C_{LSL} V_{DD}^{2}\right) f \qquad (8)$$

where V_LOW is the low-swing voltage on the GSLs (assuming an externally available power supply V_LOW). This scheme requires an amplifier to convert the low-swing GSL signal to the full-swing signals on the LSLs. Fortunately, there is only a small number of these amplifiers per search line, so the area and power overhead of this extra circuitry is small [11].
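The effect of low-swing GSLs can be illustrated with a short calculation. The sketch below assumes the usual dynamic power form C·V²·f for both levels of the hierarchy, with the LSL term scaled by the LSL activity rate; the function name and all numeric values are hypothetical and chosen only for illustration.

```python
def searchline_power(c_gsl, c_lsl, alpha, v_dd, freq, v_low=None):
    """Search line power for a hierarchical GSL/LSL scheme.

    With v_low=None the GSLs swing the full V_DD; otherwise the GSLs
    swing v_low while the LSLs still swing V_DD.
    """
    gsl_voltage = v_dd if v_low is None else v_low
    return (c_gsl * gsl_voltage ** 2 + alpha * c_lsl * v_dd ** 2) * freq


# Hypothetical values: 0.2 pF GSL, 1.0 pF of LSLs, 25% LSL activity,
# 1.0 V supply, 100 MHz search rate, 0.5 V low-swing GSL.
full_swing = searchline_power(0.2e-12, 1.0e-12, 0.25, 1.0, 100e6)
low_swing = searchline_power(0.2e-12, 1.0e-12, 0.25, 1.0, 100e6, v_low=0.5)
print(full_swing, low_swing)
```

Because the GSL term scales with the square of its voltage swing, halving the swing cuts that term to a quarter, which matters most when C_GSL is comparable to C_LSL.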

Figure 14 Hierarchical Search Line Structure [10], [11].

6. CONCLUSION
In this paper, CAM, its applications, and its basic operation are introduced. The main CAM cells, namely the NOR cell and the NAND cell, and their operation are discussed. The discussion is then extended to the match lines built from these cells, focusing on the power consumption of CAM and on the match line sensing techniques and search line driving approaches used to reduce it. In future, many further techniques can be explored for designing low-power CAMs.

References
[1] Kostas Pagiamtzis and Ali Sheikholeslami, "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey," IEEE Journal of Solid-State Circuits, Vol. 41, No. 3, March 2006.
[2] C. Zukowski and S. Wang, "Use of Selective Precharge for Low-Power Content-Addressable Memories," in Proc. IEEE Int. Symp. Circuits Syst., Jun. 9-12, 1997, pp. 1788-1791.
[3] C. Lin, J. Chang, and B. Liu, "A Low-Power Precomputation-Based Fully Parallel Content-Addressable Memory," IEEE J. Solid-State Circuits, Vol. 38, No. 4, pp. 654-662, Apr. 2003.
[4] Sanghyeon Baeg, "Low-Power Ternary Content-Addressable Memory Design Using a Segmented Match Line," IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 55, No. 6, July 2008.
[5] M. Khellah and M. Elmasry, "Use of Charge Sharing to Reduce Energy Consumption in Wide Fan-In Gates," in Proc. IEEE Int. Symp. Circuits Syst., 1998, pp. 9-12.
[6] G. Kasai, Y. Takarabe, K. Furumi, and M. Yoneda, "200 MHz/200 MSPS 3.2 W at 1.5 V Vdd, 9.4 Mbits Ternary CAM with New Charge Injection Match Detect Circuits and Bank Selection Scheme," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), 2003, pp. 387-390.
[7] Chua-Chin Wang, Chia-Hao Hsu, Chi-Chun Huang, and Jun-Han Wu, "A Self-Disabled Sensing Technique for Content-Addressable Memories," IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 57, No. 1, January 2010.
[8] Anh-Tuan Do, Shoushun Chen, Zhi-Hui Kong, and Kiat Seng Yeo, "A High Speed Low Power CAM with a Parity Bit and Power-Gated ML Sensing," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 21, No. 1, January 2013.
[9] K. Pagiamtzis and A. Sheikholeslami, "A Low-Power Content-Addressable Memory (CAM) Using Pipelined Hierarchical Search Scheme," IEEE J. Solid-State Circuits, Vol. 39, No. 9, pp. 1512-1519, Sep. 2004.
[10] H. Noda, K. Inoue, M. Kuroiwa, F. Igaue, K. Yamamoto, H. J. Mattausch, T. Koide, A. Amo, A. Hachisuka, S. Soeda, I. Hayashi, F. Morishita, K. Dosaka, K. Arimoto, K. Fujishima, K. Anami, and T. Yoshihara, "A Cost-Efficient High-Performance Dynamic TCAM with Pipelined Hierarchical Search and Shift Redundancy Architecture," IEEE J. Solid-State Circuits, Vol. 40, No. 1, pp. 245-253, Jan. 2005.
[11] K. Pagiamtzis and A. Sheikholeslami, "Pipelined Match-Lines and Hierarchical Search-Lines for Low-Power Content-Addressable Memories," in Proc. IEEE Custom Integrated Circuits Conf. (CICC), 2003, pp. 383-386.
[12] H. Noda, K. Inoue, M. Kuroiwa, A. Amo, A. Hachisuka, H. J. Mattausch, T. Koide, S. Soeda, K. Dosaka, and K. Arimoto, "A 143 MHz 1.1 W 4.5 Mb Dynamic TCAM with Hierarchical Searching and Shift Redundancy Architecture," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2004, pp. 208-209.
