Design and Analysis of a High Speed Carry Select Adder

Abstract: An optimal high-speed and low-power VLSI architecture requires an efficient arithmetic processing unit that is optimized for speed and power consumption. Adders are among the most widely used components in digital integrated circuit and system design. A high-speed adder is a necessary component in a data path, e.g. in microprocessors and digital signal processors. The present paper proposes a novel high-speed adder that combines the advantages of the Carry Look Ahead Adder (CLAA) and the Carry Select Adder (CSA), yielding a hybrid CSA. In the proposed adder, the CSA uses CLAA technology to generate the carry bits for each sum bit, which are then used to select the respective multiplexer (MUX) that adds the carry bit to the sum accordingly. The proposed adder has been synthesized with a bulk 40 nm standard CMOS library on Synopsys Design Compiler. Analysis indicates the superiority of the proposed adder over CLAA and CSA. Compared to CSA and CLAA, the proposed Carry Select Ahead Adder (CSAA) provides a shorter average path and simpler hardware. This has led to faster processing speed by increasing the complexity of the circuit on the chip. The proposed adder finds application in the Arithmetic and Logic Units (ALU) of CPUs, where faster arithmetic results are needed.


I. INTRODUCTION
Recent rapid advances in multimedia, communication systems and real-time signal processing have increased the need for faster processing systems that can perform complex calculations quickly. The performance of large digital circuits depends on the speed of the circuits that form their functional units.
One of the most essential of these functional units is the ALU, which is a major component of the central processing unit (CPU) of a computer system. It performs all the arithmetic and logic operations that need to be done on instruction words. As the complexity of operations increases, the ALU also becomes more expensive, taking up more space in the CPU and dissipating more heat. This has motivated extensive effort to increase computation capability while reducing the complexity of the ALU, so that the CPU remains powerful and fast enough.
Designing high-speed and low-power VLSI architectures requires efficient arithmetic processing units that are optimized for performance parameters such as speed and power consumption. Adders are the main component in general-purpose microprocessors and digital signal processors. For adding two binary numbers there exist several adder structures based on very different design ideas. Thus, anyone who needs to implement an addition circuit must decide which circuit is most appropriate for the planned application.
Adders find application in many other functions such as subtraction, multiplication and division. Since addition underlies subtraction, multiplication, and signal and image processing, an efficient adder circuit is necessary to realize an efficient system design. High-speed, low-power and area-efficient addition/multiplication has always been a fundamental requirement of high-performance processors and systems. The two basic adders, the Half Adder (HA) and the Full Adder (FA), add two and three bits respectively. The HA comprises an AND gate and an Exclusive-OR gate. The three inputs of the FA are the two operand bits plus an extra bit for an incoming carry, which is what allows adders to be cascaded into N-bit adders. A full adder is made up of two HAs and an OR gate. Since two-input or three-input adders are not sufficient for current applications, many multi-bit adders have been proposed and used according to the application requirements.
For the addition of multiple bits, the carry needs to be propagated from one bit position to the next. The time required to propagate a carry through the adder determines the speed of addition [1]. Many different approaches have already been suggested to improve the performance of the adder. The easiest type of parallel adder to build is a ripple carry adder, which uses a chain of one-bit full adders to generate its output. The Ripple Carry Adder (RCA) gives the most compact design and occupies a small area, but takes a longer computation time. The delay of an RCA is linearly proportional to the number of input bits. For some input signals, the carry has to ripple all the way from the least significant bit (LSB) to the most significant bit (MSB). The propagation delay of such a circuit is defined as the worst-case delay over all possible input patterns, also called the critical path delay.
Time-critical applications use the Carry Look-ahead scheme (CLAA) to obtain fast results. In a CLAA, the sum and carry of each bit do not wait on the carry outputs of the previous bits. This eliminates the ripple effect, making it faster than an RCA, but it also leads to an increase in area. A CLAA is fast for designs with few input bits; for higher bit counts its delay becomes worse [2]. Several modified and improved versions have been proposed and implemented to further improve the performance of carry look-ahead adders [3]-[7]. In contrast, a Carry Skip Adder (CSKA) uses a carry skip scheme to reduce the additional time taken to propagate the carry signal in an RCA. Thus, a CSKA is faster than an RCA at the expense of a few simple modifications.
Another adder, SCBCLA [8], which is a self-timed implementation of CLAA adders based on the 'section-carry', has also been implemented via a semicustom ASIC-based design flow targeting a 130 nm bulk CMOS standard cell library [9]. Unlike a conventional CLA (CCLA), SCBCLA need not compute bitwise look-ahead carry signals for each and every adder stage. Instead, SCBCLA produces the look-ahead carry signal corresponding to a 'section' or 'group' of adder inputs. An SCBCLA adder contains three constituent blocks: the SCBCLA unit, the full adder, and the sum logic. Two different topologies of SCBCLA have been suggested based on the utilization of modules, either solely or in conjunction with a ripple carry adder (RCA) section in the least significant adder stage [9].
To mitigate the linear dependency of the carry propagation delay, a Carry Select Adder (CSA) has been proposed that introduces parallelism by anticipating both possible values of the carry input, i.e. 1 or 0, and evaluating the result in advance. The final sum and carry are chosen using multiplexers once the real value of the carry is known. The carry-out bit of the preceding block of the adder acts as the select signal to the multiplexer. Though the use of two adders and final selection multiplexers consumes more area, the reduction in carry propagation delay is substantial. Thus the CSA proves to be a compromise between the RCA (small area but longer delay) and the CLAA (larger area with shorter delay) [10]. This has led to research in the field of CSA, and varied versions of CSA have been proposed in the literature [11]-[15].
In contrast to CSA, a Carry Select Adder with Sharing (CSAS) has been proposed that uses a fast incrementer circuit instead of adders to increment the interim sum when the input carry is obtained as logic 1 [16]. This performs each of the two additions in half of the clock cycle by using a few latches. Iterative use of this concept can lead to an efficient trade-off of area for delay. More specifically, the delay of that adder is O(2n) while its area is O((1+α)n), where α<1.
To reduce area and power consumption with a small speed penalty, a Modified Carry Select Adder (MCSA) design has also been proposed that uses a single RCA and a Binary to Excess-1 Converter (BEC) instead of dual RCAs [17]. Since a BEC requires fewer logic gates than an RCA, the MCSA achieves a reduction in both area and total power consumption.
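As a rough illustration of the BEC idea, the following Python sketch models one 4-bit MCSA block at the behavioural level. The names `bec` and `mcsa_block` are hypothetical, the single RCA is stood in for by integer addition, and the gate-level structure used in [17] may differ from this sketch.

```python
def bec(value, width):
    """Binary to Excess-1 conversion of a `width`-bit value.

    Output bit i is input bit i XORed with the AND of all lower input bits,
    which is the usual incrementer logic.
    """
    out, lower_all_ones = 0, 1
    for i in range(width):
        bit = (value >> i) & 1
        out |= (bit ^ lower_all_ones) << i
        lower_all_ones &= bit
    return out


def mcsa_block(x, y, c_in):
    """One 4-bit block: a single adder for carry-in 0, a BEC for carry-in 1."""
    word0 = x + y            # stand-in for the single 4-bit RCA (sum + carry, 5 bits)
    word1 = bec(word0, 5)    # 5-bit BEC yields the carry-in = 1 result
    chosen = word1 if c_in else word0   # multiplexer selected by the real carry
    return chosen & 0xF, chosen >> 4    # (4-bit sum, carry out)


# Exhaustive check of the block against ordinary addition
for x in range(16):
    for y in range(16):
        for c in (0, 1):
            s, c_out = mcsa_block(x, y, c)
            assert s + 16 * c_out == x + y + c
```

The point of the sketch is that the carry-in = 1 result is derived from the carry-in = 0 result by an increment, so the second adder of a conventional CSA block is not needed.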
The present paper proposes a novel parallel prefix adder that takes advantage of both the carry-generating circuit of the CLAA and the multiplexing of the CSA. The proposed CSAA uses MUXs that preselect the next four sum bits and their corresponding carry output. The proposed CSAA decreases the time delay without the use of parallel multiplier circuits. The reduction in time delay has been achieved mainly by adding four bits simultaneously instead of bit by bit. Analysis indicates that the performance of the proposed CSAA improves as the number of bits increases. Further, the performance improvement has been achieved without making any changes to the structure of conventional logic gates. The next section gives a brief overview of the three basic adders. Section III presents the proposed adder and its result analysis. Finally, Section IV concludes the proposed work.

II. PRELIMINARIES
In the simplest case, a Half Adder (HA), the output 'Sum' is simply the XOR of the inputs and the output 'Carry' (Cout) is the AND of the inputs. The more complex Full Adder (FA) also considers a carry bit (Cin) from the previous stage, such as an ALU, LU, etc., along with the two inputs. An FA is made up of two HAs. The Sum output of the first HA becomes one input of the second HA, which receives Cin from the previous stage as its second input. The resultant Sum is the Sum output of the second HA, and the resultant Cout is obtained by ORing the Carry outputs of the two HAs.
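The construction just described can be sketched bit by bit in Python. This is only an illustrative model; the function names `half_adder` and `full_adder` are assumptions introduced here, not identifiers from the paper.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits: XOR and AND."""
    return a ^ b, a & b


def full_adder(a, b, c_in):
    """Full adder built from two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)        # first HA adds the two operand bits
    s2, c2 = half_adder(s1, c_in)    # second HA adds the incoming carry
    return s2, c1 | c2               # OR of both HA carries gives Cout


# Exhaustive check against ordinary integer addition
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(a, b, c)
            assert 2 * c_out + s == a + b + c
```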
Fig. 1. 4-bit Ripple Carry Adder [19]
It is possible to create a logical circuit using multiple FAs to add N-bit numbers. Each full adder has a Cin input, which is the Cout of the previous adder. This kind of adder is called a ripple-carry adder (RCA), since each carry bit "ripples" or carries over to the next full adder. The layout of an RCA is simple, which allows for fast design time; however, the RCA is relatively slow, because each full adder must wait for the carry bit to 'ripple' from the previous full adder [18,19].
Fig. 2. 4-bit Carry Look-Ahead Adder [19]
To reduce the computation time, engineers devised a faster way to add two binary numbers: the Carry Look Ahead Adder (CLAA). A CLAA improves speed by reducing the amount of time required to determine all the carry bits. It can be contrasted with the simpler but usually slower RCA, in which the carry bit is calculated alongside the sum bit, and each bit must wait until the previous carry has been calculated before it can begin calculating its own result and carry bits. The CLAA calculates one or more carry bits before the sum itself, which reduces the wait time to calculate the result of the higher-order bits. CLAA is based on two principles:
1. Calculating, for each position of the digit, whether that position is going to propagate a carry if one comes in from the right.
2. Combining these calculated values to be able to deduce quickly whether, for each group of digits, that group is going to 'ripple' a carry that comes in from the right.
Suppose that groups of 4 digits are chosen. Then the sequence of events goes something like this:
1. All 1-bit adders calculate their results. Simultaneously, the look-ahead units perform their calculations.
2. Suppose that a carry arises in a particular group. That carry will emerge at the left-hand end of the group and start propagating through the group to its left.
3. If that carry is going to propagate all the way through the next group, the look-ahead unit will already have deduced this. Accordingly, before the carry emerges from the next group, the look-ahead unit is immediately able to tell the next group to the left that it is going to receive a carry and, at the same time, to tell the next look-ahead unit to the left that a carry is on its way.
The net effect is that the carries start by propagating slowly through each 4-bit group, just as in an RCA system, but then move 4 times as fast, leaping from one look-ahead carry unit to the next. Finally, within each group that receives a carry, the carry propagates slowly through the digits in that group [18,19].
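The look-ahead behaviour of one 4-bit group can be sketched as below. This is a behavioural illustration only, assuming the standard per-bit signals: bit i generates a carry when both operand bits are 1 (g = a AND b) and propagates an incoming carry when exactly one of them is 1 (p = a XOR b). The function name `cla4` and its return layout are assumptions made for this sketch.

```python
def cla4(a, b, c_in):
    """Add two 4-bit integers using look-ahead carries.

    Returns (sum, c_out, group_generate, group_propagate).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]     # bit i generates a carry
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # bit i propagates a carry

    # Every carry is expressed directly in terms of g, p and c_in,
    # so no carry has to wait for the one before it.
    c = [c_in, 0, 0, 0, 0]
    c[1] = g[0] | (p[0] & c[0])
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0])
    c[3] = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & c[0]))

    # Group signals: what a look-ahead unit passes on to the next group
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    c[4] = group_g | (group_p & c[0])

    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4], group_g, group_p


# Exhaustive check against ordinary integer addition
for a in range(16):
    for b in range(16):
        for c_in in (0, 1):
            s, c_out, _, _ = cla4(a, b, c_in)
            assert s + 16 * c_out == a + b + c_in
```

The group propagate signal is 1 only when every bit in the group propagates, which is exactly the condition under which an incoming carry passes straight through the group.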
To determine whether a bit pair will generate a carry, the following logic works: $G_i = A_i \cdot B_i$.
To determine whether a bit pair will propagate a carry, either of the following logic statements works: $P_i = A_i \oplus B_i$ or $P_i = A_i + B_i$.
A Carry Select Adder (CSA) is a particular way to implement an adder. A CSA anticipates both possible values of the input carry, i.e. 0 and 1, and evaluates the result in advance. Once the actual value of the carry is known, the result can be selected using the multiplexer stage. The conventional CSA therefore uses dual RCAs to generate the partial sum and carry for Cin = 0 and Cin = 1, and the final sum and carry are then selected by multiplexers. The carry-select adder generally consists of two RCAs and a MUX, and is thus area-consuming due to the use of dual RCAs. Adding two n-bit numbers with a CSA is done by performing the calculation twice, once assuming the carry to be '0' and once assuming it to be '1'.
After the two results are calculated, the correct Sum, as well as the correct Cout, is selected with the multiplexer once the correct carry is known. The basic building block of a 4-bit CSA consists of two 4-bit RCAs that are multiplexed together, where the resulting Cout and Sum bits are selected by Cin. Since one RCA assumes a Cin of 0 and the other assumes a Cin of 1, selecting the adder with the correct assumption via the actual carry-in yields the desired result. A 16-bit CSA with a uniform block size of 4 can be created with 3 of these blocks and a 4-bit RCA. Since Cin is known at the beginning of the computation, a carry select block is not needed for the first four bits [18,19]. The next section explains the structure and working of the proposed CSAA, followed by its simulation and result analysis.
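The conventional 16-bit CSA with a uniform block size of 4 described above can be modelled as follows. This is a minimal behavioural sketch, not a hardware description; the names `ripple_add` and `carry_select_add16` are assumptions introduced for illustration.

```python
def ripple_add(a, b, c_in, width=4):
    """Ripple-carry addition of two `width`-bit integers, one bit at a time."""
    carry, total = c_in, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add16(a, b, c_in=0):
    """16-bit carry-select adder: one plain RCA plus three select blocks."""
    # First four bits: Cin is known, so a single RCA is enough
    total, carry = ripple_add(a & 0xF, b & 0xF, c_in)
    for block in range(1, 4):
        shift = 4 * block
        x, y = (a >> shift) & 0xF, (b >> shift) & 0xF
        s0, c0 = ripple_add(x, y, 0)   # RCA assuming carry-in = 0
        s1, c1 = ripple_add(x, y, 1)   # RCA assuming carry-in = 1
        # Multiplexer: the real carry from the previous block picks the result
        s, carry = (s1, c1) if carry else (s0, c0)
        total |= s << shift
    return total, carry


assert carry_select_add16(0xFFFF, 0x0001) == (0x0000, 1)
assert carry_select_add16(0x1234, 0x0FCD) == (0x2201, 0)
```

In software the two candidate results are computed one after the other; in hardware both RCAs of a block work in parallel, which is where the speed advantage over a single long RCA comes from.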

III. PROPOSED CSAA
In the proposed CSAA, several modifications have been made to decrease the computation time. The present work proposes a novel architecture that exploits the CLAA's ability to generate carry bits before computing the sum, together with the multiplexing ability of the CSA. In a conventional CSA built from RCAs, the multiplexer waits for an RCA to compute the sum and propagate the carry to the next RCA. This renders the higher-order RCAs inactive, consequently increasing the computation time. The CLAA helps to overcome this delay by calculating the carry bits ahead of the sum, so that all adder blocks can function simultaneously and generate the sum.
The proposed adder closely resembles a traditional CSA, but uses CLAAs in place of RCAs. Figure 4 depicts the structure of the proposed CSAA (16-bit in this example), where the operations are divided by taking 4 bits at a time. A 16-bit addition using the proposed adder requires eight CLAAs and four CSA units, where each unit contains a 10:5 multiplexer. The initial carry bit of each CLAA is connected either to Vcc (+5V) or to ground, and each output carry is connected to the input of a 10:5 multiplexer, which then selects the required sum.
In the proposed 16-bit CSAA, the operations are divided by taking 4 bits at a time. The first two CLAAs add the first four bits b3-b0 and differ only in their initial carry bit, i.e. 0 or 1; their sum and output carry are both selected by Cin using a 10:5 MUX. The output carry C0 generated in this stage is similarly used to select the output carry and the sum of the addition of the next four bits b7-b4. The same procedure applies to the bit groups b11-b8 and b15-b12, and the output carry of the b15-b12 group is the output carry (Cout) of the whole 16-bit High Speed CSA. The same procedure can be extended to a 32-bit High Speed CSA or to any required number of bits.
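The 16-bit structure described above (eight CLAAs and four 10:5 multiplexers) can be modelled behaviourally as follows. This is a sketch under stated assumptions, not the authors' Verilog: the names `cla4`, `mux_10_to_5` and `csaa16` are hypothetical, each CLAA output is packed as a 5-bit word (carry plus four sum bits), and the carries inside `cla4` are written as a recurrence for brevity where hardware would flatten them into look-ahead expressions.

```python
def cla4(a, b, c_in):
    """4-bit look-ahead addition; returns the 5-bit word (c_out << 4) | sum."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]     # generate signals
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # propagate signals
    c = [c_in]
    for i in range(4):
        # Recurrence form of the look-ahead carry equations
        c.append(g[i] | (p[i] & c[i]))
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return (c[4] << 4) | s


def mux_10_to_5(word0, word1, select):
    """10:5 multiplexer: pick one of two 5-bit words."""
    return word1 if select else word0


def csaa16(a, b, c_in=0):
    """16-bit sketch: per 4-bit group, two CLAAs and one 10:5 multiplexer."""
    carry, total = c_in, 0
    for block in range(4):
        shift = 4 * block
        x, y = (a >> shift) & 0xF, (b >> shift) & 0xF
        word0 = cla4(x, y, 0)   # CLAA with carry input tied to ground
        word1 = cla4(x, y, 1)   # CLAA with carry input tied to Vcc
        chosen = mux_10_to_5(word0, word1, carry)
        total |= (chosen & 0xF) << shift
        carry = chosen >> 4     # selects the next group's multiplexer
    return total, carry


assert csaa16(0xFFFF, 0x0001) == (0x0000, 1)
assert csaa16(0x1234, 0x0FCD) == (0x2201, 0)
```

Because every pair of CLAAs depends only on the operand bits, all eight can be evaluated at once; only the chain of multiplexer selections depends on the carry from the previous group.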
A digital schematic for the proposed CSAA has been created using Verilog as the primary language on Xilinx Vivado Design Suite 2014.4. This is shown in Figure 7. The next section presents the simulation and result analysis of the proposed CSAA.

A. Result Analysis
A hardware/design schematic has been created for the 16-bit and 32-bit variants of the proposed adder. The hardware, designed in Verilog, has been synthesized using Synopsys Design Compiler. The technology used for the hardware was standard 40 nm bulk CMOS technology. The results for both versions of the adder, 16-bit and 32-bit, have been compared taking the CSA as the reference. The following parameters of the circuit are taken into consideration and presented in Table 1:
A) Area, in µm² ((10⁻⁶ m)²)
B) Power, in mW (10⁻³ W)
C) Timing, in ns (10⁻⁹ s)
Based on the three main performance parameters, viz. power, timing and area, the proposed adder has been compared with CSA, RCA and CLAA. The comparison is given quantitatively in Table 1 and depicted graphically in Figure 5 and Figure 6, for both the 16-bit and 32-bit variants. The result analysis clearly indicates the faster computation time achieved by the proposed adder compared to its conventional counterparts, CSA, CLAA and RCA. As is evident from Table 1, the High Speed CSA shows a 51.1% and a 59.4% increase in speed in the 16-bit and 32-bit variants with respect to the 16-bit and 32-bit variants of CSA respectively. This can be attributed to the advanced carry selection and multiplexing.
More importantly, it is worth noting that on doubling the number of bits from 16 to 32, the speed gain over CSA rises by 8.3 percentage points (from 51.1% to 59.4%). This reflects better speed performance of the proposed CSAA as the number of bits increases. Though, with a higher number of bits, the percentage increase in the area of the proposed CSAA is the same as that of the conventional CSA, the power reduces by 5.14%. This indicates improved power and area performance of the proposed CSAA compared to the conventional CSA for higher numbers of bits.

TABLE 1: COMPARISON OF ADDERS FOR AREA, POWER AND TIMING