

# A Comparative Performance Analysis of Low Power Bypassing Array Multipliers

Nirlakalla Ravi, S. Venkateswarlu, T. Jayachandra Prasad

RGM College of Engineering & Technology, (Autonomous), JNT University-ATP, Nandyal, Andhra Pradesh-518501, India

E-mail: ravi2728@gmail.com

Thota Subba Rao

Department of Physics, S. K. University-ATP, Nandyal, Andhra Pradesh-515003, India *E-mail: thotasubbarao6@gmail.com* 

*Abstract*— Low power design of VLSI circuits has been identified as a vital technology for battery powered portable electronic devices and signal processing applications such as Digital Signal Processors (DSPs). The multiplier plays an important role in DSPs, so low power parallel multipliers need to be designed without degrading the performance of the processor. Bypassing is a widely used technique in DSPs when an input operand bit of the multiplier is zero. A row based bypassing multiplier with a compressor at the final addition of the ripple carry adder (RCA) is designed to target low power and high speed. The proposed bypassing multiplier with compressor shows higher performance and energy efficiency than the Kuo multiplier with a Carry Save Adder (CSA) at the final RCA.

*Index Terms*— Bypassing, Low Power, Speed, CSA, RCA, Compressor

### I. Introduction

In modern VLSI systems, power is the most important parameter to optimize for low power applications such as Digital Signal Processors (DSPs) and portable devices. DSP is one of the core technologies for multimedia and mobile applications, and most DSP applications entail addition and multiplication operations. In particular, the multiplier is the critical arithmetic unit for many DSP applications, such as filtering, convolution, and the Fast Fourier Transform (FFT). Analysis of conventional DSP applications shows that the average proportion of zero inputs in the multiplier operands is 73.8 percent. An important low power design approach is to shut down part of a circuit while it is not in operation. In DSPs, this power reduction can be achieved in multipliers using the bypassing technique. The primary power reductions are obtained by turning off MOS components through multiplexers when the operand bits of the multiplier are zero [1][2].

The major source of power dissipation in CMOS circuits is dynamic power dissipation, which appears only when a CMOS gate switches from one stable state to another. In this paper we present a technique to minimize power dissipation in digital multipliers. Of the contributors to dynamic power, we concentrate on switching activity. Many techniques have been proposed to reduce the switching activity of logic circuits <sup>[3][4][5][6][7][8]</sup>. Bypassing is an extensively used technique for reducing dynamic power, which is the major part of the total power consumption. Multiplication is costly in both computation time and circuit complexity. Many other complex arithmetic operations, such as exponentiation, division, and multiplicative inversion, can be performed by applying multiplication repeatedly. Hence, it is important in a practical sense to develop fast multiplication algorithms for these complex arithmetic operations. Besides reducing power consumption, enhancing the speed of the bypassing multiplier (BM) has also been reported [2]. Our contribution goes a step further and improves the performance of the BM without increasing the power consumption.

The remainder of the paper is organized as follows. Section 2 discusses the bypassing technique. Section 3 discusses the row based bypassing array multiplier. The proposed row bypassing multiplier with compressor is discussed in Section 4. The results and discussions are given in Section 5. Finally, the conclusion is given in Section 6.

### 1.1 Sources of Power Consumption in CMOS

Power consumption in CMOS circuits can be reduced by using smaller multiplier designs such as unsigned row and column bypassing multipliers. The total power consumption of a CMOS circuit is given by

$$P_{total} = P_{dynamic} + P_{short \ circuit} + P_{leakage}$$

$$P_{total} = \alpha f_{clk} C_L V_{DD}^2 + I_{SC} V_{DD} + I_{leakage} V_{DD} \tag{1}$$

Dynamic power ( $P_{dynamic}$ ) dissipation is the result of charging the load capacitances in a circuit, where  $\alpha$  is the switching activity,  $f_{clk}$  is the clock frequency,  $C_L$  is the output capacitance, and  $V_{DD}$  is the supply voltage.  $P_{short \ circuit}$  is the power dissipated through the direct path from  $V_{DD}$  to GND. There are six main sources of leakage current, which contribute to  $P_{leakage}$ , in a CMOS transistor:

- 1. Reverse-biased junction leakage current (I<sub>REV</sub>)
- 2. Subthreshold (weak inversion) leakage (I<sub>SUB</sub>)
- 3. Oxide tunneling current (I<sub>OX</sub>)
- 4. Gate direct-tunneling leakage (I<sub>G</sub>)
- 5. Gate induced drain leakage (I<sub>GIDL</sub>)
- 6. Punchthrough current (I<sub>PT</sub>)

Currents I<sub>2</sub>, I<sub>5</sub>, and I<sub>6</sub> are off-state leakage mechanisms, while I<sub>1</sub> and I<sub>3</sub> occur in both ON and OFF states. I<sub>4</sub> can occur in the off state, but it more typically occurs while the transistor bias states are in transition <sup>[9][10][11][12][13]</sup>.
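As an illustration of Eq. (1), the following minimal Python sketch evaluates the three power terms. The operating point (clock frequency, load capacitance, short-circuit and leakage currents) is hypothetical and is not taken from the measurements in this paper; only the formula itself comes from Eq. (1). It shows that halving the switching activity  $\alpha$  halves the dynamic term, which is the effect bypassing aims for.

```python
def total_power(alpha, f_clk, c_load, v_dd, i_sc, i_leak):
    """Evaluate Eq. (1). Returns (dynamic, short_circuit, leakage, total) in watts."""
    p_dynamic = alpha * f_clk * c_load * v_dd ** 2
    p_short = i_sc * v_dd
    p_leak = i_leak * v_dd
    return p_dynamic, p_short, p_leak, p_dynamic + p_short + p_leak


# Hypothetical operating point, chosen only for illustration.
PARAMS = dict(f_clk=100e6, c_load=50e-12, v_dd=1.8, i_sc=1e-4, i_leak=1e-6)

busy = total_power(alpha=0.5, **PARAMS)    # higher switching activity
quiet = total_power(alpha=0.25, **PARAMS)  # switching activity halved

if __name__ == "__main__":
    for name, result in (("busy", busy), ("quiet", quiet)):
        print(name, ["%.3e W" % term for term in result])
```

Only the first term depends on  $\alpha$ , so reducing switching activity leaves the short-circuit and leakage terms of this model unchanged.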

#### **II.** Bypassing Techniques

The key idea of this design is based on the observation that most modern multipliers produce a large number of signal transitions while adding zero partial products. The design therefore optimizes transition activity in another way, namely hardware bypassing. Because adding zero partial products generates a large number of signal transitions in the adder array without affecting the result, these additions are bypassed by disabling the corresponding adders.

### 2.1 Row Bypassing

For a low power row bypassing multiplier, the addition in the j<sup>th</sup> row can be disabled to reduce power dissipation if the bit  $b_j$  of the multiplier is 0, i.e., if all partial products  $a_i b_j$ ,  $0 \le i \le n-1$ , are zero. As a result, the addition operations in the j<sup>th</sup> row of CSAs in Fig. 1 are bypassed, and the outputs from the  $(j-1)^{th}$  row of CSAs are fed directly to the  $(j+1)^{th}$  row of CSAs without affecting the multiplication result. In the design, each modified FA in the CSA array is equipped with three tri-state buffers and two 2-to-1 multiplexers, as shown in Fig. 2. The tri-state buffers, shown in Fig. 3, decide whether to disable the full adder according to the multiplier bit  $b_j$ , and the two multiplexers, shown in Fig. 4, then select the correct outputs. Extra correcting circuits must be added to correct the multiplication result.

When the corresponding partial product is zero, the row bypassing adder cell (RBAC) disables unnecessary transitions and bypasses the inputs to the outputs. Two multiplexers attached to the outputs of the adder transmit the input carry bit and the input sum bit of the previous addition to the outputs.

The tri-state buffers placed at the inputs of the adder cells disable signal transitions in the adders that are bypassed, and the input carry bit and input sum bit are passed downward.
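The row bypassing behaviour described above can be sketched at word level in Python. This is a simplified behavioural model under stated assumptions, not the circuit of Fig. 1: each row is represented by a carry-save pair of integers, and the tri-state buffers, multiplexers, and correcting circuits are abstracted away. The function name `row_bypass_multiply` and the `active_rows` counter are introduced here only for illustration.

```python
def row_bypass_multiply(a, b, n=8):
    """Word-level carry-save model of an n x n unsigned row bypassing multiplier.

    Returns (product, active_rows). A row is evaluated only when its
    multiplier bit b_j is 1; otherwise the sum and carry words of the
    previous row pass through unchanged, so that row causes no transitions.
    """
    s, c = 0, 0          # carry-save pair; invariant: s + c == sum of rows so far
    active_rows = 0
    for j in range(n):
        if (b >> j) & 1 == 0:
            continue     # bypass: b_j = 0 means every a_i * b_j is zero
        pp = a << j      # partial product row for b_j = 1
        # One row of full adders: bitwise sum and majority (carry) functions.
        s, c = s ^ c ^ pp, ((s & c) | (s & pp) | (c & pp)) << 1
        active_rows += 1
    return s + c, active_rows   # final carry-propagate addition


if __name__ == "__main__":
    print(row_bypass_multiply(0b10110101, 0b00010010))  # (3258, 2)
```

In this model the number of evaluated rows equals the number of 1 bits in the multiplier, which is why operands with many zero bits benefit most from row bypassing.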











Fig. 3: TGCMOS Tristate Buffer



Fig. 4: TGCMOS based 2-1 Mux

### 2.2 Column Bypassing

Instead of bypassing rows of full adders, columns of full adders in the multiplier are bypassed. In this approach, the operations in a column can be disabled if the corresponding bit in the multiplicand is 0. This method has two advantages. First, it eliminates the extra correcting circuit. Second, the modified FA shown in Fig. 6 is simpler than the one used in the row bypassing multiplier.

Theorem: When  $a_j = 0$ , the output of a column j adder cell FA<sub>i,j</sub> can be specified as follows:

- 1. The output carry bit is 0.
- 2. The output sum bit is equal to the output sum bit of FA<sub>i-1,j+1</sub>.

Proof: 1. Consider row 0. In row 0 there are only two bits to be added: adder  $FA_{0,j}$  computes  $a_jb_1 + a_{j+1}b_0$ . If  $a_j = 0$ , the output carry bit must be zero, and the output sum bit is equal to  $a_{j+1}b_0$ .

- 2. Assume that the theorem holds for row i.
- 3. In row i+1, the inputs of  $FA_{i+1,j}$  are the carry bit from  $FA_{i,j}$ , the sum bit from  $FA_{i,j+1}$ , and the partial product  $a_jb_{i+2}$ . Since  $a_j = 0$ , the partial product is 0, and by the assumption in step 2 the carry bit from  $FA_{i,j}$  is also 0. Two of the three inputs are therefore 0, so the output carry bit is 0 and the output sum bit is equal to the sum bit sent by  $FA_{i,j+1}$ . Column bypassing for  $a_j = 0$  is shown in Fig. 5.
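The theorem can also be checked by exhaustive simulation. The Python sketch below models the carry-save adder array bit by bit, following the indexing of the proof (row 0 adds  $a_jb_1 + a_{j+1}b_0$ ). The wiring at the left edge of the array, where the sum input is taken to be the partial product  $a_{n-1}b_i$ , and the function names are assumptions of this sketch rather than details given in the paper. `array_product` is included only to confirm that the modelled array really multiplies.

```python
def full_adder(x, y, z):
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def csa_array(a, b, n):
    """Simulate the (n-1) x (n-1) carry-save adder array for unsigned a * b.

    Returns (sums, carries, sum_inputs), each indexed [row i][column j].
    """
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    sums, carries, sum_inputs = [], [], []
    for i in range(n - 1):
        row_s, row_c, row_in = [], [], []
        for j in range(n - 1):
            pp = abit[j] & bbit[i + 1]
            if i == 0:
                sum_in, carry_in = abit[j + 1] & bbit[0], 0
            else:
                # Left edge (assumed wiring): partial product a_{n-1} b_i.
                sum_in = sums[i - 1][j + 1] if j + 1 < n - 1 else abit[n - 1] & bbit[i]
                carry_in = carries[i - 1][j]
            s, c = full_adder(pp, sum_in, carry_in)
            row_s.append(s)
            row_c.append(c)
            row_in.append(sum_in)
        sums.append(row_s)
        carries.append(row_c)
        sum_inputs.append(row_in)
    return sums, carries, sum_inputs


def theorem_holds(a, b, n):
    """Check: a_j = 0 implies carry out is 0 and sum out equals the sum input."""
    sums, carries, sum_inputs = csa_array(a, b, n)
    for j in range(n - 1):
        if (a >> j) & 1 == 0:
            for i in range(n - 1):
                if carries[i][j] != 0 or sums[i][j] != sum_inputs[i][j]:
                    return False
    return True


def array_product(a, b, n):
    """Merge the array outputs with their weights (stands in for the final adder)."""
    sums, carries, _ = csa_array(a, b, n)
    last = n - 2
    total = (a & 1) & (b & 1)
    total += sum(sums[i][0] << (i + 1) for i in range(n - 1))
    total += sum(sums[last][j] << (n - 1 + j) for j in range(1, n - 1))
    total += sum(carries[last][j] << (n + j) for j in range(n - 1))
    total += ((a >> (n - 1)) & (b >> (n - 1)) & 1) << (2 * n - 2)
    return total
```

Running `theorem_holds` over all operand pairs of a small width, for example `n = 4`, exercises every combination of zero multiplicand bits.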





Fig. 6: Column Bypassing Adder Cell

Two tri-state buffers are placed at two inputs of the full adder to disable its operation when  $a_j$  is 0. The tri-state buffers are designed in TG-CMOS. A multiplexer is placed at the sum output of the full adder and selects either the bypassed value or the sum output of the full adder according to the value of  $a_j$ . This bypassing cell needs neither a multiplexer at the carry output nor a tri-state buffer at the carry input of the full adder. A significant portion of the extra hardware is therefore saved without degrading the performance. In addition, power consumption is reduced as a result of the lower hardware activity.

#### III. Bypassing Array Multiplier

In numerous computing and signal processing applications, the parallel multiplier is a building block for many algorithms. The Carry-Save Array (CSA) multiplier is a straightforward implementation of vector multiplication. It consists of a partial product reduction tree, which calculates the partial products in carry-save redundant form, and a final chain adder, which transforms the redundant form into normal binary form. The Carry-Save array multiplier works as follows. The operands  $X = (x_{n-1}, \dots, x_0)$  and  $Y = (y_{n-1}, \dots, y_0)$  are fed into an array of FA cells. Each FA cell forms the bit product  $x_i y_j$  using an AND gate and then adds the result to the incoming sum and carry bits to produce an output sum and an output carry. All FA cells are appropriately connected (sums and carries) to perform the multiplication. The final adder merges the sums and carries from the last row of the array, since in every row the carry bits are not added immediately but are propagated to the row below. The column bypassing multiplier using CSA is shown in Fig. 8, with the Column Bypassing Adder Cell shown in Fig. 6.

Fig. 5: An  $8 \times 8$  multiplication with bit  $a_i = 0$ , corresponding to column  $i = 0$

#### IV. Row based Bypassing Multiplier with CSA

The bypassing scheme is used for low power applications of the processor. The method disables the gates if the input operand bit of the multiplier is zero. To improve the speed of the multiplier, the design shown in Fig. 9 was proposed, in which a CSA architecture is used at the final addition of the RCA to shorten the delay of the multiplier <sup>[2]</sup>. For example, an 8x8 multiplication can be divided into two 8x4 bypassing multipliers based on RCA, as shown in Fig. 7.



Fig. 7: 8x4 Partial product reduction into two parts



Fig. 8: Schematic design of Column Bypassing Array Multiplier



Fig. 9: Row based Bypassing multiplier with CSA at final addition [2]



Fig. 10: Row based Bypassing Array Multiplier with Compressor at final addition

The partial sums and carry outputs from these two 8x4 multipliers can be computed simultaneously. Note that the final stage adders consist of RCA adders on both sides and CSA adders in the middle. This configuration exploits parallelism in the existing multiplier, and it shortens the delay of the RCA multiplier.
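The arithmetic behind the 8x4 split can be sketched in a few lines of Python. This is a functional illustration only: the final stage of RCA and CSA adders is replaced by ordinary integer addition, and the helper names `partial_multiply` and `split_multiply` are hypothetical. The two halves depend on disjoint multiplier bits, which is what allows them to be evaluated in parallel in hardware.

```python
def partial_multiply(a, b_part, width=4):
    """Shift-and-add product of a with a width-bit slice of the multiplier.

    Rows whose multiplier bit is 0 are skipped, as in a bypassing multiplier.
    """
    acc = 0
    for j in range(width):
        if (b_part >> j) & 1:
            acc += a << j
    return acc


def split_multiply(a, b):
    """8 x 8 unsigned product built from two independent 8 x 4 products."""
    low = partial_multiply(a, b & 0xF)          # multiplier bits b3..b0
    high = partial_multiply(a, (b >> 4) & 0xF)  # multiplier bits b7..b4
    return low + (high << 4)                    # stands in for the final adder stage
```

The result equals `a * b` for all 8-bit operands because `b` is the low nibble plus 16 times the high nibble.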

### 4.1 Proposed Row based Bypassing Multiplier with Compressor

In this paper, the proposed multiplier shown in Fig. 10 adopts a parallel architecture to shorten the delay further than that of the multiplier shown in Fig. 9. The proposed multiplier places a compressor at the middle of the RCA to further increase the speed of the multiplier. For example, for an 8x8 multiplication, the two 8x4 partial product blocks with the bypassing method based on RCA with the compressor design are shown in Fig. 10. The partial sums and carry outputs from these two 8x4 multipliers can be computed simultaneously. Note that the final stage adders consist of RCA adders on both sides and compressors in the middle. With this configuration, the delay of the proposed multiplier is reduced through parallelism at the cost of some extra hardware.

Minimizing the number of resources required within a processor has a positive impact on its power performance. Furthermore, since the adder is one of the basic arithmetic units, any improvement in the performance of an adder has a major impact on the performance of a processor. Multi-operand adder structures are frequently used for the summation of partial products in multiplication, as in the Wallace and Dadda tree multipliers. They are also used to implement constant multiplications as shifts and additions <sup>[14][15][16]</sup>. The 14-T full adder used to design the compressor is shown in Fig. 11. Its performance compared with other full adders <sup>[17][18]</sup> is shown in Fig. 12.



Fig. 11: Schematic diagram of 14-T full adder



Fig. 12: EDP of the different full adders for 180nm technology

Compressors perform addition on several bits at a time.

Compressor logic is based on viewing the full adder as a counter: a single-bit full adder counts the number of 1s among its input bits. A compressor can accordingly be defined as a single-bit adder circuit that has four, five, six, or seven inputs and three outputs.

The Wallace tree architecture supports fully parallel partial product reduction. The classic 3-input Wallace tree element is a carry-save adder that accepts 3-bit wide operands and exports a 2-bit wide result; i.e., the 3-2 compressor takes three inputs of the same weight and produces two outputs, a sum of weight 1 and a carry of weight 2. Given the nature of the 3-2 compressor, it is impossible to build a completely regular tree architecture.
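The counting view of compressors can be made concrete with a short Python sketch. The 3-2 compressor below is an ordinary full adder. The 4-2 compressor built from two cascaded full adders is one common textbook construction and is shown only as an example; the paper does not state which compressor structure is used in Fig. 10, so the function `compressor_4_2` should be read as an assumption.

```python
def compressor_3_2(x1, x2, x3):
    """Full adder viewed as a counter of 1s: returns (sum, carry).

    sum has weight 1 and carry has weight 2, so x1 + x2 + x3 == sum + 2 * carry.
    """
    s = x1 ^ x2 ^ x3
    c = (x1 & x2) | (x1 & x3) | (x2 & x3)
    return s, c


def compressor_4_2(x1, x2, x3, x4, c_in):
    """4-2 compressor from two cascaded full adders (assumed structure).

    Five inputs of weight 1 in, three outputs out: sum (weight 1),
    carry and c_out (both weight 2).
    """
    s1, c_out = compressor_3_2(x1, x2, x3)
    s, carry = compressor_3_2(s1, x4, c_in)
    return s, carry, c_out
```

In this construction `c_out` depends only on `x1`, `x2`, and `x3`, not on `c_in`, so a row of such cells does not form a carry ripple chain.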



Fig. 13: RBM Comp design implementation

#### V. Simulation Results and Discussions

The design implementation using the Tanner EDA tool is shown in Fig. 13. The performance of all the bypassing multipliers is evaluated with Synopsys HSPICE for 180nm technology at a supply voltage of 1.8V. Table 1 shows the performance of the bypassing multipliers in terms of power, delay, energy delay product (EDP), and number of MOS components.

| Multiplier | Power (mW) | Delay (ns) | EDP (Js) | Transistors |
|------------|------------|------------|----------|-------------|
| CBM        | 11.91      | 0.95       | 1.07E-20 | 1904        |
| RBM        | 14.24      | 1.45       | 2.99E-20 | 2336        |
| RBMCSA [2] | 13.30      | 79.56      | 8.42E-17 | 2408        |
| RBM Comp   | 13.36      | 1.45       | 2.81E-20 | 2422        |

Table 1: Performance Comparison of the multipliers
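The EDP column of Table 1 can be cross-checked against the power and delay columns. The paper does not state the formula it uses; the reported values appear consistent with power multiplied by the square of the delay, which gives units of J·s, and the Python sketch below assumes that definition. The dictionary simply restates the table entries.

```python
# (power in mW, delay in ns, reported EDP in J*s, transistor count) from Table 1.
TABLE_1 = {
    "CBM": (11.91, 0.95, 1.07e-20, 1904),
    "RBM": (14.24, 1.45, 2.99e-20, 2336),
    "RBMCSA": (13.30, 79.56, 8.42e-17, 2408),
    "RBM Comp": (13.36, 1.45, 2.81e-20, 2422),
}


def edp(power_mw, delay_ns):
    """Energy delay product in J*s, assuming EDP = power * delay**2."""
    return (power_mw * 1e-3) * (delay_ns * 1e-9) ** 2


def relative_error(name):
    """Relative difference between the recomputed and the reported EDP."""
    power_mw, delay_ns, reported, _ = TABLE_1[name]
    return abs(edp(power_mw, delay_ns) - reported) / reported


if __name__ == "__main__":
    for name in TABLE_1:
        print(f"{name}: recomputed EDP differs by {relative_error(name):.2%}")
    extra = TABLE_1["RBM Comp"][3] - TABLE_1["RBMCSA"][3]
    print("extra transistors, RBM Comp vs RBMCSA:", extra)
```

Under this assumption every recomputed value agrees with the table to within rounding, and the transistor difference between RBM Comp and RBMCSA equals the 14 transistors of one 14-T full adder.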

The proposed RBM using compressors consumes slightly more power (13.36 mW) than the design of <sup>[2]</sup> (13.30 mW) because of the one extra full adder, as shown in Fig. 14. Because of its extra hardware, the RBM consumes more power than the CBM.

The RBM Comp design is faster (1.45 ns) than the RBMCSA (79.56 ns), as shown in Fig. 15. The vertical compression performed by the compressors, which operate in parallel, enhances the performance of the RBM Comp. The numbers of MOS components of the implemented designs are also given in Table 1. Although the proposed design requires one additional full adder, it is more energy efficient than RBM and RBMCSA. All multipliers are implemented with 14-T full adders, so the proposed design requires 14 additional transistors.



Fig. 14: Total power consumption of the multipliers



Fig. 15: Performances of the multipliers

#### VI. Conclusion

Low power designs are mandatory nowadays for DSPs and battery powered portable electronic appliances. The arithmetic operations of DSPs must be performed with low power consumption and without loss of performance. The emphasis of this paper is on accelerating the multiplier using the bypassing technique without increasing the power consumption. The column bypassing cell is used within the row bypassing technique to reduce the transistor count and thereby the power consumption. This paper takes a further step to increase the speed of the multiplier: a compressor designed with full adders is placed at the middle of the final addition of the RCA. The RBM Comp multiplier consumes slightly more power in exchange for higher speed, and it saves energy at the cost of the area of one additional full adder.

#### References

- [1] Sunjoo Hong, Taehwan Roh, and Hoi-Jun Yoo, "A 145 μW 8×8 Parallel Multiplier based on Optimized Bypassing Architecture," IEEE International conference, pp: 1175-1178, 2011.
- [2] Ko-Chi Kuo, Chi-Wen Chou, "Low power and high speed multiplier design with row bypassing and parallel architecture," Microelectronics Journal 41, pp: 639–650, 2010.
- [3] Ko-Chi Kuo, Chi-Wen Chou, "Low power Multiplier with bypassing and tree structure," IEEE Conference Proceedings, pp: 602–605, 2006.
- [4] Chua-Chin Wang and Gang-Neng Sung, "A Low power 2-Dimensional Bypassing Multiplier Using 0.35um CMOS Technology," IEEE Proceedings of the Emerging VLSI Technologies and Architectures (ISVLSI'06), 2006.

- [5] Gang-Neng Sung and Chua-Chin Wang, "A power aware 2 dimensional bypassing multiplier using cell-based design flow," IEEE Conference Proceedings, pp: 3338–3341, 2008.
- [6] Jin-Tai Yan and Zhi-Wei Chen, "Low Power Multiplier Design with Row and Column Bypassing," IEEE Conference Proceedings, pp: 227–230, 2009.
- [7] Gang-Neng Sung, Yu-Cheng Lu and Chua-Chin Wang, "A power aware signed 2-dimensional bypassing multiplier for video/image processing," IEEE Conference Proceedings, 2010.
- [8] Dimitris Bekiaris, George Economakos and Kiamal Pekmestzi, "A mixed style multiplier architecture for low dynamic and leakage power dissipation," IEEE Conference Proceedings, pp: 258-261, 2010.
- [9] Anantha P. Chandrakasan and Robert W. Brodersen, "Minimizing Power Consumption in CMOS Circuits," pp: 1-64, 1995.
- [10] Farzan Fallah and Massoud Pedram, "Standby and Active Leakage Current Control and Minimization in CMOS VLSI Circuits," pp: 1-21.
- [11] Victor Adler and Eby G. Friedman, "Delay and Power Expressions for a CMOS Inverter Driving a Resistive-Capacitive Load," Analog Integrated Circuits and Signal Processing, 14, 29–39, 1997.
- [12] Y. Taur, "CMOS design near the limit of scaling," IBM J. Res. & Dev. Vol. 46 no. 2/3 March/May 2002.
- [13] Alvin Joseph J. Tang and Joy Alinda Reyes, "Comparative Analysis of Low Power Multiplier Architectures," IEEE Fifth Asia Modelling Symposium, pp: 270-274, 2011.
- [14] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Electronic Computers, EC-13, pp. 14-17, 1964.
- [15] L. Dadda, "Some Schemes for Parallel Multipliers," Alta Freq., 34:349-356, 1965.
- [16] Shubjit Roy Chowdhury, Aritha Banerjee, Aniruddha Roy, Hiranmay Saha, "Design, Simulation and Testing of a High Speed Multiplication Applications," IEEE Computer Society, pp. 434 - 438, 2008.
- [17] C.-H. Chang, J. Gu, and M. Zhang, "A review of 0.18-μm full adder performances for tree structured arithmetic circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 6, pp. 686–695, 2005.
- [18] Jin-Fa Lin, Yin-Tsung Hwang, Ming-Hwa Sheu, and Cheng-Che Ho, "A Novel High-Speed and Energy Efficient 10-Transistor Full Adder Design," IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 54, no. 5, pp. 1050-1059, 2007.

#### **Authors' Profiles**



**N. Ravi:** Received the B.Sc degree in electronics from Osmania Degree College-Kurnool in 1998. He obtained a Masters Degree in Physics from S.K. University, Anantapur, AP - India in 2001. He received a Ph.D in Low Power VLSI Design from Sri Venkateswara University, Tirupati in 2013. He is working at Rajeev Gandhi Memorial College of Engg & Technology, Nandyal, as an Associate Professor & HOD in the department of Physics. His research interests include low power and high performance VLSI designs, FPGA, Solid State Physics and Nanotechnology. He has published many research papers in various international journals.



**S. Venkateswarlu:** Professor in the department of Mathematics at Rajeev Gandhi Memorial College of Engg & Technology, Nandyal, Andhra Pradesh-India. He received a Ph.D degree from S. K University, Anantapur. His areas of interest include FEM, FDM, fuzzy logic, and computational methods using Matlab.



**T. Jayachandra Prasad:** Principal and Professor of ECE at Rajeev Gandhi Memorial College of Engg & Technology, Nandyal, Andhra Pradesh-India. He received a Ph.D from JNTU Anantapur. Under his leadership the institution is moving toward Deemed status. His areas of interest include DSP, signals and systems, VLSI, and image processing. He has guided many research students in these areas.



**T. Subba Rao:** Professor in the department of Physics, Sri Krishnadevaraya University, Anantapur, Andhra Pradesh-India. He obtained a Ph.D from IIT-Kharagpur. His areas of research include thin films, ceramics, nano materials, and nano electronics. He has carried out UGC and IUAC projects and has guided students in thin film technology, polymeric materials, ceramics, and electronics.

How to cite this paper: Nirlakalla Ravi, S. Venkateswarlu, T. Jayachandra Prasad, Thota Subba Rao, "A Comparative Performance Analysis of Low Power Bypassing Array Multipliers", International Journal of Information Technology and Computer Science (IJITCS), vol.5, no.8, pp.38-45, 2013. DOI: 10.5815/ijitcs.2013.08.04