High-speed radix-10 multiplication using partial shifter adder tree-based convertor

Radix-10 multiplication is one of the most frequent operations employed by monetary, business, and user-oriented applications. The decimal multipliers used in state-of-the-art digital systems perform well, but they can still be improved in time delay and area. This work proposes a new, more area- and delay-optimized design of an overloaded decimal digit set (ODDS) architecture-based radix-10 multiplier for signed numbers. Radix-10 multiplication employs three major modules: binary coded decimal (BCD) to binary conversion, binary multiplication, and finally binary to BCD conversion. This paper presents a new technique for the BCD-to-binary and binary-to-BCD convertors in radix-10 multiplication. A novel addition tree structure, the partial shifter adder (PSA) tree, has been developed for BCD-to-binary conversion and is used to add the partially generated products. Because speed is our major concern, high-speed multiplication is required; hence the proposed PSA-based radix-10 multiplier uses vertical cross binary multiplication and a concurrent shifter-based addition method. The design has been tested on 45 nm technology-based Zynq-7 field programmable gate array (FPGA) devices with 6-input lookup tables (LUTs). A combinational implementation maps well into the slice structure of the Xilinx Zynq-7 family of FPGAs. The synthesis results for a Zynq-7 device indicate that our design outperforms related designs in terms of area and time delay.


INTRODUCTION
The binary-coded decimal (BCD) format defines the encoding of decimal numbers in binary. Packed BCD uses four binary bits to represent each decimal digit. IBM introduced a code capable of representing alphanumeric information on the IBM card; it was the first BCD code and was later adopted by many other manufacturers [1]. All IBM desktops and workstations include BCD arithmetic in their hardware [1]. Many microprocessors support decimal numbers. In the 80x86 microprocessor family, the DAA (decimal adjust after addition) instruction is used to convert a binary result into binary coded decimal (BCD) [2]. The Zilog Z80 series of microprocessors also includes additional hardware to support Intel-compatible decimal arithmetic instructions and BCD conversion [2]. Two binary encodings for decimal arithmetic were included in the 2008 IEEE 754 standard (developed as IEEE 754r) [3], which supports 16- and 34-digit decimal significands. Fujitsu Sparc X microprocessors [4], IBM z Systems, and IBM microprocessor-based workstations [3] also use IEEE 754-2008 standard decimal multipliers. The original Motorola 6800 processor was also capable of performing BCD arithmetic; Motorola later designed more advanced processors and replaced the DA instruction with ColdFire instructions [5]. Researchers have designed different types of BCD (radix-10) multipliers to enhance BCD multiplication performance. Signed-digit radix-10 BCD multiplication using a multi-operand adder structure [6] offers improved performance compared with a conventional BCD multiplier. A BCD digit multiplier (BDM) built on an available binary-to-BCD converter is presented in [7]; Gorgin's multiplier design [7] is based on a truth table for converting a binary number into BCD digits and yields a new combinational architecture for FPGA implementation.
TELKOMNIKA Telecommun Comput El Control  High-speed radix-10 multiplication using partial shifter adder tree-based convertor (Utsav Kumar Malviya) 557
Vazquez [8] designed a high-speed decimal multiplier using a three-stage parallel process. A BCD multiplier that recodes the multiplier digits, as in conventional parallel multiplier designs, was presented in 2015 [9].
A parallel decimal multiplier [10,11] uses a partial product reduction (PPR) tree, the overloaded decimal digit set (ODDS) architecture, and BCD-4221/5211 codes. Cui's multiplier [10,11] consists of a binary PPR tree block, a non-fixed-size BCD-4221 counter block, and a BCD-4221/5211 PPR tree block; their work uses a decimal carry-save algorithm based on BCD-4221/5211 in the PPR tree to obtain high-performance radix-10 multipliers. In the ODDS architecture, each 4-bit binary digit represents a radix-10 digit in the overloaded range {0, 1, 2, ..., 15}. ODDS uses carry-free generation of decimal multiples; it multiplies binary numbers and then converts the result back into decimal. In contrast with conventional BCD multiplication, ODDS requires no extra hardware for BCD numbers, only binary-to-BCD and BCD-to-binary convertors [10,11]. With the increasing need for real-time computation, conventional radix-10 multiplication methods are difficult to manage; a speed- and area-optimized radix-10 multiplier, however, can satisfy this demand. Fast radix-10 multiplication is presented in [12][13][14]. Vazquez's multiplier [12][13][14] uses a new algorithm and architecture for parallel BCD multiplication that exploits properties of two redundant BCD codes to speed up computation; it uses the BCD excess-3 (XS-3) code along with the ODDS for signed BCD number multiplication. High-speed Vedic multiplication for high throughput was presented in 2012 [15], and a BCD multiplier design using the Vedic multiplication method was presented in 2013 [16]. Another combinational radix-10 multiplier design, for IEEE Std 754-2008 [18] floating-point numbers and without the ODDS architecture, is presented in [17]. Decimal multiplication of two 4-digit numbers with a new method of partial product generation is presented in [19].
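The overloaded digit range can be illustrated with a small software model. This is a sketch under the assumption that an ODDS number is a list of 4-bit digits (least significant first) whose value is the sum of d_i·10^i; the function names `odds_value` and `odds_to_bcd` are hypothetical and do not describe the circuits of [10,11] or [12][13][14].

```python
# Hypothetical helpers illustrating the overloaded decimal digit set (ODDS):
# each digit is a 4-bit value in 0..15, but the number is still radix-10.

def odds_value(digits):
    """Return the integer value of ODDS digits given least significant first."""
    for d in digits:
        if not 0 <= d <= 15:
            raise ValueError("ODDS digits must fit in 4 bits (0..15)")
    return sum(d * 10 ** i for i, d in enumerate(digits))


def odds_to_bcd(digits):
    """Normalize ODDS digits (LSD first) to BCD digits 0..9 by carry propagation.

    With d <= 15 and an incoming carry of at most 1, each position sums to
    at most 16, so the outgoing carry is never more than 1.
    """
    bcd, carry = [], 0
    for d in digits:
        if not 0 <= d <= 15:
            raise ValueError("ODDS digits must fit in 4 bits (0..15)")
        t = d + carry
        bcd.append(t % 10)
        carry = t // 10
    if carry:
        bcd.append(carry)
    return bcd


# Two different encodings of 162: ODDS digits [12, 15] and BCD digits [2, 6, 1].
print(odds_value([12, 15]), odds_to_bcd([12, 15]))  # prints: 162 [2, 6, 1]
```

The redundancy shown here (several digit strings for one value) is what lets ODDS defer carries; normalization to plain BCD is only needed at the output.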
Decimal multiplication using the ODDS architecture and carry-save adder-based binary-to-BCD and BCD-to-binary converters was presented in [20]. A look-up table (LUT) based combinational design of a high-speed, area-efficient BCD multiplier is described in [21]. Energy optimization methods for microprocessors and ALUs are described in [22]. The work in [23] uses base 1000 for binary-to-BCD conversion and designs a decimal multiplier based on embedded arithmetic blocks. The works in [8][9][10][11][12][13][14][20] use the Xilinx ISE [24] EDA tool for the design and verification of their BCD multipliers.
Implementing multi-operand adder trees [6] in state-of-the-art VLSI may result in complex, irregular BCD multiplier layouts compared with binary compressor trees and carry-save adders. The PPR tree and ODDS-based BCD multiplier [10,11] is limited to unsigned number multiplication: with signed-digit and self-complementing codes, its ODDS converters fail to convert BCD to binary and vice versa. Alvaro's multiplier design [12][13][14] resolves the issue of signed-digit multiplication in ODDS by using XS-3 encoding, and the resulting radix-10 multiplier handles both signed and unsigned BCD multiplication, but its convertor uses a complex sequential architecture with parallel computation [12][13][14]. Overall, Alvaro's radix-10 multiplier [12,14] was less area- and speed-optimized than Xiaoping's radix-10 multiplier [13,14], but it could handle signed BCD multiplication in the ODDS architecture. The Vedic multiplier [15,16] is best suited when high speed matters more than area optimization.
This work presents an ODDS architecture-based [8][9][10][11][12][13][14] BCD multiplier for signed numbers with optimized area and speed. Two major modules of the radix-10 (BCD) multiplier can be optimized to meet the needs of high-speed real-time computation: first, the binary-BCD convertors, and second, the type of binary multiplier. Addressing these optimization challenges, this work develops a modified high-speed adder tree structure, the partial shifter adder (PSA), for the BCD-to-binary converter of the radix-10 multiplier to optimize area, and it uses Vedic binary multiplication [15,16] for fast output. These two changes increase the efficiency of the hardware implementation in terms of area and speed. All modules of the proposed design were designed and verified in the Vivado Design Suite [25] and validated on a Zynq-7000 FPGA [26]. The Xilinx Vivado Design Suite [25] is the more advanced successor to Xilinx ISE [24].

RESEARCH METHOD
Radix-10 multiplication in digital processing systems is performed with the binary coded decimal (BCD) number system. The decimal multiplier is also called the BCD multiplier or radix-10 multiplier. Figure 1 shows the flow of the proposed work, which follows the ODDS architecture. As can be seen, three major modules have been developed: first, the BCD-to-binary convertor; second, binary multiplication; and third, the binary-to-BCD converter. In Figure 1, A and B are the 4-digit BCD inputs to be multiplied (4 BCD digits occupy 16 binary bits), and the expected result is an 8-digit BCD number. The final result is produced in the following steps:
Step 1: Convert A and B individually into binary numbers. Because A and B are 4-digit BCD numbers, each binary result needs at most 14 bits: the largest 4-digit BCD number, 9999, requires only 14 binary bits (2^13 = 8192 ≤ 9999 < 2^14 = 16384). This BCD-to-binary conversion is done using the newly proposed PSA tree adder.
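A software golden model of the three-step flow of Figure 1 can be used to check HDL simulation outputs such as those in Table 2. The sketch below assumes unsigned, packed 4-bit-per-digit BCD words and uses ordinary integer arithmetic in place of the PSA convertor and the Vedic multiplier; the function names are hypothetical, and the sign handling of the proposed design is not modeled.

```python
def bcd_to_int(word, ndigits):
    """Step 1 reference: decode a packed BCD word (4 bits per digit)."""
    value = 0
    for i in reversed(range(ndigits)):          # most significant digit first
        nibble = (word >> (4 * i)) & 0xF
        if nibble > 9:
            raise ValueError(f"invalid BCD digit {nibble:#x}")
        value = value * 10 + nibble
    return value


def int_to_bcd(value, ndigits):
    """Step 3 reference: encode a non-negative integer as packed BCD."""
    word = 0
    for i in range(ndigits):                    # least significant digit first
        value, digit = divmod(value, 10)
        word |= digit << (4 * i)
    if value:
        raise ValueError("value does not fit in the requested digits")
    return word


def reference_multiply(a_bcd, b_bcd):
    """4-digit x 4-digit BCD product via the binary domain (unsigned only)."""
    c = bcd_to_int(a_bcd, 4)                    # Step 1: A -> binary C
    d = bcd_to_int(b_bcd, 4)                    # Step 1: B -> binary D
    assert c.bit_length() <= 14 and d.bit_length() <= 14
    return int_to_bcd(c * d, 8)                 # Step 2 product, then back to BCD


print(hex(reference_multiply(0x9999, 0x9999)))  # prints: 0x99980001
```

Because 9999 × 9999 = 99980001, even the largest product fits in the 8 BCD output digits.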
Step 2: Multiply the 14-bit binary numbers C and D (the binary forms of A and B) using vertical cross (Vedic) multiplication [8] for fast results.
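The operand and product widths follow directly from the largest 4-digit operand. The short check below (plain Python integer arithmetic, not part of the hardware) confirms that the product of two 14-bit operands bounded by 9999 needs only 27 bits, one fewer than the generic 2 × 14 = 28 bits, because 2^26 ≤ 99980001 < 2^27.

```python
MAX_OPERAND = 9999                       # largest 4-digit decimal operand
max_product = MAX_OPERAND * MAX_OPERAND

operand_bits = MAX_OPERAND.bit_length()  # width of C and D
product_bits = max_product.bit_length()  # width actually needed for C x D
generic_bits = 2 * operand_bits          # width of a general 14x14 product

print(max_product, operand_bits, product_bits, generic_bits)
# prints: 99980001 14 27 28
```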

BCD to binary convertor
The proposed architecture of the 16-bit BCD to 14-bit binary convertor is shown in Figure 2. It uses a new partial shifter adder (PSA). This PSA-based architecture also takes the sign bit into account and converts the 4-digit BCD number into binary according to the sign bit. PSA is a combinational architecture that uses carry-free generation of decimal multiples, which makes it significantly fast. The proposed method uses shift operations: a left shift by n bits is equal to multiplying a number by 2^n. B15, B14, ..., B0 is the 16-bit input BCD number and s13, s12, ..., s0 is the 14-bit output binary number. Equation (1) shows the binary generation from the 16-bit BCD number, and (2) generates the 14-bit binary value from the 16-bit BCD number using left shift operations. In (3), multiplication by 1024 is a left shift by 10 bits, multiplication by 16 is a left shift by 4 bits, multiplication by 8 is a left shift by 3 bits, multiplication by 64 is a left shift by 6 bits, multiplication by 32 is a left shift by 5 bits, multiplication by 4 is a left shift by 2 bits, and multiplication by 2 is a left shift by 1 bit. In this partial shifter adder tree, 2's complement addition is used for subtraction: the upper nibble of the 16-bit BCD number, B15 B14 B13 B12, is first concatenated with an extra '0' at its MSB, as shown in (4). Equation (5) is obtained by replacing the subtraction operations of (3) according to (4).
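The shift-based conversion can be checked in software. Since the 4-digit value equals 1000·(B15..B12) + 100·(B11..B8) + 10·(B7..B4) + (B3..B0), and 1000 = 1024 − 16 − 8, 100 = 64 + 32 + 4, and 10 = 8 + 2, every multiplication listed for (3) becomes a left shift, and the two subtractions become additions of 14-bit 2's complements. The sketch below is a behavioral model under these assumptions; it does not reproduce the exact bit arrangement or the 25-full-adder/10-half-adder grouping of Figure 3, does not model the sign bit, and its function names are hypothetical.

```python
WIDTH = 14
MASK = (1 << WIDTH) - 1           # keep every partial result to 14 bits


def twos_complement(x):
    """14-bit 2's complement of x, i.e. (~x + 1) mod 2**14."""
    return (~x + 1) & MASK


def bcd16_to_bin14(bcd):
    """Convert an unsigned 16-bit packed BCD word into its 14-bit binary value."""
    b3 = (bcd >> 12) & 0xF        # B15..B12, thousands digit
    b2 = (bcd >> 8) & 0xF         # B11..B8, hundreds digit
    b1 = (bcd >> 4) & 0xF         # B7..B4, tens digit
    b0 = bcd & 0xF                # B3..B0, units digit
    terms = [
        b3 << 10,                         # +1024 * b3
        twos_complement(b3 << 4),         # -16 * b3
        twos_complement(b3 << 3),         # -8 * b3
        b2 << 6, b2 << 5, b2 << 2,        # 100 * b2 = (64 + 32 + 4) * b2
        b1 << 3, b1 << 1,                 # 10 * b1 = (8 + 2) * b1
        b0,
    ]
    total = 0
    for t in terms:
        # Each addition wraps like a 14-bit adder; the exact result lies in
        # 0..9999 < 2**14, so the modular sum equals the true value.
        total = (total + t) & MASK
    return total


print(bcd16_to_bin14(0x9999))             # prints: 9999
```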
Equation (5) shows the generation of binary from BCD with the modified shifting and complement operations on the upper nibble. Figure 3 implements (5) as the proposed partial shifter adder (PSA) tree addition structure for BCD-to-binary conversion; it shows the specific arrangement of the BCD bits and their complement bits for shifting, and the location of the '1' added for the 2's complement. From Figure 3 it can be observed that the proposed addition structure requires a total of 25 full adders and 10 half adders, far fewer than available designs for 16-bit BCD to 14-bit binary conversion.

Vertical cross multiplication of binary numbers
Figure 4 shows the design of the 4-bit vertical cross (Vedic) multiplier [8]; a similar 14-bit vertical cross multiplier was designed in VHDL for the implementation. Let In1 and In2 be the 4-bit input numbers. In Figure 4, the outputs T1, T2, ..., T16 are generated by logical AND operations. Figure 5 shows the tree adder arrangement for the multiplier output; this arrangement reduces the hardware requirement compared with Wallace or carry-save addition [8]. For 4-bit multiplication, it requires only 8 full adders and 4 half adders.
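The vertical and crosswise pattern can be modeled bit by bit: column k of the product collects every AND term In1[i]·In2[j] with i + j = k, which for 4-bit inputs gives the 16 AND outputs corresponding to T1 to T16, and each column sum plus the incoming carry yields one product bit. The sketch below is a generic behavioral model of this technique; the function name is hypothetical, and it does not reproduce the specific adder tree of Figure 5.

```python
def vertical_cross_multiply(a, b, n=4):
    """Multiply two n-bit unsigned numbers column by column (vertical and crosswise)."""
    if not (0 <= a < 2 ** n and 0 <= b < 2 ** n):
        raise ValueError("operands must fit in n bits")
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    product, carry = 0, 0
    for k in range(2 * n - 1):                     # one column per product weight 2**k
        # AND terms whose bit indices add up to k: vertical at the two end
        # columns, crosswise in between.
        column = sum(a_bits[i] & b_bits[k - i]
                     for i in range(n) if 0 <= k - i < n)
        total = column + carry
        product |= (total & 1) << k                # keep the column's sum bit
        carry = total >> 1                         # pass the rest to the next column
    # The final carry is at most 1 because a * b < 2**(2n); it is the top bit.
    return product | (carry << (2 * n - 1))


print(vertical_cross_multiply(0b1101, 0b1011))    # prints: 143
```

The same function with `n=14` models the 14x14 multiplication of Step 2 at the behavioral level.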
As shown in Figure 5, the adder tree adds the AND-gate outputs, and its final sum is the 8-bit output for two 4-bit input numbers; the same adder architecture is used for addition when multiplying 14-bit numbers.
Figure 6 shows the method of converting a 16-bit binary number to a 4-digit BCD number; a similar binary-to-BCD converter was designed in VHDL to convert the 28-bit binary number into 8-digit BCD. As Figure 6 shows, leading one detection (LOD), subtraction, and counting operations are used for the conversion. This converter is a sequential design, so 3-stage parallel processing is used between the generation of digits to improve throughput. The flow diagram of Figure 6 can be understood through the example elaborated in Table 1. Consider the 12-bit binary number (3458)10 = (D82)16 = (110110000010)2; its leading one is at position 12. Table 1 shows the resulting binary-to-BCD conversion, where D3, D2, ..., D0 is the final BCD output.
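A behavioral sketch of the LOD, subtraction, and counting idea follows. It assumes that each decimal digit is obtained by repeatedly subtracting its power of ten and counting the subtractions, with the leading-one position used as a quick test that a weight cannot be subtracted; this is one plausible reading of Figure 6, not its exact datapath, and the pipeline or parallel stages are not modeled. The function names are hypothetical.

```python
def leading_one_position(x):
    """1-indexed position of the most significant '1' bit (0 when x == 0)."""
    return x.bit_length()


def bin_to_bcd_digits(x, ndigits=4):
    """Return BCD digits, most significant first (D3, D2, D1, D0 for ndigits=4)."""
    if not 0 <= x < 10 ** ndigits:
        raise ValueError("value does not fit in the requested number of digits")
    digits = []
    for p in reversed(range(ndigits)):
        weight = 10 ** p
        count = 0
        # If x's leading one is below weight's, then x < weight: skip subtraction.
        while leading_one_position(x) >= leading_one_position(weight) and x >= weight:
            x -= weight                              # subtraction step
            count += 1                               # counting step
        digits.append(count)
    return digits


print(leading_one_position(0b110110000010), bin_to_bcd_digits(0xD82))
# prints: 12 [3, 4, 5, 8]
```

With `ndigits=8` the same routine covers the 8-digit output of the 4-digit by 4-digit product.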

RESULTS AND ANALYSIS
The register transfer level (RTL) entries of the proposed design are written in very high-speed integrated circuit hardware description language (VHDL). Synthesis is done for combinational-style modeling, and the selected target device is the Zynq-7000 FPGA. Figure 7 shows the simulation result obtained for this 16x16 radix-10 multiplier design, generated with the integrated simulator of Xilinx Vivado [14]. Further simulation observations are shown in Table 2; row 1 of Table 2 corresponds to the simulation shown in Figure 7. Table 3 shows the synthesis results for the proposed 16x16 BCD multiplier design, obtained with the Xilinx Vivado tool [14] for the Zynq-7000 field programmable gate array (FPGA) [14]. From Table 3, the design uses 482 Zynq-7000 FPGA slices, and the maximum frequency obtained is 267.251 MHz. The simulations were run with a balanced simulator setting in Xilinx Vivado. The area obtained for the proposed design implementation is 0.04138 mm², and the design requires 29362 two-input NAND (universal) gates. Table 4 compares the time delay of the methods used by other researchers with that of the proposed method; the results are obtained from HDL-based FPGA implementations. This work is an application specific integrated circuit (ASIC) style design; for a fair comparison, the target FPGA technology must be the same, hence this work selected an FPGA of 45 nm technology, the same as in [1][2][3]. The time delay of the proposed 16x16 BCD multiplier is observed to be the lowest among the related works on the same platform, and the table also shows that the proposed design requires less area than the other works.
Table 4. Time delay comparison (rows as extracted)
... 3.21 ns - - 0.0422
ODDS with XS-3 [2] 41.5 ns - 120600
PPR tree with ODDS coding [3] 3.29 ns - 32656

CONCLUSION
This work presents an overloaded decimal digit set (ODDS) architecture-based radix-10 multiplier for signed numbers. Three modifications are incorporated: first, a modified partial shifter adder (PSA) tree is used for BCD-to-binary conversion; second, 14x14 Vedic binary multiplication is used, producing a 27-bit binary output; and third, binary-to-BCD conversion uses a sequential design with a 7-stage pipeline. The PSA tree is a combinational design with a minimal number of full adders (FA) and half adders (HA), which reduces the overall time delay. The addition tree in the Vedic multiplier also uses the PSA tree structure and produces a 27-bit output instead of 28 bits after multiplying the two 14-bit inputs, which also reduces area. An additional register between the digits of the final output enables the 7-stage pipeline, which enhances the overall output throughput. The simulation results were observed in the Xilinx Vivado tool and verified on the Zynq-7000 FPGA. Simulation was performed for all possible test inputs, and the outputs were verified as correct. The analysis of the results shows that the time delay and IC area of this work for 16x16 radix-10 multiplication are lower than those of other similar work.