On-chip debugging for microprocessor design

This article proposes a closer-to-metal approach to RTL inspection in microprocessor design for use in education, engineering, and research. Signals of interest are tapped throughout the hierarchical microprocessor design, routed up to the top-level entity, and finally displayed on a VGA monitor. The input clock can be run as slowly as one wishes in order to trace or debug the microprocessor being designed. An FPGA development board, along with its accompanying software package, is used as the design and test platform. The VHDL constructs 'type' and 'record' are key ingredients in the overall design, since they allow simple, clean, and tractable code across the hierarchy. The method is tested on a single-cycle MIPS microprocessor blueprint. The result shows that the technique produces a more consistent display of the true contents of registers, ALU input/output signals, and other wires than the standard, widely used simulation method. This approach is expected to increase the confidence of students and designers, since the reported signal values are the true values. Its use is not limited to the development of microprocessors; every FPGA-based digital design can benefit from it. This is an open access article under the CC BY-SA license.

INTRODUCTION
Digital device development depends greatly on a precise understanding of how data propagate between basic digital logic units, a level of abstraction known as the register transfer level (RTL). In the design phase, designers often use simulation to check whether their designs meet the logic requirements. The same practice is encountered in senior-level electrical/computer engineering bachelor-degree courses such as Programmable Logic Design or Computer Architecture [1][2][3][4]. In such courses, students are asked to design a micro-architecture of a microprocessor based on a given architecture (that is, the assembly language requirement). Students then write HDL code representing the micro-architecture and test the design against a set of instructions. Testing is generally done in simulation and, after a number of testing-coding iterations, a hardware test is performed.
Software simulation is indispensable for its quick setup, fast compilation, and, provided the designer is experienced, accuracy. However, mismatches between simulation and synthesized hardware are not unheard of, even for simple designs. Mismatches also occur between pre-synthesis and post-synthesis simulations. To make matters worse, in post-synthesis (netlist) simulation one can generally monitor only the top-level ports; signals deeper in the hierarchy are inaccessible. To address this problem, we propose a closer-to-metal approach to register transfer level inspection. Effectively, this is an on-chip debugging technique where signals of interest are brought up to the top level for output reading. The term on-chip debug generally refers to a technique in microprocessor (or other digital device) design where a designer can inject a fault into the device under development to test its fault-tolerant behavior [5][6][7][8]. For microprocessors, this is usually done using the JTAG protocol based on the NEXUS Consortium standard. In [9], on-chip debug is used in high-level synthesis for FPGAs.
On-chip debug has been a concern since the beginning of the computer era, and FPGAs have also played a part in this field. For example, the work by Jamal et al. [10,11] proposes a better way to make functional changes during on-chip debug, using an FPGA overlay architecture. Contemporary works in this field, particularly on post-silicon debug, can be found in [12][13][14][15]. Indeed, post-silicon debug readiness needs to be prepared early in the design [16][17][18]. A number of authors extend the idea to other areas such as machine learning [19,20].
In this article we describe on-chip debug in a more literal sense: the process of debugging a microprocessor in which the debug capability is embedded in the hardware design. Our contribution lies in the following aspects. First, we propose a simple hardware-oriented debugging method for use in any digital design. We hope this introductory notion will stimulate students' creativity in solving challenging problems that are otherwise difficult to tackle. Second, we describe, in a tutorial way, the construction of a simple on-chip debugging feature in the design of a microprocessor using 'type' and 'record' in VHDL. We believe this will help students and designers easily duplicate our work.

RESEARCH METHOD
This research starts with a list of design requirements for an on-chip debugging feature in a microprocessor:
(a) non-intrusiveness: the debugging feature should be as discreet as possible so as not to obstruct the main design;
(b) meaningful message: the interface to the human reader should be immediately readable;
(c) easy to modify: when the designer wants to tap other signals in the microprocessor design, it should be straightforward to do so.
The main idea of this on-chip debugging feature is shown in Figure 1.

Figure 1. On-chip debugging principles in this paper (labeled blocks: Memory, Reporting module, VGA Display).

The next step is building a model microprocessor from scratch using VHDL in an FPGA chip (an Altera DE2-115 development board was used). Here we use a scaled-down version of the MIPS microprocessor architecture [21,22] as our proof of concept. The MIPS architecture has the advantages of, among others, being simple and consistent for students to follow. MIPS is a RISC architecture and originated as a pedagogical model at Stanford University. We developed the on-chip debugging feature based on a MIPS implementation described by Harris and Harris [23]. In this stage a VGA display module was also built, consisting of a VGA sync module, two character memory modules, and a font ROM. The experimental hardware setup is shown in Figure 2. We extend the single-cycle MIPS instruction set found in the main text of [23] by constructing a number of new instructions and 'rewiring' the datapath as needed. We then verify, using the proposed on-chip debugging technique, that the final result works as expected. In the next section we discuss the engineering design in more detail.

PROTOTYPE DESIGN
The design requires several aspects to work seamlessly together: the processor design with its signal inspection, the information display, and the test design.

Processor design and signal inspection
As mentioned before, the approach works by sensing internal microprocessor signals (including memory-access signals) and sending them up through the design hierarchy. A hierarchical design implies that many entities and files are used, which poses a new challenge: how to tap signals from different entities in a straightforward and unobtrusive way. The signal tapping shown in Figure 1 is implemented using a shared bus that is available across the hierarchy. Figure 3 shows the organization of the modules that make up the entire microprocessor.

Figure 3. Module organization and hierarchy. The small red blocks are instantiations of the record type that encapsulates the tapped signals' information. The red 'cable' then acts as a debug bus across the hierarchy.

The arithmetic logic unit (ALU) performs the arithmetic computation with the help of a register file (32 registers of 32 bits each) that functions as a scratchpad for the ALU. The program counter (PC) acts as a pointer to the current instruction. The sign-extend module extends numbers narrower than 32 bits (for example, in immediate-type instructions) to their 32-bit representation. The shift module functions as a bit-wise shifter. All of these reside in the "datapath" module, which also hosts a number of multiplexers that control which way data flows. Control is exercised by the "controller" module outside the datapath, which decodes 32-bit instructions from the instruction memory. The datapath and the controller form the "MIPS processor" module. Together with the Instruction Memory and Data Memory modules, they make up the complete microprocessor system. The signal tapping, shown as red blocks in Figure 3, is a record-type signal instantiated in every module of interest. Acting as a "debugging bus", this record is ready to accept the value of any signal of interest at every level of the hierarchy. Since the bus is logically encapsulated, it does not obstruct the main design. Modifying the bus's content is straightforward and can be done once in the definition, without the need to change any code in the instantiation part.
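The idea of a record-based debugging bus can be illustrated with a small software model. The sketch below is written in Python rather than VHDL and is not the authors' code; the field names (`pc`, `instr`, `alu_a`, `alu_b`, `alu_out`) and the toy `datapath` function are hypothetical. It only shows the structural point made above: intermediate levels pass the bundle along without naming its fields, so the set of tapped signals is defined in one place.

```python
from dataclasses import dataclass, fields


@dataclass
class DebugBus:
    """Software stand-in for the record that bundles tapped signals.

    The field names are hypothetical; adding a field here is the only
    change needed to tap one more signal.
    """
    pc: int = 0
    instr: int = 0
    alu_a: int = 0
    alu_b: int = 0
    alu_out: int = 0


def datapath(pc, instr, src_a, src_b, dbg):
    """Toy lowest-level module: adds two operands and taps its signals."""
    alu_out = (src_a + src_b) & 0xFFFFFFFF
    dbg.pc, dbg.instr = pc, instr
    dbg.alu_a, dbg.alu_b, dbg.alu_out = src_a, src_b, alu_out
    return alu_out


def mips(pc, instr, src_a, src_b, dbg):
    """Intermediate level: forwards the bus without naming its fields."""
    return datapath(pc, instr, src_a, src_b, dbg)


def report_row(dbg):
    """Top level: one row of 8-digit hex values, in field-definition order."""
    return " ".join(f"{getattr(dbg, f.name):08x}" for f in fields(dbg))


dbg = DebugBus()
mips(pc=4, instr=0x00094080, src_a=3, src_b=4, dbg=dbg)
print(report_row(dbg))  # 00000004 00094080 00000003 00000004 00000007
```

In this model, `mips` plays the role of a module that merely routes the bus upward, while `report_row` corresponds loosely to the reporting step at the top level.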

Information display
To show the debugging steps, the values of the signals of interest are displayed on a VGA monitor, one row per clock cycle. The Altera DE2-115 board is equipped with a VGA port, but users must themselves program the VGA synchronization and character generation. Here we adopt the method described in [24]; the arrangement is depicted in Figure 5. In this layout, a phase-locked loop (PLL) is used to step up the clock frequency. It feeds the VGA sync module, which produces the horizontal and vertical synchronization signals used by the VGA display. The VGA sync module also generates the x and y position of the current pixel. This information is used by the character generation and font ROM modules to render the appropriate character at a given time. Interested readers are referred to [24] for further technical details regarding the display arrangement. The microprocessor system, shown in Figure 5 and as the green-outlined box in Figure 3, transmits the debugging signals to the report compiler, where all tapped signals are lined up and sent to the character generation circuit.
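How a pixel position selects a character and a font ROM bit can be sketched in software. The Python model below is illustrative only: the 640x480 resolution, the 8x16 glyph size, the ROM addressing scheme, and the most-significant-bit-first pixel order are assumptions made for the example, not details taken from this paper or from [24], and the single glyph row stored in the ROM is made up.

```python
CHAR_W, CHAR_H = 8, 16                      # assumed glyph size in pixels
COLS, ROWS = 640 // CHAR_W, 480 // CHAR_H   # assumed 640x480 -> 80 x 30 cells


def text_cell(x, y):
    """Map a pixel position to the (column, row) of its text cell."""
    return x // CHAR_W, y // CHAR_H


def font_rom_address(char_code, y):
    """Assumed ROM layout: CHAR_H consecutive rows per character code."""
    return char_code * CHAR_H + (y % CHAR_H)


def pixel_on(font_rom, text_ram, x, y):
    """Return True if the pixel at (x, y) belongs to a glyph stroke."""
    col, row = text_cell(x, y)
    code = text_ram[row * COLS + col]
    glyph_row = font_rom[font_rom_address(code, y)]
    bit = CHAR_W - 1 - (x % CHAR_W)         # assumed MSB = leftmost pixel
    return ((glyph_row >> bit) & 1) == 1


# Tiny demonstration with a made-up glyph row for character code 0x37 ('7').
font_rom = [0] * (128 * CHAR_H)
font_rom[font_rom_address(0x37, 0)] = 0b11111110
text_ram = [0x20] * (COLS * ROWS)           # screen filled with spaces
text_ram[1 * COLS + 2] = 0x37               # '7' at column 2, row 1

print(text_cell(16, 16))                    # (2, 1)
print(pixel_on(font_rom, text_ram, 16, 16)) # True
print(pixel_on(font_rom, text_ram, 23, 16)) # False
```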

Figure 5. Display arrangement, including reporting module (labeled blocks: VGA display, Character generator, Report compilation).

Test design
The next step in this work is to design a test that confirms the functionality of the overall setup and serves as a proof of concept for our proposed method. A microprocessor architecture expansion task is chosen as the test: new instructions are introduced to the instruction set. This requires modifying the micro-architecture of our microprocessor and testing the functionality of the new instructions. As the base architecture, we adopt [23], which in turn was inspired by earlier editions of [25]. Specifically, the reader is referred to Chapter 6 (Architecture) and Chapter 7 (Microarchitecture) of [23].
The added instructions are shift left logical (sll), shift right logical (srl), and shift right arithmetic (sra). All three are R-type instructions and have the same invocation form. For instance, the format for sll is:
sll rd, rt, shamt
where rd is the destination register and rt is the source register. The four-byte value stored in rt is shifted left by shamt bits, and the result is stored in rd. A similar form holds for srl and sra.
The architecture for these three register-type instructions is shown in Figure 6(a). Following the convention, the six most significant bits (instr[31:26], the op field) are 0, indicating an R-type instruction. The six least significant bits (instr[5:0], the funct field) indicate which R-type instruction is operative; only the last two bits distinguish among the three shifts. The shift amount is placed in the shamt field (instr[10:6]). The source and destination registers are in the rt and rd fields, respectively.
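The field layout above can be checked with a short encoder. The following Python sketch is an illustration, not the authors' tooling; it assumes the standard MIPS funct codes for the three shifts (sll = 000000, srl = 000010, sra = 000011), the standard R-type positions for rt (instr[20:16]) and rd (instr[15:11]), and an rs field of zero. The function names are hypothetical.

```python
# Standard MIPS funct codes (assumed to match the design described here).
FUNCT = {"sll": 0b000000, "srl": 0b000010, "sra": 0b000011}


def encode_shift(mnemonic, rd, rt, shamt):
    """Encode `mnemonic rd, rt, shamt` as a 32-bit R-type instruction word."""
    for name, value in (("rd", rd), ("rt", rt), ("shamt", shamt)):
        if not 0 <= value < 32:
            raise ValueError(f"{name} must fit in 5 bits, got {value}")
    op, rs = 0, 0                           # R-type: op = 0; rs unused here
    return ((op << 26) | (rs << 21) | (rt << 16) | (rd << 11)
            | (shamt << 6) | FUNCT[mnemonic])


def decode_fields(instr):
    """Split an instruction word into the fields discussed in the text."""
    return {
        "op": (instr >> 26) & 0x3F,         # instr[31:26]
        "rt": (instr >> 16) & 0x1F,         # instr[20:16]
        "rd": (instr >> 11) & 0x1F,         # instr[15:11]
        "shamt": (instr >> 6) & 0x1F,       # instr[10:6]
        "funct": instr & 0x3F,              # instr[5:0]
        "mode": instr & 0x3,                # instr[1:0], selects the shift
    }


word = encode_shift("sll", rd=8, rt=9, shamt=2)   # sll $t0, $t1, 2
print(f"{word:08x}")                               # 00094080
print(decode_fields(encode_shift("sra", rd=8, rt=9, shamt=2))["mode"])  # 3
```

With these codes, the two low bits take the values 00, 10, and 11 for sll, srl, and sra, which is consistent with the remark that the last two bits are enough to select the shift.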
Based on the above architecture, a number of modifications are implemented in the microarchitecture. Figure 6(b) shows part of the new microarchitecture design. The shifter module receives instr[10:6] as the shift amount and instr[1:0] as the shift mode selector (which of the three shift instructions is operative). The 32-bit data to be shifted comes from the register file, and the shifter output goes to a multiplexer that chooses between two signals: the output of the original ALU or the output of the shifter.

After the microarchitecture is modified, a test program is used to confirm its validity. The test code is a continuation of the test code in [23], page 437. It consists of computations involving all instructions, by which a specific state is targeted (in the original test of [23], "value of 7 is stored in memory address 84"). A fault in the implementation of any instruction prevents the target state from being reached; it is very unlikely that a faulty design would produce the expected result. Figure 7 shows the overall instruction test, written in MIPS assembly language. The blue lines (addresses 0 to 40 and address 64) are the original test from [23], and the green ones (addresses 44 to 60) are the new code. At the end of the test, the computed value (which happens to be 126 = 0x7e) is stored in memory address 84 = 0x54.
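The behavior of the shifter and the multiplexer that follows it can be modeled in a few lines. This Python sketch is a behavioral approximation, not the authors' VHDL; it assumes the mode encoding follows the two low funct bits of the standard MIPS shifts (00 = sll, 10 = srl, 11 = sra), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shifter(data, shamt, mode):
    """Shift a 32-bit word; `mode` is instr[1:0], `shamt` is instr[10:6]."""
    data &= MASK32
    shamt &= 0x1F
    if mode == 0b00:                        # sll: zeros enter from the right
        return (data << shamt) & MASK32
    if mode == 0b10:                        # srl: zeros enter from the left
        return data >> shamt
    if mode == 0b11:                        # sra: the sign bit is replicated
        signed = data - (1 << 32) if data & 0x80000000 else data
        return (signed >> shamt) & MASK32
    raise ValueError(f"unused shift mode {mode:02b}")


def result_mux(alu_out, shift_out, select_shifter):
    """Choose between the original ALU output and the shifter output."""
    return shift_out if select_shifter else alu_out


print(f"{shifter(0x80000000, 4, 0b10):08x}")  # 08000000
print(f"{shifter(0x80000000, 4, 0b11):08x}")  # f8000000
print(f"{shifter(0x0000003F, 1, 0b00):08x}")  # 0000007e
```

The srl and sra results differ only when the most significant bit of the input is set, which is why a test program needs at least one negative operand to tell the two apart.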

RESULT AND ANALYSIS
Two scenarios are used to assess the proposed method. The first involves simulation of the microprocessor system using ModelSim. The second runs the microprocessor in the Altera FPGA chip. In both scenarios, the same microprocessor module is used (green box in Figure 3). In the first scenario, the ModelSim simulation, a SystemVerilog testbench file is built as a wrapper for the microprocessor. In the second scenario, the microprocessor is synthesized and the netlist is programmed into the FPGA chip.
In both scenarios, the test code in its machine form, shown in the rightmost column of Figure 7, is embedded in the Instruction Memory. The instructions are executed sequentially, with jumps at specific moments. Debugging in microprocessor design most of the time involves probing the values of internal signals such as the program counter, instruction, register address input, register data input, memory address, ALU inputs, and ALU output. These are the signals we display in both scenarios. Figure 8 shows the result for the first scenario. It can be seen that a number of signal values are not resolved: instead of the values of the signals, 'xxxxxxxx' is shown. In the second scenario, where the on-chip debugging features are employed in the FPGA-based microprocessor chip, all signal values are displayed correctly. The result is depicted in Figure 9. The difference in the signal display (although the final result is the same, implying a correct model/design) might come from an overly tight timing specification, coarse timing resolution, multiply driven signals, different initial states, or simply a bug in the simulation software. The on-chip debugger, on the other hand, shows real data from the hardware (though still limited by the speed of the display system).
While it is relatively straightforward (but not necessarily easy) to fix bugs on the simulation side, hardware reporting is precious and sometimes the only choice a designer has. The MIPS example shown here serves as a demonstration of this on-chip debugging technique.

CONCLUSION
This paper proposed a hardware-oriented approach to debugging a chip design in an FPGA, by way of tapping signals from any level in the hierarchy up to the top level. These signals of interest were then displayed on a VGA monitor through the VGA port provided by the board. The tapping was done using the VHDL 'record' type as a bus in which signals are bundled and transported up the hierarchy.
A microprocessor design challenge was used as the test case. The proposed method correctly displayed the internal signals and, being based on the actual hardware, showed higher fidelity than simulation. While software simulation of hardware designs is indispensable and will continue to improve, a hardware-level debugger and reporting module is invaluable and sometimes the only option. This is even more true when the chip is already in the deployment stage. We hope that this paper will inspire other researchers and students alike to employ the same technique in their designs.