Lab 5: Decoding RISC-V Instructions
Welcome back after (hopefully, a relaxing) recess week! Last time, we took a break from hardware design to write some RISC-V assembly code. This week, we will bring hardware back into the picture. We will create a couple of modules that can decode RISC-V machine language instructions, and in particular, extract the immediates and control signals.
Project Work
This manual is part of the final project. You do not need to write a report for this. You should work in your project groups.
Introduction
From asm to 0s and 1s - RISC-V edition
We discussed last time that assembly language code maps very closely to machine code. This week, we will explore what exactly that means.
In the first few lab sessions, we designed digital logic and learned how it works. No matter what, digital logic has inputs, and it performs some logical function on these inputs to produce outputs.
A CPU is no different; it too is a logic block. It reads instructions from instruction memory, and performs a series of logical operations on the instruction to "execute" the code.
When a CPU reads instructions, it has no understanding of the English language. Actually, it has no understanding of any programming language either; no, not even assembly language. To a CPU, the instructions are nothing but a series of bits. So, we need to convert our human-readable code to machine code that the CPU can understand.
Every instruction of assembly (pseudoinstructions notwithstanding) gets translated into a series of bits by a piece of software we call the "assembler"; so named because it "assembles" our assembly language code into machine code. This series of bits follows a given format for every different CPU architecture.
How is Python/C/Java/JS code run on my CPU?
The concept of compilation and/or assembly of code in higher-level languages like Python or Java, is a very complex topic, and way out of the scope of this course. If you are interested in these topics, though, you should explore CS4212: Compiler Design and CS4215: Programming Language Implementation.
For RISC-V, we classify instructions into 6 types, each with their own way of formatting the bits:
- R-type: Used for data processing instructions where the source and destination operands are all registers.
- I-type: Used for data processing instructions where one of the sources is an immediate value, while the rest are registers.
- S-type: Used for store instructions, which do not have a destination register.
- B-type: Used for branching instructions, which also do not have a destination register.
- U-type: Used for the upper-immediate instruction
lui
andauipc
. These have a destination register and a large 20-bit immediate. (In fact, this instruction type allows the largest immediate values, hence why it's so useful). - J-type: Used for
jal
. Also has a destination register and a large immediate.
Each word of machine code has a number of different fields:
opcode
: used by every instruction. This defines what type of instruction we are seeing.funct3
andfunct7
:funct3
is used by R-, I-, S-, and B-type instructions.funct7
is only used by R-type instructions. These define the specific instruction we want to execute, within the type set by the opcode.rd
,rs1
,rs2
: the destination, source 1 and source 2 registers for the instruction (where applicable). RISC-V has 32 general-purpose registers, labelledx0
throughx31
. Each of rd, rs1 and rs2 are a 5-bit value indexing the register they refer to. E.g.5b10101
== 21, so we are talking aboutx21
.imm
: used by every instruction type except R. Used to encode the immediate value used, in twos complement signed format.
The table provided in the RISC-V Reference Card (available on Canvas, or online at https://www.cs.sfu.ca/~ashriram/Courses/CS295/assets/notebooks/RISCV/RISCV_CARD.pdf), shows how the bits are arranged for each type of instruction, as well as the encoding (i.e. what values mean what).
Open up RARS, and then open riscv_asm_sample.asm
from last time. Click the "Assemble" button to assemble the file and open the "Execute" tab. It should look like something like this:
The assembled code for riscv_asm_sample.asm
Notice the "code" column. Consider a simple instruction like add s5, s1, s2
. The encoding for this instruction should be:
Field | Value | Explanation |
---|---|---|
opcode | 0110011 | R-type instruction |
rd | 10101 | Register x21 |
funct3 | 000 | add instruction |
rs1 | 01001 | Register x9 |
rs2 | 10010 | Register x18 |
funct7 | 0000000 | add instruction |
imm | N/A | R-type instruction |
So, we should get 32b00000001 00100100 10001010 10110011
, or 32h01248AB3
. Indeed, cross-checking with RARS, we see that this is correct.
As another example, consider addi s5, s5, -100
:
Field | Value | Explanation |
---|---|---|
opcode | 0010011 | I-type instruction |
rd | 10101 | Register x21 |
funct3 | 000 | addi instruction |
rs1 | 10101 | Register x21 |
rs2 | N/A | I-type instruction |
funct7 | N/A | I-type instruction |
imm | 111110011100 | 12-bit immediate -100 |
This should be 32b1111 1001 1100 1010 1000 1010 1001 0011
, or 32hf9ca8a93
. Once again, this is the same as we find in RARS.
Try to calculate the machine code for the following instructions by hand:
xori x24, x12, 20
lw x12, 4(x23)
sltu x2, x5, x6
Then, type these instructions in RARS and verify that your calculations are correct.
Decoding instructions in hardware
In this lab, we will attempt to recreate the instruction decoder and the immediate extension modules we saw in the lectures.
Open Vivado and create a new project, using the same settings as we have been using so far. Remember to choose the correct part number, and import the correct constraints file.
Import Decoder.sv
and Extend.sv
from Github, and import them into your project. For now, we won't have a Top
module. We'll get to that in a couple of weeks.
Activity 1: The immediate extension unit
Let's start with the simpler of the two files, Extend.sv
. Remember how different instruction formats have different ways to encode the immediate values. The Extender module should pick out the right bits, in the right order, from the instruction, and sign-extend them correctly. Remember, all immediates in RISC-V must be sign-extended to maintain their positive/negative sign.
What is sign extension?
Sign-extension allows us to retain the sign of an m-bit binary number when it is extended to an n-bit binary number (n > m).
Consider the 4-bit twos complement number 4b1011
. This represents -5. If we blindly extend this to 8 bits with 0s in the upper bits, we get 8b00001011
, which represents 11.
Sign extension solves this problem, by using the MSB to fill the upper bits. That is, if the MSB (sign bit) is 1, the upper bits are all set to 1. If the MSB is 0, the upper bits are all set to 0. So, 4b1011
would become 8b11111011
, which is still equal to -5.
Remember that there are five kind of instructions in RISC-V that use immediate values. Each of them encode the immediate slightly (or very) differently.
The input instr_imm
represents the upper 25 bits of the instruction word. For any instruction, the opcode is stored in the lowest 7 bits, so the immediate is guaranteed to be within these 25 upper bits.
The input imm_src
represents the kind of instruction that is being executed. Depending on this, the way the immediate is encoded is different.
The output ext_imm
should hold the sign-extended immediate that we calculated.
Activity 2: The instruction decoder
Now, let's move on to the more complex one, the instruction decoder.
The sole input to this module is the complete instruction to be executed. The outputs are PCS
, jump
, mem_to_reg
, mem_write
, alu_control
, alu_src_b
, imm_src
and reg_write
.
The purpose of each of these outputs is detailed in Lecture 7's slides. To summarize:
-
PCS
: Used to determine the behaviour of the program counter: +4, branch, or jump. -
mem_to_reg
: Used to ascertain if the output from the memory should be stored in a register. -
mem_write
: Used to determine whether we should read from, or write to, the memory. -
alu_control
: Used to determine the operation performed by the ALU. -
alu_src_b
: Used to control the second source of the ALU (reg vs imm). -
imm_src
: Used to determine the format of the instruction, and therefore, its immediate encoding. -
reg_write
: Used to determine whether we should write a value to the register file.
The width of the input and outputs has not been specified in the skeleton file.
Tip
You will notice that imm_src
goes from the Decoder
to the Extend
module in the block diagram for a simple single cycle processor in the lecture notes. Indeed, the imm_src
in this file will later connect to the imm_src
input of the Extend
module. So make sure the output of imm_src
matches what Extend
expects.
We recommend following these steps to complete the Decoder
module:
-
Complete the
assign
statements on lines 39-41 so that the logic valuesopcode
,funct3
andfunct7
reflect those components of the instruction. This will make it easier to use these bits later, as they will be used quite a few times. -
Determine the bit width required for each of the outputs of the decoder, and update the code accordingly.
-
Now, for each of the outputs, write the logic required to set that output. Write the code inside the
always @(instr) begin ... end
block. This allows us to use constructs likecase ... endcase
statements in combinational logic. We should also use the ternary operator? :
where possible to do so elegantly.
Activity 3: Simulation
Yes, you know it; it's time for simulation tests. For this time, we will have a skeleton file to use: Decoder_Extend_sim.sv
. Download and import this simulation source into the Vivado project.
Change the widths of the different signals to match those that we used in the Decoder
and Extend
modules.
Inside the initial begin ... end
block, we have included one test case for you:
57 58 |
|
This is the machine code for the assembly instruction addi x20, x20, -8
. Verify that the outputs of all the signals are as expected. Also, make sure that the immediate output from the Extend
module (i.e. ext_imm
) is correct.
You should also test plenty of other instructions, to make sure that the decoding works correctly for every type of instruction.
Tip
Right click on the ext_imm
waveform and set the radix to Signed Decimal
for a human-readable interpretation of the decoded immediate.
Concluding remarks
What we should know
- How assembly language maps to machine language in RISC-V.
- The different formats of RISC-V assembly instructions.
- How immediates are encoded in different types of instructions.
- How to write SystemVerilog code to decode machine language instructions.