ICS (Architecture)

Fudan University / 2020 Spring

Assignment 3 Part II (Branch Prediction)

There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton1

After Part I of the assignment-3, I think you've got some clues about what the saying is talking about. Then in Part II, you are required to add branch predictor to your MIPS Pipeline Processor, which in some way is another kind of cache, recording the choices about branch to help predict the future, thus decreasing the waste of the CPU cycles.

This part will make up 15% of the final score. It will due on June 1, and you must pass all the tests (cpu_tb.sv is to be delayed) before submitting your work. Recently we've made some changes to checker, please refer to checker guidelines for details (If you've seen this in assignment-2, you can just ignore).

In Part II, you need have a look at the related chapters in Computer Architecture: A Quantitative Approach2 (e.g. 3.3 Reducing Branch Cost With Advanced Branch Prediction, Appendix C) about branch predictor and follow this doc to implement one. Before work, you need import codes in src/.

Background

Now recall the value of the CPI (Cycles per Instruction), we'll have a deeper image:

Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls2

There are so many items contributing to the final CPI and engineers (and scientists) have explored so many techniques to improve them by the sophisticated design of CPU and the interesting interaction with Compiler and so on.

In assignment-2, we've taken advantage of some techniques to improve our CPU, such as forwarding and bypassing, or even simple branch scheduling and prediction to decrease the CPI (due to the inherent features of our programs, always taken may have round a 60% success rate3).

However, that's still far from how the industry is doing, since for typical RISC programs, the average dynamic branch frequency is often between 15% and 25%, meaning that 3-6 instructions execute between a pair of branches2. When we only consider the stalls caused by branch, we will have at least 1 + 0.4 * 0.2 = 1.08 CPI, while state-of-the-art pipeline with BPB can have 90% accuracy (do not consider the hit rate), which means we can get 1 + 0.1 * 0.2 = 1.02 CPI based on the classic 5-stage pipeline. With the introduction of Superscalar and IPC (Instructions per cycle), the performance differential may even grow larger.

Although branch predictor is promising, it also incurs new challenge.

Modern architecture's depth has been designed as more than 5 stages to lower the third item, thus leading to the extremely high misprediction penalty. Take the Intel Core i7 pipeline structure as an example, the total pipeline depth is 14 stages, with branch mispredictions typically costing 17 cycles2. To obtain better performance, thus we need more sophisticated branch prediction mechanisms.

Hint: In this part, you have to at least design a 2-bit predictor scheme, while other strategies are also welcomed.

BPB Module

Our BPB offers another choice for the update of PC, meanwhile, BPB should watch cautiously how CPU is going. You need to implement bpb module defined as below.

I recommend you to design a structure like cache which can scale well when we change the parameters, and separating the controller module out can give you more flexibility to change your strategy. Due to its simplicity, we will not give further setting, and it's your time to design.

Report Requirements