# HW2

Suppose we have a deeply pipelined processor, for which we implement a branch-target buffer for the conditional branches only. Assume that the misprediction penalty is always four cycles and the buffer miss penalty is always three cycles. Assume a 90% hit rate, 90% accuracy, and 15% branch frequency. How much faster is the processor with the branch target buffer versus a processor that has a fixed two-cycle branch penalty? Assume a base clock cycle per instruction (CPI) without branch stalls of one.

• 固定两周期分支惩罚的 CPU 的 CPI：$1+15\%\times2=1.3$
• 拥有 BTB 的 CPU 的 CPI：$1+15\%\times\left(\left(1-90\%\right)\times4+90\%\times\left(1-90\%\right)\times3\right)=1.1005$
• 后者比前者快 $\frac{1.3}{1.1005}\approx 1.18$ 倍

$P_{inBuffer} = P_{brFreq}\times P_{hitRate}\times(1 − P_{acc})\\ = 15\% \times 90\% \times 10\% = 0.0135 \\ P_{notInBuffer}= P_{brFreq}\times(1 − P_{hitRate}) \\ = 15\% \times (1 − 90\%) = 0.015$

$T = 1 + P_{inBuffer}\times T_{inBuffer} + P_{notInBuffer}\times T_{notInBuffer} = 1.099$

$T' = 1 + P_{brFreq}\times T_{fixed} = 1 + 2\times 15\% = 1.3$

Consider a branch-target buffer that has penalties of zero, two, and two clock cycles for correct conditional branch prediction, incorrect prediction, and a buffer miss, respectively. Consider a branch-target buffer design that distinguishes conditional and unconditional branches, storing the target address for a conditional branch and the target instruction for an unconditional branch.

• What is the penalty in clock cycles when an unconditional branch is found in the buffer?

0，因为其分支预测准确率是 100%，并且已经在 buffer 中不会发生 buffer miss。

• Determine the improvement from branch folding for unconditional branches. Assume a 90% hit rate, an unconditional branch frequency of 5%, and a two-cycle penalty for a buffer miss. How much improvement is gained by this enhancement?

$T = f_{uncFreq}\times [r_{hit}\times(−1) + (1 − r_{hit})\times2] \\ = 5\%\times[90\%\times(−1) + (1 − 90\%)\times2] \\ = −0.035$

$T'= f_{uncFreq}\times (1 − r_{hit})\times2 \\ = 5\%\times(1 − 90\%)\times2 \\ = 0.01$

An (m,n) correlating branch predictor uses the behavior of the most recent m executed branches to choose from $2^m$ predictors, each of which is an $n$-bit predictor. A two-level local predictor works in a similar fashion, but only keeps track of the past behavior of each individual branch to predict future behavior.

There is a design trade-off involved with such predictors: correlating predictors require little memory for history, which allows them to maintain 2-bit predictors for a large number of individual branches (reducing the probability of branch instructions reusing the same predictor), while local predictors require substantially more memory to keep history and are thus limited to tracking a relatively small number of branch instructions.

For this exercise, consider a (1,2) correlating predictor that can track four branches (requiring 16 bits) versus a (1,2) local predictor that can track two branches using the same amount of memory. For the following branch outcomes, provide each prediction, the table entry used to make the prediction, any updates to the table as a result of the prediction, and the final misprediction rate of each predictor. Assume that all branches up to this point have been taken. Initialize each predictor to the following:

• Table 1: branch outcomes

454 T
543 NT
777 NT
543 NT
777 NT
454 T
777 NT
454 T
543 T
• Table 2: Correlating predictor

Entry Branch Last outcome Predition
0 0 T T with one mispredition
1 0 NT NT
2 1 T NT
3 1 NT T
4 2 T T
5 2 NT T
6 3 T NT with one mispredition
7 3 NT NT
PC Branch Outcome Predition Entry Update
454 2 T T 4
543 3 NT NT 6 NT
777 1 NT NT 3 T with one mispredition
543 3 NT NT 7
777 1 NT T 3 NT
454 2 T T 5
777 1 NT T 2
454 2 T T 5
543 3 T NT 6 NT with one mispredition

• Table 3: Local predictor

Entry Branch Last 2 outcome(right is most recent) Predition
0 0 T,T T with one mispredition
1 0 T,NT NT
2 0 NT,T NT
3 0 NT T
4 1 T,T T
5 1 T,NT T with one mispredition
6 1 NT,T NT
7 1 NT,NT NT
PC Branch Outcome Predition Entry Update
454 0 T T 0 T
543 1 NT T 4 T with one mispredition
777 1 NT T 5 NT
543 1 NT NT 7
777 1 NT NT 7
454 0 T T 0
777 1 NT NT 7
454 0 T T 0
543 1 T NT 7 NT with one mispredition