CC BY 4.0 （除特别声明或转载文章外）

如果这篇博客帮助到你，可以请我喝一杯咖啡~

Suppose we have a deeply pipelined processor, for which we implement a branch-target buffer for the conditional branches only. Assume that the misprediction penalty is always four cycles and the buffer miss penalty is always three cycles. Assume a 90% hit rate, 90% accuracy, and 15% branch frequency. How much faster is the processor with the branch target buffer versus a processor that has a fixed two-cycle branch penalty? Assume a base clock cycle per instruction (CPI) without branch stalls of one.

- 固定两周期分支惩罚的 CPU 的 CPI：$1+15\%\times2=1.3$
- 拥有 BTB 的 CPU 的 CPI：$1+15\%\times\left(\left(1-90\%\right)\times4+90\%\times\left(1-90\%\right)\times3\right)=1.1005$
- 后者比前者快 $\frac{1.3}{1.1005}\approx 1.18$ 倍

**正解**：

分支目标缓存中，计算罚时通过两种情况：缓存中有预测但是预测 出错；分支不在缓存中，分别用 $P_{inBuffer}$ 和 $P_{notInBuffer}$。命中率、准确率和分支频率分别用 $P_{hitRate}$、$P_{acc}$、$P_{brFreq}$ 表示。

\[P_{inBuffer} = P_{brFreq}\times P_{hitRate}\times(1 − P_{acc})\\ = 15\% \times 90\% \times 10\% = 0.0135 \\ P_{notInBuffer}= P_{brFreq}\times(1 − P_{hitRate}) \\ = 15\% \times (1 − 90\%) = 0.015\]因为缓存中有预测但预测失败的罚时 $T_{inBuffer}$ 是 4 个周期；缓存中没有预测的罚时 $T_{notInBuffer}$ 是 3 个周期。所以，采用分支目标缓存的时间为：

\[T = 1 + P_{inBuffer}\times T_{inBuffer} + P_{notInBuffer}\times T_{notInBuffer} = 1.099\]使用固定周期 $T_{fixed} = 2$ 分支，罚时为：

\[T' = 1 + P_{brFreq}\times T_{fixed} = 1 + 2\times 15\% = 1.3\]因此加速比为：$\frac{T’}{T}=\frac{1.3}{1.099}\approx 1.183$

Consider a branch-target buffer that has penalties of zero, two, and two clock cycles for correct conditional branch prediction, incorrect prediction, and a buffer miss, respectively. Consider a branch-target buffer design that distinguishes conditional and unconditional branches, storing the target address for a conditional branch and the target instruction for an unconditional branch.

- What is the penalty in clock cycles when an unconditional branch is found in the buffer?

0，因为其分支预测准确率是 100%，并且已经在 buffer 中不会发生 buffer miss。

**正解**：

无条件分支在缓存区中找到，存储的是目标指令，预测成功。相对于在缓冲区找到条件分支的目标地址，然后通过地址找指令，少了一次寻址。因此，时钟周期的惩罚是-1。

- Determine the improvement from branch folding for unconditional branches. Assume a 90% hit rate, an unconditional branch frequency of 5%, and a two-cycle penalty for a buffer miss. How much improvement is gained by this enhancement?

可将遇到分支的平均惩罚降低 $5\%\times\left(1-90\%\right)\times2=0.01$ 个时钟周期。

**正解**：

命中率用 $r_{hit}$ 表示、无条件分支频率用 $f_{uncFreq}$ 表示。对于使用了分支折叠的情况：

\[T = f_{uncFreq}\times [r_{hit}\times(−1) + (1 − r_{hit})\times2] \\ = 5\%\times[90\%\times(−1) + (1 − 90\%)\times2] \\ = −0.035\]对于没有使用分支折叠的情况：

\[T'= f_{uncFreq}\times (1 − r_{hit})\times2 \\ = 5\%\times(1 − 90\%)\times2 \\ = 0.01\]改进为 $T’− T = 0.01 − (−0.035) = 0.045$

An (m,n) correlating branch predictor uses the behavior of the most recent m executed branches to choose from $2^m$ predictors, each of which is an $n$-bit predictor. A two-level local predictor works in a similar fashion, but only keeps track of the past behavior of each individual branch to predict future behavior.

There is a design trade-off involved with such predictors: correlating predictors require little memory for history, which allows them to maintain 2-bit predictors for a large number of individual branches (reducing the probability of branch instructions reusing the same predictor), while local predictors require substantially more memory to keep history and are thus limited to tracking a relatively small number of branch instructions.

For this exercise, consider a (1,2) correlating predictor that can track four branches (requiring 16 bits) versus a (1,2) local predictor that can track two branches using the same amount of memory. For the following branch outcomes,

provide each prediction, the table entry used to make the prediction, any updates to the table as a result of the prediction, and the final misprediction rate of each predictor.Assume that all branches up to this point have been taken. Initialize each predictor to the following:

Table 1: branch outcomes

Branch PC(word address) Outcome 454 T 543 NT 777 NT 543 NT 777 NT 454 T 777 NT 454 T 543 T Table 2: Correlating predictor

Entry Branch Last outcome Predition 0 0 T T with one mispredition 1 0 NT NT 2 1 T NT 3 1 NT T 4 2 T T 5 2 NT T 6 3 T NT with one mispredition 7 3 NT NT

PC | Branch | Outcome | Predition | Entry | Update |
---|---|---|---|---|---|

454 | 2 | T | T | 4 | |

543 | 3 | NT | NT | 6 | NT |

777 | 1 | NT | NT | 3 | T with one mispredition |

543 | 3 | NT | NT | 7 | |

777 | 1 | NT | T |
3 | NT |

454 | 2 | T | T | 5 | |

777 | 1 | NT | T |
2 | |

454 | 2 | T | T | 5 | |

543 | 3 | T | NT |
6 | NT with one mispredition |

失败的预测在上图中已加粗，共三条，因此失败率为 $\frac{3}{9}\approx0.33$。

Table 3: Local predictor

Entry Branch Last 2 outcome(right is most recent) Predition 0 0 T,T T with one mispredition 1 0 T,NT NT 2 0 NT,T NT 3 0 NT T 4 1 T,T T 5 1 T,NT T with one mispredition 6 1 NT,T NT 7 1 NT,NT NT

PC | Branch | Outcome | Predition | Entry | Update |
---|---|---|---|---|---|

454 | 0 | T | T | 0 | T |

543 | 1 | NT | T |
4 | T with one mispredition |

777 | 1 | NT | T |
5 | NT |

543 | 1 | NT | NT | 7 | |

777 | 1 | NT | NT | 7 | |

454 | 0 | T | T | 0 | |

777 | 1 | NT | NT | 7 | |

454 | 0 | T | T | 0 | |

543 | 1 | T | NT |
7 | NT with one mispredition |

失败的预测在上图中已加粗，共三条，因此失败率为 $\frac{3}{9}\approx0.33$。