体系结构HW2

post on 12 Nov 2021 about 4167words require 14min
CC BY 4.0 （除特别声明或转载文章外）
如果这些文字帮助到你，可以请我喝一杯咖啡~

Suppose we have a deeply pipelined processor, for which we implement a branch-target buffer for the conditional branches only. Assume that the misprediction penalty is always four cycles and the buffer miss penalty is always three cycles. Assume a 90% hit rate, 90% accuracy, and 15% branch frequency. How much faster is the processor with the branch target buffer versus a processor that has a fixed two-cycle branch penalty? Assume a base clock cycle per instruction (CPI) without branch stalls of one.

固定两周期分支惩罚的 CPU 的 CPI：$1+15\%\times2=1.3$
拥有 BTB 的 CPU 的 CPI：$1+15\%\times\left(\left(1-90\%\right)\times4+90\%\times\left(1-90\%\right)\times3\right)=1.1005$
后者比前者快 $\frac{1.3}{1.1005}\approx 1.18$ 倍

正解：

分支目标缓存中，计算罚时通过两种情况：缓存中有预测但是预测出错；分支不在缓存中，分别用 $P_{inBuffer}$ 和 $P_{notInBuffer}$。命中率、准确率和分支频率分别用 $P_{hitRate}$、$P_{acc}$、$P_{brFreq}$ 表示。

\[P_{inBuffer} = P_{brFreq}\times P_{hitRate}\times(1 − P_{acc})\\ = 15\% \times 90\% \times 10\% = 0.0135 \\ P_{notInBuffer}= P_{brFreq}\times(1 − P_{hitRate}) \\ = 15\% \times (1 − 90\%) = 0.015\]

因为缓存中有预测但预测失败的罚时 $T_{inBuffer}$ 是 4 个周期；缓存中没有预测的罚时 $T_{notInBuffer}$ 是 3 个周期。所以，采用分支目标缓存的时间为：

\[T = 1 + P_{inBuffer}\times T_{inBuffer} + P_{notInBuffer}\times T_{notInBuffer} = 1.099\]

使用固定周期 $T_{fixed} = 2$ 分支，罚时为：

\[T' = 1 + P_{brFreq}\times T_{fixed} = 1 + 2\times 15\% = 1.3\]

因此加速比为：$\frac{T’}{T}=\frac{1.3}{1.099}\approx 1.183$

Consider a branch-target buffer that has penalties of zero, two, and two clock cycles for correct conditional branch prediction, incorrect prediction, and a buffer miss, respectively. Consider a branch-target buffer design that distinguishes conditional and unconditional branches, storing the target address for a conditional branch and the target instruction for an unconditional branch.

What is the penalty in clock cycles when an unconditional branch is found in the buffer?

0，因为其分支预测准确率是 100%，并且已经在 buffer 中不会发生 buffer miss。

正解：

无条件分支在缓存区中找到，存储的是目标指令，预测成功。相对于在缓冲区找到条件分支的目标地址，然后通过地址找指令，少了一次寻址。因此，时钟周期的惩罚是-1。

Determine the improvement from branch folding for unconditional branches. Assume a 90% hit rate, an unconditional branch frequency of 5%, and a two-cycle penalty for a buffer miss. How much improvement is gained by this enhancement?

可将遇到分支的平均惩罚降低 $5\%\times\left(1-90\%\right)\times2=0.01$ 个时钟周期。

正解：

命中率用 $r_{hit}$ 表示、无条件分支频率用 $f_{uncFreq}$ 表示。对于使用了分支折叠的情况：

\[T = f_{uncFreq}\times [r_{hit}\times(−1) + (1 − r_{hit})\times2] \\ = 5\%\times[90\%\times(−1) + (1 − 90\%)\times2] \\ = −0.035\]

对于没有使用分支折叠的情况：

\[T'= f_{uncFreq}\times (1 − r_{hit})\times2 \\ = 5\%\times(1 − 90\%)\times2 \\ = 0.01\]

改进为 $T’− T = 0.01 − (−0.035) = 0.045$

An (m,n) correlating branch predictor uses the behavior of the most recent m executed branches to choose from $2^m$ predictors, each of which is an $n$-bit predictor. A two-level local predictor works in a similar fashion, but only keeps track of the past behavior of each individual branch to predict future behavior.

There is a design trade-off involved with such predictors: correlating predictors require little memory for history, which allows them to maintain 2-bit predictors for a large number of individual branches (reducing the probability of branch instructions reusing the same predictor), while local predictors require substantially more memory to keep history and are thus limited to tracking a relatively small number of branch instructions.

For this exercise, consider a (1,2) correlating predictor that can track four branches (requiring 16 bits) versus a (1,2) local predictor that can track two branches using the same amount of memory. For the following branch outcomes, provide each prediction, the table entry used to make the prediction, any updates to the table as a result of the prediction, and the final misprediction rate of each predictor. Assume that all branches up to this point have been taken. Initialize each predictor to the following:

Table 1: branch outcomes

Branch PC(word address) Outcome

454 T

543 NT

777 NT

543 NT

777 NT

454 T

777 NT

454 T

543 T

Table 2: Correlating predictor

Entry Branch Last outcome Predition

0 0 T T with one mispredition

1 0 NT NT

2 1 T NT

3 1 NT T

4 2 T T

5 2 NT T

6 3 T NT with one mispredition

7 3 NT NT

Branch PC(word address)	Outcome
454	T
543	NT
777	NT
543	NT
777	NT
454	T
777	NT
454	T
543	T

Entry	Branch	Last outcome	Predition
0	0	T	T with one mispredition
1	0	NT	NT
2	1	T	NT
3	1	NT	T
4	2	T	T
5	2	NT	T
6	3	T	NT with one mispredition
7	3	NT	NT

PC	Branch	Outcome	Predition	Entry	Update
454	2	T	T	4
543	3	NT	NT	6	NT
777	1	NT	NT	3	T with one mispredition
543	3	NT	NT	7
777	1	NT	T	3	NT
454	2	T	T	5
777	1	NT	T	2
454	2	T	T	5
543	3	T	NT	6	NT with one mispredition

失败的预测在上图中已加粗，共三条，因此失败率为 $\frac{3}{9}\approx0.33$。

Table 3: Local predictor

Entry Branch Last 2 outcome(right is most recent) Predition

0 0 T,T T with one mispredition

1 0 T,NT NT

2 0 NT,T NT

3 0 NT T

4 1 T,T T

5 1 T,NT T with one mispredition

6 1 NT,T NT

7 1 NT,NT NT

Entry	Branch	Last 2 outcome(right is most recent)	Predition
0	0	T,T	T with one mispredition
1	0	T,NT	NT
2	0	NT,T	NT
3	0	NT	T
4	1	T,T	T
5	1	T,NT	T with one mispredition
6	1	NT,T	NT
7	1	NT,NT	NT

PC	Branch	Outcome	Predition	Entry	Update
454	0	T	T	0	T
543	1	NT	T	4	T with one mispredition
777	1	NT	T	5	NT
543	1	NT	NT	7
777	1	NT	NT	7
454	0	T	T	0
777	1	NT	NT	7
454	0	T	T	0
543	1	T	NT	7	NT with one mispredition

失败的预测在上图中已加粗，共三条，因此失败率为 $\frac{3}{9}\approx0.33$。

Related posts

Tilelang 入门-L2 友好的矩阵乘 05 Jul 2025

DAC25 GoPTX Script 25 Jun 2025

今日此时所想之事（四） 21 Jun 2025

Loading comments...