[Repost] Computer Architecture (2): Memory Data Retention and Refresh


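Editor's Summary

## DRAM Data Retention and Refresh

**Problem:** How can the refresh overhead of DRAM be reduced?

**Solution:** Instead of refreshing every row every 64 ms, refresh each row at a rate that matches its retention time.

**Key ideas:**

* Manufacturing process variation gives different rows different retention times.
* Most rows retain data far longer than 64 ms; only a few leaky rows need the default rate.
* Rows can therefore be grouped by retention time and refreshed at different rates.

**Steps:**

1. **Profiling:** identify the retention time of every row. The failure counts at 128 ms and 256 ms refresh intervals show that the overwhelming majority of DRAM rows can be refreshed much less often.
2. **Binning:** group rows with similar retention times into bins. Bloom filters store bin membership compactly.
3. **Refreshing:** the memory controller refreshes the rows in each bin at that bin's rate.
4. **Result:** fewer refreshes and lower energy, at a low hardware cost.

**Challenges:**

* Determining a row's retention time is hard: it depends on the data pattern stored in nearby cells, and it can change randomly over time.
* The memory controller cannot easily tell which bits interfere with each other, because the mapping of addresses to physical DRAM geometry is opaque.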

Main Text

https://zhuanlan.zhihu.com/p/433151653

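I am lino, a graduate student about to graduate, and I am recording my learning process here. These notes follow the Computer Architecture course at ETH Zurich.

When data is stored in memory, retaining it is a problem: the data may be lost. This installment therefore takes a closer look at DRAM to understand how its data is retained and refreshed.

In a DRAM cell, the capacitor is the storage device and the transistor is the access device. When the wordline applies a sufficient turn-on voltage to the transistor, the charge from the capacitor gets shared through the bitline. The basic problem in DRAM is that the capacitor charge leaks over time: even when the cell is not being accessed, there is a leakage path between the capacitor and the bitline that drains the stored charge. This is only one of many leakage paths. Leakage also depends on temperature: higher temperature makes the charge leak faster, so refresh must happen more often. Under normal conditions each cell is refreshed about once every 64 ms. The downsides of refresh are summarized below.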

Energy consumption: each refresh consumes energy

Performance degradation: the DRAM rank/bank is unavailable while it is being refreshed

QoS/predictability impact: (long) pause times during refresh (when the processor tries to access memory while a refresh is in progress, it has to wait)

Refresh rate limits DRAM capacity scaling

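There are two ways to increase DRAM capacity: one is to make the cells smaller, the other is to put in more cells. Refresh limits both. The smaller a cell is, the less charge its capacitor can store, the faster that charge leaks away, and the more susceptible the cell is to noise, so the refresh rate increases with cell density. The lecture's figure plots device capacity against the time spent refreshing: as capacity grows, refresh ends up taking 46% of the time, which means the processor cannot access DRAM for 46% of the time.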
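A rough back-of-the-envelope model may help show why the time lost to refresh grows with capacity. It assumes the controller issues one refresh command every tREFI and the rank is busy for tRFC after each one, so the lost fraction is tRFC / tREFI. The function name and the tRFC values are illustrative assumptions, not numbers from the lecture.

```python
def refresh_busy_fraction(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time a rank is busy refreshing: tRFC / tREFI, capped at 1."""
    if t_rfc_ns < 0 or t_refi_ns <= 0:
        raise ValueError("tRFC must be non-negative and tREFI must be positive")
    return min(1.0, t_rfc_ns / t_refi_ns)


# Assumption: 8192 refresh commands cover every row once per 64 ms window.
T_REFI_NS = 64_000_000 / 8192  # 7812.5 ns between refresh commands

# Hypothetical tRFC values (ns): denser devices need longer refresh commands.
for t_rfc_ns in (350.0, 1000.0, 3500.0):
    fraction = refresh_busy_fraction(t_rfc_ns, T_REFI_NS)
    print(f"tRFC = {t_rfc_ns:6.0f} ns -> busy {fraction:.1%} of the time")
```

Because tREFI is fixed by the 64 ms window while tRFC grows with the number of rows refreshed per command, the ratio can only get worse as devices get denser.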

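How do we solve the problem?

One key idea is to ask: do we really need to refresh all rows within every 64 ms? This raises a further question: could we refresh only the rows the processor has actually allocated and skip the unused ones? Suppose an 8 GB DRAM has only 4 GB in use; the rows in the other 4 GB would not need refreshing. This is not done today because there is no interface for telling the DRAM controller which rows the processor has allocated. That leads to the next question:

What if we knew what happened underneath (in DRAM cells) and exposed that information to upper layers?

In fact, we do not need to refresh every cell every 64 ms. Some cells store a lot of charge and are not leaky; they can hold data for tens of seconds. Other cells are small and very leaky, and must be refreshed every 64 ms. The retention-time profile shows that only a very small fraction of cells actually need the 64 ms refresh.

Why does this profile exist? Because manufacturing is not perfect: not all DRAM cells are exactly the same, and some are more leaky than others. This is called manufacturing process variation, and it gives us an opportunity to exploit the profile.

Assuming we know the retention time of each row, the following questions arise: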

What can we do with this information?

Who do we expose this information to?

How much information do we expose? (The more we expose, the more complex verification becomes.)

How do we determine this profile information?

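The relationship between the DRAM refresh interval and the cumulative cell failure probability is as follows:

Refreshing every 128 ms causes roughly 30 cells to fail, and refreshing every 256 ms causes roughly 1000 cells to fail. In other words, the overwhelming majority of DRAM rows can be refreshed much less often without losing data.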
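To get a feel for what these failure counts imply, here is a small arithmetic sketch. It assumes the figure of about 1000 is cumulative (it includes the 30), that every failing cell sits in a different row (the worst case), and that all other rows are safe at 256 ms. `TOTAL_ROWS` is a hypothetical device size, not a number from the lecture.

```python
def refreshes_per_second(rows_by_interval_ms: dict) -> float:
    """Total row refreshes per second, given {refresh interval in ms: row count}."""
    return sum(rows * 1000.0 / interval_ms
               for interval_ms, rows in rows_by_interval_ms.items())


TOTAL_ROWS = 2 ** 20      # hypothetical number of rows in the device
WEAK_ROWS_64MS = 30       # rows holding a cell that fails at a 128 ms interval
WEAK_ROWS_128MS = 970     # rows holding a cell that fails only at 256 ms (1000 - 30)

baseline = refreshes_per_second({64: TOTAL_ROWS})
multi_rate = refreshes_per_second({
    64: WEAK_ROWS_64MS,
    128: WEAK_ROWS_128MS,
    256: TOTAL_ROWS - WEAK_ROWS_64MS - WEAK_ROWS_128MS,
})
reduction = 1.0 - multi_rate / baseline

print(f"baseline:   {baseline:,.0f} row refreshes/s")
print(f"multi-rate: {multi_rate:,.0f} row refreshes/s")
print(f"reduction:  {reduction:.1%}")
```

Under these assumptions almost every row moves from a 64 ms to a 256 ms interval, so the saving approaches the 75% upper bound set by the ratio of the two intervals.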

This motivates the following approach (see RAIDR: Retention-Aware Intelligent DRAM Refresh):

1. Profiling: identify the retention time of all DRAM rows

2. Binning: store rows into bins by retention time

-> use Bloom filters for efficient and scalable storage

3. Refreshing: the memory controller refreshes rows in different bins at different rates

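The experimental and simulation results follow: refresh rate and energy drop noticeably, and the hardware cost is low.

Digging deeper: how do we make RAIDR work? The details of each step follow.

The challenges of the profiling step are: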

Data pattern dependence of retention time

---- Retention time of a DRAM cell depends on its value and the values of cells nearby it

Variable retention time phenomenon

Data pattern dependence of retention time

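In detail: when a row is activated, all bitlines are perturbed at the same time, because the cell capacitors release their charge onto the bitlines. This perturbation is noise that affects the DRAM, and the noise in turn depends on the values stored in nearby cells.

The following couplings are involved:

Bitline-bitline coupling -> electrical coupling between adjacent bitlines

Bitline-wordline coupling -> electrical coupling between each bitline and the activated wordline

The retention time of the data in a cell therefore depends on the data patterns stored in nearby cells, so we should find the worst data pattern in order to find the worst-case retention time. However, it is hard for the memory controller to know which bits interfere with each other, because the mapping of addresses to physical DRAM geometry is opaque: addresses are remapped inside the DRAM, and faulty bitlines/wordlines are remapped to spare ones. Moreover, as cells keep shrinking, the coupling noise becomes harder to model and harder to detect.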
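The search for a worst-case data pattern can be sketched as follows. This is only an illustration: the pattern set, `toy_measure`, and `worst_case_retention_ms` are hypothetical names, and `toy_measure` is a made-up stand-in for a real hardware retention test. Because of the opaque address mapping described above, logically adjacent bits may not be physically adjacent, so a fixed pattern set like this one is not guaranteed to hit the true worst case.

```python
import random


def candidate_patterns(row_bits: int, seed: int = 0) -> dict:
    """A few commonly used test patterns for one row of `row_bits` bits."""
    rng = random.Random(seed)
    checker = [i % 2 for i in range(row_bits)]
    return {
        "all_zeros": [0] * row_bits,
        "all_ones": [1] * row_bits,
        "checkerboard": checker,
        "inverse_checkerboard": [1 - bit for bit in checker],
        "random": [rng.randint(0, 1) for _ in range(row_bits)],
    }


def worst_case_retention_ms(measure, row_bits: int) -> float:
    """Shortest retention time seen across all candidate patterns.

    `measure(pattern)` is a hypothetical callback that returns the retention
    time in ms observed when the row stores `pattern`.
    """
    return min(measure(p) for p in candidate_patterns(row_bits).values())


def toy_measure(pattern) -> float:
    """Made-up model: more 0/1 transitions between neighbours, shorter retention."""
    transitions = sum(a != b for a, b in zip(pattern, pattern[1:]))
    return 1024.0 - 512.0 * transitions / (len(pattern) - 1)


print(worst_case_retention_ms(toy_measure, row_bits=8))
```

Taking the minimum over patterns is the conservative choice: the row is binned by the shortest retention time any tested pattern produced.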

Variable retention time phenomenon

The retention time of a DRAM cell changes randomly over time, as shown in the figure below.

Binning

This step uses a Bloom filter.

Probabilistic data structure that compactly represents set membership (presence or absence of an element in a set)

Non-approximate set membership: use 1 bit per element to indicate absence/presence of each element from an element space of N elements

Approximate set membership: use a much smaller number of bits and indicate each element's presence/absence with a subset of those bits
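A minimal software sketch of the approximate approach is shown below. The class name, the sizes, and the use of SHA-256 to derive bit positions are choices made here for illustration; a hardware implementation would use much simpler hash functions.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, possible false positives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item) -> bool:
        # Present only if every one of the item's bits is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical row addresses that profiling placed in the 64 ms bin.
bin_64ms = BloomFilter(num_bits=1024, num_hashes=3)
for row in (3, 77, 4096):
    bin_64ms.add(row)
print(all(row in bin_64ms for row in (3, 77, 4096)))  # inserted rows are always found
```

In the refresh setting a false positive only means a row is refreshed more often than necessary, which costs some energy but never loses data.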

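The professor also went into more detail on Bloom filters here; the following article explains them concisely as well.

Principles and Implementation of Bloom Filters: www.jianshu.com/p/88c6ac4b38c8

The advantages and disadvantages are as follows: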

To summarize the RAIDR refresh controller:

Choose a refresh candidate row

Determine which bin the row is in

Determine if refreshing is needed
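The three steps can be sketched as a simple loop. This is a simplified illustration, not the actual controller: it assumes three bins (64 ms, 128 ms, and a default of 256 ms), visits every row once per 64 ms period instead of spreading refreshes over the period, and uses plain Python sets where the design described above would use Bloom filters. The function name and the example bins are hypothetical.

```python
def rows_to_refresh(period: int, num_rows: int, bin_64ms, bin_128ms) -> list:
    """Rows that need refreshing in the given 64 ms period (0, 1, 2, ...)."""
    refreshed = []
    for row in range(num_rows):              # 1. choose a refresh candidate row
        if row in bin_64ms:                  # 2. determine which bin the row is in
            interval_periods = 1             #    refresh every 64 ms
        elif row in bin_128ms:
            interval_periods = 2             #    refresh every 128 ms
        else:
            interval_periods = 4             #    default bin: every 256 ms
        if period % interval_periods == 0:   # 3. determine if refreshing is needed
            refreshed.append(row)
    return refreshed


# Hypothetical profiling result for an 8-row device.
weak_64 = {1}
weak_128 = {2, 3}
for period in range(5):
    print(period, rows_to_refresh(period, 8, weak_64, weak_128))
```

Any object that supports the `in` operator works for the bins, so a Bloom filter can replace the sets without changing the loop.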

The figure below shows where RAIDR is implemented.

It can of course also be implemented in the memory controller.

Published 2021-11-16 21:23