Date: 14 Apr 2014
Time: 1:00 PM - 2:00 PM
Location: 3043 ECpE Building Addition
Title: Energy-Efficient and Cost-Effective Schemes of Memory Error Protection
Speaker: Zhao Zhang, Associate Professor
Abstract: Computer memory reliability is a growing concern as memory cell density scales up. Two large-scale studies of data center computers have reported alarmingly high raw memory error rates. Meanwhile, memory performance and memory energy consumption remain first-order considerations in multi-core computer design. Conventional memory error protection schemes, used in workstations and servers, are energy-inefficient, limited in protection strength, or both. Most consumer-level computers and devices have no memory error protection at all.
This talk presents two energy-efficient and cost-effective memory error protection schemes. The first, enhanced embedded ECC (E3CC), adds ECC protection to conventional non-ECC memories and to power-efficient sub-ranked memories. ECC bits are embedded in the same DRAM rows that store the data bits they protect, which reduces memory energy consumption. The scheme includes a novel non-power-of-two memory address mapping based on the Chinese Remainder Theorem, plus a couple of design optimizations. The second, MemGuard, uses log hash to provide strong detection of multi-bit errors. It incurs negligible hardware cost and energy overhead, and no memory storage overhead. Coupled with OS-based checkpointing for error recovery, MemGuard may substitute for ECC memory in consumer-level computers and mobile devices, or complement SECDED ECC memory or Chipkill Correct memory by providing even stronger multi-bit error detection.
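For readers unfamiliar with why the Chinese Remainder Theorem is useful for non-power-of-two address mapping, the following is a minimal Python sketch of the general idea only. It is not the E3CC mapping presented in the talk: the moduli 7 and 8, the function names, and the digit-folding trick for computing a remainder modulo 7 are all assumptions chosen for illustration. The point is that when two moduli are coprime, the pair of remainders identifies an address uniquely, and a remainder modulo 2^k - 1 can be computed with shifts and adds instead of a divider.

```python
from math import gcd


def crt_map(addr, m1, m2):
    """Map addr in [0, m1*m2) to the residue pair (addr mod m1, addr mod m2).

    Hypothetical helper: with coprime m1 and m2, the Chinese Remainder
    Theorem guarantees that this mapping is one-to-one.
    """
    if gcd(m1, m2) != 1:
        raise ValueError("moduli must be coprime")
    if not 0 <= addr < m1 * m2:
        raise ValueError("address out of range")
    return addr % m1, addr % m2


def crt_unmap(r1, r2, m1, m2):
    """Recover the unique addr in [0, m1*m2) from its residue pair."""
    inv = pow(m1, -1, m2)          # modular inverse (Python 3.8+)
    k = ((r2 - r1) * inv) % m2     # solve r1 + m1*k = r2 (mod m2)
    return r1 + m1 * k


def mod7_by_folding(x):
    """Compute x mod 7 without division.

    Because 8 = 1 (mod 7), summing the 3-bit digits of x preserves its
    remainder modulo 7, so repeated folding shrinks x to a single digit.
    """
    while x > 7:
        s = 0
        while x:
            s += x & 7
            x >>= 3
        x = s
    return 0 if x == 7 else x


if __name__ == "__main__":
    # Illustrative moduli only: 7 (non-power-of-two) and 8 (power of two).
    for addr in (0, 1, 13, 55):
        pair = crt_map(addr, 7, 8)
        print(addr, "->", pair, "->", crt_unmap(*pair, 7, 8))
```

In this toy setting the remainder modulo 8 is just the low three address bits, while the remainder modulo 7 comes from the folding routine, so neither component needs a general-purpose divider.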