Result: SMART Grid On Chip: Infusing intelligence to on-chip energy management
Further Information
Scaling petaflop supercomputers to exascale requires a 500x increase in FLOPS with only a 3x increase in total power consumption. In addition, computing systems implemented with nanometer-scale multi-gate field-effect transistors increasingly integrate heterogeneous cores, GPUs, and accelerators. The power delivery and energy management of such complex chip multi-processors (CMPs) is, therefore, a challenging research task spanning the system, architecture, circuit, and device layers of the stack. A set of on-chip energy management techniques analogous to supply-side and demand-side management in a SMART grid is developed. The techniques span the circuit and system layers. Intelligence is introduced into the operation of the on-chip power distribution network (PDN) to sense variations in the computational activity across cores and reconfigure the PDN to optimize energy efficiency. A circuit-level technique is developed that improves energy efficiency by implementing under-provisioned on-chip voltage regulators (OCVRs) interconnected through a switch network. An operating-system-level task scheduling heuristic distributes the workloads across the cores such that the required reconfiguration of the PDN is minimized. SPICE simulations indicate up to a 44% reduction in the energy consumption of the CMP. An evolving on-chip power delivery methodology is also developed, in which the reference voltages of the OCVRs are controlled through a particle swarm optimizer (PSO). The PSO counteracts the effects of transistor aging and of variations induced by process, temperature, and power supply noise in the load circuit, OCVRs, and on-chip timing sensors. The simulation results indicate average reductions of 35% in the power consumption and 5% in the operating temperature of the voltage domains. In addition, the end of life of the voltage domain is prolonged by a mean reduction of 40% in the aging-induced threshold voltage (Vth) shift.
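To make the idea of PSO-controlled reference voltages concrete, the following is a minimal sketch of a generic particle swarm optimizer applied to a toy problem. It is not the method used in the thesis: the cost function, the per-domain minimum voltages in `V_MIN`, the voltage bounds, and all swarm parameters are hypothetical values chosen only for illustration. The toy cost approximates dynamic power as the sum of squared reference voltages and adds a penalty whenever a domain falls below the voltage that its (assumed) timing sensor would require.

```python
import random

# Hypothetical minimum voltages (in volts) at which each domain meets timing.
# In a real system these would drift with aging, temperature, and noise.
V_MIN = [0.70, 0.80, 0.75]
V_LO, V_HI = 0.50, 1.00  # assumed allowed range for the reference voltages


def cost(vrefs):
    """Toy cost: power proxy (sum of V^2) plus a penalty for timing violations."""
    power = sum(v * v for v in vrefs)
    violation = sum(max(0.0, vmin - v) for v, vmin in zip(vrefs, V_MIN))
    return power + 100.0 * violation


def pso_minimize(cost_fn, dim, lo, hi, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO with positions clamped to [lo, hi]."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost_fn(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost_fn(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost
```

Calling `pso_minimize(cost, len(V_MIN), V_LO, V_HI)` returns the best reference-voltage vector found and its cost. In this toy setting the swarm is drawn toward the lowest voltages that still satisfy the assumed timing limits; re-running the search after `V_MIN` changes mimics, very loosely, how an evolving controller could track aging or temperature drift.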
Novel circuit techniques to detect and set the power supply voltages and to suppress power supply noise are developed. The run-time circuit techniques for power supply voltage detection and clamping are demonstrated for a heterogeneous 3-D integrated circuit through SPICE simulation of a device plane in a 22 nm technology and a power plane in a 45 nm technology. Power supply voltages of less than 1 V are successfully set and provided as a reference to an OCVR within 500 ns of initiating an active state, with a variation of less than 1% in the reference voltage. Noise on the power supply is suppressed through the use of hyperabrupt p-n junction varactors as on-chip decoupling capacitors. The voltage droops and overshoots on the on-chip power distribution network are suppressed by up to 60% as compared to metal-insulator-metal (MIM) or deep trench decoupling capacitors of the same capacitance. With approximately 42% of the data center power consumed by the processors and 15% by the cooling system, the run-time energy management techniques developed in this thesis have significant potential to reduce the running cost of exascale data centers.
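The scaling target and the data center shares quoted in the abstract can be turned into a short back-of-the-envelope calculation. The sketch below is illustrative only: the helper `facility_saving` is a hypothetical name, and applying a given processor-level power reduction uniformly to the whole processor share of a data center is an assumption made here, not a result reported in the thesis.

```python
# Figures quoted in the abstract.
FLOPS_SCALE = 500.0      # required increase in FLOPS (petaflop -> exaflop)
POWER_SCALE = 3.0        # allowed increase in total power consumption
PROCESSOR_SHARE = 0.42   # approximate share of data center power used by processors
COOLING_SHARE = 0.15     # approximate share used by the cooling system

# Required improvement in energy efficiency (FLOPS per watt).
efficiency_gain = FLOPS_SCALE / POWER_SCALE


def facility_saving(processor_share, processor_power_reduction):
    """Fraction of total facility power saved if processor power drops by the
    given fraction. Assumes (hypothetically) that the reduction applies to the
    entire processor share and ignores any knock-on change in cooling power."""
    return processor_share * processor_power_reduction


if __name__ == "__main__":
    print(f"Required FLOPS-per-watt improvement: {efficiency_gain:.1f}x")
    # Illustrative input only: a 35% processor-level reduction is assumed here.
    saving = facility_saving(PROCESSOR_SHARE, 0.35)
    print(f"Facility-level saving under that assumption: {saving:.1%}")
```

The first figure shows why exascale is primarily an energy-efficiency problem: FLOPS per watt must improve by roughly two orders of magnitude. The second shows how a processor-level saving is diluted at the facility level, since processors account for less than half of the total power in the figures quoted above.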