As the semiconductor industry struggles to maintain its momentum along the path set by Moore’s Law, three-dimensional integrated circuit (3D IC) technology has emerged as a promising solution for achieving higher integration density, better performance, and lower power consumption.
However, despite its significant improvement in electrical performance, 3D IC technology presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with a primary focus on two areas: low-power 3D clock tree design, and reliability degradation modeling and management.
Clock trees are essential parts of digital systems, and they dissipate a large amount of power because of their high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing total wire length, which yields sub-optimal results for power. In this dissertation, we formulate a 3D clock tree design flow that directly optimizes clock power. We also investigate a design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activity.
In 2D ICs, shutdown gates are commonly assumed to be cheap enough to apply at every clock node. In 3D ICs, by contrast, shutdown gates introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore design methodologies that produce the optimal allocation and placement of clock and control TSVs so that clock power is minimized. We show that the proposed synthesis flow saves significant clock power while accounting for the available TSV placement area.
Vertical integration also brings new reliability challenges, including TSV electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation, we set up an electrical/thermal/reliability co-simulation framework to capture the transient behavior of reliability loss in 3D ICs.
We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability-aware placement scheme enables co-design and co-optimization of both electrical and reliability properties, thus improving both the circuit’s performance and its lifetime. Our electrical/reliability co-design scheme avoids unnecessary design cycles and ad-hoc fixes that lead to sub-optimal performance.
Vertical integration also enables stacking DRAM on top of a CPU, providing high bandwidth and low latency. However, non-uniform voltage fluctuations and local thermal hotspots in the CPU layers are coupled into the DRAM layers, causing a non-uniform distribution of bit-cell leakage (and hence of bit flips). We propose a performance-power-resilience simulation framework to capture DRAM soft errors in 3D multi-core CPU systems.
In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes the CPU’s operating points to adjust the DRAM’s voltage noise and thermal condition at runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances the DRAM’s resilience without sacrificing performance. The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future.
BACKGROUND AND MOTIVATION
CMOS technology has reached a critical juncture where traditional device and interconnect scaling can no longer keep up with Moore’s Law. The underlying reason is that engineers face several dilemmas in advanced technology nodes. The first dilemma arises as chip feature size approaches the lower limits of current photolithography technology. Investment in next-generation lithography solutions is possible but costly. The second dilemma is the ever-increasing leakage current. For example, thin gate oxide results in substantial gate tunneling leakage as well as subthreshold leakage.
Employing metal gates and high-k dielectrics is an effective way to control leakage current; however, its compatibility with the CMOS process has raised some concerns. The third dilemma is the increasing dominance of wire delay and power in future technology nodes. Limitations on DRAM bus bandwidth and speed are inherent to off-chip integration and lead to a severe “memory wall” problem in the era of big data. Furthermore, such a coarse integration paradigm incurs huge dynamic power dissipation and inhibits scaling toward the envisioned exascale computing systems of the future.
LOW POWER 3D CLOCK TREE SYNTHESIS
The clock tree is one of the largest and most frequently switched networks in a 3D IC, which makes it a major contributor to total chip power in high-performance VLSI circuits (it can account for up to 70% of the chip’s total power). Naturally, we are interested in designing a low-power clock tree for 3D ICs. One well-known technique for low-power clock tree design is clock gating.
Clock gating exploits the fact that instructions are not executed with even frequency, which causes spatial and temporal variations in the “on” and “off” states of the sequential logic. The technique applies a control signal at selected intermediate clock tree nodes to shut down the clock signal and power supply of all their descendants (wires and sequential logic) when the downstream sequential logic is inactive, thereby reducing the dynamic power dissipated by wires, buffers, and sequential logic.
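The saving that gating offers can be illustrated with the standard switching-power relation P = activity x C x Vdd^2 x f. The sketch below is a toy model under stated assumptions, not the power model used in this dissertation: the subtree capacitances, the activity probabilities, and the per-gate overhead `gate_cap` are hypothetical values chosen only for illustration.

```python
def dynamic_power(cap_farads, vdd, freq_hz, activity=1.0):
    """Standard switching-power estimate: P = activity * C * Vdd^2 * f."""
    return activity * cap_farads * vdd ** 2 * freq_hz


def gated_power(subtrees, vdd, freq_hz, gate_cap=0.0):
    """Clock power when each subtree sits behind its own shutdown gate.

    subtrees: list of (capacitance in F, probability the subtree is active).
    gate_cap: hypothetical overhead capacitance per gate that always toggles
              (an assumption of this sketch, not a measured value).
    """
    total = 0.0
    for cap, p_active in subtrees:
        # The subtree only switches while its gate is enabled.
        total += dynamic_power(cap, vdd, freq_hz, p_active)
        # The gate itself still sees the clock on every cycle.
        total += dynamic_power(gate_cap, vdd, freq_hz)
    return total


# Hypothetical example: three subtrees with different activity levels.
subtrees = [(2e-12, 0.9), (3e-12, 0.2), (1e-12, 0.5)]
ungated = dynamic_power(sum(cap for cap, _ in subtrees), 1.0, 1e9)
gated = gated_power(subtrees, 1.0, 1e9, gate_cap=0.05e-12)
```

In this toy setting the gated tree dissipates less than the ungated one because the mostly idle subtrees rarely switch. The benefit shrinks as the per-gate overhead grows, which is one reason gate placement has to be selective.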
MODELING AND EM-AWARE LAYOUT OPTIMIZATION FOR TAPERED TSVS
Through-silicon vias (TSVs) provide vertical connections in 3D ICs. Because of their large dimensions and non-ideal etching process, TSV layout needs to be carefully optimized to balance peak current density and delay in digital circuits. This chapter investigates the TSV tapering effect, which is an inevitable byproduct of manufacturing based on Deep Reactive Ion Etching, and its impact on the TSV’s electrical properties. We show that the current crowding effect is more severe in realistic tapered TSVs than in ideal cylindrical TSVs.
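A first-order intuition for why tapering raises current density can be sketched by assuming the current spreads uniformly over each circular cross-section, so that J(z) = I / (pi r(z)^2) with a linearly varying radius. This is a deliberate simplification and not the non-uniform model proposed in this chapter: it ignores current crowding altogether and therefore only gives a lower bound on the true peak. The function names and dimensions are hypothetical.

```python
import math


def tapered_radius(z, height, r_top, r_bottom):
    """Radius at depth z, assuming a linear taper from r_top to r_bottom."""
    return r_top + (r_bottom - r_top) * z / height


def mean_current_density(current, z, height, r_top, r_bottom):
    """Cross-section-averaged current density (A/m^2) at depth z."""
    r = tapered_radius(z, height, r_top, r_bottom)
    return current / (math.pi * r * r)


def peak_mean_density(current, height, r_top, r_bottom, steps=100):
    """Largest cross-section-averaged density sampled along the TSV."""
    return max(
        mean_current_density(current, height * i / steps, height, r_top, r_bottom)
        for i in range(steps + 1)
    )


# Hypothetical dimensions: 50 um tall, 2.5 um top radius, 2.0 um bottom radius.
cylinder = peak_mean_density(1e-3, 50e-6, 2.5e-6, 2.5e-6)
tapered = peak_mean_density(1e-3, 50e-6, 2.5e-6, 2.0e-6)
```

Even under this uniform-spreading assumption, the narrow end of the tapered TSV carries a density higher by the factor (r_top / r_bottom)^2 than the cylinder with the top radius. Current crowding adds to that figure.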
We propose a non-uniform current density model for tapered TSVs that achieves considerable accuracy and speedup in estimating the current density distribution compared with existing models developed for cylindrical TSVs. We apply our model to perform a detailed study of (1) the impact of TSV tapering on peak current density, and (2) the wire sizing problem, which aims to minimize TSV-involved path delay under a second-order delay model while keeping the peak current density within tolerable levels. A new dynamic-programming-based heuristic is proposed to find the optimal wire configuration, which reduces both peak current density and delay, thereby improving the reliability and
RELIABILITY AWARE PLACEMENT FOR 3D ICs
In Chapter 4, current density simulation and thermal-mechanical simulation were performed for a single TSV. We used the peak current density inside a TSV as an EM constraint, and formulated a wire sizing problem to optimize the delay of a TSV-involved path. However, many recent experimental works on 3D ICs have shown that current density is not the only driving force for TSV EM. In this chapter, we start with the state-of-the-art EM analysis for TSVs, and then develop a detailed simulation framework to capture the TSV’s EM transient under various factors. We also introduce a methodology to investigate the material fracture trend in 3D ICs. The TSV EM and material fracture models are then integrated into the 3D IC placement flow, which aims to optimize both the performance and the reliability of the 3D IC.
VOLTAGE NOISE INDUCED DRAM SOFT ERROR REDUCTION TECHNIQUE FOR 3D-CPUs
Three-dimensional integration enables stacking DRAM on top of a CPU, providing high bandwidth and low latency. However, non-uniform voltage fluctuations and local thermal hotspots in the CPU layers are coupled into the DRAM layers, causing a non-uniform distribution of bit-cell leakage (and hence of bit flips). We propose a performance-power-resilience simulation framework to capture DRAM soft errors in 3D multi-core CPU systems. A dynamic resilience management (DRM) scheme is investigated, which adaptively tunes the CPU’s operating points to adjust the DRAM’s voltage noise and thermal condition at runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances the DRAM’s resilience without sacrificing performance.
CONCLUSIONS AND FUTURE WORK
3D ICs have shown promising improvements in performance and energy efficiency independent of costly transistor scaling. However, the expanded design space brought on by 3D integration imposes extra complexity on the physical design domain, including 3D clock tree synthesis. Furthermore, vertical vias introduce new sources of reliability degradation. In this dissertation, we present a novel clock tree synthesis flow for 3D ICs. We also develop a simulation framework to capture the trend of EM degradation and thermal-mechanical stress. These physical design methodologies are necessary to enhance the performance, power, and reliability of 3D ICs, and to push 3D ICs into full commercialization in the near future.
To deliver the clock signal throughout the three-dimensional space, we develop a clock tree synthesis algorithm for 3D ICs. Clock gating is applied to minimize the clock tree power. Unlike clock gating in a 2D IC, sending the “enable” signal to shutdown gates requires control TSVs, which compete with clock TSVs for placement resources. In contrast with the conventional clock tree synthesis flow, which constructs the clock tree based only on the geometric information of the clock sinks, our flow aims to cluster clock sinks with similar switching behaviors, so that one shutdown gate can control multiple clock sinks at the same time. In addition, our clock tree synthesis flow accounts for the placement white space available for TSVs, and is able to optimally allocate clock TSVs and control TSVs such that the overall clock power is minimized.
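The motivation for clustering sinks by switching behavior can be sketched as follows. A shutdown gate shared by several sinks must stay enabled whenever any of them is active, so the gate's on-time is the logical OR of the sinks' activity patterns. The sketch below is a hypothetical illustration, not the clustering algorithm of this dissertation: the per-cycle bit patterns and the greedy pair selection are assumptions, and geometric distance and TSV resources, which the actual flow accounts for, are ignored.

```python
from itertools import combinations


def merged_activity(a, b):
    """Fraction of cycles in which a gate shared by both sinks must stay on."""
    if len(a) != len(b):
        raise ValueError("activity patterns must cover the same cycles")
    return sum(x | y for x, y in zip(a, b)) / len(a)


def best_pair(patterns):
    """Pair of sink names whose shared gate would be enabled least often."""
    return min(
        combinations(sorted(patterns), 2),
        key=lambda pair: merged_activity(patterns[pair[0]], patterns[pair[1]]),
    )


# Hypothetical per-cycle activity (1 = sink needs the clock in that cycle).
patterns = {
    "s0": [1, 1, 0, 0, 0, 0, 0, 0],
    "s1": [1, 1, 1, 0, 0, 0, 0, 0],
    "s2": [0, 0, 0, 0, 1, 1, 1, 1],
    "s3": [0, 0, 0, 0, 0, 1, 1, 1],
}
pair = best_pair(patterns)
```

Here `s0` and `s1` are active in nearly the same cycles, so a gate shared between them is on for only 3 of 8 cycles, whereas pairing `s1` with `s2` would keep the gate on for 7 of 8 cycles and save almost nothing.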
We also investigate several reliability-aware physical design methodologies. The first case study concerns EM-aware delay optimization for a TSV-involved timing path. We develop a fast and accurate meshing strategy to predict the peak current density inside a TSV. The meshing strategy avoids time-consuming numerical simulation. To prevent EM degradation caused by high current, a wire sizing technique is used to control the peak current density while optimizing wire delay. We use dynamic programming to solve the EM-constrained delay optimization problem, and successfully minimize interconnect delay while meeting the EM constraint. In addition, we recognize the TSV tapering effect, which is a byproduct of TSV manufacturing, and show quantitatively that TSV tapering causes higher current density inside TSVs.
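The general shape of an EM-constrained wire sizing problem solved by dynamic programming can be sketched as below. This is a minimal illustration under stated assumptions, not the heuristic developed in this dissertation: it uses the first-order Elmore delay model, a uniform-current EM check of the form I / (width x thickness) <= J_max, and a single wire split into equal-length segments. All parameter names and values are hypothetical.

```python
def prune(solutions):
    """Keep only Pareto-optimal (downstream capacitance, delay) pairs."""
    kept = []
    for cap, delay, chosen in sorted(solutions, key=lambda s: (s[0], s[1])):
        # Sorted by capacitance, so a solution survives only if it is faster
        # than every solution with smaller or equal capacitance.
        if not kept or delay < kept[-1][1]:
            kept.append((cap, delay, chosen))
    return kept


def size_wire(n_segments, widths, seg_len, r_sheet, c_area,
              r_driver, c_load, current, thickness, j_max):
    """Minimum Elmore delay of a segmented wire under a current density limit.

    Returns (delay, widths ordered from driver to sink).
    """
    # EM constraint (uniform-current assumption): I / (w * t) <= j_max.
    legal = [w for w in widths if current / (w * thickness) <= j_max]
    if not legal:
        raise ValueError("no width satisfies the current density limit")

    # Build the wire from the sink toward the driver.
    # Each partial solution is (downstream capacitance, delay, widths so far).
    frontier = [(c_load, 0.0, ())]
    for _ in range(n_segments):
        candidates = []
        for cap, delay, chosen in frontier:
            for w in legal:
                r = r_sheet * seg_len / w   # segment resistance
                c = c_area * seg_len * w    # segment capacitance
                candidates.append(
                    (cap + c, delay + r * (c / 2 + cap), chosen + (w,))
                )
        frontier = prune(candidates)

    # The driver resistance charges the total capacitance.
    best = min(frontier, key=lambda s: s[1] + r_driver * s[0])
    return best[1] + r_driver * best[0], best[2][::-1]
```

Pruning is safe because a partial solution with both lower downstream capacitance and lower delay can never become worse than a dominated one once more upstream segments are added. A second-order delay model would need a richer state than the (capacitance, delay) pair used here.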
Furthermore, a more advanced and accurate TSV EM model is investigated. We set up a simulation framework using FEM to investigate the impact of electrical current, thermal stress, and temperature on TSV EM. Because of the large mismatch between the CTEs of copper and silicon, the TSV and the neighboring substrate area endure high thermal stress, which dramatically accelerates the migration of atoms inside TSVs. In addition, high thermal stress leads to delamination and cracking of the substrate. We develop accurate analytical models for TSV EM and TSV-induced material fracture, which replace time-consuming FEM simulations. These reliability models are then embedded into the 3D placement design flow, and we show that by locally rearranging the locations of TSVs and logic gates, circuit lifetime can be substantially increased and thermal stress can be reduced, with little degradation in circuit performance.
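As a point of reference for how current density and temperature enter an EM lifetime estimate, the classic Black's equation, MTTF = A x J^(-n) x exp(Ea / (k T)), can be sketched as below. This is a textbook baseline and not the model developed in this dissertation, which additionally accounts for thermal stress. The exponent `n` and activation energy `ea_ev` are placeholder values, not fitted parameters.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def mttf_ratio(j, temp_k, j_ref, temp_ref_k, n=2.0, ea_ev=0.9):
    """Lifetime at (j, temp_k) relative to a reference condition.

    Taking a ratio cancels the process-dependent prefactor A in Black's
    equation. n and ea_ev are placeholder values for illustration only.
    """
    current_term = (j_ref / j) ** n
    thermal_term = math.exp(
        ea_ev / BOLTZMANN_EV_PER_K * (1.0 / temp_k - 1.0 / temp_ref_k)
    )
    return current_term * thermal_term
```

With n = 2, doubling the current density at a fixed temperature cuts the estimated lifetime to one quarter, and raising the temperature shortens it further through the exponential term.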
Finally, we present a detailed simulation framework for analyzing DRAM transient faults induced by voltage noise in 3D CPUs. Significant correlations between CPU activity and the thermal and voltage noise behavior of the DRAM layers have been observed. We show that under certain DRAM resilience targets, the refresh period of current off-the-shelf DRAM (32 or 64 ms) is not sufficient; however, arbitrarily applying a faster DRAM refresh rate inevitably hurts performance. Based on our simulation framework, we propose a dynamic DRAM resilience management technique, which tunes the CPU frequency and DRAM refresh rate to maximize performance while meeting a long-term resilience target. Simulation results show that our management scheme achieves higher throughput than running at the nominal operating point.
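The trade-off between CPU frequency and DRAM refresh rate can be pictured as a constrained search over operating points. The sketch below is only an illustration of that idea, not the management scheme of this dissertation: it picks a single operating point rather than managing a long-term budget, and the two model functions are arbitrary placeholders standing in for results that would come from the simulation framework.

```python
def pick_operating_point(freqs_ghz, refresh_ms, throughput, error_rate, target):
    """Pick the (frequency, refresh period) pair with the highest throughput
    whose predicted error rate stays within the resilience target.

    throughput and error_rate are caller-supplied models. Returns None when
    no operating point meets the target.
    """
    feasible = [
        (f, t)
        for f in freqs_ghz
        for t in refresh_ms
        if error_rate(f, t) <= target
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda point: throughput(*point))


def toy_throughput(freq_ghz, refresh_period_ms):
    # Placeholder model: refresh overhead shrinks as the period grows.
    return freq_ghz * (1.0 - 0.5 / refresh_period_ms)


def toy_error_rate(freq_ghz, refresh_period_ms):
    # Placeholder model: errors grow with frequency and with refresh period.
    return 1e-3 * freq_ghz * refresh_period_ms


choice = pick_operating_point(
    [1.0, 2.0, 3.0], [16.0, 32.0, 64.0], toy_throughput, toy_error_rate, 0.07
)
```

Under these placeholder models the search settles on the highest frequency combined with the shortest refresh period, which shows how a faster refresh can buy headroom for a higher CPU frequency. Real models could shift the balance the other way.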
Source: University of Maryland
Author: Tiantao Lu