## 24.8 A 100GB/s Wide I/O with 4096b TSVs Through an Active Silicon Interposer with In-Place Waveform Capturing

Satoshi Takaya<sup>1</sup>, Makoto Nagata<sup>1</sup>, Atsushi Sakai<sup>2</sup>, Takashi Kariya<sup>2</sup>, Shiro Uchiyama<sup>2</sup>, Harufumi Kobayashi<sup>2</sup>, Hiroaki Ikeda<sup>2</sup>

<sup>1</sup>Kobe University, Kobe, Japan,

<sup>2</sup>Association of Super-Advanced Electronics Technologies, Tokyo, Japan

Three-dimensional (3D) stacking of memory chips is a promising direction for implementing memory systems in mobile applications [1-2] and for low-cost high-performance computation [3]. The requirements are extremely low power consumption, high data bandwidth, stability and scalability of operation, as well as large storage capacity with a small footprint. A digital control chip at the base of the stack is needed to efficiently access the 3D memory hierarchy, as well as to emulate a standard memory interface for compatibility. The overall performance and yield of a 3D system are constrained by the vertical communication channels among the stacked chips, as well as by the connections to the PCB. However, the empirical models presently used in the design stage do not properly represent the electrical and mechanical properties and performance variations of through-silicon vias (TSVs) and microbumps (µBumps). Circuit techniques that handle such uncertainties are needed to create robust 3D data links. This paper presents a complete test vehicle for TSV-based wide I/O data communication in a three-tier 3D chip stack assembled in a BGA package. In-place eye-diagram and waveform capturers are mounted in an active silicon interposer to characterize vertical signaling through the chain of TSVs and µBumps.

The test vehicle is shown in Fig. 24.8.1. The 4096b wide I/O TSV data bus is capable of 100GB/s source-synchronous bidirectional data transfer at 200MHz and 0.56mW/Gb/s with a 1.2V supply. The three-tier stack has a memory chip (MEM) on the top, an active silicon interposer (ASIP) in the middle, and a logic chip (LOGIC) at the bottom, all fabricated in 90nm CMOS technology. The silicon area of each chip is $9.9 \times 9.9\,\text{mm}^2$. Via-last Cu TSVs with a 50µm pitch and chip-to-chip stacking processes [4] are used. Vertical connections with more than 7.3k TSVs and the same number of µBumps are densely integrated. The stack is mounted on an FR-4 interposer of a 527-pin BGA and assembled with a system PCB.
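The headline throughput can be checked from the bus width and clock rate quoted above. The sketch below is only a back-of-the-envelope calculation; it assumes one transfer per clock cycle (an assumption that is consistent with the quoted numbers but not stated explicitly in the text), and the function name is illustrative.

```python
# Raw bandwidth of a parallel bus from its width and clock rate.
BUS_WIDTH_BITS = 4096      # wide I/O data width (from the text)
CLOCK_HZ = 200e6           # transfer clock (from the text)
TRANSFERS_PER_CYCLE = 1    # ASSUMPTION: single data rate


def raw_bandwidth(width_bits, clock_hz, transfers_per_cycle=1):
    """Return (Gb/s, GB/s) for a parallel bus."""
    bits_per_second = width_bits * clock_hz * transfers_per_cycle
    return bits_per_second / 1e9, bits_per_second / 8e9


gbit_per_s, gbyte_per_s = raw_bandwidth(BUS_WIDTH_BITS, CLOCK_HZ, TRANSFERS_PER_CYCLE)
print(f"{gbit_per_s:.1f} Gb/s = {gbyte_per_s:.1f} GB/s")
```

Under this assumption the raw rate is 819.2Gb/s, or 102.4GB/s, which agrees with the rounded 100GB/s figure.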

In write (W) mode, words with a data width of 4096b are sent from LOGIC and stored in the 800KB of SRAM in MEM, while in read (R) mode, words from MEM are sent to LOGIC. The built-in self-test (BIST) mechanisms generate data bits at speed in both W and R modes and compare the received data bits with the expected ones. The bit patterns and sequences are defined by the data generator macros RG and WG and have a variety of formats, such as all bits alternating in a "checkerboard" style $(5 \rightarrow A \rightarrow 5 \rightarrow A \rightarrow ...)$ or in a "plain" style $(0 \rightarrow F \rightarrow 0 \rightarrow F \rightarrow ...)$, or all bits fixed at 0 or 1. Selected bytes of the data bits can be masked for partial data transfers. During BIST, erroneous bits are detected by the data checker macros RC and WC in accordance with the selected data format. The number of failed bits is accumulated in a fail register during repeated operations. The BIST and wide I/O configurations, together with status information, are defined in the respective registers and accessed via the I2C protocol. The I2C transactions and the scan chains for test and debug all run through vertical communication channels in parallel with the wide I/O link.
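The pattern formats and the fail-bit accounting described above can be modeled in software. The following is a minimal behavioral sketch, not the hardware macros themselves: the function names are hypothetical, and it assumes that each format fills the whole 4096b word with one repeated hex digit and alternates between two such words on successive transfers.

```python
WIDTH = 4096                      # data width in bits (from the text)
FULL_MASK = (1 << WIDTH) - 1      # compare every bit


def repeat_nibble(nibble, width=WIDTH):
    """Build a width-bit word by repeating one 4-bit value."""
    word = 0
    for _ in range(width // 4):
        word = (word << 4) | nibble
    return word


# ASSUMPTION: each format alternates between two full-width words.
FORMATS = {
    "checkerboard": (repeat_nibble(0x5), repeat_nibble(0xA)),
    "plain": (repeat_nibble(0x0), repeat_nibble(0xF)),
    "all0": (0, 0),
    "all1": (FULL_MASK, FULL_MASK),
}


def generate(fmt, n_words):
    """Hypothetical stand-in for the RG/WG generator macros."""
    first, second = FORMATS[fmt]
    return [first if i % 2 == 0 else second for i in range(n_words)]


def byte_mask(masked_bytes, width=WIDTH):
    """Return a bit mask that excludes the given byte indices from checking."""
    mask = (1 << width) - 1
    for b in masked_bytes:
        mask &= ~(0xFF << (8 * b))
    return mask


def count_failed_bits(received, fmt, mask=FULL_MASK):
    """Hypothetical stand-in for the RC/WC checkers: sum mismatching bits."""
    expected = generate(fmt, len(received))
    return sum(bin((rx ^ ex) & mask).count("1")
               for rx, ex in zip(received, expected))


received = generate("checkerboard", 4)
received[2] ^= (1 << 7) | (1 << 100)   # inject two bit errors into the third word
fails_all = count_failed_bits(received, "checkerboard")
fails_masked = count_failed_bits(received, "checkerboard", byte_mask({0}))
print(fails_all, fails_masked)         # prints "2 1"
```

Masking byte 0 hides the injected error at bit 7, so only the error at bit 100 is counted in the second call.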

The wide I/O bus is divided into 8 parallel banks (see Fig. 24.8.2). Each bank has two TSV arrays (64×7 and 64×6) containing mini I/O channels of 512b and an additional 16b for 32:1 redundancy. Power ($V_{DDM}$) and ground ($V_{SS}$) pins for the mini I/O circuits are placed every 5 columns. Each mini I/O circuit consists of a pair of driver and receiver buffers and a bus keeper. The driver has four levels of drive strength. The redundant bits and selectable driver strengths make the wide bus operation adaptable to the conditions of the 3D chip stack.
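The bank arithmetic and the role of the spare channels can be illustrated with a short sketch. The counts come from the text; the repair policy does not. The code assumes, purely for illustration, that each group of 32 data lanes shares one spare lane and that a failed lane is simply re-routed to its group's spare. The actual repair mechanism is not described in the text, and `remap_bank` is a hypothetical name.

```python
BANKS = 8
DATA_BITS_PER_BANK = 512          # from the text
SPARES_PER_BANK = 16              # from the text
GROUP = DATA_BITS_PER_BANK // SPARES_PER_BANK   # 32 data lanes per spare (32:1)

total_data_bits = BANKS * DATA_BITS_PER_BANK    # 4096
array_sites_per_bank = 64 * 7 + 64 * 6          # 832 TSV sites in the two arrays
signal_lanes_per_bank = DATA_BITS_PER_BANK + SPARES_PER_BANK   # 528


def remap_bank(failed_lanes):
    """Map logical bit -> physical lane for one bank.

    Data lanes are numbered 0..511 and spares 512..527.
    ASSUMPTION: one spare per group of 32 lanes, one repair per group.
    """
    mapping = list(range(DATA_BITS_PER_BANK))
    repaired_groups = set()
    for lane in sorted(failed_lanes):
        group = lane // GROUP
        if group in repaired_groups:
            raise ValueError(f"group {group} has more than one failed lane")
        repaired_groups.add(group)
        mapping[lane] = DATA_BITS_PER_BANK + group
    return mapping


mapping = remap_bank({5, 40})
print(total_data_bits, array_sites_per_bank, signal_lanes_per_bank)
print(mapping[5], mapping[40], mapping[6])
```

With lanes 5 and 40 marked as failed, logical bits 5 and 40 move to spare lanes 512 and 513, while all other bits keep their original lanes. The TSV sites beyond the 528 signal lanes include the power and ground pins; the text does not itemize them further.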

The silicon interposers in the stack provide fine-pitch horizontal and vertical routing channels for accommodating TSVs and µBumps of different dimensions at various locations among the chips. Such accommodations will be needed when chips from different suppliers are assembled in a stack. By putting the waveform capturer of [5] on a silicon interposer, in-place evaluations of signal and power integrity within a stack become possible by snooping the waveforms through the vertical channels. The analog and digital components of the capturer are fully integrated in the ASIP (Fig. 24.8.3). The capturer uses 3.3V devices, enabling coverage of full-swing signals driven by the mini I/O circuit at 1.2V and compatibility with the low-cost CMOS process of the Si interposer. The timing and voltage resolutions are both 10b, and the step size is configurable, with finest steps of 10ps and 0.5mV. The redundant channels, as well as the V<sub>DD</sub> and V<sub>SS</sub> pins in every wide I/O bank, are selectable for monitoring. The capturer includes up to 150 probes and makes it possible to diagnose widely distributed vertical data channels hidden within a 3D stack structure.
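To give a feel for what the 10b resolution and the finest step sizes imply, the sketch below computes the span covered by one capture. It assumes that a 10b code addresses $2^{10}$ uniformly spaced steps in both time and voltage, which is a simplifying assumption and not a statement about the capturer's internal design.

```python
RESOLUTION_BITS = 10          # timing and voltage resolution (from the text)
FINEST_TIME_STEP_S = 10e-12   # 10 ps (from the text)
FINEST_VOLT_STEP_V = 0.5e-3   # 0.5 mV (from the text)
CLOCK_HZ = 200e6              # wide I/O clock (from the text)
SIGNAL_SWING_V = 1.2          # full-swing mini I/O signal (from the text)


def span(step, bits=RESOLUTION_BITS):
    """Range covered by 2**bits uniform steps (ASSUMPTION: linear coding)."""
    return step * 2 ** bits


def min_step_for(required_span, bits=RESOLUTION_BITS):
    """Smallest uniform step that still covers the required span."""
    return required_span / 2 ** bits


time_window_s = span(FINEST_TIME_STEP_S)        # about 10.24 ns
voltage_span_v = span(FINEST_VOLT_STEP_V)       # about 0.512 V
periods_covered = time_window_s * CLOCK_HZ      # about 2 clock periods
min_step_v = min_step_for(SIGNAL_SWING_V)       # about 1.17 mV

print(f"time window  : {time_window_s * 1e9:.2f} ns ({periods_covered:.2f} clock periods)")
print(f"voltage span : {voltage_span_v * 1e3:.0f} mV at the finest step")
print(f"step to cover {SIGNAL_SWING_V} V: {min_step_v * 1e3:.2f} mV")
```

Under these assumptions, the finest timing step covers roughly two 5ns clock periods, while covering a full 1.2V swing in one sweep would need a voltage step coarser than 0.5mV. This is one plausible reason for making the step size configurable.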

Figure 24.8.4 summarizes the measured 4096b wide I/O performance. The throughput of 100GB/s is achieved with a standard supply voltage of 1.2V. The power supply for the mini I/O circuits ($V_{DDM}$) is separated from that of the rest of the digital circuits. The current consumption of the mini I/O circuits is 385mA in total under high switching activity, $(5 \rightarrow A)$ or $(0 \rightarrow F)$. The energy efficiency is 0.56mW/Gb/s. The power consumption falls to 7mW when all bits are transferred but held constant at 0 or 1. The throughput and energy efficiency are superior to those of standard mobile memory wide I/O specifications with 512 channels at 12.8GB/s [1-2], whereas the power consumption is higher than that of the low-power-oriented custom 3D I/O circuits reported in [6]. The power consumption of the mini I/O circuits can potentially be reduced by co-optimizing the design with the TSV and CMOS technologies.
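The quoted energy efficiency follows from the measured current, the supply voltage, and the raw bit rate. The short calculation below reproduces it; it assumes one 4096b transfer per 200MHz clock cycle, and the variable names are illustrative.

```python
SUPPLY_V = 1.2               # V_DDM supply voltage (from the text)
ACTIVE_CURRENT_A = 0.385     # mini I/O current under high switching activity
STATIC_POWER_W = 7e-3        # power when the data bits do not toggle
BUS_WIDTH_BITS = 4096
CLOCK_HZ = 200e6             # ASSUMPTION: one transfer per clock cycle

active_power_w = SUPPLY_V * ACTIVE_CURRENT_A              # about 0.462 W
bit_rate_bps = BUS_WIDTH_BITS * CLOCK_HZ                  # 819.2 Gb/s
mw_per_gbps = (active_power_w * 1e3) / (bit_rate_bps / 1e9)
pj_per_bit = active_power_w / bit_rate_bps * 1e12         # same number, in pJ/b
switching_share = 1 - STATIC_POWER_W / active_power_w     # fraction due to toggling

print(f"active power      : {active_power_w * 1e3:.0f} mW")
print(f"energy efficiency : {mw_per_gbps:.2f} mW/Gb/s ({pj_per_bit:.2f} pJ/b)")
print(f"switching share   : {switching_share:.1%}")
```

The result is about 462mW and 0.56mW/Gb/s, matching the reported value. Comparing this with the 7mW constant-data figure suggests that nearly all of the mini I/O power is spent on data transitions.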

We confirmed that the wide I/O performance does not change when using redundant bits for in-place waveform capture. This proves that the input capacitance of the capturer has a negligible impact on vertical signaling. Also, the design of a single TSV per vertical channel has a sufficiently high yield, and the redundant bits were not needed in any of the tested samples.

The in-place captured waveforms confirm that the second strength (0.5mA) of the mini I/O drivers is the optimum choice for full-swing vertical signal transmission (Fig. 24.8.5). The dynamic power noise in $V_{\text{DDM}}$ has a slightly larger amplitude than that in the unified $V_{\text{SS}}$; however, it remains less than 20% of the signal swing. This suggests that the vertical power supply networks in the stack are fully integrated and properly fabricated. The eye diagrams show high-quality signaling in the data bus.

In summary, the test vehicle is capable of very wide I/O vertical data transfer, and it features embedded BIST and in-place waveform capture for testing, diagnosis, and characterization. The process specifications of the test vehicle are summarized in Fig. 24.8.6. The die photo is given in Fig. 24.8.7.

## Acknowledgements:

The authors would like to thank T. Sato and D. Kosaka for their technical contributions. This work was supported by the "Dream Chip Project" of NEDO.

## References:

[1] J.-S. Kim, *et al.*, "A 1.2V 12.8GB/s 2Gb Mobile Wide-I/O DRAM with 4×128 I/Os Using TSV-Based Stacking," *ISSCC Dig. Tech. Papers*, pp. 496-497, 2011.

[2] J. Roullard, *et al.*, "Evaluation of 3D Interconnect Routing and Stacking Strategy to Optimize High Speed Signal Transmission for Memory on Logic," *IEEE Electronic Components and Technology Conf.*, pp. 8-13, 2012.

[3] J. Jeddeloh and B. Keeth, "Hybrid Memory Cube New DRAM Architecture Increases Density and Performance," *IEEE Symp. VLSI Circuits*, pp. 87-88, 2012.

[4] H. Takatani, *et al.*, "PDN Impedance and Noise Simulation of 3D SiP with a Widebus Structure," *IEEE Electronic Components and Technology Conf.*, pp. 673-677, 2012.

[5] T. Hashida and M. Nagata, "An On-Chip Waveform Capturer and Application to Diagnosis of Power Delivery in SoC Integration," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 789-796, 2011.

[6] Y. Liu, *et al.*, "A Compact Low-Power 3D I/O in 45nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 142-143, 2012.



Figure 24.8.7: Chip photographs (Logic, Active Si Interposer, Memory).