## 20.2 A 0.14pJ/b Inductive-Coupling Inter-Chip Data Transceiver with Digitally-Controlled Precise Pulse Shaping

Noriyuki Miura<sup>1</sup>, Hiroki Ishikuro<sup>1</sup>, Takayasu Sakurai<sup>2</sup>, Tadahiro Kuroda<sup>1</sup>

<sup>1</sup>Keio University, Yokohama, Japan, <sup>2</sup>University of Tokyo, Tokyo, Japan

One of the main applications of System-in-a-Package (SiP) is a high-performance yet power-aware system such as HDTV on a portable device. H.264/AVC for 1080HDTV requires 23.1Gb/s data bandwidth between microprocessors and memories. The data link should consume as little as 0.4pJ/b in order to keep the total power dissipation below 10mW. Conventional technologies, however, consume much more energy: 1.6pJ/b in a microbump technology [1] and 2.8pJ/b in an inductive-coupling transceiver [2]. This paper reports energy reduction in an inductive-coupling transceiver from 2.8pJ/b to 0.14pJ/b without degrading the data rate. The energy dissipation is the lowest published to date and far lower than [1-16] (Fig. 20.2.1). Precise pulse shaping reduces the transmitter's energy to 1/20, while device scaling lowers the receiver's energy to 1/20.
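The budget quoted above follows from dividing the power limit by the required bandwidth. The following is a minimal sketch of that arithmetic, using only the figures stated in the text; the function and constant names are illustrative and not part of the original work.

```python
# Energy-per-bit budget and implied link power at the H.264/AVC bandwidth.
# All numeric inputs come from the text; names are hypothetical.
BANDWIDTH_BPS = 23.1e9   # required data bandwidth, 23.1 Gb/s
POWER_BUDGET_W = 10e-3   # total power limit, 10 mW


def energy_budget_pj_per_bit() -> float:
    """Largest energy per bit (pJ/b) that keeps the link within the power budget."""
    return POWER_BUDGET_W / BANDWIDTH_BPS * 1e12


def link_power_mw(energy_pj_per_bit: float) -> float:
    """Link power (mW) when every bit costs the given energy at BANDWIDTH_BPS."""
    return energy_pj_per_bit * 1e-12 * BANDWIDTH_BPS * 1e3


if __name__ == "__main__":
    print(f"budget: {energy_budget_pj_per_bit():.2f} pJ/b")
    for label, e in [("microbump [1]", 1.6), ("inductive [2]", 2.8), ("this work", 0.14)]:
        print(f"{label}: {link_power_mw(e):.1f} mW")
```

Under these numbers the budget comes to roughly 0.43pJ/b, and a 0.14pJ/b link would dissipate about 3.2mW at 23.1Gb/s, whereas 1.6pJ/b and 2.8pJ/b both exceed the 10mW limit.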

The shape of the transmit pulse is illustrated in Fig. 20.2.2. The transmitter's energy dissipation, $E_{TX}$, is determined by the transmit pulse shape and is given by $E_{TX}=V_{DD}I_P\tau$, where $I_P$ is the pulse amplitude and $2\tau$ is the pulse width. The received pulse signal, $V_R$, is induced by $M\,dI_T/dt$, and its amplitude is given by $V_P=2MI_P/\tau$. When the communication distance and the inductor size are given (i.e., $M$ is given), the pulse slew rate, $S_P=I_P/\tau$, is determined by the Bit-Error Rate (BER) requirement. By reducing $\tau$ while keeping the slew rate constant, $E_{TX}$ is reduced in proportion to $\tau^2$, as $E_{TX}=V_{DD}S_P\tau^2$ indicates. On the other hand, since the receiver (a latch comparator [2]) is a digital CMOS circuit, the receiver's energy dissipation, $E_{RX}$, is given by $E_{RX}=CV_{DD}^2$, and it is effectively reduced by device scaling.
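The quadratic dependence of $E_{TX}$ on $\tau$ at a fixed slew rate can be checked numerically. The sketch below simply evaluates the two expressions from the paragraph above; the supply voltage, slew rate, and mutual inductance passed in are hypothetical values chosen for illustration, not measured parameters of the test chips.

```python
# Evaluate E_TX = V_DD * S_P * tau^2 and V_P = 2 * M * S_P from the text.
# The numeric inputs below are illustrative assumptions, not measured data.


def tx_energy_j(vdd_v: float, slew_a_per_s: float, tau_s: float) -> float:
    """Transmit energy per pulse; equals V_DD * I_P * tau with I_P = S_P * tau."""
    return vdd_v * slew_a_per_s * tau_s ** 2


def rx_amplitude_v(mutual_h: float, slew_a_per_s: float) -> float:
    """Received pulse amplitude V_P = 2 * M * I_P / tau = 2 * M * S_P."""
    return 2.0 * mutual_h * slew_a_per_s


if __name__ == "__main__":
    vdd = 1.8      # V (hypothetical)
    slew = 2.0e7   # A/s (hypothetical)
    m = 1.0e-9     # H (hypothetical)
    for tau in (240e-12, 120e-12, 60e-12):
        e = tx_energy_j(vdd, slew, tau)
        print(f"tau={tau * 1e12:.0f}ps  E_TX={e * 1e12:.3f}pJ  V_P={rx_amplitude_v(m, slew) * 1e3:.0f}mV")
```

Each halving of $\tau$ cuts the computed $E_{TX}$ by a factor of four, while the received amplitude stays the same because it depends only on $M$ and $S_P$.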

Figure 20.2.2 depicts the pulse shaping circuit, which consists of pulse width, pulse slew rate, and pulse amplitude controls. In the pulse width control, a 4-phase clock generator provides $0^{\circ}$, $45^{\circ}$, $90^{\circ}$, $180^{\circ}$ clocks to two Phase Interpolators (PIs). The left PI interpolates a clock phase between $90^{\circ}$ and $180^{\circ}$ in 1/256-UI (4ps) steps. The right PI is a dummy circuit that always outputs the $0^{\circ}$ clock. A succeeding AND gate generates a pulse clock that determines the pulse width, $\tau$. The pulse slew rate is digitally controlled by variable capacitors. The pulse amplitude is digitally controlled by changing the channel width of the NMOS in the H-bridge driver.
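The 4ps figure is the 1/256-UI step rounded at the 1Gb/s data rate. A minimal sketch of that conversion follows, assuming one UI equals one period of the 1GHz clock; the helper names are hypothetical, and the mapping from a PI code to the resulting pulse width is not modeled here because the text does not specify it.

```python
# Phase-interpolator resolution at 1 Gb/s, assuming 1 UI = one 1 GHz clock period.
DATA_RATE_BPS = 1e9      # from the text
PI_STEPS_PER_UI = 256    # from the text


def ui_ps() -> float:
    """Unit interval in picoseconds."""
    return 1e12 / DATA_RATE_BPS


def pi_step_ps() -> float:
    """Time resolution of one phase-interpolator step."""
    return ui_ps() / PI_STEPS_PER_UI


def phase_to_ps(phase_deg: float) -> float:
    """Convert a clock phase in degrees to a time offset within one UI."""
    return phase_deg / 360.0 * ui_ps()


def steps_between(phase_a_deg: float, phase_b_deg: float) -> float:
    """Number of PI steps spanning the interval between two clock phases."""
    return (phase_to_ps(phase_b_deg) - phase_to_ps(phase_a_deg)) / pi_step_ps()


if __name__ == "__main__":
    print(f"UI = {ui_ps():.0f} ps, PI step = {pi_step_ps():.2f} ps")
    print(f"steps between 90 and 180 degrees: {steps_between(90, 180):.0f}")
```

With these assumptions the step is about 3.9ps, and the interval from $90^{\circ}$ to $180^{\circ}$ covers 64 steps.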

Since reducing the pulse width reduces the receiver's timing margin, accurate timing design is necessary to maintain a BER<10<sup>-12</sup>. A clock transceiver is modified to reduce the timing jitter in the received clock, *Rxclk*. The timing jitter caused by supply noise and temperature variations can be effectively rejected as common-mode noise by the clock link, which is located adjacent to the data link. A sampling timing control adjusts the timing in 4ps steps to compensate for timing shifts due to process variations.

Figure 20.2.3 shows two sets of stacked test chips. One is fabricated in 0.18 $\mu$ m CMOS where the transceiver with the pulse shaping circuit is implemented. Energy reduction in the data transmitter is evaluated in the same process employed in [2]. The other is fabricated in 90nm CMOS to measure energy reduction in the data receiver due to device scaling. In both 0.18 $\mu$ m and 90nm CMOS, a 10 $\mu$ m-thick transmitter chip is stacked on top of a receiver chip and the communication distance including an adhesive layer is 15 $\mu$ m. The data transceiver communicates at 1Gb/s by a 30 $\mu$ m diameter metal inductor. A clock transceiver in the 0.18 $\mu$ m CMOS chip provides 1GHz clock link by a 200 $\mu$ m diameter metal inductor. The communication distance and inductor size are the same as [2].

Timing jitter in the clock transceiver is reduced to receive the narrow pulse correctly. The clock transceiver and simulated waveforms are depicted in Fig. 20.2.4(a). A conventional clock transceiver in [2] transmits a sinusoidal clock whose low slew rate increases the timing jitter when the transmitter generates $I_{rc}$ and the receiver latches *Rxclk*. On the other hand, the clock transceiver here transmits a rectangular clock of higher slew rate to solve this problem. Figure 20.2.4(b) depicts measured timing jitter of the received clock. By increasing the slew rate, the timing jitter is reduced to $4.8\,\mathrm{ps_{rms}}$ at minimum. It is half of the timing jitter of the conventional clock transceiver.

Figure 20.2.5(a) presents the dependence of the measured timing bathtub curve on the pulse amplitude. The minimum pulse amplitude required for a BER<10<sup>-12</sup> is seen to be 60mV. Figure 20.2.5(b) presents the dependence of the bathtub curve on the pulse width. It is confirmed that $E_{TX}$ is reduced in proportion to $\tau^2$. When $\tau$ is set to the minimum pulse width of 60ps, $E_{TX}$ is reduced to 1/17 (0.13pJ/b) and the timing margin for a BER<10<sup>-12</sup> is 25ps.

Figure 20.2.6 shows the measured BER dependence on supply noise. Since the clock link is located adjacent to the data link, timing jitter caused by supply noise is effectively rejected and suppressed within the timing margin of 25ps. Therefore, the data transceiver exhibits sufficiently high tolerance against a supply noise of 350mV. It is much larger than the 69mV supply noise monitored in a product-level microprocessor for 3G cellular phones [17]. Chip performance in both 0.18µm and 90nm CMOS is summarized in Fig. 20.2.7.

## Acknowledgements:

This work is supported by CREST/JST. The authors are grateful to M. Tago, M. Mizuno and Y. Nakagawa with NEC Corporation for their assistance in measurements. The 0.18 $\mu$ m CMOS VLSI chip in this study was fabricated in TSMC. The 90nm CMOS VLSI chip in this study was fabricated through the chip fabrication program of VDEC, the University of Tokyo, with the collaboration by STARC, Fujitsu Limited, Matsushita Electric Industrial Company Limited, NEC Electronics Corporation, Renesas Technology Corporation, and Toshiba Corporation.

## References:

[1] K. Kumagai, et al., "System-in-Silicon Architecture and its Application to H.264/AVC Motion Estimation for 1080HDTV," *ISSCC Dig. Tech. Papers*, pp. 430-431, Feb., 2006.

[2] N. Miura, et al., "A 1Tb/s 3W Inductive-Coupling Transceiver for Inter-Chip Clock and Data Link," *ISSCC Dig. Tech. Papers*, pp. 424-425, Feb., 2006.

[3] J. Yamada, et al., "High-Speed Interconnect for a Multiprocessor Server Using Over 1Tb/s Crossbar," *ISSCC Dig. Tech. Papers*, pp. 108-109, Feb., 2006.

[4] N. Miura, et al., "A 195Gb/s 1.2W 3D-Stacked Inductive Inter-Chip Wireless Superconnect with Transmit Power Control Scheme," *ISSCC Dig. Tech. Papers*, pp. 264-265, Feb., 2005.

[5] K. Chang, et al., "Clocking and Circuit Design for a Parallel I/O on a First-Generation CELL Processor," *ISSCC Dig. Tech. Papers*, pp. 526-527, Feb., 2005.

[6] R. Drost, et al., "Electronic Alignment for Proximity Communication," *ISSCC Dig. Tech. Papers*, pp. 144-145, Feb., 2004.

[7] G. Paul, et al., "A Scalable 160Gb/s Switch Fabric Processor with 320Gb/s Memory Bandwidth," *ISSCC Dig. Tech. Papers*, pp. 410-411, Feb., 2004.

[8] D. Mizoguchi, et al., "A 1.2Gb/s/pin Wireless Superconnect Based on Inductive Inter-Chip Signaling (IIS)," *ISSCC Dig. Tech. Papers*, pp. 142-143, Feb., 2004.

[9] K. Tanaka, et al., "A 100Gb/s Transceiver with GND-VDD Common-Mode Receiver and Flexible Multi-Channel Aligner," *ISSCC Dig. Tech. Papers*, pp. 264-265, Feb., 2002.

[10] P. Landman, et al., "A 62Gb/s Backplane Interconnect ASIC based on 3.1Gb/s Serial-Link Technology," *ISSCC Dig. Tech. Papers*, pp. 52-53, Feb., 2002.

[11] T. Tanahashi, et al., "A 2Gb/s 21CH Low-Latency Transceiver Circuit for Inter-Processor Communication," *ISSCC Dig. Tech. Papers*, pp. 60-61, Feb., 2001.

[12] R. Nair, et al., "A 28.5GB/s CMOS Non-Blocking Router for Terabit/s Connectivity between Multiple Processors and Peripheral I/O Nodes," *ISSCC Dig. Tech. Papers*, pp. 224-225, Feb., 2001.

[13] M. Fukaishi, et al., "A 20Gb/s CMOS Multi-Channel Transmitter and Receiver Chip Set for Ultra-High Resolution Digital Display," *ISSCC Dig. Tech. Papers*, pp. 260-261, Feb., 2000.

[14] T. Takahashi, et al., "110GB/s Simultaneous Bi-Directional Transceiver Logic Synchronized with a System Clock," *ISSCC Dig. Tech. Papers*, pp. 176-177, Feb., 1999.

[15] Y. Ohtomo, et al., "A 40Gb/s 8×8 ATM Switch LSI using 0.25µm CMOS/SIMOX," *ISSCC Dig. Tech. Papers*, pp. 154-155, Feb., 1997.

[16] Y. Unekawa, et al., "A 5Gb/s 8×8 ATM Switch Element CMOS LSI Supporting Five Quality-of-Service Classes with 200MHz LVDS Interface," *ISSCC Dig. Tech. Papers*, pp. 118-119, Feb., 1996.

[17] Y. Kanno, et al., "In-Situ Measurement of Supply-Noise Maps with Millivolt Accuracy and Nanosecond-Order Time Resolution," *Symp. VLSI Circuits*, pp. 78-79, June 2006.




|                                     | This Work (90nm CMOS)                              | This Work (180nm CMOS)                   | Previous Work [2]                          |
|-------------------------------------|----------------------------------------------------|------------------------------------------|--------------------------------------------|
| Energy Dissipation<br>in Data Link  | <b>0.14pJ/b</b><br>(Tx:0.11pJ/b, Rx:0.03pJ/b)      | 0.33pJ/b<br>(Tx:0.13pJ/b, Rx:0.2pJ/b)    | <b>2.8pJ/b</b><br>(Tx:2.2pJ/b, Rx:0.6pJ/b) |
| Energy Dissipation<br>in Clock Link | No Data                                            | <b>3.5pJ/b</b><br>(Tx:1.5pJ/b, Rx:2pJ/b) | <b>12.5pJ/b</b><br>(Tx:6.5pJ/b, Rx:6pJ/b)  |
| Jitter<br>in Clock Link             | No Data                                            | 4.8ps-rms                                | 9.5ps-rms                                  |
| Process                             | 90nm CMOS<br>(V<sub>DD</sub>=1V)                   | 180nm CMOS<br>(V<sub>DD</sub>=1.8V)      | 180nm CMOS<br>(V<sub>DD</sub>=1.8V)        |
| Data Rate                           | 1Gb/s                                              | 1Gb/s                                    | 1Gb/s                                      |
| Bit Error Rate                      | <10<sup>-12</sup>                                  | <10<sup>-12</sup>                        | <10<sup>-12</sup>                          |
| Clock Rate                          | 1GHz                                               | 1GHz                                     | 1GHz                                       |
| Channel Area                        | 30μm x 30μm                                        | 30μm x 30μm                              | 30μm x 30μm                                |
| Distance                            | 15μm (Chip Thickness:10μm, Adhesive Thickness:5μm) | 15μm (Chip Thickness:10μm, Adhesive Thickness:5μm) | 15μm (Chip Thickness:10μm, Adhesive Thickness:5μm) |
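
The data-link entries in the summary table can be cross-checked against the reduction factors quoted in the text. The sketch below copies the transmitter, receiver, and total energies from the table and recomputes the sums and ratios; the dictionary keys and function names are illustrative only.

```python
import math

# (Tx, Rx, total) data-link energy in pJ/b, copied from the summary table.
DATA_LINK = {
    "this_work_90nm": (0.11, 0.03, 0.14),
    "this_work_180nm": (0.13, 0.2, 0.33),
    "previous_work": (2.2, 0.6, 2.8),
}


def totals_consistent(table: dict) -> bool:
    """True when Tx + Rx matches the listed total in every column."""
    return all(
        math.isclose(tx + rx, total, abs_tol=1e-9)
        for tx, rx, total in table.values()
    )


def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


if __name__ == "__main__":
    prev_tx, prev_rx, prev_total = DATA_LINK["previous_work"]
    tx_180, _, _ = DATA_LINK["this_work_180nm"]
    tx_90, rx_90, total_90 = DATA_LINK["this_work_90nm"]
    print("totals consistent:", totals_consistent(DATA_LINK))
    print(f"Tx, pulse shaping in 180nm: 1/{reduction_factor(prev_tx, tx_180):.0f}")
    print(f"Tx, 90nm vs previous:       1/{reduction_factor(prev_tx, tx_90):.0f}")
    print(f"Rx, 90nm vs previous:       1/{reduction_factor(prev_rx, rx_90):.0f}")
    print(f"Total, 90nm vs previous:    1/{reduction_factor(prev_total, total_90):.0f}")
```

The recomputed ratios agree with the 1/17 transmitter reduction reported for the 180nm measurement and with the 1/20 reductions stated for the 90nm transmitter and receiver.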