# A 5.4-mW 4-Gb/s 5-Band QPSK Transceiver for Frequency-Division Multiplexing Memory Interface

Wei-Han Cho, Yilei Li, Yanghyo Kim, Po-Tsang Huang, Yuan Du, Sheau Jiung Lee, and Mau-Chung Frank Chang

University of California, Los Angeles, CA 90095-1594, USA

Abstract — This paper presents a novel self-equalized and skewless frequency-division multiplexing memory interface. To prove its feasibility, we have realized a 5-band QPSK transceiver in 40-nm CMOS that transmits up to 4 Gb/s through 10 orthogonal communication channels (400 Mb/s each) via either an on-chip TSV emulator with an effective load of 1 pF or a 5-cm FR-4 PCB trace. With differential current-mode signaling, the transceiver consumes only 5.4 mW and occupies only 80×100 $\mu$m<sup>2</sup>. A real-time flexible BER testing platform is established to verify that the BER of the transceiver is less than 10<sup>-12</sup>.

*Index Terms* — RF interconnects, multi-band transceiver, current-mode Schmitt trigger, BER testing platform.

## I. INTRODUCTION

The interface between computing units and main memory has been dominated by DDR technologies for decades. Serializing parallel data in the time domain, a DDR4 transmitter has been implemented as a 16:1 time-division multiplexing SerDes with a total data rate of 3.2 Gb/s [1]. To ensure signal integrity at such a high data rate, various design features, such as on-die termination and equalizers, have been integrated into DDR4 I/O circuits. Also, because of the narrow data eye (<300 ps), a delay-locked loop (DLL) is required to reduce the skew between the DQS and DQ signals caused by unbalanced traces. The more recent Wide-I/O memory interface takes advantage of TSV 3DIC technology to achieve higher bandwidth and lower energy per bit by increasing the number of I/Os (up to 4096 TSVs), at the expense of higher packaging cost [2]-[3].

As an alternative, this paper proposes a new frequency-division multiplexing (FDM) architecture that offers simultaneous, orthogonal communication channels in the frequency domain, carrying data at rates comparable to DDR but with self-equalization and zero skew between DQS and DQ signals. For the FDM memory interface, we implement a multi-band QPSK transceiver that operates over five frequency bands at $f_1$ = 1.6 GHz, $f_2$ = 2.4 GHz, $f_3$ = 3.2 GHz, $f_4$ = 4 GHz, and $f_5$ = 5.2 GHz (Fig. 1). With up to 400 Mb/s on each channel, the transceiver achieves an aggregate data rate of 4 Gb/s while consuming only 5.4 mW and occupying only 80×100 µm<sup>2</sup>.
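As a quick consistency check of this band plan, the following Python sketch (our illustration, not part of the authors' design flow) computes the aggregate rate from five bands × two quadrature channels × 400 Mb/s, the minimum carrier spacing, and how far the 3rd harmonic of $f_1$ lands from the nearest carrier, which is the concern raised in Section II for the choice of $f_5$. The constant and function names are hypothetical.

```python
# Band plan from the text, in MHz (integers avoid floating-point noise).
BANDS_MHZ = [1600, 2400, 3200, 4000, 5200]
RATE_PER_CHANNEL_MBPS = 400  # per I or Q channel
CHANNELS_PER_BAND = 2        # QPSK: one I and one Q channel per band


def aggregate_rate_mbps(bands, rate=RATE_PER_CHANNEL_MBPS):
    """Total data rate across all bands and both quadrature channels."""
    return len(bands) * CHANNELS_PER_BAND * rate


def min_spacing_mhz(bands):
    """Smallest gap between neighboring carriers."""
    s = sorted(bands)
    return min(b - a for a, b in zip(s, s[1:]))


def nearest_band_distance_mhz(freq, bands):
    """Distance from a spur frequency to the closest carrier."""
    return min(abs(freq - b) for b in bands)


if __name__ == "__main__":
    print("aggregate rate:", aggregate_rate_mbps(BANDS_MHZ), "Mb/s")
    print("min spacing:", min_spacing_mhz(BANDS_MHZ), "MHz")
    h3 = 3 * BANDS_MHZ[0]  # 3rd harmonic of the lowest carrier
    # In a hypothetical evenly spaced plan with f5 = 4800 MHz the harmonic
    # would sit on a carrier; the shifted plan keeps it 400 MHz away.
    print("H3 of f1:", h3, "MHz; nearest carrier",
          nearest_band_distance_mhz(h3, BANDS_MHZ), "MHz away")
```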



Fig. 1. Channel spectrum of the FDM memory interface with 5-band QPSK modulation.



Fig. 2. Illustration of self-equalized QPSK modulation.

## II. FREQUENCY-DIVISION MULTIPLEXING MEMORY INTERFACE

In the FDM memory interface, each frequency band can carry multiple bits of data depending on the modulation scheme. With QPSK modulation, two bits of data are modulated onto two orthogonal carriers, in-phase (I) and quadrature (Q), at the same frequency. For each carrier, the up-converted signal has two sidebands, the upper sideband (USB) and the lower sideband (LSB), which are identical but mirrored in frequency about the carrier (Fig. 2). After passing through a linear time-invariant (LTI) system whose low-pass magnitude response falls along a straight line (in linear-linear scale), the USB is attenuated more and the LSB less; when both are mixed down, they reconstruct a baseband signal that is equally attenuated at all frequencies. In real channels, whose responses are usually curved (or straight only beyond $f_{3dB}$ in log-log scale), the baseband signal is either slightly peaked (where the response bows inward, beyond $f_{3dB}$) or slightly damped (where the response bows outward, below and at $f_{3dB}$). Either way, the signal integrity is better than that of the NRZ signals in mainstream memory interfaces, so little or no equalization circuitry is needed.
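The self-equalization argument can be checked numerically. The Python sketch below is an idealized, magnitude-only model: it assumes zero channel phase and coherent down-conversion, under which the recovered baseband gain at offset $f_b$ is the mean of the USB gain at $f_c + f_b$ and the LSB gain at $f_c - f_b$. The channel models `straight_lpf` and `first_order_lpf` are hypothetical stand-ins, not measured responses.

```python
import math


def baseband_gain(h, fc, fb):
    """Gain seen at baseband frequency fb after coherent down-conversion of a
    double-sideband signal on carrier fc: the mean of the USB gain h(fc + fb)
    and the LSB gain h(fc - fb). Zero-phase, magnitude-only idealization."""
    return 0.5 * (h(fc + fb) + h(fc - fb))


def straight_lpf(f, g0=1.0, slope=0.1):
    """Hypothetical channel whose gain falls linearly with f (linear-linear scale)."""
    return g0 - slope * f


def first_order_lpf(f, f3db):
    """Single-pole low-pass magnitude response."""
    return 1.0 / math.sqrt(1.0 + (f / f3db) ** 2)


if __name__ == "__main__":
    fc = 1.6  # GHz, carrier f1
    for fb in (0.0, 0.1, 0.2):  # GHz, baseband content
        print(fb, round(baseband_gain(straight_lpf, fc, fb), 6))
```

For the straight-line channel the recovered gain equals the channel gain at the carrier for every baseband frequency, so the baseband is flat. With a single-pole response the mean rises slightly above the carrier gain where the curve bows inward (well above $f_{3dB}$) and falls slightly below it where the curve bows outward (well below $f_{3dB}$), matching the peaking and damping cases described above.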

Fig. 3. Block diagram of the 5-band QPSK transceiver.

With 5 QPSK-modulated frequency bands, 10 bits of signals (1 DQS, 1 DM and 8 DQ) can be transmitted simultaneously on a shared transmission line (TML). In the channel media commonly used in memory applications (e.g. FR-4 PCB, silicon interposer, TSV), group-delay variation is negligible across the 5 chosen bands. Therefore, the skew between the DQS and DQ signals among the 10 bits is inherently negligible, and no DLL is required.
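A minimal Python sketch of the 10-bit allocation follows. The slot order (DQS and DM on $f_1$, DQ0 to DQ7 on the remaining bands) is a hypothetical choice, since the text does not state which band carries which signal, and `group_delay` is any user-supplied function of carrier frequency. The sketch makes explicit that DQS-to-DQ skew depends only on the spread of channel group delay across the five carriers.

```python
# Hypothetical assignment of the 10 memory signals to (carrier, branch) slots.
# The paper does not specify the order; this one is illustrative only.
SIGNALS = ["DQS", "DM"] + [f"DQ{i}" for i in range(8)]
BANDS_GHZ = [1.6, 2.4, 3.2, 4.0, 5.2]


def assign_slots(signals, bands):
    """Map each signal to a (carrier, 'I' or 'Q') slot, two slots per band."""
    slots = [(f, branch) for f in bands for branch in ("I", "Q")]
    if len(signals) > len(slots):
        raise ValueError("not enough QPSK slots for all signals")
    return dict(zip(signals, slots))


def worst_skew(group_delay, mapping):
    """Largest group-delay difference between any two assigned signals."""
    delays = [group_delay(f) for f, _ in mapping.values()]
    return max(delays) - min(delays)
```

For an ideal, dispersionless line, `worst_skew(lambda f: 0.25, assign_slots(SIGNALS, BANDS_GHZ))` returns 0.0 regardless of the assumed 0.25-ns delay; a real channel only adds skew in proportion to its group-delay variation over 1.6 to 5.2 GHz.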

The frequency allocation has been chosen to avoid severe inter-channel interference (ICI). With a minimum spacing of 800 MHz, two cascaded 2<sup>nd</sup>-order low-pass filters with a combined $f_{3dB}$ of 200 MHz can suppress the off-band ICI from adjacent bands by more than 20 dB. In addition, the lowest band at 1.6 GHz is accompanied by a 3<sup>rd</sup>-order harmonic around 4.8 GHz, so the highest band is shifted to 5.2 GHz to reduce in-band ICI. Considering both in-band and off-band interference, the signal-to-interference ratio (SIR) of each band is greater than 16 dB. Note that 2<sup>nd</sup>-order harmonics in this system are suppressed by the fully differential architecture.
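To see how the filter corner and the carrier spacing interact, the Python sketch below evaluates the magnitude of two identical cascaded 2<sup>nd</sup>-order sections whose combined $f_{3dB}$ is 200 MHz. The text specifies only two complex poles per filter, so the Butterworth response (pole Q of $1/\sqrt{2}$) is an assumption, and the filter's DC gain is normalized to 0 dB. Evaluating at the 800-MHz spacing gives the filter rejection alone; the actual ICI also depends on how much of the neighboring band's modulated spectrum lands closer in after down-conversion.

```python
import math

F3DB_TOTAL = 200e6  # Hz, combined -3 dB corner of the cascaded filters


def section_corner(f3db_total, n_sections=2):
    """Natural frequency of each identical 2nd-order Butterworth section such
    that the cascade of n_sections is exactly -3 dB at f3db_total."""
    # |H_section|^2 = 1 / (1 + (f/f0)^4); the cascade power is its n-th power.
    ratio4 = 2.0 ** (1.0 / n_sections) - 1.0
    return f3db_total / ratio4 ** 0.25


def cascade_gain_db(f, f3db_total=F3DB_TOTAL, n_sections=2):
    """Gain in dB (DC gain normalized to 0 dB) of the cascade at frequency f."""
    f0 = section_corner(f3db_total, n_sections)
    per_section = 1.0 / math.sqrt(1.0 + (f / f0) ** 4)
    return 20.0 * n_sections * math.log10(per_section)
```

Under this assumption, `cascade_gain_db(800e6)` is roughly -40 dB, comfortably beyond the 20-dB figure quoted above.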

## III. 5-BAND QPSK TRANSCEIVER

The fully differential architecture is adopted in this design not only for its suppression of even-order harmonics. Compared with the single-ended voltage-mode signaling in mainstream memory interfaces, differential current-mode signaling induces much less simultaneous switching noise (SSN). Differential current-mode signaling is also less sensitive to supply and electromagnetic noise, owing to the common-mode rejection of the fully differential architecture.

As shown in Fig. 3, the 5-band QPSK transceiver is composed of five parallel TX slices and five parallel RX slices, each operating in its allocated frequency band. Each TX slice includes two differential current-steering DACs, two fully differential mixers, two 2X current-mirror output buffers, and one CML divider that generates the I and Q carriers from external oscillators. The DAC output current swings from 10 µA to 50 µA at each end, giving a signal level of 40 $\mu A_{pp}$ with a common mode of 30 $\mu A$ (Fig. 4). After the 10 parallel outputs are merged, the 5-band QPSK transceiver drives an 80-mV<sub>pp</sub> signal onto a differential 100-$\Omega$ TML. The DAC bottom current (10 $\mu$A) is chosen to ensure RX impedance matching, since the RX input buffer is biased directly by the TX output current. The RX input buffer includes additional bias circuitry to reduce the effect of TX PVT variation on the RX filters (Fig. 4). The RX input buffer distributes the received current evenly to 10 separate fully differential mixers, which connect to current-mode low-pass filters.

Fig. 4. Schematics of the differential current-steering DAC and the receiver input buffer.

Fig. 5. Schematics of the current-mode low-pass filters and the current-mode Schmitt trigger.

Fig. 6. Micrograph of the test chip with both TX/RX and a 1-pF on-chip interconnection to emulate TSV loading in 3DIC. (TX: $80 \times 35\ \mu m^2$; RX: $80 \times 65\ \mu m^2$)

Fig. 7. (a) Demodulated 400-Mb/s $2^{31}$-1 PRBS eye diagrams of the I/Q channels at $f_1$ (upper) and $f_2$ (lower); (b) 250-Mb/s $2^{31}$-1 PRBS eye diagrams of the original (upper) and demodulated (lower) DQ/DQS.

The current-mode low-pass filter is designed with two complex poles and a current gain of 3 (Fig. 5). Two cascaded filters set the system $f_{3dB}$ to 200 MHz and attenuate the off-band ICI to one tenth of the desired signal. The residual off-band ICI could still induce glitches at the output, so current-mode Schmitt triggers are necessary in this system. The hysteresis window of the current-mode Schmitt trigger is adjustable by tuning the reference current ($I_{ref}$) shown in Fig. 5. The DAC, I/O buffers, and filters, like the Schmitt trigger, are all built from current mirrors, which preserve current-mode linearity even at very small bias currents.
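The glitch-suppression role of the Schmitt trigger can be illustrated with a behavioral Python model. It assumes symmetric switching thresholds of $\pm I_{ref}$ on the differential filter output current; the actual relation between the circuit's thresholds and $I_{ref}$ in Fig. 5 is not given in the text, and the sample currents below are made up for illustration.

```python
def schmitt_trigger(samples, i_ref, initial=0):
    """Behavioral comparator with hysteresis on a differential current.

    The output goes high only when the input exceeds +i_ref and low only when
    it drops below -i_ref; inputs inside the window keep the previous state.
    """
    state = initial
    out = []
    for i_in in samples:
        if i_in > i_ref:
            state = 1
        elif i_in < -i_ref:
            state = 0
        out.append(state)
    return out


# Hypothetical differential filter output (uA): a 0 -> 1 -> 0 data pattern
# with small residual ICI ripple around zero.
ripple = [-20, -2, 2, -2, 20, 3, -3, 20, -20]
print(schmitt_trigger(ripple, i_ref=0))  # plain comparator
print(schmitt_trigger(ripple, i_ref=5))  # with a 5-uA hysteresis window
```

With `i_ref=0` the model degenerates to a plain comparator and the small ripple around zero toggles the output; with a window of a few µA the same input produces a single clean transition in each direction.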

The entire 5-band QPSK transceiver is designed to draw a constant 6 mA (2.4 mA for TX and 3.6 mA for RX) from a 0.9-V supply. Note that the CML divider is excluded from the power consumption because it can be shared by multiple transceivers in the FDM memory interface. Because the current draw is constant, supply bounce is small, allowing 4 transceivers to share one pair of VDD/VSS pins even with 1-nH bonding wires on each pin.
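The supply figures above translate into power and energy per bit as follows. This Python sketch only restates the quoted currents, the 0.9-V supply and the 4-Gb/s aggregate rate, and like the text it excludes the shared CML divider.

```python
SUPPLY_V = 0.9        # V
TX_CURRENT_A = 2.4e-3  # A
RX_CURRENT_A = 3.6e-3  # A
DATA_RATE_BPS = 4e9    # aggregate rate over all 10 channels


def power_w(current_a, vdd=SUPPLY_V):
    """DC power drawn from the supply."""
    return current_a * vdd


def energy_per_bit_pj(power, rate=DATA_RATE_BPS):
    """Energy per transmitted bit in picojoules."""
    return power / rate * 1e12
```

The TX result, 2.16 mW or 0.54 pJ/bit, is the value listed in Table I; the full transceiver, TX plus RX, comes to 5.4 mW or 1.35 pJ/bit.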



Fig. 8. Top view of the test board with separate TX/RX connected with a 5-cm FR-4 differential trace.



Fig. 9. (a) Demodulated 400-Mb/s  $2^{31}$ -1 PRBS eye diagrams of the 1-cm (upper) and the 5-cm (lower) test boards; (b) latency of 2.4 ns is found by subtracting out measured cable delay (upper) from measured total delay (lower, output inverted).



Fig. 10. The real-time flexible BER testing platform for the 5-band QPSK transceiver.

## IV. CIRCUIT IMPLEMENTATION AND MEASUREMENT

Three test chips of the 5-band QPSK transceiver are implemented: one with both TX/RX, one with TX only, and one with RX only. The chip with both TX/RX emulates a 3DIC packaging environment, so an on-chip interconnection with a 1-pF load is used. To fit a TSV pitch of 40 $\mu$m (80 $\mu$m for one differential pair), the 5-band QPSK transceiver is laid out in a total area of only 80×100 $\mu$m<sup>2</sup> (Fig. 6). With this test chip, the 5-band QPSK transceiver is shown to operate at up to 4 Gb/s, i.e. 400 Mb/s per QPSK I/Q channel, and the DQ and DQS remain aligned after demodulation, as shown in Fig. 7.

The separate TX and RX test chips demonstrate operation over PCB interconnections. Test boards with 1-cm and 5-cm FR-4 differential traces (3-mil width and 3-mil spacing) were manufactured (Fig. 8); the measured eye diagrams are slightly worse than in the on-chip case but still have a sufficient eye opening of 1.8 ns (Fig. 9(a)). The 2.4-ns latency of the 5-band QPSK transceiver is obtained by subtracting the measured cable delay of 1.6 ns from the measured total delay of 4 ns (Fig. 9(b)). Note that during all measurements, the TX and RX carriers are synchronized by external phase shifters.
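The timing numbers above reduce to simple arithmetic, sketched in Python below: at 400 Mb/s per channel one unit interval (UI) is 2.5 ns, so the 1.8-ns eye opening is 72% of a UI, and the latency is the measured total delay minus the cable delay. The helper names are hypothetical.

```python
def unit_interval_ns(rate_mbps):
    """Duration of one bit at the given per-channel rate."""
    return 1e3 / rate_mbps


def eye_fraction(eye_ns, rate_mbps):
    """Horizontal eye opening as a fraction of one unit interval."""
    return eye_ns / unit_interval_ns(rate_mbps)


def latency_ns(total_ns, cable_ns):
    """Transceiver latency with the measurement-cable delay removed."""
    return total_ns - cable_ns
```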

For the 5-band QPSK transceiver, a real-time flexible BER testing platform was established as shown in Fig. 10. A customized FMC-rich (FPGA Mezzanine Card) FPGA board built around a Xilinx V7-2000T generates real-time test packets of random data and accumulates the error-bit count from the received packets. In addition, a Lattice XO3 board serves as the adaptor to the SMA cables of the test boards. With this platform, the 10-bit pattern is transmitted to the test boards, and after days of accumulation the system BER is measured to be less than $10^{-12}$ at 2 Gb/s; the data rate is limited by the 200-MHz I/O speed of the Lattice XO3 board.
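How long a BER test must run can be estimated with the standard zero-error confidence rule, $N = -\ln(1 - \mathrm{CL}) / \mathrm{BER}$, which is a general statistical result rather than a procedure stated by the authors. The Python sketch below assumes that no errors were observed during the run.

```python
import math


def bits_needed_zero_errors(ber_target, confidence):
    """Error-free bits required to claim BER < ber_target at the given
    confidence level (Poisson approximation, zero observed errors)."""
    return -math.log(1.0 - confidence) / ber_target


def duration_s(bits, rate_bps):
    """Wall-clock time to send `bits` over a link running at rate_bps."""
    return bits / rate_bps
```

At 2 Gb/s, a 95%-confidence claim of BER < $10^{-12}$ needs about $3 \times 10^{12}$ error-free bits, or roughly 25 minutes, so days of error-free accumulation exceed this bound by a wide margin.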

## V. APPLICATIONS FOR 2D/2.5D/3DIC MEMORY I/O INTERFACE

In summary, we have proposed an innovative self-equalized and skewless FDM memory interface and realized a 4-Gb/s 5-band QPSK transceiver in 40-nm CMOS. Using 80-mV<sub>pp</sub> differential current-mode signaling, the transceiver steadily consumes 5.4 mW (2.16 mW for TX and 3.24 mW for RX), and every four transceivers can share one pair of VDD/VSS pins, each with a 1-nH bonding wire. With a total area of only 80×100 $\mu$m<sup>2</sup>, the 5-band QPSK transceiver is compatible with various packaging technologies, from high-end TSV 3DIC to cost-efficient wire bonding, and has been tested with a TSV load of 1 pF and with FR-4 differential traces up to 5 cm long. A real-time flexible BER testing platform has also been established, and the measured BER is less than 10<sup>-12</sup>. Compared with the existing DDR4 transmitter in 22-nm CMOS [1], our transceiver offers better TX energy efficiency (0.54 versus 2.5 pJ/bit). Compared with the Wide-I/O technology [2]-[3], our transceiver requires fewer I/O pins (80 versus 704) for the same target bandwidth (12.8 GB/s), as summarized in Table I. Also, since the 5-band QPSK transceiver can be used with wire bonding (2D), silicon interposer (2.5D) or TSV 3DIC, the FDM memory interface is versatile for cost-effective manufacturing.

TABLE I. SUMMARY OF PERFORMANCE AND COMPARISON WITH PRIOR ART

|                                    | This work               | DDR4 [1]                | Wide IO [2]              | Wide IO [3]              |
|------------------------------------|-------------------------|-------------------------|--------------------------|--------------------------|
| Signaling                          | differential<br>current | single-ended<br>voltage | single-ended<br>voltage  | single-ended<br>voltage  |
| Technology                         | 40nm CMOS               | 22nm CMOS               | 50nm CMOS                | 90nm CMOS                |
| TX energy<br>efficiency            | 0.54 pJ/bit<br>@ 4 Gb/s | 2.5 pJ/bit<br>@ 3.2 Gb/s | 0.78 pJ/bit<br>@ 0.2 Gb/s | 0.56 pJ/bit<br>@ 0.2 Gb/s |
| # of signal pins<br>for 12.8 GB/s  | 64                      | 40*                     | 576*                     | 576*                     |
| # of VDD/VSS pins<br>for 12.8 GB/s | 16                      | 40*                     | 128*                     | 128*                     |
| Packaging<br>compatibility         | 2D/2.5D/3DIC            | 2.5D                    | TSV 3DIC                 | TSV 3DIC                 |

\*Estimates are based on JEDEC standards.
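The pin counts listed for this work in Table I can be reproduced with the Python sketch below. It is our reconstruction, not the authors' stated procedure: it assumes that only the 8 DQ bits of each transceiver count toward the 12.8-GB/s payload (DQS and DM being overhead), two signal pins per transceiver (one differential TML), and one VDD/VSS pair per four transceivers, as stated in Section III.

```python
DQ_BITS_PER_XCVR = 8          # 10 QPSK channels minus DQS and DM (assumption)
RATE_PER_CHANNEL_MBPS = 400   # per I or Q channel
SIGNAL_PINS_PER_XCVR = 2      # one differential TML
XCVRS_PER_SUPPLY_PAIR = 4     # from the supply-bounce discussion


def ceil_div(a, b):
    """Integer ceiling division without floating point."""
    return -(-a // b)


def pins_for_bandwidth(mbytes_per_s):
    """Return (transceivers, signal pins, VDD/VSS pins) for a payload bandwidth."""
    payload_mbps = mbytes_per_s * 8
    per_xcvr_mbps = DQ_BITS_PER_XCVR * RATE_PER_CHANNEL_MBPS
    n_xcvr = ceil_div(payload_mbps, per_xcvr_mbps)
    signal_pins = n_xcvr * SIGNAL_PINS_PER_XCVR
    supply_pins = 2 * ceil_div(n_xcvr, XCVRS_PER_SUPPLY_PAIR)
    return n_xcvr, signal_pins, supply_pins
```

For 12.8 GB/s (12800 MB/s) this gives 32 transceivers, 64 signal pins and 16 supply pins, i.e. the 80 pins quoted in the summary.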

## ACKNOWLEDGEMENT

This material is based on research sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under agreement number FA8650-15-1-7519. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) or the U.S. Government.

## REFERENCES

- [1] T. C. Hsueh, et al.: 'A 25.6Gb/s differential and DDR4/GDDR5 dual-mode transmitter with digital clock calibration in 22nm CMOS', ISSCC Dig. Tech. Papers, pp. 444-445, Feb. 2014.
- [2] J.-S. Kim, et al.: 'A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM With 4 × 128 I/Os Using TSV Based Stacking', IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 107-116, 2012.
- [3] S. Takaya, et al.: 'A 100GB/s wide I/O with 4096b TSVs through an active silicon interposer with in-place waveform capturing', ISSCC Dig. Tech. Papers, pp. 434-435, Feb. 2013.