# A 3x3.8Gb/s Four-Wire High Speed I/O Link Based On CDMA-Like Crosstalk Cancellation

Tzu-Chien Hsueh, Pin-En Su, Sudhakar Pamarti

University of California, Los Angeles

*Abstract*-A signaling technique realizing three differential links over four wires is proposed. The technique uses Code Division Multiple Access (CDMA) principles to cancel the crosstalk between the links. A 3x3.8 Gb/s, 10<sup>-12</sup> BER, prototype IC built in 90nm CMOS is described and measurement results are presented.

### I. INTRODUCTION

The use of differential-mode (DM) signaling in modern high speed serial data communication systems has enabled multiple Gb/s data rates but at the expense of poor pin utilization (0.5 link/pin). Since the number of package pins available for I/O purposes is often limited, meeting the ever-increasing I/O bandwidth requirements poses a tremendous challenge. Raising the signaling frequency of the DM links can increase the I/O data bandwidth, but the harsh signaling environment of FR4 printed circuit boards poses well-documented signal integrity and power consumption challenges.

Common-mode (CM) signaling has been suggested as a means to improve pin utilization and aggregate data bandwidth without increasing the signaling frequency [1]-[3]. In [1], CM signaling was employed to realize a low speed backchannel to assist adaptation of the primary DM backplane link. Both [2] and [3] suggested differential signaling using the common modes of a pair of DM links, resulting in a 50% higher pin utilization (0.75 links/pin). However, the aggregate data bandwidth of systems employing CM signaling remains low owing to inevitable interference between the DM and CM signals. Specifically, interference or signal crosstalk is caused by on-die and PCB trace mismatches, finite output impedance of the TX drivers and RX preamplifiers, differences in the CM and DM channels and signal flight times, and electromagnetic coupling between the differential and common modes of signal propagation [3]. The signal crosstalk degrades the bit error rate (BER) on both the DM and CM links, particularly at high signaling rates. Traditional signal integrity solutions (e.g., better PCB trace routing, shielding, careful circuit layout, and sophisticated equalization) have limited utility considering the wide variety of mechanisms causing signal crosstalk.

This paper proposes a technique based on the Code Division Multiple Access (CDMA) principle [4] to suppress the *signal crosstalk* in multi-GHz multi-conductor signaling systems that employ simultaneous CM and DM signaling. The CDMA principle is known to solve a similar problem: eliminating the interference among transmitted signals from various radios. Whereas the interfering signals in a CDMA system are the outputs of different radios, in the proposed technique, they are the DM and CM signals. While the radio signals interfere because they share the communication medium, the DM and CM signals interfere because of the aforementioned mechanisms such as PCB trace mismatches. Even though CDMA is known to suppress the crosstalk from interfering signals, it increases the signal bandwidth many-fold; it also requires precise analog-to-digital converters followed by banks of digital correlation blocks (rake receivers). These issues make its direct application to GHz high speed I/O links impractical and/or not very useful. As described later in the paper, the proposed technique employs a simple analog variant of the CDMA technique suitable for crosstalk cancellation in GHz high speed I/O. A fabricated prototype 90nm CMOS IC that employs the technique achieved 3x3.8 Gb/s operation at BER  $< 10^{-12}$ . Measurements also indicated that the proposed technique improves the BER by 50x by canceling the signal crosstalk. Note that the CDMA technique was also employed in [5] to realize dynamically reconfigurable I/O but not for signal crosstalk cancellation. Furthermore, the digital correlation based implementation confines it to low operating speeds. The rest of the paper describes the CDMA-like crosstalk cancellation technique in Section II, the design of the prototype IC in Section III, and discusses measurement results in Section IV.





Fig. 1. Simplified block diagram of the proposed CDMA-like four-wire signaling technique.

Fig. 1 shows a simplified block diagram of the proposed CDMA-like signaling technique. The gray blocks represent the differences from prior art in CM signaling [3]. As in prior art, three data links are realized using four wires: two DM links, and one CM link composed of two single-ended CM signals operated in differential manner. Two common mode extraction circuits (CE) compute the respective CM signals of the two pairs of conductors. Unlike prior art [3], in the transmitter, the CM signal, C(t), is multiplied by a binary signal S(t) which alternates between 1 and -1 spending equal time within each bit period at the two levels, as shown in the figure. In this paper, S(t) is referred to as the spreading signal, and the multiplication process is referred to as spreading, in accordance with CDMA terminology. S(t) is realized by simply using a 50% duty cycle, bit-rate clock signal. In the receiver, the difference of the two extracted CM signals is multiplied (de-spread) by an appropriately delayed version of the spreading signal, S(t), followed by an integrate-and-dump circuit. The clock synchronization circuit that achieves the "appropriate delay" is not shown in the figure for the sake of simplicity. Note that the integration is carried out over one bit period, and then the integrator is reset just before the next integration period begins. The subsequent decision circuit samples the integrator output at the end of each integration period. Section III describes a simple way of avoiding the instantaneous reset operation using dual integrators and a two-way interleaved operation. Similar integrate-and-dump circuits are also included in the two DM links, as shown in Fig. 1.
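The spread, de-spread, and integrate-and-dump sequence described above can be sketched in discrete time. This is a minimal behavioral sketch, not the circuit of the paper: the oversampling ratio `OSR`, the ideal (lossless, crosstalk-free) channel, the perfectly aligned de-spreading clock, and all function names are assumptions made for illustration.

```python
# Idealized discrete-time sketch of the CM path:
# spread -> (ideal channel) -> de-spread -> integrate-and-dump -> decide.
# Assumptions (not from the paper): OSR samples per bit, no channel loss,
# no crosstalk, and a de-spreading clock aligned to the bit boundaries.

OSR = 8  # samples per bit period T_B; even, so S(t) has a 50% duty cycle


def spreading_signal(num_bits, osr=OSR):
    """S(t): +1 for the first half of each bit period, -1 for the second."""
    one_bit = [1] * (osr // 2) + [-1] * (osr // 2)
    return one_bit * num_bits


def nrz(bits, osr=OSR):
    """Map bits {0, 1} to an NRZ waveform with levels {-1, +1}."""
    return [(1 if b else -1) for b in bits for _ in range(osr)]


def multiply(x, y):
    """Sample-by-sample product of two equal-length waveforms."""
    return [a * b for a, b in zip(x, y)]


def integrate_and_dump(z, osr=OSR):
    """Integrate over each bit period, sample at its end, then reset."""
    return [sum(z[i:i + osr]) for i in range(0, len(z), osr)]


def decide(samples):
    """Slice each integrator sample against a zero threshold."""
    return [1 if s > 0 else 0 for s in samples]


cm_bits = [1, 0, 0, 1, 1, 0, 1, 0]
s = spreading_signal(len(cm_bits))
tx = multiply(nrz(cm_bits), s)   # spread CM signal C(t)*S(t) on the wires
z_c = multiply(tx, s)            # de-spreading: S(t)*S(t) = 1 restores C(t)
samples = integrate_and_dump(z_c)
recovered = decide(samples)
print(recovered == cm_bits)
```

In this ideal setting every integrator sample equals plus or minus `OSR`, so the decision is trivially correct; the value of the structure only becomes visible once crosstalk is added to the channel.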

The operation of the proposed technique is illustrated in Fig. 2(a) through Fig. 2(c) using a signal processing model and time-domain waveforms respectively. Note that the time variable "t" is dropped in this figure for the sake of simplicity. The solid arrows in Fig. 2(a) represent the main DM and CM signal paths from the transmitter through the channel (shaded box) to the receiver. The dotted arrows represent signal crosstalk i.e. interference between the DM and CM signals. For example, suppose that the top two conductors in Fig. 1 are mismatched in length. As a result, the input to the despreading block in the CM receiver,  $y_{C}(t)$ , which should be a filtered (by the channel) version of the spread CM signal, C(t)·S(t), is now corrupted by a scaled, filtered version of the DM signal, D<sub>1</sub>(t). This DM-to-CM crosstalk is denoted by  $[D_1(t)]_X$ . As mentioned in Section I, a wide variety of error sources such as mismatches in the TX circuits, RX preamplifier circuits, differences in the effective channels, differences in signals' time of flight, and electromagnetic mode coupling between the DM and CM channels, all cause similar signal crosstalk. The dotted arrows represent a total summation of all such crosstalk.

For the moment, assume that the DM and CM signals are perfectly equalized; equalization issues are discussed in detail later in this section. Consider the CM receiver. In the absence of crosstalk, the de-spreading operation simply undoes the spreading operation performed in the TX (since S(t)·S(t) = 1) and the integrator input is  $z_C(t) = C(t)$ . The CM data is recovered by sampling the integrator output at the end of each integration period:

$$C_{no\_crosstalk}[n] = \int_{(n-1)T_B}^{nT_B} z_C(t) \cdot dt = \int_{(n-1)T_B}^{nT_B} C(t) \cdot dt$$
(1)

The integrator input and output waveforms are shown in Fig. 2(b); the sampling instants are also marked showing proper CM data recovery. In the presence of crosstalk, the integrator's input and output are corrupted by DM-to-CM crosstalk, but only the crosstalk is multiplied by S(t). Specifically,  $z_C(t)=C(t)+[D_1(t)+D_2(t)]_X \cdot S(t)$ , and

$$C_{crosstalk}[n] = C_{no\_crosstalk}[n] + \int_{(n-1)T_{B}}^{nT_{B}} [D_{1}(t) + D_{2}(t)]_{X} \cdot S(t) \cdot dt$$
(2)

Note that multiplication with S(t) makes  $[D_1(t)]_X \cdot S(t)$  and  $[D_2(t)]_X \cdot S(t)$  very wide-band signals with negligible power within the bandwidth of the CM signal, C(t). Therefore, the integration, which is essentially low pass filtering, effectively cancels the contribution of the crosstalk term in Eq. (2). The crosstalk cancellation is also illustrated in the time-domain by the bottom curves in Fig. 2(c). Because of multiplication with S(t), the  $[D_1(t)]_X \cdot S(t)$  term has almost equal positive and negative areas within the integration period. So, its contribution to the integrator output almost vanishes at the sampling instant thereby resulting in crosstalk cancellation.
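The cancellation in Eq. (2) can be checked numerically with a small sketch. It rests on strong simplifying assumptions that are not taken from the paper: the crosstalk is modeled as an unfiltered copy of a DM signal scaled by a hypothetical gain `ALPHA`, it is aligned with the CM bit boundaries, and the "conventional" baseline is simply the same integrating receiver without spreading or de-spreading.

```python
# Numerical check of Eq. (2) under idealized assumptions (not from the paper):
# crosstalk = ALPHA * D(t), unfiltered and aligned to the CM bit boundaries.

OSR = 8       # samples per bit period T_B (assumed oversampling ratio)
ALPHA = 0.5   # hypothetical DM-to-CM crosstalk gain


def nrz(bits, osr=OSR):
    """Map bits {0, 1} to an NRZ waveform with levels {-1, +1}."""
    return [(1 if b else -1) for b in bits for _ in range(osr)]


def integrate_and_dump(z, osr=OSR):
    """Integrate over each bit period and sample at its end."""
    return [sum(z[i:i + osr]) for i in range(0, len(z), osr)]


cm_bits = [1, 0, 1, 1, 0, 0]
dm_bits = [0, 1, 1, 0, 0, 1]
c = nrz(cm_bits)
d = nrz(dm_bits)
s = ([1] * (OSR // 2) + [-1] * (OSR // 2)) * len(cm_bits)  # S(t)

# Baseline without spreading: the receiver integrates C(t) + ALPHA*D(t).
conventional = integrate_and_dump(
    [ci + ALPHA * di for ci, di in zip(c, d)])

# CDMA-like: the wire carries C(t)*S(t) + ALPHA*D(t); de-spreading by S(t)
# gives z_C(t) = C(t) + ALPHA*D(t)*S(t), as in the text above.
proposed = integrate_and_dump(
    [(ci * si + ALPHA * di) * si for ci, di, si in zip(c, d, s)])

print(conventional)  # samples vary with the DM data
print(proposed)      # samples depend on the CM data only
```

With these inputs the baseline samples range from 4 to 12 in magnitude depending on the DM data, while the de-spread samples all have magnitude 8, i.e. the crosstalk term integrates to zero. With a filtered or misaligned crosstalk waveform the cancellation would be only approximate.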



Fig. 2. (a) Simplified model of the proposed CDMA-like signaling technique. (b) CM signal waveforms without crosstalk. (c) DM-to-CM crosstalk waveforms illustrating crosstalk cancellation.

Similarly, CM-to-DM crosstalk is canceled in the two DM receivers as well. The main difference is that the crosstalk terms,  $[C(t)S(t)]_X$ , are already wideband signals with negligible power within the bandwidths of  $D_1(t)$  and  $D_2(t)$ :

$$z_{1}(t) = D_{1}(t) + [C(t) \cdot S(t)]_{X}, \quad z_{2}(t) = D_{2}(t) + [C(t) \cdot S(t)]_{X},$$
(3)

and do not need the multiplication by S(t) in the receiver.
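As an idealized check of this statement, assume (for illustration only) that the CM-to-DM crosstalk is an unfiltered copy of the spread CM signal scaled by a hypothetical gain $\alpha$, aligned with the DM bit boundaries, and that C(t) holds the constant value $c[n]$ over the $n$-th bit period. The crosstalk contribution to the DM integrator output is then

$$\int_{(n-1)T_B}^{nT_B} \alpha \, C(t) \cdot S(t) \cdot dt = \alpha \, c[n] \left( \int_{(n-1)T_B}^{(n-\frac{1}{2})T_B} dt - \int_{(n-\frac{1}{2})T_B}^{nT_B} dt \right) = \alpha \, c[n] \left( \frac{T_B}{2} - \frac{T_B}{2} \right) = 0.$$

In a real channel the crosstalk is filtered and skewed, so the two half-period areas no longer match exactly and the cancellation is approximate rather than complete.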

Note that, to achieve multi-GHz operation, both the DM signals and the CM signals require equalization. The equalization requirements of the DM links are the same as in any conventional DM signaling system of comparable individual link data rate,  $1/T_B$ . The spreading operation does double the bandwidth of the CM signal on the wires, resulting in additional signal loss and inter-symbol interference. However, the equalizer circuits (e.g., TX pre-emphasis taps) do not correspondingly run at twice the data rate  $(2/T_{\rm B})$ . In the proposed technique, they still run at the data rate,  $1/T_{\rm B}$ , but equalize only the effective pass-band channel seen by the C(t) signal. The effective channel is around  $1/T_B$  since the spreading operation translates the spectrum of C(t) such that it is centered at 1/T<sub>B</sub>. Note that similar pass-band equalization was employed in the context of analog multi-tone I/O techniques [6]. Since the equalizer data rate does not change, its power consumption is comparable to that of base-band equalizers of the same data rate. Even though the equalization cost does not increase, the effective loss seen by C(t)·S(t) is higher because it is centered at 1/T<sub>B</sub>, and not DC. The proposed technique compensates for the extra CM signal loss by increasing the CM signal swing by 50% relative to the DM signal swing. It is instructive to compare the proposed technique with a pair of purely differential signaling links running at 1.5/T<sub>B</sub> each so that the aggregate data bandwidth is the same as in the proposed technique. The proposed technique will have higher signal loss, but lower clock frequency and equalizer data rate. The higher loss is also because both DM and CM signals have to be accommodated within the available swing on each wire. The RX preamplifiers spend additional power to compensate for the loss. On the other hand, the lower equalizer speed and clock frequency  $(1/T_B \text{ as opposed to } 1.5/T_B)$  result in power savings.
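The rate bookkeeping behind this comparison can be written out for the prototype's per-link rate of 3.8 Gb/s. The sketch below only restates the arithmetic of the paragraph above; the dictionary keys and the term "chip rate" (the 2/T<sub>B</sub> transition rate of the spread CM signal) are naming choices made here, not the paper's.

```python
# Rate bookkeeping for the comparison above, in integer Mb/s to avoid
# floating-point rounding. Names are illustrative, not from the paper.

LINK_RATE_MBPS = 3800  # per-link data rate 1/T_B of the prototype


def proposed_scheme(link_rate):
    """Two DM links plus one spread CM link on four wires."""
    return {
        "links": 3,
        "aggregate": 3 * link_rate,
        "equalizer_rate": link_rate,    # equalizers still run at 1/T_B
        "cm_chip_rate": 2 * link_rate,  # spread CM signal toggles at 2/T_B
    }


def dm_only_equivalent(aggregate):
    """Two purely differential links carrying the same aggregate rate."""
    per_link = aggregate // 2
    return {
        "links": 2,
        "aggregate": 2 * per_link,
        "equalizer_rate": per_link,     # 1.5/T_B per link
    }


proposed = proposed_scheme(LINK_RATE_MBPS)
baseline = dm_only_equivalent(proposed["aggregate"])
print(proposed)
print(baseline)
```

For the prototype numbers this gives an aggregate of 11400 Mb/s in both cases, with equalizers and clocks running at 3800 Mb/s in the proposed scheme versus 5700 Mb/s in the two-link alternative.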

## III. IMPLEMENTATION

A 3x3.8 Gb/s prototype IC was fabricated in IBM 90nm CMOS logic technology to demonstrate the proposed technique. The IC includes TX, RX, clock generation, pre-emphasis equalization, swing and impedance control, and PRBS pattern generation and verification circuitry for all three links. Fig. 3 shows the circuit block diagram of the CM transceiver. Except for the spreading, de-spreading, and CM extractor circuits, the DM links have identical circuit architecture, and so they are not described here.

In the TX, transmitted data is generated by multiplexing two half-rate PRBS generators and then spread by the full rate clock. The precise spreading action is achieved by multiplexing the data and its inverse using a 50% duty cycle clock, TXCLK, as shown in Fig. 3. Combining the DM and CM signals onto the wires is done by simple current-mode drivers [3]. All links employ 4-tap TX pre-emphasis equalization operating at 3.8Gb/s with pre-determined coefficients. Replica based swing and impedance control circuits are employed on all links.
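The equivalence between the multiplexer-based spreading described above and multiplication by S(t) can be verified with a short logic-level sketch. It assumes, as an illustrative choice, that the clock is HIGH during the first half of each bit period; the function names are hypothetical.

```python
# Logic-level sketch: spreading by a 2:1 mux that selects the data or its
# inverse under control of the bit-rate clock.
# Assumption (not from the paper): the clock is HIGH in the first half of
# each bit period and LOW in the second half.


def mux_spread(data_bit, clk):
    """Pass the data while the clock is high, its inverse while it is low."""
    return data_bit if clk else 1 - data_bit


def to_level(bit):
    """Map a logic value {0, 1} to a signal level {-1, +1}."""
    return 1 if bit else -1


# The mux output equals the product C*S for every input combination.
for data_bit in (0, 1):
    for clk in (1, 0):
        product = to_level(data_bit) * to_level(clk)
        assert to_level(mux_spread(data_bit, clk)) == product


def spread_stream(bits):
    """Two half-bit symbols per data bit: clock high, then clock low."""
    out = []
    for b in bits:
        out += [mux_spread(b, 1), mux_spread(b, 0)]
    return out


print(spread_stream([1, 0, 0, 1]))  # → [1, 0, 0, 1, 0, 1, 1, 0]
```

Each data bit therefore produces a guaranteed transition at mid-bit, which is why the spread stream toggles at up to twice the data rate.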

In the RX, the CM extraction circuits are implemented by a source-coupled transistor pair with the parallel input devices (the drains are tied together) [2]. The de-spreading, integrate-and-dump, and decision circuits are implemented in duplicate and are used in time-interleaved fashion: one copy run by the half-rate clock, RXCLK, and the other by its inverse. When RXCLK is HIGH, one copy de-spreads and integrates its input while the other holds its output for the decision circuit. The roles are reversed when RXCLK is LOW. The de-spreading and integrate-and-dump circuits and their timing diagram are shown in Fig. 4. The de-spreading is done using a Gilbert Cell, and parasitic capacitors are used for high speed integration [7].
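The two-way interleaved operation can be illustrated with a behavioral sketch in which two integrator copies alternate between integrating and holding. This is only a software analogy under stated assumptions: the input is an already de-spread waveform with `OSR` samples per bit, each copy is reset at the start of its own integration phase, and the class and function names are hypothetical.

```python
# Behavioral sketch of two-way interleaved integrate-and-dump.
# Assumptions (not from the paper): the input is already de-spread, has OSR
# samples per bit, and each copy is reset as its integration phase begins.

OSR = 8  # samples per bit period T_B


class Integrator:
    """One integrate-and-hold copy."""

    def __init__(self):
        self.value = 0

    def reset(self):
        self.value = 0

    def accumulate(self, x):
        self.value += x


def interleaved_receive(z, osr=OSR):
    """Alternate two integrators bit by bit and slice the held outputs."""
    copies = [Integrator(), Integrator()]
    decisions = []
    num_bits = len(z) // osr
    for n in range(num_bits):
        active = copies[n % 2]        # RXCLK phase selects who integrates
        holding = copies[1 - n % 2]   # the other copy holds the previous bit
        if n > 0:
            # The slicer has a full bit period to resolve the held value.
            decisions.append(1 if holding.value > 0 else 0)
        active.reset()
        for x in z[n * osr:(n + 1) * osr]:
            active.accumulate(x)
    if num_bits:
        last = copies[(num_bits - 1) % 2]
        decisions.append(1 if last.value > 0 else 0)
    return decisions


bits = [1, 0, 1, 1, 0]
z = [(1 if b else -1) for b in bits for _ in range(OSR)]
print(interleaved_receive(z) == bits)
```

Because each copy is idle for a whole bit period between integrations, neither the reset nor the decision has to happen instantaneously, which is the benefit the interleaving is meant to provide.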

Clock generation is done using integer-*N* phase locked loop (PLL), phase interpolator (PI), and duty cycle correction (DCC) circuits. The DM and CM links share the PLLs. The TX PIs are used only for testing and debugging purposes. The RX PIs are used to manually adjust the de-spreading clock skew such that its rising edges are aligned with the start of a received CM bit interval as shown in Fig. 2(b). The DCC circuits generate a nominally 50% duty cycle 3.8 GHz clock for spreading and de-spreading functions.



Fig. 3. Block diagram of the CM transmitter and receiver.



Fig. 4. Simplified schematics and timing diagram for the de-spreading and integrate-and-dump circuits.

#### IV. MEASUREMENT RESULTS

The IC achieved an aggregate data rate of 11.4Gb/s (3.8Gb/s/link) with BER <  $10^{-12}$  for two different test channels shown in Fig. 5(a) and (b): (A) 6" FR4 traces, and (B) 2" FR4 traces followed by 60cm coaxial cables. The total channel loss is about 10dB at 3.8GHz in either case. The whole IC (including pads) occupies 2.2mm x 2.0mm, and consumes 27mW/Gb/s from 1.0V/1.2V supplies. A die photo and a performance summary are shown in Fig. 5(c) and Table I.



Fig. 5. (a) & (b) Test setups for two different channels. (c) Die photo.

| Technology                                                                                                                        | IBM 90nm CMOS      |
|-----------------------------------------------------------------------------------------------------------------------------------|--------------------|
| Supply Voltages                                                                                                                   | 1.2V, 1.0V         |
| Aggregate Data Rate on 4 Wires                                                                                                    | 11.4Gb/s           |
| TX Equalization                                                                                                                   | 4-tap pre-emphasis |
| RX Equalization                                                                                                                   | N/A                |
| Equalization Baud Rate                                                                                                            | 3.8GHz             |
| TX Driver                                                                                                                         | Current Mode       |
| TX & RX Clock Jitter                                                                                                              | 3ps r.m.s. @3.8GHz |
| RX Phase-Interpolator Resolution                                                                                                  | 4ps @3.8GHz        |
| BER                                                                                                                               | <10<sup>-12</sup>  |
| Power Efficiency (TX+RX: Front-end)                                                                                               | 14mW/Gb/s          |
| Power Efficiency (TX+RX: Front-end, Data Path, Clock Generator,<br>Clock Distribution, Swing Control, Impedance Control, DC Bias) | 27mW/Gb/s          |

TABLE I Measurements Summary
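For orientation, the efficiency figures in Table I can be converted to approximate absolute power, assuming (this normalization is not stated explicitly) that they are referred to the 11.4Gb/s aggregate data rate:

$$P_{total} \approx 27\,\mathrm{mW/(Gb/s)} \times 11.4\,\mathrm{Gb/s} \approx 308\,\mathrm{mW}, \qquad P_{front\text{-}end} \approx 14\,\mathrm{mW/(Gb/s)} \times 11.4\,\mathrm{Gb/s} \approx 160\,\mathrm{mW}.$$

Equivalently, 27mW/Gb/s corresponds to 27pJ per transmitted bit.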

Bit error rate measurements were performed to demonstrate the crosstalk cancellation. The measured BER for the DM and CM links over channel B for various data rates, with and without the CDMA-like technique enabled, is plotted in Fig. 6. At least 50x improvement in the BER over conventional four-wire signaling is observed. Note that for lower data rates, the TX signal swings were intentionally reduced to raise the BER, thereby saving measurement time. Tests were also performed to confirm that the proposed technique is not very sensitive to timing misalignment between the received crosstalk and the de-spreading signal. To verify this, the TX PIs on the DM links were adjusted over one bit period to emulate the worst case of timing misalignment. At least 10x BER reduction is still observed with the technique enabled, even at worst-case misalignment.

Fig. 7(a) and 7(b) show the measured DM and CM eye diagrams at the end of the coaxial cable. The CM eye diagram is shown here only to illustrate the spreading operation. The two half-sized eyes within one bit interval, seen in Fig. 7(b), are as expected. As illustrated in Fig. 7(c) for example CM data, the spread CM signal at the end of the channel folds onto one bit interval to result in two half-sized eyes in an eye diagram measurement. More importantly, conclusions about the performance of the CM pre-emphasis cannot be readily drawn from the measured CM eye. This is because the CM pre-emphasis only equalizes an effective pass-band channel and does not try to "open" the eye of the 2x3.8Gb/s spread C(t)·S(t) signal.



Fig. 6. Measured BER of the DM and CM links with and without the CDMA-like technique enabled.



Fig. 7. Measured eye diagram of (a) DM, (b) CM signals at the end of the channel. (c) Illustration of the CM signal eye formation.

#### ACKNOWLEDGMENTS

The authors acknowledge Prof. M.C. F. Chang for his productive discussion, R. Tam for his test support, E.Y. Sung and J. Blakely for their help in PCB design. The project was funded in part by NSF (ECS-0621733) and SRC (2008-HJ-1786).

#### REFERENCES

- [1] A. Ho et al, "Common-mode backchannel signaling system for differential high-speed links," *IEEE 2004 Symposium on VLSI Circuits, Digest of Technical Papers*, pp. 352-355.
- [2] T. Gabara, "Phantom-mode signaling in VLSI system," in Proc. IEEE Conf. Advanced Research in VLSI, Mar. 2001, pp. 88–100.
- [3] S.-W. Choi et al, "A three-data differential signaling over four conductors with pre-emphasis and equalization: a CMOS current mode implementation," *IEEE J. Solid-State Circuits*, vol. 41, no.3, pp. 633-641, Mar. 2006.
- [4] S. Haykin, Communication Systems, 4th ed., New York: John Wiley & Sons, 2001
- [5] Z. Xu et al, "A 2.7 Gb/s CDMA-interconnect transceiver chip set with multi-level signal data recovery for re-configurable VLSI systems," *ISSCC Digest of Tech. Papers*, pp.158-159, Feb. 2003.
- [6] A. Amirkhany et al, "A 24Gb/s software programmable analog multi-tone transmitter," *IEEE J. Solid-State Circuits*, vol. 43, no.4, pp. 999-1009, April 2008
- [7] S. Sidiropoulos et al, "A 700-Mb/s/pin CMOS signaling interface using current integrating receivers," *IEEE J. Solid-State Circuits*, vol. 32, no.5, pp. 681-690, May 1997.