

## 7.4 An 8GB/s Quad-Skew-Cancelling Parallel Transceiver in 90nm CMOS for High-Speed DRAM Interface

Young-Sik Kim<sup>1</sup>, Seon-Kyoo Lee<sup>1</sup>, Seung-Jun Bae<sup>2</sup>, Young-Soo Sohn<sup>2</sup>, Jung-Bae Lee<sup>2</sup>, Joo Sun Choi<sup>2</sup>, Hong-June Park<sup>1</sup>, Jae-Yoon Sim<sup>1</sup>

<sup>1</sup>Pohang University of Science and Technology, Pohang, Korea

<sup>2</sup>Samsung Electronics, Hwasung, Korea

In high-speed wireline communication, full-rate clocking for chip-to-chip interfaces has been widely adopted since it eliminates clock-induced deterministic jitter. Design with standard digital CMOS technologies, however, often limits the maximum frequency of circuit operation. The increase in power and circuit complexity with full-rate clocking makes the problem even worse in a parallel transceiver whose clock tree travels through long interconnects. As an alternative to full-rate clocking, frequency generation with a multiphase PLL has also been considered to relax the tight operating-frequency requirements of the oscillator and flip-flops. The DRAM interface, as a representative high-speed parallel link, has adopted quadruple data rate (QDR) schemes for high-speed graphics applications [1-2]. However, as the data rate of the DRAM interface increases into the multi-Gb/s range, skew in the quadrature clock phases becomes one of the most serious sources of performance degradation.

Figure 7.4.1 shows the effect of a typical quad-skew of $\pm 10$ps on the transmitted eye. The effective deterministic jitter of 20ps in the transmitted eye also causes an additional 20ps of jitter in the recovered CDR clock, since the edge clocks for phase detection can align to either extreme of the data edges; the decision window therefore degrades by 40ps in total. Likewise, a random skew of $\pm 10$ps in the edge and data clocks of a quarter-rate CDR also reduces the decision window by 40ps. Therefore, random skews of $\pm 10$ps in both TX and RX cause a worst-case degradation of 80ps in the decision window. In addition, the skewed, smaller eye experiences more inter-symbol interference, which further increases the deterministic jitter. There have been several approaches to skew compensation in TX and RX [3-4]. However, the previously reported quad-error detection schemes are based on analog circuits and are sensitive to mismatch in the input duty cycle [3]. In addition, skew in the input multiphase clocks is not considered in the individual tracking of data edges [4]. This paper presents an 8Gb/s parallel transceiver for memory interfaces with digitally controlled quad-deskew. With an individual data-edge-tracking CDR used in both TX and RX, our transceiver minimizes the effect of errors in the skew detection circuit.
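The jitter budget in the paragraph above can be tallied in a few lines. The sketch below is only a bookkeeping aid: the 125ps unit interval follows from the 8Gb/s data rate, the variable names are illustrative, the split of the RX loss into two 20ps terms is an interpretation of the text, and random jitter and ISI are left out.

```python
# Worst-case decision-window budget, using the +/-10ps skew figures quoted in the text.
# All names are illustrative; random jitter and ISI are deliberately left out.

DATA_RATE_GBPS = 8
UI_PS = 1000 / DATA_RATE_GBPS  # one unit interval at 8Gb/s = 125ps

SKEW_PP_PS = 20  # +/-10ps quad-skew = 20ps peak-to-peak

# TX side: 20ps of deterministic jitter in the transmitted eye plus the
# 20ps it induces in the recovered CDR clock.
tx_loss_ps = SKEW_PP_PS + SKEW_PP_PS

# RX side: the same +/-10ps skew on the edge clocks and on the data clocks
# (assumed here to contribute 20ps each, giving the 40ps stated in the text).
rx_loss_ps = SKEW_PP_PS + SKEW_PP_PS

total_loss_ps = tx_loss_ps + rx_loss_ps
remaining_ps = UI_PS - total_loss_ps

print(f"UI = {UI_PS:.0f}ps, loss = {total_loss_ps}ps, remaining = {remaining_ps:.0f}ps")
```

Under these assumptions the 80ps worst-case loss consumes well over half of the 125ps unit interval, which is why quad-skew dominates the timing budget at this data rate.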

Figure 7.4.2 shows the circuit diagram of the transceiver, which consists of eight TRX macros and a ring-oscillator-based PLL for global 4-phase clock generation. The TX compensates the skew by tuning the rising edges of the serializer clocks, driven by a quad-skew detector. For low-power transmission, a voltage-mode output driver with a 200mV swing is adopted. When a TRX macro is used as a TX, its RX block is initially turned on and used for transmitter quad-deskew. The RX block includes a quarter-rate CDR and a continuous-time linear equalizer (CTLE). While the output driver transmits data, the receiver monitors the transmitted data through the CTLE. In a conventional quarter-rate CDR with a binary PD, the eight-phase clocks cannot be controlled individually. Our quarter-rate CDR, however, is designed so that the edge clocks track all four data edges with individual phase control. Using the CDR, the four edge clocks are eventually aligned with all the edges in the data eyes. The extracted phases are then applied to the quad-skew detector (QSD). Since this scheme directly tracks the four edges of the data eyes, all mismatches and signal distortions from the delay paths, serializer, output driver, and interconnects are compensated in the deskew process. This CDR-based deskew consumes negligible power since the receiver block is turned off after the initial quad-deskew period. When a TRX macro is used as a receiver, its TX part is turned off.

Figure 7.4.3 shows the circuit diagram of the quad-skew detector. It receives the four edge clocks and generates up/dn bits to update each edge of the serializer clocks. A 4-to-2 MUX selects the four $\frac{1}{2}\pi$-spaced phase pairs one at a time. The $\frac{1}{2}\pi$-delayed phase goes through an additional reference delay, which eventually stores the average of the $\frac{1}{2}\pi$ delays, corresponding to 1UI. The two edges are then compared by a PD after time amplification. The time amplifier [5] is used to reduce the effect of metastability in the PD operation. The output of the PD is 1-to-4 demultiplexed to update the four serializer clock phases. The PD output is also used to update the reference delay, so that it converges to the average of the $\frac{1}{2}\pi$ delays. The mismatches among the transmission switches in the 4-to-2 MUX are the only error source that this deskew scheme cannot compensate. Since all other circuits are shared among the four comparisons, their offsets and gain errors are absorbed by the compensation process.
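The detector loop described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit in Fig. 7.4.3: the 500ps clock period is the quarter-rate period implied by 8Gb/s, while the 1ps step size, the binary (bang-bang) decision, the round-robin pair order, the rule of moving the later edge of each pair, and all names are hypothetical. The time amplifier and MUX mismatch are not modeled.

```python
# Behavioral sketch of a quad-skew detection loop (all parameters are assumptions).
T_CLK_PS = 500          # quarter-rate clock period at 8Gb/s, i.e. 4 UI
UI_PS = T_CLK_PS // 4   # ideal spacing between adjacent phases: 125ps


def spacings(phases):
    """Spacing from each phase to the next one, wrapping around the clock period."""
    return [(phases[(i + 1) % 4] - phases[i]) % T_CLK_PS for i in range(4)]


def run_qsd(phases, ref_delay, n_iter, step=1):
    """Visit the four adjacent phase pairs in turn and apply bang-bang updates."""
    phases = list(phases)
    for k in range(n_iter):
        i = k % 4                # MUX selection: pair (i, i+1)
        j = (i + 1) % 4
        spacing = (phases[j] - phases[i]) % T_CLK_PS
        # Binary PD decision: is the pair spacing larger than the reference delay?
        e = 1 if spacing > ref_delay else -1
        phases[j] -= e * step    # demultiplexed up/dn bit moves the later edge
        ref_delay += e * step    # the same bit pulls the reference toward the average
    return phases, ref_delay


# Skewed starting phases (spacings 133, 111, 135, 121 ps) and an offset reference.
start = [0, 133, 244, 379]
final, ref = run_qsd(start, ref_delay=120, n_iter=400)
print("initial spacing errors:", [s - UI_PS for s in spacings(start)])
print("final spacing errors:  ", [s - UI_PS for s in spacings(final)])
print("final reference delay: ", ref)
```

Because the four spacings always sum to one clock period, their average is exactly 1UI regardless of skew, which is why a reference delay that tracks the average is a valid target. In this model the spacing errors shrink to within a few steps of zero and then dither in a small limit cycle, as expected for a bang-bang loop.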

The CDR uses a DLL-based operation that receives four seed phases from the global PLL to generate four edge clocks. As shown in Fig. 7.4.4, the data clocks (CD0, CD90, CD180, and CD270) are generated by simple inverter-based phase averaging [6] of the edge clocks (CE0, CE90, CE180, and CE270). The data clocks therefore automatically track the center of the data eye, maximizing the decision window for any skewed data eye. With skews of $\pm 15$ps in the seed clock phases, the simulated jitter obtained by overlapping the four data clocks increases to 47.3ps with the conventional quarter-rate CDR scheme, while our CDR shows a jitter of 10.6ps, which is almost equal to the jitter without clock skew. This minimum jitter is limited by the finite resolution of the digitally controlled phase generation in the CDR. This scheme aligns only the rising edges; however, the jitter in the falling edges is also somewhat reduced since the individual tracking reduces the fluctuation at the PD output. When a TRX macro is used as an RX, the CDRs are turned off most of the time to reduce power consumption and are turned on periodically, as in the pulsed CDR scheme [7].
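The effect of phase averaging can be shown numerically. The sketch below assumes that each data clock sits at the midpoint of two adjacent edge clocks; the exact pairing of CE and CD phases in Fig. 7.4.4 is not reproduced here, and the skewed edge positions are made-up example values.

```python
# Sketch: data-clock phases as midpoints of adjacent edge-clock phases.
# The pairing and the example numbers are assumptions for illustration.
T_CLK_PS = 500.0  # quarter-rate clock period at 8Gb/s


def data_clock_phases(edge_phases):
    """Place each data clock halfway between an edge clock and the next one."""
    n = len(edge_phases)
    centers = []
    for k in range(n):
        start = edge_phases[k]
        width = (edge_phases[(k + 1) % n] - start) % T_CLK_PS  # eye width
        centers.append((start + width / 2) % T_CLK_PS)
    return centers


def sampling_margins(edge_phases):
    """Distance from each midpoint sample to its nearest data edge (half the eye)."""
    n = len(edge_phases)
    return [((edge_phases[(k + 1) % n] - edge_phases[k]) % T_CLK_PS) / 2
            for k in range(n)]


# Edge clocks after tracking skewed data edges (ideal grid would be 0/125/250/375).
edges = [0, 140, 245, 385]
print("data clocks:", data_clock_phases(edges))
print("margins:    ", sampling_margins(edges))
```

With midpoint sampling the margin to either neighboring edge is always half the eye width, so each eye is sampled at its own center even when the four eyes have different widths.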

The transceiver is implemented in a standard digital 90nm CMOS process. For verification, two chips are assembled on a PCB, with one configured as TX and the other as RX. The transmission lines are 2-inch-long microstrip lines on an FR4 PCB, which is reasonable for the targeted short-range DRAM interface. With a BER of less than $10^{-12}$ for a PRBS 2<sup>7</sup>-1 pattern, the maximum data rate of the transceiver is measured to be 8Gb/s, verifying operation of both TX and RX. Figure 7.4.5 shows the measured eye diagrams on the receiver side at 8Gb/s. The graph in the same figure summarizes the results of TX quad-deskew obtained from three chips. The random variation of -20 to +15ps before deskew is reduced to $\pm 5$ps, which is limited by the CDR tracking resolution. Figure 7.4.6 summarizes the performance. Total power consumption is 257mW including the PLL, resulting in 4.02mW/Gb/s per channel. Figure 7.4.7 shows the micrograph of the fabricated chip. This work presents a quad-deskew transceiver that compensates all the major mismatches of interconnects and circuits, providing a suitable architecture for high-speed QDR-based DRAM interfaces.

### Acknowledgements:

This work was partly supported by IT R&D program of MKE/KEIT [10039159], NRF under grant No. 2011-0010685, IDEC of Korea, and a scholarship from Samsung Electronics.

### References:

- [1] S. J. Bae, et al., "A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a Programmable DQ Ordering Crosstalk Equalizer and Adjustable Clock-Tracking BW," *ISSCC Dig. Tech. Papers*, pp. 498-499, Feb. 2011.
- [2] R. Kho, et al., "A 75nm 7Gb/s/pin 1Gb GDDR5 Graphics Memory Device With Bandwidth Improvement Techniques," *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 120-133, Jan. 2010.
- [3] N. Nguyen, et al., "A 16-Gb/s Differential I/O Cell with 380fs RJ in an Emulated 40nm DRAM Process," *Symp. VLSI Circuits*, pp. 128-129, Jun. 2009.
- [4] R. Inti, et al., "A 0.5-to-2.5Gb/s Reference-less Half-Rate Digital CDR with Unlimited Frequency Acquisition Range and Improved Input Duty-Cycle Error Tolerance," *ISSCC Dig. Tech. Papers*, pp. 438-439, Feb. 2011.
- [5] S. K. Lee, et al., "A 1GHz ADPLL with a 1.25ps Minimum-Resolution Sub-Exponent TDC in 0.18um CMOS," *ISSCC Dig. Tech. Papers*, pp. 482-483, Feb. 2010.
- [6] K. Yamaguchi, et al., "A 2.5-GHz Four-Phase Clock Generator With Scalable No-Feedback-Loop Architecture," *IEEE J. Solid-State Circuits*, vol. 36, no. 11, pp. 1666-1672, Nov. 2001.
- [7] R. Reutemann, et al., "A 4.5mW/Gb/s 6.4Gb/s 22+1-Lane Source-Synchronous Link RX Core with Optional Cleanup PLL in 65nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 160-161, Feb. 2010.



Figure 7.4.1: Effect of quad-skew in TX and RX.



Figure 7.4.2: Transceiver architecture.



Figure 7.4.3: Circuit diagram of quad-skew detector.



Figure 7.4.4: Clock generation scheme for CDR.



Figure 7.4.5: Measured TX quad-deskew.

| Parameter            | Block               | Value               |
|----------------------|---------------------|---------------------|
| Technology           |                     | 90nm CMOS           |
| Supply voltage       |                     | 1.25V               |
| Data rate            |                     | 8Gb/s               |
| Quad-deskew range    |                     | -30ps ~ +30ps       |
| BER                  |                     | < $10^{-12}$        |
| Area / lane          |                     | 0.225mm<sup>2</sup> |
| Power / lane (TX)    | PLL & global clocks | 0.77mW/Gb/s         |
|                      | Serializing block   | 0.51mW/Gb/s         |
|                      | Output buffer       | 0.38mW/Gb/s         |
| Power / lane (RX)    | PLL & global clocks | 0.77mW/Gb/s         |
|                      | Equalizer           | 0.29mW/Gb/s         |
|                      | CDR                 | 1.30mW/Gb/s         |
| Power / lane (total) | TX + RX             | 4.02mW/Gb/s         |

Figure 7.4.6: Performance summary.
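As a consistency check on the power figures, the per-block entries can be summed and compared with the reported total. The sketch below reads the first three power rows of the table as the TX side (an interpretation of the table layout) and assumes that the 257mW total covers all eight lanes at 8Gb/s each; the variable names are illustrative.

```python
# Cross-check of the power breakdown in the performance summary (values in mW/Gb/s).
tx_blocks = {"PLL & global clocks": 0.77, "Serializing block": 0.51, "Output buffer": 0.38}
rx_blocks = {"PLL & global clocks": 0.77, "Equalizer": 0.29, "CDR": 1.30}

tx_total = sum(tx_blocks.values())
rx_total = sum(rx_blocks.values())
link_total = tx_total + rx_total

# Independent estimate from the total power quoted in the text.
TOTAL_POWER_MW = 257
LANES = 8
RATE_GBPS_PER_LANE = 8
from_total = TOTAL_POWER_MW / (LANES * RATE_GBPS_PER_LANE)

print(f"TX {tx_total:.2f} + RX {rx_total:.2f} = {link_total:.2f} mW/Gb/s")
print(f"257mW over 64Gb/s aggregate = {from_total:.2f} mW/Gb/s")
```

Both routes give about 4.02mW/Gb/s, so the breakdown and the quoted total agree under these assumptions.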



Figure 7.4.7: Chip photo.