# A 56 Gb/s PAM4 Receiver with Low-Overhead Threshold and Edge-Based DFE FIR and IIR-Tap Adaptation in 65nm CMOS

Ashkan Roshan-Zamir<sup>1</sup>, Takayuki Iwai<sup>1</sup>, Yang-Hang Fan<sup>1</sup>, Ankur Kumar<sup>1</sup>, Hae-Woong Yang<sup>1</sup>, Lee Sledjeski<sup>2</sup>, John Hamilton<sup>2</sup>, Soumya Chandramouli<sup>2</sup>, Arlo Aude<sup>2</sup>, and Samuel Palermo<sup>1</sup>

<sup>1</sup>Analog and Mixed-Signal Center, Texas A&M University, College Station, TX, USA

<sup>2</sup>Texas Instruments Corporation, Duluth, GA, USA

ashkanroshan@tamu.edu, spalermo@tamu.edu

*Abstract*: A PAM4 quarter-rate receiver employs a single-stage CTLE and a DFE with 1 FIR tap and 1 IIR tap to efficiently compensate for channel loss. In addition to the three main per-slice data samplers, an error sampler is used for background threshold control, and an edge sampler both performs PLL-based CDR phase detection and generates information for background DFE tap adaptation. Fabricated in GP 65nm CMOS, the 56Gb/s receiver achieves 4.63mW/Gb/s and compensates for up to 20.8dB loss when operated with a 2-tap FFE transmitter.

## I. INTRODUCTION

High-speed I/O standards are emerging that employ PAM4 modulation for moderate channel losses near 20dB. While ADC-based receivers are well suited for PAM4 signaling [1], they generally consume high power. This motivates a power-efficient mixed-signal receiver front-end solution for these applications. However, a mixed-signal PAM4 receiver design faces several challenges. The receiver is required to make multi-level decisions and implement equalizers that can cancel long-tail multi-level ISI. This can lead to DFEs that have large tap counts when implemented with FIR feedback filters [2]. An alternative is to utilize DFEs which combine FIR and IIR feedback filters [3], [4]. However, robust operation requires the sampler thresholds and DFE taps to be set adaptively. This should be done with minimal hardware overhead and remain compatible with clock recovery architectures that support PAM4 modulation.

This work presents a mixed-signal PAM4 quarter-rate receiver that employs a single-stage CTLE and a 1 FIR-tap and 1 IIR-tap DFE to efficiently cancel long-tail ISI. A bang-bang phase detector (BBPD) PLL-based CDR allows for clock recovery with only one per-slice edge sampler and utilizes static phase interpolators for eight-phase data and edge clock generation. Utilizing the edge samplers, the DFE adaptation scheme of [3] is extended for PAM4 operation with independent per-slice tap values for mismatch robustness. Background sampler threshold adaptation is also achieved with only a single additional per-slice sampler that periodically scans the top and bottom PAM4 eyes.



Fig. 1. PAM4 receiver with threshold and DFE tap adaptation.

## II. RECEIVER ARCHITECTURE

Fig. 1 shows the PAM4 receiver architecture. The CTLE output is connected to the quarter-rate DFE, whose 4 slices each consist of 5 samplers. Three data samplers implement a 2-bit flash ADC for PAM4 symbol detection, 1 error sampler periodically scans the top and bottom eyes for threshold tuning, and 1 edge sampler provides information for CDR phase locking and DFE tap adaptation. The CTLE's primary role is to cancel the pre-cursor ISI in combination with a 2-tap TX FFE, while the DFE's FIR-tap targets the first post-cursor and the IIR-tap compensates for the long-tail post-cursor terms. The outputs of the 4 receiver slices are first deserialized to 1/8 symbol rate, with the data and edge samples driving the CDR's PAM4 BBPD. At this point the data samples are also probed out for external BER testing. All the data, error, and edge samples are then further deserialized to 1/32 symbol rate for processing by the DFE tap and threshold adaptation logic.
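As an illustration of how three data samplers act as a 2-bit flash ADC, the following minimal sketch slices a voltage against three thresholds and maps the resulting thermometer code to a PAM4 level. The normalized levels (-3, -1, +1, +3), the ideal thresholds (-2, 0, +2), and the plain binary MSB/LSB mapping are assumptions made for the example, not values taken from the design.

```python
from typing import Tuple

# Assumed normalized PAM4 levels; the actual signal swing is not modeled here.
PAM4_LEVELS = (-3, -1, 1, 3)

def slice_pam4(v: float,
               thresholds: Tuple[float, float, float] = (-2.0, 0.0, 2.0)
               ) -> Tuple[int, ...]:
    """Return the (bottom, middle, top) comparator outputs for input v."""
    return tuple(int(v > th) for th in thresholds)

def thermometer_to_level(code: Tuple[int, ...]) -> int:
    """Map a thermometer code to the PAM4 level it represents."""
    return PAM4_LEVELS[sum(code)]

def thermometer_to_bits(code: Tuple[int, ...]) -> Tuple[int, int]:
    """Map a thermometer code to (MSB, LSB), assuming plain binary coding."""
    index = sum(code)
    return index >> 1, index & 1

if __name__ == "__main__":
    for v in (-2.7, -0.8, 1.1, 3.2):
        code = slice_pam4(v)
        print(v, code, thermometer_to_level(code), thermometer_to_bits(code))
```

Keeping the decisions in thermometer form is convenient because each comparator output can directly switch a unit of feedback current, which is how the text later describes the FIR-tap and IIR-tap feedback.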

A detailed block diagram of the equalizer data-path is shown in Fig. 2. The single-stage CTLE's tunable degeneration resistor and capacitor provide 6dB of peaking and gain control. After the CTLE, the DFE IIR-tap summation is done in a per-slice CML summer that precedes the 5 samplers. The DFE FIR-tap summation is done within the CML samplers in a thermometer-coded manner to minimize the direct feedback delay and meet the stringent 1-UI timing constraint [4], as shown in Fig. 3(a).
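For scale, the timing budget behind the 1-UI constraint follows from the stated 56Gb/s data rate and PAM4's 2 bits per symbol. The numbers below are derived here from those two facts rather than quoted from measurements:

$$
f_{sym} = \frac{56\,\text{Gb/s}}{2\,\text{b/symbol}} = 28\,\text{GBaud}, \qquad
T_{UI} = \frac{1}{f_{sym}} \approx 35.7\,\text{ps}, \qquad
f_{clk,1/4} = \frac{f_{sym}}{4} = 7\,\text{GHz}
$$

The same arithmetic puts the 1/8-rate and 1/32-rate deserialized samples at 3.5 GS/s and 875 MS/s, respectively.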



Fig. 2. Equalizer data-path.

Independent DFE FIR-tap weights are used to set the tail currents on a per-slice basis to compensate for mismatch between the receiver slices. Sampler threshold and offset control is also implemented using an additional differential pair controlled through the DAC-generated $V_{off}/V_{th}$ voltage. The single IIR MUX of Fig. 3(b) combines the thermometer-coded quarter-rate data from all the slices and serializes it to full rate using a current-mode architecture. A tunable RC load implements the IIR filter, with the time constant controlled through tunable R and C values and the amplitude controlled by the tunable tail current.
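To make the division of labor between the two taps concrete, the sketch below models the feedback in discrete time: the FIR tap subtracts a single weight at the first post-cursor, and the IIR tap subtracts an exponentially decaying sequence from the second post-cursor onward. The first-order decay `exp(-1/tau_ui)` per UI is the usual idealization of an RC load and is an assumption here, as are the function names and the synthetic pulse response.

```python
import math
from typing import List

def iir_tap_weights(amplitude: float, tau_ui: float, n_cursors: int) -> List[float]:
    """Feedback applied at post-cursors 2, 3, ... by a first-order IIR tap.

    amplitude : feedback at the second post-cursor (set by the tail current)
    tau_ui    : RC time constant expressed in unit intervals (assumed model)
    """
    alpha = math.exp(-1.0 / tau_ui)  # decay factor per UI
    return [amplitude * alpha ** k for k in range(n_cursors)]

def residual_isi(post_cursors: List[float], fir_weight: float,
                 iir_amplitude: float, tau_ui: float) -> List[float]:
    """Post-cursor ISI left after subtracting FIR and IIR feedback."""
    n_tail = max(len(post_cursors) - 1, 0)
    feedback = [fir_weight] + iir_tap_weights(iir_amplitude, tau_ui, n_tail)
    return [h - f for h, f in zip(post_cursors, feedback)]

if __name__ == "__main__":
    # Synthetic channel: first post-cursor 0.4, then a tail decaying with tau = 3 UI.
    h = [0.4] + [0.2 * math.exp(-k / 3.0) for k in range(6)]
    print(residual_isi(h, fir_weight=0.4, iir_amplitude=0.2, tau_ui=3.0))
    print(residual_isi(h, fir_weight=0.4, iir_amplitude=0.2, tau_ui=2.0))
```

With matched amplitude and time constant the residual is essentially zero, while a mismatched time constant leaves a tail whose sign is the kind of information the adaptation loop can act on.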

Fig. 4 shows the PLL-based CDR block diagram. The BBPD receives the 1/8-rate data and edge samples and filters out all but the symmetric transitions to avoid asymmetric PAM4 transition-induced jitter. In order to reduce loop latency, the BBPD works with 8 parallel Early/Late signals controlling an 8-segment charge pump. This parallel charge pump drives the loop filter to produce the control voltage for a 14GHz LC oscillator. In addition to the primary resonator tank, oscillator phase noise is reduced with additional tanks at the sources of both cross-coupled transistor pairs. Quarter-rate clocks are generated by a CML divide-by-two and then converted to CMOS levels. Static CMOS phase interpolators efficiently generate the 8 clock phases for the quarter-rate data and edge samplers. Per-phase skew calibration is achieved with tunable delay buffers preceding the samplers.
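The transition filtering and early/late decision can be sketched as follows. This is an illustrative model, not the logic of Fig. 4: it assumes that a "symmetric" transition is one between equal and opposite levels (so it crosses the middle threshold at the nominal edge time), that the edge sampler compares against zero, and that an edge sample matching the previous symbol's sign means the clock is early. The function names are hypothetical.

```python
from typing import Iterable, Tuple

def bbpd_vote(prev_level: int, next_level: int, edge_bit: int) -> int:
    """Return +1 (early), -1 (late), or 0 (transition ignored).

    prev_level, next_level : PAM4 levels in {-3, -1, +1, +3} (assumed normalization)
    edge_bit               : 1 if the edge sample is above zero, else 0
    """
    # Keep only symmetric transitions such as -3 -> +3 or +1 -> -1.
    if prev_level != -next_level:
        return 0
    prev_bit = 1 if prev_level > 0 else 0
    # Edge sample still looks like the previous symbol: sampled too soon.
    return 1 if edge_bit == prev_bit else -1

def charge_pump_segments(votes: Iterable[int]) -> Tuple[int, int]:
    """Count how many of the parallel segments see early and late votes."""
    votes = list(votes)
    early = sum(1 for v in votes if v > 0)
    late = sum(1 for v in votes if v < 0)
    return early, late

if __name__ == "__main__":
    # Eight parallel (prev, next, edge) observations, one per charge-pump segment.
    window = [(-3, 3, 0), (3, -3, 1), (-1, 1, 1), (1, 3, 1),
              (3, 3, 1), (-3, 1, 0), (1, -1, 1), (-1, 1, 0)]
    print(charge_pump_segments(bbpd_vote(p, n, e) for p, n, e in window))
```

Transitions such as -3 to +1 cross zero away from the nominal edge time, which is why they are excluded: their crossing offset would otherwise appear to the loop as data-dependent jitter.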

## III. DFE TAP AND THRESHOLD ADAPTATION

The edge-based background DFE tap adaptation logic tables are shown in Fig. 5; they are modified from [3] to allow for PAM4 operation and independent per-slice DFE FIR-tap control. Similar to the BBPD logic, the DFE tap adaptation works with symmetric PAM4 data transitions in order to improve convergence. When a symmetric transition is detected, the correlation between the edge sample and the sign of the previous symbols determines the residual ISI polarity from the corresponding symbol. As the DFE FIR-tap cancels the first post-cursor, if the $D_{-1}$ symbol polarity matches the edge sample ISI polarity, this implies that the tap value is too small and the FIR-tap counter is incremented, and vice versa. As PAM4 receivers require improved sensitivity, independent per-slice adaptation is implemented for the DFE FIR-taps to compensate for mismatch among the 4 receiver slices. The DFE IIR-tap amplitude is set in a similar manner utilizing the $D_{-2}$ polarity, as this IIR tap compensates for long-tail ISI after the first post-cursor. The IIR-tap time constant is set with the correlation between either $D_{-3}$ or $D_{-4}$ and the edge sample. The use of one common DFE IIR-tap MUX allows for the adaptation of only a single set of IIR values.

Fig. 3. (a) CML sampler with DFE FIR-tap and threshold control. (b) DFE IIR-tap MUX and filter.

Fig. 4. PAM4 PLL-based CDR.

Fig. 5. PAM4 DFE FIR and IIR-tap adaptation logic tables.
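The update rule described in the text can be sketched as a set of up/down counters. This is a simplified illustration and not a reproduction of the Fig. 5 logic tables: the counter names, the unit step size, the choice of $D_{-3}$ for the time constant, and in particular the direction of the time-constant update are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

def sign(level: int) -> int:
    """Polarity of a PAM4 level in {-3, -1, +1, +3}."""
    return 1 if level > 0 else -1

@dataclass
class DfeAdaptState:
    # One FIR counter per receiver slice, a single shared set of IIR values.
    fir: List[int] = field(default_factory=lambda: [0, 0, 0, 0])
    iir_amp: int = 0
    iir_tau: int = 0

def adapt_step(state: DfeAdaptState, slice_idx: int, is_symmetric: bool,
               edge_sign: int, d_m1: int, d_m2: int, d_m3: int) -> DfeAdaptState:
    """Apply one sign-based update from a single edge observation.

    edge_sign : +1 / -1 polarity of the edge sample at a symmetric transition,
                interpreted as the polarity of the residual ISI
    d_m1..3   : earlier symbols D_-1, D_-2, D_-3 as PAM4 levels
    """
    if not is_symmetric:
        return state  # asymmetric transitions are not used for adaptation
    # Match with D_-1: first post-cursor under-cancelled, so raise the FIR tap.
    state.fir[slice_idx] += 1 if sign(d_m1) == edge_sign else -1
    # Match with D_-2: tail amplitude under-cancelled, so raise the IIR amplitude.
    state.iir_amp += 1 if sign(d_m2) == edge_sign else -1
    # Assumed direction: match with D_-3 means the feedback decays too quickly.
    state.iir_tau += 1 if sign(d_m3) == edge_sign else -1
    return state

if __name__ == "__main__":
    s = DfeAdaptState()
    adapt_step(s, slice_idx=2, is_symmetric=True, edge_sign=1, d_m1=3, d_m2=-1, d_m3=1)
    print(s)
```

Because only the FIR counters are indexed by slice, slice-to-slice mismatch is absorbed there, while the single `iir_amp` and `iir_tau` pair mirrors the one shared IIR MUX.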

Background sampler threshold adaptation is achieved with an additional error sampler that periodically estimates the top and bottom eye heights so that the thresholds can be placed in the middle of each eye. An initial foreground calibration step is performed in which all 20 samplers are set to zero offset/threshold by shorting the input to the common mode and adjusting the per-sampler $V_{off}/V_{th}$ DAC codes. On a per-slice basis, the top sampler's threshold is then incremented by 1 LSB to reach the initial condition shown in Fig. 6. The initial coarse adaptation steps are based on uniform symbol statistics, with both the top sampler and error sampler thresholds increased until the error sampler detects ones with 25% probability. In parallel, the bottom sampler is stepped at the same rate in an open-loop manner to improve convergence speed. As State 1 shows, this implies that the error sampler resides at the bottom of the top eye and the top sampler is sampling only 1 threshold LSB inside the eye. Next, the polarity of the error sampler threshold is inverted and then fine-tuned to converge to the top of the bottom eye, with the bottom sampler adjusted by the fine steps (State 2). To avoid relying on uniform statistics, the process then transitions to monitoring the relative values of the error sampler and the bottom/top samplers to track the eye edges in States 3 and 4. Next, the data samplers' thresholds are fixed and the error sampler threshold is increased until a discrepancy is detected between the error and top sampler outputs, implying that the error sampler has reached the top edge of the top eye (State 5). This is repeated to find the bottom of the bottom eye (State 6). With the top and bottom eye heights now known, the bottom/top thresholds are placed in the middle of their eyes when the process returns to States 3 and 4, which monitor the top of the bottom eye and the bottom of the top eye, respectively. The algorithm then periodically rotates through States 3-6.

Fig. 6. Background sampler threshold adaptation algorithm.
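The coarse step and the final midpoint placement can be illustrated with a small model. It covers only those two pieces of the procedure (not the full six-state sequence of Fig. 6), works in integer millivolts to stand in for DAC codes, and uses a synthetic set of samples. The function names, the 10 mV LSB, and the sample values are assumptions for the example.

```python
from typing import Sequence

def coarse_error_threshold(samples_mv: Sequence[int], lsb_mv: int = 10,
                           target: float = 0.25, max_steps: int = 256) -> int:
    """Step the error threshold up from zero until P(sample > threshold) <= target.

    With uniformly distributed PAM4 symbols, the ones probability first drops
    to 25% when the threshold clears the second-highest level, i.e. when it
    reaches the bottom of the top eye.
    """
    threshold = 0
    for _ in range(max_steps):
        ones = sum(1 for s in samples_mv if s > threshold) / len(samples_mv)
        if ones <= target:
            break
        threshold += lsb_mv
    return threshold

def eye_center(bottom_edge_mv: int, top_edge_mv: int) -> int:
    """Place a data threshold midway between the two measured eye edges."""
    return (bottom_edge_mv + top_edge_mv) // 2

if __name__ == "__main__":
    # Synthetic samples: levels at +/-100 and +/-300 mV, each spread by +/-30 mV of ISI.
    samples = [lvl + d for lvl in (-300, -100, 100, 300) for d in (-30, 0, 30)]
    bottom_of_top_eye = coarse_error_threshold(samples)
    print(bottom_of_top_eye)
    print(eye_center(bottom_of_top_eye, 270))
```

The 25% criterion only holds for uniform symbol statistics, which is consistent with the text's move to comparing the error sampler against the data samplers once the coarse phase is finished.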

## IV. EXPERIMENTAL RESULTS

Fig. 7(a) shows the chip micrograph of the PAM4 receiver, which was fabricated in a GP 65nm CMOS process. The receiver occupies a total area of 0.51mm<sup>2</sup>.

Receiver BER measurements were performed using a channel with 20.8 dB of loss at 14GHz (Fig. 7(b)). A PAM4 pattern generator with 1 main and 1 pre-cursor FFE tap generates PRBS15 data. Fig. 8(a) shows the transmitter's pre-channel PAM4 eye diagram without any equalization and with a 600 mV<sub>ppd</sub> swing. With the 2-tap pre-cursor FFE co-optimized with the receiver equalization, the eye at the channel output is completely closed (Fig. 8(b)).

Fig. 7. (a) Chip micrograph and (b) measured channel response.

Fig. 8. (a) 56Gb/s eye-diagram before channel without equalization and (b) after channel with 2-tap pre-cursor FFE.

Fig. 9. (a) Measured DFE tap adaptation. (b) Measured sampler threshold adaptation. Note, edge sampler values are omitted and only error sampler #1 is shown for clarity.

Fig. 10. Measured 56Gb/s (a) timing and (b) voltage bathtub curves.

An on-chip DAC is used to monitor the convergence of the DFE tap coefficients and the sampler thresholds. Fig. 9 shows that the DFE taps converge within 2µs and the initial threshold procedure completes within 16µs. A bypass clock input with phase shift capability is used to measure the combined MSB/LSB BER timing bathtub curves of Fig. 10(a), which show 0.19UI of timing margin at BER=10<sup>-12</sup>. The voltage bathtub curves of Fig. 10(b) are measured with the CDR locked by sweeping the threshold codes of the top, middle, and bottom samplers away from their ideal positions. Table I summarizes the receiver performance and compares it with other PAM4 receivers operating near 56Gb/s. The receiver achieves a power efficiency of 4.63mW/Gb/s, which is superior to the ADC-based design of [1] and the mixed-signal front-end of [5], which utilizes a 2-stage CTLE and an additional TX FFE tap. Employing the DFE IIR-tap allows for a reduction in the total tap count relative to [2], while also extending the maximum supported channel loss.

| References              | [1]                                    | [2]                                    | [5]                 | This Work                            |
|-------------------------|----------------------------------------|----------------------------------------|---------------------|--------------------------------------|
| Technology              | CMOS 16nm<br>FinFET                    | CMOS 16nm<br>FinFET                    | 40nm CMOS           | 65nm CMOS                            |
| Data-Rate               | 56Gb/s                                 | 40-56Gb/s                              | 56Gb/s              | 56Gb/s                               |
| Data Format             | PAM4                                   | PAM4                                   | PAM4                | PAM4                                 |
| Equalization            | CTLE<br>ADC based DFE &<br>FFE         | CTLE<br>10-tap DFE                     | CTLE<br>3-tap DFE   | CTLE<br>1-tap FIR &<br>1-tap IIR DFE |
| Maximum<br>CTLE Peaking | 2-stage<br>14dB                        | 2-stage<br>N/A                         | 2-stage<br>9 dB     | 1-Stage<br>6 dB                      |
| Channel-Loss            | 31dB <sup>1</sup>                      | 10dB <sup>2</sup>                      | 24dB <sup>1</sup>   | 20.8dB <sup>2</sup>                  |
| Area                    | 2.8mm <sup>2</sup><br>(2 TX/RX)        | 0.364mm <sup>2</sup>                   | 1.26mm <sup>2</sup> | 0.51mm <sup>2</sup>                  |
| Supply                  | 0.9V/1.2V/1.8V<br>(digital/analog/aux) | 0.9V/1.2V/1.8V<br>(digital/analog/aux) | 1V                  | 1.2V                                 |
| Power<br>Consumption    | 370mW<br>(RX excl. DSP)                | 230mW                                  | 382mW               | 259mW                                |
| Power<br>Efficiency     | 6.6mW/Gb/s                             | 4.1mW/Gb/s                             | 6.82mW/Gb/s         | 4.63mW/Gb/s                          |

<sup>1</sup> Including 3-tap TX FFE equalization

<sup>2</sup> Including 2-tap TX FFE equalization
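The power-efficiency figure quoted for this work follows directly from the power consumption and data-rate entries of Table I, taking the full 56Gb/s rate as the denominator:

$$
\frac{P}{R_b} = \frac{259\,\text{mW}}{56\,\text{Gb/s}} = 4.625\,\text{mW/Gb/s} \approx 4.63\,\text{mW/Gb/s}
$$

Since 1 mW/Gb/s equals 1 pJ/bit, this is equivalently about 4.63 pJ/bit.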

## V. CONCLUSION

This paper presented a PAM4 quarter-rate receiver which employs a single-stage CTLE and a DFE with 1 FIR tap and 1 IIR tap and utilizes a PLL-based CDR. Edge samplers are utilized for both CDR phase detection and DFE tap adaptation, with independent per-slice values providing the required PAM4 sensitivity. Sampler threshold adaptation is also achieved with a single per-slice error sampler that periodically scans the top and bottom PAM4 eyes.

## ACKNOWLEDGEMENT

This work was supported by the SRC (Task 1836.143).

## REFERENCES

- [1] Y. Frans *et al.*, "A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved SAR ADC in 16-nm FinFET," *IEEE JSSC*, vol. 52, no. 4, pp. 1101-1110, April 2017.
- [2] J. Im *et al.*, "A 40-to-56Gb/s PAM-4 Receiver with 10-Tap Direct Decision-Feedback Equalization in 16nm FinFET," *ISSCC Dig. Tech. Papers*, pp. 114-115, Feb. 2017.
- [3] S. Shahramian *et al.*, "Edge-Based Adaptation for a 1 IIR + 1 Discrete-Time Tap DFE Converging in 5µs," *IEEE JSSC*, vol. 51, no. 12, pp. 3192-3203, Dec. 2016.
- [4] A. Roshan-Zamir *et al.*, "A Reconfigurable 16/32 Gb/s Dual-Mode NRZ/PAM4 SerDes in 65-nm CMOS," *IEEE JSSC*, vol. 52, no. 9, pp. 2430-2447, Sept. 2017.
- [5] P. Peng *et al.*, "A 56Gb/s PAM-4/NRZ Transceiver in 40nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 110-111, Feb. 2017.

TABLE I: PERFORMANCE SUMMARY