# A 14-GHz Bang-Bang Digital PLL with sub-150fs Integrated Jitter for Wireline Applications in 7nm FinFET

Dirk Pfaff<sup>1</sup>, Robert Abbott<sup>1</sup>, Xin-Jie Wang<sup>1</sup>, Babak Zamanlooy<sup>1,4</sup>, Shahaboddin Moazzeni<sup>1</sup>, Raleigh Smith<sup>1,3</sup>, Chih-Chang Lin<sup>2</sup> <sup>1</sup> TSMC Ottawa, Canada <sup>2</sup> TSMC, San Jose, CA <sup>3</sup> Carleton University, Ottawa, Canada <sup>4</sup> now with Huawei, Ottawa, Canada email: dpfaff@tsmc.com

*Abstract*— Increased transceiver throughput requires clock sources of ever greater fidelity. This work demonstrates a digital PLL with 143fs rms jitter (in 1kHz-100MHz band), enabled by a low noise, 13.5-15.7GHz digitally controlled oscillator providing fine resolution (1.2MHz/LSB) without relying on coarse band selection. The TDC-less PLL eliminates limit cycles by substantial reduction of the loop latency, achieved by a lookahead digital loop filter operated at 3.5GHz, or 10x the reference clock frequency. The 7nm FinFET implementation measures 0.06mm<sup>2</sup> and consumes 40mW.

Keywords—Bang-Bang PLL, limit cycle, jitter, class-C DCO, inversion mode varactor, look-ahead filter, transceiver clock generation, PAM-4, calibrated current source.

### I. INTRODUCTION

Wireline applications at 56Gb/s and beyond have seen the widespread use of digital receivers, enabled by FinFET CMOS scaling and advances in data converters, making wireline transceivers increasingly digital. With most of the receiver equalization moved into the digital domain, the requirements and complexity of the remaining receiver analog frontend are greatly reduced. In contrast, clock jitter requirements, dictated by the Analog to Digital Converter (ADC), have become more stringent. Digital 56Gb/s PAM-4 receivers utilize 6 to 8-bit 28GS/s ADCs requiring extremely low jitter to maintain an acceptable Signal to Noise Ratio (SNR) at the Nyquist frequency. For example, 150fs rms clock jitter reduces the SNR to 38dB, de-rating an 8-bit, 28GS/s ADC Effective Number of Bits (ENOB) to 6. Digital receivers, in general, achieve this level of jitter performance by employing an analog charge pump Phase Locked Loop (PLL) [1-3]. Digital PLLs are rarely considered for wireline applications despite their advantages, which include reduced die area and scalability. The concerns associated with digital PLLs, such as increased jitter caused by the quantized nature of the Digitally Controlled Oscillator (DCO) and Time to Digital Converters (TDC), are addressed in this paper. Design innovations crucial to low jitter digital PLL operation are discussed and validated by measurement results that rival recently reported analog charge pump PLLs for 56Gb/s and 112Gb/s PAM-4 wireline transceivers. As such, it is concluded that digital PLLs are a



Fig. 1. Digitally controlled oscillator schematic with a low noise, digitally calibrated current source and output buffer.

viable alternative and will likely become the preferred choice for future FinFET technology nodes as high performance analog circuits become increasingly difficult to implement.
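The SNR and ENOB figures quoted in the Introduction follow from the standard jitter-limited SNR relation for a full-scale sine input, SNR = -20·log10(2π·f·σ), together with ENOB = (SNR - 1.76)/6.02. The short Python sketch below reproduces the arithmetic; the function names are illustrative and the ideal sine-wave model is an assumption, not something taken from the measurements.

```python
import math

def jitter_limited_snr_db(f_in_hz: float, sigma_s: float) -> float:
    """SNR limit set by rms sampling-clock jitter for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_s)

def enob_from_snr(snr_db: float) -> float:
    """Effective number of bits for an ideal quantizer with the given SNR."""
    return (snr_db - 1.76) / 6.02

# Nyquist frequency of a 28GS/s ADC is 14GHz; 150fs rms jitter as in the text.
snr = jitter_limited_snr_db(14e9, 150e-15)
enob = enob_from_snr(snr)
print(f"SNR = {snr:.1f} dB, ENOB = {enob:.2f} bits")
```

Under these assumptions the result is close to 38dB and 6 bits, in agreement with the numbers given above.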

### II. DIGITAL BANG-BANG PLL

High performance digital PLLs often rely on a high-resolution TDC to achieve low jitter as well as fractional-N frequency tuning. Integer-N PLLs, however, are adequate for wireline applications. Therefore, provided jitter requirements are addressed, the TDC can be replaced by a simple Bang-Bang Phase Detector (BBPD), leaving the Digitally Controlled Oscillator (DCO) as the only remaining analog block.

Unlike TDC-based digital PLLs, a Bang-Bang PLL has a small linear range determined by the random jitter present at the phase detector input. Low jitter operation demands that the peak phase error at the BBPD remains within the phase detector's linear range. Exceeding the linear range introduces nonlinear effects known as limit cycles [4], causing an unacceptable increase in jitter. In order to address this issue, some authors propose dithering the reference clock to artificially enlarge the linear range of the phase detector [5], thereby greatly reducing the risk of operation in a limit cycle regime. However, it is preferable to avoid reference dithering as it potentially degrades the PLL's in-band phase noise profile. This work eliminates limit cycles without additional reference clock dithering. Instead, various design techniques are utilized to limit the peak-to-peak jitter at the BBPD input to within its intrinsic linear range as defined by the random jitter of the DCO and the reference clock. These techniques include improving the DCO frequency resolution, combined with a high frequency reference clock and a low latency digital feedback loop, as well as carefully chosen loop filter settings, all parameters that directly reduce quantization jitter [4]. Using these techniques and a DCO optimized for low phase noise, a TDC-less and reference clock dither-free BBPLL can achieve jitter performance in line with or better than recently reported analog PLLs.

### III. DIGITALLY CONTROLLED OSCILLATOR

Fig. 1 shows the schematic of the low phase noise, fine resolution DCO. The LC tank relies on a ground connected center tap inductor (Q = 15 at 14GHz) resonating with a PMOS inversion mode varactor array organized in 22 rows by 8 columns. Each varactor is controlled by a D flip-flop and buffer circuit that sets the varactor drain/source node to 0V or 0.75V depending on the Frequency Control Word (FCW). A balanced clock tree ensures that all varactor controls switch concurrently, thereby avoiding glitches that could cause spurious tones. The varactor channel length and number of fins have been carefully chosen as the best compromise between loss, tuning range and frequency resolution. With optimized varactor sizing, the loaded Q of the LC tank is dominated by inductor loss. The majority of the array (18 rows) contains 2x6-fin varactors, with the remaining 4 rows containing 1x6-fin and 1x5-fin varactors required for fine frequency resolution. The 1x6-fin and 1x5-fin sized varactors are grouped in pairs driven by complementary control signals [6], thereby refining the effective array resolution to a single fin. The near uniform varactor sizing promotes efficient array layout and maximizes device matching over process, temperature and voltage. Given this configuration, a total of 145x12=1740 varactor array settings can be programmed, resulting in fine frequency resolution defined by the on/off capacitance difference of a single varactor fin, which is the ultimate limit dictated by the FinFET technology itself.
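The setting count and the resulting average step size can be checked with a few lines of Python. This is a minimal sketch: the mixed-radix split of a flat setting index into a coarse (2x6-fin) count and a fine (single-fin) count is a hypothetical encoding chosen for illustration, and the 13.7-15.7GHz range is the measured tuning range reported in Section V.

```python
ROWS_COARSE, COLS = 18, 8
COARSE_UNITS = ROWS_COARSE * COLS   # 144 varactors of 2x6 fins (12 fins each)
COARSE_LEVELS = COARSE_UNITS + 1    # 0..144 units switched on -> 145 levels
FINE_LEVELS = 12                    # 0..11 single-fin steps between coarse levels

def total_settings() -> int:
    """Number of programmable varactor array settings."""
    return COARSE_LEVELS * FINE_LEVELS

def split_fcw(index: int) -> tuple[int, int]:
    """Split a flat setting index into (coarse units, fine fins).

    Hypothetical encoding, not taken from the paper.
    """
    if not 0 <= index < total_settings():
        raise ValueError("setting index out of range")
    return divmod(index, FINE_LEVELS)

def average_step_mhz(f_min_ghz: float, f_max_ghz: float) -> float:
    """Average frequency step if the range is spread evenly over all settings."""
    return (f_max_ghz - f_min_ghz) * 1e3 / (total_settings() - 1)

print(total_settings())                 # 1740
print(split_fcw(1739))                  # (144, 11)
print(average_step_mhz(13.7, 15.7))     # roughly 1.15 MHz per step
```

An even spread over the measured range gives a step slightly below the reported 1.2MHz/LSB average, which is plausible given that the real tuning curve is not linear.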

The DCO avoids the use of coarse and fine tuning subarrays, often found in digitally controlled oscillators [5]. While a split coarse/fine varactor array is beneficial to tuning range and frequency resolution, the tracking range set by the fine tuning array is severely limited, causing loss of lock in the event of extreme environmental changes. The unified varactor array used in this work, combined with a 15% tuning range, guarantees maintaining frequency lock to 14GHz at any process, voltage and temperature combination.

For best phase noise performance, the low loss LC tank is accompanied by a nominal 2Vpp oscillation amplitude, requiring an elevated bias voltage of the cross coupled PMOS gates, which is facilitated by AC-coupling, as shown in Fig. 1.



Fig. 2. BBPLL block and timing diagram. Not shown is a Phase-Frequency Detector for fast PLL locking.



Fig. 3. DLF with integral and proportional path. The PLL bandwidth and stability are set by programming  $A_P$  and the  $A_I/A_P$  ratio, respectively.

The noise contribution of the long channel cross coupled PMOS pair is minimized by class-C mode operation, established by additional capacitance, $C_{\rm s}$, at the common source node [7]. While class-C operation helps suppress noise, amplitude squegging can occur if the $C_{\rm s}$ capacitance is too large, so careful sizing is crucial.

The varactor capacitance is non-linear with respect to voltage; therefore, the DCO tuning characteristic is sensitive to the oscillation amplitude. Consequently, an accurately controlled oscillator bias current, $I_{\text{DCO}}$, is required to establish the desired oscillation amplitude and tuning characteristic. While analog precision bias circuits provide a stable current, they tend to generate significant thermal and flicker noise, prompting phase noise degradation. A low noise current source is established by direct biasing of PMOS transistors to a supply voltage (0.75V/1.5V for enable/disable), eliminating the need for noisy current mirrors and other bias circuits. The PMOS channel length is maximized to reduce noise even further, while an additional flicker noise suppression of 7dB is achieved through resistive source degeneration. The current source, implemented in parallel branches as shown in Fig. 1, is digitally trimmed during PLL initialization against an external precision resistor, R<sub>CAL</sub>.

### IV. DSP IMPLEMENTATION

Digital PLLs utilize digital loop filters operating at the reference clock frequency, typically as high as several hundred MHz, assisted by a Sigma Delta Modulator (SDM) running in the low GHz range [5]. With a practical implementation requiring a few reference clock cycles to evaluate the BBPD output and update the varactor controls, the loop latency relative to the oscillator frequency is high, causing an increase in quantization jitter. The improved architecture presented in Fig. 2 substantially reduces loop latency and quantization jitter. This is achieved by two main architectural enhancements. First, the entire Digital Signal Processing (DSP) unit operates at a scaled DCO clock, dsp\_clk, instead of the much lower reference clock. As a result, an FCW can be computed by the Digital Loop Filter (DLF) in less than a reference clock cycle, as long as the DLF's number of pipeline stages is less than the available dsp\_clk cycles in a reference clock cycle. The second architectural improvement involves the introduction of a lookahead loop filter. In order to eliminate the DLF latency from the loop latency, two parallel DLF units simultaneously compute the next FCW based on the current state and speculative BBPD values. Once the BBPD output value becomes available, signalled by the dsp\_sync signal shown in Fig. 2, the correct FCW is multiplexed to the next stage while the other FCW is discarded. This approach allows the loop latency to be as short as a single dsp\_clk cycle. However, practical design considerations increase the latency to 3 dsp\_clk cycles, substantially shorter than that of a filter operating at the reference clock frequency.
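The lookahead idea can be illustrated with a small behavioral model in Python. This is only a sketch under stated assumptions: the proportional-integral update, the gain values `a_p` and `a_i`, and all function names are hypothetical, and pipelining is not modeled. Two candidate results are computed for the speculative BBPD values +1 and -1, and the matching one is selected once the real BBPD value arrives. The last helper expresses a loop latency in DCO periods, comparing 3 dsp\_clk cycles with an assumed 3 reference clock cycles.

```python
from dataclasses import dataclass

F_DCO = 14e9    # DCO frequency
F_DSP = 3.5e9   # dsp_clk frequency
F_REF = 350e6   # reference clock frequency

@dataclass(frozen=True)
class DlfState:
    integ: int = 0   # integral path accumulator

def dlf_step(state: DlfState, bb: int, a_p: int = 8, a_i: int = 1):
    """One proportional-integral update for a BBPD decision bb in {+1, -1}."""
    integ = state.integ + a_i * bb
    return DlfState(integ), integ + a_p * bb

def run_direct(bb_sequence):
    """Reference model: compute each FCW after the BBPD value is known."""
    state, out = DlfState(), []
    for bb in bb_sequence:
        state, fcw = dlf_step(state, bb)
        out.append(fcw)
    return out

def run_lookahead(bb_sequence):
    """Speculative model: both candidates exist before the BBPD value arrives."""
    state, out = DlfState(), []
    for bb in bb_sequence:
        candidates = {guess: dlf_step(state, guess) for guess in (+1, -1)}
        state, fcw = candidates[bb]   # multiplexer; the other result is discarded
        out.append(fcw)
    return out

def latency_in_dco_cycles(n_cycles: int, f_clk: float) -> float:
    """Loop latency of n_cycles at f_clk, expressed in DCO periods."""
    return n_cycles * F_DCO / f_clk

decisions = [+1, +1, -1, +1, -1, -1, -1, +1]
assert run_lookahead(decisions) == run_direct(decisions)
print(latency_in_dco_cycles(3, F_DSP))   # 12.0 DCO periods
print(latency_in_dco_cycles(3, F_REF))   # 120.0 DCO periods
```

The speculative path produces the same FCW sequence as the direct computation; the benefit in hardware is that the filter arithmetic is already finished when the BBPD decision becomes valid, so only the selection remains in the loop.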

The DCO's quantization jitter is further reduced by dithering between frequency control words  $fcw_s$  and its increment,  $fcw_{s+1}$ , computed by the DLF, see Fig. 2. Selection between the two FCWs is determined by a second order SDM [8] with an 8-bit control input, frac[7:0], which is also computed by the DLF. This method is found to be significantly more tolerant to varactor mismatch when compared to a multi-bit ditherer.
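The dithering between the two control words can be modeled behaviorally as follows. The sketch uses a generic second-order, single-bit, error-feedback sigma-delta modulator with noise transfer function (1 - z^-1)^2; this particular topology and the function names are assumptions for illustration and may differ from the modulator of [8] as implemented on chip. The output bit selects between `fcw_s` and `fcw_s + 1`, so the long-term average control word is `fcw_s + frac/256`.

```python
def sdm2_bits(frac: int, n: int) -> list[int]:
    """Second-order, single-bit error-feedback SDM for an 8-bit input frac."""
    if not 0 <= frac <= 255:
        raise ValueError("frac must fit in 8 bits")
    e1 = e2 = 0                       # last two quantization errors
    bits = []
    for _ in range(n):
        w = frac + 2 * e1 - e2        # input plus shaped error feedback
        y = 1 if w >= 128 else 0      # 1-bit quantizer with levels 0 and 256
        e2, e1 = e1, w - 256 * y      # shift the error history
        bits.append(y)
    return bits

def dithered_fcw(fcw_s: int, frac: int, n: int) -> list[int]:
    """Sequence of control words toggling between fcw_s and fcw_s + 1."""
    return [fcw_s + b for b in sdm2_bits(frac, n)]

words = dithered_fcw(100, 64, 4096)
print(sum(words) / len(words))        # 100.25
```

For `frac = 64` the bit stream repeats with a period of 8 and contains two ones per period, so the average is exactly 100.25 in this model.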

Fig. 3 shows a block diagram of the DLF that forms a second order type II PLL with proportional and integral gain coefficients A<sub>P</sub> and A<sub>I</sub>. With the varactor array providing two differently sized capacitive elements with a 12-to-1 ratio, the FCW computation requires radix-12 arithmetic. Fortunately, the complexity of hardware-intensive divide-by-12 operations is avoided by using addition and subtraction, which can be implemented more easily. This is achieved by splitting the integrator into a 14-bit binary-weighted section and a 144-bit thermometer-weighted section, with the former controlling the SDM and the 6/5-fin sized varactors while the latter controls the 2x6-fin sized varactors. Prior to updating the DLF outputs, a Finite State Machine (FSM) inspects the 14-bit filter summing node for overflow/underflow events and corrects the integrator sections by iterative subtraction/addition (values are 256x{-12,+12} for the 14-bit integrator and {+1, -1} for the 144-bit integrator) until a correct FCW is established. Finally, fcw<sub>s+1</sub> is computed from fcw<sub>s</sub> by incrementing the DLF's internal 6-bit value or, in case of overflow, clearing the latter and incrementing the internal 144-bit integrator output, as outlined in Fig. 3. A DSP clock rate of 3.5GHz is chosen as the best compromise between latency and ease of implementation, a



Fig. 4. Measured phase noise spectrum, from 1kHz to 100MHz offset from 14GHz carrier.

clock frequency which is handled effortlessly by synthesized logic and an automated place and route flow.
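The radix-12 correction described for the DLF can be sketched in Python as shown below. This is a behavioral illustration only: the integer representation of the two integrator sections, the saturation at the array limits, and all names are assumptions, not the FSM of Fig. 3. The fine section holds a 6-bit value (0 to 11) above an 8-bit fraction, so one coarse unit corresponds to 256x12 = 3072 fine counts, and no division by 12 is needed.

```python
FRAC_BITS = 8
FINE_RADIX = 12                         # 12 single-fin steps per 2x6-fin varactor
FINE_SPAN = FINE_RADIX << FRAC_BITS     # 256 * 12 = 3072 counts per coarse unit
COARSE_MAX = 144                        # thermometer-coded 2x6-fin varactors

def normalize(fine: int, coarse: int) -> tuple[int, int]:
    """Resolve overflow/underflow of the fine section by iterative add/subtract."""
    while fine >= FINE_SPAN and coarse < COARSE_MAX:
        fine -= FINE_SPAN               # -256*12 on the binary section
        coarse += 1                     # +1 on the thermometer section
    while fine < 0 and coarse > 0:
        fine += FINE_SPAN               # +256*12 on the binary section
        coarse -= 1                     # -1 on the thermometer section
    # Assumed behavior at the ends of the array: saturate the fine section.
    return min(max(fine, 0), FINE_SPAN - 1), coarse

def fcw_pair(fine: int, coarse: int):
    """Return (fcw_s, fcw_s_plus_1, frac) as (coarse, fine fins) tuples."""
    fins, frac = fine >> FRAC_BITS, fine & ((1 << FRAC_BITS) - 1)
    fcw_s = (coarse, fins)
    if fins + 1 < FINE_RADIX:
        fcw_s1 = (coarse, fins + 1)     # increment the 6-bit value
    elif coarse < COARSE_MAX:
        fcw_s1 = (coarse + 1, 0)        # clear it and step the thermometer code
    else:
        fcw_s1 = fcw_s                  # assumed saturation at the top of the range
    return fcw_s, fcw_s1, frac

print(normalize(3072 + 5, 10))          # (5, 11)
print(fcw_pair(11 * 256 + 7, 3))        # ((3, 11), (4, 0), 7)
```

Because a loop filter update changes the fine section by a bounded amount, only a few iterations of the correction loop are needed in practice, which is what makes the add/subtract approach cheaper than a divider.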

The PLL is completed by the DCO presented in the previous Section, two dividers and a BBPD, as shown in Fig. 2. Bang-bang phase detection is performed by a high speed D flip-flop, which oversamples the reference clock at 3.5GHz, followed by down-sampling to 350MHz. This approach minimizes frequency divider induced jitter. Not shown in Fig. 2 is an additional Phase-Frequency Detector (PFD) used to perform frequency locking during the PLL initialization sequence.

### V. MEASUREMENT RESULTS

The BBPLL has been fabricated in TSMC's 7nm FinFET technology. Phase noise is measured at a serial transmitter output, configured to deliver a 14GHz clock pattern, with the BBPLL locked to a 350MHz commodity off-chip crystal oscillator [9]. Fig. 4 shows the measured phase noise spectrum (Keysight E5052B), free of spurious tones and phase noise peaking, indicating limit cycle free operation and a loop phase margin of approximately 60°. The rms jitter integrated from 1kHz to 100MHz is 143fs. Fig. 5 provides more insight into the various phase noise contributors. In addition to the closed loop phase noise (with and without SDM dithering), the graph shows the free-running DCO phase noise as well as reference phase noise scaled by the closed loop gain. Outside the loop bandwidth, the BBPLL phase noise is 5-6dB above the free-running DCO phase noise (measured at -104dBc/Hz at 1MHz offset), indicating that quantization noise is not entirely eliminated by the SDM. With the SDM disabled, the integrated jitter rises to 270fs. In-band BBPLL phase noise is entirely determined by reference phase noise, as can be seen in Fig. 5. Also, free-running DCO phase noise is not altered when the SDM is turned on, confirming the noise shaping properties of the SDM. Fig. 6 shows the measured tuning characteristic to be 13.7-15.7GHz. The same graph displays the frequency resolution, which on average measures 1.2MHz/LSB over the entire tuning range. Excluding the clock distribution to the transmitter, the BBPLL dissipates from two supplies



Fig. 5. Phase noise contributors to PLL phase noise

(0.75V/1.5V) a total of 40mW of which 63% is consumed by the DSP section. A comparison with recently published PLLs for wireline applications is shown in Table I. A die micrograph of the BBPLL occupying an area of 0.06mm<sup>2</sup> is shown in Fig. 7.
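For reference, an integrated rms jitter figure such as the one quoted above is obtained from a single-sideband phase noise spectrum L(f) as σ = sqrt(2·∫L(f)df)/(2π·f0). The Python sketch below integrates a spectrum given as (offset, dBc/Hz) points, assuming straight-line segments on a log-log plot. The function names and the flat -120dBc/Hz example profile are illustrative only and do not represent the measured data of Fig. 4.

```python
import math

def integrate_segment(f1: float, l1_dbc: float, f2: float, l2_dbc: float) -> float:
    """Integrate L(f) over [f1, f2], assuming a power law between the two points."""
    s1, s2 = 10 ** (l1_dbc / 10), 10 ** (l2_dbc / 10)
    slope = math.log(s2 / s1) / math.log(f2 / f1)
    if abs(slope + 1.0) < 1e-12:                  # 1/f segment: logarithmic integral
        return s1 * f1 * math.log(f2 / f1)
    return s1 * f1 / (slope + 1.0) * ((f2 / f1) ** (slope + 1.0) - 1.0)

def rms_jitter_s(points: list[tuple[float, float]], f_carrier: float) -> float:
    """RMS jitter in seconds from (offset Hz, dBc/Hz) points sorted by offset."""
    area = sum(
        integrate_segment(f1, l1, f2, l2)
        for (f1, l1), (f2, l2) in zip(points, points[1:])
    )
    return math.sqrt(2.0 * area) / (2.0 * math.pi * f_carrier)

# Hypothetical flat profile, integrated from 1kHz to 100MHz at a 14GHz carrier.
flat = [(1e3, -120.0), (1e8, -120.0)]
print(f"{rms_jitter_s(flat, 14e9) * 1e15:.1f} fs")
```

Feeding the function with points read from a measured spectrum gives an estimate of the integrated jitter over the same band; the accuracy depends on how densely the spectrum is sampled.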

### VI. CONCLUSIONS

Bang-bang PLLs tend to be prone to limit cycles and high levels of quantization noise, which often precludes them from low jitter applications such as wireline transceivers. This work shows that these limitations can be resolved by a single varactor fin resolution DCO and a low latency digital loop, facilitated by a digital lookahead loop filter clocked at multiple GHz. These innovations, combined with a DCO rigorously



Fig. 7. Die micrograph.



Fig. 6. DCO tuning characteristic and frequency step size

optimized for low phase noise, rival the jitter performance of analog PLLs with much improved scalability in FinFET technologies.

### REFERENCES

- [1] J. Kim et al., "A 112Gb/s PAM-4 Transmitter with 3-Tap FFE in 10nm CMOS," ISSCC, pp. 102-103, Feb. 2018
- [2] P. Upadhyaya et al., "A Fully Adaptive 19-to-56Gb/s PAM-4 Wireline Transceiver with a Configurable ADC in 16nm FinFET," ISSCC, pp. 108-109, Feb. 2018
- [3] M. Raj et al., "A 164fsrms 9-to-18GHz Sampling Phase Detector Based PLL with In-Band Noise Suppression and Robust Frequency Acquisition in 16nm FinFET," IEEE Symp. VLSI Circuits, pp. 182-183, June 2017
- [4] G. Marucci et al., "Analysis and Design of Low-Jitter Digital Bang-Bang Phase-Locked Loops," IEEE Trans. Circuits Syst. I, Vol. 61, No. 1, Jan. 2014
- [5] A. Rylyakov et al., "Bang-Bang Digital PLLs at 11 and 20GHz with sub-200fs Integrated Jitter for High-Speed Serial Communication Applications," ISSCC, pp. 94-95, Feb. 2009
- [6] J. Zhuang et al., "A 3.3GHz LC-Based Digitally Controlled Oscillator with 5kHz Frequency Resolution," A-SSCC, pp. 428-431, Nov. 2007
- [7] A. Mazzanti et al., "Class-C Harmonic CMOS VCOs, With a General Result on Phase Noise," JSSC, Vol. 43, No. 12, Dec. 2008
- [8] T. Riley et al., "Delta-Sigma Modulation in Fractional-N Frequency Synthesis," JSSC, Vol. 28, No. 5, May 1993
- [9] Datasheet, https://www.silabs.com/documents/public/data-sheets/si545datasheet.pdf

 TABLE I.
 PERFORMANCE COMPARISON

|                          | [1] J. Kim,<br>ISSCC 2018 | [2] P. Upadhyaya,<br>ISSCC 2018 | [3] M. Raj,<br>VLSI Circuits 2017 | [5] A. Rylyakov,<br>ISSCC 2009 | [5] A. Rylyakov,<br>ISSCC 2009 | This Work                |
|--------------------------|---------------------------|---------------------------------|-----------------------------------|--------------------------------|--------------------------------|--------------------------|
| Technology               | 10nm FinFET               | 16nm FinFET                     | 16nm FinFET                       | 65nm CMOS                      | 65nm CMOS                      | 7nm FinFET               |
| Architecture             | Analog                    | Analog Fractional-N             | Analog                            | Digital BB PLL                 | Digital BB PLL                 | Digital BB PLL           |
| Oscillator               | LC                        | LC                              | LC                                | LC DCO                         | LC DCO                         | LC DCO                   |
| Frequency                | 14GHz                     | 14GHz                           | 18GHz                             | 11GHz                          | 20GHz                          | 14GHz                    |
| Integrated rms<br>jitter | 185fs<br>(1kHz - 100MHz)  | 180fs                           | 164fs<br>(1kHz - 100MHz)          | 345fs<br>(1MHz - 10GHz)        | 926fs<br>(1MHz - 10GHz)        | 143fs<br>(1kHz - 100MHz) |
| Phase Noise at<br>100kHz | NA                        | NA                              | -102dBc/Hz                        | NA                             | NA                             | -103.5dBc/Hz             |
| Phase Noise at<br>1MHz   | -108dBc/Hz                | NA                              | -107.3dBc/Hz                      | NA                             | NA                             | -108.7dBc/Hz             |
| Phase Noise at<br>10MHz  | -119dBc/Hz                | NA                              | -114dBc/Hz                        | -121dBc/Hz                     | -112dBc/Hz                     | -120.3dBc/Hz             |
| Power                    | NA                        | NA                              | 29.2mW                            | 30.4mW                         | 64mW                           | 40mW                     |
| Area                     | ~0.023mm<sup>2</sup>      | ~0.34mm<sup>2</sup>             | 0.39mm<sup>2</sup>                | 0.088mm<sup>2</sup>            | 0.113mm<sup>2</sup>            | 0.06mm<sup>2</sup>       |