# A Flexible, Low-Power Analog PLL for SoC and Processors in 14nm CMOS

Kuan-Yueh Shen, Syed Feruz Syed Farooq, Yongping Fan, Senior Member, IEEE, Khoa Minh Nguyen,

Qi Wang, Mark L. Neidengard, Nasser Kurd, and Amr Elshazly, Member, IEEE

Abstract—This paper presents a PLL supporting diverse low-power clocking needs including wide input (6-200 MHz) and output (0.15-5 GHz) frequency ranges and SSC operation. Fabricated in 14nm FinFET CMOS, a low-power switched-cap loop filter is employed to enable high -3dB PLL bandwidth (>40% of  $f_{\text{REF}} = 19.2$  MHz), and the proposed reference current generator (IrefGen) provides accurate current with <4% tolerance without the need for external components or on-chip precision resistors. IrefGen decouples PLL loop dynamics from feedback divide ratio and provides immunity to systematic capacitor variation. Power gating of switched-cap loop filter's bias circuits results in more than 10% PLL total power savings. The PLL achieves 1.6-ps integrated RMS jitter at 4 GHz using 100-MHz reference while consuming 2.6 mW from 0.95 V. The PLL performance satisfies the stringent PCIe Gen2/3 jitter specifications without resorting to inductors.

*Index Terms*—Ring PLL, SoC, clock, adaptive, SSC, PCIe, sample-reset loop filter, microprocessors, core-clocking.

## I. INTRODUCTION

Modern Systems-on-Chip (SoCs) integrate numerous subsystems whose operating frequencies must satisfy a patchwork of spread-spectrum clocking (SSC) and non-SSC requirements and must change on the fly (dynamically, with fast relock time independent of output frequency) to reap dynamic voltage-frequency scaling (DVFS) power savings. Fig. 1 shows an example SoC in which a single platform crystal provides the reference for several on-die clock sources; such systems have included ~20 PLLs, which account for >7% of total SoC power [1].

Fast PLL lock time implies high bandwidth, which also confers the side benefit of voltage controlled oscillator (VCO) self-noise attenuation. However, this is in disagreement with loop stability and the traditional rule-of-thumb limit of 10% reference frequency ( $f_{REF}$ ) [2]. In particular, PLLs with high intrinsic loop delay, such as those featuring elaborate digital filters, must resort to increasing  $f_{REF}$  in order to boost bandwidth: this consumes additional power [3] and complicates behavioral analysis [4]. Reducing PLL order from type-II to type-I has demonstrated high bandwidth [5], but it doesn't meet the requirements for lock acquisition range and static phase offset.

Manuscript received September 9, 2017; revised November 10, 2017; accepted December 1, 2017. This paper was recommended by Associate Editor E. Bonizzoni. (*Corresponding author: Amr Elshazly.*)

The authors are with Intel Corporation, Hillsboro, OR 97124 USA (e-mail: shazly@ieee.org).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2017.2784319



Fig. 1. Simplified typical SoC clocking with single crystal (XTAL).



Fig. 2. Block diagram of the proposed PLL with self-bandwidth control.

This work presents a low-power, low-jitter, wide-frequency-range ( $F_{max}/F_{min} > 8$ ) type-II analog PLL with SSC support [6], fabricated in 14nm FinFET CMOS [7], [8]. It includes self-loop gain control to decouple bandwidth from frequency division ratio N; unlike prior art [9], our design doesn't require explicit discrete charge pump reconfiguration.

Bandwidth is extended via a switched-capacitor loop filter with low control-voltage ripple and highly-independent proportional and integral gain adjust, based on a precision on-die IrefGen circuit. A simplified Low Drop-Out (LDO) regulator design is facilitated by good intrinsic supply noise rejection (>15dB) in the VCO.

The rest of the paper is organized as follows. Section II describes the PLL architecture and self-bandwidth control. Section III presents *z*-domain analysis for the high-bandwidth loop filter. Circuit-level details, including loop filter and charge pump power gating, are given in Section IV, together with the measurement results (Section IV-F). Section V concludes with key results.

## II. PLL ARCHITECTURE AND SELF-BANDWIDTH CONTROL

Fig. 2 illustrates a simplified block diagram of the proposed PLL. It consists of a tristate phase-frequency detector (PFD), a lock detector, two charge-pumps (CPs) with bias currents from IrefGen that implement the proportional and integral paths, a switched-capacitor sample-reset loop filter (SRLF), a ring-based VCO followed by a post-VCO amplifier (PVA), and a feedback divider.

1549-8328 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Fig. 3. Sample-reset loop filter and timing diagram. Ideally, the proportional charge of the previous cycle is instantly reset before the present CP event.

Fig. 4. Waveforms of $V_{CTL}$ for RCC (A, B, C, D) and Sample-Reset (E) loop filters at the same phase error; $C_1$ is 50pF.

A startup circuit briefly pre-charges $V_{CTL}$ upon PLL enable for faster settling. Not shown in Fig. 2 are an LDO, which powers everything shown except the feedback divider, and a selectable R-C segment that increases loop filter order to 2. Also shown in Fig. 2 are two design-for-test (DFT) blocks: Lock Detect and CLK IN. The former asserts the Lock signal, which is monitored in high-volume manufacturing (HVM) test and by various SoC consumers. The lock detector circuit compares the feedback and reference clocks and asserts Lock when their phase error is smaller than a specified threshold set by a programmable delay line. The latter block facilitates PLL characterization via static and dynamic phase error injection. IrefGen's choice of input clock determines its operating mode: Fixed or Adaptive. In Fixed mode, $I_{REF}$ is solely proportional to $f_{REF}$; in Adaptive mode, $I_{REF}$ is instead proportional to $f_{VCO} = (N \cdot f_{REF})$ given fixed divisor M. The consequences of this for loop stability and bandwidth can be glimpsed from the s-domain expressions for damping factor $\zeta$ and natural frequency $\omega_n$ of a simple 2<sup>nd</sup>-order CP-PLL with R-C loop filter [9]:

$$\zeta = \frac{R_P}{2} \sqrt{\frac{I_{CPi} C_1 K_{VCO}}{2\pi N}} \quad \text{and} \quad \omega_n = \sqrt{\frac{I_{CPi} K_{VCO}}{2\pi N C_1}} \quad (1)$$

with VCO gain $K_{VCO}$ in rad/s/V, integral CP current $I_{CPi}$, filter resistance $R_P$ and capacitance $C_1$. In Adaptive mode, both $\zeta$ and $\omega_n$ become independent of N. Note that typical PLLs use a 2<sup>nd</sup>-order loop filter (RCC) that consists of a series R, $C_1$ and a shunt capacitor $C_s \sim (C_1/10)$ to mitigate the $V_{CTL}$ ripple that comes from only pumping for a portion (<4% at steady state) of the reference clock cycle [10]; this $C_s$ decreases loop bandwidth and adds loop delay. In contrast, our SRLF (see Fig. 3) applies the pumped charge $Q_P$ (assuming $C_1 \gg C_2$) as a stable voltage step, affecting almost the entire reference period equally, either from $C_{2A}$ or $C_{2B}$ (on alternate reference clocks) with the other cap reset to $V_{RST}$ to ensure no memory effect. Note that S1, S2, R1, and R2 in Fig. 3 are non-overlapping clocks generated from $f_{REF}$ divided-by-2. The SRLF achieves the same N-independence as above because its equivalent loop resistance (see Section III):

$$R_{P,SR} = \frac{I_{CPp}}{f_{REF}I_{CPi}C_1} \tag{2}$$

depends on $I_{CPi}$ and proportional $I_{CPp}$ scaled (independently) to the same $I_{REF}$. Note that $V_{RST}$ is fixed in our SRLF, saving power and complexity compared to generation by auxiliary CPs [9] or unity-gain buffers from $V_{CTL}$ [11–12]. Figure 4 compares $V_{CTL}$ over one reference clock period for four cases of RCC loop filter tunings (A-D) with the SRLF (E); all cases are normalized to produce the same integrated VCO phase. In cases (A-B), low $C_s$ values incur high $V_{CTL}$ ripple, and the reduced $I_{CPi}$ heightens the risk of random device mismatch. Conversely, (C-D) have less ripple but a longer time span in steady-state (multiples of reference period $T_{REF}$, or $1/f_{REF}$): this degrades the maximum (stable) bandwidth of the PLL loop. Note that smaller $V_{CTL}$ ripples lead to improved loop stability.
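The N-independence claimed for Adaptive mode can be checked numerically from Eq. 1. The following is a minimal sketch: the component values, the helper names `loop_params` and `charge_pump_current`, and the assumption that the charge-pump current simply scales with N in Adaptive mode are illustrative and are not taken from the paper.

```python
import math

def loop_params(i_cpi, r_p, c1, k_vco, n):
    """Damping factor and natural frequency from Eq. 1 (2nd-order CP-PLL)."""
    zeta = (r_p / 2.0) * math.sqrt(i_cpi * c1 * k_vco / (2.0 * math.pi * n))
    omega_n = math.sqrt(i_cpi * k_vco / (2.0 * math.pi * n * c1))
    return zeta, omega_n

# Hypothetical values, for illustration only (not from the paper).
K_VCO = 2 * math.pi * 5e9   # VCO gain in rad/s/V
C1 = 50e-12                 # integral capacitor in F
R_P = 2e3                   # loop resistance in ohms
I_UNIT = 1e-6               # charge-pump current per unit of N, in A

def charge_pump_current(n, mode):
    # Assumption: Adaptive mode makes I_CPi track f_VCO = N * f_REF,
    # while Fixed mode keeps I_CPi constant for a given f_REF.
    return I_UNIT * n if mode == "adaptive" else I_UNIT * 20

for mode in ("fixed", "adaptive"):
    for n in (8, 20, 40):
        zeta, omega_n = loop_params(charge_pump_current(n, mode), R_P, C1, K_VCO, n)
        f_n_mhz = omega_n / (2 * math.pi) / 1e6
        print(f"{mode:8s} N={n:2d}  zeta={zeta:.3f}  f_n={f_n_mhz:.2f} MHz")
```

In this sketch the Adaptive rows print the same damping factor and natural frequency for every N, while the Fixed rows drift as N changes.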

## III. PLL LOOP ANALYSIS AND TRANSFER FUNCTION

As some previous SRLF publications [11–12] have omitted transfer function analysis, we provide it here to motivate SRLF impact on PLL behavioral design. Our discrete-time (z-domain) formulation comprehends the sampling nature of the PFD, starting with an output/input phase relationship at steady-state which we linearize as in [13]. Because  $C_{2A} \ll C_1$ ,  $C_{2A}$  (in parallel with  $C_1$ ) is neglected during CP events. The integral CP current in one reference cycle starting at t = 0:

$$i_{CPi} = \begin{cases} I_{CPi}\,\mathrm{sgn}[\theta_{e,0}], & 0 < t < t_p \\ 0, & t_p < t < 2\pi / \omega_{REF} \end{cases} \tag{3}$$

where  $\theta_{e,t} = \theta_i(t) - \theta_0(t)$  is the sampled phase error between reference phase  $\theta_i(t)$  and feedback phase  $\theta_0(t)$ ,  $\omega_{REF}$  is the reference frequency in rad/s, and  $t_p$  is the net UP-DN on-time, approximated as  $t_p \cong |\theta_{e,0}| / \omega_{REF}$ .

The value of  $V_{CTL}$  can be expressed as  $V_{CTL}(t) = [Q_i(t) + Q_p(t)]/C_1 + V_{CTL}(0)$ , where  $Q_p$  and  $Q_i$  are the integration of  $I_{CPp}$  and  $I_{CPi}$ , respectively. The feedback frequency  $\omega_0(t)$  is:

$$\omega_0(t) = \frac{\Omega_0}{N} + \frac{K_{VCO}}{N} V_{CTL}(t) \tag{4}$$

where  $\Omega_0$  denotes the free running VCO frequency at t = 0.



Fig. 5. PLL transfer function from reference input to feedback divider output for low and high bandwidth setting, with and without the loop delay.



Fig. 6. VCO schematics with two tuning mechanisms: switchable load capacitors and adjustable PMOS control current.



Fig. 7. VCO supply noise rejection illustration and simulation results.

Resetting  $Q_p$ , for instance on C<sub>2A</sub>, finishes promptly (from  $t = t^{*-}$  to  $t^*$ ) by the next rising edge of UP-DN (at  $t = t^*$ ) for maximal  $Q_p$  retention in a cycle. To capture the phase transient in a cycle ( $0 \le t \le t^*$ ), we assume linear  $V_{CTL}$  ramp due to UP-DN, and integrate from t = 0, to  $t_p$ , and then  $t^*$  to obtain  $V_{CTL}$  and  $\theta_0$  at  $t^*$ :

$$V_{CTL}(t^{*}) = V_{CTL}(0) + \frac{\theta_{e,0}I_{CPi}}{\omega_{REF}C_{1}}, \tag{5}$$

$$\theta_{0}(t^{*}) = \theta_{0}(0) + \frac{\Omega_{0}}{N}t^{*} + \frac{K_{VCO}}{N}\left[V_{CTL}(0)t^{*} + \left(\frac{\left(I_{CPi} + I_{CPp}\right)t_{p}t^{*}}{C_{1}} - \frac{I_{CPi}t_{p}^{2}}{2C_{1}}\right)\right] \tag{6}$$

Replacing  $t^*$  by  $2\pi/\omega_{REF}$ :

$$\begin{aligned} \theta_0(t^*) \cong{} & \theta_0(0) + \frac{2\pi}{\omega_{REF}} \left[ \frac{\Omega_0}{N} + \frac{K_{VCO}}{N} V_{CTL}(0) \right] \\ & + \frac{K_{VCO} I_{CPi} \theta_{e,0}}{2\omega_{REF}^2 C_1 N} \left[ 4\pi (1+a) - \theta_{e,0} \right] \end{aligned} \tag{7}$$



Fig. 8. Modified sample-reset loop filter.

where *a* is defined as  $I_{CPp}/I_{CPi}$ . Because  $|\theta_{e,0}|$  is reasonably  $\ll 4\pi$  at steady state, it is dropped to linearize  $\theta_0$ , in addition to dropping  $\Omega_0$  (as a phase step).

In z-transform space, we obtain the transfer function:

$$\frac{\theta_0(z)}{\theta_i(z)} = \frac{\frac{2\pi K_{VCO}I_{CPi}}{\omega_{REF}^2 C_1 N} + (z-1)\frac{K_{VCO}I_{CPi}}{\omega_{REF}^2 C_1 N} (2\pi (1+a))}{(z-1)^2 + (z-1)\frac{K_{VCO}I_{CPi}}{\omega_{REF}^2 C_1 N} (2\pi (1+a)) + \frac{2\pi K_{VCO}I_{CPi}}{\omega_{REF}^2 C_1 N}} \tag{8}$$
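The per-cycle recurrences behind Eq. 8 can be iterated directly to see the loop settle after a reference phase step. The sketch below is illustrative only: it uses the linearized forms of Eqs. 5 and 7 (the $\theta_{e,0}^2$ and $\Omega_0$ terms dropped), lumps the circuit constants into a dimensionless gain $k = 2\pi K_{VCO} I_{CPi} / (\omega_{REF}^2 C_1 N)$, and the numerical values of `k` and `a` are hypothetical.

```python
def simulate_phase_step(k, a, n_cycles=200, step=1.0):
    """Iterate the linearized per-cycle recurrences that lead to Eq. 8.

    k : 2*pi*K_VCO*I_CPi / (omega_REF**2 * C1 * N), dimensionless integral gain
    a : I_CPp / I_CPi
    Returns the feedback phase after each reference cycle.
    """
    p = k * (1.0 + a)   # proportional-path gain per cycle
    theta_o = 0.0       # feedback phase theta_0
    u = 0.0             # phase advance per cycle contributed by V_CTL
    history = []
    for _ in range(n_cycles):
        err = step - theta_o     # sampled phase error theta_e
        theta_o += u + p * err   # linearized Eq. 7
        u += k * err             # Eq. 5 scaled by (2*pi/omega_REF) * K_VCO / N
        history.append(theta_o)
    return history

# Hypothetical gains, for illustration only.
response = simulate_phase_step(k=0.05, a=9.0)
print(f"peak = {max(response):.3f}, final = {response[-1]:.6f}")
```

Taking the z-transform of the two update lines reproduces the loop gain $(k + (z-1)k(1+a))/(z-1)^2$, so the simulated step response belongs to the same closed loop as Eq. 8.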

The same approach can be applied to derive the transfer function for a PLL with a conventional R-C loop filter:

$$\frac{\theta_0(z)}{\theta_i(z)} = \frac{\frac{2\pi K_{VCO}I_{CPi}}{\omega_{REF}^2 C_1 N} + (z-1) \frac{K_{VCO}I_{CPi}}{\omega_{REF} N} \left(R_P + \frac{2\pi}{\omega_{REF} C_1}\right)}{(z-1)^2 + (z-1) \frac{K_{VCO}I_{CPi}}{\omega_{REF} N} \left(R_P + \frac{2\pi}{\omega_{REF} C_1}\right) + \frac{2\pi K_{VCO}I_{CPi}}{\omega_{REF}^2 C_1 N}} \tag{9}$$

Eq. 9 here is identical to [2, eq. (40)] with N = 1 and shunt capacitor ( $C_2$  in [2]) removed. By substituting  $z-1 = e^{sT_{REF}} - 1 \cong sT_{REF}$  in Eq. 8, the s-domain transfer function can be written:

$$\frac{\theta_0(s)}{\theta_i(s)} = \frac{\frac{K_{VCO}I_{CPi}}{2\pi C_1 N} + s \frac{K_{VCO}I_{CPi}}{2\pi N} \frac{T_{REF}(1+a)}{C_1}}{s^2 + s \frac{K_{VCO}I_{CPi}}{2\pi N} \frac{T_{REF}(1+a)}{C_1} + \frac{K_{VCO}I_{CPi}}{2\pi C_1 N}}. \tag{10}$$

The equivalent sample-reset loop filter resistance,  $R_{P,SR}$ , damping factor  $\zeta$ , and natural frequency  $\omega_n$  can be defined as:

$$R_{P,SR} = \frac{T_{REF} (1+a)}{C_1}, \quad \omega_n = \sqrt{\frac{K_{VCO} I_{CPi}}{2\pi C_1 N}}, \quad \zeta = \frac{R_{P,SR}}{2} \sqrt{\frac{I_{CPi} C_1 K_{VCO}}{2\pi N}} = \frac{T_{REF} (1+a)}{2C_1} \sqrt{\frac{I_{CPi} C_1 K_{VCO}}{2\pi N}} \tag{11}$$

And if $a \gg 1$:

$$R_{P,SR} = \frac{aT_{REF}}{C_1}, \quad \omega_n = \sqrt{\frac{K_{VCO}I_{CPi}}{2\pi C_1 N}}, \quad \zeta = \frac{aT_{REF}}{2C_1} \sqrt{\frac{I_{CPi}C_1K_{VCO}}{2\pi N}} \tag{12}$$



Fig. 9. Block diagram of the proportional charge-pump.

The loop gain LG(z) can be derived from Eq. 8 as:

$$LG(z) = \frac{\frac{2\pi K_{VCO} I_{CPi}}{\omega_{REF}^2 C_1 N} + (z-1) \frac{K_{VCO} I_{CPi}}{\omega_{REF}^2 C_1 N} (2\pi (1+a))}{(z-1)^2} \tag{13}$$

Note that Eq. 8 must be amended to account for loop delay:

$$\frac{\theta_0\left(z\right)}{\theta_i\left(z\right)} = \frac{LG(z)z^{-LD}}{1 + LG(z)z^{-LD}}, \tag{14}$$

where LD is the effective loop delay normalized to  $T_{REF}$ . It represents the sum of the delays introduced in the loop, including the clock distribution, feedback divider, and CLK IN block. Nonzero LD widens the passband for reference clock noise (Fig. 5) and exacerbates peaking (a proxy for loop instability), especially at high bandwidth settings. However, LD decreases when  $I_{CPp}$  (and hence bandwidth) increases: in simulation, increasing  $I_{CPp}$  by 3.5 times cuts our LD roughly in half: a good omen for wide-bandwidth operation.
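The effect of loop delay on peaking can be illustrated by evaluating Eqs. 13 and 14 on the unit circle. This is a minimal sketch rather than the authors' model fit: the gain `k` (the constant term of Eq. 13), the ratio `a`, and the delay values are hypothetical, and a fractional LD is modeled as a pure phase rotation $e^{-j 2\pi f T_{REF} \cdot LD}$.

```python
import cmath
import math

def loop_gain(z, k, a):
    """LG(z) of Eq. 13 with k = 2*pi*K_VCO*I_CPi / (omega_REF**2 * C1 * N)."""
    return (k + (z - 1) * k * (1 + a)) / (z - 1) ** 2

def closed_loop(f_norm, k, a, ld):
    """Eq. 14 at f = f_norm * f_REF; ld is the loop delay in units of T_REF."""
    theta = 2 * math.pi * f_norm
    z = cmath.exp(1j * theta)
    lg = loop_gain(z, k, a) * cmath.exp(-1j * theta * ld)  # LG(z) * z**(-LD)
    return lg / (1 + lg)

def peaking_db(k, a, ld, points=2000):
    """Maximum closed-loop magnitude in dB over (0, f_REF/2]."""
    freqs = [0.5 * (i + 1) / points for i in range(points)]
    return max(20 * math.log10(abs(closed_loop(f, k, a, ld))) for f in freqs)

# Hypothetical settings, for illustration only.
for ld in (0.0, 0.25, 0.5):
    print(f"LD = {ld:.2f} T_REF -> peaking = {peaking_db(0.05, 9.0, ld):.2f} dB")
```

With these example gains, the computed peaking grows as LD increases, which is the qualitative trend described for Fig. 5.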

## IV. CIRCUIT DESIGN AND IMPLEMENTATION

In this section, we elaborate on the key building blocks of proposed PLL, including the VCO, switched capacitor loop filter, charge pumps, IrefGen, and the feedback divider.

### A. Ring-Based VCO

The VCO (Fig. 6) is a 5-stage inverter-based ring oscillator (RO) with PMOS current starvation. Its AC-coupled PVA is sized for full-rail output swing with  $1\%/\sigma$  duty cycle variability at all configurations of interest. Digital controls set the electrical width of the starvation device (32 settings) and the amount of P-type gate load (8 settings) exposed to each RO stage. These knobs provide a wide space for tuning frequency range,  $K_{VCO}$ , and phase noise: they are intended for one-time calibration and need not be adjusted in the field. The current source's starvation PMOS device flicker noise power spectral density is inversely proportional to its area [14]. Since FinFETs are fixed in length and width [7], [8], realizing the desired starvation area requires a significant series-parallel grid of unit cells. A 2.5pF capacitor, laid out in a ring around the RO, provides low-latency stiffening of virtual supply  $V_{RO}$ . The dominant input for deterministic jitter (Dj) is supply noise on  $VCC_{PLL}$ , which can reach 3% without an LDO.



Fig. 10. The IrefGen circuit implementation and the simulated results for fast, slow, typical corners with temperatures of  $-40^{\circ}$ C to  $110^{\circ}$ C.

Our simulation and measurement indicate the VCO supply sensitivity is about 6.4ps/mV. Thus, with 30dB of total PSRR and a 3% supply noise from a 1.2V LDO supply,  $VCC_{PLL}$  noise is 1.1mV which results in Dj of 7ps.
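The arithmetic behind this estimate can be reproduced as a short worked example. The function name below is hypothetical; the inputs (1.2V supply, 3% noise, 30dB PSRR, 6.4ps/mV sensitivity) are the values quoted in the text, and the PSRR is assumed to be a voltage ratio in dB.

```python
def deterministic_jitter_ps(v_supply, noise_fraction, psrr_db, sensitivity_ps_per_mv):
    """Return (residual supply noise in mV, resulting Dj in ps)."""
    noise_in_mv = v_supply * noise_fraction * 1e3          # noise ahead of rejection
    noise_out_mv = noise_in_mv / (10 ** (psrr_db / 20.0))  # after PSRR (voltage ratio)
    return noise_out_mv, noise_out_mv * sensitivity_ps_per_mv

noise_mv, dj_ps = deterministic_jitter_ps(1.2, 0.03, 30.0, 6.4)
print(f"VCC_PLL noise = {noise_mv:.2f} mV, Dj = {dj_ps:.1f} ps")
```

The result is roughly 1.1mV of residual noise and about 7ps of Dj, in line with the figures above.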

As shown in Fig. 7,  $V_{CTL}$  is referenced to  $VCC_{PLL}$  instead of ground to stabilize the starvation device's  $V_{GS}$ . We also introduce controlled amounts of series routing resistance at the starvation device's source in layout (capitalizing on the 14nm FinFETs' low thresholds) to further insulate  $V_{RO}$  from  $VCC_{PLL}$ . The result is over 20dB of intrinsic VCO PSRR at 2.4GHz; even allowing guard-band for further process variability and other operating frequencies, we can still relax the LDO's design target to a mild 15dB.

### B. Switched-Cap Sample-Reset Loop Filter

The modified switched capacitor SRLF is shown in Fig. 8.  $V_{SR}$  is simply reset to  $VCC_{PLL}/2$ , giving the proportional UP and DN pumps equal voltage headroom when firing. The reset current is programmable (the intent being higher strength at higher  $f_{REF}$ ), and power-gated during the evaluate phase to save nearly 50% of the loop filter power. Reset un-gating precedes CP events by about 200-300ps (see Fig. 8).  $C_2/C_1$  is around 1/50, and bounded below by limits on maximum  $I_{CPp}$  and phase error. A small  $C_3$  cap at the  $VCC_{PLL}/2$  node reduces local voltage ripple when the power gating switches turn ON from OFF.

At 4GHz output frequency, the PLL transient simulation with  $f_{REF} = 100$ MHz indicates  $V_{CTL}$  ripple is smaller than 0.1mV at steady state. The proportional path is tantamount to the series resistor  $R_P$  in a conventional R-C loop filter as in Eq. 11;  $R_{P,SR}$  is tunable via the separate controls for  $I_{CPp}$  and  $I_{CPi}$ . In principle, when  $I_{CPp}$  is increased to extend the PLL bandwidth,  $I_{CPi}$  should be adjusted accordingly to retain loop stability.

### C. Charge-Pumps

The proportional and integral charge-pumps share the same topology (see Fig. 9). They receive $I_{REF}$ into a tunable diode-connected NMOS, and mirror it through a charge-pump replica bias to produce the UP/DN currents. To limit subthreshold leakage in the OFF state, T-switches driven by non-overlapping clocks derived from UP and DN are used in each CP branch. The CP replica bias contains power gating, and must awaken and stabilize prior to an UP/DN event. To achieve this, the PFD UP and DN signals are delayed by approximately 200ps before being passed to the CP. During the delay window, a CP enable pulse primes the replica bias. CP enable falls with the de-assertion of the delayed UP and DN. When the PLL is settled, this puts the bias in power-saving mode for more than 95% of the time. Taken together, the CP and SRLF power gating features save more than 10% of the total PLL power for a 100MHz reference. Devices are sized so that charge pump current variation is <5% when $I_{REF} = 25\mu$A.

### D. Reference Current Generator (IrefGen)

The IrefGen, shown in Fig. 10, constructs  $I_{REF}$  by moving a given amount of charge per input clock cycle:

$$I_{REF} = V_{REF} \cdot 2C_X \cdot f_{CK} \cdot P \tag{15}$$

where *P* is the output mirroring ratio. In equilibrium, that charge is removed and replenished every cycle from the large  $C_Y$ , which works in tandem with LPF1 and LPF2 to filter transient noise from switching activity. A PMOS output driver is chosen for its smaller minimum voltage drop ( $V_{DSon}$ ) than the NMOS in [15] ( $V_{th} > V_{DSon}$ ). Across PVT corners, not including  $C_X$  variation, the variability of the converged  $I_{REF}$  is <4%, settling in <200ns and <700ns for 100MHz and 19.2MHz reference clocks, respectively (see Fig. 10). To improve settling time, the R's of LPF1 and LPF2 are bypassed during the startup.

The same type of unit capacitor cell is used to construct $C_X$ as is used for the PLL's main $C_1$ and $C_2$, cancelling systematic variation from the quantity $(I_{CP}/C)$ that governs the PLL's dynamics. Therefore, we may say that the $C_X$ and P terms in Eq. 15 are ratio-metrically precise. $f_{CK}$ is also precise insofar as the PLL's input clock is stable and, in Adaptive mode, the PLL has attained frequency lock.

The primary remaining source of error is $V_{REF}$, derived from $VCC_{PLL}$ via $D_{REF}$ through a resistor-ratiometric DAC. Although $I_{REF}$ has low-pass filtering to combat $VCC_{PLL}$ ripple, and $VCC_{PLL}$ itself sees some regulation benefit from its LDO, those seeking greater independence for $I_{REF}$ may wish to derive $D_{REF}$ instead from a bandgap reference or similar structure.
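Eq. 15 and the capacitor-ratio argument can be sketched numerically. All component values below are hypothetical, and the Adaptive-mode clock is assumed to be $f_{VCO}/M$ with $f_{VCO} = N \cdot f_{REF}$, which is one reading of the fixed divisor M mentioned in Section II.

```python
def i_ref(v_ref, c_x, f_ck, p):
    """Eq. 15: I_REF = V_REF * 2*C_X * f_CK * P."""
    return v_ref * 2.0 * c_x * f_ck * p

# Hypothetical component values, for illustration only (not from the paper).
V_REF, C_X, C_1, P = 0.3, 100e-15, 50e-12, 4.0
F_REF, N, M = 100e6, 40, 8

fixed = i_ref(V_REF, C_X, F_REF, P)             # Fixed mode: clocked by f_REF
adaptive = i_ref(V_REF, C_X, N * F_REF / M, P)  # Adaptive mode (assumed clock f_VCO / M)
print(f"Fixed: {fixed * 1e6:.1f} uA, Adaptive: {adaptive * 1e6:.1f} uA")

# A systematic capacitor shift scales C_X and C_1 together,
# so the ratio I_REF / C_1 that sets the loop dynamics is unchanged.
for shift in (0.9, 1.0, 1.1):
    ratio = i_ref(V_REF, C_X * shift, F_REF, P) / (C_1 * shift)
    print(f"cap shift {shift:.1f}: I_REF/C_1 = {ratio:.4e} A/F")
```

The three printed ratios agree, which illustrates why systematic capacitor variation drops out of $(I_{CP}/C)$ when $C_X$, $C_1$, and $C_2$ are built from the same unit cell.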


Fig. 11. Low-power feedback divider block diagram.



Fig. 12. PLL die photograph.

### E. Feedback Divider

The feedback divider (Fig. 11) consists of a 7-bit binary down-counter plus an additional flop+latch output module. Based on whether the Ratio word is even or odd, and the current state of clkout, the counter either counts down to 1 or 0, whereat it is reloaded with Ratio/2. Reloading also toggles the output flop, with the latch value trailing by an input clock phase. The level of the latch, ANDed with Ratio[0], conditionally delays the falling edge of clkout for near-50% output duty cycle at all Ratio settings.

This behavior supports integer ratios 2-255. To produce "half-integer" ratios 2.5-255.5, a mux upstream of the divider core may optionally insert an extra clkin phase once per clkout period by inverting the divider core's clock. Half-integers are useful both for constant-frequency applications, and for reducing feedback pattern jitter in multi-modulus operation (for fractional and/or SSC synthesis). Multi-modulus operation is further facilitated by the divider's indifference to the Ratio word except at the moment of counter reload.

Active power is mitigated by breaking the counter's clock into segments, whose enables are simply the borrow-out of the segments immediately below. Besides reducing clock toggling in buffers and sequential logic, this saves data power via simplified intra-segment logic while accounting for the min-delay-protecting latches inserted on the clock gate enables. At 4GHz (100MHz × 40) and 0.95V, simulated power consumption is less than $300\mu$W across PVT.
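The duty-cycle behavior of the integer divider can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions, not the authors' counter implementation: the output flop is modeled as high for `ratio // 2` input cycles and low for the remainder, the latch is modeled as the flop delayed by one input-clock phase, and the function name `divider_waveform` is hypothetical.

```python
def divider_waveform(ratio, n_periods=2):
    """Return clkout sampled once per input-clock phase (two phases per clkin cycle)."""
    high_cycles = ratio // 2           # flop high time, in clkin cycles
    low_cycles = ratio - high_cycles   # flop low time, in clkin cycles
    flop = []
    for _ in range(n_periods):
        flop += [1] * (2 * high_cycles) + [0] * (2 * low_cycles)
    odd = ratio & 1                    # Ratio[0]
    out = []
    prev = 0                           # latch trails the flop by one phase
    for level in flop:
        latch = prev
        out.append(level | (latch & odd))  # delay the falling edge for odd ratios
        prev = level
    return out

for ratio in (4, 5, 255):
    wave = divider_waveform(ratio)
    print(f"Ratio={ratio}: duty cycle = {100.0 * sum(wave) / len(wave):.1f}%")
```

For odd ratios the flop alone would be high for less than half the period; extending the falling edge by one input phase restores the balance in this idealized model.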

### F. Measurement Results

The PLL was fabricated in Intel's SoC/microprocessor 14nm FinFET CMOS technology. The die micrograph is shown in Fig. 12. The PLL occupies an active area of 0.021mm<sup>2</sup> (116$\mu$m × 185$\mu$m). The loop filter, charge-pumps, IrefGen, and VCO occupy approximately 35%, 25%, 15%, and 8% of the area, respectively. The nominal supply ranges from 0.6V to 0.95V, with corresponding output frequency ranges of 0.2-1.8GHz and 0.4-5GHz, respectively. The following results consist of PLL lock time statistics, bandwidth, jitter and power consumption under different conditions; unless otherwise stated, a 0.95V supply voltage is used. Figure 13 shows measured PLL transfer functions overlaid with the analytical model of Eq. 14 for different -3dB bandwidth settings of $0.09 \cdot f_{REF}$, $0.32 \cdot f_{REF}$, $0.42 \cdot f_{REF}$, and $0.49 \cdot f_{REF}$.

TABLE I PLL PERFORMANCE SUMMARY AND COMPARISON

|                            | This Work               | This Work               | This Work               | This Work               | T. Tsai<br>ISSCC'15     | L. Kong<br>ISSCC'15 | J. Liu<br>ISSCC'14    | N. August<br>ISSCC'12 | A. Elshazly<br>VLSI'12 |
|----------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|---------------------|-----------------------|-----------------------|------------------------|
| Architecture               | Type II Charge Pump PLL | Type II Charge Pump PLL | Type II Charge Pump PLL | Type II Charge Pump PLL | ADPLL TDC<br>Int-N Mode | Analog<br>Type I    | BB-DPLL<br>Int-N Mode | ADPLL                 | BB-DPLL                |
| Technology                 | 14nm                    | 14nm                    | 14nm                    | 14nm                    | 16nm                    | 45nm                | 20nm                  | 22nm                  | 130nm                  |
| Core Area (mm<sup>2</sup>) | 0.021                   | 0.021                   | 0.021                   | 0.021                   | 0.029                   | 0.015               | 0.012                 | 0.017                 | 0.2                    |
| PLL Supply (V)             | 0.95 (0.6-0.95)         | 0.95 (0.6-0.95)         | 0.95 (0.6-0.95)         | 0.95 (0.6-0.95)         | 0.52-0.8                | 1                   | 0.9                   | 1                     | 1.1                    |
| Reference (MHz)            | 100                     | 100                     | 100                     | 19.2                    | 100                     | 22.6                | 25                    | 100                   | 375                    |
| Output (GHz)               | 4                       | 1.6                     | 0.8                     | 2.4                     | 3 (0.25-4)              | 2.4 (2-3)           | 1.6 (0.025-1.6)       | 3.2 (0.3-3.2)         | 1.5 (0.8-2)            |
| Integrated RMS Jitter (ps) | 1.59                    | 3.02                    | 5.82                    | 2.99                    | 2                       | 0.97                | 28                    | 3.1                   | 3.2                    |
| Power (mW)                 | 2.6                     | 1.1                     | 0.7                     | 1.8                     | 6.10                    | 4                   | 3.1                   | 3.4                   | 1.35                   |
| FoM (dB)                   | -231.9                  | -230.1                  | -226.1                  | -227.9                  | -226.1                  | -234.1              | -206.1                | -224.9                | -228.6                 |

FoM = 10·log(($\sigma_{RMS}$/1s)<sup>2</sup>·(Power/1mW))

Fig. 13. Measured PLL transfer function overlaid with proposed model ( $f_{REF} = 19.2$MHz). Measured BW<sub>-3dB</sub> of 0.09 $f_{REF}$, 0.32 $f_{REF}$, 0.42 $f_{REF}$, and 0.49 $f_{REF}$.

Fig. 14. Lock repeats for >50 times overlaid. $f_{REF}$ is 19.2MHz for -3dB PLL bandwidth of 42% $f_{REF}$.

Fig. 15. Measured PLL transfer function peaking and -3dB bandwidth of $I_{CPp}/I_{CPi}$ sweeps for two $I_{CPi}$ levels.

Fig. 16. Measured PLL reference clock spur of -51.5dB with $f_{REF} = 19.2$MHz.

In each case, *LD* has been adjusted to fit the observed change in $I_{CPp}$ (down to $0.2 \cdot T_{REF}$ at $0.49 \cdot f_{REF}$). Using a 19.2MHz reference clock, the measured -3dB bandwidths for the $0.42 \cdot f_{REF}$ and $0.49 \cdot f_{REF}$ settings are 8.1MHz and 9.5MHz, with transfer function peaking <2.4dB and <4dB, respectively.




Fig. 17. Measured power, jitter, bandwidth, and peaking with  $f_{REF} = 100MHz$ .



Fig. 18. Measured phase noise for 100MHz reference and  $f_{VCO} = 4$ GHz (divided-by-2). The RMS jitter is 1.588ps integrated (100kHz to 1GHz).

Figure 14 shows the frequency lock trajectory (blue) superimposed with a persistence plot of the Lock indicator, which is used in HVM testing, for >50 PLL relock events. The trajectory of $f_{VCO}/2$ mirrors that of the control voltage, $V_{CTL}$. Upon PLL enable, the startup circuit initially pulls $V_{CTL}$ toward $VCC_{PLL}/2$. The lock time for $f_{REF} = 19.2$MHz, measured from PLL enable to Lock, averages $3\mu$s (55 reference cycles) with a standard deviation of $0.23\mu$s (4 reference cycles). At $f_{REF} = 100$MHz, lock time improves to $1\mu$s.

The impact of the charge pump current variation on the high bandwidth PLL transfer function is measured and the results are shown in Fig. 15 for $f_{REF} = 19.2$MHz and $f_{VCO} = 2.4$GHz. For a -3dB bandwidth of $0.42 \cdot f_{REF}$, when only $I_{CPi}$ is increased by 6.3%, the measured PLL -3dB bandwidth increases by 0.7%, and the peaking increases by 0.08dB. When $I_{CPp}$ (or $I_{CPp}/I_{CPi}$) is increased by 10% while the reference $I_{CPi}$ is unchanged, the measured PLL -3dB bandwidth increases by 4% and the corresponding peaking increases by 0.7dB. The amounts of $I_{CPi}$ and $I_{CPp}$ (or $I_{CPp}/I_{CPi}$) variation are based on post-layout simulation. The measured phase noise at the $0.42 \cdot f_{REF}$ -3dB bandwidth setting indicates 3ps integrated RMS jitter.

Fig. 19. Measured PLL power and integrated RMS jitter in Mode Adaptive with $f_{REF} = 100$MHz across $f_{VCO}$ range of 0.2-1.8 GHz under a 0.6V supply.

Fig. 20. PLL diagram with SSC/fractional-N capability.

In typical usage, the PLL bandwidth is targeted to  $0.3 \cdot f_{REF}$  or lower, in consideration of the balance between reference clock noise versus self-noise, and the desire for transfer function peaking separation of PLLs in cascade. The measured reference spur is -51.5dB, shown in Fig. 16, where the PLL bandwidth setting is  $0.06 \cdot f_{REF}$ , and output is 2.4GHz. The smaller spurs at  $f_{REF}/2$  shown in Fig. 16 are due to RC mismatch between the  $C_{2A}$  and  $C_{2B}$  paths. For performance on demand, processor core clocking can change operating frequencies via feedback ratio (*N*) settings.

Using Adaptive IrefGen with a 100MHz reference, we obtain the power, integrated RMS jitter, -3dB bandwidth, and transfer function peaking measurements shown in Fig. 17 by holding all settings constant except *N*. The measured -3dB bandwidth is $13.6\pm2.7$MHz with $1.8\pm0.4$dB peaking for 0.8-4GHz, demonstrating how Mode Adaptive maintains N-independent loop dynamics before $K_{VCO}$ rolls off ($f_{VCO} < 0.8$GHz or $> 4$GHz). The PLL consumes 0.5-3mW across 0.4-4.4GHz output frequency. Shown in Fig. 18 is the measured output phase noise spectrum. The measured integrated RMS jitter (from 100kHz to 1GHz) is 1.6ps at 4GHz output frequency from a 100MHz reference clock using the previous control settings. The 100kHz tone and its harmonics are due to the test-setup system/equipment, and are not intrinsic to the PLL.

Fig. 21. Measured spectrum with SSC enabled/disabled (mode Fixed is used).

TABLE II Data Showing PLL Meets PCIe Gen2/3 Jitter Specifications

|                                  | PCIe Gen2<br>Spec | Measurement<br>@5GHz (5Gb/s) | PCIe Gen3<br>Spec | Measurement<br>@4GHz (8Gb/s) |
|----------------------------------|-------------------|------------------------------|-------------------|------------------------------|
| Tj (ps) (BER=10<sup>-12</sup>)   | <50               | 18.16                        | <30               | 15.1                         |
| Rj (ps)                          | <1.43             | 1.20                         | <1                | 0.91                         |
| Dj (ps)                          | <30               | 1.03                         | <16               | 2.10                         |
| PLL Bandwidth (MHz)              | 8-16              | 9                            | 2-4               | 3                            |
| PLL Power (mW)                   |                   | 3.5@0.95V                    |                   | 2.6@0.95V                    |

The corresponding figure-of-merit (FoM) is -232dB and -226dB at 4GHz and 800MHz output frequencies, respectively. The FoM declines rapidly from 1.6GHz to 800MHz due to the rapid reduction in the current-starved ring oscillator swing and lower PLL bandwidth. To demonstrate control flexibility, we reconfigure the PLL for usable $f_{VCO}$ past the $K_{VCO}$ roll-off point around 4GHz and operate the PLL at 5GHz while consuming 3.5mW. Reconfiguring for operation at low $VCC_{PLL}$ (0.6V; Fig. 19), the PLL demonstrates $f_{VCO}$ of 0.2-1.8 GHz while consuming 0.27-0.7mW for $f_{REF} = 100$MHz. With $f_{REF} = 19.2$MHz, the range extends further down to 153.6MHz (N=8), where the PLL consumes only 170$\mu$W.
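The FoM values quoted here follow from the definition given with Table I, FoM = 10·log(($\sigma_{RMS}$/1s)<sup>2</sup>·(Power/1mW)). A minimal sketch of that calculation is shown below; the function name `fom_db` is hypothetical, the jitter and power inputs are the "This Work" entries of Table I, and small differences from the tabulated FoM are expected from rounding of the published inputs.

```python
import math

def fom_db(jitter_rms_s, power_w):
    """FoM = 10*log10((sigma_RMS / 1 s)**2 * (Power / 1 mW))."""
    return 10.0 * math.log10(jitter_rms_s ** 2 * (power_w / 1e-3))

# (output GHz, integrated RMS jitter in ps, power in mW) from Table I, "This Work".
this_work = [
    (4.0, 1.59, 2.6),
    (1.6, 3.02, 1.1),
    (0.8, 5.82, 0.7),
    (2.4, 2.99, 1.8),
]
for f_out, jitter_ps, power_mw in this_work:
    fom = fom_db(jitter_ps * 1e-12, power_mw * 1e-3)
    print(f"{f_out:.1f} GHz: FoM = {fom:.1f} dB")
```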

The PLL diagram with SSC capability is illustrated in Fig. 20 with multi-modulus operation which modulates the feedback divider in half-integer steps, and the supplemental loop filtering (third pole) enabled to suppress the DSM-induced quantization noise. No duty cycle correction is applied. The measured SSC output spectrum is shown in Fig. 21 for a 19.2MHz reference as in a typical SoC clock generation hub. The observed EMI reduction is 17.3dB when a 2.4GHz output is down-spread by 5000ppm.

Table I summarizes the measured performance and compares the results to state-of-the-art ring-based PLLs [3], [5], [16]–[18]. The same PLL, with reconfigured controls, can contend with LCPLLs, which have higher power and lower jitter, in meeting the PCIe Gen2/Gen3 jitter specs [19], as shown in Table II.

The measured jitter values in Table 2 (Rj/Dj/Tj) are obtained using a real-time oscilloscope whose built-in analysis separates the random and deterministic jitter (Rj/Dj) from the total jitter Tj, assuming a bit error rate (BER) of  $10^{-12}$ . For Gen2 (5Gb/s), the VCO is run at 5GHz with measured Tj around 20ps, versus the 50ps specification. For Gen3 (8Gb/s), the VCO is run at 4GHz to drive a quadrature-generating DLL. Here, measured Tj is about 15ps versus the 30ps spec. Both measurements use the PCIe-specified noise high-pass filter, and both scenarios yield PLL transfer function peaking less than 2dB.
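For readers unfamiliar with how Rj and Dj combine into Tj at a given BER, the sketch below uses the common dual-Dirac approximation, $T_j \approx D_j + 2\,Q(\mathrm{BER})\,R_j$. This model is an assumption for illustration only: the oscilloscope's built-in algorithm is not described here and may differ. The Rj value in the example is hypothetical and is not taken from Table 2.

```python
from statistics import NormalDist


def total_jitter_ps(rj_rms_ps: float, dj_pp_ps: float, ber: float = 1e-12) -> float:
    """Dual-Dirac estimate of total jitter (assumed model, see lead-in).

    Tj = Dj + 2 * Q(BER) * Rj, where Q is the Gaussian tail quantile.
    """
    q = -NormalDist().inv_cdf(ber)     # Gaussian quantile for the target BER
    return dj_pp_ps + 2.0 * q * rj_rms_ps


# Multiplier applied to Rj at BER = 1e-12 (about 14.07).
Q_1E12 = -NormalDist().inv_cdf(1e-12)
print(f"2*Q at BER 1e-12: {2.0 * Q_1E12:.2f}")

# Illustration only: Dj = 1.03 ps as in Table 2, Rj = 1.0 ps is HYPOTHETICAL.
print(f"Tj estimate: {total_jitter_ps(1.0, 1.03):.2f} ps")
```

The factor of roughly 14 on Rj shows why random jitter dominates the total-jitter budget at low BER even when Dj is larger in absolute terms.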

### V. CONCLUSION

We have presented a flexible, low-power, high-performance charge-pump-based PLL. It can address an array of SoC and CPU clocking needs through wide input- and output-frequency range, highly tunable behavioral parameters (including viable performance at a bandwidth over 40% of  $f_{REF}$ ), ratio-independent bandwidth to support rapid frequency changes, and multi-modulus operation for SSC applications. In our design, the higher bandwidth is enabled by lower control voltage ripple and less loop delay.

The proposed PLL performance satisfies the stringent PCIe Gen2/Gen3 jitter specs without the need for inductors, while also supporting ultra-low power internet-of-things (IoT) uses via its low  $VCC_{PLL}$  floor. This PLL, which is deployed in multiple Intel SoCs, tallies a FoM of -226dB to -232dB over a frequency range of 0.8-5GHz, while consuming only 0.7-3.5mW from 0.95V, or 270-710 $\mu$W from 0.6V for 0.2-1.8 GHz.

#### ACKNOWLEDGEMENTS

The authors thank Byron Grossnickle and Joshua Bondie for lab and measurement support; Dan Zhang for validation; Andrey Mezhiba, James Ayers, Gennady Goltman, Praveen Mosalikanti and Sameer Somvanshi for circuit consultation and feedback; and Amanda Duncan for project support.

#### REFERENCES

- [1] N. Kurd *et al.*, "Haswell: A family of IA 22 nm processors," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 49–58, Jan. 2015.
- [2] P. K. Hanumolu, M. Brownlee, K. Mayaram, and U.-K. Moon, "Analysis of charge-pump phase-locked loops," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 51, no. 9, pp. 1665–1674, Sep. 2004.
- [3] T.-H. Tsai, M.-S. Yuan, C.-H. Chang, C.-C. Liao, C.-C. Li, and R. B. Staszewski, "A 1.22 ps integrated-jitter 0.25-to-4 GHz fractional-N ADPLL in 16 nm FinFET CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 450–451.
- [4] N. D. Dalt, "A design-oriented study of the nonlinear dynamics of digital bang-bang PLLs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 52, no. 1, pp. 21–31, Jan. 2005.
- [5] L. Kong and B. Razavi, "A 2.4 GHz 4 mW inductorless RF synthesizer," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 52–53.
- [6] K.-Y. J. Shen et al., "A 0.17-to-3.5 mW 0.15-to-5 GHz SoC PLL with 15 dB built-in supply noise rejection and self-bandwidth control in 14 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2016, pp. 330–331.
- [7] S. Natarajan *et al.*, "A 14 nm logic technology featuring  $2^{nd}$ -generation FinFET, air-gapped interconnects, self-aligned double patterning and a 0.0588  $\mu$ m<sup>2</sup> SRAM cell size," in *IEDM Tech. Dig.*, Dec. 2014, pp. 3.7.1–3.7.3.

- [8] E. Karl et al., "A 0.6V 1.5 GHz 84 Mb SRAM design in 14 nm FinFET CMOS technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [9] J. G. Maneatis, J. Kim, I. McClatchie, J. Maxey, and M. Shankaradas, "Self-biased high-bandwidth low-jitter 1-to-4096 multiplier clock generator PLL," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1795–1803, Nov. 2003.
- [10] B. Razavi, *Design of Analog CMOS Integrated Circuits*. New York, NY, USA: McGraw-Hill, 2002, pp. 560–562.
- [11] A. Maxim, B. Scott, E. Schneider, M. Hagge, S. Chacko, and D. Stiurca, "A low-jitter 125-1250-MHz process-independent and ripple-poleless 0.18-μm CMOS PLL based on a sample-reset loop filter," *IEEE J. Solid-State Circuits*, vol. 36, no. 11, pp. 1673–1683, Nov. 2001.
- [12] P. J. Lim, "An area-efficient PLL architecture in 90-nm CMOS," in *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2005, pp. 48–49.
- [13] F. Gardner, "Charge-pump phase-lock loops," *IEEE Trans. Commun.*, vol. C-28, no. 11, pp. 1849–1858, Nov. 1980.
- [14] A. A. Abidi, "Phase noise and jitter in CMOS ring oscillators," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1803–1816, Aug. 2006.
- [15] B. Razavi, *Design of Analog CMOS Integrated Circuits*. New York, NY, USA: McGraw-Hill, 2002, p. 393.
- [16] J. Liu *et al.*, "A 0.012 mm<sup>2</sup> 3.1 mW bang-bang digital fractional-N PLL with a power-supply-noise cancellation technique and a walking-one-phase-selection fractional frequency divider," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 268–269.
- [17] N. August, H.-J. Lee, M. Vandepas, and R. Parker, "A TDC-less ADPLL with 200-to-3200 MHz range and 3 mW power dissipation for mobile SoC clocking in 22 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 246–247.
- [18] A. Elshazly, R. Inti, M. Talegaonkar, and P. K. Hanumolu, "A 1.5 GHz 1.35 mW -112 dBc/Hz in-band noise digital phase-locked loop with 50 fs/mV supply-noise sensitivity," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2012, pp. 188–189.
- [19] (Nov. 10, 2010). PCI Express Base Specification Revision 3.0. [Online]. Available: https://pcisig.com/specifications



**Kuan-Yueh Shen** received the B.S. and M.S. degrees in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1995 and 1997, respectively, and the Ph.D. degree in electrical engineering from the University of California at Los Angeles, Los Angeles, CA, USA, in 2005.

He has been a Senior Staff Analog Design Engineer with Intel Corporation, where he has been involved in the generations of low power clocking and analog circuits as well as technology development since 2005.



**Syed Feruz Syed Farooq** joined the Special Circuits Group, Intel, Hillsboro, OR, USA, in 2010, as an Analog Engineer, after graduating from Oregon State University. Since 2010, he has been designing and testing voltage regulators, PLL, bandgap reference, and thermal sensor for three processor generations. He is currently leading the effort defining low power clocking features for the next generation Intel Core Processor.



**Yongping Fan** (SM'06) received the B.S. degree in physics from Northwest University, Xi'an, China, the M.S. degree in electronic engineering from Xi'an Jiaotong University, China, and the M.S. degree in physics and the Ph.D. degree in electrical engineering from the Graduate School of Purdue University in 1990 and 1994, respectively. During his graduate study at Purdue University, he was a key member of the team that developed the continuous wave blue/green semiconductor quantum well laser. He joined Intel Corporation in 1997. Since 1997, he has been involved in low-power and high-performance analog and mixed signal circuit designs, including LCPLL, ring-oscillator PLL, DLL, phase interpolator, voltage regulator, bandgap reference, and IO circuits for microprocessors, SOC, and IOT products. He is currently a Senior Principal Engineer and the Manager of analog circuit technology with the Portland Technology Development Center, Intel Corporation, Portland, OR, USA.

He is the author/co-author of over 30 published papers and inventor/coinventor of 17 U.S. patents.



**Khoa Minh Nguyen** received the B.S. degree in electrical engineering and computer science from the University of California at Berkeley, Berkeley, CA, USA, in 2004, and the S.M. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 2006 and 2011, respectively.

In 2011, he joined the Portland Technology Development Division, Intel Corporation, Hillsboro, OR, USA. He has been involved in frequency synthesizers and PLL architectures for RF, CPU, and SOC applications over multiple process generations.



**Qi Wang** received the B.S. and M.S. degrees from Fudan University, Shanghai, China, in 1997 and 2000, respectively, and the Ph.D. degree in electrical engineering from the University of Washington, Seattle, in 2006. He was a Staff Design Engineer at Cypress Semiconductor Inc. from 2006 to 2010 developing very low noise PLLs and frequency synthesizers. He joined Intel in 2011 to develop next generation clocking technologies. He has been involved in advanced clocking design and analog design in several generations of Intel microprocessors.



**Mark L. Neidengard** received the B.S. and M.S. degrees in computer science from the California Institute of Technology, in 1997 and 1998, respectively, and the Ph.D. degree in electrical and computer engineering from Cornell University in 2002. He has been a Senior Staff Analog Design Engineer and the Domain Lead for Analog Methodology at Intel Corporation, since 2002. He holds eight patents and has co-authored several papers in the field.







**Amr Elshazly** (S'04–M'13) received the B.Sc. (Hons.) and M.Sc. degrees from Ain Shams University, Cairo, Egypt, in 2003 and 2007, respectively, and the Ph.D. degree from the Oregon State University, Corvallis, OR, USA, in 2012, all in electrical engineering.

He is currently with Intel Corporation, Hillsboro, OR, USA, developing high-performance high-speed I/O circuits and architectures for next generation process technologies. His research interests include serial-links, frequency synthesizers, PLLs, MDLLs, CDRs, data converters, and low-power mixed-signal circuits.

Dr. Elshazly received the Analog Devices Outstanding Designer Award in 2011, the Center for Design of Analog-Digital Integrated Circuits Best Poster Award in 2012, and the Graduate Research Assistant of the year Award in 2012 from the College of Engineering, Oregon State University. He serves as a reviewer for several IEEE journals, including JSSC, TCAS-I and II, TVLSI, and the IEEE ISSCC, VLSI, and ASSC conferences.