# A 35 dBm Output Power and 38 dB Linear Gain PA With 44.9% Peak PAE at 1.9 GHz in 40 nm CMOS

Haoyu Qian, Student Member, IEEE, Qiyuan Liu, Student Member, IEEE, Jose Silva-Martinez, Fellow, IEEE, and Sebastian Hoyos, Senior Member, IEEE

Abstract—This paper presents a 1.9 GHz linear power amplifier (PA) architecture that improves its power efficiency in the power back-off (PBO) region. The combination of power transistor segmentation and digital gain compensation effectively enhances its power efficiency. A fast switching scheme is proposed, such that PA drivers and segments are switched ON and OFF according to signal power; thus, the PA power consumption correlates with the power of the input signal. Binary power gain variations due to PA segmentation are dynamically compensated in the digital domain. The proposed solution overcomes the tradeoffs between power efficiency and linearity by employing the digital predistortion technique. The PA is implemented in a 40 nm CMOS process. It delivers a saturated output power of 35 dBm with 44.9% peak power-added efficiency (PAE) and a linear gain of 38 dB. The adjacent channel leakage ratio (ACLR) at  $\pm 5$  MHz at a maximum linear output power of 31 dBm for a baseband WCDMA signal is -35.8 dBc.

*Index Terms*—CMOS RF PA, CMOS RF power amplifier, CMOS transmitter, highly efficient PA, highly linear PA, linear power amplifier, RF transmitter, segmented PA.

#### I. INTRODUCTION

THE POWER amplifier (PA) is one of the major power consumers in the RF transceiver [1]–[3], and the design and implementation of highly efficient CMOS PAs has been a very active research and development area during the last few years [4]–[6]. The 3G–5G communication standards use high-data-rate, bandwidth-efficient modulations that result in a high peak-to-average power ratio (PAPR). Because of the high PAPR in such modulations during orthogonal frequency-division multiplexing (OFDM), the probability density function (pdf) of the transmitted power peaks in the power back-off (PBO) region. However, the power efficiency of linear PAs reaches its maximum at the peak output power and drops drastically in the PBO region.

Envelope tracking [5], [7]–[10] and PA segmentation [4], [11]–[19], [20], [21], [23], [24] are two efficiency enhancement techniques that have gained much interest recently. However, the envelope tracking system is becoming less effective in advanced CMOS technologies as the power supply scales down. The minimum drain-source voltage required by PA transistors and the limited drain-source voltage allowed by the technology limit the benefits of this approach; the use of stacked transistors may help to tolerate more signal swing. Additionally, wide-bandwidth standards require a switching regulator with a high switching frequency, which forces a tradeoff among regulator power efficiency, output ripple, and tracking error [9], [10].

Digital Object Identifier 10.1109/JSSC.2015.2510026

The use of on-chip transformers in segmented PAs usually presents tolerances, especially in the magnetic coupling factor, that may result in unreliable impedance matching. In addition, as the distance between the metal layers and the silicon substrate continues to shrink, the loss due to substrate coupling cuts into the power saved by such architectures. Therefore, these segmentations must be accompanied by a tunable impedance matching network, which makes these solutions sensitive to process-voltage-temperature variations [14], [15]. In this approach, some PA sections are deactivated in the low-power mode, such that the overall efficiency for low-power standards is improved. Such architectures do not provide means to improve average efficiency within each mode of operation. On the other hand, the PA based on DAC switching [21] used in polar PAs is an interesting approach that is further exploited in this design.

In this paper, the design of a 1.9 GHz linear segmented PA is presented. To improve efficiency in the PBO region, a combination of PA segmentation and digital signal processing (DSP) is employed. The PA sections are directly connected to the output impedance matching network, which is equipped with a class AB common-mode feedback (CMFB) mechanism to reduce common-mode variations when (de)activating the PA segments. The proposed PA's efficiency in the back-off region is significantly improved since the active drivers and PA sections are correlated with the input signal power. The discrete power gain variations are effectively compensated using a digital prewarping technique employing noiseless, fast, precise, and cheap digital amplification. The digital prewarping scheme increases the power of weak signals, improving the signal-to-noise ratio of the solution under PBO conditions. Preliminary results of this work were recently reported in [19].

This paper is organized as follows. Section II reviews three popular PA architectures aimed at improving power efficiency in the PBO region, namely the envelope tracking system, power combining, and the DAC-based technique. In Section III, the proposed architecture is described in detail, and an in-depth analysis of the impact of timing mismatch on linearity is carried out. The design of the PA building blocks is presented in Section IV, and the measurement results and discussions are presented in Section V. Finally, the conclusion is drawn in Section VI.

0018-9200 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Manuscript received June 23, 2015; revised October 11, 2015 and November 14, 2015; accepted December 02, 2015. Date of publication February 18, 2016; date of current version March 02, 2016. This paper was approved by Associate Editor Waleed Khalil.

The authors are with Texas A&M University, College Station, TX 77843 USA (e-mail: haoyuqian@tamu.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.



Fig. 1. Conceptual schematic of an ET system.

### **II. EFFICIENCY ENHANCEMENT TECHNIQUES**

Current and future generations of communication systems use high PAPR modulation schemes due to the need for bandwidth efficiency and accommodation of multimode and multistandard applications; therefore, the target goal is to improve power efficiency in the PBO region. A brief description of these techniques follows.

## A. Envelope Tracking

One of the possible envelope tracking topologies is shown in Fig. 1. The baseband amplitude (the "envelope") is extracted in the DSP and converted to analog through the digital-to-analog converter (DAC). The envelope signal is then fed into a switching regulator, usually combined with a linear regulator (not shown in this figure) used to reduce  $V_{\text{DRAIN}}$  ripple. The PA's  $V_{\text{DRAIN}}$  is dynamically varied, tracking the baseband signal amplitude, so the PA's efficiency improves in the PBO region. One of the major flaws of this architecture is that the timing misalignment of the PA supply voltage to the RF signal introduces nonlinearity, and most efforts to align the two paths are sensitive to process, voltage, and temperature variations. As the CMOS technology scales toward lower breakdown voltages where the PA output voltage swing is limited, the envelope tracking technique becomes less effective. On the other hand, the switching regulator must be agile enough to track the fast variations in the input signal while keeping the ripple small. These issues demand large switching frequencies, even > 100 MHz for signal bandwidths of 20 MHz with stringent slew-rate specifications. Unfortunately, the increased switching loss of the switching regulator degrades the overall power efficiency when high-frequency clocks are employed. Power efficiency degrades further due to the use of the auxiliary linear amplifier needed to reduce the output voltage ripple.

## B. Power Combining: Segmented PA

The PA can be segmented and the control system deactivates one or more sections depending on the power demanded by different standards, as shown in Fig. 2(a). This approach is well suited for multimode multistandard applications where several sections of the PA can be deactivated when the system is used in low-power mode operation [11]–[13], [15], [16]. This technique can also be used in switching-mode PAs as demonstrated by [20].



Fig. 2. Conceptual schematic of power combining architecture with switchable PAs. (a) PA for multistandard applications. (b) DAC-based PA with optimized current efficiency.

## C. Power DAC: Segmented PA

This approach was developed for polar amplifiers; see for instance [21]. It employs a DAC embedded at the output of the RF PA as depicted in Fig. 2(b). The phase of the input signal modulates the carrier, and the modulated signal then feeds the linear preamplifiers and, through them, the PA sections  $2^{M}(W/L)_{0}$ ,  $2^{M-1}(W/L)_0,\ldots,2^0(W/L)_0$ . The PA is binary segmented, so its output current is correlated with the magnitude of the input signal (determined by  $b_M, b_{M-1}, \ldots, b_0$ ), implementing an embedded DAC. In theory, the PA's current efficiency would be maintained close to the maximum attainable in every segment because the digital predistortion adjusts the signal power to fit within the maximum linear range. However, a number of practical limitations (such as the need for a dc current larger than the peak ac current in every PA segment for good linearity) degrade it. The PA driver amplifies the phase-modulated waveform in a linear fashion to preserve the information, which demands the use of power-hungry class A drivers. Under PBO conditions, the power consumption might be drastically limited by the PA drivers rather than by the PA itself. PA drivers can also turn OFF when the corresponding branch is OFF.
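As a rough illustration of the binary segmentation described above, the following Python sketch computes the total active transistor width selected by a control word $b_M, \ldots, b_0$. The function name `active_width` and the normalization of $(W/L)_0$ to a unit width are assumptions made for illustration only; they are not taken from [21] or from this design.

```python
def active_width(bits, unit_width=1.0):
    """Return the total active width for a control word (b_M, ..., b_0).

    bits: sequence of 0/1 values, most significant bit first.
    unit_width: stands in for (W/L)_0; normalized to 1.0 here (assumption).
    """
    m = len(bits) - 1
    # Segment k carries weight 2**k, so the sum equals the binary value
    # of the control word scaled by the unit width.
    return sum(b * (2 ** (m - i)) * unit_width for i, b in enumerate(bits))


if __name__ == "__main__":
    for word in [(0, 0, 1), (1, 0, 1), (1, 1, 1)]:
        print(word, active_width(word))
```

Because the active width scales with the control word, the output current of such an array follows the signal magnitude, which is the embedded-DAC behavior the text refers to.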

# III. PA ARCHITECTURE

Since most communication systems in 3G and onward have a Gaussian transmitted-power pdf as a function of output power in dBm, the architecture targeted at such communication systems partitions the signal in a linear-in-dB manner to maximize its effectiveness. On the other hand, the best power efficiency in current PAs is obtained for large signals; therefore, the aim of the proposed approach is to keep the PA input signal large, and digital prewarping techniques are employed for this purpose. The incoming signal is segmented into four regions, with adjacent regions differing in maximum voltage by 6 dB, as shown in Fig. 3. More segments can always be used if appropriate for other designs. The four regions are distinguished by the values of the control phases  $\phi_1 - \phi_3$ . These control bits correspond to the two most significant bits (MSBs) of the baseband signal; thus, the baseband signal power is identified in the DSP. The control phases manage the segments of the PA, thus correlating the PA current consumption and gain with the signal MSBs. The prewarped LSBs are then processed using linear amplifiers.

Fig. 3. Correlation between control phases and baseband signal amplitude.

Fig. 4 shows the conceptual schematic of the proposed system. Ignoring the sign bit, the MSBs of the digital representation of the baseband signal magnitude manage the segments, while the least significant bits are converted into analog format and then up-converted by the mixer. The PA and its driver are divided into four sections in a binary fashion; it is straightforward to realize this operation in the digital domain since the two MSBs provide that information. For better control of the architecture, the MSBs are converted into thermometric format. The control bits  $\phi_1 - \phi_3$  drive the PA sections through the drivers. If the signal strength falls in the region  $\phi_0$ , for instance, the control phases  $\phi_1 - \phi_3$  are zero, and then only the unswitchable section manages the signal  $S_{in}(t)$ . To minimize the switches in the signal path, the drivers are turned OFF by disconnecting the transistor drain from VDD; dc coupling is used to drive the PA sections to avoid the use of large capacitors that introduce significant delay in the signal path. The architecture is designed such that when the drivers are turned OFF, the PA sections also shut OFF. As a result, the drivers and PA sections are dynamically correlated with signal power, providing further power savings.

Due to the manipulation of the segments, the PA power gain follows this pattern, which is a desirable property for polar amplifiers, but makes the PA gain signal dependent for linear amplifiers. An elegant yet efficient solution is to use digital gain equalization to overcome this shortcoming. The signal strength is evaluated and amplified accordingly in the digital domain such that the digital gain and gain attenuation due to PA switching compensate each other leading to a constant power gain factor across all operating conditions. The MSBs used to control the PA segments are also used to manipulate the least significant bits implementing digital gain factors of  $2^0$ ,  $2^1$ ,  $2^2$  and  $2^3$ . The realization of these operations is trivial since they correspond to left data shifting by 0, 1, 2, or 3 spots.
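The gain equalization described above can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions: the 10-bit magnitude width, the function name `prewarp`, the thresholds at 1/2, 1/4, and 1/8 of full scale (regions 6 dB apart), and the ordering of the thermometer string are all chosen for illustration and are not taken from the implemented DSP.

```python
def prewarp(magnitude, bits=10):
    """Behavioral sketch of digital prewarping for a four-region segmented PA.

    magnitude: unsigned baseband magnitude, 0 <= magnitude < 2**bits.
    Returns (prewarped_magnitude, thermometer_phases, relative_pa_gain).
    The word width and the region thresholds are illustrative assumptions.
    """
    full_scale = 1 << bits
    if not 0 <= magnitude < full_scale:
        raise ValueError("magnitude out of range")

    # Regions are 6 dB apart: each step down halves the maximum amplitude.
    if magnitude >= full_scale >> 1:
        shift = 0
    elif magnitude >= full_scale >> 2:
        shift = 1
    elif magnitude >= full_scale >> 3:
        shift = 2
    else:
        shift = 3

    # Thermometer-coded control phases: "111" means all switchable sections ON.
    phases = "0" * shift + "1" * (3 - shift)
    # Digital gain of 2**shift is a plain left shift.
    prewarped = magnitude << shift
    # Each deactivation step is modeled as a 6 dB (factor of 2) PA gain drop.
    pa_gain = 1.0 / (1 << shift)
    return prewarped, phases, pa_gain


if __name__ == "__main__":
    for m in (600, 300, 100):
        p, phi, g = prewarp(m)
        print(m, p, phi, g, p * g)
```

In this model the product of the digital gain and the relative PA gain is always unity, and the prewarped value never exceeds full scale, which mirrors the claim that the overall gain stays constant while the RF sections are not saturated.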



Fig. 4. Simplified schematic of the proposed architecture employing three binary weighted switchable arrays.



Fig. 5. Simplified model for timing mismatch analysis.

A unique property of this approach is that small signals are noise-free amplified in the digital domain, making them more tolerant to thermal noise due to the mixer, PA drivers, and PA sections. The digital amplification does not saturate the RF sections since the magnitude of the prewarped input signal is always within the linear range of the active drivers and PA blocks. The digital gain by multiples of 2 is a very easy and cheap operation since it only requires a bit-shift to the left in the digital domain. If the digital gain equalized signal reaching PA input is fully synchronized with the manipulation of the PA sections, the PA output signal is smooth when transitioning across different segments. However, a common-mode current step (when switching across segments) is an issue that requires further attention.

## A. Timing Mismatch Analysis

One concern is the timing alignment of the RF signal path and the digital control phase path. A simplified model of the system shown in Fig. 5 is used to capture the essence of the timing mismatch. Let us consider the case of only one-bit control  $\phi_3$ . Suppose that there is a timing delay of  $\tau$  seconds between the RF signal path and the control phase, i.e., the control signal arrives at the switch before the corresponding RF signal reaches the PA cells. Assume a modulated input signal  $s_{in}(t) = s_{\rm BB}(t)s_{\rm RF}(t) = \cos(\omega_{\rm BB}t)\cos(\omega_{\rm RF}t)$ , where  $\omega_{\rm BB}$  and  $\omega_{\rm RF}$  represent the baseband and RF angular frequencies, respectively. For simplicity, the amplitude of the input tone and the gain of the mixer are chosen to be unity. If all PA sections are active, the output signal is then described as  $s_{out-N}(t) = A_{VPA}s_{in}(t)$ , where  $A_{\rm VPA}$  is the PA gain. When the PA is partitioned at its half-way point, the PA output is  $s_{out - N}(t) = 0.5A_{VPA}s_{in}(t)$ . However, the baseband equalizer recognizes that the signal power is small and amplifies it by 6 dB;  $s_{in}(t)$  is then a preequalized version of the original baseband input signal and can be expressed as follows for the case of a single tone:

Fig. 6. PA output waveforms (RF component is not shown for simplicity). (a) Prewarped signal with and without timing delay. (b) Error waveform due to timing mismatch between  $\phi_3$  and  $S_i(t - \tau)$.

$$s_{in}(t) = \begin{cases} 2 s_{BB}(t) s_{RF}(t), & \text{if } |s_{BB}(t)| \le 0.5\\ s_{BB}(t) s_{RF}(t), & \text{if } |s_{BB}(t)| > 0.5. \end{cases}$$
(1)

In Fig. 6,  $t_i$ ,  $i = 1, 3, 5, 7$ , correspond to the breakpoints of the segmentation algorithm. If the timing is perfectly aligned, the PA gain reduces by a factor of 2 while the magnitude of the baseband signal is smaller than the threshold voltage. At the same time, the signal is digitally amplified by two while in this region and, thus, the overall gain remains constant since the digital amplification and PA attenuation are fully synchronized. On the other hand, if there is a timing mismatch of  $\tau$  seconds between the time the PA segments are manipulated and the arrival of the signal traveling through the up-converter and amplification chain, then the operations are misaligned, resulting in a glitch-like error at the PA output. The delay occurs as the signal travels through the DAC, the mixer, the drivers, and the PA sections. If the PA sections are turned OFF earlier, the PA gain drops by 6 dB and stays in this condition until the equalized signal reaches the gate of the PA. This scenario is illustrated in Fig. 6(b), where the PA input signal becomes

$$S_{\rm in}(t) = \begin{cases} S_{\rm BB}(t-\tau) S_{\rm RF}(t), & t < t_2\\ 2S_{\rm BB}(t-\tau) S_{\rm RF}(t), & t_2 \le t < t_4\\ S_{\rm BB}(t-\tau) S_{\rm RF}(t), & t_4 \le t < t_6\\ 2S_{\rm BB}(t-\tau) S_{\rm RF}(t), & t_6 \le t < t_8 \end{cases}$$
(2)

where  $S_{BB}(t)$  is the incoming baseband signal. Defining the error signal at the output to be the difference between PA output current with timing errors and ideal output current, then

$$i_{e}(t) = \begin{cases} -\frac{1}{2}G_{m-\text{PA}}s_{\text{BB}}(t)s_{\text{RF}}(t), & t_{1} \leq t < t_{2} \\ G_{m-\text{PA}}s_{\text{BB}}(t)s_{\text{RF}}(t), & t_{3} \leq t < t_{4} \\ -\frac{1}{2}G_{m-\text{PA}}s_{\text{BB}}(t)s_{\text{RF}}(t), & t_{5} \leq t < t_{6} \\ G_{m-\text{PA}}s_{\text{BB}}(t)s_{\text{RF}}(t), & t_{7} \leq t < t_{8} \\ 0, & \text{otherwise} \end{cases}$$
(3)

with  $G_{m-PA}$  being the transconductance gain of the PA. The resulting error signal is plotted in Fig. 6(b); the RF component is not shown to simplify the plot. In general, the error signal resulting from the timing mismatch would be manifested as the convolution of the signal LSBs with a time delay of  $\tau$  seconds, the MSBs, and a time window of  $\tau$  seconds. For the sake of simplicity, let us denote  $\theta = \omega_{BB}t$ ; then, the third Fourier coefficient of the error signal can be calculated as follows:

$$a_{3} = \left(\frac{G_{m-PA}}{\pi}\right) \left(-\frac{1}{2} \int_{\theta_{1,5}}^{\theta_{1,5}+\theta_{\tau}} \cos\theta \cos 3\theta \ d\theta + \int_{\theta_{3,7}}^{\theta_{3,7}+\theta_{\tau}} \cos\theta \cos 3\theta \ d\theta\right)$$
(4)

where  $\theta_i = \omega_{\text{BB}} t_i$ ,  $i = 1, 3, 5, 7$ , and  $\theta_{\tau} = \omega_{\text{BB}} \tau$ . Evaluating the integrals and then rearranging the expanded terms, noting from Fig. 6 that  $\theta_1 = \frac{\pi}{3}$ ,  $\theta_3 = \frac{2\pi}{3}$ ,  $\theta_5 = \frac{4\pi}{3}$ ,  $\theta_7 = \frac{5\pi}{3}$ , leads to

$$a_{3} = \left(\frac{\sqrt{7}G_{m-\text{PA}}}{2\pi}\right) \left(\frac{1}{2}\sin 2\theta_{\tau}\sin\left(2\theta_{\tau}-\phi\right) - \sin\theta_{\tau}\sin\left(\theta_{\tau}+\phi\right)\right)$$
(5)

where  $\phi = \tan^{-1} \left( \frac{1}{3\sqrt{3}} \right) = 0.19$  rad. If we assume that  $\theta_{\tau} \ll \phi$ , then (5) reduces to the simpler yet intuitive result

$$|a_3| \approx \left(\frac{\tau}{T_{\rm BB}}\right) G_{m-\rm PA}$$
(6)

Fig. 7. Timing mismatch effects on ACLR.

where  $T_{\rm BB}$  is the baseband signal period. Since  $a_1 \approx G_{m-\rm PA}S_{\rm BB-pk}$  in this simplified analysis, the third-order intermodulation distortion due to the timing mismatch, IMD<sub>3</sub>, is proportional to  $\frac{3\tau}{4T_{\rm BB}}$ . For a baseband signal of 10 MHz ( $T_{\rm BB} = 10^{-7}$  s), the delay error  $\tau$  must be under  $1.3 \times 10^{-9}$  s to maintain IMD<sub>3</sub> under -40 dB. Timing mismatches in the other three PA segments add similar effects and increase the PA sensitivity to time delay mismatches. Moreover, in practice the computation is more complicated since the spectral leakage is the result of the convolution of the MSBs used to control  $\phi_1 - \phi_3$ , the signal power of the least significant bits  $S_i(t)$ , and a time window of  $\tau$  seconds correlated with the MSBs. Notice in (3) and (4) that the magnitude and sign of the windowing are a function of the direction of the transition of the MSBs: -1/2 when the MSB transitions from 1 to 0 and +1 when it transitions from 0 to 1.
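The small-delay approximation (6) and the timing budget quoted above can be checked numerically. The sketch below evaluates (5) directly and compares it with  $(\tau/T_{\rm BB})G_{m-\rm PA}$ ; it also inverts the  $\frac{3\tau}{4T_{\rm BB}}$  relation to obtain the maximum tolerable delay. Treating that proportionality as an equality and expressing it in dB as  $20\log_{10}(\cdot)$  are assumptions of this sketch, as are the function names.

```python
import math

PHI = math.atan(1.0 / (3.0 * math.sqrt(3.0)))  # about 0.19 rad, as in the text


def a3_from_eq5(theta_tau, gm=1.0):
    """Evaluate the third Fourier coefficient as written in (5)."""
    scale = math.sqrt(7.0) * gm / (2.0 * math.pi)
    return scale * (
        0.5 * math.sin(2.0 * theta_tau) * math.sin(2.0 * theta_tau - PHI)
        - math.sin(theta_tau) * math.sin(theta_tau + PHI)
    )


def a3_from_eq6(tau, t_bb, gm=1.0):
    """Small-delay approximation (6): |a3| ~ (tau / T_BB) * Gm."""
    return (tau / t_bb) * gm


def max_delay(imd3_db, t_bb):
    """Largest tau such that 3*tau/(4*T_BB) stays below the IMD3 target.

    Assumes the proportionality in the text holds as an equality and that
    the target is an amplitude ratio expressed as 20*log10 (assumption).
    """
    return (4.0 / 3.0) * t_bb * 10.0 ** (imd3_db / 20.0)


if __name__ == "__main__":
    t_bb = 1e-7   # 10 MHz baseband tone
    tau = 100e-12  # example delay mismatch
    theta_tau = 2.0 * math.pi * tau / t_bb
    print("eq. (5):", abs(a3_from_eq5(theta_tau)))
    print("eq. (6):", a3_from_eq6(tau, t_bb))
    print("max tau for -40 dB:", max_delay(-40.0, t_bb))
```

For a 100 ps mismatch the two expressions agree to within a few percent, and the  $-40$  dB target returns a delay budget of roughly 1.3 ns, consistent with the figure quoted in the text.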

To reduce the nonlinearity caused by the timing mismatch, a delay cell is added to the system, as shown in Fig. 5. The delay cell includes a replica of the preamp, but it acts as a digital driver. After fine tuning the size of the delay cell using extensive post-layout simulations, the on-chip delay mismatch was under 100 ps for all segments and under PVT variations. Timing mismatches generate glitches (MSBs and least significant bits are not well aligned, as depicted in Fig. 6) that may not significantly degrade the received constellation if properly sampled at the receiver. These glitches have a greater impact on ACLR since they are signal dependent. Extensive simulations of a WCDMA system, where the channel bandwidth is 3.84 MHz, show that a timing mismatch of 500 ps would result in PA neighbor channel leakage power under  $-40 \,\mathrm{dB}$ , as illustrated in Fig. 7. The timing delay block was not manipulated during characterizations.

### IV. PA SYSTEM DESIGN

The critical element in the design of the proposed system is the switching scheme, which is applied to both the PA sections and their drivers. The PA design details are described in this section.

## A. Output Stage Design

Fig. 8 shows the schematic of the PA stage. A cascode configuration is used to improve its reliability. The common-source transistors are standard thin-oxide transistors that have lower input capacitance and higher transconductance; the common-gate transistors have thick oxide to withstand larger voltage swing. At maximum RF output power, the voltage swings at the drain terminals of the cascode device and the common-source device are 2.5 and 0.75 Vpk, respectively. The transistors are optimized for linearity, and their sizes are also included in Fig. 8. The nominal gate overdrive voltages for the transistors are  $V_{\rm OV1} = 300 \text{ mV}$  and  $V_{\rm OV2} = 400 \text{ mV}$ . At maximum RF output power, the simulated bias currents of the output stage and driver stage are 980 and 320 mA, respectively. Maximum RF current is expected at this stage; thus, extra care is needed in the design layout. Multiple pads for the output and ground nodes are used, and the ground pads of this stage are not shared with the remaining parts of the chip. The bondwires are explicitly drawn to indicate that those pads are for the output stage exclusively. The transistors are organized in clusters employing common-centroid techniques to facilitate the connectivity and to minimize transistor mismatches. The PA transistors are dc connected to the PA drivers; thus, no additional switches are required to enable or disable these sections. When the M1 transistors are switched ON/OFF, there is a significant common-mode step in current that may produce significant common-mode ringing and up to 1 V common-mode peak variation. To alleviate this issue, the fast class AB CMFB circuit shown in Fig. 9 is placed at the PA output. A couple of single-stage amplifiers compare the common-mode output signal with  $V_{\rm DRAIN}$  and drive the class AB amplifiers composed of transistors  $M_{\rm C1}$  and  $M_{C2}$ . These transistors are biased through  $R_B$  and  $V_{B1,2}$  at the onset of the subthreshold region to save power. Class AB amplifiers  $M_{C1}$  and  $M_{C2}$  minimize the power consumption but are able to deliver/sink enough instantaneous current to reduce the common-mode glitches generated by the transistor switching.

Fig. 8. Schematic of the PA output stage; the core consists of 1536 replicas.

## B. PA Drivers

The schematic of the driver stage is shown in Fig. 10. It consists of a differential pair with resistive load, a switch controlled by the control code, and a CMFB loop.  $C_p$  and  $C_{PA}$  in Fig. 10 represent the effective parasitic capacitance at the common-source node and the output nodes, respectively. Direct coupling between the driver and the PA stages reduces the switching time. When the switch is opened, the driver's output common-mode voltage moves down very quickly, putting the differential pair transistors in the triode region. The common-mode voltage drops and breaks the loop during this condition, which helps turn the preamplifier outputs off quickly. When the switch is closed again, the output voltage of the driver moves toward  $V_{DD}$  and is only limited by the time constant  $R_L C_{PA}$ .



Fig. 9. Schematic showing the CMFB circuit allocated at PA output.



Fig. 10. Conceptual schematic of the driver stage.

Since the load resistors  $R_L$  are small (in this case around 30  $\Omega$  for the unit-cell driver), the time constant is small, and a fast low-to-high transition is achieved. As soon as the common-mode level exceeds the reference voltage, the loop tries to reach its steady state; the settling time of the CMFB is then a function of the loop properties. Therefore, the use of a fast CMFB is a must.

Open-loop gain, closed-loop bandwidth, and stability are all important parameters to be considered when designing the CMFB loop. Simulation results of the common-mode voltage as the switching takes place are illustrated in Fig. 11. The common-mode voltage moves very quickly until 400 mV is reached because the loop is still broken due to the lack of current in M4. The knee during the rising transition arises because the fast voltage variation at the drain of the M3 transistors puts them in the saturation region, allowing the generation of instantaneous discharging current until the parasitic capacitor  $C_p$  is charged. Then, the drain current of M3 reduces again and the common-mode voltage rises very fast again until reaching its steady-state condition. The 1% settling time under the worst process corner is less than 8 ns, which means that even if the baseband signal bandwidth is 10 MHz, the switching process would only take 8% of the signal time period in the worst case. The common-mode settling time issues arise during the transition time of the incoming data, generating data-dependent glitches that may degrade the ACLR and EVM figures.

Fig. 11. Simulation results of common-mode voltage transient response.

Fig. 12. Two-section impedance matching network.

When all the PA and driver sections are active, the simulated power gain of the driver stage is about 18 dB, whereas the power gain of the output stage is around 20 dB.

## C. Output Impedance Matching Design

For the output impedance matching circuit, a multisection network was implemented. Fig. 12 shows a half representation of the matching network.  $C_D$  and  $L_{\rm bnd}$  stand for the drain capacitance and the bondwire inductance, respectively. The parasitic capacitance at the package on the PCB is accounted for in  $C_1$ . The transmission line with a length d and characteristic impedance  $Z_0$  is formed by a microstrip line consisting of the PCB trace and the ground plane underneath.  $R_L$  represents the input impedance of a balun, which is 25  $\Omega$  for a half circuit. The optimal PA load impedance  $R_T$  is determined by the maximum linear output power design specification. The load pull simulations further allow us to optimize the choice of  $R_T$ . The RF choke inductor is off-chip and not shown in Fig. 12. Its value is chosen such that at RF, it is seen as a high impedance to the PA, while at the switching frequency of the switching regulator, it is seen as a low impedance to the switching regulator. The matching network component values can be determined by hand calculations, the Smith chart, or existing software packages. The summary of the component values is given in Table I.

TABLE I IMPEDANCE MATCHING NETWORK COMPONENT VALUES

Fig. 13. Insertion loss simulation with process variations.

Fig. 14. Transient simulation results. (a) Input signal before and after digital prewarping (top trace). (b) Output signal at drain voltage (middle trace). (c) Output signal after impedance matching network.

Two important design specifications for the output matching network are bandwidth and insertion loss. The matching circuit used in the proposed system is effectively a multisection design, and its bandwidth is sufficient for WCDMA applications. To ensure robustness, the insertion loss of the output matching is simulated under process variations, as shown in Fig. 13. The worst case of the simulated insertion loss is around 1 dB when all component values are shrunk by 30%. However, this is the less likely case, since nonidealities usually result in additional parasitic components, making the effective component values larger. If all component values increase by 30%, the insertion loss is simulated to be only 0.2 dB.

The Q of the bondwire inductance was assumed to be 50 in simulations. Both the change in Q and the value of the bondwire inductance affect the output matching network. This effect manifests itself in higher insertion loss at the frequency of interest. The insertion loss at 1.9 GHz was simulated with various Q and L values of the bondwire inductance across all four modes of operation. Under extreme conditions, i.e., inductance increased by 25% and Q = 30, the insertion loss barely exceeded 1 dB.

Fig. 15. Microphotograph of the chip.

Fig. 16. Measured gain, output power, and PAE as a function of input at 1.9 GHz.

Fig. 17. Simulation and measurement results of PA's S22 when the control bits are (a) 111; (b) 011; (c) 001; and (d) 000.

Fig. 18. ACLR measured at maximum output power of 31 dBm.

Fig. 19. SEM measured at maximum output power of 31 dBm.

Fig. 20. ACLR as a function of maximum output power.

Fig. 21. EVM as a function of maximum output power.

Fig. 22. Phase error as a function of maximum output power.

As the PA switches different transistor sections ON and OFF, the output impedance of the transistor changes. However, the transistor is in either an active region or a cutoff region, and the drain capacitance is mainly due to the depletion region capacitance between the drain and the substrate plus the gate-drain overlap capacitance. To test whether the impedance matching network works properly in each switching scenario, a 2 MHz sinusoidal baseband signal modulated onto the carrier frequency is applied to the system. The transient waveforms are shown in Fig. 14. The top plot shows the original sinusoidal baseband signal, along with the prewarped baseband signal that is applied to the PA. As shown in Fig. 14, the impedance matching network functions properly in all switching scenarios. The peak differential voltage amplitudes before and after the impedance matching network are 3.1 and 10.7 V, respectively. The voltage transformation ratio of 3.45 thus implies an impedance transformation ratio of 11.9, close to the desired value ( $Z_L/Z_T = 50/4.5 = 11.1$ ). Notice that if the mismatches in the segmented PA are small, the ac current delivered to the matching network is smooth over the entire power range. In practice, some glitches are present when switching between segments, mainly due to the unavoidable parasitic capacitors and timing offsets.
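The transformation ratios quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes the matching network is close to lossless, so that the impedance ratio equals the square of the voltage ratio; the function name is hypothetical.

```python
def transformation_ratios(v_before, v_after):
    """Return (voltage_ratio, impedance_ratio) across a matching network.

    Assumes power is conserved through the network (lossless approximation),
    so the impedance ratio is the square of the voltage ratio.
    """
    voltage_ratio = v_after / v_before
    return voltage_ratio, voltage_ratio ** 2


if __name__ == "__main__":
    v_ratio, z_ratio = transformation_ratios(3.1, 10.7)
    target = 50.0 / 4.5  # Z_L / Z_T from the text
    print(f"voltage ratio   : {v_ratio:.2f}")
    print(f"impedance ratio : {z_ratio:.1f}")
    print(f"design target   : {target:.1f}")
    print(f"deviation       : {100.0 * (z_ratio - target) / target:.1f} %")
```

The simulated ratio lands within roughly 7% of the design target, which is consistent with the statement that the network behaves as intended across the switching scenarios.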

#### V. MEASUREMENT RESULTS

TABLE II Comparison With Recent Publications

| Reference | Frequency (GHz) | PSAT/PAE (dBm/%) | VDD (V) | CMOS (nm) | Size (mm<sup>2</sup>) | Number of modes | PAE increase at 7 dB PBO | PAE increase at 10 dB PBO | PAE increase at 15 dB PBO |
|-----------|-----------------|------------------|---------|-----------|-----------------------|-----------------|--------------------------|---------------------------|---------------------------|
| [12]      | 2.4             | 23.1/42          | 1.5     | 130       | 5.48                  | 2               | 3.3                      | 4.3                       | 3.6                       |
| [13]      | 2.4             | 27/32            | 1.2     | 130       | 2                     | 2               | 5.4                      | 4.0                       | 3.4                       |
| [16]      | 2.4             | 23.1/42          | 3.3     | 180       | 0.88                  | 2               | 8.7                      | 8.7                       | 7.2                       |
| [17]      | 2               | 23/38            | 2.5     | 250       | 2.48                  | 3               | 10                       | N/A                       | N/A                       |
| [18]      | 2.45            | 31.5/25          | 3.3     | 65        | 2.7                   | 3               | 10                       | 5                         | N/A                       |
| [23]      | 2.45            | 26.3/33          | 2       | 90        | 1.88                  | 2               | 9                        | 7                         | 3                         |
| [24]      | 2.2             | 43               | 1.2     | 65        | 6.25                  | 2               | 6.2                      | 4                         | 2.1                       |
| This work | 1.9             | 35.3/44.9        | 2.5     | 40        | 4                     | 4               | 13                       | 7.36                      | 9                         |

The PA was fabricated in a TSMC 40 nm CMOS process, and Fig. 15 shows the microphotograph of the chip. The chip area is approximately 2.88 mm<sup>2</sup>. A single-tone continuous-wave (CW) signal at 1.9 GHz was applied to characterize the PA in all four operation modes. Fig. 16 shows the measured gain, output power, and power-added efficiency (PAE) as a function of the input power. The PCB and cable losses are de-embedded from the measured performance. The PA's output  $P_{1\,dB}$  and  $P_{SAT}$  are measured as 31 and 35 dBm, respectively. The average power gain is 38 dB, and the PAEs at  $P_{1\,dB}$  and  $P_{SAT}$  are 28.8% and 44.9%, respectively. For comparison, the PAE of the PA without the proposed power efficiency improvement techniques was also measured, and is displayed in Fig. 16 as the dashed curve. The PAE improvement in the PBO region is apparent. For instance, at 20 dB back-off from  $P_{SAT}$ , the PAEs of the PA with and without segmentation are 21.3% and 8.1%, respectively. If required, more segments can be added to improve PA power efficiency at higher power levels.
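To put the 20 dB back-off figures in terms of supply power, the following sketch inverts the standard PAE definition, PAE = (P_out - P_in)/P_DC. It assumes that the 38 dB average gain still applies at 20 dB back-off and takes  $P_{SAT}$  as 35 dBm; the function names are illustrative, and the resulting DC powers are estimates rather than measured values from the paper.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def dc_power_mw(p_out_dbm: float, gain_db: float, pae: float) -> float:
    """Estimate DC power from PAE = (P_out - P_in) / P_DC."""
    p_out = dbm_to_mw(p_out_dbm)
    p_in = dbm_to_mw(p_out_dbm - gain_db)
    return (p_out - p_in) / pae


P_SAT_DBM = 35.0    # saturated output power quoted in the text
BACKOFF_DB = 20.0   # back-off point quoted in the text
GAIN_DB = 38.0      # assumption: the average gain holds at back-off

p_out_dbm = P_SAT_DBM - BACKOFF_DB
p_dc_segmented = dc_power_mw(p_out_dbm, GAIN_DB, 0.213)
p_dc_unsegmented = dc_power_mw(p_out_dbm, GAIN_DB, 0.081)
dc_ratio = p_dc_unsegmented / p_dc_segmented

print(f"output power at back-off: {p_out_dbm:.1f} dBm")
print(f"estimated DC power with segmentation:    {p_dc_segmented:.0f} mW")
print(f"estimated DC power without segmentation: {p_dc_unsegmented:.0f} mW")
print(f"DC power ratio: {dc_ratio:.2f}x")
```

Because the output and input powers are identical in both cases, the DC power ratio reduces to the ratio of the two PAE values, independent of the gain assumption.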

The S22 of the PA in each mode of operation was simulated and measured, as shown in Fig. 17. Although there is some mismatch due to nonidealities, it is manageable and can be reduced by adjusting the component values of the output matching network. Note that both simulation and measurement show that S22 does not vary much across the different modes of PA operation. Owing to the cascode topology of the output stage, the PA's output resistance is kept large compared with the transformed RT; therefore, the variation in the PA output resistance is absorbed by the output matching network.

A WCDMA baseband signal that is compliant with the 3GPP standard [22] was generated, preprocessed in the digital domain, and up-converted to 1.9 GHz with a bandwidth of 3.84 MHz. According to [22], the adjacent channel leakage ratio (ACLR) at  $\pm 5$  MHz should be kept below -33 dBc for cellular handsets to comply with the standard. The measured output power spectrum is shown in Fig. 18, with the PA under test transmitting a maximum linear power of 31 dBm. Data analysis from the spectrum analyzer shows that the ACLR at a maximum power of 31 dBm is -35.8 dBc. The spectrum emission mask (SEM) measurement was also carried out, and the result is shown in Fig. 19. Under the maximum output power condition, the PA meets the 3GPP SEM specifications.
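As a quick illustration of what these ACLR numbers mean in absolute terms, the sketch below converts the measured ACLR into an adjacent-channel power level and a margin to the -33 dBc limit. It assumes the 31 dBm figure can be treated as the in-channel power; the variable names are illustrative.

```python
P_MAIN_DBM = 31.0       # maximum linear output power quoted in the text
ACLR_DBC = -35.8        # measured ACLR at +/-5 MHz quoted in the text
ACLR_LIMIT_DBC = -33.0  # handset limit quoted in the text

# Assumption: the 31 dBm figure is taken as the in-channel power, so the
# adjacent-channel power is simply the in-channel power plus the ACLR.
p_adjacent_dbm = P_MAIN_DBM + ACLR_DBC
p_adjacent_mw = 10.0 ** (p_adjacent_dbm / 10.0)
margin_db = ACLR_LIMIT_DBC - ACLR_DBC

print(f"adjacent-channel power: {p_adjacent_dbm:.1f} dBm ({p_adjacent_mw:.2f} mW)")
print(f"margin to the limit:    {margin_db:.1f} dB")
```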

With a fixed set of switching thresholds for the PA and the switching regulator, the ACLR as a function of maximum output power was measured, and the result is shown in Fig. 20. Contrary to classic PAs, whose linearity improves at low power, the linearity of the proposed architecture is compromised in the PBO region because of the digital amplification: the PA transistors work close to their maximum power capacity most of the time. The PA nevertheless met the required specifications. These results show that the proposed architecture strikes a good balance between power efficiency and linearity. Another linearity figure of merit is the error vector magnitude (EVM). For 3G WCDMA, the EVM specification is less than -15 dB (17%). The EVM as a function of the maximum output power of the PA is shown in Fig. 21. At a maximum output power of 31 dBm, the EVM is -21 dB (8.9%).
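The EVM values above are quoted both in dB and in percent. A minimal sketch of the conversion, using the amplitude-ratio convention EVM(dB) = 20 log10(EVM/100%), is given below; the function names are illustrative.

```python
import math


def evm_db_to_percent(evm_db: float) -> float:
    """Convert EVM from dB to percent (amplitude ratio, 20*log10 convention)."""
    return 100.0 * 10.0 ** (evm_db / 20.0)


def evm_percent_to_db(evm_percent: float) -> float:
    """Convert EVM from percent to dB (amplitude ratio, 20*log10 convention)."""
    return 20.0 * math.log10(evm_percent / 100.0)


measured_percent = evm_db_to_percent(-21.0)  # measured EVM at 31 dBm
limit_percent = evm_db_to_percent(-15.0)     # specification limit
margin_db = -15.0 - (-21.0)                  # margin to the limit in dB

print(f"measured EVM: {measured_percent:.1f}%")
print(f"limit:        {limit_percent:.1f}%")
print(f"margin:       {margin_db:.1f} dB")
```

The exact conversion of -15 dB gives about 17.8%, which the text quotes as 17%.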

The phase error is measured as an indication of the PA's AM/PM nonlinearity. The phase error as a function of the output power is shown in Fig. 22. The phase error is under 2.5% up to 35 dBm output power. Recently reported linear PAs that use segmentation to improve PAE are compared with the proposed PA in Table II. The proposed PA achieves the highest peak PAE and  $P_{\text{SAT}}$  in the comparison, together with the largest PAE increases at 7 and 15 dB PBO. This improvement was achieved by combining segmentation with the proposed digital predistortion technique. Moreover, the proposed PA can switch between different modes within a very short time frame; to the authors' best knowledge, this is the first report of such a feature.

#### VI. CONCLUSION

A 1.9 GHz segmented linear PA was designed and implemented in 40 nm CMOS technology. The input signal is segmented and strategically amplified in the digital domain, while the PA is segmented and its segments are switched so that its power gain remains invariant with voltage while achieving significant power savings. The architecture emulates the operation of the conventional class-B amplifier, thus achieving similar power efficiency. Moreover, because the PA drivers are also switchable, this architecture may achieve better power efficiency. The PA achieved saturated and maximum linear output powers of 35 and 31 dBm, with corresponding PAEs of 44.9% and 28.8%, respectively. A fast yet efficient switching scheme that employs direct coupling between PA sections and drivers was demonstrated, which enabled the PA to improve efficiency in the PBO region within a wideband communication standard. The architecture can be combined with envelope tracking techniques to achieve better power efficiency figures. The proposed techniques are general and can be used in other PA architectures as well.

#### ACKNOWLEDGMENT

The authors would like to thank TSMC for chip fabrication and NVIDIA Corporation for their help in testing the PA.

#### REFERENCES

- [1] L. Larson, "RF and microwave hardware challenges for future radio spectrum access," *Proc. IEEE*, vol. 102, no. 3, pp. 321–333, Mar. 2014.
- [2] K. Okada *et al.*, "A 64-QAM 60 GHz CMOS transceiver with 4-channel bonding," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'14)*, Feb. 2014, pp. 346–347.
- [3] M. Ebrahimi, M. Helaoui, and F. Ghannouchi, "Delta-sigma-based transmitters: Advantages and disadvantages," *IEEE Microw. Mag.*, vol. 14, no. 1, pp. 68–78, Jan. 2013.
- [4] E. Kaymaksut and P. Reynaert, "A dual-mode transformer-based Doherty LTE power amplifier in 40 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'14)*, Feb. 2014, pp. 64–65.
- [5] W.-Y. Kim, H. Son, J. Kim, J. Jang, I. Oh, and C. Park, "A CMOS envelope-tracking transmitter with an on-chip common-gate voltage modulation linearizer," *IEEE Microw. Wireless Compon. Lett.*, vol. 24, no. 6, pp. 406–408, Jun. 2014.
- [6] K. Oishi *et al.*, "A 1.95 GHz fully integrated envelope elimination and restoration CMOS power amplifier with envelope/phase generator and timing aligner for WCDMA and LTE," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'14)*, Feb. 2014, pp. 60–61.
- [7] G. Hanington, P.-F. Chen, P. Asbeck, and L. Larson, "High-efficiency power amplifier using dynamic power-supply voltage for CDMA applications," *IEEE Trans. Microw. Theory Techn.*, vol. 47, no. 8, pp. 1471–1476, Aug. 1999.
- [8] J. Staudinger *et al.*, "High efficiency CDMA RF power amplifier using dynamic envelope tracking technique," in *Proc. IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2000, vol. 2, pp. 873–876.
- [9] B. Sahu and G. Rincon-Mora, "A high-efficiency linear RF power amplifier with a power-tracking dynamically adaptive buck-boost supply," *IEEE Trans. Microw. Theory Techn.*, vol. 52, no. 1, pp. 112–120, Jan. 2004.
- [10] I. Rippke, J. Duster, and K. Kornegay, "A single-chip variable supply voltage power amplifier," in *Proc. IEEE Radio Freq. Integr. Circuits (RFIC) Symp. Dig. Papers*, Jun. 2005, pp. 255–258.
- [11] A. Shirvani, D. Su, and B. Wooley, "A CMOS RF power amplifier with parallel amplification for efficient power control," *IEEE J. Solid-State Circuits*, vol. 37, no. 6, pp. 684–693, Jun. 2002.
- [12] P. Reynaert and M. S. Steyaert, "A 2.45-GHz 0.13-μm CMOS PA with parallel amplification," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 551–562, Mar. 2007.
- [13] G. Liu, P. Haldi, T.-J. K. Liu, and A. Niknejad, "Fully integrated CMOS power amplifier with efficiency enhancement at power back-off," *IEEE J. Solid-State Circuits*, vol. 43, no. 3, pp. 600–609, Mar. 2008.
- [14] D. Chowdhury, C. Hull, O. Degani, Y. Wang, and A. Niknejad, "A fully integrated dual-mode highly linear 2.4 GHz CMOS power amplifier for 4G WiMax applications," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3393–3402, Dec. 2009.
- [15] J. Kim *et al.*, "A linear multi-mode CMOS power amplifier with discrete resizing and concurrent power combining structure," *IEEE J. Solid-State Circuits*, vol. 46, no. 5, pp. 1034–1048, May 2011.
- [16] Y. Yoon *et al.*, "A dual-mode CMOS RF power amplifier with integrated tunable matching network," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 1, pp. 77–88, Jan. 2012.
- [17] H. Hedayati *et al.*, "A 2-GHz highly linear efficient dual-mode BiCMOS power amplifier using a reconfigurable matching network," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2385–2404, Oct. 2012.
- [18] A. Afsahi and L. Larson, "Monolithic power-combining techniques for watt-level 2.4-GHz CMOS power amplifiers for WLAN applications," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 3, pp. 1247–1260, Mar. 2013.
- [19] H. Qian and J. Silva-Martinez, "A 44.9% PAE digitally-assisted linear power amplifier in 40 nm CMOS," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC'14)*, Nov. 2014, pp. 349–352.
- [20] A. Niknejad, D. Chowdhury, and J. Chen, "Design of CMOS power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 6, pp. 1784–1796, Jun. 2012.
- [21] P. T. M. van Zeijl and M. Collados, "A digital envelope modulator for a WLAN OFDM polar transmitter in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 10, pp. 2204–2211, Oct. 2007.
- [22] 3GPP, "3rd Generation Partnership Project," Rev. 12.3.0, 3GPP TS 25.101 Technical Specification, Mar. 2014 [Online]. Available: http://www.3gpp.org
- [23] E. Kaymaksut and P. Reynaert, "Transformer-based uneven Doherty power amplifier in 90 nm CMOS for WLAN applications," *IEEE J. Solid-State Circuits*, vol. 47, no. 7, pp. 1659–1671, Jul. 2012.

- [24] L. Ye, J. Chen, L. Kong, P. Cathelin, E. Alon, and A. Niknejad, "A digitally modulated 2.4 GHz WLAN transmitter with integrated phase path and dynamic load modulation in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC'13)*, Feb. 2013, pp. 330–331.



**Haoyu Qian** (S'12) received the B.S. degree in physics from Peking University, Beijing, China, in 2006, and the M.S. degree in physics from Texas A&M University, College Station, TX, USA, in 2009, where he is currently pursuing the Ph.D. degree in electrical engineering.

He was an RFIC Design Intern with CSR (formerly Microtune), Plano, TX, USA, in 2011, working on PLL-based frequency synthesizer design for TV tuner applications. He has been with Qualcomm, San Diego, CA, USA, since 2014, working on RF front-end IC design. His research interests include voltage regulator integrated circuit design, such as low drop-out (LDO) regulators and switching-mode voltage regulators, and RF integrated circuit design, such as frequency synthesizers, mixers, VCOs, RF switches, RF filters, and power amplifiers.



**Qiyuan Liu** (S'14) was born in Liaoyang, China. He received the B.S. degree in microelectronics from Tianjin University, Tianjin, China, in 2011. He has been pursuing the Ph.D. degree in electrical engineering at Texas A&M University, College Station, TX, USA, since 2011.

In Spring and Summer 2013, he was an Analog IC Design Intern with Broadcom Corporation, Irvine, CA, USA, where he worked on phase locked loop (PLL) design. During Summer 2014, he was with TSMC, Austin, TX, USA, working on sloping analog-to-digital converter (ADC) design for image sensing applications. His research interests include data converters, image sensing interfaces, and power amplifiers.



**Jose Silva-Martinez** (SM'98–F'10) was born in Tecamachalco, México. He received the M.Sc. degree in electrical engineering from the Instituto Nacional de Astrofísica Optica y Electrónica (INAOE), Puebla, México, in 1981, and the Ph.D. degree in electrical engineering from Katholieke Universiteit Leuven, Leuven, Belgium, in 1992.

In 1993, he joined the Electronics Department, INAOE, and from May 1995 to December 1998, he was the Head of the Electronics Department. He was a Co-Founder of the Ph.D. program on Electronics in 1993. He is currently with the Department of Electrical and Computer Engineering, Texas A&M University (TAMU), College Station, TX, USA, where he holds the position of Texas Instruments Professor. He is currently serving as an Associate Department Head for Graduate Student Affairs with the Department of Electrical and Computer Engineering, TAMU. He has authored over 110 and 170 journal and conference papers, respectively, 2 books and 12 book chapters, 1 granted patent, and 5 more filed. His research interests include design and fabrication of integrated circuits for communication, radar and biomedical applications.

Dr. Silva-Martinez is serving as the (2014–2015) Editor-in-Chief of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II and the Conference Chair of MWCAS-2014, a member of the DLP program of CASS 2013–2014 and a Senior Editorial Board member of the IEEE JETCAS 2014–2015. He has served as the IEEE CASS Vice President Region-9 (1997–1998), and as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 1997 to 1998 and 2002 to 2003, an Associate Editor of the IEEE TCAS I 2004–2005 and since 2007, and currently serves on the board of editors of three other major journals. He was the recipient of the 2005 Outstanding Professor Award by the ECE Department, Texas A&M University, coauthor of the papers that received the MWCAS 2011 and RF-IC 2005 Best Student Paper Awards, and the 1990 European Solid-State Circuits Conference Best Paper Award.



**Sebastian Hoyos** (M'06–SM'15) received the B.S. degree from Pontificia Universidad Javeriana (PUJ), Bogota, Colombia, in 2000, and the M.S. and Ph.D. degrees from the University of Delaware, Newark, DE, USA, in 2002 and 2004, respectively, all in electrical engineering.

He was with Lucent Technologies Inc., Bogota, Colombia, from 1999 to 2000, for the Andean region in South America. He was a Lecturer with PUJ, where he lectured on microelectronics and control theory. He was with PMC-Sierra Inc., Sunnyvale, CA, USA, the Delaware Research Partnership Program, and the Army Research Laboratory Collaborative Technology Alliance in Communications and Networks. He was a Postdoctoral Researcher (2004–2006) with the Department of Electrical Engineering and Computer Sciences, Berkeley Wireless Research Center, University of California, Berkeley, CA, USA. He joined Texas A&M University, College Station, TX, USA, in 2006, where he is currently an Associate Professor with the Department of Electrical and Computer Engineering. His research interests include telecommunication systems, digital signal processing, and analog and mixed-signal processing and circuit design.