# Synchronous Ultra-High-Density 2RW Dual-Port 8T-SRAM With Circumvention of Simultaneous Common-Row-Access

Koji Nii, Member, IEEE, Yasumasa Tsukamoto, Makoto Yabuuchi, Yasuhiro Masuda, Susumu Imaoka, Keiichi Usui, Shigeki Ohbayashi, Hiroshi Makino, Member, IEEE, and Hirofumi Shinohara

Abstract—We propose an access scheme for a synchronous dual-port (DP) SRAM that minimizes the 8T-DP-cell area while maintaining cell stability. A priority row decoder circuit and a shifted bitline access scheme eliminate access conflict issues. Using 65 nm CMOS technology (hp90) with the proposed scheme, we fabricated 32 kB DP-SRAM macros. We obtained a 0.71 $\mu$m<sup>2</sup> 8T-DP-cell, only 1.44× larger than a 6T single-port (SP) cell. The bit density of the fabricated 32 kB DP-SRAM macro is 667 kbit/mm<sup>2</sup>, which is 25% higher than that of a conventional 8T DP-SRAM macro. The standby leakage is 27% lower because the proposed 8T-DP-cell uses a small drive-NMOS transistor.

*Index Terms*—CMOS, dual-port, embedded SRAM, high density, low power, low voltage, memory, 65 nm, stability, two-port, variability.

## I. INTRODUCTION

In deep-submicron technology, System-on-Chip (SoC) products require high-speed, low-power embedded memory to support increased storage capacity. Static random access memory (SRAM) has been widely used in SoC products. So far, most embedded memory has been single-port SRAM, which has one access port for both reading and writing. Meanwhile, demand for multi-port SRAM continues to increase to accommodate high-speed communications and image processing. Multi-port SRAM is suitable for parallel operation and improves total chip performance [1]–[11].

Underlying this trend is the fact that power dissipation limits how far the clock frequency can be raised to improve SoC performance as technology advances. Accordingly, system architectures have moved toward parallel operation, which increases the effective computation speed through parallel processing rather than through higher clock frequencies. Many reports have described high-performance, low-power multi-core processors that integrate multiple CPUs on a die. Because the number of memory accesses increases considerably, memory access speed becomes a system bottleneck. This creates increasing demand for multi-port SRAM that can be accessed from multiple ports simultaneously.

K. Usui is with Daioh Electric Co. Ltd., Itami, Hyogo 664-0002, Japan.

H. Makino is with the Osaka Institute of Technology, Hirakata, Osaka 573-0196, Japan.

Digital Object Identifier 10.1109/JSSC.2009.2013766

Although the memory access speed (in clock cycles) improves as the number of access ports increases, the area penalty also grows with the number of ports. Consequently, multi-port SRAM with more than three access ports is used only in small capacities on a die. It is used in particular for high-speed register files in a data path [8]–[10] or as buffer memory for a video image-processing engine [11]. Dual-port SRAM with two access ports, by contrast, is frequently used in large capacities in recent SoC chips, as is SP-SRAM. For example, it serves as buffer memory in multimedia applications [5] or as a data cache in multi-core processors [6], [7]. For these reasons, embedded DP-SRAM is an essential IP block whose capacity tends to increase.

We first briefly illustrate how embedded dual-port SRAM can increase internal memory access speed. Fig. 1 presents simple block diagrams and timing charts of memory access. Fig. 1(a) shows sequential memory access using a typical single-port (SP) SRAM block, whereas Fig. 1(b) shows parallel memory access using a dual-port SRAM block. In Fig. 1(a), two functional units (UNIT-A and UNIT-B) must access the SP-SRAM in series through the internal data bus because it has only one port. Consequently, two clock cycles are required if each unit accesses the SRAM once. In Fig. 1(b), by contrast, UNIT-A and UNIT-B can access the DP-SRAM block simultaneously within one cycle. In this example, parallel access therefore halves the number of cycles needed compared with sequential access.
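The cycle counts in Fig. 1 can be reproduced with a toy scheduling model. The Python sketch below is illustrative only and rests on several assumptions not taken from the paper: each port serves one access per clock cycle, requests never conflict, and a hypothetical fixed-priority arbiter, `schedule_accesses`, polls UNIT-A before UNIT-B.

```python
from collections import deque


def schedule_accesses(requests, num_ports):
    """Serve queued memory requests, at most `num_ports` per clock cycle.

    `requests` maps a unit name (e.g. "UNIT-A") to its ordered list of accesses.
    Units are polled in dictionary order, i.e. a simple fixed-priority arbiter.
    Returns one list per cycle of the (unit, access) pairs served in that cycle.
    """
    if num_ports < 1:
        raise ValueError("an SRAM needs at least one port")
    queues = {unit: deque(accesses) for unit, accesses in requests.items()}
    cycles = []
    while any(queues.values()):          # loop until every queue is empty
        served = []
        for unit, queue in queues.items():
            if queue and len(served) < num_ports:
                served.append((unit, queue.popleft()))
        cycles.append(served)
    return cycles


requests = {"UNIT-A": ["read X"], "UNIT-B": ["write Y"]}
single_port = schedule_accesses(requests, num_ports=1)  # Fig. 1(a): sequential
dual_port = schedule_accesses(requests, num_ports=2)    # Fig. 1(b): parallel
print(len(single_port), len(dual_port))  # 2 1
```

With one port, the two single accesses take two cycles. With two ports, both are served in the same cycle, which matches the timing charts of Fig. 1.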

As DP-SRAM capacity increases, so does the chip area it occupies, so higher-density DP-SRAM is strongly required. To date, the unit cell of a dual-port SRAM has generally been about twice as large as that of a single-port SRAM. A new layout structure has reduced this area penalty, but the DP-cell is still $1.63 \times$ larger than the SP-cell [12]. This is because the DP-SRAM unit cell has eight transistors, whereas the SP-SRAM cell has six. In addition, the gate widths and lengths of some transistors must be enlarged to maintain cell stability and access speed. This enlargement is inherent in designing the DP-SRAM cell for the worst case, in which both ports access the cell simultaneously.

Manuscript received December 21, 2007; revised December 17, 2008. Current version published February 25, 2009.

K. Nii, Y. Tsukamoto, M. Yabuuchi, S. Ohbayashi, and H. Shinohara are with Renesas Technology Corporation, Itami, Hyogo 664-0005, Japan (e-mail: nii.koji@renesas.com).

Y. Masuda and S. Imaoka are with Renesas Design Corporation, Itami, Hyogo 664-0005, Japan.




Fig. 1. System block diagrams and timing charts of the memory access: (a) sequential memory access, (b) parallel memory access.

In this study, we propose a priority row decoder and a shifted bitline (BL) access scheme for synchronous DP-SRAM. We also introduce a physical layout of the 8T-DP-cell designed to reduce its area. Local and global variations of the threshold voltage ($V_{\rm th}$) are taken into account in sizing the unit-cell transistors. This approach incurs no access-time penalty and yields the smallest memory cell reported to date in 65 nm technology [13]. The circumvention scheme requires that both ports operate with a common internal clock, as shown in Fig. 1. It cannot be adopted if the two clocks are asynchronous and mutually independent. If the clock phases are synchronized, however, the scheme can be used even when the two clocks do not have exactly the same frequency [3].

This paper is organized as follows. Section II discusses the access conflict issues of dual-port SRAM. Section III introduces the proposed scheme that circumvents simultaneous common-row access, which reduces the cell size while maintaining read stability, write-ability, and access speed. Section IV explains the design of the high-density 8T-SRAM cell and discusses its stability using SPICE simulation. Section V presents evaluation results of test chips fabricated in 65 nm CMOS technology, and Section VI gives a brief summary.

## II. ACCESS CONFLICT ISSUE OF DUAL-PORT SRAM

Fig. 2 shows memory cell circuits for single-port and dual-port SRAM. The standard single-port SRAM cell shown in Fig. 2(a) comprises six transistors: two pull-up PMOSs (load-PMOS), two pull-down NMOSs (drive-NMOS), and two transfer NMOSs (access-NMOS). A single-port SRAM performs either a read operation or a write operation through its one port, so its operation is often denoted "1RW". As shown in Fig. 2(b) and (c), two major types of memory cells are normally used for dual-port SRAM. Both have eight transistors, but their functions differ greatly. Fig. 2(b) shows the one-read/one-write (1R1W) DP-SRAM cell, in which only one of the two ports can perform read operations [14], [15]. This 1R1W cell reads stably. However, its single-ended read-bitline (RBL) structure can degrade access time because the RBL must swing with a large amplitude. Fig. 2(c) shows the two-read/write (2RW) 8T-SRAM cell, which corresponds to the DP-SRAM of Fig. 1. In this cell, both ports can read and write. A 2RW cell can therefore also operate as a 1R1W cell, whereas a 1R1W cell cannot operate as a 2RW cell. The 2RW 8T-DP-cell thus offers more access flexibility, and the remainder of this paper addresses this type of DP-SRAM.

Fig. 2. SRAM memory cell circuits.

Fig. 3 shows the access situations that can arise in a 2RW dual-port SRAM when both ports are enabled simultaneously. For simplicity, the figure shows only the memory cell array with the activated 8T-cells, wordlines (WLA, WLB), and bitlines (BLA, BLB). The buffers on both sides of the array represent the addressed WL drivers of the two ports.

Fig. 3(a) shows the two ports, each addressed independently, accessing different rows and different columns. Fig. 3(b) shows access to different rows and a common column. These two situations pose no access conflict. In each selected memory cell only one of WLA and WLB is activated, so the cell operates as in a single-port access.

Fig. 3(c) and (d) show, respectively, common-row/different-column access and common-row/common-column access. In these common-row situations, cell stability must be treated as the worst case for reading. The two enabled wordlines degrade the static noise margin (SNM) of every memory cell along the selected row. This holds whether both ports read, one port writes, or both ports write, so the write-ability of the selected memory cell must also be evaluated for this worst case. Read stability must be considered even during write operations, because the half-selected memory cells (selected row, unselected column) are in a read condition even when one or both ports are writing. In general, simultaneous writes from both ports to exactly the same address, as in Fig. 3(d), are prohibited. If the two ports write different (opposite) data, an abnormal leakage current flows through the accessed memory cell. Nevertheless, systems frequently require simultaneous read-read or read-write operations from both ports. A conventional DP-SRAM design must therefore satisfy this worst-case access situation. The 8T-DP-cell necessarily becomes large because the gate width of the drive-NMOS transistors is increased to improve cell stability.

Fig. 3. Assortment of the access modes of the dual-port SRAM.

Fig. 4. Butterfly curves and static noise margin of the DP-8T-cell for both common row access and different row access.
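The four cases of Fig. 3 can be summarized as a classification over the row and column addresses decoded for each port. The sketch below is a hypothetical helper for illustration, not circuitry from the paper. The names `classify`, `both_wordlines_on_one_row`, and `is_prohibited` are invented, and "column" means the column address that selects one cell within a row.

```python
from enum import Enum


class AccessMode(Enum):
    """The four simultaneous-access situations of Fig. 3."""
    DIFF_ROW_DIFF_COL = "Fig. 3(a)"
    DIFF_ROW_COMMON_COL = "Fig. 3(b)"
    COMMON_ROW_DIFF_COL = "Fig. 3(c)"
    COMMON_ROW_COMMON_COL = "Fig. 3(d)"


def classify(row_a, col_a, row_b, col_b):
    """Classify one cycle from the row/column addresses of ports A and B."""
    if row_a != row_b:
        if col_a != col_b:
            return AccessMode.DIFF_ROW_DIFF_COL
        return AccessMode.DIFF_ROW_COMMON_COL
    if col_a != col_b:
        return AccessMode.COMMON_ROW_DIFF_COL
    return AccessMode.COMMON_ROW_COMMON_COL


def both_wordlines_on_one_row(mode):
    """True when WLA and WLB of the same row are both raised (worst-case SNM)."""
    return mode in (AccessMode.COMMON_ROW_DIFF_COL, AccessMode.COMMON_ROW_COMMON_COL)


def is_prohibited(mode, write_a, write_b):
    """Both ports writing the very same cell is disallowed (possibly opposite data)."""
    return mode is AccessMode.COMMON_ROW_COMMON_COL and write_a and write_b


mode = classify(row_a=5, col_a=2, row_b=5, col_b=7)
print(mode.value, both_wordlines_on_one_row(mode))  # Fig. 3(c) True
```

Only cases (c) and (d) raise two wordlines on one row. Those are the cases a conventional cell must be sized for.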

Fig. 4 shows simulated butterfly curves of the SNM for the 8T-DP-cell. As described earlier, the 8T-DP-cell has two different SNM values depending on the access situation. In the common access situation, the two wordlines (WLs) of the same row are selected. In the different access situation, WLs in two different rows are selected. In the common access situation shown in Fig. 3(c) and (d), both WLs are activated, so the electrical $\beta$ ratio of the 8T-DP-cell is $\beta_{\text{ND1}}/(\beta_{\text{NA1}} + \beta_{\text{NA2}})$. Here, $\beta_{\text{ND1}}$, $\beta_{\text{NA1}}$, and $\beta_{\text{NA2}}$ are the drain-current coefficients of the drive-NMOS transistor, the A-port access-NMOS, and the B-port access-NMOS, respectively. In the different access situation, by contrast, only one WL of the cell is activated, so the corresponding $\beta$ ratio is $\beta_{\text{ND1}}/\beta_{\text{NA1}}$ or $\beta_{\text{ND1}}/\beta_{\text{NA2}}$. A lower $\beta$ ratio reduces read stability (SNM). The SNM in the common access situation therefore determines the worst-case design of the 8T-DP-cell.

Fig. 5. Concept of proposed circumventing simultaneous common-row-access.
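As a rough numerical illustration of the two $\beta$ ratios, the sketch below approximates each $\beta$ by gate width, using the 65 nm transistor widths of Table I. It assumes equal gate lengths and identical mobility and $V_{\rm th}$ for all NMOS transistors. The paper does not state these assumptions, so the numbers indicate trends only. For the proposed cell, Table I lists a 120 nm access-NMOS width, and a $\beta$ ratio of one then implies the same 120 nm drive-NMOS width.

```python
def beta_ratio(w_drive_nm, active_access_widths_nm):
    """Electrical beta ratio seen by the storage node pulled down by ND1.

    Approximation (assumption): all transistors share one gate length and the
    same mobility, so each beta is proportional to gate width. Access
    transistors whose wordlines are raised act in parallel, so their betas add.
    """
    return w_drive_nm / sum(active_access_widths_nm)


# Conventional 65 nm 8T-DP-cell (Table I): drive 260 nm, access 90 nm.
conv_common = beta_ratio(260, [90, 90])    # WLA and WLB of one row raised
conv_single = beta_ratio(260, [90])        # only one wordline raised
# Proposed cell: drive and access both 120 nm (beta ratio of one).
prop_single = beta_ratio(120, [120])       # the only case the scheme allows
prop_common = beta_ratio(120, [120, 120])  # would occur without the scheme
print(round(conv_common, 2), round(conv_single, 2), prop_single, prop_common)
# 1.44 2.89 1.0 0.5
```

Under this approximation, the conventional cell must tolerate a worst-case ratio of about 1.44. The proposed cell would fall to 0.5 if both wordlines of its row were raised. The scheme of Section III prevents this, so the proposed cell only ever sees the single-wordline ratio of 1.0.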

## III. CIRCUMVENTING ACCESS SCHEME OF SIMULTANEOUS COMMON ROW ACTIVATION

Fig. 5 presents the fundamental concept of our DP-SRAM access scheme. For convenience, we define port A, connected to the BLA and /BLA pair, as primary, and port B, connected to the BLB and /BLB pair, as secondary. For the secondary port B, we introduce a row-address comparator (RAC) and a bitline shifter. Fig. 6 details the operation in each access mode, and Figs. 7 and 8 show the circuits implemented in our test chip.

In Fig. 6(a), address input AA$\langle\rangle$ activates WLA in the *m*th row (WLAm), whereas AB$\langle\rangle$ activates WLB in the *n*th row (WLBn). This is a different access mode. In this case, the RAC outputs an "H" level, and the DP-SRAM as a whole performs standard read or write operations.

When AA$\langle\rangle$ and AB$\langle\rangle$ select WLs in a common row, as shown in Fig. 6(b), the RAC disables the row decoder for port B. Consequently, only WLAn is activated. Simultaneously, the "L" level generated by the RAC (see also Fig. 7) switches the connection of secondary port B from the BLB pair to the BLA pair. Port B can therefore read data stably without SNM degradation. In other words, this scheme circumvents the common access mode. This makes it possible to reduce the drive-NMOS transistor width, which directly reduces the DP-SRAM unit-cell area.

The proposed circuitry is also highly effective for write operations. The common access mode is a critical problem during writing because the cells in the unselected columns of the selected row undergo a read operation, so their stored data might be flipped while the write takes place. Our scheme keeps WLB at the "L" level during write operations as well. This type of error is therefore safely avoided whenever the common access mode occurs. In this way, the scheme circumvents a fatal risk associated with this specific DP-SRAM operation. Furthermore, the area added by this circuitry is compensated by the reduction of the DP-SRAM unit-cell area.

Fig. 6. Block diagram and timing chart of the proposed access scheme.

Fig. 7. Circuit of the row-address comparator (RAC).
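The RAC and bitline-shifter behavior described above can be captured in a cycle-level behavioral model. The Python sketch below is a hypothetical illustration, not the gate-level circuits of Figs. 7 and 8. The class and function names are invented. Reads are assumed to return the value stored at the start of the cycle, and a same-address write from both ports is rejected, following Section II.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PortRequest:
    row: int
    col: int
    write: bool = False
    data: int = 0


@dataclass
class CycleResult:
    raised_wordlines: List[str]   # e.g. ["WLA0", "WLB2"]
    port_b_bitlines: str          # "BLB" normally, "BLA" when shifted
    read_a: Optional[int]
    read_b: Optional[int]


def rac_output_high(row_a, row_b):
    """Row-address comparator: 'H' (True) for different rows, 'L' (False) for a common row."""
    return row_a != row_b


def dual_port_cycle(cells, a, b):
    """Behavioral sketch of one synchronous cycle; port A is the primary port."""
    if (a.row, a.col) == (b.row, b.col) and a.write and b.write:
        raise ValueError("simultaneous writes to the same address are prohibited")
    rac_h = rac_output_high(a.row, b.row)
    raised = [f"WLA{a.row}"]
    if rac_h:
        raised.append(f"WLB{b.row}")  # different rows: port-B decoder enabled
    # Common row: the port-B decoder is disabled, and the bitline shifter routes
    # port B to the BLA pair of its column, which WLA already drives for this row.
    port_b_bitlines = "BLB" if rac_h else "BLA"
    # Reads return the value stored at the start of the cycle (a modeling choice).
    read_a = None if a.write else cells[a.row][a.col]
    read_b = None if b.write else cells[b.row][b.col]
    if a.write:
        cells[a.row][a.col] = a.data
    if b.write:
        cells[b.row][b.col] = b.data
    return CycleResult(raised, port_b_bitlines, read_a, read_b)


cells = [[0] * 4 for _ in range(4)]
cells[2][1] = 1
r1 = dual_port_cycle(cells, PortRequest(0, 0), PortRequest(2, 1))  # different rows
r2 = dual_port_cycle(cells, PortRequest(2, 3), PortRequest(2, 1))  # common row
print(r1.raised_wordlines, r1.port_b_bitlines, r1.read_b)  # ['WLA0', 'WLB2'] BLB 1
print(r2.raised_wordlines, r2.port_b_bitlines, r2.read_b)  # ['WLA2'] BLA 1
```

In the common-row cycle, only one wordline is raised on row 2, yet port B still obtains its data through the shifted BLA pair. This is the property that lets the cell be sized for single-wordline access.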

## IV. 8T-DUAL-PORT CELL DESIGN

### A. Scaling Trend of Memory Cell Sizes

Fig. 9 shows the scaling trends of embedded SRAM cell size for 6T-SRAM (1RW single-port) and 8T-SRAM (2RW dual-port). The 6T-SRAM cell size shrinks by half with each technology node. Until the 130 nm node, 8T-DP-cells were conventionally more than twice as large as 6T-SP-cells. In our previous work, we proposed a new elongated 8T-DP-cell layout with a cell size of 2.04 $\mu$m<sup>2</sup>. This is only 1.63 times the 1.25 $\mu$m<sup>2</sup> of the 6T-SP-cell in 90 nm technology [12], [16]. Following the scaling trend, both cell sizes are roughly halved in 65 nm technology with the same layout topology, to 0.61 $\mu$m<sup>2</sup> for the 6T-SP-cell and 0.99 $\mu$m<sup>2</sup> for the 8T-DP-cell [17].

In this work, we apply the new access scheme described in Section III to achieve a cell smaller than the scaling trend predicts. In addition, aggressive shrinkage is aided by layout choices that improve printability from a design-for-manufacturability (DFM) standpoint, such as regular polygons for the active diffusion and polysilicon gates. As a result, the proposed thin 8T-DP-cell is 0.71 $\mu$m<sup>2</sup>. This is about 30% smaller than a normal 8T-DP-cell and only 1.44× the size of an advanced high-density 6T-SP-cell [13].

Fig. 8. Circuit of the bitline shifter for secondary port.

Fig. 9. Scaling trend of the SRAM memory cell size.

### B. Contrived 8T-SRAM Cell Layout

Below 100 nm, the dominant 6T-SRAM cell layout is a wide, thin rectangle that includes two well boundaries. The conventional 2RW 8T-SRAM cell layout [12] extended the same topology and was also a thin rectangle similar to the 6T-SRAM cell. The proposed high-density 8T-DP-cell layout is likewise based on this wide, thin rectangle. Figs. 10 and 11 show the layout and SEM images of the proposed 8T-DP-cell in our 65 nm LSTP CMOS technology. As in the conventional 8T-DP-cell, four tungsten-plug shared contacts directly connect polysilicon gates and diffusion regions to reduce the cell size. In the front-end-of-line (FEOL), the cell width (x direction) can be shrunk aggressively because the drive-NMOS transistor width can be reduced to about half that of the normal cell. In the back-end-of-line (BEOL), however, the x dimension cannot be scaled down. The second-metal tracks, which carry the BL pairs, WL islands, and power line, are already almost completely occupied even in the conventional 8T-DP-cell.

To resolve this BEOL bottleneck, we move the BLs, WLs, and power line to higher metal layers, as shown in Fig. 10(b). The BLs and power line run vertically in the third metal layer, and the WLs run horizontally in the fourth metal layer. The ground line remains in the second metal layer. It is routed as a zigzag (snake-pattern) wire that connects directly to both sides of each cell. As a result, the number of required second-metal tracks is reduced from nine to seven, and the cell width is then determined by FEOL rather than BEOL.

In our design, the electrical $\beta$ ratio is reduced to one, i.e., $\beta_{\text{ND1}} = \beta_{\text{NA1}} = \beta_{\text{NA2}}$. This minimizes the 8T-DP-cell width in the x direction and gives the same ratio as the 6T-SP-cell [17]. Consequently, the n-type active diffusions and polysilicon gates form straight patterns, which is advantageous from the DFM point of view. With fewer rounded corners, the layout is lithography-friendly and robust against mask misalignment. The minimum FEOL dimensions can therefore be reduced aggressively with little impact on yield loss. The dimensions of each cell are summarized in Table I. The concern about read stability arising from the small electrical $\beta$ ratio is addressed in the following two subsections.

|                                |                                      | 90 nm                | 90 nm                | 65 nm               | 65 nm                | Advanced 65 nm         | Advanced 65 nm       |
|--------------------------------|--------------------------------------|----------------------|----------------------|---------------------|----------------------|------------------------|----------------------|
|                                |                                      | 8T (DP)<br>Ref. [12] | 6T (SP)<br>Ref. [16] | 8T (DP)<br>(Normal) | 6T (SP)<br>Ref. [17] | 8T (DP)<br>(this work) | 6T (SP)<br>Ref. [17] |
| Rectangle size (µm)            |                                      | 2.84 × 0.72          | 1.76 × 0.72          | 1.90 × 0.52         | 1.18 × 0.52          | 1.48 × 0.48            | 1.03 × 0.48          |
| Cell area (µm²)                |                                      | 2.04                 | 1.25                 | 0.99                | 0.61                 | 0.71                   | 0.49                 |
|                                | A<sub>DP</sub>/A<sub>SP</sub> ratio  | 1.63                 | 1                    | 1.61                | 1                    | 1.44                   | 1                    |
| Tr. width<br>(nm)              | Load-PMOS                            | 140                  |                      | 90                  |                      | 80                     |                      |
|                                | Access-NMOS                          | 140                  |                      | 90                  |                      | 120                    |                      |
|                                | Drive-NMOS                           | 400                  | 200                  | 260                 | 130                  | 120                    |                      |
| Physical<br>Dimensions<br>(nm) | Tr. pitch                            | 360                  |                      | 260                 |                      | 240                    |                      |
|                                | P/N Iso.                             | 280                  |                      | 200                 |                      | 150                    |                      |
|                                | STI Iso.                             | 140                  |                      | 100                 |                      | 100                    |                      |
|                                | Gate Iso.                            | 120                  |                      | 120                 |                      | 110                    |                      |
|                                | Metal1 (L/S)                         | 120 / 120            |                      | 90 / 90             |                      | 90 / 90                |                      |
|                                | Metal2-4 (L/S)                       | 140 / 140            |                      | 100 / 100           |                      | 100 / 100              |                      |

TABLE I DIMENSIONS OF 8T-DP-CELLS
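The $A_{\rm DP}/A_{\rm SP}$ ratios in Table I can be checked from the rectangle sizes. The sketch below covers only the two 65 nm column pairs. It computes areas from the unrounded products, which reproduces the tabulated 1.61 and 1.44. Dividing the already-rounded areas would give slightly different ratios, e.g., 0.99/0.61 ≈ 1.62. The dictionary keys are illustrative labels.

```python
# Rectangle sizes (x by y, in um) of the 65 nm cells listed in Table I.
CELLS_UM = {
    "8T DP, normal": (1.90, 0.52),
    "6T SP, ref. [17]": (1.18, 0.52),
    "8T DP, this work": (1.48, 0.48),
    "6T SP, advanced": (1.03, 0.48),
}


def cell_area_um2(name):
    """Cell area as the product of the rectangle sides."""
    width, height = CELLS_UM[name]
    return width * height


def dp_to_sp_ratio(dp_name, sp_name):
    """Area of a dual-port cell relative to a single-port cell."""
    return cell_area_um2(dp_name) / cell_area_um2(sp_name)


for name in CELLS_UM:
    print(f"{name}: {cell_area_um2(name):.2f} um^2")
print(f"normal 8T / 6T: {dp_to_sp_ratio('8T DP, normal', '6T SP, ref. [17]'):.2f}")
print(f"this work 8T / 6T: {dp_to_sp_ratio('8T DP, this work', '6T SP, advanced'):.2f}")
```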





Fig. 10. The 8T-DP-cell layout.

### C. Simulated Butterfly Curves for the Static Noise Margin

Next, we verify the read stability of the proposed 8T-DP-cell. Fig. 12 shows simulated butterfly curves of the conventional DP-SRAM cell and the proposed ultra-high-density (UHD) 8T-SRAM cell in our 65 nm technology. The plotted data are for the typical process at a 1.2 V supply voltage and room temperature. The conventional 8T-SRAM cell must be evaluated in the worst case, the common-row access situation. The proposed 8T-DP-SRAM cell, by contrast, need only be evaluated with either WLA or WLB activated, as in a 6T-SP-SRAM. The dc simulation gives SNM values of 186 mV and 194 mV for the conventional and proposed 8T-DP-SRAM cells, respectively. Despite its small electrical $\beta$ ratio, the proposed UHD-8T-SRAM cell thus has a slightly larger SNM than the conventional DP-SRAM cell under typical conditions. The reason is that the reverse narrow-width effect lowers the $V_{\rm th}$ of the narrow access-MOS transistor in the conventional unit cell. In the proposed cell, the $V_{\rm th}$ of the access-MOS transistor is almost the same as that of the drive-MOS transistor [17].

Fig. 11. SEM image of 8T-DP-cell after poly etching and metal-2 damascening: (a) FEOL, (b) BEOL.

Fig. 12. Measured SNM for conventional and proposed 8T-DP-cells.

Fig. 13. Read-stability and write-ability analysis by $V_{\rm th}$ curve simulations.

### D. Read-Stability and Write-Ability Analysis

We verify the read stability and write-ability of the proposed DP-SRAM cell considering global and local $V_{\rm th}$ variations. Global variation is inter-die variation, caused by variations in gate length, gate width, gate-oxide thickness, and dopant implantation. Local variation is intra-die variation, caused by random dopant fluctuation in the channel and by gate line-edge roughness (LER). Fig. 13 shows the results of the read-stability and write-ability analysis by $V_{\rm th}$ curve simulation [18], considering both global and local $V_{\rm th}$ variations. The read and write boundaries are obtained by "worst-case model analysis." In this analysis, we assume that the total DP-SRAM capacity on one die is up to 1 Mbit. The temperature range is −40 °C to 125 °C, and the supply voltage is 1.2 V ± 10%. As shown in Fig. 13, the read and write margins are sufficient for the global corner models FF, FS, SS, and SF as well as for the typical model CC. Here, FS denotes fast NMOS and slow PMOS, SF denotes slow NMOS and fast PMOS, and so on. This result indicates that DP-SRAM instability will have little impact on yield loss in mass production.

Note that we did not apply the read- and write-assist techniques of [17] to the proposed DP-SRAM, even though its transistor sizes are the same as those of the SP-SRAM. The dc read-stability and write-ability characteristics of the 8T dual-port unit cell therefore match those of the 6T single-port unit cell. However, the total DP-SRAM capacity on a die is about one-tenth that of SP-SRAM, which allows the operating margin to be satisfied without assist circuits in our 65 nm CMOS technology. If a larger DP-SRAM capacity is required on a chip, some read- and write-assist circuit technique will have to be introduced. This is discussed as future work in Section VI.

Fig. 14. Estimation of the standby leakage of the 8T-DP-cell by SPICE simulation.
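A simple statistical estimate illustrates why a smaller total capacity relaxes the required per-cell margin. This sketch is not the $V_{\rm th}$ curve method of [18]. It assumes Gaussian-distributed cell margins, independent cell failures, and a hypothetical 99% array-yield target. It computes how many standard deviations of margin each cell needs in a 1 Mbit array and in a 10 Mbit array.

```python
import math
from statistics import NormalDist


def required_sigma(num_cells, array_yield):
    """Per-cell margin, in standard deviations, for an array of `num_cells`
    independent cells to pass with probability `array_yield`.

    Assumes Gaussian margins and independent per-cell failures (illustrative only).
    """
    # The per-cell failure probability p solves (1 - p) ** num_cells == array_yield;
    # expm1/log keep precision when p is tiny.
    p_fail = -math.expm1(math.log(array_yield) / num_cells)
    return -NormalDist().inv_cdf(p_fail)


MBIT = 1 << 20
TARGET_YIELD = 0.99  # hypothetical array-yield target
for capacity_mbit in (1, 10):
    sigma = required_sigma(capacity_mbit * MBIT, TARGET_YIELD)
    print(f"{capacity_mbit:>2} Mbit: {sigma:.2f} sigma")
```

Under these assumptions, the tenfold larger array needs roughly 0.4 sigma more margin per cell.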

### E. Simulated Standby Leakage

The small drive-NMOS transistor reduces not only the cell area but also the standby leakage. Fig. 14 compares the simulated standby leakage of the reference 0.49 $\mu$m<sup>2</sup> 6T-SRAM (SP) cell, the proposed 0.71 $\mu$m<sup>2</sup> 8T-SRAM (DP) cell, and the conventional 0.99 $\mu$m<sup>2</sup> 8T-SRAM (DP) cell in our 65 nm CMOS technology. For each cell, we estimate the total leakage current as the sum of the subthreshold leakage, gate-induced drain leakage (GIDL), and gate leakage of all transistors. The typical standby leakage of the proposed 8T-DP-cell is 9.0 pA/cell at a 1.2 V supply voltage and room temperature. This is 30% less than that of the conventional 8T-DP-cell. The standby leakage of the proposed 8T-DP-cell is only 1.4 times that of the 6T-SP-cell, because only the leakage component of the access-NMOS transistors doubles.

## V. IMPLEMENTATIONS AND EVALUATION

### A. Design and Fabrication of Test Chip

We designed and fabricated test chips with eight embedded 32 kB DP-SRAM macros in 65 nm CMOS technology. Fig. 15 shows a microphotograph of the 36.2 mm<sup>2</sup> test chip. The four macros on the right side are the proposed UHD-DP-SRAM, whereas the four on the left side are normal DP-SRAM macros. Although the test chips were fabricated with eight metal layers, both the conventional and proposed SRAM macros were implemented within four metal layers.

Fig. 16 shows the layout plot of the proposed 32 kB UHD-DP-SRAM macro. The row decoders for the A-port and B-port are placed at the center of the macro. They divide the memory cell array into two arrays, thereby shortening the wordlines. The primary data I/O for the A-port is located on the upper side, and the secondary data I/O for the B-port is on the opposite, lower side. The BL shifter is inserted only between the cell array and the secondary data I/O, not at the primary data I/O. The RAC is placed in the secondary address buffer region. Owing to the smaller memory cell, the total cell-array region is 30% smaller than that of the conventional macro. The BL shifter and the RAC increase the peripheral area only slightly, by 5%. The 32 kB macro measures 868 × 442 $\mu$m, giving a bit density of 667 kbit/mm<sup>2</sup>. Fig. 17 compares bit densities with previous works and shows the area overhead of the 8T-cell over the 6T-cell. As shown in Fig. 17, this work achieves a 25% increase in bit density.

Fig. 15. Die photograph of a test chip (labeled regions: Conv. DP-SRAM, 32 kB × 4, 0.99 µm<sup>2</sup>; Prop. DP-SRAM, 32 kB × 4, 0.71 µm<sup>2</sup>).

Fig. 16. Layout plot of fabricated 32 kB UHD-DP-SRAM macros (macro width 868 µm; primary address buffer indicated).

Fig. 17. Bit-densities and cell size ratios.
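The bit densities quoted here and in Table II follow from the macro capacity and dimensions. This sketch assumes that "kbit" denotes 1024 bits. That reading is inferred, not stated in the paper, because it reproduces the reported 667 and 534 kbit/mm<sup>2</sup>. A decimal kilo would give about 683 kbit/mm<sup>2</sup> for the proposed macro.

```python
BITS_PER_MACRO = 512 * 256 * 2  # Table II: 512 rows x 256 columns x 2 MATs = 32 kB
KBIT = 1024                     # assumed binary kilo; reproduces the reported figures


def bit_density_kbit_per_mm2(bits, width_um, height_um):
    """Bits per macro divided by macro area, in kbit/mm^2."""
    area_mm2 = (width_um / 1000) * (height_um / 1000)
    return bits / KBIT / area_mm2


proposed = bit_density_kbit_per_mm2(BITS_PER_MACRO, 868, 442)
conventional = bit_density_kbit_per_mm2(BITS_PER_MACRO, 1084, 442)
print(f"{proposed:.0f} vs {conventional:.0f} kbit/mm^2, gain {proposed / conventional - 1:.0%}")
# 667 vs 534 kbit/mm^2, gain 25%
```

The same numbers imply that the proposed macro is about 20% smaller along one side (868 µm versus 1084 µm), with the 442 µm side unchanged.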

### B. Measurement Result

We tested all of the 32 kB macros and confirmed fully functional operation. We also measured the SNMs of the proposed 8T-cells for both ports, as shown in Fig. 18. The results verify that the SNMs of the A-port and B-port are well balanced and that the measured mean values correlate with the SPICE simulation results. The SNM with both WLs activated simultaneously need not be considered, because that situation never occurs in this design.

Fig. 18. Measured SNM.

Fig. 19. Shmoo plot.

TABLE II FEATURES OF THE FABRICATED SRAM MACRO

|                          | Conv. DP-SRAM                       | Proposed                            |
|--------------------------|-------------------------------------|-------------------------------------|
| Technology               | 65 nm (hp90) LSTP CMOS              | 65 nm (hp90) LSTP CMOS              |
| Configuration            | 16 bit × 16 k word × 4 macro        | 16 bit × 16 k word × 4 macro        |
| MAT size                 | 512 row × 256 column × 2 MAT/macro  | 512 row × 256 column × 2 MAT/macro  |
| Mux I/O                  | 32                                  | 32                                  |
| Memory cell size         | 0.99 μm²                            | 0.71 μm²                            |
| Physical macro size      | 1084 μm × 442 μm                    | 868 μm × 442 μm                     |
| Bit density              | 534 kbit/mm²                        | 667 kbit/mm²                        |
| Read access time @ 1.2 V | 3.1 ns                              | 3.0 ns                              |
| Standby leakage @ 1.2 V  | 27 μA/Mbit                          | <b>20 μA/Mbit</b>                   |

Fig. 19 presents a typical shmoo plot of supply voltage versus clock access time at room temperature. The measured SRAM macro functions from 0.8 to 1.44 V. At the typical supply voltage of 1.2 V, the measured clock access time was 3.0 ns, compared with 3.1 ns for the conventional macro. There is thus no access-time penalty (see Table II). We also measured the typical standby leakage of four 32 kB macros (128 kB in total), including both the cell arrays and the peripheral circuits. It was 20 $\mu$A, 27% less than that of the conventional macros because of the small drive-NMOS transistor of the DP-SRAM cell. Table II summarizes the test chip features.

## VI. CONCLUSION

We proposed a new access scheme for an ultra-high-density synchronous DP-SRAM that maintains stable read and write operations. Using 65 nm CMOS technology, we designed and fabricated 32 kB DP-SRAM macros with this scheme. We obtained the smallest 8T-DP-cell and the highest bit density reported to date at the 65 nm node. Test results show that the speed penalty is negligible and that standby leakage is reduced by 27% because of the small cell size.

Advanced SoC products at the next-generation 45 nm or 32 nm nodes will require further consideration of device variation. In this work, we did not apply the read- and write-assist techniques recently reported for single-port SRAM [17], [19]–[22]. The total DP-SRAM capacity embedded in one die is not as large as that of SP-SRAM, and the variability is still within allowable limits. In the near future, however, system applications will certainly demand larger total dual-port SRAM capacity, and variability will continue to increase as devices shrink. In such cases, read-stability and write-ability enhancement techniques will become necessary for DP-SRAM as well as SP-SRAM [23], [24] so that DP-SRAM cell shrinkage can continue. We therefore expect the proposed access-circumvention scheme to help reduce DP-SRAM area and suppress leakage without any speed overhead in future advanced SoC products.

#### ACKNOWLEDGMENT

The authors would like to thank Y. Oda, M. Igarashi, K. Tomita, N. Tsuboi, A. Ishii, T. Oashi, K. Tsukamoto, and K. Ishibashi for their technical support and encouragement.

#### REFERENCES

- [1] M. Yamashina, T. Enomoto, T. Kunio, I. Tamitani, H. Harasaki, T. Nishitani, M. Satoh, and K. Kikuchi, "A micro programmable real-time video signal processor (VSP) LSI," *IEEE J. Solid-State Circuits*, vol. SSC-22, pp. 1117–1123, Dec. 1987.
- [2] M. Inamori, J. Naganuma, and M. Endo, "A memory-based architecture for MPEG2 system protocol LSIs," *IEEE Trans. VLSI Syst.*, vol. 7, no. 3, pp. 339–344, Sep. 1999.
- [3] C.-W. Yoon, R. Woo, J. Kook, S.-J. Lee, K. Lee, and H.-J. Yeo, "An 80/20-MHz 160-mW multimedia processor integrated with embedded DRAM, MPEG-4 accelerator and 3-D rendering engine for mobile applications," *IEEE J. Solid-State Circuits*, vol. 36, no. 11, pp. 1758–1767, Nov. 2001.
- [4] H.-J. Stolberg, S. Moch, L. Friebe, A. Dehnhardt, M. B. Kulaczewski, M. Berekovic, and P. Pirsch, "An SoC with two multimedia DSPs and a RISC core for video compression applications," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 330–531.
- [5] J. Kim, Y. Choi, J. Jeong, S. Lee, and S. Kim, "The v2.0 + EDR Bluetooth SoC architecture for multimedia," *IEEE Trans. Consum. Electron.*, vol. 52, no. 2, pp. 436–444, May 2006.
- [6] T. Shiota, K. Kawasaki, Y. Kawabe, W. Shibamoto, A. Sato, T. Hashimoto, F. Hayakawa, S. Tago, H. Okano, Y. Nakamura, H. Miyake, A. Suga, and H. Takahashi, "A 51.2 GOPS 1.0 GB/s-DMA single-chip multi-processor integrating quadruple 8-way VLIW processors," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 194–593.
- [7] M. Nakajima, T. Yamamoto, M. Yamasaki, K. Kaneko, and T. Hosoki, "Homogeneous dual-processor core with shared L1 cache for mobile multimedia SoC," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2007, pp. 216–217.
- [8] H. Wei, R. V. Joshi, and W. H. Henkels, "A 500-MHz, 32-word × 64-bit, eight-port self-resetting CMOS register file," *IEEE J. Solid-State Circuits*, vol. 37, no. 5, pp. 56–67, May 1999.
- [9] R. K. Krishnamurthy, A. Alvandpour, G. Balamurugan, N. R. Shanbhag, K. Soumyanath, and S. Y. Borkar, "A 130-nm 6-GHz 256 × 32 bit leakage-tolerant register file," *IEEE J. Solid-State Circuits*, vol. 37, no. 5, pp. 624–632, May 2002.
- [10] N. Tzartzanis and W. W. Walker, "A differential current-mode sensing method for high-noise-immunity, single-ended register files," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 506–543.
- [11] M. Miyama, J. Miyakoshi, Y. Kuroda, K. Imamura, H. Hashimoto, and M. Yoshimoto, "A sub-mW MPEG-4 motion estimation processor core for mobile video application," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1562–1570, Sep. 2004.
- [12] K. Nii, Y. Tsukamoto, S. Imaoka, and H. Makino, "A 90 nm dual-port SRAM with 2.04 μm<sup>2</sup> 8T-thin cell using dynamically-controlled column bias scheme," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2004, pp. 508–543.
- [13] K. Nii, Y. Masuda, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, M. Igarashi, K. Tomita, N. Tsuboi, H. Makino, K. Ishibashi, and H. Shinohara, "A 65 nm ultra-high-density dual-port SRAM with 0.71 μm<sup>2</sup> 8T-cell for SoC," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2006, pp. 162–163.
- [14] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, "A stable 2-port SRAM cell design against simultaneously read/write-disturbed accesses," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2109–2119, Sep. 2008.
- [15] S. Ishikura, M. Kurumada, T. Terano, Y. Yamagami, N. Kotani, K. Satomi, K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, T. Oashi, H. Makino, H. Shinohara, and H. Akamatsu, "A 45-nm 2-port 8T-SRAM using hierarchical replica bitline technique with immunity from simultaneous R/W access issues," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 938–945, Apr. 2008.
- [16] K. Nii, Y. Tsukamoto, T. Yoshizawa, S. Imaoka, Y. Yamagami, T. Suzuki, A. Shibayama, H. Makino, and S. Iwade, "A 90-nm low-power 32-kB embedded SRAM with gate leakage suppression circuit for mobile applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 4, pp. 684–693, Apr. 2004.
- [17] S. Ohbayashi, M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Imaoka, Y. Oda, T. Yoshihara, M. Igarashi, M. Takeuchi, H. Kawashima, Y. Yamaguchi, K. Tsukamoto, M. Inuishi, H. Makino, K. Ishibashi, and H. Shinohara, "A 65-nm SoC embedded 6T-SRAM designed for manufacturability with read and write operation stabilizing circuits," *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 820–829, Apr. 2007.
- [18] Y. Tsukamoto, K. Nii, S. Imaoka, Y. Oda, S. Ohbayashi, T. Yoshizawa, H. Makino, K. Ishibashi, and H. Shinohara, "Worst-case analysis to obtain stable read/write DC margin of high density 6T-SRAM-array with local Vth variability," in *ICCAD Dig.*, 2005, pp. 398–405.
- [19] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng, and M. Bohr, "A 3-GHz 70-Mb SRAM in 65-nm CMOS technology with integrated column-based dynamic power supply," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 146–151, Jan. 2006.
- [20] M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, and T. Kawahara, "90-nm process-variation adaptive embedded SRAM modules with power-line-floating write technique," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 705–711, Mar. 2006.
- [21] H. Pilo, C. Barwin, G. Braceras, C. Browning, S. Lamphier, and F. Towler, "An SRAM design in 65-nm technology node featuring read and write-assist circuits to expand operating voltage," *IEEE J. Solid-State Circuits*, vol. 42, no. 4, pp. 813–819, Apr. 2007.
- [22] M. Yabuuchi, K. Nii, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, H. Makino, Y. Yamagami, S. Ishikura, T. Terano, T. Oashi, K. Hashimoto, A. Sebe, G. Okazaki, K. Satomi, H. Akamatsu, and H. Shinohara, "A 45 nm low-standby-power embedded SRAM with improved immunity against process and temperature variations," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 326–327.
- [23] D. P. Wang, H. J. Liao, H. Yamauchi, Y. H. Chen, Y. L. Lin, S. H. Lin, D. C. Liu, H. C. Chang, and W. Hwang, "A 45 nm dual-port SRAM with write and read capability enhancement at low voltage," in *SoC Conf. Dig. Tech. Papers*, Sep. 2007, pp. 211–214.
- [24] K. Nii, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, Y. Oda, K. Usui, T. Kawamura, N. Tsuboi, T. Iwasaki, K. Hashimoto, H. Makino, and H. Shinohara, "A 45-nm single-port and dual-port SRAM family with robust read/write stabilizing circuitry under DVFS environment," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2008, pp. 212–213.



**Koji Nii** (M'99) was born in Tokushima, Japan, in 1965. He received the B.E. and M.E. degrees in electrical engineering from Tokushima University, Tokushima, Japan, in 1988 and 1990, respectively, and the Ph.D. degree in informatics and electronics engineering from Kobe University, Hyogo, Japan, in 2008.

In 1990, he joined the ASIC Design Engineering Center, Mitsubishi Electric Corporation, Itami, Japan, where he has been working on designing embedded SRAMs for advanced CMOS logic processes.

In 2003, he was transferred to Renesas Technology Corporation, Itami, Japan, a semiconductor joint venture of Mitsubishi Electric Corp. and Hitachi Ltd. He currently works on the research and development of deep-submicron embedded SRAMs in the Advanced Design Framework Development Department of Renesas Technology Corporation, Itami, Japan.

Dr. Nii is a member of the IEEE Solid-State Circuits Society and the IEEE Electron Devices Society.



**Yasumasa Tsukamoto** received the B.S., M.S., and Ph.D. degrees in applied physics from Osaka University, Osaka, Japan, in 1996, 1998, and 2001, respectively.

He joined Mitsubishi Electric Corporation after graduation, and in 2003, he was transferred to Renesas Technology Corporation. He has been engaged in the development of embedded SRAMs for advanced CMOS logic processes. His current research interests focus on variability issues in SRAM cells of the sub-50 nm generation. Since October 2008, he has been conducting research on advanced SRAMs at the University of California, Berkeley, as a Visiting Industrial Fellow from Renesas Technology Corporation.



**Makoto Yabuuchi** was born in Toyama, Japan, in 1979. He received the B.S. and M.S. degrees in electronic engineering from Kanazawa University, Ishikawa, Japan, in 2004.

In 2004, he joined the Advanced Design Framework Development Department of Renesas Technology Corporation, Itami, Japan, where he has been working on designing embedded SRAMs for advanced CMOS logic processes.



**Yasuhiro Masuda** was born in Osaka, Japan, in 1967. He received the B.E. degree in electronic engineering from Osaka Institute of Technology, Osaka, Japan, in 1990.

In 1990, he joined the LSI Design Engineering Department, Mitsubishi Electric Engineering Corporation, Itami, Japan. In 2005, he was transferred to Renesas Design Corporation, Itami, Japan, where he has been engaged in designing embedded SRAMs for advanced CMOS logic processes.



**Susumu Imaoka** was born in Hiroshima, Japan, in 1965. He received the B.S. degree in electrical engineering from Fukuoka Institute of Technology, Fukuoka, Japan, in 1987.

In 1987, he joined the Electronic Devices Design Center, Mitsubishi Electric Engineering Corporation, Itami, Japan. He moved to Renesas Design Corporation, Hyogo, Japan, in 2005, where he has been working on designing embedded SRAMs for advanced CMOS logic processes.



**Keiichi Usui** was born in Hyogo, Japan, in 1969. He received the B.E. degree in electrical engineering from Osaka Electro-Communication University, Osaka, Japan, in 1993.

In 1997, he joined the Electronic Devices Design Center, Daioh Electric Corporation, Itami, Japan. Since then, he has been engaged in memory test engineering. He is currently involved in the development of test methods for embedded SRAMs using advanced CMOS logic processes.

**Shigeki Ohbayashi** was born in Hiroshima, Japan, in 1962. He received the B.S. and M.S. degrees in electronic engineering from Hiroshima University, Hiroshima, Japan, in 1985 and 1987, respectively.

He joined the LSI Laboratory, Mitsubishi Electric Corp., Itami, Japan, in 1987. From 1987 to 1990, he was engaged in the research and development of BiCMOS SRAMs. In 1990, he transferred to Mitsubishi's Kita-Itami Works, Itami, Japan, where he worked on the development of BiCMOS/CMOS fast asynchronous SRAMs and CMOS synchronous SRAMs.



**Hiroshi Makino** (M'08) was born in Osaka, Japan, in 1959. He received the B.S. degree in physics from Kyoto University, Kyoto, Japan, in 1983, and the Ph.D. degree in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1997.

In 1983, he joined the LSI R&D Laboratory, Mitsubishi Electric Corporation, Itami, Japan, where he worked on the research and development of GaAs digital LSIs until 1990. From 1991 to 2002, he was engaged in the research and development of Si CMOS high-speed and low-power digital circuits at the System LSI Laboratory and System LSI Development Center. From 2003 to 2007, he continued the same study at the Advanced Design Framework Development Department of Renesas Technology Corporation, Itami, Japan. In 2008, he became an Associate Professor at Osaka Institute of Technology, Osaka, Japan, where he is working on the research and education of system LSI design.

Dr. Makino received the Best Paper Awards at the IEEE International Conference on Computer Design (ICCD) in 1993 and at the IEEE International Conference on Microelectronic Test Structures (ICMTS) in 2007. He served on the program committees of the IEEE International Symposium on Low Power Electronics and Design (ISLPED) in 2003 and the IEEE International Solid-State Circuits Conference (ISSCC) from 2005 to 2007. He is currently a member of the Institute of Electronics, Information and Communication Engineers of Japan (IEICE).



**Hirofumi Shinohara** was born in Hyogo, Japan, in 1954. He received the B.S. and M.S. degrees in electronics engineering and the Ph.D. degree in informatics from Kyoto University, Kyoto, Japan, in 1976, 1978, and 2008, respectively.

After joining the LSI Laboratory, Mitsubishi Electric Corporation, Itami, Japan, in 1978, he worked on research and development of MOS SRAMs ranging from 16 kb to 1 Mb. From 1987, he was involved in the area of SoC elemental circuit technology including memory generators, a PLA compiler, a high-speed multiplier, and neural network chips. Since moving to Renesas Technology Corp. in 2003, he has been working on a design framework for SoC and MCU. His research interests include advanced SRAM, low-power circuits, variation-aware design, and design for manufacturing.