# Implementing Spike-Timing-Dependent Plasticity and Unsupervised Learning in a Mainstream NOR Flash Memory Array

G. Malavena, A. S. Spinelli, and C. Monzio Compagnoni

Politecnico di Milano, piazza L. da Vinci 32, 20133 Milano, Italy, e-mail: gerardo.malavena@polimi.it

Abstract: In this work, we present the first implementation of spike-timing-dependent plasticity (STDP) and unsupervised learning in a mainstream NOR Flash memory array based on floating-gate cells. A simple yet effective word-line and bit-line pulse scheme is proposed to make a common-ground double-polysilicon NOR array in 40 nm embedded technology work as an artificial synaptic array in a spiking neural network learning according to the STDP rule, with no change required either to the array or to the cell design. With this scheme, long-term potentiation and long-term depression of the synaptic weights are achieved by hot-hole injection and channel hot-electron injection, respectively, at the drain side of the cells. Unsupervised learning is experimentally demonstrated in the array, paving the way for the development of large-scale and high-density neuromorphic systems based on mainstream nonvolatile memory technologies.

## I. INTRODUCTION

The idea of developing nonvolatile memory arrays working as artificial synaptic arrays in neuromorphic networks has been attracting considerable interest since its first proposal [1], [2]. Among the different cell structures fit for the purpose, those based on charge storage in floating-gate or charge-trap layers offer the benefits of virtually analog tuning of the synaptic weights, low power consumption and excellent CMOS compatibility [3]–[7]. In spite of all these benefits, what may make one storage solution far more favorable than the others is the possibility of creating the artificial synaptic array directly from mature, reliable and highly scaled mainstream nonvolatile memory technologies with only slight changes in the array design [4].

In this work, we demonstrate the operation of a mainstream common-ground double-polysilicon NOR Flash array in 40 nm embedded technology as an artificial synaptic array learning according to the spike-timing-dependent plasticity (STDP) rule [8]. With no change either in the cell or in the array design, long-term potentiation (LTP) and long-term depression (LTD) of the synaptic weights are achieved through a simple yet effective word-line (WL) and bit-line (BL) pulse scheme, triggering either hot-hole injection (HHI) or channel hot-electron injection (CHEI) at the drain side of the cells. Finally, building on this pulse scheme, we experimentally demonstrate unsupervised learning in the array.

## II. ARRAY STRUCTURE AND OPERATION

Fig. 1 shows the schematic structure of the common-ground NOR Flash array investigated in this work, featuring stacked-gate nonvolatile memory cells. Test elements of the array, developed in a mainstream 40 nm embedded technology by STMicroelectronics [9] and allowing flexible external biasing of the array lines, were experimentally tested for the implementation of STDP and unsupervised learning. A mandatory requirement for this implementation is the possibility to perform not only program but also erase operations with single-cell selectivity, overcoming the parallel (block) erase typical of Flash arrays. While keeping CHEI for cell programming, this was achieved by replacing Fowler-Nordheim (FN) tunneling erase with HHI erase (Tab. I). As schematically depicted in Fig. 2, both (a) CHEI and (b) HHI require simultaneous WL and BL biases ($V_{WL}$ and $V_{BL}$, respectively), which guarantees the selectivity of program and erase operations at the single-cell level in the NOR array. Moreover, Fig. 3 shows that large threshold-voltage shifts ($V_T$ being extracted as the $V_{WL}$ giving a constant BL current $I_{BL} = 10$ nA with $V_{BL} = 200$ mV) can be achieved over comparable timescales during (a) CHEI program and (b) HHI erase, with the same $V_{BL} = 4.5$ V and relatively low $|V_{WL}|$ ranging from 3 to 8 V. Conversely, no significant change of cell $V_T$ appears in Fig. 3(b) for an FN tunneling erase with $V_{WL} = -10$ V and grounded BL, source and p-well, confirming that the $V_T$ reduction observed with $V_{BL} = 4.5$ V and negative $V_{WL}$ is due to HHI and not to FN tunneling over the channel or the drain area.

To compare the efficiency of the CHEI and HHI mechanisms, we directly measured $I_{BL}$ during the program and erase pulses (Fig. 4) and compared it with the injection current ($I_{inj}$) into the floating gate extracted from the $V_T$ transients of Fig. 3 [10], thus avoiding any indirect assessment through equivalent-transistor analyses [10]. The results, shown in Figs. 5-6, reveal an injection efficiency $I_{inj}/I_{BL}$ close to $10^{-6}$ for both mechanisms in the explored biasing conditions. This makes HHI an acceptable erase mechanism even from the power-consumption standpoint.
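
The two extraction steps described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used for Figs. 3, 5 and 6: the helper names `vt_constant_current` and `injection_from_transient` are hypothetical, and so are all numeric values. It assumes (i) that $V_T$ is read as the $V_{WL}$ at which $I_{BL}$ crosses 10 nA, interpolating linearly in $\log I_{BL}$; and (ii) the standard capacitive floating-gate model with grounded source and p-well, giving $I_{inj} = \alpha_G C_T\,|dV_T/dt|$ and $V_{FG} = \alpha_G (V_{WL} - \Delta V_T) + \alpha_D V_{BL}$, where $C_T$ (total floating-gate capacitance), $\alpha_D$ (drain coupling ratio) and the neutral-floating-gate $V_T$ are assumed parameters.

```python
import math

# Illustrative sketch only: helper names and all numbers are hypothetical.

def vt_constant_current(v_wl, i_bl, i_ref=10e-9):
    """V_T as the V_WL at which I_BL reaches i_ref (10 nA in the text).

    v_wl: increasing WL voltages of a transfer-characteristic sweep (V).
    i_bl: BL currents (A) measured at a low read V_BL (e.g. 200 mV).
    Interpolates linearly in log10(I_BL), since the subthreshold current
    grows exponentially with V_WL.
    """
    for k in range(1, len(v_wl)):
        i0, i1 = i_bl[k - 1], i_bl[k]
        if i0 <= i_ref <= i1:
            if i1 == i0:
                return v_wl[k - 1]
            frac = (math.log10(i_ref) - math.log10(i0)) / (math.log10(i1) - math.log10(i0))
            return v_wl[k - 1] + frac * (v_wl[k] - v_wl[k - 1])
    raise ValueError("the sweep never crosses the reference current")


def injection_from_transient(t, vt, v_wl, v_bl, alpha_g, alpha_d, c_t, vt_neutral):
    """Return (V_FG, I_inj) pairs from a V_T transient by finite differences.

    Assumes Delta V_T = -Delta Q_FG / (alpha_g * c_t), with Delta V_T measured
    from the neutral floating-gate condition (vt_neutral), not from the 4 V
    reference used for w, and source and p-well grounded.
    """
    c_cg = alpha_g * c_t  # control-gate-to-floating-gate capacitance (F)
    points = []
    for k in range(1, len(t)):
        dvt_dt = (vt[k] - vt[k - 1]) / (t[k] - t[k - 1])
        i_inj = c_cg * abs(dvt_dt)  # magnitude of the injected current (A)
        vt_mid = 0.5 * (vt[k] + vt[k - 1])
        v_fg = alpha_g * (v_wl - (vt_mid - vt_neutral)) + alpha_d * v_bl
        points.append((v_fg, i_inj))
    return points


# Usage with made-up numbers: a 1 V V_T drop in 1 us during an HHI erase pulse
pts = injection_from_transient([0.0, 1e-6], [4.0, 3.0], v_wl=-7.0, v_bl=4.5,
                               alpha_g=0.5, alpha_d=0.1, c_t=1e-15, vt_neutral=2.0)
v_fg, i_inj = pts[0]
i_bl_avg = 5e-4                # hypothetical average I_BL during the same pulse (A)
efficiency = i_inj / i_bl_avg  # injection efficiency I_inj / I_BL
```

Dividing by the $I_{BL}$ measured directly during the pulse, rather than by a modeled channel current, is what keeps the efficiency free of equivalent-transistor assumptions.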

## III. STDP AND UNSUPERVISED LEARNING

### A. STDP

When operating in the subthreshold regime, each cell in the NOR array can be considered as an artificial synapse with weight $w = \exp(-q\alpha_G\Delta V_T/mkT)$ [1], [2], where $q$ is the elementary charge, $\alpha_G$ is the control-gate-to-floating-gate capacitive coupling ratio, $m$ is the subthreshold-slope ideality factor of the equivalent transistor, $kT$ is the thermal energy and $\Delta V_T$ is the cell $V_T$ shift from a reference condition. Building on the CHEI and HHI results of the previous section, STDP of the synaptic weight $w$ can be achieved with the pulse scheme depicted in Fig. 7. A presynaptic spike at time $t_{pre}$ triggers a double-triangular pulse on the WL of the associated synapse: $V_{WL}$ grows linearly up to $V_{WL}^{max} = 4$ V, then suddenly drops to $V_{WL}^{min} = -7$ V and finally returns linearly to zero. The total pulse duration was set to $t_{WL} = 2$ ms, equally split between the positive and the negative front of the waveform. A postsynaptic spike at time $t_{post}$ triggers, instead, a rectangular pulse of amplitude $V_{BL} = 4.5$ V and duration $t_{BL} = 10\ \mu$s on the BL connected to the synapse, delayed by $t_{WL}/2 = 1$ ms. Depending on the delay $\Delta t = t_{post} - t_{pre}$ between the post- and the presynaptic spike, the BL pulse falls either during the negative (Fig. 7(a), $\Delta t > 0$) or during the positive (Fig. 7(b), $\Delta t < 0$) front of the WL waveform. As a result, HHI (reducing cell $V_T$) takes place in the former case and CHEI (increasing cell $V_T$) in the latter, giving rise to LTP and LTD of $w$, respectively, and reproducing the STDP learning rule. This is proved in Fig. 8, which shows the evolution of $w$ when the pulse scheme of Fig. 7 is repeatedly applied with $\Delta t$ equal to $0^+$ (max. LTP) or $0^-$ (max. LTD), taking $V_T = 4$ V as the reference for $\Delta V_T$ extraction.
These results reveal that large changes of $w$ can be achieved through the cumulative effect of the LTP and LTD pulses. Moreover, Fig. 9 shows that the ratio between the final ($w_f$) and the initial ($w_i$) value of $w$ after applying the STDP pulse scheme displays an exponential dependence on $\Delta t$, mimicking the behavior of biological synapses [8]. LTD and LTP also display a marked dependence on $w_i$: the former gets weaker for decreasing $w_i$ (from part (a) to (c) of Fig. 9), while the latter shows the opposite trend. Finally, Figs. 10-11 show that synapses can withstand large changes of their $w$ at least $10^5$ times with relatively low degradation, preserving their STDP learning capability.
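
To make the timing argument concrete, the sketch below evaluates the weight definition and the Fig. 7 waveforms. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, $\alpha_G = 0.6$ and $m = 1.5$ in the usage lines are made-up values, times are in ms with $t_{pre} = 0$, and $V_{WL}$ is sampled at the start of the BL pulse (reasonable because $t_{BL} = 10\ \mu$s is short compared with the 1 ms ramps). It shows that for $0 \le \Delta t < t_{WL}/2$ the BL pulse meets a negative $V_{WL}$ (HHI, LTP), for $-t_{WL}/2 < \Delta t < 0$ a positive one (CHEI, LTD), and that $|V_{WL}|$ at the overlap shrinks linearly as $|\Delta t|$ grows.

```python
import math

Q = 1.602176634e-19  # elementary charge (C)
K_B = 1.380649e-23   # Boltzmann constant (J/K)


def synaptic_weight(delta_vt, alpha_g, m, temp=300.0):
    """w = exp(-q * alpha_G * Delta V_T / (m k T)); delta_vt in volts."""
    return math.exp(-Q * alpha_g * delta_vt / (m * K_B * temp))


def wl_voltage(t, t_wl=2.0, v_max=4.0, v_min=-7.0):
    """Double-triangular WL waveform of Fig. 7; t in ms from the presynaptic spike."""
    half = t_wl / 2
    if 0.0 <= t < half:
        return v_max * t / half           # positive front: linear rise to v_max
    if half <= t < t_wl:
        return v_min * (t_wl - t) / half  # sudden drop to v_min, then linear return to 0
    return 0.0                            # outside the WL pulse


def stdp_event(delta_t, t_wl=2.0):
    """V_WL at the start of the BL pulse and the mechanism expected to act."""
    # The BL pulse starts t_WL/2 after t_post, i.e. at delta_t + t_WL/2 from t_pre
    v = wl_voltage(delta_t + t_wl / 2, t_wl)
    if v < 0:
        return v, "HHI -> V_T decreases -> LTP"
    if v > 0:
        return v, "CHEI -> V_T increases -> LTD"
    return v, "no overlap -> no change"


# Usage with hypothetical alpha_G = 0.6 and m = 1.5
w_up = synaptic_weight(-0.1, alpha_g=0.6, m=1.5)    # V_T lowered by 100 mV
w_down = synaptic_weight(0.1, alpha_g=0.6, m=1.5)   # V_T raised by 100 mV
for dt in (-0.75, -0.25, 0.0, 0.25, 0.75):
    print(dt, stdp_event(dt))
```

Since CHEI and HHI rates depend steeply on the applied bias, this linear decrease of $|V_{WL}|$ with $|\Delta t|$ offers one plausible reading of the $\Delta t$ dependence in Fig. 9; the sketch does not model the injection itself.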

### B. Unsupervised learning

We implemented unsupervised learning in the NOR array by considering it as an artificial synaptic array that receives voltage pulses on its WLs as a result of the activity of $N_i$ input neurons and produces an excitatory postsynaptic current (EPSC) on each of its BLs, increasing the membrane potential ($V_m$) of an output neuron (Fig. 12). The output neuron fires when $V_m$ exceeds a threshold value. Following [11], synchronous firing of the input neurons with time periodicity $t_{WL}$ was assumed, switching the firing pattern between a signal pattern (SP) to be learned and a noise pattern (NP). To achieve LTP of the synapses excited by the SP and LTD of the other synapses, we modified the pulse scheme of Fig. 7 by: i) simplifying the WL waveform into a double-rectangular pulse with positive and negative amplitude equal to $V_{WL}^{max}$ and $V_{WL}^{min}$, respectively; and ii) introducing a second BL pulse delayed by $t_{WL}/2$ with respect to the first (Fig. 12). This maximizes CHEI and HHI during the BL pulses, speeding up the learning process.

Moreover, the timing reproduces an unsupervised STDP rule, since the output neuron fires during the positive interval of the WL waveforms. As a result, the first BL pulse is applied during the negative interval of the WL waveforms of the same pattern, giving rise to the LTP of the synapses that fired before the output neuron. The second BL pulse, instead, occurs during the positive interval of the WL waveforms of the subsequent pattern, contributing to the LTD of the synapses that fired after the output neuron.
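
The timing argument can be checked with a small sketch. It assumes, hypothetically, that the first BL pulse follows the output spike by $t_{WL}/2$ (as in Fig. 7) and the second by a further $t_{WL}/2$, that patterns follow each other every $t_{WL} = 2$ ms, and that each WL waveform is positive in the first half of its period; the function names are illustrative only.

```python
T_WL = 2.0  # ms, period of the synchronous input firing (one pattern per t_WL)


def wl_interval(t, t_wl=T_WL):
    """Return (pattern index, 'positive' or 'negative') for a time t in ms."""
    k = int(t // t_wl)
    phase = t - k * t_wl
    return k, ("positive" if phase < t_wl / 2 else "negative")


def bl_pulse_targets(t_fire, t_wl=T_WL):
    """Where the two BL pulses triggered by an output spike at t_fire land."""
    first = t_fire + t_wl / 2   # same pattern, negative WL interval -> HHI -> LTP
    second = t_fire + t_wl      # next pattern, positive WL interval -> CHEI -> LTD
    return wl_interval(first, t_wl), wl_interval(second, t_wl)


# An output spike 0.3 ms into pattern 0 (positive interval)
print(bl_pulse_targets(0.3))
```

For any firing time inside the positive interval of pattern $k$, the first pulse lands in the negative interval of pattern $k$ and the second in the positive interval of pattern $k+1$, which is the property the learning rule relies on.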

To prove the functionality of the proposed unsupervised learning scheme, Fig. 13 first demonstrates that no change in the $w$ of a synapse occurs when (a) only BL pulses or (b) only WL pulses are applied, confirming that LTP and LTD take place only when a postsynaptic spike occurs in the presence of an excited synapse. LTP of the synapses excited by the SP and LTD of the other synapses are experimentally proved in Fig. 14, where the evolution of $w$ for 8 synapses is reported as a function of the learning epoch (number of SPs and NPs applied at the input). As done in [11], the definition of the firing patterns of the input neurons, the integration of the EPSC and the triggering of the BL pulses were performed by an ad-hoc circuit board driven by a microcontroller. By keeping the number of input neurons firing during the NP low, firing of the output neuron occurs mainly in the presence of the SP. This results in the LTP of the synapses excited by the SP and in the LTD of the other synapses during the subsequent NP phases, giving rise to unsupervised learning in the array.
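
A behavioral toy model can help visualize why a sparse NP leads to learning. The sketch below is not a model of the measured cells: it abstracts each BL-pulse event into a fixed multiplicative weight update, and every parameter (8 inputs with the SP on inputs 0-3, threshold, update factors, weight bounds, initial range, seed) is a hypothetical choice. Each epoch applies one SP followed by one NP in which each input fires independently with probability `p_np`; $V_m$ is taken as the summed weight of the active inputs and is assumed to reset at every pattern. When $V_m$ exceeds the threshold, LTP is applied to the inputs active in the same pattern and LTD to those active in the next one.

```python
import random


def run_learning(n_inputs=8, sp_inputs=(0, 1, 2, 3), p_np=0.05, n_epochs=300,
                 threshold=1.2, f_ltp=1.3, f_ltd=1.3, w_min=0.01, w_max=1.0,
                 seed=0):
    """Toy SP/NP learning loop; returns the final weights (all values hypothetical)."""
    rng = random.Random(seed)
    # Random initialization of the weights (hypothetical range)
    w = [rng.uniform(0.35, 0.65) for _ in range(n_inputs)]
    # Input sequence: each epoch is one signal pattern followed by one noise pattern
    patterns = []
    for _ in range(n_epochs):
        patterns.append(set(sp_inputs))
        patterns.append({i for i in range(n_inputs) if rng.random() < p_np})
    for k, active in enumerate(patterns):
        # Integrated EPSC of this pattern; V_m is assumed to reset every pattern
        v_m = sum(w[i] for i in active)
        if v_m <= threshold:
            continue  # no output spike, hence no BL pulses
        # First BL pulse: negative WL interval of the same pattern -> HHI -> LTP
        for i in active:
            w[i] = min(w_max, w[i] * f_ltp)
        # Second BL pulse: positive WL interval of the next pattern -> CHEI -> LTD
        if k + 1 < len(patterns):
            for i in patterns[k + 1]:
                w[i] = max(w_min, w[i] / f_ltd)
    return w


# Usage: compare NP firing probabilities similar to those of Fig. 14
for p in (0.05, 0.03, 0.01):
    print(p, [round(x, 2) for x in run_learning(p_np=p)])
```

In this toy model, lowering `p_np` makes the non-SP inputs appear less often right after an output spike, so they are depressed more slowly; this is qualitatively in line with the slower learning noted in Fig. 14, although the model is not fitted to those data.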

## IV. CONCLUSIONS

In this work, we reported the first implementation of STDP and unsupervised learning in a mainstream NOR Flash memory array operated as an artificial synaptic array in a spiking neural network. LTP and LTD of the synaptic weights according to the STDP learning rule were achieved by a simple pulse scheme triggering HHI and CHEI at the drain side of the cells, without requiring any change either in the cell or in the array design. These results are an important step toward the development of large-scale and high-density neuromorphic systems based on mainstream memory technologies.

## V. ACKNOWLEDGMENTS

The authors would like to thank P. Cappelletti and F. Piazza from STMicroelectronics for support. This article received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant Agreement No. 648635).

## REFERENCES

- [1] C. Diorio, et al., *IEEE-TED*, vol. 43, pp. 1972–1980, 1996.
- [2] C. Diorio, et al., *IEEE-TED*, vol. 44, pp. 2281–2289, 1997.
- [3] S. Ramakrishnan, et al., *IEEE-TBCS*, vol. 5, pp. 244–252, 2011.
- [4] X. Guo, et al., in *IEDM Tech. Dig.*, pp. 151–154, 2017.
- [5] H.-S. Choi, et al., *IEEE-TED*, vol. 65, pp. 101–107, 2018.
- [6] H. Kim, et al., *IEEE-EDL*, vol. 39, pp. 630–633, 2018.
- [7] C.-H. Kim, et al., *IEEE-TED*, vol. 65, pp. 1774–1780, 2018.
- [8] G.-Q. Bi and M.-M. Poo, *J. Neurosci.*, vol. 18, pp. 10464–10472, 1998.
- [9] C. Boccaccio, "Embedded 1T Flash NOR: still alive at 40 nm. And beyond?," in *LETI Memory Workshop*, 2013.
- [10] B. Eitan and D. Frohman-Bentchkowsky, *IEEE-TED*, vol. 28, pp. 328–340, 1981.
- [11] G. Pedretti, et al., in *IEDM Tech. Dig.*, pp. 653–656, 2017.



Fig. 1: Schematic for the connections of the stacked-gate cells in the common-ground NOR Flash array investigated in this work.

|         | Standard operation  | This work             |
|---------|---------------------|-----------------------|
| Program | CHEI<br>(few bytes) | CHEI<br>(single cell) |
| Erase   | FN tunn.<br>(block) | HHI<br>(single cell)  |

Tab. I: Physical mechanisms used for cell programming and erasing in the standard operation of the NOR array and in this work.



Fig. 4:  $I_{BL}$  during a CHEI program transient ( $V_{BL} = 4.5$  V,  $V_{WL} = 4$  V), as measured through the setup shown in the inset. The vertical dashed lines identify the stretches of time corresponding to the applied programming pulses. The slow rising front of  $I_{BL}$  at short times is due to the limited bandwidth of the transimpedance amplifier (TIA).



Fig. 7: Pulse scheme exploited to implement STDP in the investigated common-ground NOR Flash array, in the case of (a) presynaptic spike preceding the postsynaptic spike (LTP of w) and (b) presynaptic spike following the postsynaptic spike (LTD of w). By adopting HHI and CHEI for LTP and LTD, respectively, the proposed pulse scheme is simpler than those previously proposed for other charge-storage artificial synapses [3], [5].



Fig. 2: Schematic of the stacked-gate cells in the investigated NOR array, highlighting the biasing conditions used for (a) CHEI program and (b) HHI erase (holes are generated by band-to-band tunneling at the drain and become hot by moving towards the p-region). The source line and the p-well were always grounded throughout this work.



Fig. 5: $I_{inj}$ as a function of the floating-gate potential $V_{FG}$, as extracted from the CHEI program $V_T$ transients [10], for different $V_{BL}$ and $V_{WL}$. Results for the average $I_{BL}$ during the program pulses (see Fig. 4) are also reported for $V_{BL} = 4.5$ V.



Fig. 3: Experimental cell  $V_T$  transients during (a) CHEI program and (b) HHI erase, for  $V_{BL} = 4.5$  V and different  $V_{WL}$ . In (b) the results for an FN tunneling erase at  $V_{WL} = -10$  V are also reported.



Fig. 6: $I_{inj}$ as a function of the floating-gate potential $V_{FG}$, as extracted from the HHI erase $V_T$ transients [10], for different $V_{BL}$ and $V_{WL}$. Results for the average $I_{BL}$ during the erase pulses (similar to Fig. 4) are also reported for $V_{BL} = 4.5$ V.



Fig. 8: Evolution of w when repeatedly applying the pulse scheme of Fig. 7 with  $\Delta t$  equal to 0<sup>+</sup> (max. LTP) or 0<sup>-</sup> (max. LTD).  $V_T = 4$  V was taken as reference for  $\Delta V_T$  and  $w = \exp(-q\alpha_G \Delta V_T/mkT)$ throughout this work.




Fig. 9: Ratio between the final ($w_f$) and the initial ($w_i$) value of w when applying the STDP pulse scheme of Fig. 7, as a function of $\Delta t$ (a single pulse was applied for each $\Delta t$ value). Results for different $w_i$ (corresponding to different initial threshold voltages $V_{T,i}$ of the artificial synapse) are shown in parts (a), (b) and (c).



Fig. 10: Dependence of (a) w and (b)  $V_T$  on the number of repetitions of an STDP test made of 50 LTD pulses with  $\Delta t = 0^-$  and 50 LTP pulses with  $\Delta t = 0^+$ .



Fig. 12: Synchronous firing of the input neurons with period equal to $t_{WL}$ is assumed, with the firing pattern alternated between the SP and an NP. This firing triggers double-rectangular WL pulses, giving rise to an EPSC at the array BLs (for better clarity, only one BL is shown in the picture). The EPSC is integrated and converted into a membrane potential $V_m$ by an output neuron, firing when $V_m$ overcomes a selected threshold. This latter firing triggers two pulses on the


Fig. 11:  $w_f/w_i$  resulting from the STDP experiment (the cumulative effect of 50 pulses has been considered) on the synapse previously subjected to the test of Fig. 10.



Fig. 13: Impact on w of (a) an increasing number of BL pulses (no WL pulses applied) and (b) an increasing number of WL pulses (no BL pulses applied).



Fig. 14: Results of the unsupervised learning test for different firing probabilities of the input neurons during the NP phase, equal to (a) 5%, (b) 3% and (c) 1%. Shaded lines are the w of the 8 synapses involved in the test, blue lines are the average trend of the w of the synapses excited by the SP and red lines are the average trend of the w of the other synapses. At the beginning of the test, the weights were randomly initialized. Note that, in the explored range, reducing the firing probability during the NP results in slower learning.